# 12 Pandas Pivot Tables

* pivot and crosstab examples

In [1]:
import numpy as np
import pandas as pd

import random

### Pivot

Create a random dataset.

In [2]:
size = 200
data = {
    "date": pd.date_range("2000-01-01", periods=size),
    "name": [random.choice('ABCD') for _ in range(size)],
    # Uniform floats between 1 and 10 (low and high given explicitly).
    "value": np.random.uniform(1, 10, size=size),
}

In [3]:
df = pd.DataFrame(data)

In [4]:
df.head()

,date,name,value
0,2000-01-01,B,2.737759
1,2000-01-02,A,1.917634
2,2000-01-03,A,7.722935
3,2000-01-04,D,9.493217
4,2000-01-05,A,4.799974


Select rows by column value with a boolean mask.

In [5]:
df[df['name'] == 'A'].head()

,date,name,value
1,2000-01-02,A,1.917634
2,2000-01-03,A,7.722935
4,2000-01-05,A,4.799974
5,2000-01-06,A,7.86244
9,2000-01-10,A,6.713501


Suppose we want to work with the names directly. With `pivot`, we can create one column per name and move the date to the index; date and name combinations without a value are filled with NaN.

In [6]:
df.pivot(index='date', columns='name', values='value')

name,A,B,C,D
date,,,,
2000-01-01,,2.737759,,
2000-01-02,1.917634,,,
2000-01-03,7.722935,,,
2000-01-04,,,,9.493217
2000-01-05,4.799974,,,
2000-01-06,7.862440,,,
2000-01-07,,3.139803,,
2000-01-08,,,,3.828047
2000-01-09,,,,8.265365
2000-01-10,6.713501,,,
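
The reshaping above works because every (date, name) pair occurs at most once. The following is a minimal sketch on a hypothetical three-row frame (`toy`, `extra`, and `dup` are illustrative names, not part of the notebook) showing what happens when that assumption breaks: `pivot` raises a `ValueError`, while `pivot_table` aggregates the duplicates.

```python
import pandas as pd

# Hypothetical toy data: one value per (date, name) pair.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-01", "2000-01-01", "2000-01-02"]),
    "name": ["A", "B", "A"],
    "value": [1.0, 2.0, 3.0],
})

# One column per name; the missing pair (2000-01-02, B) becomes NaN.
wide = toy.pivot(index="date", columns="name", values="value")
print(wide)

# Add a second row for (2000-01-01, A), which makes the reshape ambiguous.
extra = pd.DataFrame({
    "date": [pd.Timestamp("2000-01-01")],
    "name": ["A"],
    "value": [5.0],
})
dup = pd.concat([toy, extra], ignore_index=True)

try:
    dup.pivot(index="date", columns="name", values="value")
    pivot_failed = False
except ValueError:
    # pivot refuses to reshape when an (index, columns) pair is duplicated.
    pivot_failed = True

# pivot_table aggregates duplicates instead; here the mean of 1.0 and 5.0.
agg = dup.pivot_table(index="date", columns="name", values="value", aggfunc="mean")
print(agg)
```

If duplicates are expected in the data, `pivot_table` with an explicit `aggfunc` is probably the safer choice; `pivot` is useful precisely because it fails loudly when the data are not unique.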


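The pivot function can also produce a hierarchical index: if `values` is omitted and more than one value column remains, the resulting columns form a MultiIndex.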
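
As a small sketch of the hierarchical case, assume a hypothetical frame with a second value column named `weight` (not present in the dataset above). Calling `pivot` without `values` then pivots both columns at once.

```python
import pandas as pd

# Hypothetical frame with two value columns, "value" and "weight".
toy = pd.DataFrame({
    "date": pd.to_datetime(
        ["2000-01-01", "2000-01-01", "2000-01-02", "2000-01-02"]
    ),
    "name": ["A", "B", "A", "B"],
    "value": [1.0, 2.0, 3.0, 4.0],
    "weight": [10, 20, 30, 40],
})

# No `values` argument: every remaining column is pivoted.
wide = toy.pivot(index="date", columns="name")

# The columns are now a two-level MultiIndex: (original column, name).
print(wide.columns.nlevels)

# Selecting the outer level returns a frame with flat columns A and B.
print(wide["value"])

# A tuple addresses a single column of the hierarchy.
print(wide[("weight", "B")])
```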

### Stack and unstack

In [7]:
df = pd.DataFrame(np.random.randn(10, 10), index=list('abcdefghij'))

In [8]:
df

,0,1,2,3,4,5,6,7,8,9
a,0.027492,0.080971,-0.777725,0.932309,-0.210329,0.317821,-1.074687,1.937847,0.122263,0.432749
b,-1.099883,-1.342841,-0.382474,-1.076681,-1.128104,-0.492914,-0.23597,-0.823993,0.467516,2.647605
c,2.304245,1.560943,-1.14939,0.323775,0.338084,-0.696035,-0.709139,-1.834803,-0.202435,-0.365455
d,-0.449839,0.358916,0.364363,-0.942727,0.613209,0.29936,-0.128331,-2.198496,-0.584545,1.206864
e,0.015615,0.398587,1.066957,0.28139,-1.863451,-0.392679,-0.530881,0.408764,-0.127198,-1.364809
f,-1.235403,1.7272,-0.120263,0.549897,-1.520097,0.591054,0.196807,-0.430146,0.63542,-0.826591
g,-0.470555,-1.175253,1.276342,1.387625,-0.813752,0.336275,0.638341,-1.918054,1.202335,0.767233
h,-1.518743,-0.106998,-0.152967,1.988816,0.356198,0.792416,0.292287,0.162272,-0.592675,-1.813356
i,2.207042,0.439085,0.566616,-0.413137,-0.222956,1.321255,0.928263,-0.925733,-0.314538,0.244104
j,0.040745,0.023125,-0.294541,-0.005384,1.774168,-1.108982,-0.259427,-2.117126,-1.389757,-1.018322


With `stack`, we can turn a data frame into a series with a hierarchical index. The original index becomes the outer level, whereas the original column labels are stacked into the inner level.

In [9]:
df.stack()

a  0    0.027492
   1    0.080971
   2   -0.777725
   3    0.932309
   4   -0.210329
   5    0.317821
   6   -1.074687
   7    1.937847
   8    0.122263
   9    0.432749
b  0   -1.099883
   1   -1.342841
   2   -0.382474
   3   -1.076681
   4   -1.128104
   5   -0.492914
   6   -0.235970
   7   -0.823993
   8    0.467516
   9    2.647605
c  0    2.304245
   1    1.560943
   2   -1.149390
   3    0.323775
   4    0.338084
   5   -0.696035
   6   -0.709139
   7   -1.834803
   8   -0.202435
   9   -0.365455
          ...   
h  0   -1.518743
   1   -0.106998
   2   -0.152967
   3    1.988816
   4    0.356198
   5    0.792416
   6    0.292287
   7    0.162272
   8   -0.592675
   9   -1.813356
i  0    2.207042
   1    0.439085
   2    0.566616
   3   -0.413137
   4   -0.222956
   5    1.321255
   6    0.928263
   7   -0.925733
   8   -0.314538
   9    0.244104
j  0    0.040745
   1    0.023125
   2   -0.294541
   3   -0.005384
   4    1.774168
   5   -1.108982
   6   -0.259427
   7   -2.117126
   8   -1.389757
   9   -1.018322
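
The index layout is easier to see on a tiny frame. This is a minimal sketch using a hypothetical 2x2 frame named `small`; it is not part of the random data above.

```python
import pandas as pd

# Hypothetical 2x2 frame with labelled rows and columns.
small = pd.DataFrame([[1, 2], [3, 4]], index=["a", "b"], columns=["x", "y"])

stacked = small.stack()

# One entry per (row label, column label) pair, row label outermost.
print(stacked.index.tolist())
# [('a', 'x'), ('a', 'y'), ('b', 'x'), ('b', 'y')]

# A tuple selects a single value; the outer label alone selects a row.
print(stacked.loc[("b", "x")])     # 3
print(stacked.loc["a"].tolist())   # [1, 2]
```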

The reverse is also possible: `unstack` moves the inner index level back to the columns and restores the original data frame.

In [10]:
df.stack().unstack()

,0,1,2,3,4,5,6,7,8,9
a,0.027492,0.080971,-0.777725,0.932309,-0.210329,0.317821,-1.074687,1.937847,0.122263,0.432749
b,-1.099883,-1.342841,-0.382474,-1.076681,-1.128104,-0.492914,-0.23597,-0.823993,0.467516,2.647605
c,2.304245,1.560943,-1.14939,0.323775,0.338084,-0.696035,-0.709139,-1.834803,-0.202435,-0.365455
d,-0.449839,0.358916,0.364363,-0.942727,0.613209,0.29936,-0.128331,-2.198496,-0.584545,1.206864
e,0.015615,0.398587,1.066957,0.28139,-1.863451,-0.392679,-0.530881,0.408764,-0.127198,-1.364809
f,-1.235403,1.7272,-0.120263,0.549897,-1.520097,0.591054,0.196807,-0.430146,0.63542,-0.826591
g,-0.470555,-1.175253,1.276342,1.387625,-0.813752,0.336275,0.638341,-1.918054,1.202335,0.767233
h,-1.518743,-0.106998,-0.152967,1.988816,0.356198,0.792416,0.292287,0.162272,-0.592675,-1.813356
i,2.207042,0.439085,0.566616,-0.413137,-0.222956,1.321255,0.928263,-0.925733,-0.314538,0.244104
j,0.040745,0.023125,-0.294541,-0.005384,1.774168,-1.108982,-0.259427,-2.117126,-1.389757,-1.018322
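
By default `unstack` moves the innermost level, which is why the round trip returns the original layout. The sketch below, again on a hypothetical 2x2 frame named `small`, also shows that unstacking the outer level instead gives the transpose.

```python
import pandas as pd

# Hypothetical 2x2 frame with labelled rows and columns.
small = pd.DataFrame([[1, 2], [3, 4]], index=["a", "b"], columns=["x", "y"])
stacked = small.stack()

# Default (level=-1): the inner level, the old columns, moves back.
round_trip = stacked.unstack()
print(round_trip)

# level=0 moves the outer level, the old row labels, to the columns instead.
transposed = stacked.unstack(level=0)
print(transposed)
```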
