# 12 Pandas Pivot Tables

* pivot and crosstab examples

In [1]:
import numpy as np
import pandas as pd

import random

### Pivot

Create a random dataset.

In [2]:
size = 200
data = {
    "date": pd.date_range("2000-01-01", periods=size),
    "name": [random.choice('ABCD') for _ in range(size)],
    "value": np.random.uniform(1, 10, size=size),
}

In [3]:
df = pd.DataFrame(data)
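Because the dataset is redrawn on every run, the numbers you get will differ from the ones shown here. Below is a minimal sketch of a reproducible variant. It assumes NumPy's `default_rng` generator API (NumPy 1.17+) and swaps the stdlib `random.choice` for the generator's own `choice`. The seed value is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Seeded generator so every run produces the same table (seed 0 is arbitrary).
rng = np.random.default_rng(0)

size = 200
df = pd.DataFrame({
    # One row per day, starting 2000-01-01.
    "date": pd.date_range("2000-01-01", periods=size),
    # Names drawn from the same four letters as above.
    "name": rng.choice(list("ABCD"), size=size),
    # Uniform floats in [1, 10).
    "value": rng.uniform(1, 10, size=size),
})
print(df.head())
```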

In [4]:
df.head()

|   | date | name | value |
|---|---|---|---|
| 0 | 2000-01-01 | B | 5.141599 |
| 1 | 2000-01-02 | D | 6.458034 |
| 2 | 2000-01-03 | A | 2.181406 |
| 3 | 2000-01-04 | D | 2.15657 |
| 4 | 2000-01-05 | C | 8.720442 |


Selecting rows by column value.

In [5]:
df[df['name'] == 'A'].head()

|   | date | name | value |
|---|---|---|---|
| 2 | 2000-01-03 | A | 2.181406 |
| 5 | 2000-01-06 | A | 9.759022 |
| 19 | 2000-01-20 | A | 2.019657 |
| 24 | 2000-01-25 | A | 7.158998 |
| 38 | 2000-02-08 | A | 6.690088 |
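The following sketch shows a few equivalent ways to select rows. It uses a small hypothetical frame built the same way as above, and the seed and size are arbitrary. The boolean mask is the form used in the cell above. `query`, `isin`, and combined masks are standard pandas alternatives.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "date": pd.date_range("2000-01-01", periods=20),
    "name": rng.choice(list("ABCD"), size=20),
    "value": rng.uniform(1, 10, size=20),
})

# Boolean mask: a Series of True/False aligned with df's index.
mask = df["name"] == "A"
only_a = df[mask]

# The same selection written as a query string.
only_a_query = df.query("name == 'A'")

# Several names at once: isin builds the mask for a set of values.
a_or_b = df[df["name"].isin(["A", "B"])]

# Combine conditions with & and | (the parentheses are required).
big_a = df[(df["name"] == "A") & (df["value"] > 5)]
print(big_a)
```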


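Suppose we want to work with the names directly. With `pivot`, we can turn each distinct name into its own column and move the dates into the index. Each cell then holds the value for that date and name, and a date with no entry for a given name gets NaN.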

In [6]:
df.pivot(index='date', columns='name', values='value')

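| date \ name | A | B | C | D |
|---|---|---|---|---|
| 2000-01-01 | NaN | 5.141599 | NaN | NaN |
| 2000-01-02 | NaN | NaN | NaN | 6.458034 |
| 2000-01-03 | 2.181406 | NaN | NaN | NaN |
| 2000-01-04 | NaN | NaN | NaN | 2.156570 |
| 2000-01-05 | NaN | NaN | 8.720442 | NaN |
| ... | ... | ... | ... | ... |
| 2000-07-14 | NaN | 6.408070 | NaN | NaN |
| 2000-07-15 | 2.061491 | NaN | NaN | NaN |
| 2000-07-16 | 7.536712 | NaN | NaN | NaN |
| 2000-07-17 | NaN | 7.518542 | NaN | NaN |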

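For intuition, `pivot` behaves like this: set the index and column keys as a two-level index, then unstack the inner level. The sketch below checks that equivalence on a tiny hand-made frame, whose data is made up for illustration. It also shows what happens when a (date, name) pair occurs twice. In the dataset above every date is unique, so this never arises there. `pivot` cannot reshape duplicates and raises a `ValueError`, while `pivot_table` aggregates them (here with the mean).

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-01", "2000-01-01", "2000-01-02"]),
    "name": ["A", "B", "A"],
    "value": [1.0, 2.0, 3.0],
})

wide = df.pivot(index="date", columns="name", values="value")

# Same reshape by hand: (date, name) as a MultiIndex, then unstack "name".
wide_manual = df.set_index(["date", "name"])["value"].unstack("name")

# Duplicate the first (date, name) pair: pivot can no longer reshape.
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)
pivot_error = None
try:
    dup.pivot(index="date", columns="name", values="value")
except ValueError as err:
    pivot_error = err
    print("pivot failed:", err)

# pivot_table aggregates the duplicated entries instead.
agg = dup.pivot_table(index="date", columns="name", values="value", aggfunc="mean")
print(agg)
```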

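The result of `pivot` can have a hierarchical index (a MultiIndex). This happens if `values` is omitted or given as a list, or if `index` or `columns` is a list of several keys.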
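The sketch below illustrates the hierarchical case. It assumes a hypothetical second numeric column, `count`, next to `value`. With `values` omitted, both remaining columns end up on the outer column level and the names on the inner level.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-01", "2000-01-01", "2000-01-02"]),
    "name": ["A", "B", "A"],
    "value": [1.0, 2.0, 3.0],
    "count": [10, 20, 30],  # hypothetical extra column
})

# values omitted: every remaining column becomes an outer column level.
wide = df.pivot(index="date", columns="name")
print(wide.columns.nlevels)  # 2

# Selecting one outer key returns a frame with flat name columns.
values_only = wide["value"]
print(values_only)
```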

### Stack and unstack

In [7]:
df = pd.DataFrame(np.random.randn(10, 10), index=list('abcdefghij'))

In [8]:
df

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| a | 0.997451 | -1.009053 | 0.598962 | 0.148184 | 0.598002 | 2.167868 | 0.189185 | 1.287612 | -1.763958 | 1.046892 |
| b | -0.467487 | -1.073405 | 0.396867 | -1.48443 | -0.682016 | -0.880689 | -1.036613 | -0.142505 | 0.503654 | 0.474992 |
| c | -0.517056 | 0.825937 | -0.018062 | 0.281883 | 1.648866 | -0.229215 | 0.099954 | -0.0587 | -0.491831 | 0.464925 |
| d | 0.403526 | 1.301329 | 0.226798 | 0.697165 | 1.188251 | 2.225298 | 0.130357 | -0.215001 | 1.633281 | -1.091777 |
| e | -0.004716 | -0.034002 | -0.513981 | -1.098579 | -0.330912 | -0.360757 | -0.734659 | -0.756608 | 0.22397 | -0.452246 |
| f | -0.870159 | 0.884413 | 0.300512 | -1.98524 | -1.063916 | 1.014156 | 1.263174 | -1.569194 | -1.005647 | 0.265354 |
| g | 0.581199 | -1.450497 | -0.34407 | -0.050679 | -0.368321 | -0.568941 | 0.119771 | 1.926526 | -0.735789 | -0.851045 |
| h | 0.665562 | -0.799043 | -0.767665 | 0.490498 | 0.983548 | 0.365186 | -0.745588 | 0.547748 | 0.415528 | 1.249906 |
| i | -0.198836 | -0.641619 | -1.24798 | 1.175694 | -0.092593 | 1.157458 | -0.553025 | -1.215261 | -0.123966 | -1.160273 |
| j | 0.321602 | -1.326635 | 0.697277 | 0.191497 | 0.464532 | -1.29977 | 0.492551 | 0.746299 | 0.633203 | -0.069929 |


With `stack`, we can turn a data frame into a Series with a hierarchical index. The original row index becomes the outermost level, whereas the original column labels are stacked into the second index level.

In [9]:
df.stack()

a  0    0.997451
   1   -1.009053
   2    0.598962
   3    0.148184
   4    0.598002
          ...   
j  5   -1.299770
   6    0.492551
   7    0.746299
   8    0.633203
   9   -0.069929
Length: 100, dtype: float64
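The sketch below checks the shape of the stacked Series on a 2×3 frame made up for illustration. The result has one entry per cell, and each entry is addressed by its (row label, column label) pair.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6.0).reshape(2, 3), index=["a", "b"])

s = df.stack()
# Two index levels: original row label, then original column label.
print(s.index.nlevels)  # 2
print(len(s))           # 6 = 2 rows * 3 columns
# Look up a single cell by its (row, column) pair.
print(s.loc[("b", 1)])  # 4.0
```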

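Conversely, `unstack` reverses the operation. It moves the innermost index level back into the columns, so here `df.stack().unstack()` reproduces the original frame.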

In [10]:
df.stack().unstack()

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| a | 0.997451 | -1.009053 | 0.598962 | 0.148184 | 0.598002 | 2.167868 | 0.189185 | 1.287612 | -1.763958 | 1.046892 |
| b | -0.467487 | -1.073405 | 0.396867 | -1.48443 | -0.682016 | -0.880689 | -1.036613 | -0.142505 | 0.503654 | 0.474992 |
| c | -0.517056 | 0.825937 | -0.018062 | 0.281883 | 1.648866 | -0.229215 | 0.099954 | -0.0587 | -0.491831 | 0.464925 |
| d | 0.403526 | 1.301329 | 0.226798 | 0.697165 | 1.188251 | 2.225298 | 0.130357 | -0.215001 | 1.633281 | -1.091777 |
| e | -0.004716 | -0.034002 | -0.513981 | -1.098579 | -0.330912 | -0.360757 | -0.734659 | -0.756608 | 0.22397 | -0.452246 |
| f | -0.870159 | 0.884413 | 0.300512 | -1.98524 | -1.063916 | 1.014156 | 1.263174 | -1.569194 | -1.005647 | 0.265354 |
| g | 0.581199 | -1.450497 | -0.34407 | -0.050679 | -0.368321 | -0.568941 | 0.119771 | 1.926526 | -0.735789 | -0.851045 |
| h | 0.665562 | -0.799043 | -0.767665 | 0.490498 | 0.983548 | 0.365186 | -0.745588 | 0.547748 | 0.415528 | 1.249906 |
| i | -0.198836 | -0.641619 | -1.24798 | 1.175694 | -0.092593 | 1.157458 | -0.553025 | -1.215261 | -0.123966 | -1.160273 |
| j | 0.321602 | -1.326635 | 0.697277 | 0.191497 | 0.464532 | -1.29977 | 0.492551 | 0.746299 | 0.633203 | -0.069929 |
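The sketch below shows the round trip on a small illustrative frame. It also shows that `unstack` can move a different level. Unstacking the outer level (the original row labels) instead of the inner one gives the transpose.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6.0).reshape(2, 3), index=["a", "b"])
s = df.stack()

# Default: the innermost level (old column labels) goes back to the columns.
roundtrip = s.unstack()
print(roundtrip.equals(df))  # True

# Unstack the outer level (old row labels) instead: the transpose.
flipped = s.unstack(level=0)
print(flipped.equals(df.T))  # True
```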
