In [1]:
import seaborn as sns
import pandas as pd
import numpy as np

http://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html

There comes a time in the life of any data scientist when he or she needs to transform the set of columns in a dataset into rows and vice versa.

This is not a common operation, but it does happen every now and then. Pandas has two sets of methods to do this:

* stack and unstack
* pivot and melt

The two sets of methods do essentially the same job. Stack and unstack are a bit more stable and a bit less powerful than pivot and melt, so today we are going to go over stack and unstack.
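Here is a minimal sketch of how much the two families overlap. It uses a small made-up frame; the name `long_df` and its numbers are only for illustration. `pivot` gives the same table as moving the keys into the index and calling `unstack`, which is essentially what pandas does inside `pivot`.

```python
import pandas as pd

# A tiny long-format frame (made-up name and values, for illustration only)
long_df = pd.DataFrame({
    'day': ['Thur', 'Thur', 'Fri', 'Fri'],
    'sex': ['Male', 'Female', 'Male', 'Female'],
    'count': [30, 32, 10, 9],
})

# pivot: name the row key, the column key and the values column
via_pivot = long_df.pivot(index='day', columns='sex', values='count')

# the same reshape with set_index + unstack
via_unstack = long_df.set_index(['day', 'sex'])['count'].unstack('sex')

print(via_pivot.equals(via_unstack))  # True
```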

TODO:http://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html#computing-indicator-dummy-variables

In [2]:
tips = sns.load_dataset('tips')
tips.head(3)

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3


So let's say you have a dataset like the one below. From the raw rows it is hard to tell how many male and female visitors came on each day, and you might want to do column-wise operations on just the male data.

In [89]:
tips_gb = tips.groupby(['day', 'sex'])[['total_bill']].count()
tips_gb

             total_bill
day  sex               
Thur Male            30
     Female          32
Fri  Male            10
     Female           9
Sat  Male            59
     Female          28
Sun  Male            58
     Female          18
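A side note on the double brackets in `[['total_bill']]`. They keep the grouped result a DataFrame, and unstacking a DataFrame is what produces two levels of columns later on. With single brackets you get a Series, and unstacking a Series gives one flat level of columns. Here is a small sketch on hypothetical rows; `df` is made up and is not the real tips data.

```python
import pandas as pd

# Hypothetical long-format rows standing in for the tips dataset
df = pd.DataFrame({
    'day': ['Thur', 'Thur', 'Fri', 'Sat', 'Sat'],
    'sex': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'total_bill': [16.99, 10.34, 21.01, 23.68, 24.59],
})

# Double brackets -> DataFrame -> unstack gives MultiIndex columns
as_frame = df.groupby(['day', 'sex'])[['total_bill']].count().unstack()
print(as_frame.columns.nlevels)   # 2

# Single brackets -> Series -> unstack gives one flat level of columns
as_series = df.groupby(['day', 'sex'])['total_bill'].count().unstack()
print(as_series.columns.nlevels)  # 1
print(list(as_series.columns))    # ['Female', 'Male']
```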


What you might want to do is take the values in the sex level of the index and turn them into columns:

In [93]:
tips_us = tips_gb.unstack()
tips_us

     total_bill       
sex        Male Female
day                   
Thur         30     32
Fri          10      9
Sat          59     28
Sun          58     18
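When some (day, sex) combination never occurs, `unstack` has no value for that cell and fills it with NaN. The `fill_value` parameter lets you choose a default instead. Here is a sketch with a hypothetical series where Fri/Female is missing.

```python
import pandas as pd

# Hypothetical grouped counts with the (Fri, Female) combination missing
counts = pd.Series(
    [30, 32, 10],
    index=pd.MultiIndex.from_tuples(
        [('Thur', 'Male'), ('Thur', 'Female'), ('Fri', 'Male')],
        names=['day', 'sex'],
    ),
    name='total_bill',
)

# Without fill_value the missing cell becomes NaN (and the columns turn float)
print(counts.unstack())

# With fill_value the hole gets a default value instead
wide = counts.unstack(fill_value=0)
print(wide)
```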


In [92]:
# you could do the same with the days of the week
tips_gb[['total_bill']].unstack(0)

       total_bill             
day          Thur Fri Sat Sun
sex                          
Male           30  10  59  58
Female         32   9  28  18
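The argument to `unstack` is the index level to move into the columns. You can give it by position, as above, or by name, which is easier to read once there are several levels. Here is a sketch on a small stand-in for `tips_gb`, built by hand from the first two days of the table.

```python
import pandas as pd

# Small stand-in for tips_gb (values copied from the grouped table)
idx = pd.MultiIndex.from_product([['Thur', 'Fri'], ['Male', 'Female']],
                                 names=['day', 'sex'])
gb = pd.DataFrame({'total_bill': [30, 32, 10, 9]}, index=idx)

by_position = gb.unstack(0)    # level 0 of the row index
by_name = gb.unstack('day')    # the same level, referred to by name
print(by_position.equals(by_name))  # True
```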


The catch is that the columns are now a MultiIndex, a somewhat odd two-level object:

In [94]:
tips_us.columns

MultiIndex(levels=[['total_bill'], ['Male', 'Female']],
           codes=[[0, 0], [0, 1]],
           names=[None, 'sex'])

And while you can do things with it:

In [96]:
tips_us[[('total_bill', 'Male')]]

     total_bill
sex        Male
day            
Thur         30
Fri          10
Sat          59
Sun          58
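Here are two other ways to pull out the male counts, sketched on a copy of the table rebuilt by hand (`wide` is just a local name). A single tuple, rather than a list of tuples, returns a Series. `DataFrame.xs` picks one value out of a named column level.

```python
import pandas as pd

# Rebuild the unstacked table from the values shown above
cols = pd.MultiIndex.from_tuples([('total_bill', 'Male'), ('total_bill', 'Female')],
                                 names=[None, 'sex'])
wide = pd.DataFrame([[30, 32], [10, 9], [59, 28], [58, 18]],
                    index=pd.Index(['Thur', 'Fri', 'Sat', 'Sun'], name='day'),
                    columns=cols)

# A single tuple (not a list of tuples) returns a Series
male = wide[('total_bill', 'Male')]

# xs keeps only 'Male' from the 'sex' column level and drops that level
male_only = wide.xs('Male', axis=1, level='sex')
print(male_only)
```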


I find it annoying to memorize a separate syntax for this, so I always flatten the columns with a single line of code like this (P.S. I wish this were in pandas core):

In [97]:
# join each ('total_bill', 'Male')-style column tuple into a single string
tips_us.columns = ['__'.join(col).strip() for col in tips_us.columns.values]

In [98]:
tips_us

      total_bill__Male  total_bill__Female
day                                       
Thur                30                  32
Fri                 10                   9
Sat                 59                  28
Sun                 58                  18
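If you do this often, you can wrap the one-liner in a small helper. `flatten_columns` below is a hypothetical name, not a pandas function. `MultiIndex.to_flat_index()` is real pandas and gets you halfway there by turning the columns into plain tuples. Converting each part with `str` and skipping empty parts are my own choices. They keep integer level values, or the empty strings some aggregations leave behind, from breaking the join.

```python
import pandas as pd


def flatten_columns(df, sep='__'):
    """Return a copy of df with MultiIndex columns joined into flat strings.

    flatten_columns is a hypothetical helper, not part of pandas.
    """
    out = df.copy()
    if isinstance(out.columns, pd.MultiIndex):
        # to_flat_index() turns the MultiIndex into an Index of tuples
        out.columns = [
            # str() handles non-string level values; empty parts are skipped
            sep.join(str(part) for part in col if str(part) != '')
            for col in out.columns.to_flat_index()
        ]
    return out


# Usage on a hand-built copy of the unstacked table
cols = pd.MultiIndex.from_tuples([('total_bill', 'Male'), ('total_bill', 'Female')],
                                 names=[None, 'sex'])
wide = pd.DataFrame([[30, 32], [10, 9]],
                    index=pd.Index(['Thur', 'Fri'], name='day'), columns=cols)

print(flatten_columns(wide).columns.tolist())  # ['total_bill__Male', 'total_bill__Female']
```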


You can of course repeat that operation as many times as you need to get the desired granularity of columns. 
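For example, with three index levels you can unstack twice, or pass `unstack` a list of levels in one call. The counts below are made-up numbers on a hypothetical (day, sex, smoker) index. The column levels come out in a different order in the two versions.

```python
import pandas as pd

# Made-up counts on a hypothetical three-level index
idx = pd.MultiIndex.from_tuples(
    [('Fri', 'Male', 'No'), ('Fri', 'Male', 'Yes'),
     ('Fri', 'Female', 'No'), ('Fri', 'Female', 'Yes'),
     ('Sat', 'Male', 'No'), ('Sat', 'Male', 'Yes'),
     ('Sat', 'Female', 'No'), ('Sat', 'Female', 'Yes')],
    names=['day', 'sex', 'smoker'],
)
counts = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=idx, name='n')

# Two steps: columns end up as (smoker, sex)
twice = counts.unstack('smoker').unstack('sex')

# One call with a list of levels: columns end up as (sex, smoker)
once = counts.unstack(['sex', 'smoker'])

print(twice)
print(once)
```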

But now let's try out the reverse operation. This is useful if somebody gives you data in pivot form:

In [99]:
tips_us.stack()

day                     
Thur  total_bill__Male      30
      total_bill__Female    32
Fri   total_bill__Male      10
      total_bill__Female     9
Sat   total_bill__Male      59
      total_bill__Female    28
Sun   total_bill__Male      58
      total_bill__Female    18
dtype: int64
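Stacking undoes unstacking. Here is a minimal round-trip sketch that uses a few of the counts from the table above in a hand-built series.

```python
import pandas as pd

# Long format: one value per (day, sex) pair (values from the counts above)
idx = pd.MultiIndex.from_product([['Fri', 'Sat'], ['Female', 'Male']],
                                 names=['day', 'sex'])
s = pd.Series([9, 10, 28, 59], index=idx, name='total_bill')

wide = s.unstack('sex')   # sex values become columns
back = wide.stack()       # the single column level goes back into the rows

print(back)
```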

Notice that this gives us a Series: the columns had only one level, so stacking it leaves no column level behind. If the columns are a MultiIndex, stacking one level leaves the other levels as columns, and the result is a DataFrame.
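To see the DataFrame case, here is a sketch with two column levels: a made-up second measure (`mean_tip`, hypothetical numbers) on top and sex underneath. Stacking the `sex` level moves it into the rows and leaves the top level as ordinary columns.

```python
import pandas as pd

# Two column levels: measure on top, sex underneath (mean_tip numbers are made up)
cols = pd.MultiIndex.from_product([['count', 'mean_tip'], ['Female', 'Male']],
                                  names=[None, 'sex'])
wide = pd.DataFrame([[32, 30, 2.5, 2.9], [9, 10, 2.8, 2.7]],
                    index=pd.Index(['Thur', 'Fri'], name='day'), columns=cols)

# Only the 'sex' level moves into the rows; 'count' and 'mean_tip' stay as columns
long = wide.stack('sex')
print(type(long).__name__)  # DataFrame
print(long)
```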

That is about it when it comes to stacking and unstacking. Anything you can do with melting and pivoting can be done with stacking and unstacking. Let's work through one example from the pandas documentation:

In [105]:
cheese = pd.DataFrame({'first': ['John', 'Mary'],
                        'last': ['Doe', 'Bo'],
                        'height': [5.5, 6.0],
                        'weight': [130, 150]})
cheese

  first last  height  weight
0  John  Doe     5.5     130
1  Mary   Bo     6.0     150


In [106]:
cheese.melt(id_vars=['first', 'last'])

  first last variable  value
0  John  Doe   height    5.5
1  Mary   Bo   height    6.0
2  John  Doe   weight  130.0
3  Mary   Bo   weight  150.0


To do this with stacking, we just need two steps: move the id columns into the index, then stack the remaining columns:

In [107]:
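# move the id columns into the index, then stack what is left
# (chaining instead of inplace=True leaves cheese itself unchanged)
cheese.set_index(['first', 'last']).stack()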

first  last        
John   Doe   height      5.5
             weight    130.0
Mary   Bo    height      6.0
             weight    150.0
dtype: float64
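The stacked Series holds the same numbers as the melted frame, just in a different shape. To get literally the same table, name the new index level, turn the Series back into columns with `reset_index`, and sort. Sorting is needed because `stack` goes row by row while `melt` goes column by column. A sketch:

```python
import pandas as pd

cheese = pd.DataFrame({'first': ['John', 'Mary'],
                       'last': ['Doe', 'Bo'],
                       'height': [5.5, 6.0],
                       'weight': [130, 150]})

melted = cheese.melt(id_vars=['first', 'last'])

# stack -> Series; name all index levels, then move them back into columns
stacked = (cheese.set_index(['first', 'last'])
                 .stack()
                 .rename_axis(['first', 'last', 'variable'])
                 .reset_index(name='value'))

# stack orders rows by person, melt orders them by variable: sort to match
aligned = stacked.sort_values(['variable', 'first']).reset_index(drop=True)
print(aligned)
print(melted)
```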

I have used melt and pivot before, but after getting a better understanding of stack and unstack I have found them more versatile and stable than the former. So why learn both?