### stack() and unstack()
- stack() and unstack() reshape a dataframe and are especially helpful in conjunction with groupby.
- stack() moves the innermost column level into the row index, which makes the dataframe taller.
- unstack() does the reverse: it moves the innermost row-index level into the columns, which makes the dataframe wider.
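
As a quick illustration of the two functions, here is a minimal sketch on a small made-up table. The `wide` frame and its values are hypothetical and are not taken from the Olympics data used in the rest of this notebook.

```python
import pandas as pd

# Hypothetical toy data: counts per gender (rows) and event (columns).
wide = pd.DataFrame(
    {"100m": [1, 3], "200m": [1, 2]},
    index=pd.Index(["Men", "Women"], name="Gender"),
)
wide.columns.name = "Event"

tall = wide.stack()    # the Event column labels become the innermost row level
back = tall.unstack()  # the innermost row level goes back to the columns

print(tall)
print(back)
```

Because no cell is missing in this toy table, unstacking the stacked result should give back the table we started with.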

In [1]:
import pandas as pd
# skip the first 4 lines of the file, which come before the header row
oo = pd.read_csv('data/olympics.csv', skiprows=4)
oo.head()

,City,Edition,Sport,Discipline,Athlete,NOC,Gender,Event,Event_gender,Medal
0,Athens,1896,Aquatics,Swimming,"HAJOS, Alfred",HUN,Men,100m freestyle,M,Gold
1,Athens,1896,Aquatics,Swimming,"HERSCHMANN, Otto",AUT,Men,100m freestyle,M,Silver
2,Athens,1896,Aquatics,Swimming,"DRIVAS, Dimitrios",GRE,Men,100m freestyle for sailors,M,Bronze
3,Athens,1896,Aquatics,Swimming,"MALOKINIS, Ioannis",GRE,Men,100m freestyle for sailors,M,Gold
4,Athens,1896,Aquatics,Swimming,"CHASAPIS, Spiridon",GRE,Men,100m freestyle for sailors,M,Silver


### Athletes in the 100m or 200m track events at the 2008 Olympics

In [7]:
# rows from the 2008 edition whose event is the 100m or the 200m
men_women = oo[(oo.Edition == 2008) & oo.Event.isin(['100m', '200m'])]
men_women.head()

,City,Edition,Sport,Discipline,Athlete,NOC,Gender,Event,Event_gender,Medal
27551,Beijing,2008,Athletics,Athletics,"DIX, Walter",USA,Men,100m,M,Bronze
27552,Beijing,2008,Athletics,Athletics,"BOLT, Usain",JAM,Men,100m,M,Gold
27553,Beijing,2008,Athletics,Athletics,"THOMPSON, Richard",TRI,Men,100m,M,Silver
27554,Beijing,2008,Athletics,Athletics,"FRASER, Shelly-ann",JAM,Women,100m,W,Gold
27555,Beijing,2008,Athletics,Athletics,"SIMPSON, Sherone",JAM,Women,100m,W,Silver


#### Next, group by the country represented (NOC), the gender, the discipline and the event, and count the rows in each group.

In [12]:
group = men_women.groupby(['NOC','Gender','Discipline','Event']).size()
group

NOC  Gender  Discipline  Event
JAM  Men     Athletics   100m     1
                         200m     1
     Women   Athletics   100m     3
                         200m     2
TRI  Men     Athletics   100m     1
USA  Men     Athletics   100m     1
                         200m     2
     Women   Athletics   200m     1
dtype: int64
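
The result of `groupby(...).size()` is a Series whose index has one level per grouping key, which is what makes it a natural input for unstack(). A minimal sketch, assuming a made-up `medals` frame (hypothetical data, not the Olympics file):

```python
import pandas as pd

# Hypothetical rows: one row per medal.
medals = pd.DataFrame({
    "NOC": ["JAM", "JAM", "JAM", "USA"],
    "Gender": ["Men", "Women", "Women", "Men"],
    "Event": ["100m", "100m", "100m", "200m"],
})

# size() counts the rows in each group; only combinations that occur are listed.
counts = medals.groupby(["NOC", "Gender", "Event"]).size()

print(type(counts).__name__)     # a Series, not a DataFrame
print(counts.index.names)        # one index level per grouping key
print(counts)
```

An individual count can then be looked up with a tuple of keys, for example `counts.loc[("JAM", "Women", "100m")]`.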

#### To see the counts as a table with one column per event, we need to unstack the Discipline and Event levels into the columns

In [14]:
frame = group.unstack(['Discipline','Event'])
frame

,Discipline,Athletics,Athletics
,Event,100m,200m
NOC,Gender,,
JAM,Men,1.0,1.0
JAM,Women,3.0,2.0
TRI,Men,1.0,
USA,Men,1.0,2.0
USA,Women,,1.0


#### We can see that we have the dataframe we want
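
Unstacking leaves NaN wherever a combination has no rows, and the integer counts become floats as a result. If zeros are preferred, `unstack()` accepts a `fill_value` argument. A minimal sketch on a hypothetical `counts` Series (made-up values, not the notebook's `group`):

```python
import pandas as pd

# Hypothetical counts: TRI has no 200m entry.
idx = pd.MultiIndex.from_tuples(
    [("JAM", "100m"), ("JAM", "200m"), ("TRI", "100m")],
    names=["NOC", "Event"],
)
counts = pd.Series([1, 1, 1], index=idx)

with_nan = counts.unstack("Event")                 # missing cell becomes NaN
with_zero = counts.unstack("Event", fill_value=0)  # missing cell becomes 0

print(with_nan)
print(with_zero)
```

Whether a gap should read as 0 or as missing depends on the analysis; for counts, 0 is usually a reasonable choice.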

#### Let's look at stacking
### stack()
- stack() returns a DataFrame or a Series: a Series when the columns have a single level, a DataFrame when column levels remain after stacking.
- By default the dropna flag is True, so rows that would contain only missing values are dropped from the result. This is why the NaN cells of the frame do not show up after stacking here. Recent pandas releases are changing this behaviour (see the future_stack parameter), so check the documentation for your version.
- stack() pivots a level of the column labels, returning a DataFrame or Series with a new innermost level of row labels. The new index level is sorted.
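
To make the DataFrame-or-Series point concrete, here is a minimal sketch. The small `frame` below is a hypothetical stand-in with the same two column levels as the notebook's frame, but with made-up values and no missing cells.

```python
import pandas as pd

# Hypothetical frame with two column levels, Discipline and Event.
cols = pd.MultiIndex.from_product(
    [["Athletics"], ["100m", "200m"]], names=["Discipline", "Event"]
)
frame = pd.DataFrame(
    [[1, 1], [3, 2]],
    index=pd.Index(["Men", "Women"], name="Gender"),
    columns=cols,
)

# Stacking one level leaves Discipline in the columns, so we get a DataFrame.
one_level = frame.stack("Event")

# Stacking both levels leaves no column labels, so we get a Series.
both_levels = frame.stack(["Discipline", "Event"])

print(type(one_level).__name__, one_level.shape)
print(type(both_levels).__name__, both_levels.shape)
```

In both cases the result is taller than `frame`: the two rows become four.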

In [15]:
#original dataframe
frame

,Discipline,Athletics,Athletics
,Event,100m,200m
NOC,Gender,,
JAM,Men,1.0,1.0
JAM,Women,3.0,2.0
TRI,Men,1.0,
USA,Men,1.0,2.0
USA,Women,,1.0


In [16]:
# let's use stack()
frame.stack()

,,Discipline,Athletics
NOC,Gender,Event,
JAM,Men,100m,1.0
JAM,Men,200m,1.0
JAM,Women,100m,3.0
JAM,Women,200m,2.0
TRI,Men,100m,1.0
USA,Men,100m,1.0
USA,Men,200m,2.0
USA,Women,200m,1.0


- The 100m and 200m labels, which form the innermost column level, are moved down into the rows.
- A row is listed only where a country and gender combination has a count for that event, so (TRI, Men, 200m) and (USA, Women, 100m) do not appear.
- To help remember the stack function, I try to visualize whether I want to make the dataframe taller or wider.
- If I want to make the dataframe taller, I use stack(), because stacking gives a taller dataframe.

### In the real world, I prefer to specify which column level I'm going to stack.

In [17]:
frame.stack('Event')

,,Discipline,Athletics
NOC,Gender,Event,
JAM,Men,100m,1.0
JAM,Men,200m,1.0
JAM,Women,100m,3.0
JAM,Women,200m,2.0
TRI,Men,100m,1.0
USA,Men,100m,1.0
USA,Men,200m,2.0
USA,Women,200m,1.0


### This gives exactly the same dataframe as frame.stack(), but frame.stack('Event') is the better approach: it names the level explicitly, so it does not depend on Event being the innermost column level.
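
One way to see why naming the level is safer: `stack()` with no argument moves whichever column level happens to be innermost. The sketch below uses a hypothetical `frame` with made-up values and swaps its column levels to show the difference.

```python
import pandas as pd

# Hypothetical frame with column levels Discipline (outer) and Event (inner).
cols = pd.MultiIndex.from_product(
    [["Athletics"], ["100m", "200m"]], names=["Discipline", "Event"]
)
frame = pd.DataFrame(
    [[1, 1], [3, 2]],
    index=pd.Index(["Men", "Women"], name="Gender"),
    columns=cols,
)

# While Event is innermost, the default and the named call agree.
same = frame.stack().equals(frame.stack("Event"))

# After swapping the column levels, Event is no longer innermost.
swapped = frame.swaplevel("Discipline", "Event", axis=1)
by_position = swapped.stack()     # moves Discipline, the new innermost level
by_name = swapped.stack("Event")  # still moves Event

print(same)
print(by_position.index.names)
print(by_name.index.names)
```

With the level named, the code keeps doing the same thing even if an earlier step changes the order of the column levels.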

### unstack()