<img src="https://pandas.pydata.org/static/img/pandas.svg" width="250">

## <center> Reshaping DataFrames </center>

In [1]:
import pandas as pd

<img src="https://pandas.pydata.org/pandas-docs/stable/_images/reshaping_pivot.png">

+ `pivot` takes **a variable whose values distinguish your rows and *pivots* those values into columns**.
+ **Requires every index/column combination to be unique**; duplicate pairs raise a `ValueError`.
+ **Does not support aggregation**: values are only rearranged, never combined.
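
The uniqueness requirement can be seen in a minimal sketch. The `dup` frame below is hypothetical data, not the DataFrame used in the rest of this notebook; it repeats the `North`/`One` pair, so `pivot` has two candidate values for a single cell and raises a `ValueError`.

```python
import pandas as pd

# Hypothetical data: the (Region, Team) pair North/One appears twice.
dup = pd.DataFrame({
    "Region": ["North", "North", "West"],
    "Team": ["One", "One", "One"],
    "Revenue": [7500, 2500, 5500],
})

try:
    dup.pivot(index="Region", columns="Team", values="Revenue")
    pivot_failed = False
except ValueError as err:
    # pivot cannot decide which of the two values belongs in the cell
    pivot_failed = True
    print(f"pivot failed: {err}")
```

When duplicates like this are expected, an aggregating reshape such as `pivot_table` is the usual alternative.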

--------

In [2]:
df = pd.DataFrame({
    "Region":['North','West','East','South','North','West','East','South'],
    "Team":['One','One','One','One','Two','Two','Two','Two'],
    "Revenue":[7500,5500,2750,6400,2300,3750,1900,575],
    "Cost":[5200,5100,4400,5300,1250,1300,2100,50]
})

In [3]:
df

Unnamed: 0,Region,Team,Revenue,Cost
0,North,One,7500,5200
1,West,One,5500,5100
2,East,One,2750,4400
3,South,One,6400,5300
4,North,Two,2300,1250
5,West,Two,3750,1300
6,East,Two,1900,2100
7,South,Two,575,50


---------

## Revenue for each Team per Region

In [4]:
df.pivot(index='Region', columns='Team', values='Revenue')

Team,One,Two
Region,Unnamed: 1_level_1,Unnamed: 2_level_1
East,2750,1900
North,7500,2300
South,6400,575
West,5500,3750


In [5]:
df.pivot(index='Region', columns=['Team'], values=['Revenue', 'Cost'])

Unnamed: 0_level_0,Revenue,Revenue,Cost,Cost
Team,One,Two,One,Two
Region,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
East,2750,1900,4400,2100
North,7500,2300,5200,1250
South,6400,575,5300,50
West,5500,3750,5100,1300
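
Passing a list to `values` produces columns with two levels. A short sketch of how such a result might be queried, assuming the same data as above (the names `wide`, `revenue_only`, and `north_cost_two` are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["North", "West", "East", "South", "North", "West", "East", "South"],
    "Team": ["One", "One", "One", "One", "Two", "Two", "Two", "Two"],
    "Revenue": [7500, 5500, 2750, 6400, 2300, 3750, 1900, 575],
    "Cost": [5200, 5100, 4400, 5300, 1250, 1300, 2100, 50],
})

wide = df.pivot(index="Region", columns="Team", values=["Revenue", "Cost"])

# Selecting the outer level returns a DataFrame with one column per Team.
revenue_only = wide["Revenue"]

# A (outer, inner) tuple addresses a single column; with a row label it is one cell.
north_cost_two = wide.loc["North", ("Cost", "Two")]
print(north_cost_two)
```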


--------

# Reshaping Stack


<img src="https://pandas.pydata.org/pandas-docs/stable/_images/reshaping_stack.png">

+ Pivots a level of **column labels into rows**.
+ Works with a **MultiIndex**: the stacked labels become its innermost level.

In [15]:
# Suppose we have df2, which has a MultiIndex built from Region and Team

df2 = df.set_index(['Region','Team'])

df2

Unnamed: 0_level_0,Unnamed: 1_level_0,Revenue,Cost
Region,Team,Unnamed: 2_level_1,Unnamed: 3_level_1
North,One,7500,5200
West,One,5500,5100
East,One,2750,4400
South,One,6400,5300
North,Two,2300,1250
West,Two,3750,1300
East,Two,1900,2100
South,Two,575,50


Now we create a stacked DataFrame. `Revenue` and `Cost` are transformed from column labels into row labels.

In [8]:
stacked = pd.DataFrame(df2.stack())

In [9]:
stacked

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,0
Region,Team,Unnamed: 2_level_1,Unnamed: 3_level_1
North,One,Revenue,7500
North,One,Cost,5200
West,One,Revenue,5500
West,One,Cost,5100
East,One,Revenue,2750
East,One,Cost,4400
South,One,Revenue,6400
South,One,Cost,5300
North,Two,Revenue,2300
North,Two,Cost,1250


-------

<img src = "https://pandas.pydata.org/pandas-docs/stable/_images/reshaping_unstack.png">

+ The opposite of `stack`: pivots a level of **row labels into columns**.

#### By default, `level=-1` (the innermost level)

In [17]:
stacked.unstack() 

Unnamed: 0_level_0,Unnamed: 1_level_0,0,0
Unnamed: 0_level_1,Unnamed: 1_level_1,Revenue,Cost
Region,Team,Unnamed: 2_level_2,Unnamed: 3_level_2
East,One,2750,4400
East,Two,1900,2100
North,One,7500,5200
North,Two,2300,1250
South,One,6400,5300
South,Two,575,50
West,One,5500,5100
West,Two,3750,1300
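
Because `unstack` reverses `stack`, a round trip should give back the original values. The sketch below checks this on the same data; `check_like=True` is used because the row order may differ after unstacking (the output above is sorted by `Region`).

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["North", "West", "East", "South", "North", "West", "East", "South"],
    "Team": ["One", "One", "One", "One", "Two", "Two", "Two", "Two"],
    "Revenue": [7500, 5500, 2750, 6400, 2300, 3750, 1900, 575],
    "Cost": [5200, 5100, 4400, 5300, 1250, 1300, 2100, 50],
})
df2 = df.set_index(["Region", "Team"])

# Stack the columns into the index, then move the innermost level back out.
roundtrip = df2.stack().unstack()

# Raises AssertionError if the frames differ; row/column order is ignored.
pd.testing.assert_frame_equal(roundtrip, df2, check_like=True)
print("round trip preserved the data")
```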


# 1) Unstack using the `level` parameter

#### Using `level=-2`, the second innermost level, the Team values `One` and `Two` become columns.

In [16]:
stacked.unstack(level=-2)

Unnamed: 0_level_0,Unnamed: 1_level_0,0,0
Unnamed: 0_level_1,Team,One,Two
Region,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2
East,Revenue,2750,1900
East,Cost,4400,2100
North,Revenue,7500,2300
North,Cost,5200,1250
South,Revenue,6400,575
South,Cost,5300,50
West,Revenue,5500,3750
West,Cost,5100,1300


#### Using `level=-3`, the third innermost level, the Regions become columns.

In [18]:
stacked.unstack(level=-3)

Unnamed: 0_level_0,Unnamed: 1_level_0,0,0,0,0
Unnamed: 0_level_1,Region,East,North,South,West
Team,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
One,Revenue,2750,7500,6400,5500
One,Cost,4400,5200,5300,5100
Two,Revenue,1900,2300,575,3750
Two,Cost,2100,1250,50,1300


# 2) Unstack using the index level name

#### We can also specify by name which index level to unstack
For example, unstacking `Region`:

In [21]:
stacked.unstack('Region') # same result as level=-3 above

Unnamed: 0_level_0,Unnamed: 1_level_0,0,0,0,0
Unnamed: 0_level_1,Region,East,North,South,West
Team,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
One,Revenue,2750,7500,6400,5500
One,Cost,4400,5200,5300,5100
Two,Revenue,1900,2300,575,3750
Two,Cost,2100,1250,50,1300
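
A level can be addressed by negative position, by non-negative position, or by name. The sketch below rebuilds `stacked` from the same data and checks that the three spellings for `Region` agree; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["North", "West", "East", "South", "North", "West", "East", "South"],
    "Team": ["One", "One", "One", "One", "Two", "Two", "Two", "Two"],
    "Revenue": [7500, 5500, 2750, 6400, 2300, 3750, 1900, 575],
    "Cost": [5200, 5100, 4400, 5300, 1250, 1300, 2100, 50],
})
stacked = df.set_index(["Region", "Team"]).stack().to_frame()

by_negative = stacked.unstack(level=-3)   # counted from the innermost level
by_position = stacked.unstack(level=0)    # counted from the outermost level
by_name = stacked.unstack("Region")       # addressed by level name

same = by_negative.equals(by_position) and by_negative.equals(by_name)
print(same)
```

Using the name tends to be the most readable option, and it keeps working if the order of the index levels changes.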


------

<img src="https://pandas.pydata.org/pandas-docs/stable/_images/reshaping_melt.png">

+ `melt` lets you **reshape your DataFrame by keeping selected columns as "ID variables"**,
+ while moving **all other columns, the "measure variables", into rows**.

In the image above, the height and weight columns are melted into rows.

In [11]:
df.head()

Unnamed: 0,Region,Team,Revenue,Cost
0,North,One,7500,5200
1,West,One,5500,5100
2,East,One,2750,4400
3,South,One,6400,5300
4,North,Two,2300,1250


In [24]:
df.melt(id_vars=['Region', 'Team'], var_name='Custom Value Type') # var_name names the column that holds the former column labels

Unnamed: 0,Region,Team,Custom Value Type,value
0,North,One,Revenue,7500
1,West,One,Revenue,5500
2,East,One,Revenue,2750
3,South,One,Revenue,6400
4,North,Two,Revenue,2300
5,West,Two,Revenue,3750
6,East,Two,Revenue,1900
7,South,Two,Revenue,575
8,North,One,Cost,5200
9,West,One,Cost,5100
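
`melt` also accepts `value_vars` to restrict which columns are melted and `value_name` to rename the `value` column. A minimal sketch on the same data, where `Metric` and `Amount` are hypothetical column names chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["North", "West", "East", "South", "North", "West", "East", "South"],
    "Team": ["One", "One", "One", "One", "Two", "Two", "Two", "Two"],
    "Revenue": [7500, 5500, 2750, 6400, 2300, 3750, 1900, 575],
    "Cost": [5200, 5100, 4400, 5300, 1250, 1300, 2100, 50],
})

# Melt only Revenue; Cost is dropped because it is neither an id nor a value column.
long_revenue = df.melt(
    id_vars=["Region", "Team"],
    value_vars=["Revenue"],
    var_name="Metric",
    value_name="Amount",
)
print(long_revenue)
```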


-----

# Supporting aggregation with `pivot_table`
+ By default, `pivot_table` aggregates with `mean`.
+ Other functions can be passed through `aggfunc`, e.g. `aggfunc=['min', 'max', 'mean', 'sum']`.
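
To illustrate why aggregation matters, the sketch below uses hypothetical data in which the `North`/`One` pair appears twice. `pivot_table` combines the two rows instead of failing; the `dup` frame and variable names are assumptions made for this example only.

```python
import pandas as pd

# Hypothetical data: two Revenue rows for the same (Region, Team) pair.
dup = pd.DataFrame({
    "Region": ["North", "North", "West"],
    "Team": ["One", "One", "One"],
    "Revenue": [7500, 2500, 5500],
})

# Default aggregation is the mean of the duplicated rows.
mean_table = dup.pivot_table(index="Region", columns="Team", values="Revenue")

# An explicit aggfunc changes how the duplicates are combined.
sum_table = dup.pivot_table(
    index="Region", columns="Team", values="Revenue", aggfunc="sum"
)

print(mean_table)
print(sum_table)
```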

In [26]:
df.head()

Unnamed: 0,Region,Team,Revenue,Cost
0,North,One,7500,5200
1,West,One,5500,5100
2,East,One,2750,4400
3,South,One,6400,5300
4,North,Two,2300,1250


In [27]:
df.pivot_table(index='Team', values='Revenue')

Unnamed: 0_level_0,Revenue
Team,Unnamed: 1_level_1
One,5537.5
Two,2131.25


In [31]:
df.groupby('Team')['Revenue'].mean() # same values as the pivot_table above, returned as a Series

Team
One    5537.50
Two    2131.25
Name: Revenue, dtype: float64

### Including a `columns` level

In [13]:
df.pivot_table(index='Team', columns='Region', values='Revenue')

Region,East,North,South,West
Team,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
One,2750,7500,6400,5500
Two,1900,2300,575,3750


### Using `aggfunc`

In [34]:
df.pivot_table(index='Team', values='Revenue', aggfunc=['min', 'max', 'mean', 'sum'])

Unnamed: 0_level_0,min,max,mean,sum
Unnamed: 0_level_1,Revenue,Revenue,Revenue,Revenue
Team,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
One,2750,7500,5537.5,22150
Two,575,3750,2131.25,8525
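
`aggfunc` can also be a dict that maps each value column to its own function. A brief sketch on the same data, summing `Revenue` while averaging `Cost` (the choice of functions is only an example):

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["North", "West", "East", "South", "North", "West", "East", "South"],
    "Team": ["One", "One", "One", "One", "Two", "Two", "Two", "Two"],
    "Revenue": [7500, 5500, 2750, 6400, 2300, 3750, 1900, 575],
    "Cost": [5200, 5100, 4400, 5300, 1250, 1300, 2100, 50],
})

# One aggregation per column: total revenue and average cost for each team.
per_column = df.pivot_table(
    index="Team",
    values=["Revenue", "Cost"],
    aggfunc={"Revenue": "sum", "Cost": "mean"},
)
print(per_column)
```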
