# Dataframe Pivot

* [DataFrame.pivot(index=None, columns=None, values=None)](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html#pandas.DataFrame.pivot)
* [pandas.wide_to_long(df, stubnames, i, j, sep='', suffix='\\d+')](https://pandas.pydata.org/docs/reference/api/pandas.wide_to_long.html)
* [DataFrame.stack(level=-1, dropna=True)](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.stack.html)


## Long vs Wide Format

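* Wide format stores one row per entity (here, a ticker on a given date), with each attribute such as ```open``` or ```close``` in its own column.
* Long format stores one row per ```(id, attribute, value)``` triple, so a single entity is spread over several rows.

Hence ```unstack``` performs the ```long -> wide``` transformation, and ```stack``` performs the reverse, ```wide -> long```.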
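A minimal sketch of that round trip, using only the AAPL open/close numbers from the example data below (dates are kept as plain strings for brevity, and the `attribute` column-axis name is chosen here for illustration):

```python
import pandas as pd

# AAPL open/close already in wide format:
# one row per date, one column per attribute
wide = pd.DataFrame(
    {"open": [106.96, 108.58], "close": [108.74, 107.32]},
    index=pd.Index(["2015-12-29", "2015-12-30"], name="date"),
)
wide.columns.name = "attribute"

# wide -> long: stack moves the column labels into a new innermost index level,
# giving a Series with one entry per (date, attribute) pair
long = wide.stack()

# long -> wide: unstack moves that innermost level back out into the columns
back = long.unstack("attribute")
print(back)
```

`back` holds the same values as `wide`; only the column order may differ, because `unstack` sorts the labels it moves into the columns.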

<img src="./image/pivot.jpg" align="left"/>

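```python
import datetime as dt

import numpy as np
import pandas as pd

# Display floats with 2 decimals; the bare 'precision' key is ambiguous in newer pandas
pd.set_option('display.precision', 2)
```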


```python
def make_long_aapl():
    """Build a small long-format frame: one row per (date, ticker, attribute)."""
    day_1 = dt.date(2015, 12, 29)
    day_2 = dt.date(2015, 12, 30)

    col_close = 'close'
    col_open = 'open'

    cols = ['date', 'ticker', 'attribute', 'value']

    # Note: the MSFT rows reuse the AAPL numbers, so they are sample values, not real MSFT prices
    rv = pd.DataFrame([
      {'ticker': 'AAPL', 'date': day_1, 'attribute': col_open,  'value': 106.96},
      {'ticker': 'AAPL', 'date': day_1, 'attribute': col_close, 'value': 108.74},
      {'ticker': 'AAPL', 'date': day_2, 'attribute': col_open,  'value': 108.58},
      {'ticker': 'AAPL', 'date': day_2, 'attribute': col_close, 'value': 107.32},
      {'ticker': 'MSFT', 'date': day_1, 'attribute': col_open,  'value': 106.96},
      {'ticker': 'MSFT', 'date': day_1, 'attribute': col_close, 'value': 108.74},
      {'ticker': 'MSFT', 'date': day_2, 'attribute': col_open,  'value': 108.58},
      {'ticker': 'MSFT', 'date': day_2, 'attribute': col_close, 'value': 107.32},
    ], columns=cols)

    return rv
```

```python
long_format = make_long_aapl()
long_format
```

| | date | ticker | attribute | value |
|---|---|---|---|---|
| 0 | 2015-12-29 | AAPL | open | 106.96 |
| 1 | 2015-12-29 | AAPL | close | 108.74 |
| 2 | 2015-12-30 | AAPL | open | 108.58 |
| 3 | 2015-12-30 | AAPL | close | 107.32 |
| 4 | 2015-12-29 | MSFT | open | 106.96 |
| 5 | 2015-12-29 | MSFT | close | 108.74 |
| 6 | 2015-12-30 | MSFT | open | 108.58 |
| 7 | 2015-12-30 | MSFT | close | 107.32 |


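## Identify each 'value' by its (date, ticker, attribute) key

```set_index``` moves the three identifier columns into a row ```MultiIndex```, leaving ```value``` as the only column, so each number is addressed by a unique ```(date, ticker, attribute)``` key.
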
```python
intermediate = long_format.set_index(['date', 'ticker', 'attribute'])
intermediate
```

| date | ticker | attribute | value |
|---|---|---|---|
| 2015-12-29 | AAPL | open | 106.96 |
| 2015-12-29 | AAPL | close | 108.74 |
| 2015-12-30 | AAPL | open | 108.58 |
| 2015-12-30 | AAPL | close | 107.32 |
| 2015-12-29 | MSFT | open | 106.96 |
| 2015-12-29 | MSFT | close | 108.74 |
| 2015-12-30 | MSFT | open | 108.58 |
| 2015-12-30 | MSFT | close | 107.32 |
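A minimal sketch of what that key buys, rebuilding only the AAPL rows so the block runs on its own (the names `indexed`, `close_dec30` and `dup` are illustrative). A full-key `.loc` lookup returns a single number, and a duplicated key makes the later `unstack` step fail:

```python
import datetime as dt

import pandas as pd

# AAPL rows from the long-format example, rebuilt here
long_format = pd.DataFrame({
    "date": [dt.date(2015, 12, 29)] * 2 + [dt.date(2015, 12, 30)] * 2,
    "ticker": ["AAPL"] * 4,
    "attribute": ["open", "close", "open", "close"],
    "value": [106.96, 108.74, 108.58, 107.32],
})
indexed = long_format.set_index(["date", "ticker", "attribute"])

# Each value is addressed by its full (date, ticker, attribute) key
close_dec30 = indexed.loc[(dt.date(2015, 12, 30), "AAPL", "close"), "value"]
print(close_dec30)  # 107.32

# unstack needs unique keys: appending a copy of the first row duplicates a key
dup = pd.concat([indexed, indexed.iloc[:1]])
try:
    dup.unstack("attribute")
    raised = False
except ValueError as err:
    print("cannot unstack:", err)
    raised = True
```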


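## Long to Wide via ```unstack```

```unstack(level=['attribute'])``` moves the ```attribute``` index level into the columns, so each ```(date, ticker)``` pair becomes one row with a ```close``` and an ```open``` column. The result is sorted by the remaining index, which is why the AAPL and MSFT rows now interleave by date.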

```python
intermediate = intermediate.unstack(level=['attribute'])
print(intermediate.index)
intermediate
```

```
MultiIndex([(2015-12-29, 'AAPL'),
            (2015-12-29, 'MSFT'),
            (2015-12-30, 'AAPL'),
            (2015-12-30, 'MSFT')],
           names=['date', 'ticker'])
```


| date | ticker | ('value', 'close') | ('value', 'open') |
|---|---|---|---|
| 2015-12-29 | AAPL | 108.74 | 106.96 |
| 2015-12-29 | MSFT | 108.74 | 106.96 |
| 2015-12-30 | AAPL | 107.32 | 108.58 |
| 2015-12-30 | MSFT | 107.32 | 108.58 |
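The ```set_index``` + ```unstack``` sequence can also be written as a single ```DataFrame.pivot``` call (the first link at the top). A minimal sketch that rebuilds the same eight rows so it runs on its own; it assumes pandas 1.1 or later, where ```index``` may be a list of columns:

```python
import datetime as dt

import pandas as pd

# The eight rows produced by make_long_aapl(), rebuilt here
day_1, day_2 = dt.date(2015, 12, 29), dt.date(2015, 12, 30)
rows = [
    (d, t, a, v)
    for t in ("AAPL", "MSFT")
    for d, a, v in [(day_1, "open", 106.96), (day_1, "close", 108.74),
                    (day_2, "open", 108.58), (day_2, "close", 107.32)]
]
long_format = pd.DataFrame(rows, columns=["date", "ticker", "attribute", "value"])

# One call: index lists the identifier columns, columns names the column to spread out
wide = long_format.pivot(index=["date", "ticker"], columns="attribute", values="value")

# A single column name as `values` gives flat columns (close, open) instead of
# ('value', ...) pairs, so reset_index needs no renaming afterwards
wide_format = wide.reset_index()
print(wide_format)
```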


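```python
# Move (date, ticker) back into ordinary columns, then replace the two-level
# column labels ('value', 'close') and ('value', 'open') with flat names.
# The order matches because unstack sorted the attribute labels (close < open).
wide_format = intermediate.reset_index()
wide_format.columns = ['date', 'ticker', 'close', 'open']
wide_format
```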

| | date | ticker | close | open |
|---|---|---|---|---|
| 0 | 2015-12-29 | AAPL | 108.74 | 106.96 |
| 1 | 2015-12-29 | MSFT | 108.74 | 106.96 |
| 2 | 2015-12-30 | AAPL | 107.32 | 108.58 |
| 3 | 2015-12-30 | MSFT | 107.32 | 108.58 |
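Going back from a flat wide table to long format is the ```wide -> long``` direction. A minimal sketch using ```DataFrame.melt```, which is swapped in here rather than taken from the references above; only the AAPL rows of the final table are rebuilt, for brevity:

```python
import datetime as dt

import pandas as pd

# AAPL rows of the flat wide table, rebuilt here
wide_format = pd.DataFrame({
    "date": [dt.date(2015, 12, 29), dt.date(2015, 12, 30)],
    "ticker": ["AAPL", "AAPL"],
    "close": [108.74, 107.32],
    "open": [106.96, 108.58],
})

# wide -> long: keep the identifier columns and turn every listed value column
# into (attribute, value) rows, one value column after the other
long_again = wide_format.melt(
    id_vars=["date", "ticker"],
    value_vars=["open", "close"],
    var_name="attribute",
    value_name="value",
)
print(long_again)
```

```melt``` emits all ```open``` rows before all ```close``` rows; sort by ```['date', 'ticker', 'attribute']``` if the original interleaving matters.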
