# Dataframe Pivot

* [DataFrame.pivot(index=None, columns=None, values=None)](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html#pandas.DataFrame.pivot)
* [pandas.wide_to_long(df, stubnames, i, j, sep='', suffix='\\d+')](https://pandas.pydata.org/docs/reference/api/pandas.wide_to_long.html)

* [DataFrame.stack(level=-1, dropna=True)](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.stack.html)


## Long Format 

Long format represents a class instance by stacking its ```(id, attribute, value)``` triples as rows, one row per attribute. You can add an attribute without changing the schema, because a new attribute is just another row.

## Wide Format
Wide format represents a class instance as a single row whose columns are the instance's attributes. Each row is a self-sufficient instance, ready to be fed as a record. Adding an attribute requires a schema change, because it needs a new column.
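
The schema difference between the two formats can be sketched as follows. This is a minimal illustration, not part of the original notebook: the `volume` attribute and its value are hypothetical, made up only to show where a new attribute lands in each format.

```python
import datetime as dt

import pandas as pd

day = dt.date(2015, 12, 29)

# Long format: one row per (id, attribute, value) triple.
long_df = pd.DataFrame(
    [
        {'date': day, 'ticker': 'AAPL', 'attribute': 'open', 'value': 106.96},
        {'date': day, 'ticker': 'AAPL', 'attribute': 'close', 'value': 108.74},
    ]
)

# Adding a hypothetical 'volume' attribute is just one more row.
# The columns (the schema) stay the same. The value is made up.
new_row = pd.DataFrame(
    [{'date': day, 'ticker': 'AAPL', 'attribute': 'volume', 'value': 1000.0}]
)
long_extended = pd.concat([long_df, new_row], ignore_index=True)

# Wide format: one row per instance, one column per attribute.
wide_df = pd.DataFrame(
    [{'date': day, 'ticker': 'AAPL', 'open': 106.96, 'close': 108.74}]
)

# Adding the same attribute needs a new column, i.e. a schema change.
wide_extended = wide_df.assign(volume=1000.0)
```

After running this, `long_extended` has the same four columns as `long_df` with one extra row, while `wide_extended` has one more column than `wide_df`.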

## Unpack
```unstack``` is a ```long -> wide``` transformation.

<img src="./image/pivot.jpg" align="left"/>

<img src="./image/wide_long_formats.jpg" align="left" width="800"/>

In [1]:
import datetime as dt

import numpy as np
import pandas as pd

# Use the fully qualified option name; the bare 'precision' is ambiguous in recent pandas
pd.set_option('display.precision', 2)

In [2]:
%%html
<style>
table {float:left}
</style>

In [3]:
def make_long_aapl():
    day_1 = dt.date(2015, 12, 29)
    day_2 = dt.date(2015, 12, 30)
    
    col_close = 'close'
    col_open = 'open'
    
    cols = ['date', 'ticker', 'attribute', 'value']
    
    rv = pd.DataFrame([
      {'ticker': 'AAPL', 'date': day_1, 'attribute': col_open,  'value': 106.96},
      {'ticker': 'AAPL', 'date': day_1, 'attribute': col_close, 'value': 108.74},
      {'ticker': 'AAPL', 'date': day_2, 'attribute': col_open,  'value': 108.58},
      {'ticker': 'AAPL', 'date': day_2, 'attribute': col_close, 'value': 107.32},
      {'ticker': 'MSFT', 'date': day_1, 'attribute': col_open,  'value': 106.96},
      {'ticker': 'MSFT', 'date': day_1, 'attribute': col_close, 'value': 108.74},
      {'ticker': 'MSFT', 'date': day_2, 'attribute': col_open,  'value': 108.58},
      {'ticker': 'MSFT', 'date': day_2, 'attribute': col_close, 'value': 107.32},
    ], columns=cols)
    
    return rv

In [4]:
aapl_long_format = make_long_aapl()
aapl_long_format

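|   | date       | ticker | attribute | value  |
|---|------------|--------|-----------|--------|
| 0 | 2015-12-29 | AAPL   | open      | 106.96 |
| 1 | 2015-12-29 | AAPL   | close     | 108.74 |
| 2 | 2015-12-30 | AAPL   | open      | 108.58 |
| 3 | 2015-12-30 | AAPL   | close     | 107.32 |
| 4 | 2015-12-29 | MSFT   | open      | 106.96 |
| 5 | 2015-12-29 | MSFT   | close     | 108.74 |
| 6 | 2015-12-30 | MSFT   | open      | 108.58 |
| 7 | 2015-12-30 | MSFT   | close     | 107.32 |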


# Pivot


## Long to Wide Format

Represent the class instance **AAPL daily stock data** as a row in a wide-format dataframe by collecting the (attribute, value) pairs of **AAPL** from the long-format dataframe.

| ID (GROUP BY)    | *Attributes  | 
|------------------|--------------|
|  (date, ticker)  |  open-price, close-price |


In [8]:
aapl_wide_format_via_pivot = aapl_long_format.pivot(
    index=['date','ticker'],     # Group BY keys (ID keys to identify an instance)
    columns=['attribute'],       # Tell which column is 'attribute' in (attribute, value) pair
    values='value'               # Tell which column is 'value' in (attribute, value) pair
)
aapl_wide_format_via_pivot

|            | attribute  | close  | open   |
|------------|------------|--------|--------|
| **date**   | **ticker** |        |        |
| 2015-12-29 | AAPL       | 108.74 | 106.96 |
| 2015-12-29 | MSFT       | 108.74 | 106.96 |
| 2015-12-30 | AAPL       | 107.32 | 108.58 |
| 2015-12-30 | MSFT       | 107.32 | 108.58 |


In [9]:
# Convert the df indices to columns
aapl_wide_format_via_pivot = aapl_wide_format_via_pivot.reset_index()
aapl_wide_format_via_pivot.columns.name = None
aapl_wide_format_via_pivot

|   | date       | ticker | close  | open   |
|---|------------|--------|--------|--------|
| 0 | 2015-12-29 | AAPL   | 108.74 | 106.96 |
| 1 | 2015-12-29 | MSFT   | 108.74 | 106.96 |
| 2 | 2015-12-30 | AAPL   | 107.32 | 108.58 |
| 3 | 2015-12-30 | MSFT   | 107.32 | 108.58 |


## Wide to Long Format

The opposite of the ```pivot``` function is ```melt```.

* [pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None, ignore_index=True)](https://pandas.pydata.org/docs/reference/api/pandas.melt.html)

> Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.

In [15]:
aapl_wide_format = aapl_wide_format_via_pivot
aapl_wide_format.melt(
    id_vars=['date','ticker'],     # ID keys to identify an instance
    value_vars=['close', 'open'],  # Tell which columns are 'value' in (attribute, value) pair in long format
    var_name='attribute',
    value_name='value'
)

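|   | date       | ticker | attribute | value  |
|---|------------|--------|-----------|--------|
| 0 | 2015-12-29 | AAPL   | close     | 108.74 |
| 1 | 2015-12-29 | MSFT   | close     | 108.74 |
| 2 | 2015-12-30 | AAPL   | close     | 107.32 |
| 3 | 2015-12-30 | MSFT   | close     | 107.32 |
| 4 | 2015-12-29 | AAPL   | open      | 106.96 |
| 5 | 2015-12-29 | MSFT   | open      | 106.96 |
| 6 | 2015-12-30 | AAPL   | open      | 108.58 |
| 7 | 2015-12-30 | MSFT   | open      | 108.58 |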
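
`pandas.wide_to_long`, linked at the top, is another route from wide to long when the value columns share a common prefix. The sketch below assumes a hypothetical naming scheme, `price_open` and `price_close`, which is not used in the notebook above. The stub `price` and the separator `_` are illustration choices.

```python
import datetime as dt

import pandas as pd

day_1 = dt.date(2015, 12, 29)
day_2 = dt.date(2015, 12, 30)

# Wide frame whose value columns share the hypothetical stub 'price'.
wide_df = pd.DataFrame(
    {
        'date': [day_1, day_2],
        'ticker': ['AAPL', 'AAPL'],
        'price_open': [106.96, 108.58],
        'price_close': [108.74, 107.32],
    }
)

long_df = pd.wide_to_long(
    wide_df,
    stubnames='price',     # common prefix of the value columns
    i=['date', 'ticker'],  # ID keys to identify an instance
    j='attribute',         # name for the column holding the suffixes
    sep='_',               # separator between stub and suffix
    suffix=r'\w+',         # suffixes are words here, not the default digits
).reset_index()
```

The result has the columns `date`, `ticker`, `attribute`, and `price`, with one row per (date, ticker, attribute) combination. The default `suffix` only matches digits, which is why it is overridden here.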


# Unstack

* [pandas - explanation of unstack method description](https://stackoverflow.com/questions/71169005/pandas-explanation-of-unstack-method-description)

In [None]:
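# unstack() pivots index levels, so move the key columns into the index first
aapl_long_format.set_index(['date', 'ticker', 'attribute']).unstack('attribute')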
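
Because `unstack` operates on index levels rather than on columns, the long frame needs its keys in the index before the long-to-wide reshape works. The following is a minimal sketch using a reduced AAPL-only copy of the data above. The names `indexed`, `wide_df`, and `long_again` are hypothetical.

```python
import datetime as dt

import pandas as pd

day_1 = dt.date(2015, 12, 29)
day_2 = dt.date(2015, 12, 30)

long_df = pd.DataFrame(
    [
        {'date': day_1, 'ticker': 'AAPL', 'attribute': 'open', 'value': 106.96},
        {'date': day_1, 'ticker': 'AAPL', 'attribute': 'close', 'value': 108.74},
        {'date': day_2, 'ticker': 'AAPL', 'attribute': 'open', 'value': 108.58},
        {'date': day_2, 'ticker': 'AAPL', 'attribute': 'close', 'value': 107.32},
    ]
)

# Move the keys into the index and keep only the value column as a Series.
indexed = long_df.set_index(['date', 'ticker', 'attribute'])['value']

# Long -> wide: the 'attribute' index level becomes the columns.
wide_df = indexed.unstack('attribute')

# Wide -> long: stack() moves the columns back into the innermost index level.
long_again = wide_df.stack()
```

Here `wide_df` is indexed by (date, ticker) with `close` and `open` as columns, and `long_again` is a Series indexed by (date, ticker, attribute), which shows `stack` acting as the inverse of `unstack`.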