# Dataframe pivot/unpivot

* [Reshaping and pivot tables](https://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html)

In [1]:
import pandas as pd

In [2]:
%%html
<style>
table {float:left}
</style>

# Constant

In [3]:
DATA_DIR: str = "../data/titanic"

# Dataframe

In [4]:
df = pd.read_csv(
    f"{DATA_DIR}/train.csv"
)

# Drop columns

In [5]:
df.drop(
    columns=["Name", "Ticket", "Cabin", "SibSp", "Parch", "Fare", "Embarked", "Age"],
    inplace=True
)

---
# Pivot = Wide/Instance Format

Pivoting produces the wide format: one row per instance (here, one passenger), with each attribute (key) stored in its own column.

| id | key/Survived | key/Pclass | key/Sex    |
|----|--------------|------------|------------|
| 1  | value/0      | value/3    | value/male |

In [6]:
df[df["PassengerId"] == 1]

Unnamed: 0,PassengerId,Survived,Pclass,Sex
0,1,0,3,male
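
One way to read the wide format is as a list of records, where each row is an instance and the column names are its keys. The sketch below is only illustrative: `wide` is a hypothetical two-row stand-in for the notebook's `df`, built by hand so the example runs without the CSV file.

```python
import pandas as pd

# Hypothetical stand-in for `df`: the first two passengers shown in this notebook.
wide = pd.DataFrame(
    {
        "PassengerId": [1, 2],
        "Survived": [0, 1],
        "Pclass": [3, 1],
        "Sex": ["male", "female"],
    }
)

# Each row becomes one dict: column names are the keys, cells are the values.
records = wide.to_dict(orient="records")
print(records[0])
```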


In [56]:
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Sex
0,1,0,3,male
1,2,1,1,female
2,3,1,3,female
3,4,1,1,female
4,5,0,3,male


# Unpivot = Long/(key, value) format

Unpivoting (melting) produces the long format: each row is an `(id, key, value)` triple, so one instance is decomposed into one row per attribute.

| id | key      | value |
|----|----------|-------|
| 1  | Survived | 0     |
| 1  | Pclass   | 3     |
| 1  | Sex      | male  |
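
The table above can be reproduced on a small frame with `DataFrame.melt`. This is a minimal sketch, assuming a hand-built two-row frame (`wide`, hypothetical) in place of the Titanic data. The `var_name` and `value_name` arguments are optional and are used here only so the output columns match the `key` and `value` headers in the table.

```python
import pandas as pd

# Hypothetical stand-in for `df`: the first two passengers shown in this notebook.
wide = pd.DataFrame(
    {
        "PassengerId": [1, 2],
        "Survived": [0, 1],
        "Pclass": [3, 1],
        "Sex": ["male", "female"],
    }
)

# Keep PassengerId as the id; every other column becomes (key, value) rows.
long_df = wide.melt(id_vars="PassengerId", var_name="key", value_name="value")

# All rows for one passenger: one (id, key, value) triple per attribute.
print(long_df[long_df["PassengerId"] == 1])
```

Without `var_name` and `value_name`, pandas names the two new columns `variable` and `value`.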

In [19]:
df_unpivoted = df.melt(
    id_vars="PassengerId"
)

In [20]:
df_unpivoted[df_unpivoted["PassengerId"] == 1]

Unnamed: 0,PassengerId,variable,value
0,1,Survived,0
891,1,Pclass,3
1782,1,Sex,male


In [23]:
df_unpivoted.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2673 entries, 0 to 2672
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   PassengerId  2673 non-null   int64 
 1   variable     2673 non-null   object
 2   value        2673 non-null   object
dtypes: int64(1), object(2)
memory usage: 62.8+ KB
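
The 2673 rows reported above are 891 passengers times 3 melted columns, and `value` is `object` because it mixes integers (`Survived`, `Pclass`) with strings (`Sex`). The sketch below checks both observations on a hypothetical two-row frame (`wide`) rather than the real data.

```python
import pandas as pd

# Hypothetical stand-in for `df`: the first two passengers shown in this notebook.
wide = pd.DataFrame(
    {
        "PassengerId": [1, 2],
        "Survived": [0, 1],
        "Pclass": [3, 1],
        "Sex": ["male", "female"],
    }
)

long_df = wide.melt(id_vars="PassengerId")

# Row count: one row per (passenger, melted column) pair.
n_melted_columns = wide.shape[1] - 1  # every column except the id
print(len(long_df) == len(wide) * n_melted_columns)

# Mixing integer and string columns forces a common object dtype.
print(long_df["value"].dtype)

# Melting only the integer columns keeps an integer dtype.
ints_only = wide.melt(id_vars="PassengerId", value_vars=["Survived", "Pclass"])
print(ints_only["value"].dtype)
```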


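# Revert (Pivot)

`DataFrame.pivot` reverses the melt: each distinct value in `variable` becomes a column again, with one row per `PassengerId`.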

In [66]:
df_pivoted = df_unpivoted.pivot(
    index="PassengerId",
    columns="variable",
    values="value"
)
df_pivoted.reset_index(inplace=True, drop=False)
df_pivoted.columns.name = None
df_pivoted.head()

Unnamed: 0,PassengerId,Pclass,Sex,Survived
0,1,3,male,0
1,2,1,female,1
2,3,3,female,1
3,4,1,female,1
4,5,3,male,0
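
The round trip is not exact: the pivoted columns come back in sorted order (`Pclass`, `Sex`, `Survived`), and because they are built from the `object`-typed `value` column, the numeric columns are `object` as well. The sketch below shows one possible way to restore the original layout, assuming a hypothetical two-row frame (`wide`) in place of `df`. `infer_objects` is one option for recovering the integer dtypes; an explicit `astype` would work too.

```python
import pandas as pd

# Hypothetical stand-in for `df`: the first two passengers shown in this notebook.
wide = pd.DataFrame(
    {
        "PassengerId": [1, 2],
        "Survived": [0, 1],
        "Pclass": [3, 1],
        "Sex": ["male", "female"],
    }
)

long_df = wide.melt(id_vars="PassengerId")

# Same steps as the cell above: pivot, move the id back to a column,
# and drop the leftover "variable" label on the column index.
restored = long_df.pivot(index="PassengerId", columns="variable", values="value")
restored = restored.reset_index()
restored.columns.name = None

# Put the columns back in their original order.
restored = restored[list(wide.columns)]

# Let pandas re-infer dtypes for the object columns (integers become int64 again).
restored = restored.infer_objects()
print(restored.dtypes)
```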
