# Reshape

## Working with pandas DataFrames

**Prerequisites**
- pandas intro
- pandas basics
- Importance of index

**Outcomes**
- Understand and apply `melt`/`stack`/`unstack`/`pivot` methods
- Practice transformations of indices
- Understand tidy data

In [None]:
import numpy as np
import pandas as pd

%matplotlib inline

## Tidy Data

The concept of "tidy data" clarifies what reshaping is trying to achieve.

> A dataset is a collection of values, usually either numbers (if quantitative) or strings (if qualitative). Values are organized in two ways. Every value belongs to a variable and an observation.
>
> -- Hadley Wickham, Tidy Data (2013)

## Principles of Tidy Data

In tidy data:

1. **Each variable forms a column**
2. **Each observation forms a row**
3. **Each type of observational unit forms a table**

These map directly to:
- pandas columns
- pandas rows
- pandas DataFrames
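
As a minimal sketch of these principles, the snippet below uses a small made-up table (hypothetical values, not the basketball data) in which one variable, the year, is hidden in the column labels. Reshaping with `melt`, which is covered later in this section, gives each variable its own column and each observation its own row. The names `untidy`, `tidy`, `year`, and `score` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical, made-up data: the year is stored in the column labels
untidy = pd.DataFrame({
    "player": ["A", "B"],
    "2015": [10, 20],
    "2016": [12, 18],
})

# Tidy version: one column per variable (player, year, score),
# one row per observation (a player in a given year)
tidy = untidy.melt(id_vars="player", var_name="year", value_name="score")
print(tidy)
```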

## Loading Example Data

We'll use basketball statistics data for our examples.

In [None]:
url = "https://datascience.quantecon.org/assets/data/bball.csv"
bball = pd.read_csv(url)
bball.info()
bball

## Long vs Wide Format

Many reshaping operations change between **long** and **wide** DataFrames.

- **Wide format**: More columns, fewer rows
- **Long format**: Fewer columns, more rows

## Long Format Example

In [None]:
bball_long = bball.melt(id_vars=["Year", "Player", "Team", "TeamName"])
bball_long

## Wide Format Example

In [None]:
bball_wide = bball_long.pivot_table(
    index="Year",
    columns=["Player", "variable", "Team"],
    values="value"
)
bball_wide

## Basic Reshaping Methods

Three fundamental methods:

1. **`set_index`**: Move columns into the index
2. **`reset_index`**: Move index levels to columns
3. **`T`**: Transpose (swap rows and columns)
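
The three methods above can be sketched on a small made-up frame. The data and the names `small`, `indexed`, `restored`, and `transposed` are hypothetical and only meant to show that `reset_index` undoes `set_index`.

```python
import pandas as pd

# Hypothetical, made-up data for illustration
small = pd.DataFrame({
    "key": ["a", "b", "c"],
    "x": [1, 2, 3],
    "y": [4, 5, 6],
})

indexed = small.set_index("key")   # "key" becomes the row index
restored = indexed.reset_index()   # "key" is a regular column again
transposed = indexed.T             # rows become columns and vice versa

print(indexed)
print(restored)
print(transposed)
```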

## `set_index` Example

In [None]:
bball2 = bball.set_index(["Player", "Year"])
bball2.head()

## Transpose Example

In [None]:
bball3 = bball2.T
bball3.head()

## `stack` and `unstack`

These methods operate on index and column labels:

- **`stack`**: Move column labels to index (wide → long)
- **`unstack`**: Move index labels to columns (long → wide)

💡 **Mnemonic**: **U**nstack moves levels **U**p
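
A minimal round trip, assuming a small made-up frame (the names `wide`, `long`, and `back` are hypothetical), may help show how the two methods mirror each other:

```python
import pandas as pd

# Hypothetical, made-up data for illustration
wide = pd.DataFrame(
    {"x": [1, 2], "y": [3, 4]},
    index=pd.Index(["a", "b"], name="row"),
)

long = wide.stack()    # column labels become the innermost index level
back = long.unstack()  # the innermost index level moves back up to the columns

print(long)
print(back)
```

Recent pandas versions may emit a `FutureWarning` from `stack` about a newer implementation; the result for this small example is the same either way.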

## `stack` Example

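Let's compute each player's average stats across all years and teams. Calling `stack()` with no arguments moves the innermost column level (`Team`) into the index, and `mean()` then averages over the rows.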

In [None]:
bball_wide

In [None]:
bball_wide.stack()

In [None]:
player_stats = bball_wide.stack().mean()
player_stats

## `stack` with Specific Level

Average by team and stat (across years and players):

In [None]:
bball_wide.stack(level="Player").mean()

## `unstack` Example

Prepare data for visualization:

In [None]:
player_stats

In [None]:
player_stats.unstack()

## Visualizing Player Stats

In [None]:
player_stats.unstack().plot.bar()

## Alternative View: Stats by Variable

In [None]:
player_stats.unstack(level="Player")

In [None]:
player_stats.unstack(level="Player").plot.bar()

## `melt` Method

**Purpose**: Transform wide format to long format

⚠️ **Warning**: By default, `melt` discards any existing index (`ignore_index=True`); call `reset_index` first if the index holds information you need

**Result**: Creates two new columns:
- `variable`: Former column names
- `value`: Former values
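
The warning above can be illustrated with a short sketch on made-up data (the frame `scores` and its values are hypothetical). Melting directly loses the index, while moving the index into a column first keeps it as an identifier.

```python
import pandas as pd

# Hypothetical, made-up data with a meaningful index
scores = pd.DataFrame(
    {"Pts": [10, 20], "Assist": [3, 7]},
    index=pd.Index(["p1", "p2"], name="Player"),
)

# The index is dropped: the result has no "Player" information
dropped = scores.melt()

# Moving the index into a column first preserves it as an identifier
kept = scores.reset_index().melt(id_vars="Player")

print(dropped)
print(kept)
```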

## `melt` Example

In [None]:
bball

In [None]:
bball.melt(id_vars=["Year", "Player", "Team", "TeamName"])

## `pivot` Method

The `pivot` method:
1. Takes the unique values of one column → places them along the **index**
2. Takes the unique values of another column → places them along the **columns**
3. Takes the values from a third column → fills the DataFrame values

⚠️ **Requirement**: Index/column pairs must be **unique**; otherwise `pivot` raises a `ValueError`
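
To see the uniqueness requirement in action, here is a minimal sketch on made-up data (the frame `records` and its values are hypothetical) in which one index/column pair appears twice:

```python
import pandas as pd

# Hypothetical, made-up data: the pair (2016, "p1") appears twice
records = pd.DataFrame({
    "Year": [2015, 2016, 2016],
    "Player": ["p1", "p1", "p1"],
    "Pts": [10, 12, 14],
})

try:
    records.pivot(index="Year", columns="Player", values="Pts")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("pivot failed:", err)

# Dropping the duplicated pair (keeping the first row) makes the reshape well defined
unique = records.drop_duplicates(subset=["Year", "Player"])
result = unique.pivot(index="Year", columns="Player", values="Pts")
print(result)
```

Dropping duplicates is only one way to resolve the conflict, and it silently discards data; aggregating the duplicates is often the better choice.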

## `pivot` Example

In [None]:
bball.head(6).pivot(index="Year", columns="Player", values="Pts")

## Replicating `pivot` with Fundamental Operations

In [None]:
# 1. set_index  2. extract column  3. unstack
bball.head(6).set_index(["Year", "Player"])["Pts"].unstack(level="Player")

## `pivot_table` Method

Generalization of `pivot` that overcomes two limitations:

1. ✅ Allows **multiple columns** for index/columns/values
2. ✅ Handles **duplicate entries** via aggregation

Default aggregation: **mean**
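
A small sketch of the default aggregation, assuming made-up data in which a hypothetical player `p1` appears for two teams in the same year:

```python
import pandas as pd

# Hypothetical, made-up data: p1 has two rows in 2016 (one per team)
records = pd.DataFrame({
    "Year": [2016, 2016, 2017],
    "Player": ["p1", "p1", "p1"],
    "Team": ["T1", "T2", "T2"],
    "Pts": [12.0, 14.0, 15.0],
})

# Default aggregation: the two 2016 rows are averaged
by_mean = records.pivot_table(index="Year", columns="Player", values="Pts")

# Same reshape, but summing the duplicates instead
by_sum = records.pivot_table(
    index="Year", columns="Player", values="Pts", aggfunc="sum"
)

print(by_mean)
print(by_sum)
```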

## Basic `pivot_table` Example

In [None]:
bball.head(6).pivot_table(index="Year", columns="Player", values="Pts")

## Multiple Columns as Index

In [None]:
bball.pivot_table(index=["Year", "Team"], columns="Player", values="Pts")

## Handling Duplicates

In [None]:
# Full dataset has duplicates (Ibaka traded mid-season)
bball_pivoted = bball.pivot_table(index="Year", columns="Player", values="Pts")
bball_pivoted

## Custom Aggregation Functions

In [None]:
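# Use max instead of mean (the string "max" avoids the FutureWarning
# that recent pandas versions emit for the builtin `max`)
bball.pivot_table(index="Year", columns="Player", values="Pts", aggfunc="max")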

In [None]:
# Count values
bball.pivot_table(index="Year", columns="Player", values="Pts", aggfunc=len)

## Multiple Aggregation Functions

In [None]:
bball.pivot_table(index="Year", columns="Player", values="Pts", aggfunc=["max", len])

## Generic Example Data

Let's visualize reshaping with simpler data:

In [None]:
# A and B are "identifiers", C, D, and E are variables
df = pd.DataFrame({
    "A": [0, 0, 1, 1],
    "B": "x y x z".split(),
    "C": [1, 2, 1, 4],
    "D": [10, 20, 30, 20],
    "E": [2, 1, 5, 4],
})

df

## Setting Multi-Level Index

In [None]:
df2 = df.set_index(["A", "B"])
df2

## Transposing

In [None]:
df3 = df2.T
df3

## Stacking Operation

![stack.gif](stack.gif)

In [None]:
df2_stack = df2.stack()
df2_stack

## Unstacking Operation

![unstack_level0.gif](unstack_level0.gif)

In [None]:
df2.unstack()
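
With no arguments, `unstack` moves the innermost index level (here `B`). The sketch below rebuilds the same generic data so that it runs on its own and contrasts that default with an explicit `level`; the names `by_b` and `by_a` are hypothetical.

```python
import pandas as pd

# Same generic data as above, rebuilt so the snippet is self-contained
df = pd.DataFrame({
    "A": [0, 0, 1, 1],
    "B": "x y x z".split(),
    "C": [1, 2, 1, 4],
    "D": [10, 20, 30, 20],
    "E": [2, 1, 5, 4],
})
df2 = df.set_index(["A", "B"])

by_b = df2.unstack(level="B")  # same as the default: innermost level
by_a = df2.unstack(level="A")  # move the outer level instead

print(by_b)
print(by_a)
```

Combinations that never occur in the data, such as `A == 0` with `B == "z"`, show up as `NaN` in the result.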

## Melting Operation

![melt.gif](melt.gif)

In [None]:
df_melted = df.melt(id_vars=["A", "B"])
df_melted
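
Because every `(A, B)` pair in this generic data is unique, the melt can be undone with `pivot`. The following is a minimal sketch that rebuilds the data so it runs on its own; the name `restored` is hypothetical.

```python
import pandas as pd

# Same generic data as above, rebuilt so the snippet is self-contained
df = pd.DataFrame({
    "A": [0, 0, 1, 1],
    "B": "x y x z".split(),
    "C": [1, 2, 1, 4],
    "D": [10, 20, 30, 20],
    "E": [2, 1, 5, 4],
})
df_melted = df.melt(id_vars=["A", "B"])

# Pivot the long frame back: identifiers go to the index,
# former column names return to the columns
restored = (
    df_melted
    .pivot(index=["A", "B"], columns="variable", values="value")
    .reset_index()
)
restored.columns.name = None  # drop the leftover "variable" label

print(restored)
```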

## Summary: Reshaping Methods

| Method | Direction | Purpose |
|--------|-----------|----------|
| `set_index` | - | Move columns to index |
| `reset_index` | - | Move index to columns |
| `stack` | Wide → Long | Move column labels to index |
| `unstack` | Long → Wide | Move index labels to columns |
| `melt` | Wide → Long | Unpivot DataFrame |
| `pivot` | Long → Wide | Reshape with unique pairs |
| `pivot_table` | Long → Wide | Reshape with aggregation |

## Key Takeaways

1. **Tidy data** principles guide effective reshaping
2. **`stack`/`unstack`** are inverses (remember: **U**nstack moves **U**p)
3. **`pivot_table`** is more flexible than `pivot`
4. **`melt`** is the most straightforward wide → long transformation
5. Choose the right tool based on:
   - Data structure (unique vs duplicate index/column pairs)
   - Desired output format
   - Need for aggregation

## Practice Exercises

1. **Challenge**: Recreate `bball_wide` from `bball` using `set_index`, `T`, `stack`, and `unstack`

2. Experiment with `melt`:
   - What happens with different `id_vars`?
   - How does `value_vars` parameter work?

3. Create a pivot table:
   - Index: `Player`
   - Columns: `TeamName`
   - Values: `[Rebound, Assist]`
   - Try multiple aggregation functions

# Questions?

## Thank you!