# **Data Cleaning**

# **12. Unpivoting or Pivoting (Reshaping) Data in pandas**

In [2]:
import numpy as np
import pandas as pd 

These techniques reshape your DataFrame, either to **summarize** it as a wide table (pivot) or to **flatten** it into a long format (unpivot), depending on your analysis needs.

---

## 🧭 When Do You Reshape Data?

| Operation   | When to Use                                                                          |
| ----------- | ------------------------------------------------------------------------------------ |
| `pivot()`   | When you want to **reshape** long data into a wide table, with one column per category |
| `melt()`    | When you want to **flatten** wide data into long format                                |
| `stack()`   | When you want to **move columns into rows**, producing a MultiIndex                    |
| `unstack()` | When you want to **move a row-index level into columns** (the inverse of `stack()`)    |


In [3]:
df = pd.DataFrame({
    'Date': ['2023-01', '2023-01', '2023-02', '2023-02'],
    'Region': ['North', 'South', 'North', 'South'],
    'Sales': [200, 150, 220, 180],
    'Profit': [50, 40, 60, 55]
})

df

|   | Date    | Region | Sales | Profit |
|---|---------|--------|-------|--------|
| 0 | 2023-01 | North  | 200   | 50     |
| 1 | 2023-01 | South  | 150   | 40     |
| 2 | 2023-02 | North  | 220   | 60     |
| 3 | 2023-02 | South  | 180   | 55     |


## 🔄 1. `pivot()` — Convert Long to Wide Format

In [4]:
pivot_df = df.pivot(index='Date', columns='Region', values='Sales')
pivot_df

| Date \ Region | North | South |
|---------------|-------|-------|
| 2023-01       | 200   | 150   |
| 2023-02       | 220   | 180   |


✅ **Use Case**: Reporting monthly regional sales — easier for business summary.

🧠 **Why `pivot()`?**
You’re organizing **existing data** into a more readable layout.
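`pivot()` performs no aggregation, so it needs every index/column pair to be unique. A minimal sketch using a small hypothetical `dup` DataFrame (not part of the original data) with two `North` rows for the same month: `pivot()` raises a `ValueError`, while `pivot_table()` with an explicit `aggfunc` (here `'sum'`, an illustrative choice) resolves the duplicates.

```python
import pandas as pd

# Hypothetical data: two 'North' rows share the same Date
dup = pd.DataFrame({
    'Date': ['2023-01', '2023-01', '2023-01'],
    'Region': ['North', 'North', 'South'],
    'Sales': [200, 100, 150],
})

# pivot() cannot decide which of the two North values to keep
try:
    dup.pivot(index='Date', columns='Region', values='Sales')
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print('pivot() raised ValueError:', err)

# pivot_table() resolves duplicates by aggregating them (here: summing)
summed = dup.pivot_table(index='Date', columns='Region',
                         values='Sales', aggfunc='sum')
print(summed)
```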

## 🔄 2. `melt()` — Convert Wide to Long Format

In [5]:
df

|   | Date    | Region | Sales | Profit |
|---|---------|--------|-------|--------|
| 0 | 2023-01 | North  | 200   | 50     |
| 1 | 2023-01 | South  | 150   | 40     |
| 2 | 2023-02 | North  | 220   | 60     |
| 3 | 2023-02 | South  | 180   | 55     |


In [6]:
df.melt(id_vars=['Date', 'Region'], value_vars=['Sales', 'Profit'],
        var_name='Metric', value_name='Value')

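|   | Date    | Region | Metric | Value |
|---|---------|--------|--------|-------|
| 0 | 2023-01 | North  | Sales  | 200   |
| 1 | 2023-01 | South  | Sales  | 150   |
| 2 | 2023-02 | North  | Sales  | 220   |
| 3 | 2023-02 | South  | Sales  | 180   |
| 4 | 2023-01 | North  | Profit | 50    |
| 5 | 2023-01 | South  | Profit | 40    |
| 6 | 2023-02 | North  | Profit | 60    |
| 7 | 2023-02 | South  | Profit | 55    |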


✅ **Use Case**: Preparing data for **visualization** libraries or **statistics/ML tools** that expect a tidy long format.

🧠 **Why `melt()`?**
You collapse several metric columns into one `Metric` column and one `Value` column, which is the layout expected by tools such as Seaborn and many statistics libraries.
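`melt()` and `pivot()` are roughly inverse operations. A minimal sketch, assuming pandas 1.1 or later (needed for a list-valued `index` in `pivot()`), that melts the example `df` and then pivots it back to the original wide layout:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2023-01', '2023-01', '2023-02', '2023-02'],
    'Region': ['North', 'South', 'North', 'South'],
    'Sales': [200, 150, 220, 180],
    'Profit': [50, 40, 60, 55],
})

# Wide -> long: one row per (Date, Region, Metric)
long_df = df.melt(id_vars=['Date', 'Region'], value_vars=['Sales', 'Profit'],
                  var_name='Metric', value_name='Value')

# Long -> wide: a list of index columns yields a (Date, Region) MultiIndex
wide_df = long_df.pivot(index=['Date', 'Region'], columns='Metric', values='Value')

# pivot() sorts the new columns, so restore the original order,
# drop the 'Metric' columns name, and turn the index back into columns
restored = wide_df[['Sales', 'Profit']].rename_axis(columns=None).reset_index()
print(restored)
```

The round trip only works cleanly because each `(Date, Region, Metric)` combination is unique; with duplicates you would need `pivot_table()` instead.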

## 🔄 3. `stack()` — Stack Columns into MultiIndex Rows

In [7]:
df

|   | Date    | Region | Sales | Profit |
|---|---------|--------|-------|--------|
| 0 | 2023-01 | North  | 200   | 50     |
| 1 | 2023-01 | South  | 150   | 40     |
| 2 | 2023-02 | North  | 220   | 60     |
| 3 | 2023-02 | South  | 180   | 55     |


In [8]:
df.set_index(['Date', 'Region'])

| Date    | Region | Sales | Profit |
|---------|--------|-------|--------|
| 2023-01 | North  | 200   | 50     |
| 2023-01 | South  | 150   | 40     |
| 2023-02 | North  | 220   | 60     |
| 2023-02 | South  | 180   | 55     |


In [10]:
stacked_df = df.set_index(['Date', 'Region']).stack()
stacked_df

Date     Region        
2023-01  North   Sales     200
                 Profit     50
         South   Sales     150
                 Profit     40
2023-02  North   Sales     220
                 Profit     60
         South   Sales     180
                 Profit     55
dtype: int64

✅ **Use Case**: When you want to turn the column labels into an extra row-index level, for example to **prepare for hierarchical grouping**.

🧠 **Why `stack()`?**
Every value gets a fully labeled row (`Date`, `Region`, metric), so a processing pipeline can group, filter, or aggregate by any index level.
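A minimal sketch of the hierarchical grouping a stacked Series enables, rebuilding the same example `df`. The name `Metric` for the new innermost level is an assumption added for readability; `stack()` leaves that level unnamed here because `df.columns` has no name.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2023-01', '2023-01', '2023-02', '2023-02'],
    'Region': ['North', 'South', 'North', 'South'],
    'Sales': [200, 150, 220, 180],
    'Profit': [50, 40, 60, 55],
})

stacked = df.set_index(['Date', 'Region']).stack()
# Name the three index levels (the last one is otherwise None)
stacked.index.names = ['Date', 'Region', 'Metric']

# Aggregate over any subset of index levels: totals per Region and Metric
totals = stacked.groupby(level=['Region', 'Metric']).sum()
print(totals)

# Cross-section: all Profit values, still indexed by Date and Region
profit = stacked.xs('Profit', level='Metric')
print(profit)
```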

## 🔄 4. `unstack()` — Convert Index Level to Columns

In [11]:
stacked_df.unstack()

| Date    | Region | Sales | Profit |
|---------|--------|-------|--------|
| 2023-01 | North  | 200   | 50     |
| 2023-01 | South  | 150   | 40     |
| 2023-02 | North  | 220   | 60     |
| 2023-02 | South  | 180   | 55     |



✅ **Use Case**: You’re **reversing** the effect of `stack()` to restore a table-like layout.

🧠 **Why `unstack()`?**
You can choose which index level becomes the columns (for example `unstack('Region')`), which makes it easier to plot or compare data along different dimensions.
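A minimal sketch of two `unstack()` details: picking the level by name, and `fill_value` for combinations that are missing after reshaping. It rebuilds the example `df`; dropping the last row in `partial` is a hypothetical gap added only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2023-01', '2023-01', '2023-02', '2023-02'],
    'Region': ['North', 'South', 'North', 'South'],
    'Sales': [200, 150, 220, 180],
    'Profit': [50, 40, 60, 55],
})

# A Series with a (Date, Region) MultiIndex
sales = df.set_index(['Date', 'Region'])['Sales']

# Unstack the 'Region' level by name: its values become the columns
by_region = sales.unstack('Region')
print(by_region)

# Hypothetical gap: keep only 3 rows, so (2023-02, South) is missing
partial = sales.iloc[:3]
# fill_value replaces the resulting NaN (and keeps the integer dtype)
filled = partial.unstack('Region', fill_value=0)
print(filled)
```

Unstacking `Region` from the `Sales` series gives the same table as the `pivot()` call in section 1.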

## 🔁 Advanced Use Case: Reshape with Multiple Variables

In [12]:
df

|   | Date    | Region | Sales | Profit |
|---|---------|--------|-------|--------|
| 0 | 2023-01 | North  | 200   | 50     |
| 1 | 2023-01 | South  | 150   | 40     |
| 2 | 2023-02 | North  | 220   | 60     |
| 3 | 2023-02 | South  | 180   | 55     |


In [13]:
df.pivot_table(index='Date', columns='Region', values=['Sales', 'Profit'])

| Date    | (Profit, North) | (Profit, South) | (Sales, North) | (Sales, South) |
|---------|-----------------|-----------------|----------------|----------------|
| 2023-01 | 50.0            | 40.0            | 200.0          | 150.0          |
| 2023-02 | 60.0            | 55.0            | 220.0          | 180.0          |


✅ **Use Case**: Dashboards or multi-variable reports.

🧠 **Why `pivot_table()`?**
Supports **aggregation** (the default `aggfunc` is `'mean'`, so integer inputs may come back as floats, as above), handles **duplicate entries**, and works with **multiple value columns**.
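A minimal sketch of two `pivot_table()` options not used above: `aggfunc='sum'` with `margins=True`, which appends `All` totals, and a dict `aggfunc` that applies a different aggregation to each value column. The particular aggregations chosen here are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2023-01', '2023-01', '2023-02', '2023-02'],
    'Region': ['North', 'South', 'North', 'South'],
    'Sales': [200, 150, 220, 180],
    'Profit': [50, 40, 60, 55],
})

# Summed sales per month and region, plus 'All' row/column totals
report = df.pivot_table(index='Date', columns='Region', values='Sales',
                        aggfunc='sum', margins=True)
print(report)

# Per-column aggregation: total Sales but average Profit per month
mixed = df.pivot_table(index='Date', values=['Sales', 'Profit'],
                       aggfunc={'Sales': 'sum', 'Profit': 'mean'})
print(mixed)
```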

## 📦 Real-Life Use Cases

| Industry     | Use Case                                          | Function        |
| ------------ | ------------------------------------------------- | --------------- |
| Retail       | Compare regional monthly sales                    | `pivot()`       |
| Finance      | Convert column-wise metrics to tidy format for ML | `melt()`        |
| Healthcare   | Collapse multi-level diagnosis data               | `stack()`       |
| Analytics    | Expand patient data from row-wise to column-wise  | `unstack()`     |
| BI Reporting | Multi-variable reports per region per time        | `pivot_table()` |


## ✅ Summary Table

| Method          | Use Case                              | Converts    |
| --------------- | ------------------------------------- | ----------- |
| `pivot()`       | Reshape values (index/column pairs must be unique) | Long → Wide |
| `pivot_table()` | Summarize with aggregation            | Long → Wide |
| `melt()`        | Normalize/flatten for ML/tidy data    | Wide → Long |
| `stack()`       | Collapse columns into MultiIndex rows | Wide → Long |
| `unstack()`     | Expand index into columns             | Long → Wide |


<center><b>Thanks</b></center>