# **Data Transformation**

## **9. Pivot/melt for long/wide formats**

In [1]:
import numpy as np 
import pandas as pd 

### 1. **What it does and When to Use it**

**Pivoting and melting** are techniques for **reshaping data** between **wide** and **long** formats.

* **Pivoting**: Converts data from long format to wide format.
* **Melting**: Converts data from wide format to long format.

#### 🔍 **When to use:**

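* To **prepare data for machine learning**: most estimators expect wide format (one row per sample, one column per feature), while long format is convenient for grouping and feature engineering beforehand.
* For **data visualization**, where tools like seaborn generally work best with tidy (long) data.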
* During **data aggregation**, cleaning, or reporting to reorient rows/columns as needed.


### 2. **Syntax and Core Parameters**

#### `pd.melt()` – Wide to Long Format

```python
pd.melt(
    frame, 
    id_vars=None, 
    value_vars=None, 
    var_name=None, 
    value_name='value'
)
```

* `frame`: DataFrame to unpivot.
* `id_vars`: Columns to keep fixed (identifiers).
* `value_vars`: Columns to unpivot (their names become the entries of the variable column). If omitted, every column not listed in `id_vars` is unpivoted.
* `var_name`: Name of the ‘variable’ column.
* `value_name`: Name of the ‘value’ column.
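The parameters above can be tried on a small table. This is a minimal sketch on made-up data (the `Name` and `Art` columns are assumptions added for illustration); it shows `value_vars` restricting which columns are unpivoted, and the default column names pandas uses when `var_name` and `value_name` are left out.

```python
import pandas as pd

# Hypothetical wide table: one row per student, one column per subject.
wide = pd.DataFrame({
    "ID": [1, 2],
    "Name": ["Ann", "Bob"],
    "Math": [90, 80],
    "Science": [85, 88],
    "Art": [70, 95],
})

# Keep ID and Name fixed; unpivot only Math and Science (Art is left out).
long_named = pd.melt(
    wide,
    id_vars=["ID", "Name"],
    value_vars=["Math", "Science"],
    var_name="Subject",
    value_name="Marks",
)

# With var_name/value_name omitted, the new columns are called
# "variable" and "value".
long_default = pd.melt(wide, id_vars=["ID", "Name"], value_vars=["Math", "Science"])

print(long_named)
print(long_default.columns.tolist())
```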

#### `df.pivot()` – Long to Wide Format (strict)

```python
df.pivot(index='...', columns='...', values='...')
```

* `index`: Column to make new index.
* `columns`: Column to make new columns.
* `values`: Column to fill values in cells.

#### `df.pivot_table()` – More flexible with aggregation

```python
df.pivot_table(index='...', columns='...', values='...', aggfunc='mean')
```
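`pivot_table()` takes the same `index`, `columns`, and `values` arguments as `pivot()`, plus `aggfunc` to combine duplicates and `fill_value` to fill empty cells. A small sketch on made-up scores, assuming a table where student `B` has no Science entry:

```python
import pandas as pd

# Hypothetical long table with a repeated (Student, Subject) pair.
scores = pd.DataFrame({
    "Student": ["A", "A", "A", "B"],
    "Subject": ["Math", "Math", "Science", "Math"],
    "Score": [90, 80, 70, 60],
})

# aggfunc decides how duplicate cells are combined ("mean" is the default).
mean_table = scores.pivot_table(
    index="Student", columns="Subject", values="Score", aggfunc="mean"
)

# fill_value replaces cells that have no data (B has no Science score).
max_table = scores.pivot_table(
    index="Student", columns="Subject", values="Score",
    aggfunc="max", fill_value=0,
)

print(mean_table)
print(max_table)
```

Without `fill_value`, the missing `(B, Science)` cell is `NaN`.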


### 3. **Different Methods and Techniques**

#### ✅ Wide to Long:

* `pd.melt()`
* `df.stack()`

#### ✅ Long to Wide:

* `df.pivot()` (fails when an index/columns pair appears more than once)
* `df.pivot_table()` (aggregates duplicates)
* `df.unstack()`
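`stack()` and `unstack()` work on index levels rather than on named columns. The sketch below uses a made-up table with `ID` as the index; it is only meant to illustrate the direction of each call.

```python
import pandas as pd

# Hypothetical wide table with ID as the index and subjects as columns.
wide = pd.DataFrame(
    {"Math": [90, 80], "Science": [85, 88]},
    index=pd.Index([1, 2], name="ID"),
)
wide.columns.name = "Subject"

# stack() moves the column labels into the innermost index level,
# giving a Series indexed by (ID, Subject).
stacked = wide.stack()

# unstack() moves the innermost index level back out to the columns.
restored = stacked.unstack()

print(stacked)
print(restored)
```

Compared with `melt()`, `stack()` keeps the identifiers in the index instead of in regular columns, so `reset_index()` is usually the next step when a flat table is wanted.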


### 4. **Common Pitfalls and Best Practices**

| Pitfall                                                                               | Best Practice                                         |
| ------------------------------------------------------------------------------------- | ----------------------------------------------------- |
| **Duplicate entries in pivot** cause `ValueError`.                                    | Use `pivot_table()` with aggregation.                 |
| `melt()` without `id_vars` unpivots every column, so identifier columns end up mixed into the variable/value pairs. | Always explicitly specify `id_vars` and `value_vars`. |
| Misuse of `pivot` for grouped summaries.                                              | Use `pivot_table` with `aggfunc`.                     |
| Wrong index/column settings after transformation.                                     | Use `reset_index()` if needed to flatten index.       |
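The first and last rows of the table can be reproduced in a few lines. This is a sketch on made-up data: `pivot()` raises on the duplicated pair, `pivot_table()` aggregates it, and `reset_index()` flattens the result.

```python
import pandas as pd

# Hypothetical data: student A has two Math scores.
dupes = pd.DataFrame({
    "Student": ["A", "A", "B"],
    "Subject": ["Math", "Math", "Math"],
    "Score": [90, 85, 80],
})

# pivot() cannot place two values in the (A, Math) cell, so it raises.
try:
    dupes.pivot(index="Student", columns="Subject", values="Score")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print(f"pivot failed: {err}")

# pivot_table() combines the duplicates with the chosen aggregation.
table = dupes.pivot_table(
    index="Student", columns="Subject", values="Score", aggfunc="mean"
)

# reset_index() turns Student back into a regular column; clearing
# columns.name removes the leftover "Subject" label on the column axis.
flat = table.reset_index()
flat.columns.name = None

print(flat)
```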


### 5. **Examples on Real/Pseudo Data**

#### 📌 Example 1: `melt()` – Wide to Long

In [2]:
df = pd.DataFrame({
    'ID': [1, 2],
    'Math': [90, 80],
    'Science': [85, 88]
})

df

|     | ID  | Math | Science |
| --- | --- | ---- | ------- |
| 0   | 1   | 90   | 85      |
| 1   | 2   | 80   | 88      |


In [3]:
melted = pd.melt(df, id_vars='ID', var_name='Subject', value_name='Marks')
melted

|     | ID  | Subject | Marks |
| --- | --- | ------- | ----- |
| 0   | 1   | Math    | 90    |
| 1   | 2   | Math    | 80    |
| 2   | 1   | Science | 85    |
| 3   | 2   | Science | 88    |


#### 📌 Example 2: `pivot()` – Long to Wide

In [4]:
df = pd.DataFrame({
    'ID': [1, 1, 2, 2],
    'Subject': ['Math', 'Science', 'Math', 'Science'],
    'Marks': [90, 85, 80, 88]
})

df

|     | ID  | Subject | Marks |
| --- | --- | ------- | ----- |
| 0   | 1   | Math    | 90    |
| 1   | 1   | Science | 85    |
| 2   | 2   | Math    | 80    |
| 3   | 2   | Science | 88    |


In [5]:
pivoted = df.pivot(index='ID', columns='Subject', values='Marks')
pivoted

| Subject | Math | Science |
| ------- | ---- | ------- |
| **ID**  |      |         |
| 1       | 90   | 85      |
| 2       | 80   | 88      |
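Since `pivot()` and `melt()` are inverse operations, the pivoted table can be melted back into its long form. A minimal sketch of that round trip, rebuilding the same data as Example 2 so the block runs on its own:

```python
import pandas as pd

long_df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Subject": ["Math", "Science", "Math", "Science"],
    "Marks": [90, 85, 80, 88],
})

pivoted = long_df.pivot(index="ID", columns="Subject", values="Marks")

# reset_index() makes ID a regular column so melt() can use it in id_vars.
# Sorting restores the original row order (melt groups rows by subject).
round_trip = (
    pivoted.reset_index()
    .melt(id_vars="ID", var_name="Subject", value_name="Marks")
    .sort_values(["ID", "Subject"])
    .reset_index(drop=True)
)

print(round_trip)
```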


#### 📌 Example 3: `pivot_table()` with aggregation

In [6]:
df = pd.DataFrame({
    'Student': ['A', 'A', 'B', 'B'],
    'Subject': ['Math', 'Math', 'Math', 'Math'],
    'Score': [90, 85, 80, 82]
})

df

|     | Student | Subject | Score |
| --- | ------- | ------- | ----- |
| 0   | A       | Math    | 90    |
| 1   | A       | Math    | 85    |
| 2   | B       | Math    | 80    |
| 3   | B       | Math    | 82    |


In [7]:
# Intentional duplicates: each student has two scores for the same subject
pivoted = df.pivot_table(index='Student', columns='Subject', values='Score', aggfunc='mean')
pivoted

| Subject     | Math |
| ----------- | ---- |
| **Student** |      |
| A           | 87.5 |
| B           | 81.0 |


### 6. **Real World Use Cases**

#### ✅ Use Case 1: **Survey Data**

* Melt responses to put all answers in a single column with respondent ID and question.
* Wide: One column per question
* Long: One row per question per respondent

#### ✅ Use Case 2: **Time Series Stock Prices**

* Each column is a stock symbol → melt to have date, symbol, price.
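A small sketch of this reshape. The symbols `AAA` and `BBB` and all prices are made up for illustration, and the column names `Symbol` and `Price` are assumptions.

```python
import pandas as pd

# Hypothetical prices: one column per stock symbol.
prices = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-02", "2024-01-03"]),
    "AAA": [10.0, 10.5],
    "BBB": [20.0, 19.5],
})

# One row per (Date, Symbol) pair.
long_prices = prices.melt(id_vars="Date", var_name="Symbol", value_name="Price")

print(long_prices)
```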

#### ✅ Use Case 3: **Sensor Data or Logs**

* Pivot for displaying hourly/daily metrics from long logs.
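One possible way to build such an hourly view, assuming a log with `timestamp`, `sensor`, and `reading` columns (all names and values here are hypothetical):

```python
import pandas as pd

# Hypothetical long log: one row per reading.
logs = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 00:05", "2024-01-01 00:40",
        "2024-01-01 01:10", "2024-01-01 01:20",
    ]),
    "sensor": ["s1", "s1", "s1", "s2"],
    "reading": [1.0, 3.0, 5.0, 7.0],
})

# Extract the hour of day so readings within the same hour share a key.
logs["hour"] = logs["timestamp"].dt.hour

# One row per hour, one column per sensor, mean reading in each cell.
hourly = logs.pivot_table(
    index="hour", columns="sensor", values="reading", aggfunc="mean"
)

print(hourly)
```

`pivot_table()` is used instead of `pivot()` because a sensor can report several times within the same hour.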

#### ✅ Use Case 4: **ML Input Preparation**

* Reshape data to match what the pipeline expects, e.g. pivot long event records into one feature column per category, or melt one-hot columns back into a single categorical column.


### ✅ Summary Table

| Operation                 | Tool            | Use When                         |
| ------------------------- | --------------- | -------------------------------- |
| Wide → Long               | `pd.melt()`     | Normalize repeated columns       |
| Long → Wide               | `pivot()`       | Each index/column pair is unique |
| Long → Wide w/ duplicates | `pivot_table()` | Duplicate index/column pairs must be aggregated |

