
## **Data Transformation with melt(), pivot(), and pivot\_table()**

### **melt() — Wide to Long Transformation**

The `melt()` function in Pandas reshapes a DataFrame from **wide format** to **long format**. It collapses several value columns into two: one holding the former column names and one holding their values. This is especially useful for tidying data and for plotting.

#### **When to Use**

* When each row represents an observation and measurements of the same kind are spread across several columns, one column per variable (for example, one column per subject).
* To convert datasets into a tidy format where each variable forms a column and each observation forms a row.

#### **Key Concepts**

* **id\_vars**: Columns to retain as identifier variables.
* **value\_vars**: Columns to unpivot into rows. If omitted, all columns not listed in `id_vars` are used.
* **var\_name**: The name of the new column containing variable names.
* **value\_name**: The name of the new column containing corresponding values.
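
The four parameters above can be sketched on a small example. The `sales` DataFrame and its column names (`Region`, `Q1`, `Q2`, `Quarter`, `Revenue`) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical wide-format data: one column per quarter
sales = pd.DataFrame({
    "Region": ["North", "South"],
    "Q1": [100, 80],
    "Q2": [120, 95],
})

long_sales = sales.melt(
    id_vars=["Region"],        # kept as-is and repeated for every melted row
    value_vars=["Q1", "Q2"],   # these column names become values in 'Quarter'
    var_name="Quarter",        # name of the column holding the old headers
    value_name="Revenue",      # name of the column holding the old cell values
)

print(long_sales)
```

Each `Region` now appears once per quarter, so two rows and two value columns become four rows.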

#### **Why Use melt()?**

* Makes data compatible with visualization tools.
* Simplifies analysis by converting data to a long, tidy format.
* Enables easier manipulation and aggregation.

---

### **pivot() — Long to Wide Transformation**

The `pivot()` function reshapes data from **long format** to **wide format**, essentially reversing the effect of `melt()`.

#### **When to Use**

* When you have a tidy DataFrame and want to spread unique values from a column into multiple new columns.

#### **Key Concepts**

* **index**: Column to use as the new row labels.
* **columns**: Column to use as new column headers.
* **values**: Column containing values to fill the resulting table.
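
As a minimal sketch of how `pivot()` undoes `melt()`, the example below melts a small DataFrame and pivots it back. The cleanup steps after `pivot()` (`reset_index`, `rename_axis`, `reindex`) are one possible way to restore the original layout, not the only one.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Math": [85, 78, 92],
    "Science": [90, 82, 89],
    "English": [88, 85, 94],
})

# Wide -> long
long_df = df.melt(id_vars=["Name"], var_name="Subject", value_name="Score")

# Long -> wide, then tidy up the result so it matches the original layout
wide_df = (
    long_df.pivot(index="Name", columns="Subject", values="Score")
    .reset_index()                 # turn the 'Name' index back into a column
    .rename_axis(None, axis=1)     # drop the leftover 'Subject' label on the columns
    .reindex(columns=df.columns)   # pivot() sorts columns; restore the original order
)

print(wide_df)
```

Note that `pivot()` sorts the new column headers, which is why the original column order has to be restored explicitly.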

#### **Why Use pivot()?**

* Organizes data for readability and analysis.
* Suitable for creating data summaries where each variable is a separate column.
* Ideal for reshaping data to align with pivot table or reporting requirements.

#### **Important Note**

* `pivot()` requires every combination of index and column values to be unique. If duplicates exist, it raises a `ValueError`.
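
The failure can be reproduced with a small sketch. The `scores` data below is hypothetical: Alice has two Math entries, so the `Name`/`Subject` pair is not unique. Checking with `duplicated()` first is one way to detect the problem before pivoting.

```python
import pandas as pd

# Hypothetical long-format data: Alice has two Math scores
scores = pd.DataFrame({
    "Name": ["Alice", "Alice", "Bob"],
    "Subject": ["Math", "Math", "Math"],
    "Score": [85, 95, 78],
})

# Check for repeated (index, columns) pairs before pivoting
has_duplicates = scores.duplicated(subset=["Name", "Subject"]).any()
print("Duplicate Name/Subject pairs:", has_duplicates)

# pivot() cannot decide which of the two values to keep, so it raises
error_message = None
try:
    scores.pivot(index="Name", columns="Subject", values="Score")
except ValueError as err:
    error_message = str(err)
    print("pivot() failed:", error_message)
```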

---

### **pivot\_table() — Handling Duplicate Entries**

The `pivot_table()` function is an advanced version of `pivot()` that can handle duplicate entries by applying an aggregation function.

#### **When to Use**

* When there are multiple entries for the same combination of index and column values.

#### **Key Concepts**

* **aggfunc**: Function used to aggregate duplicate values (e.g., `"mean"`, `"sum"`, `"count"`). The default is the mean.
* Because duplicates are aggregated, reshaping does not fail on repeated index/column combinations.
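
A short sketch of how `aggfunc` changes the result when duplicates are present. The `scores` data is hypothetical: Alice has two Math entries that must be combined into a single cell.

```python
import pandas as pd

# Hypothetical long-format data: Alice has two Math scores
scores = pd.DataFrame({
    "Name": ["Alice", "Alice", "Bob"],
    "Subject": ["Math", "Math", "Math"],
    "Score": [85, 95, 78],
})

# Same reshape, three different ways of combining the duplicate rows
mean_table = scores.pivot_table(index="Name", columns="Subject", values="Score", aggfunc="mean")
sum_table = scores.pivot_table(index="Name", columns="Subject", values="Score", aggfunc="sum")
count_table = scores.pivot_table(index="Name", columns="Subject", values="Score", aggfunc="count")

print(mean_table)   # Alice's two scores are averaged
print(sum_table)    # Alice's two scores are added
print(count_table)  # number of rows behind each cell
```

Looking at the `count` table alongside the aggregated one is a quick way to see which cells were built from more than one row.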

#### **Why Use pivot\_table()?**

* Provides flexibility in summarizing data.
* Handles duplicates gracefully by aggregating them.
* Useful for data reporting and analytics.

---

### **Summary**

* Use **`melt()`** to reshape wide-format data into long-format.
* Use **`pivot()`** to reshape long-format data into wide-format.
* Use **`pivot_table()`** when you need to aggregate duplicate entries during the pivot process.



In [1]:
import pandas as pd

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Math': [85, 78, 92],
    'Science': [90, 82, 89],
    'English': [88, 85, 94]
}

df = pd.DataFrame(data)

# Display the DataFrame
print(df)


      Name  Math  Science  English
0    Alice    85       90       88
1      Bob    78       82       85
2  Charlie    92       89       94


In [2]:
df

      Name  Math  Science  English
0    Alice    85       90       88
1      Bob    78       82       85
2  Charlie    92       89       94


In [7]:
df.melt(id_vars=["Name"], value_vars=["Math", "Science", "English"], var_name="Subject", value_name="Score")

      Name  Subject  Score
0    Alice     Math     85
1      Bob     Math     78
2  Charlie     Math     92
3    Alice  Science     90
4      Bob  Science     82
5  Charlie  Science     89
6    Alice  English     88
7      Bob  English     85
8  Charlie  English     94


In [14]:

import pandas as pd

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Math': [85, 78, 92],
    'Science': [90, 82, 89],
    'English': [88, 85, 94]
}
df = pd.DataFrame(data)

# Reshape using melt() to create 'Subject' and 'Score' columns
df_melted = df.melt(id_vars=["Name"], var_name="Subject", value_name="Score")

# Pivot to get subjects as columns with scores
df_pivoted = df_melted.pivot(index="Name", columns="Subject", values="Score")

print(df_pivoted)


Subject  English  Math  Science
Name                           
Alice         88    85       90
Bob           85    78       82
Charlie       94    92       89


In [15]:
df_melted = df.melt(id_vars=["Name"], var_name="Subject", value_name="Score")


In [16]:
df_melted

      Name  Subject  Score
0    Alice     Math     85
1      Bob     Math     78
2  Charlie     Math     92
3    Alice  Science     90
4      Bob  Science     82
5  Charlie  Science     89
6    Alice  English     88
7      Bob  English     85
8  Charlie  English     94


In [25]:
df_pivoted = df_melted.pivot_table(index="Name", columns="Subject", values="Score", aggfunc="mean")


In [26]:
# Duplicate entries: pivot() raises an error when several rows share the same index/columns pair.
# pivot_table() handles them by aggregating; with aggfunc="mean" the scores are returned as floats.
df_pivoted

Subject  English  Math  Science
Name                           
Alice       88.0  85.0     90.0
Bob         85.0  78.0     82.0
Charlie     94.0  92.0     89.0
