# <font color="#418FDE" size="6.5" uppercase>**Reshaping DataFrames**</font>

>Last update: 20251225.
    
By the end of this lecture, you will be able to:
- Transform DataFrames between wide and long formats using pd.melt() and pivot or pivot_table in Pandas 2.3.1. 
- Use set_index(), reset_index(), stack(), and unstack() to work effectively with hierarchical indexes. 
- Choose appropriate reshaping strategies to prepare datasets for downstream visualization or modeling tasks. 


## **1. Wide and long reshaping**

### **1.1. Melt Identifier Columns**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas (2.3.1) A-Z/Module_03/Lecture_B/image_01_01.jpg?v=1766712888" width="250">



>* Keep identifier columns fixed when melting data
>* Melt measured columns to tidy, long format

>* Identifier choice shapes how data is structured
>* Wrong identifiers break comparisons and trend analysis

>* Use multiple identifiers to define each observation
>* Melt scores while keeping context for rich analysis



In [None]:
#@title Python Code - Melt Identifier Columns

# Show how identifier columns stay fixed when melting wide data.
# Demonstrate choosing correct id_vars for pd.melt in Pandas.
# Compare correct and incorrect identifier choices using simple printed outputs.

import pandas as pandas_library

# Create a simple wide DataFrame representing store quarterly revenue.
store_data = {
    "store_id": [1, 2],
    "city_name": ["Boston", "Dallas"],
    "revenue_q1_usd": [12000, 15000],
    "revenue_q2_usd": [13000, 16000],
}

wide_frame = pandas_library.DataFrame(store_data)

# Show the original wide DataFrame with separate quarterly revenue columns.
print("Original wide DataFrame with quarterly revenue columns:")
print(wide_frame)

# Melt using correct identifier columns, keeping store and city information fixed.
long_correct = pandas_library.melt(
    wide_frame,
    id_vars=["store_id", "city_name"],
    var_name="quarter_label",
    value_name="revenue_usd",
)

print("\nLong format with correct identifier columns chosen:")
print(long_correct)

# Melt using incorrect identifier choice, mistakenly treating one revenue column as identifier.
long_incorrect = pandas_library.melt(
    wide_frame,
    id_vars=["store_id", "revenue_q1_usd"],
    var_name="mixed_column",
    value_name="value_usd",
)

print("\nLong format with incorrect identifier including revenue_q1_usd:")
print(long_incorrect)
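
The idea of using several identifier columns to define each observation can be sketched with a small table of student scores. The table and its column names (`student_id`, `term`, `math_score`, `reading_score`) are hypothetical and only serve as an illustration. `value_vars` is passed explicitly so that only the measured columns are melted.

```python
import pandas as pd

# Hypothetical wide table: one row per student and term, one column per subject.
scores_wide = pd.DataFrame(
    {
        "student_id": [101, 101, 102, 102],
        "term": ["fall", "spring", "fall", "spring"],
        "math_score": [88, 91, 75, 79],
        "reading_score": [92, 90, 81, 85],
    }
)

# Both student_id and term are needed to identify an observation,
# so both stay fixed as identifier columns.
scores_long = pd.melt(
    scores_wide,
    id_vars=["student_id", "term"],
    value_vars=["math_score", "reading_score"],
    var_name="subject",
    value_name="score",
)

# Strip the shared suffix so the subject column holds clean category labels.
scores_long["subject"] = scores_long["subject"].str.replace("_score", "", regex=False)

print(scores_long.sort_values(["student_id", "term", "subject"]).to_string(index=False))
```

Each row of the long table is now identified by the combination of student, term, and subject, which is what later grouping or plotting steps would typically rely on.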



### **1.2. Pivot and Pivot Table**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas (2.3.1) A-Z/Module_03/Lecture_B/image_01_02.jpg?v=1766712909" width="250">



>* Pivot reverses melt, rebuilding wide tables
>* Map ids to rows, variables to columns, values to cells

>* Pivot works best with uniquely keyed data
>* Reorients rows into columns for easy analysis

>* Pivot tables summarize duplicates with aggregations
>* They create clean wide tables for analysis



In [None]:
#@title Python Code - Pivot and Pivot Table

# Demonstrate pivot and pivot_table reshaping long data to wide format.
# Show simple weather measurements reshaped into separate temperature and rainfall columns.
# Compare pivot behavior with pivot_table when duplicate entries require aggregation.

import pandas as pd

# Create a small long table with city, year, measure type, and numeric value.
# Each (city, year, measure) combination appears exactly once, which pivot requires.
data = {
    "city": ["Denver", "Denver", "Denver", "Denver", "Austin", "Austin"],
    "year": [2022, 2022, 2023, 2023, 2023, 2023],
    "measure": ["temp_F", "rain_in", "temp_F", "rain_in", "temp_F", "rain_in"],
    "value": [75, 2.5, 80, 3.0, 90, 1.2],
}

# Build the DataFrame and display the original long tidy structure.
df_long = pd.DataFrame(data)
print("Original long table:\n", df_long)

# Use pivot to spread measure values into separate columns for each city and year.
df_pivot = df_long.pivot(index=["city", "year"], columns="measure", values="value")
print("\nPivot wide table without duplicates:\n", df_pivot)

# Add a duplicate Denver temperature row to show pivot_table aggregation behavior.
extra_row = {"city": "Denver", "year": 2023, "measure": "temp_F", "value": 78}
df_with_duplicate = pd.concat([df_long, pd.DataFrame([extra_row])], ignore_index=True)

# Use pivot_table with mean aggregation to handle duplicate Denver temperature entries.
df_pivot_table = df_with_duplicate.pivot_table(index=["city", "year"], columns="measure", values="value", aggfunc="mean")
print("\nPivot_table with mean aggregation:\n", df_pivot_table)
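
To make the claim that pivot reverses melt concrete, here is a minimal round-trip sketch. It reuses a trimmed copy of the store revenue table from section 1.1; the variable names are illustrative. Because every (store, quarter) pair is unique, plain `pivot` is enough.

```python
import pandas as pd

# Wide table: one row per store, one column per quarter.
original = pd.DataFrame(
    {
        "store_id": [1, 2],
        "revenue_q1_usd": [12000, 15000],
        "revenue_q2_usd": [13000, 16000],
    }
)

# Wide -> long: quarter labels move into a single column.
long_frame = original.melt(
    id_vars="store_id", var_name="quarter", value_name="revenue_usd"
)

# Long -> wide: ids become rows, quarter labels become columns, values fill cells.
rebuilt = (
    long_frame.pivot(index="store_id", columns="quarter", values="revenue_usd")
    .reset_index()              # turn store_id back into a regular column
    .rename_axis(None, axis=1)  # drop the leftover "quarter" name on the columns
)

print(long_frame)
print(rebuilt)
```

The `reset_index` and `rename_axis` calls are only cosmetic: they remove the index and column names that `pivot` leaves behind so the rebuilt table looks like the one we started with.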



### **1.3. Managing Duplicate Pivot Data**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas (2.3.1) A-Z/Module_03/Lecture_B/image_01_03.jpg?v=1766712932" width="250">



>* Duplicate key pairs often appear when pivoting
>* You must decide how to combine repeated values

>* Define how detailed each wide row is
>* Add extra identifiers so each combination is unique

>* Use pivot_table with an aggregation function to combine duplicates
>* Choose summary stats that match your analysis goals



In [None]:
#@title Python Code - Managing Duplicate Pivot Data

# Demonstrate duplicate pivot issues and simple aggregation handling.
# Show pivot error with repeated index and column combinations.
# Use pivot_table aggregation to summarize duplicate measurement values.

import pandas as pd

# Create simple long format data with duplicate day and city combinations.
data = {
    "city": ["Boston", "Boston", "Boston", "Denver", "Denver", "Denver"],
    "day": ["Mon", "Mon", "Tue", "Mon", "Mon", "Tue"],
    "temperature_f": [70, 72, 68, 65, 67, 64],
}

long_df = pd.DataFrame(data)

# Show the original long format DataFrame with duplicate key combinations.
print("Long format data with duplicates:")
print(long_df)

# Attempt pivot, which fails because duplicates create ambiguous wide cells.
try:
    wide_bad = long_df.pivot(index="city", columns="day", values="temperature_f")
except ValueError as error:
    print("\nPivot failed because duplicates exist:")
    print(error)

# Use pivot_table with mean aggregation to handle duplicate measurements.
wide_mean = long_df.pivot_table(
    index="city", columns="day", values="temperature_f", aggfunc="mean",
)

print("\nWide format using mean aggregation:")
print(wide_mean)
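
Aggregation is not the only option. The sketch below first checks for duplicate keys and then adds an extra identifier so that plain `pivot` works without summarizing anything. The `reading_no` column is a hypothetical helper built with `groupby(...).cumcount()`; it assumes that the row order reflects the order in which readings were taken.

```python
import pandas as pd

readings = pd.DataFrame(
    {
        "city": ["Boston", "Boston", "Boston", "Denver", "Denver", "Denver"],
        "day": ["Mon", "Mon", "Tue", "Mon", "Mon", "Tue"],
        "temperature_f": [70, 72, 68, 65, 67, 64],
    }
)

key_columns = ["city", "day"]

# Flag every row whose (city, day) key appears more than once.
duplicate_mask = readings.duplicated(subset=key_columns, keep=False)
print("Rows sharing a key with another row:", int(duplicate_mask.sum()))

# Number the readings within each key so (city, reading_no, day) becomes unique.
readings["reading_no"] = readings.groupby(key_columns).cumcount() + 1

# Plain pivot now succeeds because no cell receives more than one value.
wide_detailed = readings.pivot(
    index=["city", "reading_no"], columns="day", values="temperature_f"
)
print(wide_detailed)
```

This keeps every original measurement at the cost of a more detailed row key; cells with no matching reading (for example a second Tuesday reading) show up as NaN.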



## **2. Indexing and Hierarchies**

### **2.1. Index setup and reset**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas (2.3.1) A-Z/Module_03/Lecture_B/image_02_01.jpg?v=1766712957" width="250">



>* Index setup defines how rows are identified
>* Good index choices simplify selection, aggregation, hierarchy

>* Promote multiple columns into a tree-like index
>* Use index levels for navigation, grouping, selection

>* Resetting the index turns levels into columns
>* Helps flatten data for merging and visualization



In [None]:
#@title Python Code - Index setup and reset

# Show how set_index creates meaningful hierarchical indexes.
# Show how reset_index flattens indexes back into columns.
# Use a tiny sales example with store and date indexes.

import pandas as pd

# Create a simple DataFrame with store, date, and sales columns.
data = {
    "store": ["North", "North", "South", "South"],
    "date": ["2024-01-01", "2024-01-02", "2024-01-01", "2024-01-02"],
    "sales_dollars": [1200, 1500, 900, 1100],
}

# Build the DataFrame and display its original flat structure.
df = pd.DataFrame(data)
print("Original flat DataFrame structure:\n", df, "\n")

# Set a hierarchical index using store and date columns together.
df_indexed = df.set_index(["store", "date"])
print("After set_index with store and date:\n", df_indexed, "\n")

# Access one store using the hierarchical index for quick selection.
print("Sales for North store only using index:\n", df_indexed.loc["North"], "\n")

# Reset the index to turn index levels back into normal columns.
df_reset = df_indexed.reset_index()
print("After reset_index flattened back to columns:\n", df_reset)
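
As a further sketch of how a well-chosen index simplifies selection and aggregation, the example below builds the same store and date index with `verify_integrity=True`, which makes `set_index` raise an error if the key is not unique. The variable names are illustrative.

```python
import pandas as pd

sales = pd.DataFrame(
    {
        "store": ["North", "North", "South", "South"],
        "date": ["2024-01-01", "2024-01-02", "2024-01-01", "2024-01-02"],
        "sales_dollars": [1200, 1500, 900, 1100],
    }
)

# verify_integrity=True raises ValueError if a (store, date) pair repeats.
indexed = sales.set_index(["store", "date"], verify_integrity=True).sort_index()

# Select a single cell with a full (store, date) key.
one_day = indexed.loc[("South", "2024-01-02"), "sales_dollars"]
print("South sales on 2024-01-02:", one_day)

# Aggregate along one index level without naming any columns as group keys.
store_totals = indexed.groupby(level="store")["sales_dollars"].sum()
print(store_totals)

# Flatten again when a merge or plotting call expects ordinary columns.
flat = indexed.reset_index()
print(flat.columns.tolist())
```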



### **2.2. MultiIndex Column Structures**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas (2.3.1) A-Z/Module_03/Lecture_B/image_02_02.jpg?v=1766712979" width="250">



>* Columns can also use hierarchical MultiIndex levels
>* Tuples label related measures, aiding complex datasets

>* Pivot and unstack can create MultiIndex columns
>* Column hierarchies encode relationships, guiding later transformations

>* Think in levels and labels when selecting
>* Reorder levels carefully to avoid unintended drops



In [None]:
#@title Python Code - MultiIndex Column Structures

# Demonstrate simple MultiIndex columns structure with small weather example.
# Show how pivot creates hierarchical column levels automatically.
# Show selecting one top level and one specific column pair.

import pandas as pd

# Create tidy weather data with city, time, and measurement type.
data = {
    "city": ["NYC", "NYC", "LA", "LA"],
    "time": ["morning", "evening", "morning", "evening"],
    "measurement": ["temp_F", "temp_F", "temp_F", "temp_F"],
    "value": [70, 60, 75, 65],
}

# Build DataFrame from dictionary and display original flat structure.
df = pd.DataFrame(data)
print("Original tidy DataFrame:")
print(df)

# Pivot time and measurement into columns creating MultiIndex columns.
wide = df.pivot(index="city", columns=["measurement", "time"], values="value")
print("\nDataFrame with MultiIndex columns:")
print(wide)

# Show the names of the column levels and the full column tuples.
print("\nColumn level names:", wide.columns.names)
print("Column tuples:", list(wide.columns))

# Select all temperature columns across both times using top level label.
print("\nAll temp_F columns across times:")
print(wide["temp_F"])

# Select specific column pair using full MultiIndex tuple key.
print("\nSingle column temp_F, morning only:")
print(wide[("temp_F", "morning")])
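
Selecting on the inner column level, reordering levels, and flattening the hierarchy can be sketched as follows. The weather values, including the `humidity_pct` measurement, are made up for illustration, and the flattened names such as `temp_F_morning` are just one possible naming scheme.

```python
import pandas as pd

weather = pd.DataFrame(
    {
        "city": ["NYC", "NYC", "NYC", "NYC", "LA", "LA", "LA", "LA"],
        "time": ["morning", "evening"] * 4,
        "measurement": ["temp_F", "temp_F", "humidity_pct", "humidity_pct"] * 2,
        "value": [70, 60, 55, 65, 75, 65, 40, 50],
    }
)

# Two column levels: measurement (outer) and time (inner).
wide = weather.pivot(index="city", columns=["measurement", "time"], values="value")

# Cross-section on the inner level: keep only the morning readings.
morning = wide.xs("morning", level="time", axis=1)
print(morning)

# Swap the two levels so time becomes the outer level; every column is kept.
time_first = wide.swaplevel("measurement", "time", axis=1).sort_index(axis=1)
print(time_first)

# Flatten the hierarchy into plain string column names.
flat = wide.copy()
flat.columns = [f"{measurement}_{time}" for measurement, time in wide.columns]
print(flat)
```

Swapping levels only relabels the hierarchy, whereas `xs` drops the level it selects on, so it is worth checking the resulting column names after each step.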



### **2.3. MultiIndex Levels Access**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas (2.3.1) A-Z/Module_03/Lecture_B/image_02_03.jpg?v=1766713000" width="250">



>* Treat each index level as a dimension
>* Select levels directly for fast, clean analysis

>* Select and isolate index levels for focused analysis
>* Move index levels into columns for easier modeling

>* Rename and reorder index levels for clarity
>* Aggregate levels to switch between detailed and summary views



In [None]:
#@title Python Code - MultiIndex Levels Access

# Demonstrate accessing MultiIndex levels for focused data selection and reshaping.
# Show selecting by one level while keeping other levels available.
# Show moving index levels into columns for easier grouping or plotting.

import pandas as pd

# Create simple sales data with city, store, and product columns.
data = {
    "city": ["Boston", "Boston", "Dallas", "Dallas"],
    "store": ["North", "South", "East", "West"],
    "product": ["Soda", "Chips", "Soda", "Chips"],
    "sales_dollars": [120, 80, 150, 90],
}

# Build DataFrame and set a MultiIndex using city and store.
df = pd.DataFrame(data)

df = df.set_index(["city", "store"])

# Show the full MultiIndex DataFrame for reference.
print("Full MultiIndex DataFrame:\n", df, "\n")

# Select all rows for one city level while keeping store and product.
city_slice = df.xs("Boston", level="city")

print("All Boston rows across stores:\n", city_slice, "\n")

# Move the inner index level back into columns for flexibility.
reset_df = df.reset_index(level="store")

print("Store level moved into columns:\n", reset_df, "\n")

# Rename the city index level to region and show the result.
renamed = df.rename_axis(index={"city": "region"})

print("Index level renamed to region:\n", renamed)
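
Treating each index level as a dimension also allows aggregating or reordering by level. The following sketch uses the same small sales table; the variable names are illustrative.

```python
import pandas as pd

sales = pd.DataFrame(
    {
        "city": ["Boston", "Boston", "Dallas", "Dallas"],
        "store": ["North", "South", "East", "West"],
        "product": ["Soda", "Chips", "Soda", "Chips"],
        "sales_dollars": [120, 80, 150, 90],
    }
).set_index(["city", "store"])

# Read the labels of one level directly from the index.
city_labels = sales.index.get_level_values("city").unique().tolist()
print("Cities:", city_labels)

# Summary view: collapse the store level by summing within each city.
city_totals = sales.groupby(level="city")["sales_dollars"].sum()
print(city_totals)

# Reorder the hierarchy so store becomes the outer level.
store_first = sales.swaplevel("city", "store").sort_index()
print(store_first)

# Detailed view: a full (city, store) key selects exactly one row.
one_store = sales.loc[("Dallas", "West")]
print(one_store)
```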



## **3. Stacking for Analysis**

### **3.1. Stack and Unstack Essentials**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas (2.3.1) A-Z/Module_03/Lecture_B/image_03_01.jpg?v=1766713075" width="250">



>* Stack moves columns into index, making data long
>* Unstack spreads index into columns for comparisons

>* Stacking condenses many product-month columns into index
>* Unstacking spreads index categories into comparison-friendly columns

>* Use stack or unstack as workflow choices
>* Stack for tidy analysis, unstack for comparisons



In [None]:
#@title Python Code - Stack and Unstack Essentials

# Demonstrate basic stack and unstack reshaping operations with simple sales data.
# Show how wide layout becomes long layout using stack operation in Pandas.
# Show how long layout becomes wide layout again using unstack operation.

import pandas as pd

# Create simple wide sales DataFrame with regions and monthly product sales.
data = {"Region": ["North", "South"], "Jan_Toys": [120, 90], "Jan_Tools": [80, 60]}
wide_df = pd.DataFrame(data)

# Set Region as index so stack can move remaining columns into a new index level.
wide_df = wide_df.set_index("Region")
print("Wide layout with products as columns:\n", wide_df, "\n")

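# Stack columns into a single column, creating a longer layout with MultiIndex rows.
# future_stack=True opts into the newer stack implementation (pandas 2.1+) and
# avoids the FutureWarning that the legacy implementation emits in recent 2.x releases.
stacked = wide_df.stack(future_stack=True)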
print("Stacked layout with product labels in index:\n", stacked, "\n")

# Unstack the last index level, returning to a wide layout with separate columns.
unstacked = stacked.unstack()
print("Unstacked layout returning to wide format:\n", unstacked)
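
`unstack` moves the last index level by default, but a level can also be chosen by name. The sketch below builds a small Series with a hypothetical (region, month) index and spreads each level into columns in turn.

```python
import pandas as pd

# Long layout: one value per (region, month) pair.
index = pd.MultiIndex.from_product(
    [["North", "South"], ["Jan", "Feb"]], names=["region", "month"]
)
toy_sales = pd.Series([120, 130, 90, 95], index=index, name="toy_sales")

# Spread months into columns: one row per region, good for month-to-month comparison.
by_month = toy_sales.unstack("month")
print(by_month)

# Spread regions into columns instead: one row per month.
by_region = toy_sales.unstack("region")
print(by_region)
```

Which level to unstack depends on the comparison you want to read across a row; the underlying values are the same in both layouts.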



### **3.2. Handling Reshape Missing Values**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas (2.3.1) A-Z/Module_03/Lecture_B/image_03_02.jpg?v=1766713096" width="250">



>* Stacking changes layout and reveals hidden gaps
>* Decide if gaps mean no data or zeros

>* Unstacking creates new columns and missing cells
>* Misreading missingness as zero can distort conclusions

>* Match missing-value strategy to data meaning
>* Align imputation or flagging with downstream analysis



In [None]:
#@title Python Code - Handling Reshape Missing Values

# Demonstrate how reshaping can expose missing values and how to handle them.
# Show how missing values appear after unstacking reshaped data.
# Compare leaving missing values versus filling them with zeros.

import pandas as pd

# Create simple monthly sales data where some month-region combinations are missing.
data = {
    "month": ["Jan", "Jan", "Feb", "Mar"],
    "region": ["East", "West", "East", "West"],
    "sales_dollars": [100.0, 120.0, 90.0, None],
}

sales = pd.DataFrame(data)

# Show the original long-format table before any reshaping operations.
print("Original sales table:")
print(sales)

# Set index then unstack to create wide format with regions as columns.
wide = sales.set_index(["month", "region"]).unstack("region")

print("\nWide table after unstack operation:")
print(wide)

# Fill missing values with zeros when missing means truly no sales.
wide_filled = wide.fillna(0.0)

print("\nWide table after filling missing values:")
print(wide_filled)

# Show total sales by region after filling missing values.
# The totals match the unfilled table because sum() skips NaN by default.
print("\nTotal sales by region after filling:")
print(wide_filled.sum())
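
Totals are not affected by filling with zeros, but averages are. The sketch below reuses the same sales data to show how treating gaps as zero changes the mean, and how a boolean flag table can record which cells were missing. It is an illustration of the trade-off, not a recommendation for either strategy.

```python
import pandas as pd

sales = pd.DataFrame(
    {
        "month": ["Jan", "Jan", "Feb", "Mar"],
        "region": ["East", "West", "East", "West"],
        "sales_dollars": [100.0, 120.0, 90.0, None],
    }
)

# Select the value column first so the wide table has simple region columns.
wide = sales.set_index(["month", "region"])["sales_dollars"].unstack("region")

# Record where values are missing before changing anything.
missing_flags = wide.isna()
print("Missing cells per region:\n", missing_flags.sum())

# Mean over observed months only (NaN is skipped by default).
mean_observed = wide.mean()

# Mean when every gap is treated as a month with zero sales.
mean_zero_filled = wide.fillna(0.0).mean()

print(pd.DataFrame({"observed_only": mean_observed, "zero_filled": mean_zero_filled}))
```

If a gap means "not recorded", the observed-only mean is usually the safer summary; if it truly means "no sales", the zero-filled mean is the appropriate one.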



### **3.3. Grouped Reshape Workflows**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas (2.3.1) A-Z/Module_03/Lecture_B/image_03_03.jpg?v=1766713115" width="250">



>* Group data, summarize within meaningful dimensions
>* Reshape summaries for easier comparison and modeling

>* Choose groups as index levels or columns
>* Rotate categories between rows and columns for analysis

>* Balance readable layouts with tool-specific data formats
>* Plan grouped reshapes as a reusable, flexible pipeline



In [None]:
#@title Python Code - Grouped Reshape Workflows

# Demonstrate grouped reshaping workflows using simple sales data example.
# Show grouping, aggregation, and reshaping for analysis and modeling tasks.
# Keep example beginner friendly and ready for Google Colab execution.

import pandas as pd

# Create small sales DataFrame with campaigns, weeks, and revenue values.
data = {
    "campaign": ["Email", "Email", "Search", "Search", "Social", "Social"],
    "week": ["2024-01", "2024-02", "2024-01", "2024-02", "2024-01", "2024-02"],
    "revenue_usd": [1200, 1500, 3000, 2800, 900, 1100],
}

sales = pd.DataFrame(data)

# Group by campaign and week, then compute total revenue per group.
grouped = sales.groupby(["campaign", "week"], as_index=False)["revenue_usd"].sum()

print("Long grouped layout, good for filtering and joins:")
print(grouped)

# Pivot to wide layout, campaigns as columns, weeks as index rows.
wide = grouped.pivot(index="week", columns="campaign", values="revenue_usd")

print("\nWide layout, good for models needing matrix features:")
print(wide)

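# Stack wide layout back to long, preparing for visualization tools.
# future_stack=True (pandas 2.1+) uses the newer stack implementation, which keeps NaN rows if any exist.
stacked = wide.stack(future_stack=True).reset_index(name="revenue_usd")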

print("\nStacked layout, tidy format for plotting libraries:")
print(stacked)
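
One way to make a grouped reshape reusable is to wrap the group, aggregate, and unstack steps in a small function. The helper `summarize_wide` below is hypothetical (it is not part of pandas), and the order data is invented so that one campaign has two rows in the same week.

```python
import pandas as pd


def summarize_wide(frame, row_key, column_key, value, aggfunc="sum"):
    """Group by two keys, aggregate one value column, and spread column_key wide."""
    grouped = frame.groupby([row_key, column_key])[value].agg(aggfunc)
    return grouped.unstack(column_key)


orders = pd.DataFrame(
    {
        "campaign": ["Email", "Email", "Email", "Search", "Search", "Social"],
        "week": ["2024-01", "2024-01", "2024-02", "2024-01", "2024-02", "2024-01"],
        "revenue_usd": [1200, 300, 1500, 3000, 2800, 900],
    }
)

# Same pipeline, two different summaries.
weekly_totals = summarize_wide(orders, "week", "campaign", "revenue_usd")
weekly_means = summarize_wide(orders, "week", "campaign", "revenue_usd", aggfunc="mean")
print(weekly_totals)
print(weekly_means)

# Return to a tidy long layout when a plotting library expects one row per observation.
tidy = weekly_totals.reset_index().melt(
    id_vars="week", var_name="campaign", value_name="revenue_usd"
)
print(tidy)
```

Campaigns with no orders in a given week appear as NaN in the wide tables, so the missing-value decisions from section 3.2 apply here as well.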



# <font color="#418FDE" size="6.5" uppercase>**Reshaping DataFrames**</font>


In this lecture, you learned to:
- Transform DataFrames between wide and long formats using pd.melt() and pivot or pivot_table in Pandas 2.3.1. 
- Use set_index(), reset_index(), stack(), and unstack() to work effectively with hierarchical indexes. 
- Choose appropriate reshaping strategies to prepare datasets for downstream visualization or modeling tasks. 

<font color='yellow'>Congratulations on completing this course!</font>