# **📅 Day 9 – Reshaping DataFrames🐼**

#### **Goal:** Transform data between wide ↔ long formats using Pandas. 
#### **Topics To Cover:** pivot, melt, stack, unstack, wide ↔ long format transformations and tidy data principles.

----

## **Introduction 🌱**

**Reshaping** is the process of transforming the structure of your DataFrame, such as changing the number of rows and columns, without altering the actual **data** itself. Think of it as rearranging the furniture in your room to make it more functional.

You'll often need to transform data between two primary formats:

* **Wide Format:** Each row represents a single, unique identifier, and the data for different attributes or time points are spread across multiple columns. This format is intuitive for human reading but can be difficult for plotting and some machine learning models.
    * **Example:** A DataFrame where each row is a student, and columns are `Q1_Score`, `Q2_Score`, `Q3_Score`.

* **Long Format:** Each row represents a single **observation**, and all attributes or time points are collected into one or more new columns. This format is often called "tidy data" and is ideal for plotting with libraries like Seaborn and for many data analysis tasks.
    * **Example:** A DataFrame where each row is a single student's score for a specific quarter, with columns like `Student_ID`, `Quarter`, and `Score`.
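
To make the two layouts concrete, here is a minimal sketch using a made-up three-student table (the IDs and scores are illustrative and not taken from the course dataset). It builds the wide form and converts it to the long form with `melt()`.

```python
import pandas as pd

# Hypothetical wide-format data: one row per student, one column per quarter
wide = pd.DataFrame({
    "Student_ID": ["S1", "S2", "S3"],
    "Q1_Score": [70, 85, 90],
    "Q2_Score": [75, 80, 95],
    "Q3_Score": [80, 88, 92],
})

# Long format: one row per (student, quarter) observation
long = wide.melt(id_vars="Student_ID", var_name="Quarter", value_name="Score")

print(wide.shape)  # 3 students x (1 id column + 3 score columns)
print(long.shape)  # 9 observations x 3 columns
print(long.head())
```

The same nine scores are present in both tables; only the arrangement changes.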

### **Importance for an AI/ML Student**

Machine learning libraries usually expect data in a specific layout, so reshaping is a routine preprocessing step. The tidy layout described below (one variable per column, one observation per row) is the usual starting point.

- **Model Compatibility:** Many scikit-learn functions and other libraries require data to be in a tidy format where each variable is a column, each observation is a row, and each cell is a single value.

- **Data Analysis:** Tidy data makes it much easier to perform data analysis, aggregation, and visualization. For example, plotting time-series data is far simpler when dates and values are in separate, tidy columns.

### **The Tidy Data Principles**
The concept of tidy data, introduced by Hadley Wickham, provides a consistent way to think about data structure. A dataset is considered tidy if:

1. Each variable forms a column.

2. Each observation forms a row.

3. Each type of observational unit forms a table.

Understanding these principles is the key to mastering data reshaping.


***

# Let's Begin 🚀

In [1]:
# Importing necessary libraries
import pandas as pd
import numpy as np

# Load the data (read_csv already returns a DataFrame)
data = pd.read_csv('../data/placement_pressure_pulse.csv')

# Keep the first 10 rows for the examples below
df = data.head(10)
df

Unnamed: 0,Student_ID,Branch,Confidence_Level,Applications_Sent,Interview_Calls,Sleep_Hours,Anxiety_Level,Motivation_Level,Support_System,Placement_Status
0,STU001,E&TC,4,21,2,7.9,6,4,No,Not Placed
1,STU002,E&TC,7,49,3,7.4,7,8,Yes,Still Preparing
2,STU003,Statistics,8,57,6,5.9,7,9,Yes,Not Placed
3,STU004,Mechanical,3,35,10,4.3,5,3,Yes,Still Preparing
4,STU005,CSE,6,58,9,4.1,8,8,Yes,Still Preparing
5,STU006,Mechanical,10,18,5,6.6,7,9,Yes,Placed
6,STU007,IT,4,45,0,4.8,8,10,Yes,Not Placed
7,STU008,Mechanical,9,47,4,7.7,7,10,No,Not Placed
8,STU009,E&TC,8,46,10,5.2,9,4,Yes,Placed
9,STU010,Mechanical,4,25,0,7.9,5,5,Yes,Still Preparing


#### Key Reshaping Methods

| Method             | Purpose                                                      | Syntax                                                        |
|--------------------|--------------------------------------------------------------|---------------------------------------------------------------|
| `pd.pivot()`       | Reshapes data from long format to wide format.               | `df.pivot(index='id', columns='var', values='val')`           |
| `pd.melt()`        | Reshapes data from wide format to long format.               | `pd.melt(df, id_vars=['id'])`                                 |
| `df.stack()`       | Stacks the columns of a DataFrame into a multi-level index.  | `df.stack()`                                                  |
| `df.unstack()`     | Unstacks a level of the DataFrame's index to a column.       | `df.unstack()`                                                |
| `pd.wide_to_long()`| Converts wide-format data with stubnames into long format.   | `pd.wide_to_long(df, stubnames='val', i='id', j='time')`      |


***

#### **9.1 `.pivot()`: Reshaping from Long to Wide**

`df.pivot()` transforms a "**long**" DataFrame into a "**wide**" one by taking the unique values from one column and turning them into new column headers. The result is a wide table where each row represents a unique value from your `index` column, each column represents a unique value from your `columns` column, and the cells are filled with values from your `values` column. No aggregation takes place; the values are only rearranged.

**Key Parameters**

- **`index`**: The column whose unique values will form the new index of the wide DataFrame. This is what you want to pivot "on."

- **`columns`**: The column whose unique values will become the new column headers of the wide DataFrame.

- **`values`**: The column containing the numerical or categorical values that will fill the cells of the new DataFrame.

**Important to Note:**

`df.pivot()` **vs** `df.pivot_table()`: `df.pivot()` requires each combination of `index` and `columns` values to be unique. If the data has multiple entries for a single combination, `df.pivot()` raises a `ValueError`. In such cases, use `df.pivot_table()`, which handles duplicate entries by applying an aggregation function (such as mean or sum).

**👉 When to use `.pivot()`:** Use it when you are confident that the combination of your index and columns is unique.
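
The difference is easiest to see on data that contains a duplicate. The sketch below uses a small hypothetical table (not the course dataset) in which the pair `CSE`/`Placed` appears twice: `pivot()` refuses to reshape it, while `pivot_table()` averages the two values.

```python
import pandas as pd

# Hypothetical long data with a duplicated (Branch, Status) pair: CSE/Placed appears twice
long = pd.DataFrame({
    "Branch": ["CSE", "CSE", "IT"],
    "Status": ["Placed", "Placed", "Not Placed"],
    "Confidence": [6, 8, 4],
})

# pivot() cannot decide which of the two CSE/Placed values to keep
try:
    long.pivot(index="Branch", columns="Status", values="Confidence")
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print("pivot() failed:", exc)

# pivot_table() aggregates the duplicates instead (mean is also the default aggfunc)
table = long.pivot_table(index="Branch", columns="Status",
                         values="Confidence", aggfunc="mean")
print(table)
```

Combinations that never occur in the data, such as `CSE`/`Not Placed` here, appear as `NaN` in the result.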

In [2]:
# Let's practice it:
pivoted_df = df.pivot(index='Student_ID', columns='Placement_Status', values='Confidence_Level')
pivoted_df.head()

Placement_Status,Not Placed,Placed,Still Preparing
Student_ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
STU001,4.0,,
STU002,,,7.0
STU003,8.0,,
STU004,,,3.0
STU005,,,6.0


---


#### **9.2 `pd.melt()`: Reshaping from Wide to Long**

`pd.melt()` is used to transform a **"wide"** DataFrame into a **"long"** one. It is the perfect tool for taking multiple columns and "un-pivoting" them into a single column of variable names and a single column of corresponding values. This transformation is a key step in creating **tidy data**.


**Key Parameters**

* **`id_vars`**: The column or columns that you want to keep as **identifier variables**. These columns will remain in their original form.
* **`value_vars`**: The column or columns you want to **un-pivot** or "melt." The names of these columns will become the values in the new `variable` column, and their data will go into the new `value` column.
* **`var_name`**: The name for the new column that will hold the names of the melted columns. Defaults to `variable`.
* **`value_name`**: The name for the new column that will hold the values of the melted columns. Defaults to `value`.
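
As a quick illustration of these parameters, the following sketch melts a tiny hypothetical frame twice: once with the default output names and once with custom names and an explicit `value_vars` selection.

```python
import pandas as pd

# Hypothetical two-student frame reusing column names from the course dataset
wide = pd.DataFrame({
    "Student_ID": ["S1", "S2"],
    "Confidence_Level": [4, 7],
    "Anxiety_Level": [6, 5],
})

# Defaults: melted column names go to 'variable', their data to 'value'
default_names = pd.melt(wide, id_vars="Student_ID")

# Custom names, and only one column selected for melting
custom = pd.melt(wide, id_vars="Student_ID", value_vars=["Anxiety_Level"],
                 var_name="Metric", value_name="Score")

print(default_names)
print(custom)
```

When `value_vars` is omitted, every column not listed in `id_vars` is melted.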


**Important to Note:**

`pd.melt()` is the primary method for making data "tidy." While `pd.pivot()` and `df.pivot_table()` transform data into a wide, summary view, `pd.melt()` transforms it back into a clean, long format that is ideal for plotting and many machine learning models.

**Using `pd.melt()` on the main DataFrame:**

In [3]:
# If you want to unpivot main dataframe
melted_df = pd.melt(df, id_vars='Branch', var_name='Metric', value_name='Value')
melted_df

Unnamed: 0,Branch,Metric,Value
0,E&TC,Student_ID,STU001
1,E&TC,Student_ID,STU002
2,Statistics,Student_ID,STU003
3,Mechanical,Student_ID,STU004
4,CSE,Student_ID,STU005
...,...,...,...
85,Mechanical,Placement_Status,Placed
86,IT,Placement_Status,Not Placed
87,Mechanical,Placement_Status,Not Placed
88,E&TC,Placement_Status,Placed


In [4]:
# If you want to unpivot specific columns
melted_df = pd.melt(df, id_vars='Branch', value_vars='Confidence_Level', var_name='Metric', value_name='Value')
melted_df

Unnamed: 0,Branch,Metric,Value
0,E&TC,Confidence_Level,4
1,E&TC,Confidence_Level,7
2,Statistics,Confidence_Level,8
3,Mechanical,Confidence_Level,3
4,CSE,Confidence_Level,6
5,Mechanical,Confidence_Level,10
6,IT,Confidence_Level,4
7,Mechanical,Confidence_Level,9
8,E&TC,Confidence_Level,8
9,Mechanical,Confidence_Level,4


In [5]:
# If you want to melt main dataframe with multiple identifiers and specific columns to unpivot
melted_df = pd.melt(df, id_vars=['Student_ID', 'Branch'], value_vars=['Confidence_Level', 'Anxiety_Level'], var_name='Metric', value_name='Value')
melted_df

Unnamed: 0,Student_ID,Branch,Metric,Value
0,STU001,E&TC,Confidence_Level,4
1,STU002,E&TC,Confidence_Level,7
2,STU003,Statistics,Confidence_Level,8
3,STU004,Mechanical,Confidence_Level,3
4,STU005,CSE,Confidence_Level,6
5,STU006,Mechanical,Confidence_Level,10
6,STU007,IT,Confidence_Level,4
7,STU008,Mechanical,Confidence_Level,9
8,STU009,E&TC,Confidence_Level,8
9,STU010,Mechanical,Confidence_Level,4


**Using `pd.melt()` to unpivot a pivoted DataFrame built using `.pivot_table()` & `.pivot()`**

In [6]:
# Let's practice it with .pivot():
pivoted_df = df.pivot(index='Student_ID', columns='Placement_Status', values='Confidence_Level')
pivoted_df
melted_df = pd.melt(pivoted_df.reset_index(), id_vars='Student_ID', var_name='Placement_Status', value_name='Confidence_Level')
melted_df

Unnamed: 0,Student_ID,Placement_Status,Confidence_Level
0,STU001,Not Placed,4.0
1,STU002,Not Placed,
2,STU003,Not Placed,8.0
3,STU004,Not Placed,
4,STU005,Not Placed,
5,STU006,Not Placed,
6,STU007,Not Placed,4.0
7,STU008,Not Placed,9.0
8,STU009,Not Placed,
9,STU010,Not Placed,


In [7]:
# Let's practice it with .pivot_table():
# Step 1: Create the pivoted DataFrame using pivot_table()
pivoted_df = df.pivot_table(index='Branch', columns='Placement_Status', values='Confidence_Level')

# Step 2: Reset the index to turn the 'Branch' index into a column
pivoted_df = pivoted_df.reset_index()

# Step 3: Melt the pivoted DataFrame back to a long format
melted_df = pd.melt(pivoted_df, id_vars=['Branch'], var_name='Metric', value_name='Value')
melted_df

Unnamed: 0,Branch,Metric,Value
0,CSE,Not Placed,
1,E&TC,Not Placed,4.0
2,IT,Not Placed,4.0
3,Mechanical,Not Placed,9.0
4,Statistics,Not Placed,8.0
5,CSE,Placed,
6,E&TC,Placed,8.0
7,IT,Placed,
8,Mechanical,Placed,10.0
9,Statistics,Placed,


In [8]:
# If DataFrame contains multi-level columns:
pivoted_df = df.pivot_table(index='Branch', columns='Placement_Status', values=['Confidence_Level', 'Anxiety_Level'])

# Step 1. Reset the index to make 'Branch' a column
pivoted_df_reset = pivoted_df.reset_index()

# Step 2. Flatten the MultiIndex columns
pivoted_df_reset.columns = ['_'.join(col).strip() for col in pivoted_df_reset.columns.values]

# Step 3. Melt the flattened DataFrame
melted_df = pd.melt(pivoted_df_reset,
                    id_vars=['Branch_'], # ('Branch', '') is flattened to 'Branch_'
                    var_name='Metric',
                    value_name='Value')

# Split 'Metric' at the LAST underscore so that 'Anxiety_Level_Not Placed'
# becomes 'Anxiety_Level' and 'Not Placed' (a plain split would cut after 'Anxiety')
melted_df[['Value_Type', 'Placement_Status']] = melted_df['Metric'].str.rsplit('_', n=1, expand=True)
melted_df.drop('Metric', axis=1, inplace=True)
melted_df

Unnamed: 0,Branch_,Value,Value_Type,Placement_Status
0,CSE,,Anxiety_Level,Not Placed
1,E&TC,6.0,Anxiety_Level,Not Placed
2,IT,8.0,Anxiety_Level,Not Placed
3,Mechanical,7.0,Anxiety_Level,Not Placed
4,Statistics,7.0,Anxiety_Level,Not Placed
5,CSE,,Anxiety_Level,Placed
6,E&TC,9.0,Anxiety_Level,Placed
7,IT,,Anxiety_Level,Placed
8,Mechanical,7.0,Anxiety_Level,Placed
9,Statistics,,Anxiety_Level,Placed
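
As a possible alternative to flattening column names and parsing them back out of strings, the sketch below keeps the levels separate from start to finish. It relies on the fact that `unstack()` on a frame with a single-level row index returns a Series whose index contains every column level plus the row label. The `sample` frame is hypothetical and only borrows column names from the course dataset.

```python
import pandas as pd

# Hypothetical sample with the same column names as the course dataset
sample = pd.DataFrame({
    "Branch": ["CSE", "CSE", "IT", "IT"],
    "Placement_Status": ["Placed", "Not Placed", "Placed", "Placed"],
    "Confidence_Level": [8, 4, 6, 10],
    "Anxiety_Level": [3, 7, 5, 9],
})

pivoted = sample.pivot_table(index="Branch", columns="Placement_Status",
                             values=["Confidence_Level", "Anxiety_Level"])

# unstack() yields a Series indexed by (value column, Placement_Status, Branch);
# naming the levels and resetting the index gives a long table directly
tidy = (
    pivoted.unstack()
    .rename_axis(["Value_Type", "Placement_Status", "Branch"])
    .reset_index(name="Value")
)
print(tidy)
```

Because no names are joined and split, this approach is not affected by underscores or spaces inside the labels.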


***

#### **9.3 `df.stack()`: Reshaping a MultiIndex**

`df.stack()` is a powerful method used to **transform** or **pivot** columns from a DataFrame with a **MultiIndex** into a new inner-most row index. Essentially, it rotates the column labels into row labels. This is particularly useful for converting data from a wide format to a long one, making it easier for analysis and visualization.


**How It Works**

`df.stack()` works on one or more levels of a MultiIndex in the columns. It moves the specified column level(s) to become the inner-most level(s) of the row index. This is the **inverse** of the `unstack()` method.

**Example:**
Imagine you have a DataFrame where the columns represent countries and years. `df.stack()` can take the years and stack them, turning them into a new level of the row index. The resulting DataFrame will have two row index levels (e.g., `['Country', 'Year']`).
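
A minimal sketch of this idea, assuming a made-up frame whose rows are countries and whose columns are `(Metric, Year)` pairs (all names and numbers are illustrative):

```python
import pandas as pd

# Hypothetical data: rows are countries, columns are (Metric, Year) pairs
columns = pd.MultiIndex.from_product(
    [["GDP", "Population"], [2020, 2021]], names=["Metric", "Year"]
)
wide = pd.DataFrame(
    [[1.0, 1.1, 50, 51],
     [2.0, 2.2, 80, 82]],
    index=pd.Index(["A", "B"], name="Country"),
    columns=columns,
)

# Move the 'Year' column level into the row index.
# On pandas 2.1+ you can additionally pass future_stack=True to avoid a FutureWarning.
stacked = wide.stack(level="Year")

print(list(stacked.index.names))
print(list(stacked.columns))
print(stacked)
```

After stacking, each row is one country in one year, and the remaining column level (`Metric`) provides the columns.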


**Key Parameters**

  * **`level`**: (int, string, or list) Specifies the level(s) of the column MultiIndex to stack. By default, it stacks the inner-most level. You can use an integer (e.g., `1`), a string (e.g., `'Year'`), or a list of integers/strings.
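  * **`dropna`**: (boolean) If `True` (default), rows whose values are all `NaN` are dropped from the result. If `False`, they are kept. In pandas 2.1 and later, `dropna` must be left unspecified when you pass `future_stack=True`; the new implementation keeps such rows.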


**Important to Note:**

  * `df.stack()` is most useful on DataFrames with a **MultiIndex** in the columns. On a DataFrame with a single-level column index, it moves all columns into the row index and returns a **Series**.
  * It's the most straightforward method for reshaping a wide-to-long format when your data has a hierarchical column structure, such as that created by a `pivot_table()` with multiple `values` columns.

**Detect the Level of Index**

1. **Check for a MultiIndex:** The simplest way to check if your DataFrame has a MultiIndex is to use `isinstance()`.

In [9]:
# Check whether the columns are a MultiIndex
print(isinstance(df.columns, pd.MultiIndex))

# Check whether the row index is a MultiIndex
print(isinstance(df.index, pd.MultiIndex))

False
False


2. **View the Levels:** If it is a MultiIndex, you can inspect the levels directly.

In [10]:
# To view the names of the levels in the columns
df.columns.names

# To view the names of the levels in the row index
df.index.names

# To view the total number of levels
df.columns.nlevels
df.index.nlevels

1

##### **Let's Practice**

In [11]:
# Single level index
df.stack(level=0)
df.set_index('Student_ID').stack(level=0)
df.set_index(['Student_ID', 'Branch']).stack(level=0)

Student_ID  Branch                       
STU001      E&TC        Confidence_Level                   4
                        Applications_Sent                 21
                        Interview_Calls                    2
                        Sleep_Hours                      7.9
                        Anxiety_Level                      6
                                                  ...       
STU010      Mechanical  Sleep_Hours                      7.9
                        Anxiety_Level                      5
                        Motivation_Level                   5
                        Support_System                   Yes
                        Placement_Status     Still Preparing
Length: 80, dtype: object

In [12]:
# multi-level index
pivoted_df = df.pivot_table(index='Branch', columns='Placement_Status', values=['Confidence_Level', 'Anxiety_Level'])
# Check columns Multi-index or not
print(isinstance(pivoted_df.columns, pd.MultiIndex))
print(pivoted_df.columns.names)
print(pivoted_df.columns.nlevels)

# On pandas 2.1+, pass future_stack=True to use the new implementation and avoid a FutureWarning
stacked = pivoted_df.stack(future_stack=True) # the columns are now a single-level Index
# print(isinstance(stacked.columns, pd.MultiIndex))
# print(stacked.columns.names)
# print(stacked.columns.nlevels)
stacked

True
[None, 'Placement_Status']
2


Unnamed: 0_level_0,Unnamed: 1_level_0,Anxiety_Level,Confidence_Level
Branch,Placement_Status,Unnamed: 2_level_1,Unnamed: 3_level_1
CSE,Not Placed,,
CSE,Placed,,
CSE,Still Preparing,8.0,6.0
E&TC,Not Placed,6.0,4.0
E&TC,Placed,9.0,8.0
E&TC,Still Preparing,7.0,7.0
IT,Not Placed,8.0,4.0
IT,Placed,,
IT,Still Preparing,,
Mechanical,Not Placed,7.0,9.0


In [13]:
pivoted_df = df.pivot_table(index='Branch', columns=['Placement_Status', 'Support_System'], values=['Confidence_Level', 'Interview_Calls', 'Sleep_Hours'])

# Check columns Multi-index or not
print(isinstance(pivoted_df.columns, pd.MultiIndex))
print(pivoted_df.columns.names)
print(pivoted_df.columns.nlevels)

pivoted_df

stacked = pivoted_df.stack(level=1, future_stack=True)
stacked

True
[None, 'Placement_Status', 'Support_System']
3


Unnamed: 0_level_0,Unnamed: 1_level_0,Confidence_Level,Confidence_Level,Interview_Calls,Interview_Calls,Sleep_Hours,Sleep_Hours
Unnamed: 0_level_1,Support_System,No,Yes,No,Yes,No,Yes
Branch,Placement_Status,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2
CSE,Not Placed,,,,,,
CSE,Placed,,,,,,
CSE,Still Preparing,,6.0,,9.0,,4.1
E&TC,Not Placed,4.0,,2.0,,7.9,
E&TC,Placed,,8.0,,10.0,,5.2
E&TC,Still Preparing,,7.0,,3.0,,7.4
IT,Not Placed,,4.0,,0.0,,4.8
IT,Placed,,,,,,
IT,Still Preparing,,,,,,
Mechanical,Not Placed,9.0,,4.0,,7.7,


In [14]:
pivoted_df = df.pivot_table(index='Student_ID', columns=['Placement_Status', 'Branch'], values=['Confidence_Level', 'Interview_Calls', 'Sleep_Hours'])

# Check columns Multi-index or not
print(isinstance(pivoted_df.columns, pd.MultiIndex))
print(pivoted_df.columns.names)
print(pivoted_df.columns.nlevels)

pivoted_df

stacked = pivoted_df.stack(level=2, future_stack=True)
stacked

True
[None, 'Placement_Status', 'Branch']
3


Unnamed: 0_level_0,Unnamed: 1_level_0,Confidence_Level,Confidence_Level,Confidence_Level,Interview_Calls,Interview_Calls,Interview_Calls,Sleep_Hours,Sleep_Hours,Sleep_Hours
Unnamed: 0_level_1,Placement_Status,Not Placed,Placed,Still Preparing,Not Placed,Placed,Still Preparing,Not Placed,Placed,Still Preparing
Student_ID,Branch,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2
STU001,E&TC,4.0,,,2.0,,,7.9,,
STU001,IT,,,,,,,,,
STU001,Mechanical,,,,,,,,,
STU001,Statistics,,,,,,,,,
STU001,CSE,,,,,,,,,
STU002,E&TC,,,7.0,,,3.0,,,7.4
STU002,IT,,,,,,,,,
STU002,Mechanical,,,,,,,,,
STU002,Statistics,,,,,,,,,
STU002,CSE,,,,,,,,,


***

#### **9.4 `df.unstack()`: Reshaping from Long to Wide**

`df.unstack()` is used to **transform** or **pivot** a DataFrame's inner-most row index level into a new column axis. It's the **inverse** of the `stack()` method and is a go-to tool for converting data from a long format to a wide one. 


**How It Works**

`df.unstack()` takes a level from a **MultiIndex** in the rows and rotates it to become a new level in the column index. This is a common operation after melting or stacking data to make it more readable for certain types of analysis, such as comparing values across different categories in a single row.

**Example:**
Imagine a long DataFrame with a MultiIndex for rows like `('Country', 'Year')`. `df.unstack()` can take the `Year` level and "unstack" it, turning each year into a new column. The resulting DataFrame will have a single row index (`'Country'`) and a MultiIndex for columns, with the original column labels on the outer level and `'Year'` on the inner level.
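
The following sketch illustrates this with a made-up long frame indexed by `(Country, Year)`; the metric names and values are purely illustrative.

```python
import pandas as pd

# Hypothetical long data: one row per (Country, Year)
index = pd.MultiIndex.from_product(
    [["A", "B"], [2020, 2021]], names=["Country", "Year"]
)
long = pd.DataFrame(
    {"GDP": [1.0, 1.1, 2.0, 2.2], "Population": [50, 51, 80, 82]},
    index=index,
)

# Move the 'Year' row level into the columns
wide = long.unstack(level="Year")

print(list(wide.columns.names))
print(wide)
```

Each original column is repeated once per year, so individual cells can be selected with a tuple such as `wide.loc["B", ("GDP", 2021)]`.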


**Key Parameters**

* **`level`**: (int, string, or list) Specifies the level(s) of the row MultiIndex to unstack. By default, it unstacks the inner-most level. You can use an integer (e.g., `1`), a string (e.g., `'Year'`), or a list of integers/strings.
* **`fill_value`**: (scalar, optional) When unstacking creates combinations of row and column labels that have no corresponding data, `fill_value` is used for those cells. Defaults to `None`, which leaves them as `NaN`.
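
A short sketch of `fill_value` in action, assuming a hypothetical table of counts in which branch `IT` has no `Placed` row:

```python
import pandas as pd

# Hypothetical long data where branch 'IT' has no 'Placed' row
long = pd.DataFrame({
    "Branch": ["CSE", "CSE", "IT"],
    "Status": ["Placed", "Not Placed", "Not Placed"],
    "Count": [3, 2, 4],
}).set_index(["Branch", "Status"])

# Without fill_value, the missing IT/Placed cell becomes NaN
with_nan = long["Count"].unstack("Status")

# With fill_value=0, the same cell becomes 0
with_zero = long["Count"].unstack("Status", fill_value=0)

print(with_nan)
print(with_zero)
```

Filling with `0` makes sense for counts; for measurements such as scores, leaving `NaN` is usually the safer choice.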


**Important to Note:**

* `df.unstack()` is most useful on DataFrames with a **MultiIndex** in the rows. On a DataFrame with a single-level row index, it does not raise an error; it returns a **Series** indexed by the column labels followed by the row labels.
* It's the ideal method for moving data from a long to a wide format when your data has a hierarchical row structure. It's also the perfect complement to `df.stack()` and `pd.melt()`.

##### **Let's Practice**

In [15]:
# Single level index
df.unstack(level=0)
df.set_index('Student_ID').unstack(level=0)
df.set_index(['Student_ID', 'Branch']).unstack()

Unnamed: 0_level_0,Confidence_Level,Confidence_Level,Confidence_Level,Confidence_Level,Confidence_Level,Applications_Sent,Applications_Sent,Applications_Sent,Applications_Sent,Applications_Sent,...,Support_System,Support_System,Support_System,Support_System,Support_System,Placement_Status,Placement_Status,Placement_Status,Placement_Status,Placement_Status
Branch,CSE,E&TC,IT,Mechanical,Statistics,CSE,E&TC,IT,Mechanical,Statistics,...,CSE,E&TC,IT,Mechanical,Statistics,CSE,E&TC,IT,Mechanical,Statistics
Student_ID,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
STU001,,4.0,,,,,21.0,,,,...,,No,,,,,Not Placed,,,
STU002,,7.0,,,,,49.0,,,,...,,Yes,,,,,Still Preparing,,,
STU003,,,,,8.0,,,,,57.0,...,,,,,Yes,,,,,Not Placed
STU004,,,,3.0,,,,,35.0,,...,,,,Yes,,,,,Still Preparing,
STU005,6.0,,,,,58.0,,,,,...,Yes,,,,,Still Preparing,,,,
STU006,,,,10.0,,,,,18.0,,...,,,,Yes,,,,,Placed,
STU007,,,4.0,,,,,45.0,,,...,,,Yes,,,,,Not Placed,,
STU008,,,,9.0,,,,,47.0,,...,,,,No,,,,,Not Placed,
STU009,,8.0,,,,,46.0,,,,...,,Yes,,,,,Placed,,,
STU010,,,,4.0,,,,,25.0,,...,,,,Yes,,,,,Still Preparing,


In [16]:
# Single-level row index with multi-level columns
pivoted_df = df.pivot_table(index='Branch', columns='Placement_Status', values=['Confidence_Level', 'Anxiety_Level'])
# Check whether the row index is a MultiIndex
print(isinstance(pivoted_df.index, pd.MultiIndex))
print(pivoted_df.index.names)
print(pivoted_df.index.nlevels)

unstacked = pivoted_df.unstack() # the result is a Series with a MultiIndex
print(isinstance(unstacked.index, pd.MultiIndex))
print(unstacked.index.names)
print(unstacked.index.nlevels)
unstacked

False
['Branch']
1
True
[None, 'Placement_Status', 'Branch']
3


                  Placement_Status  Branch    
Anxiety_Level     Not Placed        CSE            NaN
                                    E&TC           6.0
                                    IT             8.0
                                    Mechanical     7.0
                                    Statistics     7.0
                  Placed            CSE            NaN
                                    E&TC           9.0
                                    IT             NaN
                                    Mechanical     7.0
                                    Statistics     NaN
                  Still Preparing   CSE            8.0
                                    E&TC           7.0
                                    IT             NaN
                                    Mechanical     5.0
                                    Statistics     NaN
Confidence_Level  Not Placed        CSE            NaN
                                    E&TC           4.0
                  

In [17]:
pivoted_df = df.pivot_table(index=['Branch', 'Placement_Status'], columns='Support_System', values=['Confidence_Level', 'Interview_Calls', 'Sleep_Hours'])

# Check whether the row index is a MultiIndex
print(isinstance(pivoted_df.index, pd.MultiIndex))
print(pivoted_df.index.names)
print(pivoted_df.index.nlevels)

pivoted_df

unstacked = pivoted_df.unstack(level=1)
unstacked

True
['Branch', 'Placement_Status']
2


Unnamed: 0_level_0,Confidence_Level,Confidence_Level,Confidence_Level,Confidence_Level,Confidence_Level,Confidence_Level,Interview_Calls,Interview_Calls,Interview_Calls,Interview_Calls,Interview_Calls,Interview_Calls,Sleep_Hours,Sleep_Hours,Sleep_Hours,Sleep_Hours,Sleep_Hours,Sleep_Hours
Support_System,No,No,No,Yes,Yes,Yes,No,No,No,Yes,Yes,Yes,No,No,No,Yes,Yes,Yes
Placement_Status,Not Placed,Placed,Still Preparing,Not Placed,Placed,Still Preparing,Not Placed,Placed,Still Preparing,Not Placed,Placed,Still Preparing,Not Placed,Placed,Still Preparing,Not Placed,Placed,Still Preparing
Branch,Unnamed: 1_level_3,Unnamed: 2_level_3,Unnamed: 3_level_3,Unnamed: 4_level_3,Unnamed: 5_level_3,Unnamed: 6_level_3,Unnamed: 7_level_3,Unnamed: 8_level_3,Unnamed: 9_level_3,Unnamed: 10_level_3,Unnamed: 11_level_3,Unnamed: 12_level_3,Unnamed: 13_level_3,Unnamed: 14_level_3,Unnamed: 15_level_3,Unnamed: 16_level_3,Unnamed: 17_level_3,Unnamed: 18_level_3
CSE,,,,,,6.0,,,,,,9.0,,,,,,4.1
E&TC,4.0,,,,8.0,7.0,2.0,,,,10.0,3.0,7.9,,,,5.2,7.4
IT,,,,4.0,,,,,,0.0,,,,,,4.8,,
Mechanical,9.0,,,,10.0,3.5,4.0,,,,5.0,5.0,7.7,,,,6.6,6.1
Statistics,,,,8.0,,,,,,6.0,,,,,,5.9,,


In [18]:
pivoted_df = df.pivot_table(index=['Student_ID', 'Placement_Status', 'Branch'], columns='Support_System', values=['Confidence_Level', 'Interview_Calls', 'Sleep_Hours'])

# Check whether the row index is a MultiIndex
print(isinstance(pivoted_df.index, pd.MultiIndex))
print(pivoted_df.index.names)
print(pivoted_df.index.nlevels)

pivoted_df

unstacked = pivoted_df.unstack(level=1)
unstacked

True
['Student_ID', 'Placement_Status', 'Branch']
3


Unnamed: 0_level_0,Unnamed: 1_level_0,Confidence_Level,Confidence_Level,Confidence_Level,Confidence_Level,Confidence_Level,Confidence_Level,Interview_Calls,Interview_Calls,Interview_Calls,Interview_Calls,Interview_Calls,Interview_Calls,Sleep_Hours,Sleep_Hours,Sleep_Hours,Sleep_Hours,Sleep_Hours,Sleep_Hours
Unnamed: 0_level_1,Support_System,No,No,No,Yes,Yes,Yes,No,No,No,Yes,Yes,Yes,No,No,No,Yes,Yes,Yes
Unnamed: 0_level_2,Placement_Status,Not Placed,Placed,Still Preparing,Not Placed,Placed,Still Preparing,Not Placed,Placed,Still Preparing,Not Placed,Placed,Still Preparing,Not Placed,Placed,Still Preparing,Not Placed,Placed,Still Preparing
Student_ID,Branch,Unnamed: 2_level_3,Unnamed: 3_level_3,Unnamed: 4_level_3,Unnamed: 5_level_3,Unnamed: 6_level_3,Unnamed: 7_level_3,Unnamed: 8_level_3,Unnamed: 9_level_3,Unnamed: 10_level_3,Unnamed: 11_level_3,Unnamed: 12_level_3,Unnamed: 13_level_3,Unnamed: 14_level_3,Unnamed: 15_level_3,Unnamed: 16_level_3,Unnamed: 17_level_3,Unnamed: 18_level_3,Unnamed: 19_level_3
STU001,E&TC,4.0,,,,,,2.0,,,,,,7.9,,,,,
STU002,E&TC,,,,,,7.0,,,,,,3.0,,,,,,7.4
STU003,Statistics,,,,8.0,,,,,,6.0,,,,,,5.9,,
STU004,Mechanical,,,,,,3.0,,,,,,10.0,,,,,,4.3
STU005,CSE,,,,,,6.0,,,,,,9.0,,,,,,4.1
STU006,Mechanical,,,,,10.0,,,,,,5.0,,,,,,6.6,
STU007,IT,,,,4.0,,,,,,0.0,,,,,,4.8,,
STU008,Mechanical,9.0,,,,,,4.0,,,,,,7.7,,,,,
STU009,E&TC,,,,,8.0,,,,,,10.0,,,,,,5.2,
STU010,Mechanical,,,,,,4.0,,,,,,0.0,,,,,,7.9


***

#### **9.5 `pd.wide_to_long()`: Advanced Reshaping**

`pd.wide_to_long()` is a specialized function used for a very specific type of reshaping. Unlike `melt()`, which is a general-purpose tool, `wide_to_long()` is designed to handle DataFrames where columns are named with a consistent pattern, such as `column_name.suffix`. It's particularly useful for converting "wide" panel data (e.g., time-series data where each year is a column) into a "long" format. 



**How It Works**

`wide_to_long()` identifies columns based on a shared "stub" (the prefix) and a "suffix" (the part that varies, often a number or time period). It then combines the data from these columns into a single new column, with the `suffix` becoming a new variable column.

**Example:**
If you have columns like `Applications_Sent_Q1`, `Applications_Sent_Q2`, etc., `wide_to_long()` can reshape them in one call. You would pass `'Applications_Sent'` as `stubnames` and `'_'` as `sep`; the suffixes `Q1`, `Q2` then become the values of the new column named by `j`. Because these suffixes are not purely numeric, you also need to pass a matching `suffix` pattern.



**Key Parameters**

* **`df`**: The DataFrame you want to reshape.
* **`stubnames`**: (string or list) The common prefix of the columns you want to reshape. For example, in `Applications_Sent_Q1`, the `stubnames` would be `'Applications_Sent'`.
* **`i`**: (string or list) The column(s) you want to use as identifier variables. These will remain in the long DataFrame.
* **`j`**: (string) The name for the new column that will contain the suffix part of the original column names.
* **`sep`**: (string) The separator between the stubname and the suffix. Defaults to `''` (empty string). You can specify `_` if your columns are separated by an underscore.
* **`suffix`**: (string or regex) The pattern that describes the suffix. The default is `'\d+'`, which matches one or more digits.
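
For the simplest case, where the defaults for `sep` and `suffix` already fit, a call might look like the sketch below. The frame and its `score1`/`score2` columns are hypothetical.

```python
import pandas as pd

# Hypothetical wide data: 'score' is the stub, 1 and 2 are numeric suffixes
wide = pd.DataFrame({
    "id": ["S1", "S2"],
    "score1": [70, 85],
    "score2": [75, 80],
})

# Default sep='' and suffix=r'\d+' match 'score1' and 'score2' directly
long = pd.wide_to_long(wide, stubnames="score", i="id", j="term").reset_index()
print(long)
```

The numeric suffixes end up as integers in the new `term` column, and the stub name becomes the name of the value column.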



**Important to Note:**

* `wide_to_long()` is less flexible than `melt()` but is highly efficient for its specific use case. It assumes a structured and consistent naming convention in your column headers.
* Not every stub needs a column for every suffix. If a stub/suffix combination is missing, the corresponding cell is filled with `NaN`.
* The `j` parameter is for the name of the new variable column that will hold the suffixes, while the `i` parameter is for the columns that you want to keep as identifiers.
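
To see how a missing stub/suffix combination is handled, consider this small sketch with hypothetical columns: stub `A` has suffixes 1 and 2, while stub `B` only has suffix 1.

```python
import pandas as pd

# Hypothetical data: 'A' has suffixes 1 and 2, but 'B' only has suffix 1
wide = pd.DataFrame({
    "id": [1, 2],
    "A1": [10, 20],
    "A2": [11, 21],
    "B1": [5, 6],
})

# The result is indexed by (id, t); 'B' has no value where t == 2
long = pd.wide_to_long(wide, stubnames=["A", "B"], i="id", j="t")
print(long)
```

The rows for `t == 2` still exist because `A2` is present; only the `B` cell in those rows is empty.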

##### **Let's Practice:**

In [19]:
# Create the DataFrame
data = {
    'Student_ID': ['STU001', 'STU002', 'STU003', 'STU004', 'STU005', 'STU006', 'STU007', 'STU008', 'STU009', 'STU010'],
    'Branch': ['E&TC', 'E&TC', 'Statistics', 'Mechanical', 'CSE', 'Mechanical', 'IT', 'Mechanical', 'E&TC', 'Mechanical'],
    'Applications_Sent_Q1_2024': [12, 14, 15, 13, 16, 11, 10, 15, 14, 12],
    'Applications_Sent_Q2_2024': [15, 18, 17, 16, 19, 14, 13, 18, 17, 15],
    'Applications_Sent_Q3_2024': [18, 20, 21, 19, 22, 17, 16, 21, 20, 18],
    'Interviews_Q1_2024': [2, 3, 4, 3, 5, 2, 1, 4, 3, 2],
    'Interviews_Q2_2024': [3, 4, 5, 4, 6, 3, 2, 5, 4, 3],
    'Interviews_Q3_2024': [4, 5, 6, 5, 7, 4, 3, 6, 5, 4]
}

df_wide = pd.DataFrame(data)
df_wide

Unnamed: 0,Student_ID,Branch,Applications_Sent_Q1_2024,Applications_Sent_Q2_2024,Applications_Sent_Q3_2024,Interviews_Q1_2024,Interviews_Q2_2024,Interviews_Q3_2024
0,STU001,E&TC,12,15,18,2,3,4
1,STU002,E&TC,14,18,20,3,4,5
2,STU003,Statistics,15,17,21,4,5,6
3,STU004,Mechanical,13,16,19,3,4,5
4,STU005,CSE,16,19,22,5,6,7
5,STU006,Mechanical,11,14,17,2,3,4
6,STU007,IT,10,13,16,1,2,3
7,STU008,Mechanical,15,18,21,4,5,6
8,STU009,E&TC,14,17,20,3,4,5
9,STU010,Mechanical,12,15,18,2,3,4


In [20]:
# Unpivot the DataFrame using wide_to_long() with a suffix pattern that matches 'Q1_2024', 'Q2_2024', ...
long_df = pd.wide_to_long(df_wide,
                          stubnames=['Applications_Sent', 'Interviews'],
                          i=['Student_ID', 'Branch'],
                          j='Quarter_Year',
                          sep='_',
                          suffix=r'Q\d_\d{4}')

# Reset the index to make the new 'Quarter_Year' column a regular column
long_df = long_df.reset_index()

long_df

Unnamed: 0,Student_ID,Branch,Quarter_Year,Applications_Sent,Interviews
0,STU001,E&TC,Q1_2024,12,2
1,STU001,E&TC,Q2_2024,15,3
2,STU001,E&TC,Q3_2024,18,4
3,STU002,E&TC,Q1_2024,14,3
4,STU002,E&TC,Q2_2024,18,4
5,STU002,E&TC,Q3_2024,20,5
6,STU003,Statistics,Q1_2024,15,4
7,STU003,Statistics,Q2_2024,17,5
8,STU003,Statistics,Q3_2024,21,6
9,STU004,Mechanical,Q1_2024,13,3


***

## **Summary & Key Takeaways 🚀**

* **Reshaping is not a one-size-fits-all process.** Pandas provides several tools (`melt`, `pivot`, `pivot_table`, `stack`, `unstack`, `wide_to_long`), each with a specific purpose.
* **Tidy data is key.** Transforming data into a long, tidy format is essential for plotting and many data analysis techniques. The `melt()` and `stack()` methods are your primary tools for this.
* **MultiIndex matters.** `stack()` and `unstack()` move labels between the column index and the row index, so they are most useful with hierarchical indexes, while `melt()` and `pivot()` work with regular columns.
* **Choose the right tool.** `pivot_table()` is your go-to for complex pivoting with aggregation, while `pivot()` is for simpler, non-aggregating pivots. `wide_to_long()` is a special-purpose method for very consistently-named columns.

***

### **The Common Confusion: `pivot()` vs. `pivot_table()`** 😕

This is one of the most frequent points of confusion for new pandas users. The core difference lies in their functionality and what they can handle.

* **`df.pivot()`**: This is a simpler, more restrictive method. It can only be used when the combinations of the `index` and `columns` values are **unique**. It cannot handle duplicates or perform any kind of aggregation.

* **`df.pivot_table()`**: This is the more versatile and robust method. It can handle **duplicate entries** by using an `aggfunc` (aggregation function) to combine them (e.g., `mean`, `sum`, `count`). It is the primary tool for creating summary tables and can also create a MultiIndex in its columns.

***

### **The Common Confusion: `stack()` vs. `unstack()`** 🔄

These two methods are direct opposites and are designed for working with a **MultiIndex**.

* **`df.stack()`**: Pivots a specified level of the **columns** into a new **row** level. It transforms a DataFrame from a wide format to a long one.
* **`df.unstack()`**: Pivots a specified level of the **rows** into a new **column** level. It transforms a DataFrame from a long format to a wide one.
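
The inverse relationship can be checked with a small round trip. This sketch uses a hypothetical two-student frame without missing values, where stacking and then unstacking should return the original table.

```python
import pandas as pd

# Hypothetical wide frame with a named row index and no missing values
wide = pd.DataFrame(
    {"Q1": [70, 85], "Q2": [75, 80]},
    index=pd.Index(["S1", "S2"], name="Student_ID"),
)

stacked = wide.stack()        # Series indexed by (Student_ID, column label)
restored = stacked.unstack()  # back to one column per label

print(stacked)
print(restored.equals(wide))
```

With missing values or unsorted labels the round trip may differ in row order or dropped rows, so treat this as a property of clean data rather than a guarantee.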

***

### **Rule of Thumb** ✅

To keep it simple:

* Use **`pd.melt()`** to go from wide to long, especially when columns are not part of a MultiIndex.
* Use **`df.stack()`** to go from wide to long when your DataFrame has a MultiIndex in its columns.
* Use **`df.unstack()`** to go from long to wide when your DataFrame has a MultiIndex in its rows.
* Use **`pd.wide_to_long()`** for a very specific use case: when column names follow a consistent pattern that you can describe with `stubnames` and a `suffix`.
* Use **`df.pivot_table()`** when you need to create a summary table and may need to perform an aggregation on your data.
* Use **`df.pivot()`** when you have a clean, long dataset with unique `index`/`columns` pairs and want to spread it into a wide format without any aggregation.
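
As a closing illustration of how these tools fit together, here is a hedged sketch of a full round trip on a hypothetical table: `melt()` takes it from wide to long, and `pivot()` brings it back.

```python
import pandas as pd

# Hypothetical wide table: one row per student, one column per quarter
wide = pd.DataFrame({
    "Student_ID": ["S1", "S2"],
    "Q1": [70, 85],
    "Q2": [75, 80],
})

# Wide -> long with melt()
long = wide.melt(id_vars="Student_ID", var_name="Quarter", value_name="Score")

# Long -> wide with pivot(); each (Student_ID, Quarter) pair is unique here
back = long.pivot(index="Student_ID", columns="Quarter", values="Score").reset_index()
back.columns.name = None  # drop the leftover 'Quarter' label on the columns

print(long)
print(back)
```

If the long table contained duplicate `(Student_ID, Quarter)` pairs, the second step would need `pivot_table()` with an aggregation function instead.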