# **`Data Science Learners Hub`**

**Module : Python**

**email** : [datasciencelearnershub@gmail.com](mailto:datasciencelearnershub@gmail.com)

## **`#4: Advanced Data Manipulation`**
10. **Merging and Concatenating DataFrames**
    - Combining DataFrames
    - Concatenation and merging operations

11. **Reshaping Data**
    - Pivoting and melting
    - Stacking and unstacking

12. **Time Series Data**
    - Handling time and date data
    - Resampling and frequency conversion

### **`11. Reshaping Data`**

#### **`Pivoting and Melting in Pandas for Data Reshaping`**

#### Pivoting in Pandas:

Pivoting reshapes a DataFrame from long format to wide format: the unique values of one column become the new column labels, a second column supplies the row index, and a third supplies the cell values.

In [1]:
import pandas as pd

# Sample DataFrame
data = {'Date': ['2022-01-01', '2022-01-01', '2022-01-02', '2022-01-02'],
        'Category': ['A', 'B', 'A', 'B'],
        'Value': [10, 20, 30, 40]}

df = pd.DataFrame(data)

# Pivoting DataFrame
pivot_result = df.pivot(index='Date', columns='Category', values='Value')

# Displaying the pivoted DataFrame
print("Pivoted DataFrame:")
print(pivot_result)

# Result - Rows with the same 'Date' are combined, and 'Category' values become separate columns.

Pivoted DataFrame:
Category     A   B
Date              
2022-01-01  10  20
2022-01-02  30  40


#### Melting in Pandas:

Melting involves transforming a DataFrame from wide format to long format, unpivoting it.

In [2]:
# Melting the pivoted DataFrame back to long format.
# reset_index() turns the 'Date' index into a regular column so it can be used in id_vars.
melted_result = pd.melt(pivot_result.reset_index(), id_vars='Date', value_vars=['A', 'B'], var_name='Category', value_name='Value')

# Displaying the melted DataFrame
print("\nMelted DataFrame:")
print(melted_result)

# Result:
# Columns 'A' and 'B' from the pivoted DataFrame become rows, with a new 'Category' column.


Melted DataFrame:
         Date Category  Value
0  2022-01-01        A     10
1  2022-01-02        A     30
2  2022-01-01        B     20
3  2022-01-02        B     40


***Explanation:***

The above code demonstrates how to use the `pd.melt` function in pandas to transform (melt) a DataFrame from a wide format to a long format. Melting unpivots columns into rows, so the result has fewer columns and more rows than the input.

Here's a breakdown of the code:

1. **Melting DataFrame:**
   ```python
   melted_result = pd.melt(pivot_result.reset_index(), id_vars='Date', value_vars=['A', 'B'], var_name='Category', value_name='Value')
   ```
   - `pivot_result.reset_index()`: The wide DataFrame to be melted. `reset_index()` turns the 'Date' index into a regular column so that it can be named in `id_vars`.
   - `id_vars='Date'`: The column(s) to be retained as identifier variables. In this case, the 'Date' column is kept fixed.
   - `value_vars=['A', 'B']`: The column(s) to be melted. Here, columns 'A' and 'B' are melted.
   - `var_name='Category'`: The name of the new column that holds the labels of the melted columns ('A' or 'B'). It will be named 'Category'.
   - `value_name='Value'`: The name of the new column that holds the melted values. It will be named 'Value'.
   - The result (`melted_result`) is a new DataFrame in long format.

2. **Displaying the Result:**
   ```python
   print("\nMelted DataFrame:")
   print(melted_result)
   ```
   The melted DataFrame is displayed.

3. **Result:**
   The resulting DataFrame has three columns: 'Date', 'Category', and 'Value'. The 'Category' column contains the labels of the melted columns ('A' and 'B'), and the 'Value' column contains the corresponding values. Each row in the wide DataFrame corresponds to two rows in the melted DataFrame, one per melted column, so every melted row is a unique combination of 'Date' and 'Category' with its 'Value'.

In summary, the `pd.melt` function transforms a DataFrame from wide to long format, which is the layout many analyses and plotting tools expect. In this example, columns 'A' and 'B' of the pivoted DataFrame are melted into rows with a new 'Category' column, which recovers the original long layout in a different row order.
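
As a complement to the breakdown above, here is a minimal sketch of how `value_vars` controls which columns are melted. The `wide` table and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical wide table: one row per date, one column per category.
wide = pd.DataFrame({
    "Date": ["2022-01-01", "2022-01-02"],
    "A": [10, 30],
    "B": [20, 40],
})

# With value_vars omitted, every column not listed in id_vars is melted.
long_all = pd.melt(wide, id_vars="Date", var_name="Category", value_name="Value")

# Listing value_vars restricts the melt to the chosen columns.
long_a_only = pd.melt(wide, id_vars="Date", value_vars=["A"],
                      var_name="Category", value_name="Value")

print(long_all)
print(long_a_only)
```

Omitting `value_vars` is convenient when every non-identifier column should be melted. Listing the columns explicitly is safer when the table may gain unrelated columns later.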

#### Applications:

- **Pivoting:**
  - Convert data for better presentation or visualization.
  - Facilitate analysis by organizing data for specific requirements.

- **Melting:**
  - Convert aggregated or summarized data into a long format.
  - Prepare data for specific analyses or visualizations.

#### Use Cases:

1. **Pivoting Example:**
   - Convert sales records, where each row is one sale with its product category and quantity, into a table with one column per product category.

2. **Melting Example:**
   - Convert a table with one column per product category back into rows of (sale, product category, quantity) for easier grouping, filtering, or plotting.
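
Both directions can be sketched on a small sales table. The column names (`Order`, `Product`, `Quantity`) and the figures are invented for illustration.

```python
import pandas as pd

# Hypothetical sales records: one row per (order, product category).
sales = pd.DataFrame({
    "Order": [1, 1, 2, 2],
    "Product": ["Books", "Toys", "Books", "Toys"],
    "Quantity": [3, 1, 2, 5],
})

# Pivot: one row per order, one column per product category.
wide_sales = sales.pivot(index="Order", columns="Product", values="Quantity")

# Melt: back to one row per (order, product category).
long_sales = wide_sales.reset_index().melt(
    id_vars="Order", var_name="Product", value_name="Quantity"
)

print(wide_sales)
print(long_sales)
```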

#### Considerations:

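- **Unique Index Values:**
  - Each combination of `index` and `columns` values must occur at most once. `pivot` raises a `ValueError` on duplicates; use `pivot_table` with an aggregation function when duplicates need to be combined.

- **Melting Wide Data:**
  - Specify the columns to keep as identifiers with `id_vars` and the columns to melt with `value_vars`. If `value_vars` is omitted, all columns not listed in `id_vars` are melted.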
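
The uniqueness requirement is easiest to see on data that violates it. This is a minimal sketch on a made-up DataFrame `dup`. Summing the duplicates is only one possible choice of aggregation.

```python
import pandas as pd

# Hypothetical data in which ('2022-01-01', 'A') appears twice.
dup = pd.DataFrame({
    "Date": ["2022-01-01", "2022-01-01", "2022-01-01", "2022-01-02"],
    "Category": ["A", "A", "B", "A"],
    "Value": [10, 5, 20, 30],
})

# pivot cannot decide which of the duplicate values to keep, so it raises.
try:
    dup.pivot(index="Date", columns="Category", values="Value")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("pivot failed:", err)

# pivot_table aggregates the duplicates instead (here by summing them).
summed = dup.pivot_table(index="Date", columns="Category", values="Value", aggfunc="sum")
print(summed)
```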

#### Tips:

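- **Multi-level Index:**
  - `pivot` refers to columns by name. If the values you want to pivot on are stored in a multi-level index, call `reset_index()` first to turn the index levels into regular columns.

- **Handling NaN Values:**
  - Check for NaN values after pivoting. Any index/column combination that has no row in the source data becomes NaN in the result.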
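
Both tips can be illustrated together. The frame `indexed` below is hypothetical, and filling gaps with 0 is an assumption that only makes sense when 0 is a meaningful value for the data.

```python
import pandas as pd

# Hypothetical frame whose identifiers live in a two-level index.
idx = pd.MultiIndex.from_tuples(
    [("2022-01-01", "A"), ("2022-01-01", "B"), ("2022-01-02", "A")],
    names=["Date", "Category"],
)
indexed = pd.DataFrame({"Value": [10, 20, 30]}, index=idx)

# Move the index levels into ordinary columns so pivot can refer to them by name.
flat = indexed.reset_index()
wide = flat.pivot(index="Date", columns="Category", values="Value")

# ('2022-01-02', 'B') has no source row, so that cell is NaN.
print(wide.isna().sum())

# One possible treatment: fill the gaps with 0.
filled = wide.fillna(0)
print(filled)
```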

Pivoting and melting are powerful tools in reshaping data to meet specific analysis or visualization needs. Mastering these operations allows for efficient manipulation and exploration of diverse datasets in Pandas.


#### **`Stacking and Unstacking in Pandas for Hierarchical Index Reshaping`**

#### Stacking and Unstacking Concepts:

In Pandas, stacking and unstacking move levels between a DataFrame's row index and its column labels. They are most useful with hierarchical (multi-level) indices.

#### Stacking:

Stacking involves "compressing" a level in the DataFrame's columns to produce a new level in the index.

In [3]:
import pandas as pd

# Sample DataFrame with Multi-level Index
arrays = [['A', 'A', 'B', 'B'], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=('Letter', 'Number'))

df = pd.DataFrame({'Value': [10, 20, 30, 40]}, index=index)

# Stacking DataFrame
stacked_result = df.stack()

# Displaying the stacked DataFrame
print("Stacked DataFrame:")
print(stacked_result)

# Result:
# The only column level ('Value') moves into the index as a new innermost level, so the result is a Series.

Stacked DataFrame:
Letter  Number       
A       1       Value    10
        2       Value    20
B       1       Value    30
        2       Value    40
dtype: int64


#### Unstacking:

Unstacking is the inverse operation of stacking. It "expands" a level of the DataFrame's index into a new level in the columns. By default, the innermost index level is used.

In [4]:
# Unstacking DataFrame
unstacked_result = df.unstack()

# Displaying the unstacked DataFrame
print("\nUnstacked DataFrame:")
print(unstacked_result)

# Result: The innermost index level ('Number') becomes a new level in the columns.


Unstacked DataFrame:
       Value    
Number     1   2
Letter          
A         10  20
B         30  40
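
Because the two operations are inverses, unstacking and then stacking should return the original layout. The sketch below rebuilds the same small DataFrame under the name `frame` (an illustrative name) and checks the round trip. Some pandas versions may emit a `FutureWarning` from `stack()`.

```python
import pandas as pd

# Same shape of data as the example above: a two-level row index and one column.
index = pd.MultiIndex.from_arrays(
    [["A", "A", "B", "B"], [1, 2, 1, 2]], names=("Letter", "Number")
)
frame = pd.DataFrame({"Value": [10, 20, 30, 40]}, index=index)

# unstack moves the innermost index level ('Number') into the columns ...
wide = frame.unstack()

# ... and stack moves the innermost column level back into the index.
roundtrip = wide.stack()

print(roundtrip)
```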


#### Applications:

- **Stacking:**
  - Transform a DataFrame with a multi-level column index into a long format.
  - Facilitate analysis or visualization requiring a simpler column structure.

- **Unstacking:**
  - Convert data with a multi-level index into a wide format for better presentation.
  - Facilitate analysis by organizing data in a way that simplifies access to information.

#### Use Cases:

1. **Stacking Example:**
   - Convert sales data with a multi-level column index (products, regions) into a long format for easy analysis.

2. **Unstacking Example:**
   - Transform a DataFrame with a multi-level index representing time series data into a wide format with columns for each time point.
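
The stacking use case can be sketched on a small table with two column levels. The products, regions, months, and numbers are all invented for illustration.

```python
import pandas as pd

# Hypothetical sales table with two column levels: product and region.
columns = pd.MultiIndex.from_product(
    [["Books", "Toys"], ["East", "West"]], names=["Product", "Region"]
)
sales = pd.DataFrame(
    [[3, 4, 1, 2], [5, 6, 7, 8]],
    index=pd.Index(["2022-01", "2022-02"], name="Month"),
    columns=columns,
)

# Stack the 'Region' column level into the index: one row per (Month, Region).
by_region = sales.stack(level="Region")

# Unstack it again to return to one row per month.
back = by_region.unstack(level="Region")

print(by_region)
print(back)
```

Stacking only `Region` keeps one column per product, which is often a convenient middle ground between the fully wide and the fully long layout.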

#### Considerations:

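- **Level Selection:**
  - Use the `level` argument to choose which level is stacked or unstacked, by name or by position. The default is the innermost level.

- **NaN Values:**
  - Check for NaN values after unstacking. They appear wherever a combination of index values is absent from the original DataFrame, as well as where the original data was already missing.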
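
Here is a minimal sketch of both considerations on a made-up DataFrame that lacks one index combination. Using `fill_value=0` is an assumption about what a sensible default would be for the data.

```python
import pandas as pd

# Hypothetical data with a missing combination: there is no ('B', 2) row.
index = pd.MultiIndex.from_tuples(
    [("A", 1), ("A", 2), ("B", 1)], names=("Letter", "Number")
)
frame = pd.DataFrame({"Value": [10, 20, 30]}, index=index)

# By default the innermost level ('Number') is unstacked; the gap becomes NaN.
by_number = frame.unstack()

# The level argument picks a different level, by name or by position.
by_letter = frame.unstack(level="Letter")

# fill_value replaces the gaps at unstack time instead of leaving NaN.
filled = frame.unstack(fill_value=0)

print(by_number)
print(by_letter)
print(filled)
```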

#### Tips:

- **Multiple Levels:**
  - Stack or unstack multiple levels by passing a list of level names or level numbers.

- **Naming Levels:**
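  - Assign names to levels for clarity using the `names` parameter of the `MultiIndex` constructors, such as `pd.MultiIndex.from_arrays`. Named levels can then be passed to `stack` and `unstack` instead of positions.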
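
Both tips can be combined in one short sketch. The three-level index and its level names (`Letter`, `Number`, `Code`) are hypothetical.

```python
import pandas as pd

# Hypothetical three-level index; the level names are set through `names`.
index = pd.MultiIndex.from_product(
    [["A", "B"], [1, 2], ["x", "y"]], names=["Letter", "Number", "Code"]
)
frame = pd.DataFrame({"Value": range(8)}, index=index)

# Unstack two levels at once by passing a list of level names.
wide = frame.unstack(level=["Number", "Code"])

# Level positions work as well: 1 and 2 refer to 'Number' and 'Code'.
same = frame.unstack(level=[1, 2])

print(wide)
```

Names tend to be more robust than positions, since they keep working if the order of the levels changes.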

Stacking and unstacking are essential operations for reshaping hierarchical index structures in Pandas. Understanding when and how to use these operations allows for efficient manipulation and exploration of multi-level index DataFrames.
