**Table of contents**<a id='toc0_'></a>    
- [Structuring Data with Pivot, Stack/Unstack, and Melt](#toc1_)    
    - [Pivot](#toc1_1_1_)    
    - [Melt](#toc1_1_2_)    
    - [Summary](#toc1_1_3_)    
    - [💡 Check for understanding](#toc1_1_4_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc1_'></a>[Structuring Data with Pivot, Stack/Unstack, and Melt](#toc0_)

These methods are useful for restructuring, aggregating, and reshaping data to better analyze and visualize it.

### <a id='toc1_1_1_'></a>[Pivot](#toc0_)

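- `pivot` creates a new, derived table from an existing one.
- It reshapes a DataFrame based on column values: you choose which column supplies the row labels, which supplies the column labels, and which supplies the cell values.
- The unique values of one column become the headers of multiple new columns.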
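
As a minimal sketch of the idea above, the toy DataFrame below (hypothetical data, not the lesson's dataset) is reshaped so that each unique `year` becomes its own column. The names `long_df`, `wide_df`, `city`, and `visitors` are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data in long format: one row per (city, year) pair
long_df = pd.DataFrame({
    "city": ["Lisbon", "Lisbon", "Porto", "Porto"],
    "year": [2020, 2021, 2020, 2021],
    "visitors": [10, 12, 7, 9],
})

# 'city' supplies the row labels, each unique 'year' becomes a column,
# and 'visitors' fills the cells
wide_df = long_df.pivot(index="city", columns="year", values="visitors")
print(wide_df)
```

The result has one row per city and one column per year, so the four long rows become a 2 x 2 table.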

![](https://github.com/data-bootcamp-v4/lessons/blob/main/img/pivot.png?raw=true)

In [None]:
import pandas as pd

# Load the world statistics dataset (population and GDP by country and year) from an online source
url = 'https://raw.githubusercontent.com/data-bootcamp-v4/data/main/worldstats.csv'
df = pd.read_csv(url)

In [None]:
# Check df
df.head()

In [None]:
# Check country unique values
df['country'].unique()

In [None]:
# Check year unique values
df['year'].unique()

In [None]:
# Pivot the DataFrame to see the GDP based on the country and year
# Passing values as a string (not a list) keeps the year columns single-level
pivot_df = df.pivot_table(index='country', columns='year', values='GDP')

In [None]:
# If a country/year pair had more than one GDP value, pivot_table would aggregate them
# (the mean by default); pass aggfunc to choose another function, e.g. aggfunc='sum'
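
To make the point about duplicates concrete, here is a small sketch on hypothetical data (`dup_df` and its values are invented for illustration). It shows that `pivot` refuses to reshape when an index/column pair is repeated, while `pivot_table` aggregates the repeated values.

```python
import pandas as pd

# Hypothetical data: country "X" has two GDP rows for the year 2000
dup_df = pd.DataFrame({
    "country": ["X", "X", "Y"],
    "year": [2000, 2000, 2000],
    "GDP": [10.0, 30.0, 5.0],
})

# pivot cannot decide which of the two values to keep, so it raises ValueError
try:
    dup_df.pivot(index="country", columns="year", values="GDP")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates duplicates: the mean by default, or whatever aggfunc says
mean_df = dup_df.pivot_table(index="country", columns="year", values="GDP")
sum_df = dup_df.pivot_table(index="country", columns="year", values="GDP", aggfunc="sum")

print(pivot_failed)
print(mean_df)
print(sum_df)
```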

### Stack and Unstack

In pandas, `stack()` and `unstack()` are two methods used to transform data between "wide" and "long" formats in a DataFrame.

- `stack()`: This method "stacks" the data, converting the **columns into rows**. The column labels move into a new innermost index level, so a DataFrame with ordinary (single-level) columns becomes a Series with a multi-level index. It is useful when you have a DataFrame with multiple columns representing similar data, and you want to combine them into a single column.

- `unstack()`: This method does the opposite of `stack()`. It "unstacks" the data, converting **one index level back into columns** (the innermost level by default, or the one you name with `level=`), and results in a "wider" format. It is useful when you have a DataFrame or Series with a multi-level index and you want to spread one of the levels across separate columns.
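
The sketch below illustrates both methods on a small, hypothetical DataFrame (the countries "X" and "Y" and all numbers are invented). It stacks the columns into the index, then unstacks either the `year` level or the innermost level.

```python
import pandas as pd

# Hypothetical wide data indexed by (country, year)
wide = pd.DataFrame(
    {"Population": [100.0, 110.0, 50.0, 55.0], "GDP": [1.0, 1.2, 0.4, 0.5]},
    index=pd.MultiIndex.from_tuples(
        [("X", 2000), ("X", 2001), ("Y", 2000), ("Y", 2001)],
        names=["country", "year"],
    ),
)

# stack: the column labels become a third, innermost index level -> a Series
stacked = wide.stack()

# unstack a named level: 'year' moves from the index into the columns
by_year = stacked.unstack(level="year")

# unstack with no argument: the innermost level (the old column labels) moves back
restored = stacked.unstack()

print(stacked)
print(by_year)
print(restored)
```

Here `stacked` has one row per (country, year, variable) combination, and `restored` holds the same values as `wide`.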


![](https://github.com/data-bootcamp-v4/lessons/blob/main/img/stack.png?raw=true)

In [None]:
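# Create a multi-index DataFrame using set_index with 'country' and 'year' as the index columns
df_multi = df.set_index(['country', 'year'])

In [None]:
# Stack the DataFrame to convert columns into rows and create a Series
stacked = df_multi.stack()

In [None]:
# Unstack the Series back into a DataFrame with the 'year' level as columns
# (assumes each country/year pair appears only once; duplicates would raise a ValueError)
unstacked = stacked.unstack(level='year')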

### <a id='toc1_1_2_'></a>[Melt](#toc0_)

The `melt()` function in pandas transforms a DataFrame from a **wide format to a long format**, which is often more suitable for tasks such as grouped plotting or aggregating across variables. In the wide format, each row represents a unique observation, and each column represents a different variable. In the long format, each observation is spread over several rows, one per variable: one new column records which variable a row refers to, and another holds its value.
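
A minimal sketch of the wide-to-long transformation, using hypothetical data (the countries "X" and "Y" and their numbers are invented for illustration):

```python
import pandas as pd

# Hypothetical wide data: one row per country, one column per variable
wide = pd.DataFrame({
    "country": ["X", "Y"],
    "Population": [100, 50],
    "GDP": [1.0, 0.4],
})

# 'country' identifies the observation; 'Population' and 'GDP' are collapsed
# into an 'Indicator' column (the variable name) and a 'Value' column
long = pd.melt(
    wide,
    id_vars=["country"],
    value_vars=["Population", "GDP"],
    var_name="Indicator",
    value_name="Value",
)
print(long)
```

Two wide rows with two value columns each become four long rows: every country now appears once per indicator.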

![](https://github.com/data-bootcamp-v4/lessons/blob/main/img/melt.png?raw=true)

In [72]:
# Melt the DataFrame, keeping 'country' and 'year' as identifier variables, and 'Population' and 'GDP' as value variables
melted_data = pd.melt(df, id_vars=['country', 'year'], value_vars=['Population', 'GDP'], var_name='Indicator', value_name='Value')
melted_data.head()

| | country | year | Indicator | Value |
|---|---|---|---|---|
| 0 | Arab World | 2015 | Population | 392022276.0 |
| 1 | Arab World | 2014 | Population | 384222592.0 |
| 2 | Arab World | 2013 | Population | 376504253.0 |
| 3 | Arab World | 2012 | Population | 368802611.0 |
| 4 | Arab World | 2011 | Population | 361031820.0 |
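
Melting and pivoting are inverse operations, which the following sketch illustrates on hypothetical data (the values are invented, and passing a list to `index` in `pivot` assumes pandas 1.1 or later). The long table produced by `melt` is pivoted back so that each indicator is a column again.

```python
import pandas as pd

# Hypothetical wide data with two identifier columns and two value columns
wide = pd.DataFrame({
    "country": ["X", "X", "Y", "Y"],
    "year": [2000, 2001, 2000, 2001],
    "Population": [100.0, 110.0, 50.0, 55.0],
    "GDP": [1.0, 1.2, 0.4, 0.5],
})

# Wide -> long: every non-identifier column is melted
long = wide.melt(id_vars=["country", "year"], var_name="Indicator", value_name="Value")

# Long -> wide: each unique 'Indicator' becomes a column again
back = long.pivot(index=["country", "year"], columns="Indicator", values="Value").reset_index()
back.columns.name = None  # drop the leftover 'Indicator' label on the column axis

print(back)
```

The column order of `back` may differ from `wide`, but the values per country and year are the same.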


### <a id='toc1_1_3_'></a>[Summary](#toc0_)

- `pivot` is used to create a new derived table from an existing one by reshaping a DataFrame based on column values and converting unique values from one column into multiple columns.
- `stack` and `unstack` are used to transform data between "wide" and "long" formats.
  - `stack` converts columns into rows, leading to a multi-level index. It's useful when multiple columns represent similar data that you want to combine into a single column.
  - `unstack` does the opposite of `stack`, converting the index back into columns and leading to a more "wide" format. It's useful when a DataFrame has a multi-level index that you want to separate into different columns.
- `melt` transforms a DataFrame from a wide format to a long format. The selected value columns are collapsed into two columns, one naming the variable and one holding its value, so each observation that occupied a single row in the wide format is spread over several rows in the long format.

### <a id='toc1_1_4_'></a>[💡 Check for understanding](#toc0_)

You are given a DataFrame with sales data for a company. The DataFrame contains information about the sales of various products in different regions. Create a summary of the total sales for each product in each region.


Dataset:

```python
import pandas as pd

data = {
    'Product': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'Region': ['North', 'North', 'North', 'South', 'South', 'South', 'East', 'East', 'East'],
    'Sales': [100, 150, 200, 120, 180, 240, 80, 110, 160]
}

df = pd.DataFrame(data)
```

Expected output:

```python
Region   East  North  South
Product                    
A          80    100    120
B         110    150    180
C         160    200    240
```

In [None]:
# Your code here