#### Reshaping Review 

In [24]:
import pandas as pd
import numpy as np

# Sample Data


data = {'Date': ['2022-01-01', '2022-01-01', '2022-01-02', '2022-01-02'],
        'City': ['A', 'B', 'A', 'B'],
        'Temperature': [25, 30, 22, 28],
        'Humidity': [50, 40, 60, 55]}

df = pd.DataFrame(data)
df

,Date,City,Temperature,Humidity
0,2022-01-01,A,25,50
1,2022-01-01,B,30,40
2,2022-01-02,A,22,60
3,2022-01-02,B,28,55


## Pivot

The pivot function reshapes data by spreading the values of one column across multiple columns. It's particularly useful when you have long-form data (e.g., tidy data with one row per observation) and want to convert it into wide-form data with one column for each unique value in another column.

In [3]:
# Pivot the data
pivot_df = df.pivot(index='Date', columns='City', values='Temperature')
pivot_df

City,A,B
Date,,
2022-01-01,25,30
2022-01-02,22,28


## Melt

The melt function does the opposite: it reshapes wide-form data into long-form data. It "melts" the DataFrame, converting columns into rows. This is useful when you have data in a wide format and want to gather, or unpivot, specific columns.

In [28]:
# Melt the data
melted_df_single_var = pd.melt(df, id_vars=['Date', 'City'], var_name='Variable', value_name='Value')
melted_df_single_var

# id_vars[tuple, list, or ndarray] : Column(s) to use as identifier variables.
# value_vars[tuple, list, or ndarray]: Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars.
# var_name[scalar]: Name to use for the ‘variable’ column. 
# value_name[scalar, default ‘value’]: Name to use for the ‘value’ column.
# col_level[int or string, optional]: If columns are a MultiIndex then use this level to melt.

,Date,City,Variable,Value
0,2022-01-01,A,Temperature,25
1,2022-01-01,B,Temperature,30
2,2022-01-02,A,Temperature,22
3,2022-01-02,B,Temperature,28
4,2022-01-01,A,Humidity,50
5,2022-01-01,B,Humidity,40
6,2022-01-02,A,Humidity,60
7,2022-01-02,B,Humidity,55


Applied to the sample data, this call produces a DataFrame where the ‘Temperature’ and ‘Humidity’ columns are melted into two columns: ‘Variable’, which holds the original column name, and ‘Value’, which holds the measurement.
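
To unpivot only some of the columns, value_vars can be passed explicitly. The sketch below rebuilds the sample data under an illustrative name, sample_df (so it does not overwrite df), melts only the Temperature column, and then shows that pivot can reverse a full melt. Passing a list to index in pivot assumes pandas 1.1 or later.

```python
import pandas as pd

# Rebuild the sample data under a separate (illustrative) name
sample_df = pd.DataFrame({
    'Date': ['2022-01-01', '2022-01-01', '2022-01-02', '2022-01-02'],
    'City': ['A', 'B', 'A', 'B'],
    'Temperature': [25, 30, 22, 28],
    'Humidity': [50, 40, 60, 55],
})

# Melt only the Temperature column; Humidity is left out of the result
temp_only = pd.melt(sample_df, id_vars=['Date', 'City'],
                    value_vars=['Temperature'],
                    var_name='Variable', value_name='Value')

# Melt every non-identifier column, then pivot back to the wide layout
long_df = pd.melt(sample_df, id_vars=['Date', 'City'],
                  var_name='Variable', value_name='Value')
wide_again = (long_df
              .pivot(index=['Date', 'City'], columns='Variable', values='Value')
              .reset_index())
```

After the round trip, wide_again holds the same values as sample_df, although the Humidity and Temperature columns come back in alphabetical order.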

## Stack and Unstack

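These functions move a level of labels between the columns and the row index: stack pivots a level of the column labels into the row index, and unstack does the reverse.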

In [6]:
# Stack - pivot the innermost column index to become the innermost row index
stacked_df = df.set_index(['Date', 'City']).stack()
stacked_df

Date        City             
2022-01-01  A     Temperature    25
                  Humidity       50
            B     Temperature    30
                  Humidity       40
2022-01-02  A     Temperature    22
                  Humidity       60
            B     Temperature    28
                  Humidity       55
dtype: int64

In [7]:
# Unstack - pivot the innermost row index to become the innermost column index
unstacked_df = stacked_df.unstack()
unstacked_df

Date,City,Temperature,Humidity
2022-01-01,A,25,50
2022-01-01,B,30,40
2022-01-02,A,22,60
2022-01-02,B,28,55
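
By default unstack moves the innermost row level, but a different level can be chosen with the level argument. The sketch below is a minimal illustration on a rebuilt copy of the sample data; the names sample_df, by_variable, and by_city are illustrative and do not appear in the cells above.

```python
import pandas as pd

# Rebuild the sample data under a separate (illustrative) name
sample_df = pd.DataFrame({
    'Date': ['2022-01-01', '2022-01-01', '2022-01-02', '2022-01-02'],
    'City': ['A', 'B', 'A', 'B'],
    'Temperature': [25, 30, 22, 28],
    'Humidity': [50, 40, 60, 55],
})

# Stack the two measurement columns into the row index
stacked = sample_df.set_index(['Date', 'City']).stack()

# Default: the innermost row level (the measurement names) moves to the columns
by_variable = stacked.unstack()

# Name a level to move that one instead: here each city becomes a column
by_city = stacked.unstack(level='City')
```

In by_city the rows are indexed by date and measurement name, and the columns are the cities A and B.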


## Pivot Table
pivot_table is a more advanced version of pivot that lets you aggregate values when several rows fall into the same index and column combination.

In [25]:
# Pivot Table
pivot_table_df = df.pivot_table(index='Date', columns='City', values=['Temperature', 'Humidity'], aggfunc='mean')
pivot_table_df

# index: the column to use as row labels
# columns: the column that will be reshaped as columns
# values: the column(s) to use for the new DataFrame's values
# aggfunc: the function to use for aggregation, defaulting to 'mean'
# fill_value: value to replace missing values
# dropna: whether to exclude the columns whose entries are all NaN

,Humidity,Humidity,Temperature,Temperature
City,A,B,A,B
Date,,,,
2022-01-01,50.0,40.0,25.0,30.0
2022-01-02,60.0,55.0,22.0,28.0
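
The fill_value parameter listed in the comments can be seen in a small sketch. The readings DataFrame below is hypothetical data invented for illustration: city C has no row for the second date, so that cell is missing after reshaping unless a fill value is supplied.

```python
import numpy as np
import pandas as pd

# Hypothetical readings: city C has no row for 2022-01-02
readings = pd.DataFrame({
    'Date': ['2022-01-01', '2022-01-01', '2022-01-01', '2022-01-02', '2022-01-02'],
    'City': ['A', 'B', 'C', 'A', 'B'],
    'Temperature': [25, 30, 27, 22, 28],
})

# Without fill_value, the missing Date/City combination shows up as NaN
with_gaps = readings.pivot_table(index='Date', columns='City',
                                 values='Temperature', aggfunc='mean')

# fill_value replaces those NaN cells in the result
filled = readings.pivot_table(index='Date', columns='City',
                              values='Temperature', aggfunc='mean',
                              fill_value=0)
```

Whether 0 is a sensible replacement depends on the data; for temperatures it is only a placeholder chosen to make the effect visible.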


When deciding between the pivot and pivot_table methods, ask yourself: will the result have more than one entry for any index + column combination? If the answer is “yes”, you must use the pivot_table method. If the answer is “no”, you may use the pivot method.

Note that any use of pivot can be switched to pivot_table, but the reverse is not true. If you try to use the pivot method where there would be more than one entry in any index + column combination, it will raise a ValueError.
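
The difference can be checked with a minimal sketch. The dupes DataFrame below is hypothetical data invented for illustration, with two rows for the same date and city.

```python
import pandas as pd

# Hypothetical readings with two rows for the same Date/City pair
dupes = pd.DataFrame({
    'Date': ['2022-01-01', '2022-01-01', '2022-01-01'],
    'City': ['A', 'A', 'B'],
    'Temperature': [25, 27, 30],
})

# pivot cannot decide which of the two values for (2022-01-01, A) to keep
try:
    dupes.pivot(index='Date', columns='City', values='Temperature')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table resolves the collision by aggregating (mean by default)
averaged = dupes.pivot_table(index='Date', columns='City', values='Temperature')
```

Here pivot raises because of the duplicate entry, while pivot_table averages the two readings for city A.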


## You Try

In [16]:
import pandas as pd

# QUESTION 1
# Reshape using the pivot method to look at trading volume across dates and stock symbols

stocks = pd.read_csv('https://gist.githubusercontent.com/alexdebrie/b3f40efc3dd7664df5a20f5eee85e854/raw/ee3e6feccba2464cbbc2e185fb17961c53d2a7f5/stocks.csv')


,date,symbol,open,high,low,close,volume
0,2019-03-01,AMZN,1655.13,1674.26,1651.0,1671.73,4974877
1,2019-03-04,AMZN,1685.0,1709.43,1674.36,1696.17,6167358
2,2019-03-05,AMZN,1702.95,1707.8,1689.01,1692.43,3681522
3,2019-03-06,AMZN,1695.97,1697.75,1668.28,1668.95,3996001
4,2019-03-07,AMZN,1667.37,1669.75,1620.51,1625.95,4957017
5,2019-03-01,AAPL,174.28,175.15,172.89,174.97,25886167
6,2019-03-04,AAPL,175.69,177.75,173.97,175.85,27436203
7,2019-03-05,AAPL,175.94,176.0,174.54,175.53,19737419
8,2019-03-06,AAPL,174.67,175.49,173.94,174.52,20810384
9,2019-03-07,AAPL,173.87,174.44,172.02,172.5,24796374


In [26]:
# QUESTION 2

data = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03', '2023-01-03'],
        'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles', 'New York', 'Los Angeles'],
        'Temperature': [32, 75, 30, 77, 33, 78],
        'Humidity': [80, 10, 85, 5, 81, 7]}

df = pd.DataFrame(data)


# calculate mean temperature for each city using pivot_table()


In [None]:
# QUESTION 3

data2 = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
        'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles','Delhi', 'Chennai', 'Delhi', 'Chennai'],
        'Country': ['USA', 'USA', 'USA', 'USA', 'India', 'India', 'India', 'India'],
        'Temperature': [32, 75, 30, 77, 75, 80, 78, 79]}
df = pd.DataFrame(data2)


# Create a pivot table with a MultiIndex

In [27]:
# QUESTION 4

data3 = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03', '2023-01-03', '2023-01-03'],
        'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles', 'New York', 'Los Angeles', 'Chicago'],
        'Temperature': [32, 75, 30, 77, np.nan, 76, np.nan]}
df = pd.DataFrame(data3)


# Create a pivot table to remove missing values

In [None]:
# QUESTION 5

data4 = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03', '2023-01-03'],
        'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles', 'New York', 'Los Angeles'],
        'Temperature': [32, np.nan, 30, 77, np.nan, 76]}
df = pd.DataFrame(data4)


# Create a pivot table that fills in a value for missing values

In [29]:
# QUESTION 6

df_students = pd.DataFrame({'Name': {0: 'John', 1: 'Bob', 2: 'Shiela'},
                   'Course': {0: 'Masters', 1: 'Graduate', 2: 'Graduate'},
                   'Age': {0: 27, 1: 23, 2: 21}})
df_students


# Use the pd.melt function to unpivot the ‘Course’ column, keeping ‘Name’ as the identifier variable.
# Use the pd.melt function to unpivot the ‘Course’ and ‘Age’ columns, keeping ‘Name’ as the identifier variable.

,Name,Course,Age
0,John,Masters,27
1,Bob,Graduate,23
2,Shiela,Graduate,21
