# **Guided Lab - How to Use the pivot() Function**

## **Lab Overview**

This lab will guide you through the process of using the `pivot()` function in Pandas to reshape and analyze data efficiently. We will apply the `pivot()` function to real-world scenarios, such as financial portfolio analysis.



## **Learning Objectives:**

By completing this lab, learners will be able to:
* Describe the syntax and functionality of the `pivot()` function in Pandas.
* Use the `pivot()` function to reshape data for analysis.
* Apply the `pivot()` function to real-world scenarios, such as financial portfolio analysis.


## **Example One:**
Consider the following DataFrame:

In [1]:
import pandas as pd
import numpy as np
import datetime

In [2]:
data = {
    'Date': ['2022-01-01', '2022-01-01', '2022-01-02', '2022-01-02'],
    'Category': ['A', 'B', 'A', 'B'],
    'Value': [10, 20, 15, 25]
}

df = pd.DataFrame(data)
df


| | Date | Category | Value |
|---|---|---|---|
| 0 | 2022-01-01 | A | 10 |
| 1 | 2022-01-01 | B | 20 |
| 2 | 2022-01-02 | A | 15 |
| 3 | 2022-01-02 | B | 25 |


Now, let's use the pivot method:

In [4]:
pivot_df = df.pivot(index='Date', columns='Category', values='Value')
pivot_df


| Category | A | B |
|---|---|---|
| Date | | |
| 2022-01-01 | 10 | 20 |
| 2022-01-02 | 15 | 25 |


In the above example, the pivot() function transformed the original DataFrame into a new DataFrame where the unique values in the 'Date' column became the index, the unique values in the 'Category' column became the columns, and the 'Value' column provided the values for the new DataFrame.

**Note:**
`pivot()` requires every combination of index and column values to be unique. If the data contains more than one row for the same combination, `pivot()` raises a `ValueError`; use `pivot_table()` instead, which lets you specify an aggregation function to combine the duplicate values.

In other words, `pivot()` only reshapes the data, while the more general `pivot_table()` both reshapes and aggregates it.
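
The difference described in the note can be sketched with a small example. The DataFrame `dup_df` below is hypothetical (it is not part of the lab data) and deliberately repeats one Date/Category combination; the choice of `'mean'` as the aggregation is also just an assumption for illustration.

```python
import pandas as pd

# Hypothetical data: the pair ('2022-01-01', 'A') appears twice.
dup_df = pd.DataFrame({
    'Date': ['2022-01-01', '2022-01-01', '2022-01-01', '2022-01-02'],
    'Category': ['A', 'A', 'B', 'A'],
    'Value': [10, 30, 20, 15],
})

# pivot() cannot decide which duplicate value to keep, so it raises ValueError.
try:
    dup_df.pivot(index='Date', columns='Category', values='Value')
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print(f"pivot() failed: {err}")

# pivot_table() resolves the duplicates with an aggregation function (mean here).
agg_df = dup_df.pivot_table(index='Date', columns='Category',
                            values='Value', aggfunc='mean')
print(agg_df)
```

Other aggregations such as `'sum'`, `'max'`, or `'first'` can be passed to `aggfunc`, depending on what a duplicate should mean in your data.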

## **Example Two:**
Consider the following DataFrame:

In [5]:
data = {'Name': ['Alice', 'Bob', 'Alice', 'Charlie'],
        'Fruit': ['Apple', 'Banana', 'Orange', 'Apple'],
        'Quantity': [3, 2, 1, 4]}

df_2 = pd.DataFrame(data)
df_2

| | Name | Fruit | Quantity |
|---|---|---|---|
| 0 | Alice | Apple | 3 |
| 1 | Bob | Banana | 2 |
| 2 | Alice | Orange | 1 |
| 3 | Charlie | Apple | 4 |


In [6]:
pivoted_df_2 = df_2.pivot(index='Name', columns='Fruit', values='Quantity')
print(pivoted_df_2)

Fruit    Apple  Banana  Orange
Name                          
Alice      3.0     NaN     1.0
Bob        NaN     2.0     NaN
Charlie    4.0     NaN     NaN
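
In the output above, `NaN` marks every Name/Fruit combination that has no row in the original DataFrame, and the integer quantities are shown as floats because `NaN` is a floating-point value. If a missing combination can be treated as a quantity of zero (an assumption that depends on what your data means), one possible follow-up step is to fill the gaps and restore the integer type:

```python
import pandas as pd

df_2 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Alice', 'Charlie'],
    'Fruit': ['Apple', 'Banana', 'Orange', 'Apple'],
    'Quantity': [3, 2, 1, 4],
})

pivoted_df_2 = df_2.pivot(index='Name', columns='Fruit', values='Quantity')

# Assumption: a missing (Name, Fruit) pair means "zero purchased".
# fillna(0) replaces the NaN gaps; astype(int) converts the floats back to integers.
filled_df_2 = pivoted_df_2.fillna(0).astype(int)
print(filled_df_2)
```

If zero is not a sensible default, leaving the `NaN` values in place keeps "no data" distinct from "zero".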


## **Example Three:**

Consider the following DataFrame:

In [8]:
df_3 = pd.DataFrame({
    'course': ['CS2e1', 'C5201', 'CS201', 'DE659', 'DE659', 'DE659'],
    'batch': [1, 2, 3, 1, 2, 3],
    'instructor': ['xaviel', 'young', 'zachary', 'carla', 'wendy', 'allen'],
    'grade': [88.7, 92, 95.2, 78.3, 96, 92.5]
})
# Display the original DataFrame
print("Original DataFrame:")
df_3

Original DataFrame:


| | course | batch | instructor | grade |
|---|---|---|---|---|
| 0 | CS2e1 | 1 | xaviel | 88.7 |
| 1 | C5201 | 2 | young | 92.0 |
| 2 | CS201 | 3 | zachary | 95.2 |
| 3 | DE659 | 1 | carla | 78.3 |
| 4 | DE659 | 2 | wendy | 96.0 |
| 5 | DE659 | 3 | allen | 92.5 |


**3.1 - The following line pivots the DataFrame to show the grade for each course within each batch.**



In [9]:
#  Pivot for Grades by Course and Batch:
df_3.pivot(index='batch', columns='course', values='grade')

| course | C5201 | CS201 | CS2e1 | DE659 |
|---|---|---|---|---|
| batch | | | | |
| 1 | NaN | NaN | 88.7 | 78.3 |
| 2 | 92.0 | NaN | NaN | 96.0 |
| 3 | NaN | 95.2 | NaN | 92.5 |


The above line uses the pivot() method to reshape the DataFrame based on the 'batch', 'course', and 'grade' columns. The resulting DataFrame will have 'batch' as the index, 'course' as columns, and 'grade' as values.

**3.2 - The following line pivots the DataFrame to show the instructor for each course within each batch.**

In [10]:
# Pivot for Instructors by Course and Batch:
df_3.pivot(index='course', columns='batch', values='instructor')

| batch | 1 | 2 | 3 |
|---|---|---|---|
| course | | | |
| C5201 | NaN | young | NaN |
| CS201 | NaN | NaN | zachary |
| CS2e1 | xaviel | NaN | NaN |
| DE659 | carla | wendy | allen |


The above line uses the pivot() method to reshape the DataFrame based on the 'course', 'batch', and 'instructor' columns. The resulting DataFrame will have 'course' as the index, 'batch' as columns, and 'instructor' as values.

**3.3 - The following line pivots without specifying `values`, so it uses every column that is not used as the index or the columns (here, instructor and grade).**

In [11]:
df_3.pivot(index='course', columns='batch')

| | instructor | instructor | instructor | grade | grade | grade |
|---|---|---|---|---|---|---|
| batch | 1 | 2 | 3 | 1 | 2 | 3 |
| course | | | | | | |
| C5201 | NaN | young | NaN | NaN | 92.0 | NaN |
| CS201 | NaN | NaN | zachary | NaN | NaN | 95.2 |
| CS2e1 | xaviel | NaN | NaN | 88.7 | NaN | NaN |
| DE659 | carla | wendy | allen | 78.3 | 96.0 | 92.5 |


The above line calls the pivot() method with only `index` and `columns`. Because `values` is omitted, every remaining column ('instructor' and 'grade') is pivoted. The resulting DataFrame has 'course' as the index and two-level (MultiIndex) columns: the original column name on the top level and 'batch' on the second level.
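
A result with two-level columns can be narrowed down again by selecting on those levels. The following is a minimal sketch; the variable names `wide`, `grades`, and `batch_1_instructors` are introduced here for illustration only.

```python
import pandas as pd

df_3 = pd.DataFrame({
    'course': ['CS2e1', 'C5201', 'CS201', 'DE659', 'DE659', 'DE659'],
    'batch': [1, 2, 3, 1, 2, 3],
    'instructor': ['xaviel', 'young', 'zachary', 'carla', 'wendy', 'allen'],
    'grade': [88.7, 92, 95.2, 78.3, 96, 92.5],
})

# Pivot without `values`: both remaining columns end up under a column MultiIndex.
wide = df_3.pivot(index='course', columns='batch')

# The top level of the column MultiIndex holds the original column names.
top_level_names = wide.columns.get_level_values(0).unique().tolist()
print(top_level_names)

# Selecting a top-level name returns that block, with 'batch' as its columns.
grades = wide['grade']
print(grades)

# A (top-level name, batch) tuple selects a single column as a Series.
batch_1_instructors = wide[('instructor', 1)]
print(batch_1_instructors)
```

Selecting `wide['grade']` gives the same table as passing `values='grade'` to `pivot()` with the same `index` and `columns`.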




---



## **Example Four: Real-World Scenario - Financial Portfolio Analysis:**
In this example, we will consider a DataFrame representing the daily returns of different stocks in a portfolio over a period of time. We'll then use the pivot method to reshape the data for better analysis.

In [15]:
# Create a DataFrame with daily returns for different stocks
np.random.seed(42)
start_date = datetime.date(2022, 1, 1)
end_date = datetime.date(2022, 1, 10)
date_range = pd.date_range(start_date, end_date, freq='D')
# Build the long-format data: one row per (date, stock) pair
data = {
    'Date': np.repeat(date_range, 3),
    'Stock': ['AAPL', 'GOOGL', 'MSFT'] * len(date_range),
    'Return': np.random.normal(loc=0.001, scale=0.01, size=len(date_range) * 3)
}

returns_df = pd.DataFrame(data)

# Display the original DataFrame
print("Original DataFrame:")
print(returns_df.head())

# Use pivot to reshape the DataFrame for better analysis
pivot_df_3 = returns_df.pivot(index='Date', columns='Stock', values='Return')

# Display the reshaped DataFrame
print("\nReshaped DataFrame:")
print(pivot_df_3.head())


Original DataFrame:
        Date  Stock    Return
0 2022-01-01   AAPL  0.005967
1 2022-01-01  GOOGL -0.000383
2 2022-01-01   MSFT  0.007477
3 2022-01-02   AAPL  0.016230
4 2022-01-02  GOOGL -0.001342

Reshaped DataFrame:
Stock           AAPL     GOOGL      MSFT
Date                                    
2022-01-01  0.005967 -0.000383  0.007477
2022-01-02  0.016230 -0.001342 -0.001341
2022-01-03  0.016792  0.008674 -0.003695
2022-01-04  0.006426 -0.003634 -0.003657
2022-01-05  0.003420 -0.018133 -0.016249


- In the above example, we generate random daily returns for three stocks (AAPL, GOOGL, MSFT) over a period of 10 days. The original DataFrame has a long format with columns for 'Date', 'Stock', and 'Return'.

- The pivot method is then used to reshape the DataFrame, making it more suitable for financial portfolio analysis. The resulting DataFrame (`pivot_df_3`) has dates as the index, stock symbols as columns, and daily returns as values.

- This reshaped DataFrame makes it easier to perform various portfolio analyses, such as calculating cumulative returns, correlation matrices, and statistical summaries for each stock in the portfolio.

- Keep in mind that this is a simplified example, and in a real-world scenario, you might have more detailed data and perform more sophisticated analyses based on your specific requirements.
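
The analyses mentioned above can be sketched on the reshaped DataFrame as follows. This is an illustrative example, not a complete portfolio workflow: it rebuilds the same random data so that it runs on its own, and it assumes the values are simple (not logarithmic) daily returns, which is why cumulative returns are compounded with `cumprod()`. The names `cumulative_returns`, `correlation_matrix`, and `summary_stats` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the same long-format data used in this example.
np.random.seed(42)
date_range = pd.date_range('2022-01-01', '2022-01-10', freq='D')
returns_df = pd.DataFrame({
    'Date': date_range.repeat(3),
    'Stock': ['AAPL', 'GOOGL', 'MSFT'] * len(date_range),
    'Return': np.random.normal(loc=0.001, scale=0.01, size=len(date_range) * 3),
})
pivot_df_3 = returns_df.pivot(index='Date', columns='Stock', values='Return')

# Cumulative return per stock, assuming simple daily returns that compound.
cumulative_returns = (1 + pivot_df_3).cumprod() - 1

# Pairwise correlation of daily returns between the stocks.
correlation_matrix = pivot_df_3.corr()

# Count, mean, standard deviation, min, quartiles, and max for each stock.
summary_stats = pivot_df_3.describe()

print(cumulative_returns.tail())
print(correlation_matrix)
print(summary_stats)
```

Each of these calls works column by column, which is why the wide layout with one column per stock is convenient here.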