# Day 19 - Creating Pivot Tables


## Why Are Pivot Tables Important?

Pivot tables summarize large datasets quickly. They group rows by one or more keys, aggregate the values in each group, and arrange the results in a compact grid, which makes patterns across categories easy to see. This makes them particularly useful in exploratory data analysis (EDA) and reporting.


## Tutorial: Building Pivot Tables for Summarizing Data

Pandas makes it straightforward to create pivot tables using the `pivot_table` function. Let’s explore how to build pivot tables with practical examples.


### Basic Pivot Table Creation

The example below computes the average salary per department. The `values` argument names the column to aggregate, `index` names the column to group by, and `aggfunc` names the aggregation to apply (it defaults to the mean if omitted):


In [None]:
!pip install pandas

In [None]:
import pandas as pd

# Sample DataFrame
data = {
    'Department': ['Sales', 'Sales', 'HR', 'HR', 'IT', 'IT'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
    'Salary': [70000, 80000, 60000, 65000, 75000, 77000],
    'Bonus': [5000, 7000, 2000, 3000, 4000, 4500]
}
df = pd.DataFrame(data)

# Creating a basic pivot table
pivot = pd.pivot_table(df, values='Salary', index='Department', aggfunc='mean')

print("Pivot Table - Average Salary by Department:")
print(pivot)
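
For readers who already know `groupby`, it may help to see that a single-index pivot table with one aggregation produces the same numbers as a grouped aggregation. The following is a minimal sketch that rebuilds the sample data so it runs on its own; the variable name `grouped` is introduced here for illustration only.

```python
import pandas as pd

# Same sample data as in the cell above, repeated so this block is self-contained
df = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'HR', 'HR', 'IT', 'IT'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
    'Salary': [70000, 80000, 60000, 65000, 75000, 77000],
    'Bonus': [5000, 7000, 2000, 3000, 4000, 4500]
})

# Pivot table: average salary per department
pivot = pd.pivot_table(df, values='Salary', index='Department', aggfunc='mean')

# Equivalent grouped aggregation (illustrative comparison)
grouped = df.groupby('Department')[['Salary']].mean()

# Both approaches should report the same average for every department
print((pivot['Salary'] == grouped['Salary']).all())
```

`pivot_table` tends to be the more convenient choice once you also want to spread a second key across the columns or add totals, while `groupby` is often enough for a one-dimensional summary.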


### Adding Multiple Aggregations

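To apply different aggregation functions to different columns, pass a dictionary to `aggfunc` that maps each column name to a function or a list of functions. Here, `Salary` gets both its mean and maximum, while `Bonus` is summed. The resulting columns have two levels: the column name and the aggregation name.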


In [None]:
# Pivot table with multiple aggregation functions
pivot_multi = pd.pivot_table(df, values=['Salary', 'Bonus'], index='Department', aggfunc={'Salary': ['mean', 'max'], 'Bonus': 'sum'})

print("\nPivot Table - Multiple Aggregations:")
print(pivot_multi)
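
Two-level column labels can be awkward to work with when exporting or selecting columns. One common approach, sketched here under the assumption that names such as `Salary_mean` are acceptable, is to join the two levels into a single string. The variable `flat` is a name chosen for this example.

```python
import pandas as pd

# Same sample data as in the tutorial, repeated so this block is self-contained
df = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'HR', 'HR', 'IT', 'IT'],
    'Salary': [70000, 80000, 60000, 65000, 75000, 77000],
    'Bonus': [5000, 7000, 2000, 3000, 4000, 4500]
})

pivot_multi = pd.pivot_table(
    df,
    values=['Salary', 'Bonus'],
    index='Department',
    aggfunc={'Salary': ['mean', 'max'], 'Bonus': 'sum'}
)

# Join the (column, aggregation) pairs into single labels such as 'Salary_mean'
flat = pivot_multi.copy()
flat.columns = ['_'.join(col) for col in pivot_multi.columns]

print(flat)
```

After flattening, a single column can be selected with ordinary syntax, for example `flat['Salary_mean']`.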


## Use Case: Analyzing Survey Data

For this use case, we will analyze employee survey data to see how satisfaction, work-life balance, and tenure vary across a company. The sample dataset contains responses from four departments, with scores recorded as small integers.


### Step 1: Loading the Survey Data

Let's start by loading a sample survey dataset into a DataFrame:


In [None]:
# Sample survey data
survey_data = {
    'Department': ['Sales', 'Sales', 'HR', 'HR', 'IT', 'IT', 'Marketing', 'Marketing'],
    'Satisfaction': [4, 3, 5, 4, 4, 5, 3, 2],
    'Work-Life Balance': [3, 2, 4, 4, 5, 5, 3, 3],
    'Years at Company': [5, 4, 6, 5, 7, 6, 2, 3]
}
survey_df = pd.DataFrame(survey_data)

print("Survey Data:")
print(survey_df.head())  # head() shows only the first 5 of the 8 rows


### Step 2: Creating a Pivot Table

Now, let’s create a pivot table to summarize the average satisfaction and work-life balance scores by department:


In [None]:
# Pivot table to summarize average satisfaction and work-life balance by department
pivot_survey = pd.pivot_table(survey_df, values=['Satisfaction', 'Work-Life Balance'], index='Department', aggfunc='mean')

print("\nPivot Table - Average Satisfaction and Work-Life Balance by Department:")
print(pivot_survey)
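
Department averages are easier to interpret next to a company-wide baseline. The sketch below uses the `margins=True` option of `pivot_table`, which appends a row (labelled `All` by default) containing the aggregate over all respondents. The variable name `pivot_with_total` is introduced here for illustration.

```python
import pandas as pd

# Same survey data as in Step 1, repeated so this block is self-contained
survey_df = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'HR', 'HR', 'IT', 'IT', 'Marketing', 'Marketing'],
    'Satisfaction': [4, 3, 5, 4, 4, 5, 3, 2],
    'Work-Life Balance': [3, 2, 4, 4, 5, 5, 3, 3],
    'Years at Company': [5, 4, 6, 5, 7, 6, 2, 3]
})

# margins=True adds an 'All' row with the mean across every respondent
pivot_with_total = pd.pivot_table(
    survey_df,
    values=['Satisfaction', 'Work-Life Balance'],
    index='Department',
    aggfunc='mean',
    margins=True
)

print(pivot_with_total)
```

Note that the `All` row is computed from the individual responses, not by averaging the department averages, so it stays correct even when departments have different numbers of respondents.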


### Step 3: Analyzing Tenure and Satisfaction

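Next, we group the responses by years at the company to see how average satisfaction varies with tenure. With only eight responses, each tenure group contains just one or two people, so treat the result as an illustration of the technique and not as evidence of a trend: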


In [None]:
# Pivot table to analyze the relationship between tenure and satisfaction
pivot_tenure_satisfaction = pd.pivot_table(survey_df, values='Satisfaction', index='Years at Company', aggfunc='mean')

print("\nPivot Table - Average Satisfaction by Years at Company:")
print(pivot_tenure_satisfaction)
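
Because an average over one or two people can be misleading, it can help to report the group size alongside the mean. The sketch below passes a list to `aggfunc` so that each tenure group shows both values. The variable name `pivot_tenure_detail` is an assumption made for this example.

```python
import pandas as pd

# Same survey data as in Step 1, repeated so this block is self-contained
survey_df = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'HR', 'HR', 'IT', 'IT', 'Marketing', 'Marketing'],
    'Satisfaction': [4, 3, 5, 4, 4, 5, 3, 2],
    'Work-Life Balance': [3, 2, 4, 4, 5, 5, 3, 3],
    'Years at Company': [5, 4, 6, 5, 7, 6, 2, 3]
})

# A list of aggregations yields one column group per function
pivot_tenure_detail = pd.pivot_table(
    survey_df,
    values='Satisfaction',
    index='Years at Company',
    aggfunc=['mean', 'count']
)

print(pivot_tenure_detail)
```

The `count` column makes it clear how many responses support each average, which is worth checking before drawing conclusions from any single row.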
