# Homework Assignment: Data Aggregation with Pandas


## Introduction
In this assignment, you will practice data aggregation with the Pandas library: you will group rows with `groupby` and apply window functions to a sales dataset.



## Objectives
- Practice using the `groupby` function for data aggregation.
- Understand and apply rolling window functions.
- Explore expanding window functions for cumulative statistics.
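
As a quick orientation for the three objectives above, here is a minimal sketch on a made-up four-row frame (the `toy` data and its values are assumptions for illustration, not the assignment dataset). It contrasts a grouped total, a rolling mean over a fixed number of rows, and an expanding mean over all rows seen so far.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "Region": ["East", "East", "West", "West"],
    "Sales": [10, 20, 30, 50],
})

# groupby: one total per region
totals = toy.groupby("Region")["Sales"].sum()

# rolling: mean over a sliding window of the current row and the row before it
rolling_mean = toy["Sales"].rolling(window=2).mean()

# expanding: mean over every row up to and including the current one
expanding_mean = toy["Sales"].expanding().mean()

print(totals)
print(rolling_mean.tolist())
print(expanding_mean.tolist())
```

The rolling result starts with `NaN` because the first row does not yet have a full window, while the expanding result is defined from the first row onward.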


**Load Data**: `sales_region_hw.csv`

In [1]:
import pandas as pd

In [2]:
sales = pd.read_csv('/content/sales_region_hw.csv')

### Task 1: Group By Region and Category
- Perform the following operations and answer the questions below:

In [68]:
# 1. Group the data by 'Region' and 'Category'. What is the total sales amount for each group?
sales.groupby(['Region', 'Category'])['Sales'].sum()
# The result is a Series indexed by (Region, Category);
# its values are the summed 'Sales' for each group

Region,Category,Sales
East,Clothing,18883
East,Electronics,15060
East,Furniture,12108
East,Groceries,17706
North,Clothing,14481
North,Electronics,15253
North,Furniture,18653
North,Groceries,13188
South,Clothing,14143
South,Electronics,19557
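
The grouped result above is a Series with a two-level index. If a flat or wide table is easier to read, one possible approach is sketched below on invented data (the `toy` frame is an assumption, not the homework file): `reset_index()` turns the index levels back into columns, and `unstack()` pivots one level into columns.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "Region": ["East", "East", "East", "North", "North"],
    "Category": ["Clothing", "Clothing", "Furniture", "Clothing", "Furniture"],
    "Sales": [100, 50, 70, 30, 40],
})

# Series indexed by (Region, Category)
totals = toy.groupby(["Region", "Category"])["Sales"].sum()

# Flat table: one row per group, with Region and Category as ordinary columns
flat = totals.reset_index()

# Wide table: one row per Region, one column per Category
wide = totals.unstack("Category")

print(flat)
print(wide)
```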


In [69]:
# 2. Which 'Region' and 'Category' combination has the highest average quantity sold?
# Compute the average quantity per (Region, Category) group once and reuse it
avg_quantity = sales.groupby(['Region', 'Category'])['Quantity'].mean()
# idxmax() returns the index label of the maximum value; here the label is a (Region, Category) tuple
region_category_max_avg = avg_quantity.idxmax()
max_avg = avg_quantity.max()
print("Region and Category Combination with Highest Avg Quantity Sold:")
print(f"{region_category_max_avg}: {max_avg}")

Region and Category Combination with Highest Avg Quantity Sold:
('North', 'Clothing'): 63.935483870967744
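
To make the relationship between `idxmax()` and `max()` concrete, here is a small sketch on invented data (the `toy` frame and its numbers are assumptions). The label returned by `idxmax()` can be used to look the maximum value back up in the same Series.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "Region": ["East", "East", "North", "North"],
    "Category": ["Clothing", "Furniture", "Clothing", "Clothing"],
    "Quantity": [10, 40, 60, 70],
})

avg_quantity = toy.groupby(["Region", "Category"])["Quantity"].mean()

# Label of the largest average: a (Region, Category) tuple
best_combo = avg_quantity.idxmax()

# Looking that label up gives the same number as avg_quantity.max()
best_value = avg_quantity.loc[best_combo]

print(best_combo, best_value)
```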


In [22]:
# 3. How many unique 'Category' entries are there for each 'Region'?
sales.groupby('Region')['Category'].nunique()

Region,Category
East,4
North,4
South,4
West,4
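
When several per-region statistics are needed at once, named aggregation can collect them in one pass. The sketch below uses invented data and hypothetical output column names (`n_categories`, `total_sales`); it is one possible way to combine `nunique` with another aggregate, not part of the assignment's required answer.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "Region": ["East", "East", "East", "North"],
    "Category": ["Clothing", "Clothing", "Furniture", "Clothing"],
    "Sales": [100, 50, 70, 30],
})

# Each keyword is an output column name (hypothetical);
# each value is a (source column, aggregation) pair
summary = toy.groupby("Region").agg(
    n_categories=("Category", "nunique"),
    total_sales=("Sales", "sum"),
)

print(summary)
```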


In [23]:
# 4. For each 'Region', what is the maximum sales value for 'Clothing'?
sales[sales['Category'] == 'Clothing'].groupby('Region')['Sales'].max()

Region,Sales
East,970
North,994
South,995
West,961
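
The cell above filters to 'Clothing' first and then groups. An equivalent route, sketched here on invented data (the `toy` frame is an assumption), is to group by both columns and then take the 'Clothing' cross-section with `xs`. Both give the same per-region maxima.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "Region": ["East", "East", "East", "North", "North"],
    "Category": ["Clothing", "Clothing", "Furniture", "Clothing", "Furniture"],
    "Sales": [100, 250, 900, 300, 400],
})

# Route 1: filter rows with a boolean mask, then group by Region
clothing_max = toy[toy["Category"] == "Clothing"].groupby("Region")["Sales"].max()

# Route 2: group by both columns, then select the 'Clothing' level
all_max = toy.groupby(["Region", "Category"])["Sales"].max()
clothing_max_xs = all_max.xs("Clothing", level="Category")

print(clothing_max)
print(clothing_max_xs)
```

Filtering first touches fewer rows, while the second route is convenient when the maxima for the other categories are needed as well.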


In [32]:
# 5. Calculate the total 'Quantity' for each 'Category' across all 'Regions'. Which 'Category' has the highest total quantity?
# Compute the total quantity per category once and reuse it
category_quantity = sales.groupby('Category')['Quantity'].sum()
highest_quantity_category = category_quantity.idxmax()
highest_quantity_amount = category_quantity.max()

print(f'Category with the highest total quantity: {highest_quantity_category}')
print(f'Total quantity: {highest_quantity_amount}')

Category with the highest total quantity: Electronics
Total quantity: 6529
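
If the full ranking is of interest rather than only the top category, sorting the grouped totals is one option. The sketch below uses invented data (the `toy` frame is an assumption) and reads the leader off the first position of the sorted Series.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "Category": ["Electronics", "Electronics", "Clothing", "Furniture"],
    "Quantity": [5, 7, 3, 4],
})

# Totals per category, largest first
ranked = toy.groupby("Category")["Quantity"].sum().sort_values(ascending=False)

top_category = ranked.index[0]
top_total = ranked.iloc[0]

print(ranked)
print(top_category, top_total)
```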


### Task 2: Rolling Window Function
- Perform the following operations and answer the questions below:

In [36]:
# Preview the first five rows of the data, ordered by 'Date'
sales.head().sort_values('Date')

Unnamed: 0,Date,Region,Category,Sales,Quantity
0,2021-02-14,East,Furniture,715,33
1,2021-02-17,South,Electronics,59,15
3,2021-03-06,North,Furniture,353,88
4,2021-03-09,South,Electronics,579,33
2,2021-04-28,West,Furniture,955,62


In [66]:
# Perform the following operations and answer the questions below:
# 1. Calculate a 7-day rolling average of 'Sales'. On which date does the East region reach its highest 7-day rolling average of sales?

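# Sort by Region, then Date, so each region's rows are contiguous and in chronological order
sales.sort_values(['Region', 'Date'], inplace=True)

# window=7 counts rows, not calendar days: each value averages a row with the 6 rows before it.
# The rolling runs over the whole sorted frame, so the first 6 rows of each later region
# still include rows from the previous region. The first 6 rows overall are NaN (min_periods=7).
sales['Rolling_Avg_7_Days'] = sales['Sales'].rolling(window=7, min_periods=7).mean()

# Keep only the East rows
east_sales = sales[sales['Region'] == 'East']

# Index label and value of the highest rolling average in the East region.
# idxmax() returns an index label, so it is a date only when the frame is indexed by 'Date'.
east_sales_max_avg_id = east_sales['Rolling_Avg_7_Days'].idxmax()
east_sales_max_avg_amount = east_sales['Rolling_Avg_7_Days'].max()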

print(f'Date with the highest 7-day rolling average of sales in the East region: {east_sales_max_avg_id}')
print(f'Highest 7-day rolling average of sales in the East region: {east_sales_max_avg_amount}')


Date with the highest 7-day rolling average of sales in the East region: 2021-03-14 00:00:00
Highest 7-day rolling average of sales in the East region: 747.0
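
The cell above uses a window of 7 rows. If a window of 7 calendar days computed separately within each region is wanted instead, one possible approach is a time-based window on a datetime index, sketched below on invented data (the `toy` frame, its dates, and its values are assumptions, and the numbers it prints are unrelated to the homework output). It assumes 'Date' can be parsed with `pd.to_datetime` and that each region's dates are sorted.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-03", "2021-01-10", "2021-01-02", "2021-01-04"],
    "Region": ["East", "East", "East", "West", "West"],
    "Sales": [100, 300, 500, 1000, 2000],
})
toy["Date"] = pd.to_datetime(toy["Date"])
toy = toy.sort_values(["Region", "Date"])

# "7D" is a time offset: each window covers the 7 days ending at the row's date.
# Grouping first keeps every window inside a single region.
rolling_7d = (
    toy.set_index("Date")
       .groupby("Region")["Sales"]
       .rolling("7D")
       .mean()
)

# The result is indexed by (Region, Date); select East and find its peak
east = rolling_7d.loc["East"]
print(east.idxmax(), east.max())
```

Because the index of `east` is the date, `idxmax()` returns a timestamp directly. With a time-based window there is no warm-up `NaN`: the default `min_periods` for an offset window is 1.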


In [67]:
# 2. What is the overall average of the 7-day rolling sales amounts for each region?
sales.groupby('Region')['Rolling_Avg_7_Days'].mean()

Unnamed: 0_level_0,Rolling_Avg_7_Days
Region,Unnamed: 1_level_1
East,483.732283
North,489.729143
South,464.026578
West,533.783818
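
One detail worth knowing when averaging a rolling column: `mean()` skips `NaN` values by default, so warm-up rows without a full window do not pull the average down. A minimal sketch on invented data (the `toy` frame and the `Rolling` column name are assumptions):

```python
import pandas as pd

# Toy data, invented for illustration only; NaN marks rows without a full window
toy = pd.DataFrame({
    "Region": ["East", "East", "East", "West", "West"],
    "Rolling": [float("nan"), 10.0, 20.0, float("nan"), 40.0],
})

# mean() ignores NaN, so East averages only 10 and 20
region_means = toy.groupby("Region")["Rolling"].mean()

# count() reports how many non-NaN values each mean is based on
region_counts = toy.groupby("Region")["Rolling"].count()

print(region_means)
print(region_counts)
```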


