# Day 8: Grouping and Aggregation with Pandas

Welcome to Day 8! Today, you'll learn one of the most powerful features in Pandas: grouping data and calculating aggregates for each group. This is fundamental to data analysis because it lets you reduce many rows to a handful of summary statistics, such as totals, averages, and counts per category.

As always, let's start by importing the necessary libraries.

In [None]:
import pandas as pd
import numpy as np

---

## Part 1: Basic Grouping and Aggregation

In this section, we'll perform simple grouping operations on a sample dataset.

First, let's create a DataFrame to work with.

In [None]:
data = {'Category': ['Fruit', 'Vegetable', 'Fruit', 'Fruit', 'Vegetable', 'Dairy'],
        'Product': ['Apple', 'Carrot', 'Banana', 'Orange', 'Broccoli', 'Milk'],
        'Sales': [100, 50, 120, 90, 60, 80],
        'Quantity': [10, 5, 12, 9, 6, 8]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

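**Exercise 1.1:** Group the DataFrame by the 'Category' column and print the result. (Note: `groupby` returns a `DataFrameGroupBy` object. Printing it shows only the object's representation, not the grouped rows, because no aggregation has been applied yet.)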

In [None]:
# Your code here

**Solution 1.1:**

In [None]:
# Solution
grouped_by_category = df.groupby('Category')
print(grouped_by_category)
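
To actually look inside the groups, one option is to iterate over the `groupby` object or use its `.groups` attribute and `.get_group()` method. The cell below is a minimal sketch that rebuilds the same sample data so it runs on its own; the variable names `grouped`, `group_labels`, and `fruit` are illustrative only.

```python
import pandas as pd

# Same sample data as above, rebuilt so this cell runs on its own.
df = pd.DataFrame({
    'Category': ['Fruit', 'Vegetable', 'Fruit', 'Fruit', 'Vegetable', 'Dairy'],
    'Product': ['Apple', 'Carrot', 'Banana', 'Orange', 'Broccoli', 'Milk'],
    'Sales': [100, 50, 120, 90, 60, 80],
    'Quantity': [10, 5, 12, 9, 6, 8],
})
grouped = df.groupby('Category')

# Iterating yields (group key, sub-DataFrame) pairs.
for name, group in grouped:
    print(f"--- {name} ---")
    print(group)

# .groups maps each group key to the row labels that belong to it.
group_labels = {key: list(labels) for key, labels in grouped.groups.items()}
print(group_labels)

# .get_group() returns the rows of a single group as a DataFrame.
fruit = grouped.get_group('Fruit')
print(fruit)
```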

**Exercise 1.2:** Now, use the `groupby` object to calculate the total `Sales` for each category.

In [None]:
# Your code here

**Solution 1.2:**

In [None]:
# Solution
total_sales_per_category = grouped_by_category['Sales'].sum()
print("Total Sales per Category:")
print(total_sales_per_category)

**Exercise 1.3:** Calculate the average `Quantity` sold for each category.

In [None]:
# Your code here

**Solution 1.3:**

In [None]:
# Solution
avg_quantity_per_category = df.groupby('Category')['Quantity'].mean()
print("Average Quantity per Category:")
print(avg_quantity_per_category)

**Exercise 1.4:** Count the number of products in each category.

In [None]:
# Your code here

**Solution 1.4:**

In [None]:
# Solution
product_count_per_category = df.groupby('Category')['Product'].count()
print("Product Count per Category:")
print(product_count_per_category)
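
Note that `.count()` counts non-null values in the selected column, while `.size()` counts rows. On the sample data the two agree because nothing is missing. The sketch below uses a small hypothetical DataFrame (`df_missing`, not part of the exercises) with one missing product name to show where they differ.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one Product value is missing.
df_missing = pd.DataFrame({
    'Category': ['Fruit', 'Fruit', 'Dairy'],
    'Product': ['Apple', np.nan, 'Milk'],
})

# count() skips the missing value in 'Product'.
non_null_counts = df_missing.groupby('Category')['Product'].count()
print(non_null_counts)

# size() counts every row in the group, missing values included.
row_counts = df_missing.groupby('Category').size()
print(row_counts)
```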

---

## Part 2: Advanced Aggregation with .agg()

The `.agg()` method lets you apply several aggregation functions in a single call. You can apply a list of functions to one column, or different functions to different columns.

**Exercise 2.1:** For each 'Category', calculate the sum of 'Sales' and the mean of 'Quantity' in a single operation.

In [None]:
# Your code here

**Solution 2.1:**

In [None]:
# Solution
agg_results = df.groupby('Category').agg({
    'Sales': 'sum',
    'Quantity': 'mean'
})
print(agg_results)

**Exercise 2.2:** For the 'Sales' column, find the sum, the mean, and the count (the number of products) for each category.

In [None]:
# Your code here

**Solution 2.2:**

In [None]:
# Solution
sales_summary = df.groupby('Category')['Sales'].agg(['sum', 'mean', 'count'])
print(sales_summary)

**Exercise 2.3 (Challenge):** Group by 'Category' and apply different aggregations to different columns. For 'Sales', get the sum and max. For 'Quantity', get the mean and min.

In [None]:
# Your code here

**Solution 2.3:**

In [None]:
# Solution
detailed_agg = df.groupby('Category').agg(
    Sales_Sum=('Sales', 'sum'),
    Sales_Max=('Sales', 'max'),
    Quantity_Mean=('Quantity', 'mean'),
    Quantity_Min=('Quantity', 'min')
)
print(detailed_agg)
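
The solution above uses named aggregation (available in pandas 0.25 and later), which produces flat column names. An alternative is to pass a dictionary of lists, which produces two-level (`MultiIndex`) columns instead. The sketch below shows that form and one possible way to flatten the result; the names `multi` and `flat` are illustrative.

```python
import pandas as pd

# Same sample data as above, rebuilt so this cell runs on its own.
df = pd.DataFrame({
    'Category': ['Fruit', 'Vegetable', 'Fruit', 'Fruit', 'Vegetable', 'Dairy'],
    'Product': ['Apple', 'Carrot', 'Banana', 'Orange', 'Broccoli', 'Milk'],
    'Sales': [100, 50, 120, 90, 60, 80],
    'Quantity': [10, 5, 12, 9, 6, 8],
})

# Dictionary of lists: each column gets its own list of aggregations.
multi = df.groupby('Category').agg({
    'Sales': ['sum', 'max'],
    'Quantity': ['mean', 'min'],
})
# The columns are (column, function) tuples.
print(multi.columns.tolist())

# Join the two levels to get flat names such as 'Sales_sum'.
flat = multi.copy()
flat.columns = ['_'.join(col) for col in multi.columns]
print(flat)
```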

---

## Part 3: Grouping by Multiple Columns

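You can also group by more than one column by passing a list of column names to `groupby`. The result has one entry for each combination of keys that appears in the data, indexed by a multi-level index (`MultiIndex`).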

Let's add a 'Region' column to our DataFrame.

In [None]:
df['Region'] = ['North', 'South', 'North', 'West', 'South', 'West']
print("DataFrame with Region:")
print(df)

**Exercise 3.1:** Group by both 'Region' and 'Category' and calculate the total sales for each combination.

In [None]:
# Your code here

**Solution 3.1:**

In [None]:
# Solution
multi_group_sales = df.groupby(['Region', 'Category'])['Sales'].sum()
print(multi_group_sales)
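
The result is a Series with a two-level index of (Region, Category). Two common ways to reshape it are `.reset_index()`, which turns the index levels back into ordinary columns, and `.unstack()`, which pivots the inner level into columns. The cell below is a minimal sketch that rebuilds the data so it runs on its own; `flat` and `wide` are illustrative names.

```python
import pandas as pd

# Same sample data as above (including 'Region'), rebuilt so this cell runs on its own.
df = pd.DataFrame({
    'Category': ['Fruit', 'Vegetable', 'Fruit', 'Fruit', 'Vegetable', 'Dairy'],
    'Product': ['Apple', 'Carrot', 'Banana', 'Orange', 'Broccoli', 'Milk'],
    'Sales': [100, 50, 120, 90, 60, 80],
    'Quantity': [10, 5, 12, 9, 6, 8],
    'Region': ['North', 'South', 'North', 'West', 'South', 'West'],
})
sales = df.groupby(['Region', 'Category'])['Sales'].sum()

# Select one combination with a tuple of keys.
print(sales.loc[('West', 'Dairy')])

# Turn the index levels back into regular columns.
flat = sales.reset_index()
print(flat)

# Pivot 'Category' into columns; combinations with no rows are filled with 0.
wide = sales.unstack(fill_value=0)
print(wide)
```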

**Exercise 3.2:** Get the size (number of rows) of each group (Region and Category combination).

In [None]:
# Your code here

**Solution 3.2:**

In [None]:
# Solution
group_sizes = df.groupby(['Region', 'Category']).size()
print(group_sizes)
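
If you would rather have the group sizes as a regular table, one option is to chain `.reset_index(name=...)` onto the result to give the count column a name. This is a sketch; the column name `Count` and the variable `size_table` are arbitrary choices.

```python
import pandas as pd

# Only the two grouping columns are needed to count rows per group.
df = pd.DataFrame({
    'Category': ['Fruit', 'Vegetable', 'Fruit', 'Fruit', 'Vegetable', 'Dairy'],
    'Region': ['North', 'South', 'North', 'West', 'South', 'West'],
})

# size() returns a Series; reset_index(name=...) converts it to a DataFrame
# with the group keys as columns and the sizes in a named column.
size_table = df.groupby(['Region', 'Category']).size().reset_index(name='Count')
print(size_table)
```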

---

### Fantastic work!

You've now mastered the fundamentals of `groupby` and aggregation in Pandas. This is a skill you will use constantly in any data analysis task. Tomorrow, we'll look at how to combine different datasets together.