# 🧩 Lesson 4: GroupBy, Aggregations & Pivot Tables

This lesson covers:
- Summarizing datasets
- Aggregating grouped data
- Pivot tables for multi-dimensional analysis

These tools are central to reporting, dashboarding, and business analytics.


In [1]:
import pandas as pd

sales = {
    'Store': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
    'Product': ['Laptop', 'Keyboard', 'Mouse', 'Laptop', 'Mouse', 'Laptop', 'Keyboard', 'Mouse'],
    'Units_Sold': [10, 50, 80, 8, 40, 15, 60, 90],
    'Price': [80000, 1200, 600, 82000, 650, 78000, 1100, 550]
}

df = pd.DataFrame(sales)
df


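|   | Store | Product | Units_Sold | Price |
|---|-------|----------|-----------:|------:|
| 0 | A | Laptop | 10 | 80000 |
| 1 | A | Keyboard | 50 | 1200 |
| 2 | A | Mouse | 80 | 600 |
| 3 | B | Laptop | 8 | 82000 |
| 4 | B | Mouse | 40 | 650 |
| 5 | C | Laptop | 15 | 78000 |
| 6 | C | Keyboard | 60 | 1100 |
| 7 | C | Mouse | 90 | 550 |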


### Basic Grouping

In [5]:
df.groupby('Store')['Units_Sold'].sum()


Store
A    140
B     48
C    165
Name: Units_Sold, dtype: int64

In [4]:
df.groupby('Product')['Units_Sold'].sum()


Product
Keyboard    110
Laptop       33
Mouse       210
Name: Units_Sold, dtype: int64

### Why GroupBy?
GroupBy lets us answer questions like:
- Which store is performing best?
- Which product sells the most?
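As a minimal sketch, both questions can be answered by taking the label of the largest grouped total with `idxmax`. The snippet rebuilds the lesson's sample data so it runs on its own. "Performing best" is assumed here to mean the most units sold, which is only one possible definition.

```python
import pandas as pd

# Rebuild the lesson's sample data so this snippet is self-contained.
df = pd.DataFrame({
    'Store': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
    'Product': ['Laptop', 'Keyboard', 'Mouse', 'Laptop', 'Mouse', 'Laptop', 'Keyboard', 'Mouse'],
    'Units_Sold': [10, 50, 80, 8, 40, 15, 60, 90],
    'Price': [80000, 1200, 600, 82000, 650, 78000, 1100, 550],
})

# Assumption: "performing best" means the most units sold.
units_by_store = df.groupby('Store')['Units_Sold'].sum()
best_store = units_by_store.idxmax()        # index label of the largest total

units_by_product = df.groupby('Product')['Units_Sold'].sum()
best_product = units_by_product.idxmax()

print(best_store, units_by_store.max())      # C 165
print(best_product, units_by_product.max())  # Mouse 210
```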


### Aggregations (Multiple Calculations at Once)

In [8]:
df.groupby('Store').agg({
    'Units_Sold': 'sum',
    'Price': 'mean'
})

| Store | Units_Sold | Price |
|-------|-----------:|------:|
| A | 140 | 27266.666667 |
| B | 48 | 41325.0 |
| C | 165 | 26550.0 |


In [9]:
df.groupby(['Store', 'Product']).agg({
    'Units_Sold': ['sum', 'mean'],
    'Price': ['max', 'min']
})

| Store | Product | Units_Sold (sum) | Units_Sold (mean) | Price (max) | Price (min) |
|-------|----------|-----------------:|------------------:|------------:|------------:|
| A | Keyboard | 50 | 50.0 | 1200 | 1200 |
| A | Laptop | 10 | 10.0 | 80000 | 80000 |
| A | Mouse | 80 | 80.0 | 600 | 600 |
| B | Laptop | 8 | 8.0 | 82000 | 82000 |
| B | Mouse | 40 | 40.0 | 650 | 650 |
| C | Keyboard | 60 | 60.0 | 1100 | 1100 |
| C | Laptop | 15 | 15.0 | 78000 | 78000 |
| C | Mouse | 90 | 90.0 | 550 | 550 |
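Passing lists of functions to `agg` produces two-level column headers, which can be awkward to reference later. One possible alternative, sketched here, is pandas' named aggregation, where each output column is given as `name=(source_column, function)`. The output names `total_units`, `avg_price`, and `max_price` are illustrative choices, not part of the lesson.

```python
import pandas as pd

# Rebuild the lesson's sample data so this snippet is self-contained.
df = pd.DataFrame({
    'Store': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
    'Product': ['Laptop', 'Keyboard', 'Mouse', 'Laptop', 'Mouse', 'Laptop', 'Keyboard', 'Mouse'],
    'Units_Sold': [10, 50, 80, 8, 40, 15, 60, 90],
    'Price': [80000, 1200, 600, 82000, 650, 78000, 1100, 550],
})

# Named aggregation: output_name=(source_column, aggregation).
# The output names below are hypothetical; pick whatever reads well in a report.
summary = df.groupby('Store').agg(
    total_units=('Units_Sold', 'sum'),
    avg_price=('Price', 'mean'),
    max_price=('Price', 'max'),
)

print(summary)                 # one row per store, flat column names
print(list(summary.columns))   # ['total_units', 'avg_price', 'max_price']
```

The result has a single header row, so a column can be selected directly with `summary['total_units']`.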


### Compute Revenue Column

In [11]:
df['Revenue'] = df['Units_Sold'] * df['Price']
df

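|   | Store | Product | Units_Sold | Price | Revenue |
|---|-------|----------|-----------:|------:|--------:|
| 0 | A | Laptop | 10 | 80000 | 800000 |
| 1 | A | Keyboard | 50 | 1200 | 60000 |
| 2 | A | Mouse | 80 | 600 | 48000 |
| 3 | B | Laptop | 8 | 82000 | 656000 |
| 4 | B | Mouse | 40 | 650 | 26000 |
| 5 | C | Laptop | 15 | 78000 | 1170000 |
| 6 | C | Keyboard | 60 | 1100 | 66000 |
| 7 | C | Mouse | 90 | 550 | 49500 |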


### Business Questions
Answer each of the following questions in your notebook.

Total revenue per store

In [13]:
df.groupby('Store')['Revenue'].sum()

Store
A     908000
B     682000
C    1285500
Name: Revenue, dtype: int64

Which store sells the most laptops?

In [14]:
df[df['Product'] == 'Laptop'].groupby('Store')['Units_Sold'].sum()

Store
A    10
B     8
C    15
Name: Units_Sold, dtype: int64

Average price per product

In [15]:
df.groupby('Product')['Price'].mean()

Product
Keyboard     1150.0
Laptop      80000.0
Mouse         600.0
Name: Price, dtype: float64

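Top revenue product per store (all store-product pairs ranked by revenue; the first row listed for each store is its top product)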

In [16]:
df.groupby(['Store', 'Product'])['Revenue'].sum().sort_values(ascending=False)

Store  Product 
C      Laptop      1170000
A      Laptop       800000
B      Laptop       656000
C      Keyboard      66000
A      Keyboard      60000
C      Mouse         49500
A      Mouse         48000
B      Mouse         26000
Name: Revenue, dtype: int64
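The sorted listing ranks every store-product pair, so the top product per store still has to be read off by eye. A minimal sketch of one way to return exactly one row per store is to use `idxmax` within each store. The snippet rebuilds the sample data so it runs on its own; the variable names `by_pair` and `top_per_store` are illustrative.

```python
import pandas as pd

# Rebuild the lesson's sample data, including the Revenue column.
df = pd.DataFrame({
    'Store': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
    'Product': ['Laptop', 'Keyboard', 'Mouse', 'Laptop', 'Mouse', 'Laptop', 'Keyboard', 'Mouse'],
    'Units_Sold': [10, 50, 80, 8, 40, 15, 60, 90],
    'Price': [80000, 1200, 600, 82000, 650, 78000, 1100, 550],
})
df['Revenue'] = df['Units_Sold'] * df['Price']

# Total revenue per (Store, Product) pair, kept as ordinary columns.
by_pair = df.groupby(['Store', 'Product'], as_index=False)['Revenue'].sum()

# For each store, find the row label with the highest revenue,
# then select those rows from the summary table.
top_rows = by_pair.groupby('Store')['Revenue'].idxmax()
top_per_store = by_pair.loc[top_rows]

print(top_per_store)  # one row per store: Laptop is the top product in A, B, and C
```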

### Pivot Tables (Excel-Style Analysis Inside pandas)

In [17]:
pd.pivot_table(df,
                values='Revenue',
                index='Store',
                columns='Product',
                aggfunc='sum',
                fill_value=0
)

| Store / Product | Keyboard | Laptop | Mouse |
|-----------------|---------:|--------:|------:|
| A | 60000 | 800000 | 48000 |
| B | 0 | 656000 | 26000 |
| C | 66000 | 1170000 | 49500 |


### Pivot Tables = Instant Business Summary
Pivot tables let us summarize metrics across multiple dimensions.
This is how analysts generate reports and dashboards.
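To make the "multiple dimensions" idea concrete, the sketch below shows that a pivot table holds the same numbers as a two-key `groupby` followed by `unstack`, and that `margins=True` appends row and column totals of the kind a report usually needs. The snippet rebuilds the sample data so it runs on its own; the label `'Total'` is an illustrative choice passed through `margins_name`.

```python
import pandas as pd

# Rebuild the lesson's sample data, including the Revenue column.
df = pd.DataFrame({
    'Store': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
    'Product': ['Laptop', 'Keyboard', 'Mouse', 'Laptop', 'Mouse', 'Laptop', 'Keyboard', 'Mouse'],
    'Units_Sold': [10, 50, 80, 8, 40, 15, 60, 90],
    'Price': [80000, 1200, 600, 82000, 650, 78000, 1100, 550],
})
df['Revenue'] = df['Units_Sold'] * df['Price']

# The pivot table from the lesson: stores as rows, products as columns.
pivot = pd.pivot_table(df, values='Revenue', index='Store',
                       columns='Product', aggfunc='sum', fill_value=0)

# The same numbers via groupby: group on both keys, then move
# the Product level from the row index into the columns.
via_groupby = (df.groupby(['Store', 'Product'])['Revenue']
                 .sum()
                 .unstack(fill_value=0))
same_values = bool((pivot.to_numpy() == via_groupby.to_numpy()).all())
print(same_values)

# margins=True adds a totals row and a totals column.
with_totals = pd.pivot_table(df, values='Revenue', index='Store',
                             columns='Product', aggfunc='sum', fill_value=0,
                             margins=True, margins_name='Total')
print(with_totals)
```

In `with_totals`, the `Total` column repeats the per-store revenue figures and the `Total` row repeats the per-product figures, so one table covers both views.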


### Exercise
1. Show Units_Sold by Product for each Store
2. Show Average Price across stores for each Product
3. Find which product generated the highest total revenue overall


In [25]:
pd.pivot_table(df,
                index="Store",
                columns="Product",
                values="Units_Sold",
                aggfunc="sum",
                fill_value=0
                )

| Store / Product | Keyboard | Laptop | Mouse |
|-----------------|---------:|-------:|------:|
| A | 50 | 10 | 80 |
| B | 0 | 8 | 40 |
| C | 60 | 15 | 90 |


In [26]:
pd.pivot_table(df,
                index="Product",
                values="Price",
                aggfunc="mean"
                )

| Product | Price |
|----------|--------:|
| Keyboard | 1150.0 |
| Laptop | 80000.0 |
| Mouse | 600.0 |


In [27]:
df.groupby("Product")["Revenue"].sum().sort_values(ascending=False)

Product
Laptop      2626000
Keyboard     126000
Mouse        123500
Name: Revenue, dtype: int64

### ✅ Summary

- GroupBy lets us group data and compute summaries.
- Aggregations allow multiple metrics at once (sum, mean, max, etc.).
- Pivot tables provide multi-dimensional analytics similar to Excel.
- These operations allow us to answer real business questions.
