In [1]:
# Import necessary libraries
import numpy as np
import pandas as pd

## 📊 GroupBy in Pandas

`groupby()` splits a DataFrame into groups based on the values in one or more columns, then applies an aggregate operation such as sum, mean, or count to each group. This makes it the standard tool for summarizing a dataset by category.

**In this section, you will learn:**
- How to group data using one or more columns
- How to apply aggregation functions on grouped data
- How to interpret the results

In [2]:
# Create a simple dataset with 'Category', 'Store', and 'Sales'
data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Store': ['X', 'X', 'Y', 'Y', 'X', 'Y', 'Y', 'X'],
    'Sales': [100, 200, 150, 120, 130, 160, 170, 190]
}

# Convert to DataFrame
df = pd.DataFrame(data)

In [3]:
# Preview the dataset
print(df)

  Category Store  Sales
0        A     X    100
1        B     X    200
2        A     Y    150
3        B     Y    120
4        A     X    130
5        B     Y    160
6        A     Y    170
7        B     X    190


In [4]:
# Group by 'Category' and calculate the total sales in each category
category_sales = df.groupby('Category')['Sales'].sum()
print(category_sales)

Category
A    550
B    670
Name: Sales, dtype: int64


In [5]:
# Group by 'Store' and calculate the total sales per store
store_sales = df.groupby('Store')['Sales'].sum()
print(store_sales)

Store
X    620
Y    600
Name: Sales, dtype: int64


In [6]:
# Group by both 'Category' and 'Store', and calculate the total sales
category_store_sales = df.groupby(['Category', 'Store'])['Sales'].sum()
print(category_store_sales)

Category  Store
A         X        230
          Y        320
B         X        390
          Y        280
Name: Sales, dtype: int64
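
Grouping by two columns returns a Series with a two-level index (`Category`, then `Store`), which is why `A` and `B` each appear only once in the printout. The sketch below shows a few common ways to read such a result. It rebuilds the dataset so it runs on its own; the names `a_x_total`, `sales_table`, and `sales_flat` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Same dataset as above, rebuilt so this block is self-contained
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Store': ['X', 'X', 'Y', 'Y', 'X', 'Y', 'Y', 'X'],
    'Sales': [100, 200, 150, 120, 130, 160, 170, 190],
})

category_store_sales = df.groupby(['Category', 'Store'])['Sales'].sum()

# Look up a single (Category, Store) pair with a tuple key
a_x_total = category_store_sales.loc[('A', 'X')]
print(a_x_total)  # 230

# Move the inner index level ('Store') into columns to get a small table
sales_table = category_store_sales.unstack()
print(sales_table)

# Turn both index levels back into ordinary columns
sales_flat = category_store_sales.reset_index()
print(sales_flat)
```

`unstack()` is convenient for reading the totals side by side, while `reset_index()` is usually the better choice when the result needs to be merged or saved as a flat table.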


## 📈 Aggregation

Aggregation functions summarize data. They can be applied to a whole column, as in the cells that follow, or to each group produced by `groupby()`.

**Common aggregation functions:**
- `sum()`: Total
- `mean()`: Average
- `min()`, `max()`: Minimum and maximum values
- `count()`: Number of entries
- `std()`: Standard deviation
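
Each function in the list can be called directly on a grouped column, in the same way `sum()` was used earlier. A minimal sketch, assuming the same dataset (rebuilt here so the block runs on its own); the variable names are illustrative:

```python
import pandas as pd

# Same dataset as above, rebuilt so this block is self-contained
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Store': ['X', 'X', 'Y', 'Y', 'X', 'Y', 'Y', 'X'],
    'Sales': [100, 200, 150, 120, 130, 160, 170, 190],
})

# Group once, then reuse the grouped object for several aggregations
grouped_sales = df.groupby('Category')['Sales']

category_mean = grouped_sales.mean()    # average sale per category
category_max = grouped_sales.max()      # largest sale per category
category_count = grouped_sales.count()  # number of rows per category

print(category_mean)
print(category_max)
print(category_count)
```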

In [7]:
# Calculate the mean (average) of all sales
average_sales = df['Sales'].mean()
print(f"Average Sales: {average_sales:.2f}")

Average Sales: 152.50


In [8]:
# Apply multiple aggregation functions on the 'Sales' column
# (the result is one float64 Series, so integer results such as count print as 8.000000)
aggregated = df['Sales'].agg(['sum', 'mean', 'min', 'max', 'count', 'std'])
print(aggregated)

sum      1220.000000
mean      152.500000
min       100.000000
max       200.000000
count       8.000000
std        34.537764
Name: Sales, dtype: float64
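
`agg()` also works on grouped data, where it returns one row per group and one column per function. The sketch below is an illustration under the same dataset (rebuilt so it runs on its own). The second call uses pandas named aggregation; the output column names `total_sales` and `average_sales` are hypothetical names chosen for this example.

```python
import pandas as pd

# Same dataset as above, rebuilt so this block is self-contained
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Store': ['X', 'X', 'Y', 'Y', 'X', 'Y', 'Y', 'X'],
    'Sales': [100, 200, 150, 120, 130, 160, 170, 190],
})

# Several aggregations per category: one row per group, one column per function
category_summary = df.groupby('Category')['Sales'].agg(
    ['sum', 'mean', 'min', 'max', 'count']
)
print(category_summary)

# Named aggregation: choose the output column names explicitly
store_summary = df.groupby('Store').agg(
    total_sales=('Sales', 'sum'),
    average_sales=('Sales', 'mean'),
)
print(store_summary)
```

Named aggregation is worth considering when the summary is passed on to other code, since the column names then describe the result rather than the function that produced it.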


## 🚀 Next Steps
* Continue with the next notebook: **6_PivotTables.ipynb**