In [13]:
import numpy as np
import pandas as pd

# GroupBy

🔁 What is groupby()?

groupby() lets you split your data into groups, apply a function to each group, and then combine the results. This pattern is known as:

**Split → Apply → Combine**
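To make the three steps concrete, here is a minimal sketch that does the split, apply, and combine by hand on a small hypothetical DataFrame, then checks that a single `groupby()` call gives the same totals. The data values are made up for illustration.

```python
import pandas as pd

# Small hypothetical frame, for illustration only
df = pd.DataFrame({
    "Category": ["A", "A", "B", "B", "A"],
    "Sales": [100, 200, 150, 250, 120],
})

# Split: one sub-DataFrame per category
groups = {key: df[df["Category"] == key] for key in df["Category"].unique()}

# Apply: total Sales of each piece
totals = {key: part["Sales"].sum() for key, part in groups.items()}

# Combine: collect the per-group results into one Series
manual = pd.Series(totals, name="Sales").sort_index()
manual.index.name = "Category"

# groupby() performs all three steps in one call
automatic = df.groupby("Category")["Sales"].sum()

print(manual)
print(manual.equals(automatic))  # True
```

Both approaches give A → 420 and B → 400; `groupby()` just spares you the bookkeeping.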

✅ Basic Syntax:

**df.groupby('column_name')**


On its own, this returns a GroupBy object and does not aggregate anything yet. It becomes useful once you apply an aggregation, for example:

df.groupby('column_name')['target_column'].sum()
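To see what the GroupBy object holds before any aggregation, the sketch below iterates over it and pulls out one group. It uses a small hypothetical frame; `groups` and `get_group()` are standard GroupBy attributes/methods.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Category": ["A", "A", "B", "B", "A"],
    "Sales": [100, 200, 150, 250, 120],
})

# groupby() alone only records how rows are split
grouped = df.groupby("Category")

# Iterating yields (group key, sub-DataFrame) pairs
for name, group in grouped:
    print(name, group["Sales"].tolist())
# A [100, 200, 120]
# B [150, 250]

# .groups maps each key to the row labels belonging to it
print(grouped.groups)

# get_group() returns a single group as a DataFrame
print(grouped.get_group("B"))
```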



💡 Common Aggregation Functions:

| Function   | Description    |
| ---------- | -------------- |
| `.sum()`   | Total          |
| `.mean()`  | Average        |
| `.count()` | Number of non-missing entries |
| `.max()`   | Highest value  |
| `.min()`   | Lowest value   |
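A quick sketch of the functions in the table, on hypothetical data with one missing value. It also shows `.size()`, which is not in the table, to highlight that `.count()` skips missing values while `.size()` counts every row.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing Sales value in group "B"
df = pd.DataFrame({
    "Category": ["A", "A", "B", "B"],
    "Sales": [100.0, 200.0, 150.0, np.nan],
})

g = df.groupby("Category")["Sales"]

print(g.sum())    # A → 300.0, B → 150.0 (NaN is skipped)
print(g.mean())   # A → 150.0, B → 150.0
print(g.count())  # A → 2, B → 1 (non-missing values only)
print(g.size())   # A → 2, B → 2 (all rows)
print(g.max())    # A → 200.0, B → 150.0
print(g.min())    # A → 100.0, B → 150.0
```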


In [4]:
data = {
    'Category': ['A', 'A', 'B', 'B', 'A', 'B', 'A', 'B'],
    'Store': ['S1', 'S2', 'S1', 'S2', 'S1', 'S2', 'S2', 'S1'],
    'Sales': [100, 200, 150, 250, 120, 180, 200, 300],
    'Quantity': [10, 15, 12, 18, 8, 20, 15, 25],
    'Date': pd.date_range('2023-01-01', periods=8)
}

df = pd.DataFrame(data)


In [5]:
df

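|   | Category | Store | Sales | Quantity | Date       |
| - | -------- | ----- | ----- | -------- | ---------- |
| 0 | A        | S1    | 100   | 10       | 2023-01-01 |
| 1 | A        | S2    | 200   | 15       | 2023-01-02 |
| 2 | B        | S1    | 150   | 12       | 2023-01-03 |
| 3 | B        | S2    | 250   | 18       | 2023-01-04 |
| 4 | A        | S1    | 120   | 8        | 2023-01-05 |
| 5 | B        | S2    | 180   | 20       | 2023-01-06 |
| 6 | A        | S2    | 200   | 15       | 2023-01-07 |
| 7 | B        | S1    | 300   | 25       | 2023-01-08 |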


**Q: Group by Category and calculate the total Sales**

In [7]:
cat = df.groupby('Category')['Sales'].sum()
cat

Category
A    620
B    880
Name: Sales, dtype: int64


**Q: Group by Store and calculate the total Sales**

In [8]:
store = df.groupby('Store')['Sales'].sum()
store

Store
S1    670
S2    830
Name: Sales, dtype: int64

🧠 groupby() with Multiple Columns

✅ Syntax:

df.groupby(['col1', 'col2'])['target_col'].agg(func)


This gives you nested grouping, for example the total sales of each Store within each Category.

**Q: Group by Category and Store and calculate the total Sales (multiple columns)**

In [12]:
Catstore = df.groupby(['Category', 'Store'])['Sales'].sum()
Catstore

Category  Store
A         S1       220
          S2       400
B         S1       450
          S2       430
Name: Sales, dtype: int64
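The nested (MultiIndex) result above can be reshaped. This sketch rebuilds the same sample data and `Catstore`, then uses `unstack()` to spread the inner level (Store) into columns, and `reset_index()` to turn both levels back into ordinary columns.

```python
import pandas as pd

# Same sample data as in the notebook (Quantity/Date omitted)
df = pd.DataFrame({
    "Category": ["A", "A", "B", "B", "A", "B", "A", "B"],
    "Store": ["S1", "S2", "S1", "S2", "S1", "S2", "S2", "S1"],
    "Sales": [100, 200, 150, 250, 120, 180, 200, 300],
})

Catstore = df.groupby(["Category", "Store"])["Sales"].sum()

# unstack(): inner index level (Store) becomes columns, giving a Category x Store grid
grid = Catstore.unstack()
print(grid)  # rows A, B; columns S1, S2

# reset_index(): both index levels become regular columns
flat = Catstore.reset_index()
print(flat)  # columns Category, Store, Sales
```

The grid form is often easier to read, while the flat form is handy for further filtering or merging.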

# **Aggregations**

🔗 What is Aggregation?

Aggregation means grouping the data and computing a summary statistic for a column, such as:

- total (`sum`)
- average (`mean`)
- maximum/minimum (`max`/`min`)
- count (`count`)

✅ Common Aggregation Functions:

| Function    | Use Case           |
| ----------- | ------------------ |
| `.sum()`    | Total of values    |
| `.mean()`   | Average value      |
| `.count()`  | Number of non-missing entries |
| `.min()`    | Minimum value      |
| `.max()`    | Maximum value      |
| `.median()` | Middle value       |
| `.std()`    | Standard deviation |
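A small sketch of `.median()` and `.std()` on hypothetical data. Group A contains an outlier (600) to show that the median is less affected than the mean, and the `ddof` argument shows that pandas' `std()` uses the sample formula (`ddof=1`) by default.

```python
import pandas as pd

# Hypothetical data; 600 is an outlier in group A
df = pd.DataFrame({
    "Category": ["A", "A", "A", "B", "B", "B"],
    "Sales": [100, 200, 600, 150, 250, 350],
})

g = df.groupby("Category")["Sales"]

print(g.median())      # A → 200.0, B → 250.0
print(g.mean())        # A → 300.0, B → 250.0 (pulled up by the outlier)

# Sample standard deviation (ddof=1) is the default
print(g.std())         # B → 100.0
# Population standard deviation
print(g.std(ddof=0))
```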


📘 Example 1: Single Aggregation

In [14]:
df.groupby('Category')['Sales'].sum()


Category
A    620
B    880
Name: Sales, dtype: int64

➡️ Gives total Sales per Category.
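By default the group keys become the index of the result. If you would rather keep Category as a regular column (for example, to export a flat table), `as_index=False` does that. A minimal sketch with the notebook's Category/Sales values:

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "A", "B", "B", "A", "B", "A", "B"],
    "Sales": [100, 200, 150, 250, 120, 180, 200, 300],
})

# Default: keys become the index of a Series
as_series = df.groupby("Category")["Sales"].sum()

# as_index=False: keys stay as a column and the result is a DataFrame
as_frame = df.groupby("Category", as_index=False)["Sales"].sum()
print(as_frame)
```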

📘 Example 2: Multiple Aggregations on One Column

In [15]:
df.groupby('Store')['Sales'].agg(['sum', 'mean', 'max'])


| Store | sum | mean  | max |
| ----- | --- | ----- | --- |
| S1    | 670 | 167.5 | 300 |
| S2    | 830 | 207.5 | 250 |


➡️ Gives sum, average, and max Sales per Store.
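The list passed to `.agg()` can also contain your own functions alongside the built-in names. In this sketch `value_range` is a hypothetical helper (largest minus smallest sale); pandas uses the function's name as the column label.

```python
import pandas as pd

df = pd.DataFrame({
    "Store": ["S1", "S2", "S1", "S2", "S1", "S2", "S2", "S1"],
    "Sales": [100, 200, 150, 250, 120, 180, 200, 300],
})

# Hypothetical custom reducer: spread between the largest and smallest sale
def value_range(s):
    return s.max() - s.min()

# Strings and callables can be mixed in the list
summary = df.groupby("Store")["Sales"].agg(["sum", "mean", value_range])
print(summary)  # columns: sum, mean, value_range
```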



📘 Example 3: Multiple Aggregations on Multiple Columns

In [16]:
df.groupby('Category')[['Sales', 'Quantity']].agg(['sum', 'mean'])


| Category | Sales (sum) | Sales (mean) | Quantity (sum) | Quantity (mean) |
| -------- | ----------- | ------------ | -------------- | --------------- |
| A        | 620         | 155.0        | 48             | 12.0            |
| B        | 880         | 220.0        | 75             | 18.75           |


➡️ For each Category, the output shows:

- the total and average of Sales
- the total and average of Quantity
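Two related tricks, sketched on the notebook's Category/Sales/Quantity values: a dict applies a different function to each column, and the two-level column labels produced by a list of functions can be flattened into plain strings (the `"_".join` naming scheme is just one choice).

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "A", "B", "B", "A", "B", "A", "B"],
    "Sales": [100, 200, 150, 250, 120, 180, 200, 300],
    "Quantity": [10, 15, 12, 18, 8, 20, 15, 25],
})

# Dict: a different aggregation per column
per_column = df.groupby("Category").agg({"Sales": "sum", "Quantity": "mean"})
print(per_column)

# List of functions: columns form a two-level MultiIndex such as ("Sales", "sum");
# join each pair into a single label like "Sales_sum"
both = df.groupby("Category")[["Sales", "Quantity"]].agg(["sum", "mean"])
both.columns = ["_".join(col) for col in both.columns]
print(both.columns.tolist())
```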



🔄 Renaming Aggregated Columns

In [17]:
df.groupby('Store')['Sales'].agg(total_sales='sum', average_sales='mean')


| Store | total_sales | average_sales |
| ----- | ----------- | ------------- |
| S1    | 670         | 167.5         |
| S2    | 830         | 207.5         |
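The same renaming works across several columns when you call `.agg()` on the whole GroupBy: each keyword takes the form `new_name=(source_column, function)`. A minimal sketch on the notebook's Store/Sales/Quantity values; the output column names are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    "Store": ["S1", "S2", "S1", "S2", "S1", "S2", "S2", "S1"],
    "Sales": [100, 200, 150, 250, 120, 180, 200, 300],
    "Quantity": [10, 15, 12, 18, 8, 20, 15, 25],
})

# Named aggregation: new_name=(source_column, function)
report = df.groupby("Store").agg(
    total_sales=("Sales", "sum"),
    average_sales=("Sales", "mean"),
    total_quantity=("Quantity", "sum"),
)
print(report)
```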


🧠 Summary:
groupby() ➕ .agg() = a powerful combination

Use it to summarize large datasets group by group and to spot trends.