`groupby()` helps you manipulate and summarize data based on specific groups, such as class, customer data, or leads.

```python
import pandas as pd

data = {
    "Salesperson": ["a", "b", "a", "b", "c", "c", "a"],
    "Region": ["north", "south", "east", "east", "south", "north", "south"],
    "Sales": [200, 120, 300, 150, 400, 120, 2000],
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Group by salesperson and sum each salesperson's sales
group_sales_pp = df.groupby("Salesperson")["Sales"].sum()
print("\nTotal Sales by Sales Person:\n\n")
print(group_sales_pp)

# Group by region and sum the sales within each region
group_sales_region = df.groupby("Region")["Sales"].sum()
print("\nTotal Sales by Region:\n\n")
print(group_sales_region)
```

```
Original DataFrame:
  Salesperson Region  Sales
0           a  north    200
1           b  south    120
2           a   east    300
3           b   east    150
4           c  south    400
5           c  north    120
6           a  south   2000

Total Sales by Sales Person:


Salesperson
a    2500
b     270
c     520
Name: Sales, dtype: int64

Total Sales by Region:


Region
east      450
north     320
south    2520
Name: Sales, dtype: int64
```


```python
# Apply several aggregate functions (sum, count, mean) to each region's sales
agg_sales = df.groupby("Region")["Sales"].agg(["sum", "count", "mean"])
print("\nAggregated Sales Data by Region:\n\n")
print(agg_sales)
```


```
Aggregated Sales Data by Region:


         sum  count   mean
Region                    
east     450      2  225.0
north    320      2  160.0
south   2520      3  840.0
```
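The column names `sum`, `count` and `mean` come straight from the function names. If you would rather pick your own labels, `agg()` also supports named aggregation, where each keyword argument is a `(column, function)` pair. A minimal sketch on the same sales data; `total_sales`, `num_sales` and `avg_sale` are arbitrary labels chosen for illustration, not pandas defaults.

```python
import pandas as pd

data = {
    "Salesperson": ["a", "b", "a", "b", "c", "c", "a"],
    "Region": ["north", "south", "east", "east", "south", "north", "south"],
    "Sales": [200, 120, 300, 150, 400, 120, 2000],
}
df = pd.DataFrame(data)

# Named aggregation: new_column_name=(source_column, aggregation_function)
# The output names below are illustrative labels, not required by pandas
region_summary = df.groupby("Region").agg(
    total_sales=("Sales", "sum"),
    num_sales=("Sales", "count"),
    avg_sale=("Sales", "mean"),
)
print(region_summary)
```

The numbers match the `agg(["sum", "count", "mean"])` output above; only the column headers change.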




---

### 🧠 What is `groupby()`?

`groupby()` in pandas is used when you want to:

* Group your data based on some column (like "gender", "department", "class").
* Apply a function (like sum, mean, count) **within** each group.
* Combine the results into a new DataFrame or Series.

This process is often called:

```
Split → Apply → Combine
```

---

### 🔹 1. Split

You **split** your data into groups based on a column.

Let’s say you have this data:

```
Name      Subject    Marks
Alice     Math       90
Bob       Math       80
Charlie   Science    70
David     Math       85
Eve       Science    75
```

You want to group by **Subject**.

So pandas will split this into:

```
Group: Math
Alice     90
Bob       80
David     85

Group: Science
Charlie   70
Eve       75
```
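In pandas, the split step is what `groupby("Subject")` does on its own: it returns a `DataFrameGroupBy` object that you can iterate over as `(key, sub-DataFrame)` pairs. A minimal sketch, assuming the five-row table above is stored in a DataFrame named `marks` (a name chosen here for illustration).

```python
import pandas as pd

# The Name/Subject/Marks table from above ("marks" is an illustrative name)
marks = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Subject": ["Math", "Math", "Science", "Math", "Science"],
    "Marks": [90, 80, 70, 85, 75],
})

# Split: groupby() defines the groups; no aggregation is computed yet
grouped = marks.groupby("Subject")

# Iterating yields (group key, sub-DataFrame) pairs, one per subject
for subject, group in grouped:
    print(f"Group: {subject}")
    print(group[["Name", "Marks"]].to_string(index=False, header=False))
    print()

# get_group() returns only the rows belonging to one key
math_group = grouped.get_group("Math")
```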

---

### 🔹 2. Apply

Now, you **apply** an operation like `mean()` (average) or `sum()` to each group.

So you get:

```
Math: (90 + 80 + 85) / 3 = 85.0
Science: (70 + 75) / 2 = 72.5
```
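The apply step corresponds to calling a reduction on the grouped column. A minimal sketch on the same table (again stored under the illustrative name `marks`); the `lambda` passed to `agg()` is a hypothetical custom function included only to show that you are not limited to the built-in reductions.

```python
import pandas as pd

marks = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Subject": ["Math", "Math", "Science", "Math", "Science"],
    "Marks": [90, 80, 70, 85, 75],
})

grouped = marks.groupby("Subject")["Marks"]

# Built-in reductions run once per group
mean_marks = grouped.mean()   # Math 85.0, Science 72.5
total_marks = grouped.sum()   # Math 255, Science 145

# Any function that reduces a Series to a single value can go through agg()
# (this max-minus-min "range" is only an illustrative custom function)
score_range = grouped.agg(lambda s: s.max() - s.min())  # Math 10, Science 5
```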

---

### 🔹 3. Combine

Finally, pandas **combines** the per-group results into a new DataFrame or Series:

```
Subject    Mean Marks
Math       85.0
Science    72.5
```
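By default the combine step produces a Series indexed by `Subject`. A minimal sketch of turning it into the two-column table shown above; the `Mean Marks` column name is chosen to match that table and is not something pandas produces on its own.

```python
import pandas as pd

marks = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Subject": ["Math", "Math", "Science", "Math", "Science"],
    "Marks": [90, 80, 70, 85, 75],
})

# groupby() + mean() returns a Series whose index holds the group keys
mean_series = marks.groupby("Subject")["Marks"].mean()

# rename() sets the Series name; reset_index() moves Subject back into a column
summary = mean_series.rename("Mean Marks").reset_index()
print(summary)
```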

---

### 🪄 Figure Summary in Plaintext:

```
Original Data:
-------------------------------
| Name    | Subject | Marks   |
|---------|---------|---------|
| Alice   | Math    | 90      |
| Bob     | Math    | 80      |
| Charlie | Science | 70      |
| David   | Math    | 85      |
| Eve     | Science | 75      |
-------------------------------

Step 1: SPLIT by "Subject"
- Group 1 (Math):     Alice, Bob, David
- Group 2 (Science):  Charlie, Eve

Step 2: APPLY mean() on Marks
- Math: (90+80+85)/3 = 85.0
- Science: (70+75)/2 = 72.5

Step 3: COMBINE
------------------------
| Subject | Mean Marks |
|---------|------------|
| Math    | 85.0       |
| Science | 72.5       |
------------------------
```
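To see that the figure matches what `groupby()` actually does, the three steps can be reproduced by hand with ordinary boolean filtering and compared with the one-line version. This is a sketch for illustration only; the dictionaries `pieces` and `means` are hypothetical helpers, and in practice you would simply call `groupby()`.

```python
import pandas as pd

marks = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Subject": ["Math", "Math", "Science", "Math", "Science"],
    "Marks": [90, 80, 70, 85, 75],
})

# Step 1, split: one filtered sub-DataFrame per unique subject
pieces = {
    subject: marks[marks["Subject"] == subject]
    for subject in marks["Subject"].unique()
}

# Step 2, apply: mean of Marks inside each piece
means = {subject: piece["Marks"].mean() for subject, piece in pieces.items()}

# Step 3, combine: collect the per-group results into one Series
manual = pd.Series(means, name="Marks").sort_index()

# groupby() performs all three steps in a single call
built_in = marks.groupby("Subject")["Marks"].mean()

print(manual)
print(built_in)
```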

---

### 🔍 In Simple Words:

`groupby()` helps you **analyze data by categories**. You divide the data into small groups (split), calculate something for each group (apply), and gather the per-group results into a single table (combine).


