## **Chapter 5) GroupBy and Aggregation**

In [1]:
import pandas as pd
import numpy as np

**1) GroupBy**

In [2]:
data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Store': ['S1', 'S1', 'S2', 'S2', 'S1', 'S2', 'S2', 'S1'],
    'Sales': [100, 200, 150, 250, 120, 180, 200, 300],
    'Quantity': [10, 15, 12, 18, 8, 20, 15, 25],
    'Date': pd.date_range('2023-01-01', periods=8)
}
df = pd.DataFrame(data)

In [3]:
df

  Category Store  Sales  Quantity       Date
0        A    S1    100        10 2023-01-01
1        B    S1    200        15 2023-01-02
2        A    S2    150        12 2023-01-03
3        B    S2    250        18 2023-01-04
4        A    S1    120         8 2023-01-05
5        B    S2    180        20 2023-01-06
6        A    S2    200        15 2023-01-07
7        B    S1    300        25 2023-01-08


* In pandas, groupby is used to split your data into groups based on the values in one or more columns, apply a function to each group (an aggregation such as sum(), mean(), or count(), or a transformation), and combine the results into a new Series or DataFrame.
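
The difference between an aggregation and a transformation is easiest to see side by side. This is a minimal sketch that rebuilds only the Category and Sales columns from the table above; the new column names Category_Total and Share are illustrative, not part of the original data.

```python
import pandas as pd

# Same Category/Sales values as the DataFrame built above
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Sales': [100, 200, 150, 250, 120, 180, 200, 300],
})

# Aggregation: one value per group (the result has 2 rows: A and B)
totals = df.groupby('Category')['Sales'].sum()
print(totals)

# Transformation: one value per original row, aligned to df's index,
# so it can be stored as a new column (illustrative column names)
df['Category_Total'] = df.groupby('Category')['Sales'].transform('sum')
df['Share'] = df['Sales'] / df['Category_Total']
print(df)
```

The aggregated result shrinks to one row per Category, while the transformed column keeps all 8 rows, which is why it can be attached back to df.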

In [4]:
# group the rows by Category (no totals are calculated yet)
cat = df.groupby('Category')
cat


<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000002C8C1EDC440>

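Instead of a table, we receive a *DataFrameGroupBy* object (the memory address in the output differs on every run). No aggregation has been calculated yet. To see what is inside, we can loop over the object: each iteration gives the group key (i) and the rows that belong to that group (v).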

***Q-1) Group the data by Category and find the total sales***

In [5]:
for i, v in cat:
    print(i)
    print(v)
    print()

A
  Category Store  Sales  Quantity       Date
0        A    S1    100        10 2023-01-01
2        A    S2    150        12 2023-01-03
4        A    S1    120         8 2023-01-05
6        A    S2    200        15 2023-01-07

B
  Category Store  Sales  Quantity       Date
1        B    S1    200        15 2023-01-02
3        B    S2    250        18 2023-01-04
5        B    S2    180        20 2023-01-06
7        B    S1    300        25 2023-01-08
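
Looping is not the only way to look inside a GroupBy object. A short sketch using standard GroupBy attributes and methods (ngroups, size(), groups, get_group()) on the same Category/Sales data:

```python
import pandas as pd

# Same Category/Sales values as the DataFrame above
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Sales': [100, 200, 150, 250, 120, 180, 200, 300],
})
cat = df.groupby('Category')

print(cat.ngroups)          # number of groups
print(cat.size())           # number of rows in each group
print(cat.groups)           # dict-like: group key -> row index labels
print(cat.get_group('A'))   # the sub-DataFrame for a single group
```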



In [6]:
cat1 = df.groupby('Category')['Sales'].sum()
cat1

Category
A    570
B    930
Name: Sales, dtype: int64

In [7]:
cat1 = df.groupby('Category')['Quantity'].sum()
cat1

Category
A    45
B    78
Name: Quantity, dtype: int64
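
The two cells above each sum a single column. Selecting a list of columns with double brackets returns both totals in one DataFrame; in this sketch only the columns needed are rebuilt:

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Sales': [100, 200, 150, 250, 120, 180, 200, 300],
    'Quantity': [10, 15, 12, 18, 8, 20, 15, 25],
})

# Double brackets select a list of columns, so the result is a DataFrame
# with one row per Category and one column per selected column
totals = df.groupby('Category')[['Sales', 'Quantity']].sum()
print(totals)
```

The numbers should match the separate results above: 570 and 45 for A, 930 and 78 for B.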

***Q-2) Group the data by Store and find the total sales***

In [None]:
cat2 = df.groupby('Store')
for i, v in cat2:
    print(i)
    print(v)
    print()

S1
  Category Store  Sales  Quantity       Date
0        A    S1    100        10 2023-01-01
1        B    S1    200        15 2023-01-02
4        A    S1    120         8 2023-01-05
7        B    S1    300        25 2023-01-08

S2
  Category Store  Sales  Quantity       Date
2        A    S2    150        12 2023-01-03
3        B    S2    250        18 2023-01-04
5        B    S2    180        20 2023-01-06
6        A    S2    200        15 2023-01-07



In [18]:
cat2 = df.groupby('Store')['Sales']
for i,v in cat2:
    print(i)
    print(v)
    print()

S1
0    100
1    200
4    120
7    300
Name: Sales, dtype: int64

S2
2    150
3    250
5    180
6    200
Name: Sales, dtype: int64



In [14]:
cat2 = df.groupby('Store')['Sales'].sum()
cat2

Store
S1    720
S2    780
Name: Sales, dtype: int64
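
The result above is a Series with Store as its index. If you want an ordinary table with Store as a column instead, either as_index=False or reset_index() should work; a small sketch comparing the two:

```python
import pandas as pd

df = pd.DataFrame({
    'Store': ['S1', 'S1', 'S2', 'S2', 'S1', 'S2', 'S2', 'S1'],
    'Sales': [100, 200, 150, 250, 120, 180, 200, 300],
})

# as_index=False keeps 'Store' as an ordinary column instead of the index,
# so the result is a DataFrame rather than a Series
store_totals = df.groupby('Store', as_index=False)['Sales'].sum()
print(store_totals)

# Same content: aggregate first, then move the index back into a column
same = df.groupby('Store')['Sales'].sum().reset_index()
print(same)
```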

***Q-3) Group by multiple columns: Category and Store, then find the total sales***

In [31]:
cat2 = df.groupby(['Category', 'Store'])['Sales'].sum()
cat2

Category  Store
A         S1       220
          S2       350
B         S1       500
          S2       430
Name: Sales, dtype: int64
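
A two-column groupby returns a Series with a two-level index (a MultiIndex). As a sketch, a single combination can be looked up with a tuple, and unstack() turns the result into a Category × Store table:

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Store': ['S1', 'S1', 'S2', 'S2', 'S1', 'S2', 'S2', 'S1'],
    'Sales': [100, 200, 150, 250, 120, 180, 200, 300],
})

# Grouping by two columns gives a Series indexed by (Category, Store)
by_cat_store = df.groupby(['Category', 'Store'])['Sales'].sum()

# Look up one combination with a (Category, Store) tuple
a_s2 = by_cat_store.loc[('A', 'S2')]
print(a_s2)

# unstack() moves the inner index level (Store) into columns
table = by_cat_store.unstack()
print(table)
```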

**2) Aggregation**

Aggregation in pandas means reducing many values to a single summary value (such as a sum, mean, or count). It can be applied to a whole column, as in the cells below, or to each group after a groupby.

Example:
👉 If you have employees with salaries, aggregation lets you find average salary per department.
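
A minimal sketch of the employee example; the names, departments, and salaries are made up for illustration:

```python
import pandas as pd

# Hypothetical employee data, invented for this example
employees = pd.DataFrame({
    'Name': ['Asha', 'Ben', 'Chen', 'Dina', 'Eli'],
    'Department': ['Sales', 'IT', 'Sales', 'IT', 'HR'],
    'Salary': [50000, 70000, 54000, 80000, 45000],
})

# Group by Department, then aggregate the Salary column with mean()
avg_salary = employees.groupby('Department')['Salary'].mean()
print(avg_salary)
```

Each department collapses to one number: HR 45000.0, IT 75000.0, Sales 52000.0 (group keys are sorted by default).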

In [24]:
df['Sales'].mean()

np.float64(187.5)

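The result is 187.5. It is shown as np.float64(187.5) because pandas returns a NumPy scalar, and NumPy 2.x displays scalars together with their type.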

In [26]:
df['Sales'].mode()

0    200
Name: Sales, dtype: int64

In [28]:
df['Sales'].median()

np.float64(190.0)

In [29]:
df['Sales'].sum()

np.int64(1500)

In [32]:
df['Sales'].agg(['sum', 'mean', 'min', 'max', 'count', 'std', 'median'])

sum       1500.000000
mean       187.500000
min        100.000000
max        300.000000
count        8.000000
std         66.062741
median     190.000000
Name: Sales, dtype: float64

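.agg() (also available as .aggregate()) applies several aggregation functions in one call. It works on a whole Series or DataFrame, as above, and also on a GroupBy object, where each function is applied to every group.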
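
Combining .agg() with groupby gives per-group summaries. This sketch shows three common forms (a list of functions, a dict mapping column to function, and named aggregation) on the Category data from this chapter; the output names total_sales and avg_quantity are arbitrary choices:

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Sales': [100, 200, 150, 250, 120, 180, 200, 300],
    'Quantity': [10, 15, 12, 18, 8, 20, 15, 25],
})

# 1) Several functions on one column: one output column per function
multi = df.groupby('Category')['Sales'].agg(['sum', 'mean', 'max'])
print(multi)

# 2) A different function per column, passed as a dict
per_col = df.groupby('Category').agg({'Sales': 'sum', 'Quantity': 'max'})
print(per_col)

# 3) Named aggregation: new_column=(source_column, function)
summary = df.groupby('Category').agg(
    total_sales=('Sales', 'sum'),
    avg_quantity=('Quantity', 'mean'),
)
print(summary)
```

Named aggregation is handy when you want readable column names in the result instead of the default function names.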

**To easily understand groupby, aggregation, and the use of '( )' and '[ ]' in pandas**

🎒 **Imagine your school bag**

--------------------------------------------------------------

Inside the bag (DataFrame), you have books (columns) and pages (rows).

You can use the groupby() function to sort the pages (rows) into piles, putting pages with the same value in one column together (e.g., one pile for each Class).

--------------------------------------------------------------

Then, you can use the aggregate() function (or its short name agg()) to summarize each pile into a single result, such as the total or average marks.

--------------------------------------------------------------

📌 Square Brackets [ ] → Picking Things

👉 Like opening the bag and pulling something out.

Examples:

df['Math'] → you pull out the Math book (1 column).

df[['Math','Science']] → you pull out 2 books (2 columns).

df[0:3] → you take the first 3 pages (rows).

df[df['Math'] > 80] → you only keep the pages where Math marks > 80.

So: [ ] = "pick or filter".
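
To try the bracket examples yourself, here is a small sketch with a hypothetical marks table (the Class, Math, and Science values are invented for illustration):

```python
import pandas as pd

# Hypothetical class marks, invented to match the Math/Science examples
df = pd.DataFrame({
    'Class': ['7A', '7B', '7A', '7B', '7A'],
    'Math': [85, 72, 90, 64, 78],
    'Science': [70, 88, 95, 60, 82],
})

math_col = df['Math']                # one column name -> Series
two_books = df[['Math', 'Science']]  # list of column names -> DataFrame
first_three = df[0:3]                # slice -> first 3 rows
high_math = df[df['Math'] > 80]      # boolean mask -> rows where Math > 80
print(high_math)
```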

**📌 Parentheses ( ) → Doing Actions**

👉 Like using the book (reading, calculating, flipping pages).

--------------------------------------------------------------
Examples:

df.head(3) → you read the first 3 pages.

df['Math'].mean() → you calculate average marks in Math.

df.groupby('Class').sum() → you add up marks by Class.

So: ( ) = "perform an action".
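
Using a hypothetical marks table (values invented for illustration), the parenthesis examples look like this in runnable form:

```python
import pandas as pd

# Hypothetical class marks, invented for this example
df = pd.DataFrame({
    'Class': ['7A', '7B', '7A', '7B', '7A'],
    'Math': [85, 72, 90, 64, 78],
    'Science': [70, 88, 95, 60, 82],
})

first_rows = df.head(3)                 # action: read the first 3 rows
avg_math = df['Math'].mean()            # action: average of the Math column
class_totals = df.groupby('Class').sum()  # action: add up marks per Class
print(first_rows)
print(avg_math)
print(class_totals)
```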

--------------------------------------------------------------
**✅ Easy Memory**

[ ] = “Which thing?”

( ) = “What to do with it?”

--------------------------------------------------------------
⚡ Example combining both:

df['Math'].mean()


df['Math'] → pick the Math column.

.mean() → calculate its average.
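
The same pattern carries over to groupby chains, which alternate between actions and picks. A sketch on a hypothetical marks table (values invented for illustration):

```python
import pandas as pd

# Hypothetical class marks, invented for this example
df = pd.DataFrame({
    'Class': ['7A', '7B', '7A', '7B', '7A'],
    'Math': [85, 72, 90, 64, 78],
    'Science': [70, 88, 95, 60, 82],
})

# .groupby('Class') -> action: split the rows into piles by Class
# ['Math']          -> pick: keep only the Math column of each pile
# .mean()           -> action: average Math mark per Class
class_math_avg = df.groupby('Class')['Math'].mean()
print(class_math_avg)
```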