# Pandas Day 3

In [29]:
import pandas as pd

student_data_df = pd.DataFrame({
    "Student_ID": ["S001", "S002", "S003", "S004", "S005", "S006", "S007", "S008"],
    "Name": ["Nikhil", "Nitin", "Aditya", "Rohit", "Aman", "Sahil", "Ravi", "Kunal"],
    "Age": [17, 19, 21, 20, 18, 22, 19, 21],
    "Marks": [300, 250, 280, 260, 270, 290, 255, 275],
    "University": ["BHU", "DU", "LU", "BHU", "DU", "LU", "DU", "BHU"],
    "City": ["Lucknow", "Deoria", "Lucknow", "Lucknow", "Deoria", "Lucknow", "Deoria", "Lucknow"],
    "Department": ["IT", "Elex", "IT", "IT", "Elex", "IT", "Elex", "IT"]
}, index=["001", "002", "003", "004", "005", "006", "007", "008"])



In [32]:
student_data_df.head()

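|   | Student_ID | Name | Age | Marks | University | City | Department |
|---|---|---|---|---|---|---|---|
| 1 | S001 | Nikhil | 17 | 300 | BHU | Lucknow | IT |
| 2 | S002 | Nitin | 19 | 250 | DU | Deoria | Elex |
| 3 | S003 | Aditya | 21 | 280 | LU | Lucknow | IT |
| 4 | S004 | Rohit | 20 | 260 | BHU | Lucknow | IT |
| 5 | S005 | Aman | 18 | 270 | DU | Deoria | Elex |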


In [None]:
# Check the shape of the DataFrame: (rows, columns)
student_data_df.shape

(8, 7)


## Understanding `groupby()`

The `groupby()` function is used to **group rows with the same values** in one or more columns.

It does **not perform any calculation by itself**.  
An aggregation function (such as `mean`, `sum`, or `count`) must be applied after grouping.
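
To see that grouping alone computes nothing, the minimal sketch below inspects the object that `groupby()` returns before and after an aggregation is applied. It uses a small hypothetical frame (`demo_df`, not the notebook's dataset) so that it runs on its own.

```python
import pandas as pd

# Hypothetical mini-dataset, used only for illustration
demo_df = pd.DataFrame({
    "Department": ["IT", "Elex", "IT", "Elex"],
    "Marks": [300, 250, 280, 270],
})

grouped = demo_df.groupby("Department")

# groupby() alone returns a GroupBy object, not a computed result
print(type(grouped).__name__)

# The calculation happens only when an aggregation is applied
dept_mean = grouped["Marks"].mean()
print(dept_mean)
```

The first `print` shows the class name of the grouping object; only the second shows per-department numbers.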

### How it works
- First, data is split into groups based on a category column
- Then, a calculation is applied to each group
- The result is a summarized output for every group
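
The three steps above can be sketched by hand. This is only an illustration of the idea, assuming a small hypothetical frame (`demo_df`); in practice the built-in aggregation on the last line does the same work in one call.

```python
import pandas as pd

# Hypothetical mini-dataset, used only for illustration
demo_df = pd.DataFrame({
    "Department": ["IT", "Elex", "IT", "Elex"],
    "Marks": [300, 250, 280, 270],
})

# Split: iterating a GroupBy yields (group key, sub-DataFrame) pairs
# Apply: compute the mean of Marks for each sub-DataFrame
manual = {}
for dept, group_df in demo_df.groupby("Department"):
    manual[dept] = group_df["Marks"].mean()

# Combine: collect the per-group results into one Series
manual_result = pd.Series(manual, name="Marks")

# Built-in equivalent of the loop above
builtin_result = demo_df.groupby("Department")["Marks"].mean()

print(manual_result)
print(builtin_result)
```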

## Group data by one column 


In [20]:
student_data_df.groupby('Department')['Student_ID'].count()

Department
Elex    3
IT      5
Name: Student_ID, dtype: int64

In [21]:
student_data_df.groupby('Department')['Marks'].mean()

Department
Elex    258.333333
IT      281.000000
Name: Marks, dtype: float64

In [23]:
student_data_df.groupby('University')['Student_ID'].count()

University
BHU    3
DU     3
LU     2
Name: Student_ID, dtype: int64

## Group data by multiple columns

In [24]:
student_data_df.groupby(['Department','City'])['University'].count()

Department  City   
Elex        Deoria     3
IT          Lucknow    5
Name: University, dtype: int64

In [25]:
student_data_df.groupby(['Department','City','University'])['Marks'].mean()

Department  City     University
Elex        Deoria   DU            258.333333
IT          Lucknow  BHU           278.333333
                     LU            285.000000
Name: Marks, dtype: float64

## Multiple Aggregation at once 


In [27]:
student_data_df.groupby(['Department'])['Marks'].agg(["mean","max","min"])

| Department | mean | max | min |
|---|---|---|---|
| Elex | 258.333333 | 270 | 250 |
| IT | 281.0 | 300 | 260 |


The `groupby()` function is used when we want to:
- Split data into groups → apply a calculation to each group → combine the results

## Sorting groupby Results

In [None]:
dept_avg = student_data_df.groupby('Department')['Marks'].mean()

In [None]:
dept_avg.sort_values()     # Ascending order (the default)

Department
Elex    258.333333
IT      281.000000
Name: Marks, dtype: float64

In [None]:
dept_avg.sort_values(ascending=False)    # Descending order

Department
IT      281.000000
Elex    258.333333
Name: Marks, dtype: float64

## Reset index 


In [37]:
dept_avg.sort_values().reset_index()

|   | Department | Marks |
|---|---|---|
| 0 | Elex | 258.333333 |
| 1 | IT | 281.0 |
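
As a rough sketch of what `reset_index()` changes: a grouped aggregation returns a Series whose index holds the group labels, and `reset_index()` moves those labels into an ordinary column of a DataFrame. The frame `demo_df` and the names `dept_avg_demo` and `flat` are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical mini-dataset, used only for illustration
demo_df = pd.DataFrame({
    "Department": ["IT", "Elex", "IT", "Elex"],
    "Marks": [300, 250, 280, 270],
})

# The aggregation returns a Series indexed by Department
dept_avg_demo = demo_df.groupby("Department")["Marks"].mean()
print(type(dept_avg_demo).__name__)

# reset_index() turns the index into a regular column
# and replaces it with a default 0..n-1 integer index
flat = dept_avg_demo.sort_values().reset_index()
print(type(flat).__name__)
print(list(flat.columns))
```

Having `Department` as a regular column can make the result easier to filter, merge, or export.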


## Summary (Day 3)

- Created a dataset suitable for aggregation
- Used `groupby()` to summarize data
- Performed single and multi-column grouping
- Applied multiple aggregations using `agg()`
- Sorted grouped results to identify patterns
- Improved readability using `reset_index()`
