📊 What is Aggregation in Pandas?

Aggregation in pandas means summarizing many values into a few numbers that describe them, usually by applying functions like:

sum() → total

mean() → average

max(), min() → highest/lowest

count() → number of entries

median(), std() → middle value, standard deviation
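A minimal sketch of the functions listed above, applied to a small made-up Series (the numbers are illustrative only, chosen so the results are easy to check by hand):

```python
import pandas as pd

# Illustrative data only: six made-up values
scores = pd.Series([4, 8, 15, 16, 23, 42])

print("sum:   ", scores.sum())     # 108
print("mean:  ", scores.mean())    # 18.0
print("max:   ", scores.max())     # 42
print("min:   ", scores.min())     # 4
print("count: ", scores.count())   # 6
print("median:", scores.median())  # 15.5, average of the two middle values
print("std:   ", scores.std())     # sample standard deviation (ddof=1)
```

Note that pandas' `std()` divides by n - 1 by default, so it matches the sample standard deviation rather than the population one.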

🧠 Why use Aggregation?
To answer questions like:

What is the average score of students?

How many sales were made in each region?

What is the total salary per department?
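As a sketch of how such a question turns into code, here is the salary question answered on a hypothetical employee table (the `staff` table, its column names and its values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical employee table; column names and values are made up
staff = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Salary": [50_000, 55_000, 70_000, 80_000, 45_000],
})

# "What is the total salary per department?"
total_by_dept = staff.groupby("Department")["Salary"].sum()
# HR 45000, IT 150000, Sales 105000

# A related question: "What is the average salary per department?"
avg_by_dept = staff.groupby("Department")["Salary"].mean()
# HR 45000.0, IT 75000.0, Sales 52500.0

print(total_by_dept)
print(avg_by_dept)
```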

🔹 Common ways to aggregate:

On an entire DataFrame or a single column:
df['Marks'].mean()

Group-wise using groupby():

df.groupby('Class')['Marks'].sum()

Multiple functions with agg():

df['Marks'].agg(['mean', 'max', 'min'])
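The three one-liners above assume a `df` with `Class` and `Marks` columns that is never defined. A minimal runnable sketch, using a hypothetical student table (the names and marks are made up):

```python
import pandas as pd

# Hypothetical student table; 'Name', 'Class' and 'Marks' are illustrative
df = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chen", "Dina"],
    "Class": ["A", "A", "B", "B"],
    "Marks": [80, 90, 70, 60],
})

# 1. Whole column: one number summarizing every row
avg_marks = df["Marks"].mean()                        # 75.0

# Whole DataFrame: numeric_only=True skips the text columns 'Name' and 'Class'
col_means = df.mean(numeric_only=True)                # Marks 75.0

# 2. Group-wise: split rows by 'Class', then sum 'Marks' inside each group
total_per_class = df.groupby("Class")["Marks"].sum()  # A: 170, B: 130

# 3. Several functions at once: the result is indexed by function name
summary = df["Marks"].agg(["mean", "max", "min"])     # mean 75, max 90, min 60

# agg() also works after groupby: one row per class, one column per function
per_class = df.groupby("Class")["Marks"].agg(["mean", "max", "min"])

print(avg_marks, col_means, total_per_class, summary, per_class, sep="\n\n")
```

In pandas 2.x, calling `df.mean()` on a DataFrame that still contains text columns raises a `TypeError`, which is why `numeric_only=True` is passed here.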

📌 In short:
Aggregation = Applying summary functions to get high-level insights from detailed data.

In [3]:
# Aggregation on a Pandas Series (like a single column)
import numpy as np
import pandas as pd

# Create a Series with 5 random numbers using numpy
rng = np.random.RandomState(42)
ser = pd.Series(rng.rand(5))

# View the series
print("series:\n\n", ser)

# Sum of all values in the series
print("Sum:", ser.sum())  # Adds all 5 numbers

# Mean (average) of the values
print("Mean:", ser.mean())  # Adds and divides by 5


series:

 0    0.374540
1    0.950714
2    0.731994
3    0.598658
4    0.156019
dtype: float64
Sum: 2.811925491708157
Mean: 0.5623850983416314
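Several aggregations can also be requested in one call. A short sketch that recreates the same Series (same seed, so the same five numbers) and uses `agg()` and `describe()`:

```python
import numpy as np
import pandas as pd

# Recreate the Series from the cell above: same seed, same five numbers
rng = np.random.RandomState(42)
ser = pd.Series(rng.rand(5))

# Several aggregations at once; the result is a Series indexed by function name
stats = ser.agg(["sum", "mean", "min", "max"])
print(stats)

# describe() bundles count, mean, std, min, quartiles and max into one summary
print(ser.describe())
```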


In [4]:
# Aggregation on a DataFrame (a table with multiple columns)
# Create a DataFrame with 2 columns 'A' and 'B' (rng continues from the previous cell, so these are new random numbers)
df = pd.DataFrame({
    'A': rng.rand(5),
    'B': rng.rand(5)
})

# Show the dataframe
print("printing df\n\n", df)

# Mean of each column (default axis=0: aggregate down the rows, one result per column)
print("Mean of each column:")
print(df.mean())  # One mean for column A, one for B

# Mean of each row (axis=1 or axis='columns': aggregate across the columns, one result per row)
print("Mean of each row:")
print(df.mean(axis='columns'))  # One mean per row


printing df

           A         B
0  0.155995  0.020584
1  0.058084  0.969910
2  0.866176  0.832443
3  0.601115  0.212339
4  0.708073  0.181825
Mean of each column:
A    0.477888
B    0.443420
dtype: float64
Mean of each row:
0    0.088290
1    0.513997
2    0.849309
3    0.406727
4    0.444949
dtype: float64
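The `axis` argument is easier to see with small fixed numbers than with random ones. A minimal sketch using `sum()` on a made-up 3 x 2 DataFrame:

```python
import pandas as pd

# Small fixed DataFrame so the results can be checked by hand
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# axis=0 (default): collapse the rows, giving one value per column
col_sums = df.sum(axis=0)          # A: 6, B: 60

# axis=1 or axis='columns': collapse the columns, giving one value per row
row_sums = df.sum(axis="columns")  # 11, 22, 33

print(col_sums)
print(row_sums)
```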


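| Aggregation | What it does                                                     |
| ----------- | ---------------------------------------------------------------- |
| `count()`   | Number of non-missing (non-NaN) values                           |
| `sum()`     | Adds all values                                                  |
| `mean()`    | Average value                                                    |
| `median()`  | Middle value                                                     |
| `min()`     | Smallest value                                                   |
| `max()`     | Largest value                                                    |
| `std()`     | Standard deviation (sample, `ddof=1` by default)                 |
| `var()`     | Variance (sample, `ddof=1` by default)                           |
| `mad()`     | Mean absolute deviation (deprecated in pandas 1.5, removed in 2.0) |
| `prod()`    | Product of all values                                            |
| `first()`   | First non-null value in each group (on `groupby` objects)        |
| `last()`    | Last non-null value in each group (on `groupby` objects)         |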
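A few rows of the table behave differently from what their names suggest. The sketch below uses made-up data with one missing value to show that `count()` skips NaN, how mean absolute deviation can be computed by hand now that `mad()` is gone, and how `first()`/`last()` work on a `groupby`:

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value (NaN)
df = pd.DataFrame({
    "Team": ["x", "x", "y", "y"],
    "Score": [1.0, np.nan, 3.0, 5.0],
})

# count() ignores NaN, so it reports 3 non-missing scores, not 4
print(df["Score"].count())  # 3

# mad() no longer exists in pandas 2.x; compute mean absolute deviation directly
s = df["Score"]
mad = (s - s.mean()).abs().mean()  # mean() skips NaN; result is 4/3, about 1.333
print(mad)

# first()/last() on a groupby return the first/last non-null value per group
print(df.groupby("Team")["Score"].first())  # x: 1.0, y: 3.0
print(df.groupby("Team")["Score"].last())   # x: 1.0 (NaN skipped), y: 5.0
```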
