

# Mean, Median, and Mode from a Grouped Frequency Table (Python Tutorial)

This notebook walks through computing the **mean**, **median**, and **mode** from a **grouped frequency table** in Python, one step at a time.
It uses two libraries:

- **pandas**: for working with tabular data
- **numpy**: for numerical operations

> Dataset: number of goals scored by players during a training season.



## 1) Import Libraries


In [1]:

import pandas as pd
import numpy as np
pd.set_option("display.precision", 2)



## 2) Create the Frequency Table

Each row of the table gives a class **interval** (a range of goals) and its **frequency** (the number of players whose goal count falls in that range).

| Interval | Frequency (f) |
|-----------|---------------|
| 0–9       | 4             |
| 10–19     | 6             |
| 20–29     | 10            |
| 30–39     | 5             |


In [2]:

data = {
    'Interval': ['0-9', '10-19', '20-29', '30-39'],
    'Frequency': [4, 6, 10, 5]
}
df = pd.DataFrame(data)
df


|   | Interval | Frequency |
|---|----------|-----------|
| 0 | 0-9      | 4         |
| 1 | 10-19    | 6         |
| 2 | 20-29    | 10        |
| 3 | 30-39    | 5         |
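
If you start from raw observations rather than a ready-made table, the grouped table can be built by counting the values that fall in each class. The following is a minimal sketch, assuming a hypothetical list `raw_goals` (invented here so that the counts match the table above, not part of the original dataset) and using `np.histogram` with explicit bin edges.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data, invented for illustration only.
raw_goals = [0, 3, 7, 9,
             10, 12, 15, 15, 18, 19,
             20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
             30, 32, 35, 37, 39]

# Bin edges: each class covers [edge, next_edge); the last bin also includes its right edge.
edges = [0, 10, 20, 30, 40]
counts, _ = np.histogram(raw_goals, bins=edges)

# Label each class by its integer limits, e.g. '0-9'.
labels = [f"{lo}-{hi - 1}" for lo, hi in zip(edges[:-1], edges[1:])]

table = pd.DataFrame({'Interval': labels, 'Frequency': counts})
print(table)
```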



## 3) Compute Class Limits and Midpoints

We split each interval `a-b` into numeric **Lower** = `a` and **Upper** = `b`, and compute the **Midpoint**:

$ x_i = \frac{\text{Lower} + \text{Upper}}{2} $


In [3]:

# Extract lower and upper limits from the "Interval" string
df[['Lower', 'Upper']] = df['Interval'].str.split('-', expand=True).astype(int)

# Midpoint for each class
df['Midpoint'] = (df['Lower'] + df['Upper']) / 2

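# Class width, taken as Upper - Lower of the first class (assumes equal widths).
# Note: this gives 9 for '0-9'. The usual textbook width for integer classes is the
# distance between consecutive lower limits (10 here), which would change the
# median and mode computed in the later steps.
class_width = df.loc[0, 'Upper'] - df.loc[0, 'Lower']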

df


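|   | Interval | Frequency | Lower | Upper | Midpoint |
|---|----------|-----------|-------|-------|----------|
| 0 | 0-9      | 4         | 0     | 9     | 4.5      |
| 1 | 10-19    | 6         | 10    | 19    | 14.5     |
| 2 | 20-29    | 10        | 20    | 29    | 24.5     |
| 3 | 30-39    | 5         | 30    | 39    | 34.5     |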
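
The cell above works with class limits (the printed endpoints, such as `0` and `9`). Many textbooks instead use class boundaries, which close the one-unit gap between consecutive integer classes by extending each limit by half a unit. The sketch below assumes integer-valued data and equally spaced classes. The names `df_b`, `LowerBoundary`, `UpperBoundary`, and `width_b` are illustrative choices, not part of the original notebook.

```python
import pandas as pd

df_b = pd.DataFrame({
    'Interval': ['0-9', '10-19', '20-29', '30-39'],
    'Frequency': [4, 6, 10, 5],
})
df_b[['Lower', 'Upper']] = df_b['Interval'].str.split('-', expand=True).astype(int)

# Gap between one class's upper limit and the next class's lower limit (1 for integer data).
gap = df_b.loc[1, 'Lower'] - df_b.loc[0, 'Upper']

# Boundaries extend each limit by half the gap.
df_b['LowerBoundary'] = df_b['Lower'] - gap / 2
df_b['UpperBoundary'] = df_b['Upper'] + gap / 2

# Width measured from boundary to boundary.
width_b = df_b.loc[0, 'UpperBoundary'] - df_b.loc[0, 'LowerBoundary']

print(df_b)
print("Width between boundaries:", width_b)
```

The midpoints are the same under either convention, since (-0.5 + 9.5) / 2 = 4.5. Only the width and the lower endpoint used in the median and mode formulas differ.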



## 4) Mean for Grouped Data

Formula:

$ \bar{x} = \frac{\sum f_i x_i}{\sum f_i} $


In [4]:

# f_i * x_i for each class
df['f_x'] = df['Frequency'] * df['Midpoint']
# Sum of f_i * x_i divided by the total frequency
mean_value = df['f_x'].sum() / df['Frequency'].sum()
print(f"Mean = {mean_value:.2f}")


Mean = 20.90
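
As a cross-check, the grouped mean is a weighted average of the midpoints, with the frequencies as weights. Here is a minimal sketch using `np.average`, with the midpoints and frequencies retyped by hand so the block runs on its own. The names `mean_check` and `manual` are illustrative.

```python
import numpy as np

midpoints = np.array([4.5, 14.5, 24.5, 34.5])
freqs = np.array([4, 6, 10, 5])

# Weighted average: sum(f_i * x_i) / sum(f_i)
mean_check = np.average(midpoints, weights=freqs)

# The same quantity written out term by term.
manual = (4 * 4.5 + 6 * 14.5 + 10 * 24.5 + 5 * 34.5) / (4 + 6 + 10 + 5)

print(f"np.average: {mean_check:.2f}, manual: {manual:.2f}")
```

Because every observation in a class is represented by the class midpoint, this value is an estimate of the mean of the raw data rather than its exact value.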



## 5) Median for Grouped Data

Formula:

$ \text{Median} = L + \left( \frac{\frac{N}{2} - F}{f_m} \right) \times c $

Where:
- $L$: lower limit of the **median class** (the first class whose cumulative frequency reaches $N/2$)
- $N$: total frequency
- $F$: cumulative frequency before the median class
- $f_m$: frequency of the median class
- $c$: class width (`class_width` from step 3)


In [5]:

# Cumulative frequency
df['Cumulative'] = df['Frequency'].cumsum()

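N = df['Frequency'].sum()

# Median class: the first class whose cumulative frequency reaches N/2
median_class_idx = df[df['Cumulative'] >= N/2].index[0]
median_class = df.loc[median_class_idx]

L = median_class['Lower']          # lower limit of the median class
# Cumulative frequency before the median class (0 if it is the first class)
F = 0 if median_class_idx == 0 else df.loc[median_class_idx - 1, 'Cumulative']
f_m = median_class['Frequency']    # frequency of the median class
c = class_width

median_value = L + ((N/2 - F)/f_m) * c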

print("Median class row:")
display(median_class.to_frame().T)
print(f"Median = {median_value:.2f}")


Median class row:


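|   | Interval | Frequency | Lower | Upper | Midpoint | f_x   | Cumulative |
|---|----------|-----------|-------|-------|----------|-------|------------|
| 2 | 20-29    | 10        | 20    | 29    | 24.5     | 245.0 | 20         |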


Median = 22.25
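
The result depends on which endpoint and width convention is used. The cell above uses the lower class limit (20) and `class_width` = 9. For comparison, here is a sketch of the same formula with the lower class boundary (19.5) and the boundary-to-boundary width (10), a convention many textbooks use for integer data. The function name `grouped_median` and its parameters are hypothetical and written in plain Python so the block stands alone.

```python
def grouped_median(lowers, freqs, width, offset=0.0):
    """Interpolated median for grouped data (hypothetical helper).

    lowers : lower class limits, in ascending order
    freqs  : class frequencies
    width  : class width c used in the formula
    offset : amount subtracted from the lower limit (0.5 gives the class
             boundary for integer data; 0 keeps the limit itself)
    """
    n = sum(freqs)
    cumulative = 0  # cumulative frequency before the current class (F)
    for lower, f in zip(lowers, freqs):
        if f > 0 and cumulative + f >= n / 2:
            return (lower - offset) + ((n / 2 - cumulative) / f) * width
        cumulative += f
    raise ValueError("freqs must contain at least one positive count")


lowers = [0, 10, 20, 30]
freqs = [4, 6, 10, 5]

# Same convention as the cell above: class limits, c = 9
as_in_notebook = grouped_median(lowers, freqs, width=9)
# Assumed textbook convention: class boundaries, c = 10
with_boundaries = grouped_median(lowers, freqs, width=10, offset=0.5)

print(as_in_notebook, with_boundaries)
```

With these inputs the first call reproduces 22.25, while the boundary-based call gives 22.0.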



## 6) Mode for Grouped Data

Formula:

$ \text{Mode} = L + \left( \frac{f_m - f_1}{2f_m - f_1 - f_2} \right) \times c $

Where:
- $f_m$: frequency of the **modal class** (the class with the highest frequency)
- $f_1$: frequency of the class immediately before the modal class
- $f_2$: frequency of the class immediately after the modal class
- $L$: lower limit of the modal class
- $c$: class width (`class_width` from step 3)


In [6]:

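# Modal class: the class with the highest frequency
# (idxmax returns the first such class if several are tied)
modal_idx = df['Frequency'].idxmax()
modal_row = df.loc[modal_idx]

L = modal_row['Lower']
f_m = modal_row['Frequency']
# Neighbouring frequencies, taken as 0 when the modal class is first or last
f_1 = 0 if modal_idx == 0 else df.loc[modal_idx - 1, 'Frequency']
f_2 = 0 if modal_idx == len(df) - 1 else df.loc[modal_idx + 1, 'Frequency']
c = class_width

mode_value = L + ((f_m - f_1) / (2*f_m - f_1 - f_2)) * c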

print("Modal class row:")
display(modal_row.to_frame().T)
print(f"Mode = {mode_value:.2f}")


Modal class row:


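|   | Interval | Frequency | Lower | Upper | Midpoint | f_x   | Cumulative |
|---|----------|-----------|-------|-------|----------|-------|------------|
| 2 | 20-29    | 10        | 20    | 29    | 24.5     | 245.0 | 20         |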


Mode = 24.00
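
Two edge cases are worth keeping in mind when reusing this formula. When several classes share the highest frequency, `idxmax` picks the first one without warning. If the modal class and both of its neighbours have equal frequencies, the denominator $2f_m - f_1 - f_2$ becomes zero. The sketch below shows one possible way to guard against both by refusing to proceed on a tie. The function name `grouped_mode` and the decision to raise an error are assumptions for illustration, not part of the original notebook.

```python
def grouped_mode(lowers, freqs, width):
    """Interpolated mode for equal-width grouped data (hypothetical helper)."""
    f_m = max(freqs)
    if f_m <= 0:
        raise ValueError("at least one class needs a positive frequency")

    modal_positions = [i for i, f in enumerate(freqs) if f == f_m]
    if len(modal_positions) > 1:
        raise ValueError(f"no unique modal class; tied positions: {modal_positions}")

    i = modal_positions[0]
    f_1 = freqs[i - 1] if i > 0 else 0               # class before the modal class
    f_2 = freqs[i + 1] if i < len(freqs) - 1 else 0  # class after the modal class

    # With a unique modal class, f_m > f_1 and f_m > f_2, so the denominator is positive.
    return lowers[i] + ((f_m - f_1) / (2 * f_m - f_1 - f_2)) * width


freqs = [4, 6, 10, 5]

# Same convention as the cell above: class limits, c = 9
mode_limits = grouped_mode([0, 10, 20, 30], freqs, width=9)
# Assumed textbook convention: class boundaries, c = 10
mode_boundaries = grouped_mode([-0.5, 9.5, 19.5, 29.5], freqs, width=10)

print(f"{mode_limits:.2f} {mode_boundaries:.2f}")
```

As with the median, the value shifts slightly (from 24.00 to about 23.94) when boundaries and a width of 10 are used.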



## 7) Summary


In [7]:

summary = pd.DataFrame({
    'Measure': ['Mean', 'Median', 'Mode'],
    'Value': [round(mean_value, 2), round(median_value, 2), round(mode_value, 2)]
})
summary


|   | Measure | Value |
|---|---------|-------|
| 0 | Mean    | 20.9  |
| 1 | Median  | 22.25 |
| 2 | Mode    | 24.0  |



## 8) (Optional) Reuse: Helper Function for Any Grouped Table

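You can reuse the function below with your own intervals and frequencies. It expects the intervals as strings of the form `a-b` with integer limits, listed in ascending order, and assumes all classes have the same width. It returns the working table together with a dictionary of the three measures.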


In [8]:

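def stats_grouped(intervals, freqs):
    """Return (table, stats) for a grouped frequency table.

    intervals: strings of the form 'a-b' with integer limits, in ascending order
    freqs: the matching class frequencies
    Assumes equal-width classes.
    """
    df = pd.DataFrame({'Interval': intervals, 'Frequency': freqs})
    df[['Lower','Upper']] = df['Interval'].str.split('-', expand=True).astype(int)
    df['Midpoint'] = (df['Lower'] + df['Upper']) / 2
    # Class width as Upper - Lower of the first class, matching step 3
    c = df.loc[0, 'Upper'] - df.loc[0, 'Lower']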
    # mean
    mean_ = (df['Frequency'] * df['Midpoint']).sum() / df['Frequency'].sum()
    # median
    df['Cumulative'] = df['Frequency'].cumsum()
    N = df['Frequency'].sum()
    m_idx = df[df['Cumulative'] >= N/2].index[0]
    L = df.loc[m_idx, 'Lower']
    F = 0 if m_idx == 0 else df.loc[m_idx - 1, 'Cumulative']
    f_m = df.loc[m_idx, 'Frequency']
    median_ = L + ((N/2 - F)/f_m) * c
    # mode
    mo_idx = df['Frequency'].idxmax()
    Lm = df.loc[mo_idx, 'Lower']
    fm = df.loc[mo_idx, 'Frequency']
    f1 = 0 if mo_idx == 0 else df.loc[mo_idx - 1, 'Frequency']
    f2 = 0 if mo_idx == len(df) - 1 else df.loc[mo_idx + 1, 'Frequency']
    mode_ = Lm + ((fm - f1) / (2*fm - f1 - f2)) * c
    return df, {'mean': mean_, 'median': median_, 'mode': mode_}

# Demo with the same data
demo_df, demo_stats = stats_grouped(['0-9','10-19','20-29','30-39'], [4,6,10,5])
display(demo_df)
demo_stats


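|   | Interval | Frequency | Lower | Upper | Midpoint | Cumulative |
|---|----------|-----------|-------|-------|----------|------------|
| 0 | 0-9      | 4         | 0     | 9     | 4.5      | 4          |
| 1 | 10-19    | 6         | 10    | 19    | 14.5     | 10         |
| 2 | 20-29    | 10        | 20    | 29    | 24.5     | 20         |
| 3 | 30-39    | 5         | 30    | 39    | 34.5     | 25         |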


{'mean': np.float64(20.9),
 'median': np.float64(22.25),
 'mode': np.float64(24.0)}
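
The helper does not validate its input. It silently assumes the intervals are written as `a-b` with non-negative integers, are sorted and contiguous, and are all equally wide. A small pre-check such as the hypothetical `check_intervals` below (its name and behaviour are assumptions for illustration) can catch malformed input before `stats_grouped` is called.

```python
def check_intervals(intervals, freqs):
    """Raise ValueError if the grouped table violates the helper's assumptions."""
    if len(intervals) != len(freqs):
        raise ValueError("intervals and freqs must have the same length")
    if any(f < 0 for f in freqs) or sum(freqs) == 0:
        raise ValueError("frequencies must be non-negative with a positive total")

    limits = []
    for text in intervals:
        parts = text.split('-')
        if len(parts) != 2 or not all(p.strip().isdigit() for p in parts):
            raise ValueError(f"cannot parse interval {text!r}; expected 'a-b'")
        lower, upper = int(parts[0]), int(parts[1])
        if lower > upper:
            raise ValueError(f"lower limit exceeds upper limit in {text!r}")
        limits.append((lower, upper))

    # All classes must span the same number of units.
    widths = {upper - lower for lower, upper in limits}
    if len(widths) != 1:
        raise ValueError(f"classes are not equally wide: {sorted(widths)}")

    # Each class must start one unit after the previous one ends (integer data assumed).
    for (_, prev_upper), (next_lower, _) in zip(limits, limits[1:]):
        if next_lower != prev_upper + 1:
            raise ValueError("classes must be sorted and contiguous")
    return True


ok = check_intervals(['0-9', '10-19', '20-29', '30-39'], [4, 6, 10, 5])
print(ok)

# A deliberately malformed table: the second class is wider than the first.
try:
    check_intervals(['0-9', '10-24'], [4, 6])
except ValueError as exc:
    message = str(exc)
    print("Rejected:", message)
```

Note that an interval typed with an en dash (as in the Markdown table in step 2) would be rejected by this check, since the parsing splits on a plain hyphen.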