Statistics & Data Science Masterclass

A structured, hands-on curriculum covering statistics from fundamentals to advanced inference, built entirely with Jupyter Notebooks. Every lesson pairs theory (definitions and LaTeX formulas) with practice (Python code).

Why This Exists

Most statistics courses are either too theoretical (textbook-only) or too practical (library calls with no intuition). This masterclass bridges the gap — every concept is explained with definitions and formulas, then implemented from first principles in Python before using library shortcuts.

The curriculum follows a university-style progression (STATS100 → STATS500), where each level builds on the previous one.

Curriculum Overview

STATS100 — Descriptive Statistics Fundamentals

The building blocks: what statistics is, data types, frequency distributions, and the three pillars of central tendency.

#	Notebook	Topics
01	Descriptive Statistics Fundamentals	Definition of statistics (descriptive vs inferential), types of data (NOIR scale), population vs sample, frequency distributions, mean (arithmetic, weighted, trimmed), median, mode, geometric & harmonic mean, when to use what, best practices
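To give a flavor of the "first principles before library shortcuts" approach, here is a minimal sketch of the central tendency measures listed above, using only the Python standard library. It is an illustration, not code taken from the notebook; the function names, the trimming convention (drop `int(proportion * n)` values from each tail), and the sample data are assumptions.

```python
import math
from collections import Counter


def arithmetic_mean(xs):
    """Sum of the values divided by their count."""
    return sum(xs) / len(xs)


def weighted_mean(xs, ws):
    """Each value contributes in proportion to its weight."""
    return sum(x * w for x, w in zip(xs, ws)) / sum(ws)


def trimmed_mean(xs, proportion):
    """Drop int(proportion * n) values from each tail, then average the rest."""
    xs = sorted(xs)
    k = int(proportion * len(xs))
    return arithmetic_mean(xs[k:len(xs) - k])


def median(xs):
    """Middle value, or the average of the two middle values for even n."""
    xs = sorted(xs)
    mid = len(xs) // 2
    return xs[mid] if len(xs) % 2 else (xs[mid - 1] + xs[mid]) / 2


def modes(xs):
    """All values that share the highest frequency."""
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def geometric_mean(xs):
    """n-th root of the product, computed via logs (positive values only)."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    """Reciprocal of the mean of reciprocals (suited to rates)."""
    return len(xs) / sum(1 / x for x in xs)


# Hypothetical sample with one extreme value
data = [2, 4, 4, 5, 7, 9, 100]
print("mean        ", arithmetic_mean(data))
print("median      ", median(data))
print("modes       ", modes(data))
print("trimmed 15% ", trimmed_mean(data, 0.15))
```

The single extreme value (100) pulls the arithmetic mean far above the median and the trimmed mean, which is the kind of "when to use what" trade-off this level covers.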

STATS200 — Measures of Dispersion

Central tendency alone isn't enough — learn to quantify how spread out data is.

#	Notebook	Topics
01	Measures of Dispersion	Range, interquartile range (IQR), outlier detection (1.5×IQR rule), mean absolute deviation (MAD), variance (population vs sample, Bessel's correction), standard deviation, empirical rule (68-95-99.7), coefficient of variation (CV), mean absolute error (MAE)
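As a rough illustration of two topics from this level, the sketch below computes population and sample variance (Bessel's correction is the `ddof=1` case) and flags outliers with the 1.5×IQR rule. It is not the notebook's code; the helper names, the sample data, and the quantile convention (linear interpolation between order statistics, one of several in common use) are assumptions.

```python
import math


def variance(xs, ddof=0):
    """ddof=0 gives the population variance; ddof=1 applies Bessel's correction."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)


def quantile(xs, q):
    """Quantile by linear interpolation between neighboring order statistics."""
    xs = sorted(xs)
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def iqr_outliers(xs, k=1.5):
    """Values below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, q3 = quantile(xs, 0.25), quantile(xs, 0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if x < lower or x > upper]


# Hypothetical sample
data = [2, 4, 4, 4, 5, 5, 7, 9]
print("population variance:", variance(data))
print("sample variance:    ", variance(data, ddof=1))
print("population std:     ", math.sqrt(variance(data)))
print("IQR outliers:       ", iqr_outliers(data))
```

Dividing by n - 1 instead of n always gives the larger estimate, and the gap shrinks as the sample grows.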

STATS300 — Probability Distributions

The mathematical models that describe how data is generated — from coin flips to bell curves.

#	Notebook	Topics
01	Probability Distributions	Probability axioms, random variables, PMF, PDF, CDF, expected value & variance. Discrete: Bernoulli, Binomial, Poisson, Geometric, Negative Binomial, Hypergeometric, Discrete Uniform. Continuous: Normal, Standard Normal (Z-scores), Exponential, Gamma, Beta, Weibull, Log-Normal, Chi-Square, Student's t, F-distribution. Central Limit Theorem
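Two of the ideas above lend themselves to a short check in code: a PMF must sum to 1 and reproduce the known mean and variance of its distribution, and the Central Limit Theorem predicts that sample means cluster around the population mean with spread σ/√n. The sketch below uses only the standard library; the parameter values, seed, and simulation sizes are arbitrary assumptions.

```python
import math
import random
import statistics


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Binomial(10, 0.3): theory gives E[X] = n*p = 3 and Var[X] = n*p*(1-p) = 2.1
n, p = 10, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
expected = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - expected) ** 2 * pk for k, pk in enumerate(pmf))
print("sum of PMF:", sum(pmf))
print("E[X]:", expected, " Var[X]:", var)

# CLT: means of samples drawn from a skewed Exponential(1) population
# (population mean 1, population standard deviation 1)
rng = random.Random(0)
sample_size = 50
means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
    for _ in range(2000)
]
print("mean of sample means:", statistics.fmean(means))
print("std of sample means: ", statistics.stdev(means))
print("CLT prediction:      ", 1 / math.sqrt(sample_size))
```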

STATS400 — Inference & Estimation

Drawing conclusions about populations from samples — the core of statistical reasoning.

#	Notebook	Topics
01	Inference and Estimation	Point estimation (sample mean, variance, proportion), properties of estimators (unbiasedness, consistency, efficiency), biased vs unbiased estimators, sampling distributions, standard error, Maximum Likelihood Estimation (MLE), log-likelihood, method of moments, confidence intervals (Z, t, proportion)
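A small sketch of how MLE and confidence intervals fit together, assuming a Bernoulli model with hypothetical data (37 successes in 100 trials). The grid search is a deliberately naive way to show that the log-likelihood peaks at the sample proportion, and the interval is the simple normal-approximation (Wald) interval, not necessarily the variant the notebook uses.

```python
import math
from statistics import NormalDist


def bernoulli_log_likelihood(p, successes, n):
    """Log-likelihood of p given `successes` out of n independent trials."""
    return successes * math.log(p) + (n - successes) * math.log(1 - p)


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of p_hat
    return p_hat - z * se, p_hat + z * se


# Hypothetical data
successes, n = 37, 100

# Naive MLE: evaluate the log-likelihood on a grid over (0, 1) and take the argmax
grid = [i / 1000 for i in range(1, 1000)]
p_mle = max(grid, key=lambda p: bernoulli_log_likelihood(p, successes, n))

low, high = proportion_ci(successes, n)
print("MLE of p:", p_mle)
print("95% CI:  ", (round(low, 4), round(high, 4)))
```

The grid maximum coincides with the closed-form estimate successes / n, which is what setting the derivative of the log-likelihood to zero yields.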

STATS450 — Hypothesis Testing

The formal framework for making data-driven decisions under uncertainty.

#	Notebook	Topics
01	Hypothesis Testing	Null & alternative hypotheses, test statistics, p-values (interpretation & misconceptions), significance level & confidence, Type I/II errors & power, Z-test (one-sample, two-sample), t-test (one-sample, independent/Welch's, paired), skewness (testing & visualization), kurtosis, normality tests (Shapiro-Wilk, D'Agostino), multiple testing & Bonferroni correction
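The sketch below works through three of these topics by hand with the standard library: a two-sided one-sample Z-test, the Welch t statistic with its Welch-Satterthwaite degrees of freedom, and a Bonferroni correction. All data and function names are hypothetical. The t-test p-value is left out because the standard library has no t distribution; in practice a library such as SciPy would supply it.

```python
import math
from statistics import NormalDist, fmean, variance


def z_test_one_sample(xs, mu0, sigma):
    """Two-sided one-sample Z-test with known population sigma."""
    z = (fmean(xs) - mu0) / (sigma / math.sqrt(len(xs)))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (fmean(a) - fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df


def bonferroni(p_values, alpha=0.05):
    """Reject H0 only where p <= alpha / (number of tests)."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Hypothetical measurements
sample = [52, 48, 55, 51, 49, 53, 54, 50]
z, p = z_test_one_sample(sample, mu0=50, sigma=2)
print(f"z = {z:.4f}, p = {p:.4f}")

group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.1, 4.5, 4.0, 4.8, 4.6]
t, df = welch_t(group_a, group_b)
print(f"Welch t = {t:.3f}, df = {df:.2f}")

print(bonferroni([0.001, 0.02, 0.04]))
```

With three tests at α = 0.05, the Bonferroni threshold is about 0.0167, so only the first p-value survives even though all three are below 0.05.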

STATS500 — Causality, Correlation & ANOVA

Measuring relationships between variables and testing for group differences.

#	Notebook	Topics
01	Correlation, ANOVA & Causality	Correlation vs causation, covariance, Pearson r, Spearman ρ, Kendall τ, point-biserial correlation, correlation heatmaps, chi-square test of independence (Cramér's V), one-way ANOVA (F-test, ANOVA table, η²), post-hoc tests (Tukey HSD), two-way ANOVA (interaction effects), MANOVA (Wilks' Lambda), causal inference methods
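For orientation, here is a from-scratch sketch of Pearson's r and the one-way ANOVA decomposition (between-group and within-group sums of squares, the F statistic, and η²). It illustrates the formulas rather than reproducing the notebook; the function names and toy data are assumptions, and the p-value lookup against the F-distribution is left to a statistics library.

```python
import math


def pearson_r(xs, ys):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def one_way_anova(groups):
    """Return (F statistic, eta squared) for a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of observations around their own group mean
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, eta_squared


# Hypothetical toy data
print("r =", pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, eta_squared = one_way_anova(groups)
print("F =", f_stat, " eta^2 =", eta_squared)
```

For the toy groups, the between-group sum of squares is 26 and the within-group sum is 6, giving F = 13 and η² = 0.8125.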

Learning Path

STATS100  Descriptive Stats   →  Summarize data (mean, median, mode)
  ↓
STATS200  Dispersion           →  Measure spread (variance, std, IQR)
  ↓
STATS300  Distributions        →  Model data generation (normal, binomial, etc.)
  ↓
STATS400  Estimation           →  Infer population parameters (MLE, CIs)
  ↓
STATS450  Hypothesis Testing   →  Make decisions (p-values, t-tests, z-tests)
  ↓
STATS500  Relationships        →  Measure associations (correlation, ANOVA, causality)

Tech Stack

Category	Tools
Language	Python 3.13+
Package Manager	uv
Notebooks	Jupyter (via VS Code)
Core Libraries	NumPy, Pandas, Matplotlib, Seaborn
Statistics	SciPy, statsmodels, scikit-learn

Getting Started

Prerequisites

Python 3.13 or higher
VS Code with the Jupyter extension
Git

Setup

# 1. Clone the repository
git clone https://github.com/shri-singh/Statistics-DataScience-Masterclass.git
cd Statistics-DataScience-Masterclass

# 2. Install uv (if not already installed)
pip install uv

# 3. Create virtual environment and install all dependencies
uv sync

# 4. Activate the environment (run only the line that matches your shell)
source .venv/Scripts/activate      # Git Bash on Windows
source .venv/bin/activate           # macOS / Linux

# 5. Register the Jupyter kernel
python -m ipykernel install --user --name=stats-demo --display-name "Stats Demo"

# 6. Open in VS Code
code .

Then open any .ipynb file and select the "Stats Demo" kernel from the top-right kernel picker.
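If a notebook fails on import, a quick check like the one below can confirm whether the selected kernel actually sees the installed packages. The package list is an assumption based on the Tech Stack table, not a copy of pyproject.toml, and `check_packages` is a hypothetical helper.

```python
from importlib import metadata

# Assumed from the Tech Stack table; adjust to match pyproject.toml
REQUIRED = [
    "numpy", "pandas", "matplotlib", "seaborn",
    "scipy", "statsmodels", "scikit-learn", "ipykernel",
]


def check_packages(names=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in check_packages().items():
    print(f"{name:<14} {version or 'MISSING'}")
```

Any line reporting MISSING suggests the notebook is running on a different interpreter than the one `uv sync` populated.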

Project Structure

Statistics-DataScience-Masterclass/
├── STATS100/                # Descriptive Statistics Fundamentals
├── STATS200/                # Measures of Dispersion
├── STATS300/                # Probability Distributions
├── STATS400/                # Inference & Estimation
├── STATS450/                # Hypothesis Testing
├── STATS500/                # Causality, Correlation & ANOVA
├── LICENSE                  # CC BY-NC 4.0
└── README.md

Who Is This For

Aspiring Data Scientists who need a strong statistical foundation
Analysts transitioning from Excel/SQL to Python-based statistics
Students who want a structured, progressive curriculum with code
ML Engineers who want to understand the statistical theory behind models
Self-learners who prefer understanding fundamentals over memorizing library calls

License

This work is licensed under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).

You are free to:

Share, copy, and redistribute the material
Adapt, remix, and build upon the material

Under these terms:

Attribution — Credit the original work and link to this repository
NonCommercial — You may not use the material for commercial purposes

See the full LICENSE file for details.

Author: Shri Singh
