The probability and statistics foundations behind modern machine learning.
Deep-dive explainers combining rigorous mathematics, interactive visualizations, and working code. The bridge between formalCalculus and formalML — building the probabilistic and inferential machinery that ML assumes you already have.
formalStatistics is a curated collection of long-form explainers on the probability and statistics foundations that modern ML relies on. Every topic receives a three-pillar treatment:
Rigorous exposition — Formal definitions, theorems, and proofs presented with full mathematical detail. Every convergence argument is expanded. Every estimator is derived from first principles.
Interactive visualization — Embedded widgets that let you manipulate parameters and watch the math come alive (e.g., slide a sample size to watch the Central Limit Theorem kick in, drag data points to see a least-squares fit update in real time, animate sequential Bayesian updating as observations arrive).
Working code — Production-oriented Python implementations you can run immediately, with bridges to NumPy, SciPy, statsmodels, and standard statistical computing libraries.
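As a taste of the second and third pillars combined, the Central Limit Theorem demo described above can be sketched in plain NumPy (this is an illustrative sketch, not the site's actual widget code): draw repeated samples from a skewed population and watch the standardized sample mean settle toward N(0, 1) as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_means(n, reps=10_000):
    """Draw `reps` samples of size n from a skewed Exp(1) population
    and return the distribution of their sample means."""
    draws = rng.exponential(scale=1.0, size=(reps, n))
    return draws.mean(axis=1)

# Exp(1) has mean 1 and variance 1, so the standardized sample mean
# is z = sqrt(n) * (x̄ - 1). By the CLT, z → N(0, 1) as n grows.
for n in (2, 30, 500):
    z = np.sqrt(n) * (sample_means(n) - 1.0)
    print(f"n={n:4d}  mean(z)={z.mean():+.3f}  var(z)={z.var():.3f}")
```

Sliding `n` upward in the interactive version does exactly this, re-drawing the histogram of `z` against the standard normal density.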
The site exists because the gap between "I can call sklearn.fit()" and "I understand why this estimator is consistent" is wider than it needs to be.
Relationship to Sister Sites
formalStatistics sits in the middle of a three-site learning path:
Where formalCalculus covers the calculus and analysis that probability assumes, and formalML covers the mathematical machinery of machine learning, formalStatistics covers the probabilistic and inferential foundations that connect them. Every topic includes backward links to formalCalculus prerequisites and forward links to the formalML topics it enables.
Curriculum
32 topics across 8 tracks, progressing from foundational probability through the statistical theory that directly feeds into graduate-level ML.
Track 1: Foundations of Probability
| Topic | Level | Description |
| --- | --- | --- |
| Sample Spaces, Events & Axioms | Foundational | Kolmogorov axioms, sigma-algebras for the working statistician, combinatorial probability |
| Conditional Probability & Independence | Foundational | Bayes' theorem, law of total probability, conditional independence — the backbone of graphical models |
| Random Variables & Distributions | Foundational | Measurable functions, PMFs and PDFs, CDFs — the formal bridge from events to numbers |
| Expectation, Variance & Moments | Foundational | Integration against a measure, moment-generating functions, characteristic functions |
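The conditional-probability machinery in this track reduces to a few lines of arithmetic once the pieces are named. A minimal sketch of Bayes' theorem via the law of total probability, using the classic diagnostic-testing setup (all numbers here are illustrative, not taken from the site):

```python
# Bayes' theorem with the law of total probability:
# a diagnostic test with a rare condition (illustrative numbers).
prior = 0.01          # P(disease)
sensitivity = 0.95    # P(+ | disease)
false_pos = 0.05      # P(+ | no disease)

# Law of total probability: P(+) = P(+|D)P(D) + P(+|¬D)P(¬D)
p_positive = sensitivity * prior + false_pos * (1 - prior)

# Bayes: P(D | +) = P(+|D)P(D) / P(+)
posterior = sensitivity * prior / p_positive
print(f"P(disease | positive test) = {posterior:.3f}")  # → 0.161
```

The counterintuitive result (a 95%-sensitive test yields only a 16% posterior) is exactly the kind of claim the explainers derive formally and then let you probe interactively.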
Track 2: Core Distributions & Families
| Topic | Level | Description |
| --- | --- | --- |
| Discrete Distributions | Foundational | Bernoulli, Binomial, Poisson, Geometric, Negative Binomial — derivations, relationships, and ML appearances |
| Continuous Distributions | Foundational | Normal, Exponential, Gamma, Beta, Uniform — density derivations, transformations, and why the Gaussian is everywhere |
| Exponential Families | Intermediate | Sufficient statistics, natural parameters, log-partition function — the unifying framework for GLMs |
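To make the exponential-family vocabulary concrete, here is a small sketch (illustrative, not the site's code) of the Bernoulli distribution in natural form: the natural parameter is the log-odds, the log-partition function is log(1 + e^η), and differentiating the log-partition recovers the mean — the identity that powers GLM fitting.

```python
import numpy as np

# Bernoulli(p) in exponential-family form:
#   p(x) = exp(η·x - A(η)),  η = log(p / (1 - p)),  A(η) = log(1 + e^η)
# The mean is the derivative of the log-partition: A'(η) = σ(η) = p.

def log_partition(eta):
    return np.log1p(np.exp(eta))

def mean_from_eta(eta, h=1e-6):
    # Central finite difference of A(η); analytically this is the sigmoid.
    return (log_partition(eta + h) - log_partition(eta - h)) / (2 * h)

p = 0.3
eta = np.log(p / (1 - p))   # natural parameter (log-odds)
print(mean_from_eta(eta))   # ≈ 0.3 — recovers the Bernoulli mean
```

Swapping in the Gaussian or Poisson log-partition gives the identity-link and log-link GLMs respectively, which is exactly the "unifying framework" the track description points at.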
```sh
# Install dependencies
pnpm install

# Start dev server (localhost:4321)
pnpm dev

# Build for production
pnpm build

# Preview production build
pnpm preview
```
Author
Jonathan Rocha — Data scientist and researcher. MS Data Science (SMU), MA English (Texas A&M University-Central Texas), BA History (Texas A&M University). Research interests: time-series data mining, topology-aware deep learning.