A 15-module course on classical machine learning — regression, classification, ensemble methods, and how to choose between them. Each module is a self-contained Jupyter notebook that walks through one algorithm from problem framing to working code, with a worked example you can read by hand and a "when it breaks" section that names the failure modes you'll actually hit.
The course is designed to be read in order, but every module also stands on its own as a reference.
| # | Topic |
|---|---|
| 00 | Introduction — what machine learning is, the workflow, and how to use this course |
| 01 | Linear Regression |
| 02 | Polynomial & Regularized Regression |
| 03 | Logistic Regression |
| 04 | Support Vector Machines |
| 05 | Decision Trees |
| 06 | Bagging |
| 07 | Random Forests |
| 08 | AdaBoost |
| 09 | Gradient Boosting |
| 10 | Stacking & Voting |
| 11 | k-Nearest Neighbors |
| 12 | Naive Bayes |
| 13 | Algorithm Selection — given a problem, which algorithm should you reach for? |
| 14 | Model Comparison & Evaluation — capstone on metrics, validation, and head-to-head comparison |
Concept-focused notebooks the modules cross-link to. Read them when a module assumes a concept you want to revisit.
- correlation.ipynb
- feature-encoding.ipynb
- gradient-descent.ipynb
- learning-rate.ipynb
- loss-functions.ipynb
- n-grams.ipynb
- regularization.ipynb
- standardization.ipynb
- statistical-inference.ipynb
- tf-idf.ipynb
| File | Used in | Description |
|---|---|---|
| california-housing.csv | Modules 00–02 (regression) | 20,640 California Census block groups, 8 features, median house price target |
| breast-cancer.csv | Modules 03–12 (classification) | 569 samples, 30 features, malignant/benign target (Wisconsin Diagnostic) |
| breast-cancer-explore.csv | Exploration | Same data with a small subset for hand-readable walkthroughs |
See data/README.md for full dataset documentation, sources, and the pattern for adding new ones.
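As a rough illustration of how a notebook might read one of these files, here is a minimal sketch using pandas. The helper name `load_dataset` and the `DATA_DIR` constant are hypothetical, introduced only for this example; the notebooks themselves may load data differently, and the column names are documented in data/README.md rather than assumed here.

```python
from pathlib import Path

import pandas as pd

# Assumption: code runs from the repo root, so the CSVs live under ./data
DATA_DIR = Path("data")


def load_dataset(filename, data_dir=DATA_DIR):
    """Read one course CSV into a DataFrame (hypothetical helper)."""
    path = Path(data_dir) / filename
    if not path.exists():
        # Fail with a hint instead of pandas' bare "No such file" error
        raise FileNotFoundError(
            f"{path} not found; run from the repo root or pass data_dir"
        )
    return pd.read_csv(path)
```

Calling `load_dataset("california-housing.csv")` from the repo root should return the regression data; going by the table above, its row count should come out to 20,640.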
You'll need Python 3.10+ and the standard scientific stack. From the repo root:

    pip install jupyter numpy pandas scikit-learn matplotlib seaborn
    jupyter notebook

Then open any .ipynb. Notebooks load data from the local data/ directory, so they work offline.
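If a notebook fails on import, a quick environment check can narrow down what is missing. The sketch below is not part of the course; `missing_modules` and `python_ok` are hypothetical helpers that use only the standard library. Note that scikit-learn is imported as `sklearn`, and that `jupyter` is left out of the list because it is a launcher rather than a library the notebooks import.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)
# Import names, which can differ from pip names (scikit-learn -> sklearn)
REQUIRED_MODULES = ["numpy", "pandas", "sklearn", "matplotlib", "seaborn"]


def missing_modules(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


print("Python version OK:", python_ok())
print("Missing modules:", missing_modules(REQUIRED_MODULES))
```

An empty list on the second line means every package from the install command is importable.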
Every algorithm module follows the same arc:
- Problem — the situation where this algorithm earns its keep
- Model — what it predicts and what shape its decisions take
- How it learns — the training procedure, formulas, and intuition
- Superpower — what this algorithm does better than its peers
- When it breaks — the failure modes, visually and quantitatively
- Worked example — a 10-row dataset you can compute by hand
- Code — full scikit-learn implementation with sensible defaults
- Comparisons — head-to-head tables against algorithms covered earlier
Formulas are followed by a "Reading this formula" block that names every symbol. Tunable numbers (learning rates, depths, regularization strengths) are named variables, not magic literals.
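To make the named-variable convention concrete, here is a minimal sketch of what it could look like with scikit-learn. The variable names, the values, and the synthetic data from `make_classification` are all assumptions chosen for illustration; they are not taken from the course notebooks.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Tunable numbers live in one place, with names that say what they control.
# Values are illustrative, not the course's defaults.
LEARNING_RATE = 0.1
MAX_DEPTH = 3
N_ESTIMATORS = 100
TEST_FRACTION = 0.25
RANDOM_SEED = 0

# Synthetic stand-in data so the sketch runs without the course CSVs
X, y = make_classification(n_samples=200, n_features=10, random_state=RANDOM_SEED)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=TEST_FRACTION, random_state=RANDOM_SEED
)

model = GradientBoostingClassifier(
    learning_rate=LEARNING_RATE,
    max_depth=MAX_DEPTH,
    n_estimators=N_ESTIMATORS,
    random_state=RANDOM_SEED,
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

Changing `LEARNING_RATE` or `MAX_DEPTH` at the top and rerunning is then a one-line experiment, which is the point of avoiding literals buried in the constructor call.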
All rights reserved.