Notes, examples, and Python demos for the textbook "Machine Learning Refined" (published by Cambridge University Press).
Latest commit f000e07 Apr 22, 2019

Machine Learning Refined: Notes, Exercises, and Jupyter notebooks

Publisher: Cambridge University Press

First edition: November 2016
Second edition: January 2020 (expected)

Table of contents

A little sampler first

(Back to top)

Many machine learning concepts, like the convergence of an algorithm or the evolution of a model from underfitting all the way to overfitting, are best illustrated and understood through animations rather than static figures. You'll find a large number of both images and animated videos here, which you can also modify yourself via the raw Jupyter notebook version of these notes. Here are just a few examples:

Cross-validation (regression) | Cross-validation (two-class classification) | Cross-validation (multi-class classification)

K-means clustering | Feature normalization | Normalized gradient descent

Rotation | Convexification | Dogification!

A nonlinear transformation | Weighted classification | The moving average

Batch normalization | Logistic regression

Polynomials vs. NNs vs. Trees (regression) | Polynomials vs. NNs vs. Trees (classification)

Changing gradient descent's steplength (1d) | Changing gradient descent's steplength (2d)

Convex combination of two functions | Taylor series approximation

Feature selection via regularization | Secant planes

Function approximation with a neural network | A regression tree

What is in this book?

(Back to top)

We believe that understanding machine learning is impossible without a firm grasp of its underlying mathematical machinery. But we also believe that the bulk of learning the subject takes place when learners "get their hands dirty" and code things up for themselves. That's why in this book we discuss both how to derive machine learning models mathematically and how to implement them from scratch (using the numpy, matplotlib, and autograd libraries) - and yes, this includes multi-layer neural networks as well!

Who is this book for?

(Back to top)

This text aims to bridge the existing gap between practicality and rigor in machine learning education, in a market saturated with books that are either mathematically rigorous but not practical, or vice versa. Conventional textbooks usually place little to no emphasis on coding, leaving the reader struggling to put what they learned into practice. On the other hand, the more hands-on books on the market typically lack rigor, leaving machine learning a 'black box' to the reader.

If you're looking for a practical yet rigorous treatment of machine learning, then this book is for you.

What is in the repo?

(Back to top)

1. Interactive HTML notes

These notes - listed here - served as an early draft for the second edition of the text. You can also find them in the notes directory. Here's an example:

2. Accompanying Jupyter notebooks (used to create the html notes)

Feel free to take a peek under the hood, tweak the models, explore new datasets, etc. Here's an example:

3. Coding exercises (1st edition)

In the exercises directory you can find starting wrappers for coding exercises from the first edition of the text in Python and MATLAB. Exercises for the 2nd edition will be added soon.


(Back to top)

Chapter 2: Zero order / derivative free optimization

2.1 Introduction
2.2 Zero order optimality conditions
2.3 Global optimization
2.4 Local optimization techniques
2.5 Random search methods
2.6 Coordinate search and descent

Chapter 3: First order optimization methods

3.1 Introduction
3.2 The first order optimality condition
3.3 The anatomy of lines and hyperplanes
3.4 The anatomy of first order Taylor series approximations
3.5 Automatic differentiation and autograd
3.6 Gradient descent
3.7 Two problems with the negative gradient direction
3.8 Momentum acceleration
3.9 Normalized gradient descent procedures
3.10 Advanced first order methods
3.11 Mini-batch methods
3.12 Conservative steplength rules

Chapter 4: Second order optimization methods

4.1 Introduction
4.2 The anatomy of quadratic functions
4.3 Curvature and the second order optimality condition
4.4 Newton's method
4.5 Two fundamental problems with Newton's method
4.6 Quasi-Newton methods

Chapter 5: Linear regression

5.1 Introduction
5.2 Least squares regression
5.3 Least absolute deviations
5.4 Regression metrics
5.5 Weighted regression
5.6 Multi-output regression

Chapter 6: Linear two-class classification

6.1 Introduction
6.2 Logistic regression and the cross-entropy cost
6.3 Logistic regression and the softmax cost
6.4 The perceptron
6.5 Support vector machines
6.6 Categorical labels
6.7 Comparing two-class schemes
6.8 Quality metrics
6.9 Weighted two-class classification

Chapter 7: Linear multi-class classification

7.1 Introduction
7.2 One-versus-All classification
7.3 The multi-class perceptron
7.4 Comparing multi-class schemes
7.5 The categorical cross-entropy cost
7.6 Multi-class quality metrics

Chapter 8: Unsupervised learning

8.1 Introduction
8.2 Spanning sets and vector algebra
8.3 Learning proper spanning sets
8.4 The linear Autoencoder
8.5 The classic PCA solution
8.6 Recommender systems
8.7 K-means clustering
8.8 Matrix factorization techniques

Chapter 9: Principles of feature selection and engineering

9.1 Introduction
9.2 Histogram-based features
9.3 Standard normalization and feature scaling
9.4 Imputing missing values
9.5 PCA-sphereing
9.6 Feature selection via boosting
9.7 Feature selection via regularization

Chapter 10: Introduction to nonlinear learning

10.1 Introduction
10.2 Nonlinear regression
10.3 Nonlinear multi-output regression
10.4 Nonlinear two-class classification
10.5 Nonlinear multi-class classification
10.6 Nonlinear unsupervised learning

Chapter 11: Principles of feature learning

11.1 Introduction
11.2 Universal approximators
11.3 Universal approximation of real data
11.4 Naive cross-validation
11.5 Efficient cross-validation via boosting
11.6 Efficient cross-validation via regularization
11.7 Testing data
11.8 Which universal approximator works best in practice?
11.9 Bagging cross-validated models
11.10 K-folds cross-validation
11.11 When feature learning fails
11.12 Conclusion

Chapter 12: Kernels

12.1 Introduction
12.2 The variety of kernel-based learners
12.3 The kernel trick
12.4 Kernels as similarity measures
12.5 Scaling kernels

Chapter 13: Fully connected networks

13.1 Introduction
13.2 Fully connected networks
13.3 Optimization issues
13.4 Activation functions
13.5 Backpropagation
13.6 Batch normalization
13.7 Early-stopping

Chapter 14: Tree-based learners

14.1 Introduction
14.2 Varieties of tree-based learners
14.3 Regression trees
14.4 Classification trees
14.5 Gradient boosting
14.6 Random forests
14.7 Cross-validating individual trees

Chapter 15: Derivatives and Automatic Differentiation

15.1 Introduction
15.2 The derivative
15.3 Derivative rules for elementary functions and operations
15.4 The gradient
15.5 The computation graph
15.6 The forward mode of automatic differentiation
15.7 The reverse mode of automatic differentiation
15.8 Using the Autograd library
15.9 Higher order derivatives
15.10 Taylor series

Chapter 16: Linear algebra

16.1 Introduction
16.2 Vectors and vector operations
16.3 Matrices and matrix operations
16.4 Eigenvalues and eigenvectors
16.5 Vector and matrix norms


(Back to top)

To successfully run the Jupyter notebooks contained in this repo, we highly recommend downloading the Anaconda Python 3 distribution. Many of these notebooks also employ the automatic differentiator autograd, which can be installed by typing the following command at your terminal:

  pip install autograd
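Before opening the notebooks, it can help to confirm that the libraries named in this README are visible to your Python interpreter. The following is a minimal sketch, not part of the repo: the helper name `check_packages` is hypothetical, and it only reports whether each package can be found, not whether its version is suitable.

```python
from importlib.util import find_spec


def check_packages(names):
    """Map each top-level package name to True if Python can locate it."""
    return {name: find_spec(name) is not None for name in names}


# Package names are taken from this README; "jax" is optional.
required = ["numpy", "matplotlib", "autograd"]
status = check_packages(required + ["jax"])

for name, found in status.items():
    print(f"{name:<12}{'found' if found else 'MISSING'}")
```

Any package reported as MISSING can usually be installed with pip in the same way as autograd above.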

With minor adjustments, users can also run these notebooks using JAX, the GPU/TPU-extended version of autograd.
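As a rough illustration of how small that adjustment can be in simple cases, the sketch below computes a gradient with whichever of the two libraries is installed. The cost function `toy_cost` is a hypothetical example, not taken from the notes, and real notebooks may need further changes (for instance, JAX arrays do not support in-place assignment).

```python
# Prefer autograd (used by the notebooks); fall back to JAX if it is absent.
try:
    import autograd.numpy as np
    from autograd import grad
    backend = "autograd"
except ImportError:
    import jax.numpy as np
    from jax import grad
    backend = "jax"


def toy_cost(w):
    """Hypothetical toy cost: g(w) = sum of squared entries of w."""
    return np.sum(w ** 2)


# grad() returns a new function that evaluates the gradient of toy_cost.
gradient = grad(toy_cost)

w = np.array([1.0, -2.0, 3.0])
print(backend, gradient(w))  # analytically, the gradient is 2 * w
```

Only the two import lines differ between the backends here; the function definition and the call to `grad` are unchanged.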


(Back to top)

This repository is in active development by Jeremy Watt and Reza Borhani - please do not hesitate to reach out with comments, questions, typos, etc.
