# A lightning introduction to machine learning (ML)

Prepared by Jacob Zavatone-Veth (<jzavatoneveth@g.harvard.edu>).

This week, we'll wrap up the bootcamp series by giving you a lightning introduction to machine learning (ML). This notebook will introduce some basic ML concepts and vocabulary; you'll get to apply this knowledge in the next notebook. 

This is not a comprehensive introduction to machine learning. For more Python examples, see the Jupyter notebook version of the [Python Data Science Handbook](https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/Index.ipynb). For more detailed information, the following books are useful references:
- [Bishop, *Pattern Recognition and Machine Learning*](https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf)
- [Goodfellow, Bengio, and Courville, *Deep Learning*](https://www.deeplearningbook.org)
- [Hertz, Krogh, and Palmer, *Introduction to the Theory of Neural Computation*](https://www.amazon.com/Introduction-Theory-Neural-Computation-Institute/dp/0201515601)

In Python, the main packages currently used (to my knowledge) for ML functionality beyond NumPy and SciPy are [scikit-learn](https://scikit-learn.org), [PyTorch](https://pytorch.org), and [TensorFlow](https://www.tensorflow.org).

## What is ML?
The basic goal of ML is to find patterns in data. To search for these patterns, ML constructs a mathematical model for the structure of data; the "learning" in "machine learning" refers to having a computer adapt some "free parameters" of the model to "fit" observed data. The boundaries of ML are [somewhat fuzzy](https://www.reddit.com/r/MachineLearning/comments/2fxi6v/ama_michael_i_jordan/ckelmtt/?context=3), but these general principles are shared. ML is traditionally divided into *supervised learning*, *unsupervised learning*, and *reinforcement learning*. These three categories are differentiated by the nature of the feedback used to adapt model parameters; we discuss them below in turn.

## Supervised learning
In a supervised learning task, the goal is to learn a function that maps inputs $x$ to outputs $y$ given a set of example input-output pairs $\{(x_{\mu},y_{\mu})\}_{\mu=1}^{p}$. To do so, one must select some family of functions to model the data, as well as some cost that measures how well a given function models the observed data. 

Probably the simplest form of supervised learning is linear regression, which we discussed [back in Week 4](https://github.com/jrussell25/qbio-python/blob/summer21/week4/week4_regression.ipynb). There, the model for the input-output map $x \mapsto y$ is simply $y = \beta^{\top} x + \epsilon$ for some matrix $\beta$ and residuals $\epsilon$. Most commonly, one seeks parameters $\beta$ that minimize the least-squares cost $L(\beta)=\sum_{\mu=1}^{p}\Vert \beta^{\top} x_{\mu}-y_{\mu} \Vert_{2}^{2}$ (see the [Week 4 notebook](https://github.com/jrussell25/qbio-python/blob/summer21/week4/week4_regression.ipynb) for more details). 
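
As a minimal sketch of the least-squares fit described above, the cell below generates synthetic data and solves for $\beta$ with NumPy. The dimensions, noise level, random seed, and the names `beta_true`, `beta_hat`, and `least_squares_cost` are illustrative assumptions, not part of the Week 4 notebook. The inputs $x_{\mu}$ are stacked as rows of `X`, so the model $y = \beta^{\top} x$ becomes `Y = X @ beta`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): p examples, d input dims, k output dims
p, d, k = 200, 3, 2
X = rng.normal(size=(p, d))              # row mu holds the input x_mu
beta_true = rng.normal(size=(d, k))      # "ground truth" used to generate the data
Y = X @ beta_true + 0.1 * rng.normal(size=(p, k))  # row mu holds y_mu = beta^T x_mu + eps


def least_squares_cost(beta, X, Y):
    """L(beta) = sum over examples of || beta^T x_mu - y_mu ||_2^2."""
    return np.sum((X @ beta - Y) ** 2)


# np.linalg.lstsq returns the beta that minimizes the cost above
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

print("cost at fitted beta:", least_squares_cost(beta_hat, X, Y))
print("cost at true beta:  ", least_squares_cost(beta_true, X, Y))
```

The fitted cost is never larger than the cost at `beta_true`, because `beta_hat` minimizes the cost on the observed (noisy) data rather than recovering the generating parameters exactly.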

Most supervised learning tasks take the same form: one chooses a parametric family of functions $f_{\theta}$ (most famously [deep neural networks](https://en.wikipedia.org/wiki/Artificial_neural_network)) and selects the parameters $\theta$ by minimizing some cost $L(\theta)$ that measures the error on the training set for a given choice of parameters. Regardless of the setting, the eventual goal is to use the trained model $f_{\theta_{\ast}}$, where $\theta_{\ast}$ denotes the parameters found by minimizing the cost, to predict the output for an unseen input, i.e., to obtain a model that *generalizes* beyond the training set.
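
To make this recipe concrete, here is a rough sketch that uses logistic regression as the parametric family and plain gradient descent as the minimizer. Both are choices made for illustration (the text above does not prescribe a particular model or optimizer), and the synthetic data, learning rate, step count, and function names `f`, `cost`, and `grad` are all assumptions. Part of the data is held out so that generalization can be checked on inputs the model never saw during training.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic binary classification data (an assumption for illustration):
# the label is 1 when a noisy linear score is positive.
p, d = 400, 2
X = rng.normal(size=(p, d))
w_true = np.array([2.0, -1.0])
y = (X @ w_true + 0.3 * rng.normal(size=p) > 0).astype(float)

# Hold out the last 100 examples as a test set
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]


def f(theta, X):
    """Logistic model f_theta: predicted probability that the label is 1."""
    w, b = theta[:-1], theta[-1]
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


def cost(theta, X, y):
    """Mean cross-entropy between predictions and labels."""
    q = np.clip(f(theta, X), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(q) + (1 - y) * np.log(1 - q))


def grad(theta, X, y):
    """Gradient of the mean cross-entropy with respect to theta = (w, b)."""
    err = f(theta, X) - y
    return np.append(X.T @ err, err.sum()) / len(y)


theta = np.zeros(d + 1)
initial_cost = cost(theta, X_train, y_train)
for _ in range(500):
    theta -= 0.5 * grad(theta, X_train, y_train)  # gradient descent step
final_cost = cost(theta, X_train, y_train)

train_acc = np.mean((f(theta, X_train) > 0.5) == y_train)
test_acc = np.mean((f(theta, X_test) > 0.5) == y_test)
print(f"training cost: {initial_cost:.3f} -> {final_cost:.3f}")
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Comparing `train_acc` with `test_acc` gives a simple check on generalization: a model that only memorized the training set would score well on the first and poorly on the second.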

Probably the best-known supervised learning task is [image recognition](https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf). For an introduction to these methods, see the [PyTorch](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) or [TensorFlow](https://www.tensorflow.org/tutorials/images/classification) tutorials. 

Later in this session, you'll have the chance to experiment with a more complex supervised learning model and real data.

## Unsupervised learning
In unsupervised learning tasks, the learner must find structure in its input without externally provided labels. The canonical example of an unsupervised learning algorithm is principal component analysis (PCA), which we discussed in Week 4. For a PCA demo and an introduction to other dimensionality reduction algorithms, see the [Week 4 notebook](https://github.com/jrussell25/qbio-python/blob/summer21/week4/week4_dimred.ipynb).
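
As a brief reminder of what PCA does, the sketch below computes principal components with NumPy's singular value decomposition. This is one standard way to compute PCA and is not necessarily how the Week 4 notebook does it; the synthetic data and the variable names are assumptions. Note that no labels appear anywhere: the structure is found from the inputs alone.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data (an assumption): 500 points in 3D that vary mostly along one direction
direction = np.array([1.0, 2.0, 2.0]) / 3.0       # unit vector
latent = rng.normal(scale=3.0, size=(500, 1))     # hidden 1D coordinate
X = latent * direction + 0.1 * rng.normal(size=(500, 3))

# PCA via SVD of the mean-centered data matrix
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

components = Vt                                   # rows are the principal directions
explained_variance = S**2 / (len(X) - 1)          # variance along each direction
explained_ratio = explained_variance / explained_variance.sum()

# Project onto the first principal component to get a 1D representation
Z = X_centered @ components[:1].T

print("first principal direction:", components[0])
print("fraction of variance explained:", explained_ratio)
```

The first principal direction should line up with `direction` up to an overall sign, since the sign of a principal component is arbitrary.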

## Reinforcement learning
Reinforcement learning (RL) is rather different from supervised and unsupervised learning; it considers how an agent should choose its actions as it navigates an environment in order to maximize some notion of reward. In recent years, RL has gained prominence through [DeepMind's](https://deepmind.com/) work on the games of [Go](https://deepmind.com/research/case-studies/alphago-the-story-so-far), [chess](https://deepmind.com/blog/article/alphazero-shedding-new-light-grand-games-chess-shogi-and-go), and [StarCraft](https://deepmind.com/blog/article/alphastar-mastering-real-time-strategy-game-starcraft-ii).
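
For a feel of the action-reward loop, here is a toy sketch of an epsilon-greedy agent on a multi-armed bandit. A bandit is a heavily simplified RL setting with no environment state to navigate, and it is far removed from the game-playing systems linked above; the reward means, noise level, `epsilon`, and step count are all made-up values for illustration. The agent only ever observes the rewards of the actions it tries, never the "correct" action.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical environment: each action pays a noisy reward with a fixed mean
true_means = np.array([0.1, 0.5, 0.9])   # unknown to the agent
n_actions = len(true_means)

epsilon = 0.1      # probability of exploring a random action
n_steps = 2000

value_estimates = np.zeros(n_actions)    # agent's running estimate of each action's reward
counts = np.zeros(n_actions, dtype=int)  # how often each action has been taken
total_reward = 0.0

for _ in range(n_steps):
    if rng.random() < epsilon:
        action = int(rng.integers(n_actions))       # explore
    else:
        action = int(np.argmax(value_estimates))    # exploit the current best guess

    reward = rng.normal(loc=true_means[action], scale=0.5)

    # Incremental update of the sample mean for the chosen action
    counts[action] += 1
    value_estimates[action] += (reward - value_estimates[action]) / counts[action]
    total_reward += reward

best_action = int(np.argmax(value_estimates))
print("estimated values:", value_estimates)
print("times each action was taken:", counts)
print("average reward per step:", total_reward / n_steps)
```

The trade-off controlled by `epsilon` is the usual one between exploration (trying actions whose value is uncertain) and exploitation (taking the action currently believed to be best).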