# A quick introduction to machine learning

**Author:** [Leonardo Uieda](https://www.leouieda.com/)
    
This notebook is meant as a very brief hands-on introduction to machine learning. It will cover some of the common nomenclature, principles, and applications. It's designed to be taught as a 1-2 hour session with live-coding.

## Learner profile

* Is currently in their final year of a STEM undergraduate degree or early years of a postgraduate degree.
* Has studied the basics of statistics, Python programming, and linear algebra.
* Is interested in using machine learning in their projects or as a future career.

## What is ML?

Some features of machine learning (from my personal point of view):

* Focus on practical problems
* Learning from data and making predictions
* Overlap with statistics and optimization
* Computational approach

**Oversimplified summary:** Fit a mathematical model to data and use it to make predictions.
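
As a minimal sketch of that summary, the example below fits a straight line to a handful of points and uses it to predict a new value. The data are made up for illustration, and using `np.polyfit` here is an assumption for the sake of a short example, not necessarily the approach taken later in this notebook.

```python
import numpy as np

# Made-up data for illustration: y is roughly 2*x + 1 plus a little noise
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Fit the model y = slope*x + intercept by least squares
slope, intercept = np.polyfit(x, y, deg=1)

# Use the fitted model to make a prediction at a new x value
x_new = 5.0
y_pred = slope * x_new + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction={y_pred:.2f}")
```

The slope and intercept are what the fit estimates from the data; everything else (the choice of a straight line, the degree) was decided by us beforehand.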

## Glossary 

<dl>
    <dt>model</dt>
    <dd>mathematical formula used to approximate the data</dd>
    <dt>parameters</dt>
    <dd>variables that define the model and control its behavior</dd>
    <dt>labels/classes</dt>
    <dd>quantity/category that we want to predict</dd>
    <dt>features</dt>
    <dd>measurements (information) used as predictors of labels/classes</dd>
    <dt>training</dt>
    <dd>using features and known labels/classes to fit the model (estimate its parameters)</dd>
    <dt>hyper-parameters</dt>
    <dd>variables that influence the training and the model but are not estimated during training</dd>
</dl>
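
To make the glossary concrete, here is a small sketch that maps each term onto scikit-learn code. The data are made up, and the choice of `Ridge` regression is only an assumption picked because it has a single, easy-to-spot hyper-parameter (`alpha`).

```python
import numpy as np
from sklearn.linear_model import Ridge

# features: made-up measurements, one row per sample and one column per feature
X = np.array([[1.0, 0.5], [2.0, 1.5], [3.0, 0.7], [4.0, 2.2], [5.0, 1.1]])
# labels: the quantity we want to predict (also made up)
y = np.array([2.1, 4.3, 5.9, 8.4, 9.8])

# model: a linear formula; alpha is a hyper-parameter that we choose before training
model = Ridge(alpha=1.0)

# training: estimate the model parameters from the features and known labels
model.fit(X, y)

# parameters: the values estimated during training
print("coefficients:", model.coef_)
print("intercept:", model.intercept_)

# prediction for a new sample that was not used in training
print("prediction:", model.predict(np.array([[6.0, 1.0]])))
```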


**Disclaimer:** I'm not an ML researcher. Don't quote me on this.

## Libraries

In Python, the main tool used for machine learning is [scikit-learn](https://scikit-learn.org/). We'll use it and other parts of the scientific Python *stack* to play with some data as we work through the core principles of machine learning.

```python
import numpy as np
```

## Data

Sample data. Talk about the expected `X, y` format and the challenges of getting data into that format.
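
A minimal sketch of the `X, y` format, assuming the raw data arrive as a list of records. The field names (`length`, `width`, `species`) and values are hypothetical and only serve to show the conversion: `X` is a 2D array with one row per sample and one column per feature, and `y` is a 1D array with one label per sample.

```python
import numpy as np

# Hypothetical raw records, as they might come out of a spreadsheet or CSV file
records = [
    {"length": 5.1, "width": 3.5, "species": "a"},
    {"length": 4.9, "width": 3.0, "species": "a"},
    {"length": 6.3, "width": 3.3, "species": "b"},
    {"length": 5.8, "width": 2.7, "species": "b"},
]

feature_names = ["length", "width"]

# X: 2D array of shape (n_samples, n_features)
X = np.array([[row[name] for name in feature_names] for row in records])

# y: 1D array of shape (n_samples,), with the text classes encoded as integers
classes, y = np.unique([row["species"] for row in records], return_inverse=True)

print(X.shape, y.shape)
print(classes, y)
```

Real datasets usually need more work than this (missing values, mixed units, categorical features), which is where much of the effort tends to go.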

## Unsupervised learning: Finding patterns in the data

Show how PCA and other unsupervised methods can be used with the dataset. Maybe clustering and cross-plots.
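
One possible sketch of this section, assuming the iris dataset bundled with scikit-learn stands in for the sample data (the notebook does not say which dataset is used). It reduces the features to 2 principal components and then clusters the samples without looking at the labels. The number of components and clusters are illustrative choices.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: iris stands in for the notebook's sample data
X, y = load_iris(return_X_y=True)

# Standardize so that every feature contributes on the same scale
X_scaled = StandardScaler().fit_transform(X)

# Project the 4 features onto the 2 directions of largest variance
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
print("explained variance ratio:", pca.explained_variance_ratio_)

# Group the samples into 3 clusters without using the labels y
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X_2d)
print("cluster sizes:", [int((cluster_ids == k).sum()) for k in range(3)])
```

Plotting the two columns of `X_2d` against each other, colored by `cluster_ids` or by `y`, would give the kind of cross-plot mentioned above.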

## Supervised learning: Training models for prediction

Train a model on our data for classification. Maybe show how the behaviour differs with and without PCA.
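
A hedged sketch of what that comparison could look like, again assuming the iris dataset as a stand-in and a k-nearest-neighbors classifier as an arbitrary choice of model. The same classifier is trained once on all features and once on only 2 principal components, and both are scored on held-out data.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: iris stands in for the notebook's sample data
X, y = load_iris(return_X_y=True)

# Hold out part of the data so we can score on samples the model has not seen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Classifier trained on all 4 (standardized) features
full = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
full.fit(X_train, y_train)

# Same classifier trained on only the first 2 principal components
reduced = make_pipeline(
    StandardScaler(), PCA(n_components=2), KNeighborsClassifier(n_neighbors=5)
)
reduced.fit(X_train, y_train)

print("accuracy, all features:", full.score(X_test, y_test))
print("accuracy, 2 components:", reduced.score(X_test, y_test))
```

Putting the scaler and PCA inside the pipeline means they are fit on the training data only, so no information from the test set leaks into training.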

## Validation: How good are our predictions?

Talk about validation and cross-validation.
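
As a rough sketch of cross-validation, the example below scores a classifier with 5-fold cross-validation using `cross_val_score`. The dataset (iris) and the model (k-nearest neighbors) are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: iris stands in for the notebook's sample data
X, y = load_iris(return_X_y=True)

model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# 5-fold cross-validation: split the data into 5 parts, train on 4 of them,
# score on the remaining one, and repeat so each part is used for scoring once
scores = cross_val_score(model, X, y, cv=5)

print("accuracy per fold:", scores)
print(f"mean accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```

Compared with a single train/test split, the spread of the per-fold scores gives a sense of how much the result depends on which samples happened to be held out.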

## Summary

Summarize the main take-home messages.

---

## License

> This work is based on this excellent tutorial by Jake VanderPlas: https://github.com/jakevdp/sklearn_tutorial

All Python source code is made available under the BSD 3-clause license. You
can freely use and modify the code, without warranty, so long as you provide
attribution to the authors.

Unless otherwise specified, all figures and Jupyter notebooks are available
under the Creative Commons Attribution 4.0 License (CC-BY).

The full text of these licenses is provided in the [`LICENSE.txt`](LICENSE.txt)
file.