# Supervised Machine Learning with Python I


<img src="https://www.python.org/static/img/python-logo.png" alt="Python" style="width: 200px; float: right;"/>
<br>
<br>
<br>
<img src="../assets/yogen-logo.png" alt="yogen" style="width: 200px; float: right;"/>

# Objectives

* Learn the basic principles of Machine Learning

* Get to know the `scikit-learn` library and how to use it for Machine Learning

* Understand the close relationship between ML and optimization

* Learn how to do linear regression with `scikit-learn`

* Learn how to evaluate the results of training a regression algorithm


# Machine Learning


# `scikit-learn`

![scikit-learn cheat sheet](http://amueller.github.io/sklearn_tutorial/cheat_sheet.png)

## Linear Regression as a model for Machine Learning

Linear regression finds an $h_\theta$ of the form:

$$ h_\theta(X) = \theta \cdot X = \theta_0 + \theta_1 \cdot X_1 + \dots + \theta_n \cdot X_n$$


Notice that X is **fixed**: we have one set of data. Finding $h_\theta(X)$ means finding the values of $\theta$ that make $h_\theta(X)$ most similar to y.
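As a minimal sketch of what evaluating $h_\theta(X)$ looks like in NumPy, the snippet below applies a parameter vector to a small data matrix. The data values, the parameter values, and the helper name `h` are all hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical toy data: 4 samples (rows), 2 features (columns).
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [3.0, 1.0],
              [4.0, 3.0]])

# Hypothetical parameters: theta[0] plays the role of the intercept theta_0.
theta = np.array([0.5, 2.0, -1.0])


def h(theta, X):
    """Evaluate theta_0 + theta_1 * X_1 + ... + theta_n * X_n for every row of X."""
    return theta[0] + X @ theta[1:]


predictions = h(theta, X)
print(predictions)
```

With X held fixed, changing `theta` is the only way to change the predictions, which is why fitting the model means searching over $\theta$.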




### Cost (loss) function

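A measure of the difference between $h_\theta(X)$ and $y$. For linear regression the usual choice is the mean of the squared differences:

$$L(\theta) = \frac{1}{n} \sum_{i=1}^{n} (y_i - h_\theta(x_i))^2$$

We need to minimize it, so we need its derivative with respect to each $\theta_i$.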
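The following sketch assumes the loss is taken to be the mean squared error, and computes both the loss and its gradient with respect to $\theta$. The data, the function names `loss` and `gradient`, and the finite-difference check are illustrative assumptions, not part of the original material.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noiseless data generated from known parameters (1, 2, -3).
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Prepend a column of ones so that theta[0] acts as the intercept.
Xb = np.column_stack([np.ones(len(X)), X])


def loss(theta):
    """Mean squared error between the predictions Xb @ theta and y."""
    residuals = Xb @ theta - y
    return np.mean(residuals ** 2)


def gradient(theta):
    """Analytic gradient of the mean squared error with respect to theta."""
    residuals = Xb @ theta - y
    return 2.0 / len(y) * Xb.T @ residuals


theta = np.zeros(3)

# Sanity check: compare with central finite differences along each coordinate.
eps = 1e-6
numeric = np.array([
    (loss(theta + eps * e) - loss(theta - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
print(gradient(theta))
print(numeric)
```

The two printed vectors should agree closely, which is a cheap way to catch mistakes in a hand-derived gradient.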

## ML is optimization of a loss function.

In the case of linear regression we are lucky: the derivative has an analytic expression, and setting it to zero gives an equation we can solve directly. That means we can "teleport" to the minimum.
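One way to sketch this "teleport" is through the normal equations $(X^T X)\,\theta = X^T y$, which result from setting the gradient of the squared error to zero. The data and the true parameters (4 and 3) below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: y = 4 + 3 * x plus Gaussian noise.
X = rng.uniform(0, 10, size=(100, 1))
y = 4.0 + 3.0 * X[:, 0] + rng.normal(scale=0.5, size=100)

# Prepend a column of ones so that theta[0] acts as the intercept.
Xb = np.column_stack([np.ones(len(X)), X])

# Solve the normal equations (Xb^T Xb) theta = Xb^T y in one step.
theta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
print(theta)
```

In practice `np.linalg.lstsq` is numerically more robust than forming $X^T X$ explicitly; the explicit form is used here only because it mirrors the derivation.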

In many other algorithms we can't, but we can calculate the derivative numerically at any point we want. How can we use that to find the minimum?

### Gradient descent

![Gradient Descent](http://cdn-images-1.medium.com/max/800/1*NRCWfdXa7b-ak2nBtmwRvw.png)

from [primo.ai](http://primo.ai/)
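A minimal sketch of gradient descent on a mean squared error loss for linear regression: start somewhere, then repeatedly step against the gradient. The data, the learning rate, and the number of steps are hypothetical choices that happen to converge for this toy problem.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: y = 4 + 3 * x plus a little Gaussian noise.
X = rng.normal(size=(200, 1))
y = 4.0 + 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)
Xb = np.column_stack([np.ones(len(X)), X])  # column of ones for the intercept

theta = np.zeros(2)      # starting point
learning_rate = 0.1      # step size (assumed; needs tuning on other problems)
n_steps = 500

for _ in range(n_steps):
    # Gradient of the mean squared error at the current theta.
    gradient = 2.0 / len(y) * Xb.T @ (Xb @ theta - y)
    # Move a small step downhill.
    theta -= learning_rate * gradient

print(theta)
```

A learning rate that is too large makes the iterates diverge, and one that is too small makes convergence slow, so this value usually has to be tuned.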

# Machine Learning with Python: `scikit-learn`

## Generate dummy data

We are going to generate fake data before we dive into real data.
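One possible way to generate such data is sketched below: a straight line with Gaussian noise. The intercept, slope, noise level, and sample size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

n_samples = 100
true_intercept, true_slope = 5.0, 2.0   # hypothetical "ground truth"

# scikit-learn expects X to be two-dimensional: (n_samples, n_features).
X = rng.uniform(0, 10, size=(n_samples, 1))

# The target is one-dimensional: a line plus Gaussian noise.
noise = rng.normal(scale=1.0, size=n_samples)
y = true_intercept + true_slope * X[:, 0] + noise

print(X.shape, y.shape)
```

Because we know the parameters that generated the data, we can later check how close a fitted model gets to them.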

This time, rather than coding our own gradient descent optimizer, we will use `scikit-learn`.

## The `Estimator` interface

In scikit-learn, preprocessing, supervised and unsupervised learning algorithms share a uniform interface.

All estimators have a `.fit()` method, which takes:

- X, a two-dimensional numpy array or scipy sparse matrix of shape `(n_samples, n_features)`

- y, in the case of supervised learning. It is a one-dimensional numpy array containing target values.

Once fitted, `Estimators` can either:

- `.predict()` a new set of y values from an `X_test` array: classification, regression, clustering

- `.transform()` an input `X_test` array: preprocessing, dimensionality reduction, feature extraction...
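The interface described above can be sketched with one transformer and one predictor. `StandardScaler` and `LinearRegression` are real scikit-learn classes; the data is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical noiseless data with two features.
X_train = rng.uniform(0, 10, size=(80, 2))
y_train = 1.0 + 2.0 * X_train[:, 0] - X_train[:, 1]
X_test = rng.uniform(0, 10, size=(20, 2))

# A preprocessing estimator: fit on the training data, then transform.
scaler = StandardScaler()
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# A supervised estimator: fit on (X, y), then predict.
model = LinearRegression()
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)

print(y_pred[:5])
```

Note that the scaler is fitted on the training data only and then reused on the test data, so no information from the test set leaks into preprocessing.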

## Fitting a LinearRegression with `sklearn`
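A minimal sketch of fitting a `LinearRegression` and measuring its error on the same data it was trained on. The dummy data is regenerated here with hypothetical parameters (intercept 5, slope 2) so that the snippet runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)

# Hypothetical dummy data: y = 5 + 2 * x plus Gaussian noise.
X = rng.uniform(0, 10, size=(100, 1))
y = 5.0 + 2.0 * X[:, 0] + rng.normal(scale=1.0, size=100)

model = LinearRegression()
model.fit(X, y)

# Learned parameters: intercept_ is theta_0, coef_ holds the remaining thetas.
print(model.intercept_, model.coef_)

# Error measured on the same data used for fitting (the training error).
train_mse = mean_squared_error(y, model.predict(X))
print(train_mse)
```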

Is this a good estimate of the generalization error?

## Scoring and model validation

Our training set error will, in general, be an optimistic estimate of our test set error, because the model was tuned to fit exactly those training points.

We need to split the data into a training set and a test set:
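A sketch of such a split using `train_test_split` from `sklearn.model_selection`. The data, the 25% test fraction, and the random seeds are hypothetical choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Hypothetical dummy data: y = 5 + 2 * x plus Gaussian noise.
X = rng.uniform(0, 10, size=(200, 1))
y = 5.0 + 2.0 * X[:, 0] + rng.normal(scale=1.0, size=200)

# Hold out 25% of the samples; the model never sees them during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

train_mse = mean_squared_error(y_train, model.predict(X_train))
test_mse = mean_squared_error(y_test, model.predict(X_test))
print(train_mse, test_mse)
```

The test error is the one to report as an estimate of how the model behaves on unseen data.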

## Metrics for regression

MSE: Mean Squared Error

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - h(x_i))^2$$

MAE: Mean Absolute Error 

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - h(x_i)|$$

MAPE: Mean Absolute Percentage Error

$$MAPE = \frac{1}{n} \sum_{i=1}^{n} \frac{|y_i - h(x_i)|}{|y_i|}$$

Explained Variance:


$$\text{explained variance}(y, \hat{y}) = 1 - \frac{\mathrm{Var}\{ y - \hat{y}\}}{\mathrm{Var}\{y\}}$$
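All four metrics are available in `sklearn.metrics`. The sketch below evaluates them on a tiny hand-made example (hypothetical values) so the results can be checked against the formulas above. It assumes a scikit-learn version recent enough to include `mean_absolute_percentage_error`.

```python
import numpy as np
from sklearn.metrics import (
    explained_variance_score,
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
)

# Hypothetical targets and predictions; the errors are 1, 0, -2 and 1.
y_true = np.array([2.0, 4.0, 5.0, 10.0])
y_pred = np.array([1.0, 4.0, 7.0, 9.0])

mse = mean_squared_error(y_true, y_pred)               # (1 + 0 + 4 + 1) / 4
mae = mean_absolute_error(y_true, y_pred)              # (1 + 0 + 2 + 1) / 4
mape = mean_absolute_percentage_error(y_true, y_pred)  # (1/2 + 0 + 2/5 + 1/10) / 4
ev = explained_variance_score(y_true, y_pred)

print(mse, mae, mape, ev)
```

Note that scikit-learn returns MAPE as a fraction rather than as a percentage.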



We will learn more about scoring and model selection in a later module.

# Regression algorithms in sklearn

We have already met linear regression, which is a parametric algorithm. There are, however, _non-parametric_ algorithms that do not make assumptions regarding the shape of the function to be approximated.

Let's try more sophisticated algorithms on our toy data and on real data.

## K nearest neighbors
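A minimal sketch of K nearest neighbors regression on toy data, using `KNeighborsRegressor`. The sine-shaped target and the choice of `n_neighbors=5` are hypothetical; the point is that no linear shape is assumed.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Hypothetical nonlinear toy data: a noisy sine wave.
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each prediction is the average target of the 5 closest training points.
knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(X_train, y_train)

# For regressors, .score() returns the R^2 coefficient of determination.
r2 = knn.score(X_test, y_test)
print(r2)
```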

Now with real data: 

```python
from sklearn import datasets
diabetes = datasets.load_diabetes()
```

```python
from sklearn.datasets import fetch_openml
housing = fetch_openml(name="house_prices", as_frame=True)
```

## Decision Tree Regression
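A sketch of decision tree regression with `DecisionTreeRegressor`, here on the diabetes dataset that ships with scikit-learn. The depth limit of 3 and the random seeds are hypothetical choices, not tuned values.

```python
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Load features and target directly as arrays.
X, y = load_diabetes(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A shallow tree: at most 3 consecutive splits from root to leaf.
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

test_mae = mean_absolute_error(y_test, tree.predict(X_test))
print(test_mae)
```

Limiting `max_depth` keeps the tree from memorizing the training data; leaving it unset lets the tree grow until every leaf is pure.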


# Generalizability of our models

We want to train models on known data in order to make inferences (predictions) on unknown data.

How do we know how good our models are? 

## Overfitting

![Under- and overfitting](https://djsaunde.files.wordpress.com/2017/07/bias-variance-tradeoff.png)

from https://djsaunde.wordpress.com/2017/07/17/the-bias-variance-tradeoff/
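One way to see overfitting in practice is to compare training and test error as model complexity grows. The sketch below does this for decision trees of increasing depth on hypothetical noisy data; the depths tried are arbitrary.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hypothetical noisy data: a sine wave plus Gaussian noise.
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

results = {}
for depth in (1, 3, None):  # None lets the tree grow without a depth limit
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, tree.predict(X_train))
    test_mse = mean_squared_error(y_test, tree.predict(X_test))
    results[depth] = (train_mse, test_mse)
    print(depth, train_mse, test_mse)
```

The unrestricted tree memorizes the training set, so its training error drops to zero while its test error does not: the gap between the two is the signature of overfitting.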

# Additional References


[An Introduction to Statistical Learning](http://www-bcf.usc.edu/~gareth/ISL/)

[Introduction to Machine Learning with Python](http://shop.oreilly.com/product/0636920030515.do)

[scikit-learn cheat sheet](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Scikit_Learn_Cheat_Sheet_Python.pdf)

[Regression metrics in sklearn](https://scikit-learn.org/stable/modules/model_evaluation.html#regression-metrics)