# Supervised Learning

Algorithms are trained on *labeled* examples: inputs for which the **desired output is known**. Examples: *Spam* vs *Legitimate* email, or *Positive* vs *Negative* movie review.

Supervised learning is commonly used in applications where historical data is used to predict likely future events.

Steps:
* Data Acquisition.
* Data Cleaning.
* Split Data.
    * Training Data: used to train model parameters.
    * Validation Data: used to determine what model hyperparameters to adjust.
    * Test Data: used to get some final performance metrics (expected real-world performance).
* Model Fitting using training data.
* Model Testing.
    * Adjust model hyperparameters using validation data.
    * Iterate between model training and model testing.
* Model Deployment.
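
The three-way split from the steps above can be sketched by calling scikit-learn's `train_test_split` twice. This is a minimal illustration, not a prescribed recipe: the toy data, the 60/20/20 proportions and the `random_state` value are assumptions chosen for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples with 2 features each (made up for illustration).
X = np.arange(200).reshape((100, 2))
y = np.arange(100)

# First hold out 20% of the data as the final test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Then split the remaining 80% into training and validation data.
# 25% of the remainder is 20% of the full dataset.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42
)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```

The test set is split off first so that it plays no part in training or hyperparameter tuning.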

## Evaluating performance

"Good" and "bad" values for each performance metric depend on the context and specific circumstances. There is no magic number that decides whether a given performance metric is good or not.

### Classification Error Metrics
Key classification metrics:
* Accuracy: number of correct predictions made by the model divided by the total number of predictions.
    * Higher is better.
    * Useful when target classes (Y) are **well balanced**.
* Recall: number of true positives divided by the number of true positives plus the number of false negatives.
    * Ability to find **all** the relevant cases within a dataset.
    * Useful when target classes (Y) are **unbalanced**.
    * Optimizing for recall minimizes false negatives.
* Precision: number of true positives divided by the number of true positives plus the number of false positives.
    * Ability to identify **only** the data points that were actually relevant.
    * Useful when target classes (Y) are **unbalanced**.
    * Optimizing for precision minimizes false positives.
* F1-Score: $F_{1} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$
    * Harmonic mean of precision and recall.
    * The harmonic mean punishes extreme values, so a model scores well only when both precision and recall are reasonably high.
* Confusion matrix: table of real values vs predicted values, counting true positives, false positives, true negatives and false negatives.
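
The definitions above can be checked by hand on a small example. The sketch below uses plain Python and made-up labels (1 is the positive class); `sklearn.metrics` provides equivalent functions such as `accuracy_score`, `precision_score`, `recall_score`, `f1_score` and `confusion_matrix`.

```python
# Toy binary labels (1 = positive class), made up for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# Confusion matrix: rows are real classes, columns are predicted classes.
confusion = [[tn, fp],
             [fn, tp]]

print("accuracy:", accuracy)    # 0.8
print("precision:", precision)  # 0.75
print("recall:", recall)        # 0.75
print("f1:", f1)                # 0.75
print("confusion matrix:", confusion)  # [[5, 1], [1, 3]]
```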


### Regression Error Metrics
Regression is a task in which a model attempts to predict continuous values.
* Mean Absolute Error (MAE): mean of the absolute value of the errors. It is simple to interpret, but it does not punish large errors more than small ones.
* Mean Squared Error (MSE): mean of the squared errors. Large errors (outliers) are punished more heavily. Issue: it also squares the units of the predicted value, making it more difficult to interpret.
* Root Mean Squared Error (RMSE): square root of the MSE, which brings the error back to the units of the predicted value.
    * Most common error metric.
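
As a small worked example of the three metrics, the sketch below computes them in plain Python on made-up values. `sklearn.metrics` offers `mean_absolute_error` and `mean_squared_error` for the same purpose.

```python
import math

# Toy targets and predictions, made up for illustration.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

errors = [t - p for t, p in zip(y_true, y_pred)]  # [1.0, 0.0, -2.0, -1.0]

mae = sum(abs(e) for e in errors) / len(errors)   # (1 + 0 + 2 + 1) / 4 = 1.0
mse = sum(e ** 2 for e in errors) / len(errors)   # (1 + 0 + 4 + 1) / 4 = 1.5
rmse = math.sqrt(mse)                             # about 1.2247

print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", round(rmse, 4))
```

The single error of size 2 contributes half of the MAE total but two thirds of the MSE total, which is how squaring punishes larger errors.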

# Use of Scikit-Learn

All models are available through estimators (model classes).

General form:
```python
from sklearn.family import model
```

Example:
```python
from sklearn.linear_model import LinearRegression
```

## Instantiating
Estimators have suitable default values.

General form:
```python
model = ModelName(parameter='value')
```

Example:
```python
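from sklearn.linear_model import LinearRegression

# fit_intercept is one of the estimator's parameters (True is its default);
# the `normalize` parameter is not available in recent scikit-learn releases.
model = LinearRegression(fit_intercept=True)
print(model)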
```



## Split training & test data

General form:
```python
from sklearn.model_selection import train_test_split
```

Example:

```python
import numpy as np
from sklearn.model_selection import train_test_split

x, y = np.arange(10).reshape((5, 2)), range(5)
print("X:", x)
print("Y:", list(y))

# Hold out 30% of the samples for testing (rounded up to 2 of the 5 samples)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)
print("X train:", x_train)
print("Y train:", y_train)
print("X test:", x_test)
print("Y test:", y_test)
```

Output (the split is random, so the selected rows vary between runs):
```
X: [[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]
Y: [0, 1, 2, 3, 4]
X train: [[2 3]
 [0 1]
 [8 9]]
Y train: [1, 0, 4]
X test: [[4 5]
 [6 7]]
Y test: [2, 3]
```


## Fit model on data

General form:
```python
model.fit(x_train, y_train)
```

## Predict values on the test data

General form:
```python
predictions = model.predict(x_test)
```
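
Putting the split, fit and predict steps together, a minimal end-to-end sketch might look like the following. The synthetic data (an exact line, `y = 2x + 1`) and the `random_state` value are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data following y = 2x + 1 exactly (no noise), for illustration only.
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LinearRegression()
model.fit(x_train, y_train)          # learn coefficients from training data
predictions = model.predict(x_test)  # predict on data the model has not seen

print("slope:", model.coef_[0])
print("intercept:", model.intercept_)
print("predictions:", predictions)
print("true values:", y_test)
```

Because the data is noise-free, the predictions should match `y_test` up to floating-point error; real data will not behave this way.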

### Other prediction methods

Predict the probability of each category (available on classifiers that support it):
```python
model.predict_proba(x_test)
```

Calculate the model's default score on the test data (mean accuracy for classifiers, R² for regressors):
```python
model.score(x_test, y_test)
```
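
A short sketch of both methods on a classifier, assuming `LogisticRegression` as the model and a tiny made-up dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny one-feature dataset, made up for illustration.
x_train = np.array([[0.0], [1.0], [2.0], [3.0], [7.0], [8.0], [9.0], [10.0]])
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])
x_test = np.array([[1.5], [8.5]])
y_test = np.array([0, 1])

clf = LogisticRegression()
clf.fit(x_train, y_train)

# One row per sample, one column per class; each row sums to 1.
proba = clf.predict_proba(x_test)

# For classifiers, score() returns the mean accuracy on the given data.
accuracy = clf.score(x_test, y_test)

print("classes:", clf.classes_)
print("probabilities:", proba)
print("accuracy:", accuracy)
```

The columns of `proba` follow the order of `clf.classes_`.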

## Evaluate the model

Compare the predictions on the test data against the `y_test` values.

The evaluation method depends on the task (classification or regression) and on which ML algorithm is being used.

# Bias Variance Trade-Off

The bias-variance trade-off describes how model complexity (flexibility) affects error on unseen data. A model that is too simple has high bias and underfits. Past a certain point, adding complexity only fits noise in the training data without improving performance on unseen data: the training error keeps going down, as it has to, but the test error starts to go up. This is known as overfitting.

![Bias Variance Trade-Off](https://i0.wp.com/www.coriers.com/wp-content/uploads/2019/06/bias-graph-analytics.png?resize=894%2C599&ssl=1)
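
One way to see this effect is to fit polynomials of increasing degree to noisy data and compare training and test error. The sketch below is an illustration under assumptions: the sine-shaped target, the noise level, the sample sizes and the chosen degrees are all made up, and it uses NumPy's `Polynomial.fit` instead of a scikit-learn estimator.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of a smooth function (made up for illustration)."""
    x = rng.uniform(-1.0, 1.0, n)
    y = np.sin(3.0 * x) + rng.normal(0.0, 0.3, n)
    return x, y

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

train_errors = {}
test_errors = {}
for degree in (1, 3, 12):
    # Least-squares polynomial fit; a higher degree means a more flexible model.
    poly = Polynomial.fit(x_train, y_train, degree)
    train_errors[degree] = mse(y_train, poly(x_train))
    test_errors[degree] = mse(y_test, poly(x_test))
    print(f"degree={degree:2d}  "
          f"train MSE={train_errors[degree]:.3f}  "
          f"test MSE={test_errors[degree]:.3f}")
```

The training error cannot increase as the degree grows, because each higher-degree model contains the lower-degree ones. The test error typically falls at first and then rises once the model starts fitting noise, although the exact numbers depend on the random sample.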