In [5]:
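from IPython.display import HTML
# Load the custom stylesheet; the context manager closes the file afterwards
css_file = './custom.css'
with open(css_file, "r") as f:
    css = f.read()
HTML(css)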

# Bias-Variance Trade-Off

© 2018 Daniel Voigt Godoy

## 1. Definition

Every time you train a model and then test it on some ***unseen data***, you're going to get ***prediction errors***.

These errors can be ***decomposed*** into three distinct terms: ***bias***, ***variance*** and ***noise***.

Let's start with the last term: ***noise***. This is the ***irreducible*** part of the error. Data is ***noisy***. Sometimes the noise can be cleaned up (removing outliers, for instance), but more often than not, there will be a level of noise you simply cannot get rid of.
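
For squared-error loss, this decomposition is usually written as shown below. The notation is an assumption introduced here for illustration, not taken from the notebook: the data follows $y = f(x) + \varepsilon$, the noise $\varepsilon$ has zero mean and variance $\sigma^2$, $\hat{f}$ is the model fitted on a randomly drawn training set, and the expectation is taken over training sets and noise.

$$
\mathbb{E}\left[\left(y - \hat{f}(x)\right)^2\right]
= \underbrace{\left(\mathbb{E}\left[\hat{f}(x)\right] - f(x)\right)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}\left[\hat{f}(x)\right]\right)^2\right]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{noise}}
$$

The last term does not depend on the model at all, which is why it is called irreducible.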

Since we cannot do much about the noise, let's check the other two:

1. ***Bias***: 

    1.1 If a model has high ***bias***, it means it is ***consistently*** either ***under-*** or ***overshooting*** the actual values (on unseen data!).
    
    1.2 This is due to ***wrong assumptions*** made by the model, i.e., the model is ***too simple*** to represent the data - like ***fitting a line*** to ***quadratic data***.

    1.3 ***High Bias*** is usually a sign of a model ***underfitting*** the training data.
    
    1.4 If you add ***more data*** and retrain the model, its coefficients ***do not change much***.

    1.5 ***IMPORTANT: DO NOT*** confuse this bias with the ***bias term (b0)*** from a linear model!
    

2. ***Variance***: 

    2.1 If a model has high ***variance***, it means it is ***too sensitive*** to variations in the training data.
    
    2.2 This is due to a model that is ***too complex*** - it is fitting the noisy data too well!
    
    2.3 ***High Variance*** is usually a sign of a model ***overfitting*** the training data.
    
    2.4 If you add ***more data*** and retrain the model, its coefficients are likely to ***change a lot***.
    
    
The plot below illustrates the difference:

![bias variance](bias_variance.png)
<center>Source: Scott Fortmann-Roe - Understanding the Bias-Variance Tradeoff</center>
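
The two properties can also be estimated numerically. The sketch below is a minimal simulation, not part of the original notebook. It assumes a quadratic ground truth (`true_fn`), Gaussian noise, and fixed inputs where only the noise is redrawn on each repetition. The function names and all numeric settings are hypothetical choices made for illustration. It refits a line (degree 1) and a much more flexible polynomial (degree 8) many times and measures how far the average prediction is from the truth (squared bias) and how much the predictions move between refits (variance).

```python
import numpy as np

rng = np.random.default_rng(42)

def true_fn(x):
    # Assumed ground truth for this sketch: a quadratic curve
    return 1.0 + 2.0 * x - 3.0 * x ** 2

def simulate(degree, n_repeats=500, n_train=20, noise_std=0.3):
    """Refit a polynomial on many noisy copies of the training set and
    estimate the squared bias and the variance of its predictions."""
    x_train = np.linspace(-1, 1, n_train)   # fixed inputs
    x_grid = np.linspace(-1, 1, 50)         # points where we predict
    preds = np.empty((n_repeats, x_grid.size))
    for i in range(n_repeats):
        # Only the noise changes from one training set to the next
        y_train = true_fn(x_train) + rng.normal(0, noise_std, n_train)
        coefs = np.polyfit(x_train, y_train, degree)
        preds[i] = np.polyval(coefs, x_grid)
    mean_pred = preds.mean(axis=0)
    bias_sq = float(np.mean((mean_pred - true_fn(x_grid)) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance

bias_line, var_line = simulate(degree=1)
bias_poly, var_poly = simulate(degree=8)
print(f"degree 1: bias^2 = {bias_line:.4f}, variance = {var_line:.4f}")
print(f"degree 8: bias^2 = {bias_poly:.4f}, variance = {var_poly:.4f}")
```

Under these assumptions, the line should show the larger squared bias and the degree-8 polynomial the larger variance.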


### Talking about model complexity...
![](https://imgs.xkcd.com/comics/curve_fitting.png)
<center>Source: <a href="https://xkcd.com/2048/">XKCD</a></center>

## 2. Experiment

Time to try it yourself!

This is a dataset with only 12 points. We start with two of them and fit a line to them.

The controls below allow you to:
- change the ***degree*** of the polynomial fitted to the data (using degree + 1 points)
- add ***more samples*** to the training data (up to 10 samples for 1st degree down to 4 samples for 8th degree), ***fitting it again***
- include ***validation*** and ***test*** data

The ***left plot*** contains the data and the ***model fitted*** to it (green line).

The ***upper right*** plot has the model's coefficients, as many as the polynomial degree you chose.

The ***lower right*** plot has the ***average error*** when fitting the model with as ***many samples*** as shown on the horizontal axis. For a line, it starts with ***zero*** error for ***two points***.
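
The same behavior can be reproduced outside the widget. The following is a minimal sketch using `numpy.polyfit`. The 12-point dataset is a hypothetical noisy quadratic, not the data returned by the notebook's `data()` function, and mean squared error stands in for the "average error" of the plot. It starts with degree + 1 points, which a polynomial of that degree fits exactly, and then adds samples one at a time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 12-point dataset (NOT the notebook's data): a noisy quadratic
x = np.linspace(-1, 1, 12)
y = 0.5 + x - 2.0 * x ** 2 + rng.normal(0, 0.2, x.size)
order = rng.permutation(x.size)  # shuffle so early samples span the range
x, y = x[order], y[order]

def training_errors(degree):
    """Training MSE as samples are added one at a time,
    starting from degree + 1 points (enough for an exact fit)."""
    errors = {}
    for n in range(degree + 1, x.size + 1):
        coefs = np.polyfit(x[:n], y[:n], degree)
        residuals = y[:n] - np.polyval(coefs, x[:n])
        errors[n] = float(np.mean(residuals ** 2))
    return errors

errors_line = training_errors(degree=1)
for n, mse in errors_line.items():
    print(f"{n:2d} samples: training MSE = {mse:.4f}")
```

With two points the line passes through both, so the training error is zero up to floating-point precision. It can only grow from there as more points are added.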

Use the controls to play with different configurations and answer the ***questions*** below.

In [6]:
from intuitiveml.supervised.regression.BiasVariance import *

In [7]:
x_train, y_train, x_val, y_val, x_test, y_test = data()
f = build_figure(x_train, y_train, x_val, y_val, x_test, y_test)
vb = VBox(f, layout={'align_items': 'center'})

In [8]:
vb


#### Questions

1. For a ***degree of 1***:
    - increase the number of extra samples to 5 
        - what happens to the ***training error***?
        
          *Answer:* the coefficients and the training error both increase.
        
    - keep increasing the number of samples up to 10 
        - what happens to the ***training error***? 
        
          *Answer:* it increases further.
        
        - what about the ***coefficients***?
        
          *Answer:* they do not change much.
        
    - what is this behavior a sign of? High ***bias*** or high ***variance***? Why?
    
    
2. For a ***degree of 5***:
    - increase the number of extra samples one by one up to 6 (to the total of 12 data points)
        - what happens to the ***training error***?
        
          *Answer:* it increases.
        
        - what about the ***coefficients***?
        
          *Answer:* they even out?
        
    - what is this behavior a sign of? High ***bias*** or high ***variance***? Why?
 

3. Looking at the ***training error*** only, which model (degree) is the best? Why?


4. Is this enough information to settle on a given model? If not, why?

### 2.1 Train - Validation - Test Split

- ***Training Set***: the data you use to, obviously, ***train*** your model - you can use and abuse this data!


- ***Validation Set***: the data you should use only for ***hyper-parameter tuning***, that is, comparing differently parameterized models trained on the training data, to decide which parameters are best. 

    You should use, but ***not*** abuse this data, as it is intended to provide an ***unbiased*** evaluation of your model and, if you mess around with it too much, you'll end up incorporating knowledge about it in your model without even noticing.


- ***Test Set***: the data you should use only ***once***, when you are done with everything else, to check if your model is still performing well.

    I like to pretend this is data from the ***"future"*** - that particular day in the future when my model is ready to give it a go in the real world! So, until that day, I cannot know this data, as the future hasn't arrived yet :-)
    
This is a nice representation of the split:

![](train-validate-test.png)
<center>Source: David Ziganto - Model Tuning (Part 2 - Validation & Cross-Validation) </center>
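
A three-way split like this can be sketched with two calls to Scikit-Learn's `train_test_split`. The data and the 60/20/20 proportions below are assumptions chosen for illustration. The notebook itself does not prescribe specific proportions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 samples with a single feature
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(0, 0.1, size=100)

# First hold out the test set (20% of the data) and do not touch it again
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Then carve the validation set out of the remainder:
# 25% of the remaining 80% is 20% of the total
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))
```

Splitting off the test set first keeps it isolated from every later decision, which matches the idea of treating it as data from the "future".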

Now you can go back to the ***experiment*** to answer the questions below.

#### More questions:

5. Now, set ***extra samples*** to 10 and add ***validation data***:
    - for a ***degree of 1***:
        - what happens to the ***validation error***?
        - how ***far apart*** are the two errors, training and validation?
        - what is this behavior a sign of? High ***bias*** or high ***variance***? Why?
    - increase the ***degree*** one by one:
        - what happens to the ***distance*** between the two error curves?


6. Looking at both ***training*** and ***validation*** errors, which model (degree) is the best? Why?


7. Set ***degree*** to your choice of best model and add ***test data***:
    - how does ***test error*** compare to ***validation*** and ***training*** errors?
    - are you happy with your choice of best model? Why?

## 3. Learning Curves

After performing the experiment and answering the questions, you probably observed some differences in the training, validation and test curves as the number of samples increased.

There are typical patterns, depending on how well your model is tuned:

![](bias-variance2.png)

<center>Source: Utku Ufuk - Learning Curves in Linear & Polynomial Regression </center>

- High Bias (left plot): training error is ***high*** even if you add more samples, and test error is ***high*** too, but there is ***little gap*** between the two

- High Variance (right plot): training error is ***very low***, but test error is ***high*** - there is a ***huge gap***

- "Just right" (center plot): training error sits between the other two situations, and test error converges to a similar level as new samples are added to the training - there is a ***little gap***, but the overall level is lower than the "high bias" situation
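
Curves like these can be computed with Scikit-Learn's `learning_curve`. The sketch below is an illustration under assumptions, not the code behind the figure. It uses a hypothetical noisy quadratic dataset, polynomial regression of degrees 1, 2 and 10 as stand-ins for the high-bias, "just right" and high-variance cases, and 5-fold cross-validated error in place of the test error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: noisy quadratic with a single feature
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 0] ** 2 + rng.normal(0, 0.3, size=200)

def curves(degree):
    """Return training sizes plus mean training and validation MSE."""
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    sizes, train_scores, val_scores = learning_curve(
        model, X, y,
        train_sizes=np.linspace(0.1, 1.0, 5),
        cv=5,
        scoring="neg_mean_squared_error",
    )
    # Scores are negated MSE, so flip the sign to get errors
    return sizes, -train_scores.mean(axis=1), -val_scores.mean(axis=1)

results = {degree: curves(degree) for degree in (1, 2, 10)}
for degree, (sizes, train_err, val_err) in results.items():
    print(f"degree {degree:2d}: "
          f"train MSE = {train_err[-1]:.3f}, validation MSE = {val_err[-1]:.3f} "
          f"at {sizes[-1]} samples")
```

Plotting `train_err` and `val_err` against `sizes` for each degree gives one panel per case. On this data the degree-1 model should keep a high training error regardless of sample size, while the degree-10 model should show a wide gap at the smallest training size.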

## 4. The Trade-Off

To summarize, as the ***model complexity*** increases:

- ***variance*** increases
- ***bias*** decreases
- ***total error*** goes ***down*** up to a point and then ***up*** again

![](bias_vs_variance_error.png)
<center>Source: Scott Fortmann-Roe - Understanding the Bias-Variance Tradeoff</center>

There is a ***sweet spot*** of model complexity where the ***total error*** is at its ***minimum***.
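
One way to look for that sweet spot is to sweep the model complexity and compare training and validation errors. The sketch below assumes a hypothetical cubic ground truth with Gaussian noise, and uses the polynomial degree as the measure of complexity. All names and settings are illustrative choices, not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    # Assumed ground truth: a cubic curve plus Gaussian noise
    x = rng.uniform(-1, 1, n)
    y = 4.0 * x ** 3 - 3.0 * x + rng.normal(0, 0.2, n)
    return x, y

x_train, y_train = make_data(30)
x_val, y_val = make_data(200)

def mse(coefs, x, y):
    return float(np.mean((y - np.polyval(coefs, x)) ** 2))

degrees = list(range(1, 11))
train_err, val_err = [], []
for d in degrees:
    coefs = np.polyfit(x_train, y_train, d)
    train_err.append(mse(coefs, x_train, y_train))
    val_err.append(mse(coefs, x_val, y_val))

# The degree with the lowest validation error is the candidate sweet spot
best_degree = degrees[int(np.argmin(val_err))]
for d, tr, va in zip(degrees, train_err, val_err):
    print(f"degree {d:2d}: train MSE = {tr:.4f}, validation MSE = {va:.4f}")
print("lowest validation error at degree", best_degree)
```

The training error never increases as the degree grows, so it cannot be used to pick the model. The validation error is what reveals the turning point.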

## 5. Scikit-Learn

[Train Test Split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html)

[Cross-validation](https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation)

[Learning Curves](https://scikit-learn.org/stable/modules/learning_curve.html#learning-curve)

## 6. More Resources

[Model Tuning and the Bias-Variance Tradeoff](http://www.r2d3.us/visual-intro-to-machine-learning-part-2/)

[Understanding the Bias-Variance Tradeoff](http://scott.fortmann-roe.com/docs/BiasVariance.html)

[Learning Curves in Linear & Polynomial Regression](https://utkuufuk.github.io/2018/05/04/learning-curves/)

[Model Evaluation & Selection](https://heartbeat.fritz.ai/model-evaluation-selection-i-30d803a44ee)

[Learning Curves for Machine Learning](https://www.dataquest.io/blog/learning-curves-machine-learning/)

[Model Tuning (Part 2 - Validation & Cross-Validation)](https://dziganto.github.io/cross-validation/data%20science/machine%20learning/model%20tuning/python/Model-Tuning-with-Validation-and-Cross-Validation/)

[How (and Why) to create a good validation set](https://www.fast.ai/2017/11/13/validation-sets/)

[How (dis)similar are my train and test data?](https://towardsdatascience.com/how-dis-similar-are-my-train-and-test-data-56af3923de9b)

[Validation strategies for your machine learning model](https://heartbeat.fritz.ai/model-evaluation-selection-i-30d803a44ee)

#### This material is copyright Daniel Voigt Godoy and made available under the Creative Commons Attribution (CC-BY) license ([link](https://creativecommons.org/licenses/by/4.0/)). 

#### Code is also made available under the MIT License ([link](https://opensource.org/licenses/MIT)).

In [9]:
from IPython.display import HTML
HTML('''<script>
  function code_toggle() {
    if (code_shown){
      $('div.input').hide('500');
      $('#toggleButton').val('Show Code')
    } else {
      $('div.input').show('500');
      $('#toggleButton').val('Hide Code')
    }
    code_shown = !code_shown
  }

  $( document ).ready(function(){
    code_shown=false;
    $('div.input').hide()
  });
</script>
<form action="javascript:code_toggle()"><input type="submit" id="toggleButton" value="Show Code"></form>''')