# Bias Variance Tradeoff
Agenda today:
- Overfitting and Underfitting: Bias Variance Tradeoff
- Validating your model: K-fold Cross Validation 
    - Validation set
    - Leave-one-out cross validation (LOOCV)
    - K-fold Cross Validation

### What is the difference between an algorithm and a model?


## Part I. Bias and Variance 
In the Module 2 project, you might have built a highly complex regression model that performed really well on your data. But does it generalize well to unseen data? That is the heart of the tradeoff: we want a model that does not fit every squiggle of the training data but instead learns the general, underlying pattern, so that it performs well on data it has not seen. So what is bias, and what is variance?

#### Bias
Bias is the difference between the average prediction of our model and the correct value we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies the underlying pattern. This typically leads to high error on both training and test data.

Models with high bias tend to underfit: they fail to learn the signal in the training data and, as a result, generalize poorly to testing data.

#### Variance
Variance is the variability of the model's prediction for a given data point: it tells us how much that prediction would change if the model were trained on a different sample of data. A model with high variance pays a lot of attention to the quirks of the training data and does not generalize to data it hasn't seen before. As a result, such models perform very well on training data but have high error rates on test data.

Models with high variance tend to overfit: they learn the noise in the training data and generalize poorly to testing data.
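
To make underfitting and overfitting concrete, here is a minimal sketch that fits polynomials of increasing degree to noisy data and compares training and test MSE. The sine-shaped signal, the noise level, and the degrees 1, 3, and 10 are made-up choices for illustration, not part of the project data.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples from a sine curve (a made-up example signal)."""
    x = rng.uniform(-1, 1, n)
    y = np.sin(np.pi * x) + rng.normal(0, 0.3, n)
    return x, y

def mse(y, y_hat):
    """Mean squared error between actual and predicted values."""
    return np.mean((y - y_hat) ** 2)

x_train, y_train = make_data(30)    # small training set
x_test, y_test = make_data(200)     # unseen data from the same process

results = {}
for degree in (1, 3, 10):
    # Fit a polynomial of the given degree on the training data only.
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = mse(y_train, np.polyval(coefs, x_train))
    test_mse = mse(y_test, np.polyval(coefs, x_test))
    results[degree] = (train_mse, test_mse)
    print(f"degree {degree:2d}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```

The training MSE can only go down as the degree grows, because each higher-degree polynomial contains the lower-degree ones as special cases. The straight line (degree 1) is too simple for this signal, so its error stays high on both sets; a very flexible fit usually shows a widening gap between training and test error.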


<img src="attachment:Screen%20Shot%202019-03-07%20at%202.23.37%20PM.png" style="width:500px;">

<img src="attachment:Screen%20Shot%202019-03-07%20at%2011.30.43%20AM.png" style="width:500px;">

#### MSE and RMSE
MSE stands for mean squared error: it is the average of the squared differences between the predicted values and the actual values. RMSE stands for root mean squared error; it is the square root of the MSE and, when the residuals average to zero, can be read as the standard deviation of the residuals. The RMSE measures how far, on average, the predictions are from the actual y values. The smaller the RMSE, the closer the predictions are to the actual values. Note that RMSE and MSE are not standardized: RMSE is expressed in the original units of the target, and MSE in those units squared. Therefore, you can have a good $R^2$ but still a high RMSE if the target is measured on a large scale.

$$ MSE = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat Y_i)^2 $$
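
The following sketch computes MSE, RMSE, and $R^2$ directly from the formula above. The numbers are made up: the same four predictions are expressed once in thousands and once in raw units, to show that RMSE changes with the unit of measurement while $R^2$ does not.

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error: average squared difference."""
    return np.mean((y - y_hat) ** 2)

def rmse(y, y_hat):
    """Root mean squared error: square root of the MSE."""
    return np.sqrt(mse(y, y_hat))

def r_squared(y, y_hat):
    """Share of the variance in y explained by the predictions."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot

# Made-up actual and predicted values, in thousands.
y_k = np.array([200.0, 250.0, 310.0, 420.0])
y_hat_k = np.array([210.0, 240.0, 300.0, 430.0])

# The same values in raw units.
y_raw, y_hat_raw = y_k * 1000, y_hat_k * 1000

print(f"thousands: RMSE = {rmse(y_k, y_hat_k):.1f}, R^2 = {r_squared(y_k, y_hat_k):.3f}")
# thousands: RMSE = 10.0, R^2 = 0.985
print(f"raw units: RMSE = {rmse(y_raw, y_hat_raw):.1f}, R^2 = {r_squared(y_raw, y_hat_raw):.3f}")
# raw units: RMSE = 10000.0, R^2 = 0.985
```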

## Part II. Cross Validation
What are some of the obvious problems with using a single train test split?
To tackle these problems, we need cross validation. There are a few ways of working with cross validation:

#### 1. Train Test Split
As you have probably guessed, train test split means splitting the data into a training set, used for fitting the model, and a validation set (also called a hold-out or testing set). The fitted model is then used to make predictions on the validation set. The resulting validation set error rate, typically assessed using MSE, provides an estimate of the test error rate. The advantage of train test split is apparent: we no longer train and test the model on the same dataset, which would hide overfitting. However, what could be some of the drawbacks of using train test split?
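
As a minimal sketch of the procedure described above, the snippet below shuffles the row indices, holds out 25% of the rows, fits a straight line on the rest, and scores it on the held-out rows. The data, the 75/25 ratio, and the seed are assumptions chosen for illustration; in practice a library helper such as scikit-learn's `train_test_split` does the shuffling and splitting for you.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up data: a linear signal plus noise.
n = 100
x = rng.uniform(0, 10, n)
y = 3.0 * x + 2.0 + rng.normal(0, 1.0, n)

# Shuffle the row indices, then hold out the first 25% of them for validation.
indices = rng.permutation(n)
n_val = n // 4
val_idx, train_idx = indices[:n_val], indices[n_val:]

# Fit a straight line on the training rows only.
slope, intercept = np.polyfit(x[train_idx], y[train_idx], 1)

# Score the fitted line on the held-out rows.
y_val_hat = slope * x[val_idx] + intercept
val_mse = np.mean((y[val_idx] - y_val_hat) ** 2)
print(f"validation MSE = {val_mse:.3f}")
```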


#### 2. Leave one out cross validation
Leave-one-out cross validation (LOOCV) is an approach that addresses the drawbacks of train test split. Instead of partitioning the data into two subsets of comparable size, we hold out one single observation $(x_1, y_1)$ as the validation set and use the remaining $n - 1$ observations to train the model. The error on the held-out observation is $MSE_1 = (y_1 - \hat y_1)^2$. We then repeat this for every single observation, computing $MSE_i$ for the $i$th observation, and average the results, giving the leave-one-out MSE as:


$$ CV = \frac{1}{n} \sum_{i=1}^{n} MSE_i $$
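
The leave-one-out procedure can be sketched in a few lines of NumPy. The function name `loocv_mse`, the polynomial model, and the example data are hypothetical choices for illustration; any model that can be refit on a subset of the rows would work the same way.

```python
import numpy as np

def loocv_mse(x, y, degree=1):
    """Leave-one-out CV estimate of the test MSE for a polynomial fit."""
    n = len(x)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                       # every row except row i
        coefs = np.polyfit(x[keep], y[keep], degree)   # fit on n - 1 rows
        y_hat_i = np.polyval(coefs, x[i])              # predict the held-out row
        errors[i] = (y[i] - y_hat_i) ** 2              # MSE_i for this row
    return errors.mean()                               # average of the n values

# Made-up data: a linear signal plus noise.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 40)
y = 3.0 * x + 2.0 + rng.normal(0, 1.0, 40)

cv = loocv_mse(x, y)
print(f"LOOCV estimate of test MSE = {cv:.3f}")
```

Note that the loop refits the model once per observation, so a dataset with $n$ rows requires $n$ model fits.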

What are some of the drawbacks of this method of cross validation?

<img src="attachment:Screen%20Shot%202019-03-07%20at%203.36.27%20PM.png" style="width:500px;">

#### 3. K-fold Cross Validation
What is the disadvantage of LOOCV? As you can imagine, this method can be costly to implement, especially when the sample size $n$ is very large, because the model has to be fit $n$ times. Let's take a look at another, perhaps the most commonly used, validation method: k-fold cross validation.

This approach is an alternative to LOOCV. It involves randomly dividing the set of observations into $k$ groups, or folds, of approximately equal size. The first fold is treated as a validation set, and the model is fit on the remaining $k - 1$ folds. The mean squared error, $MSE_1$, is then computed on the observations in the held-out fold. This procedure is repeated $k$ times; each time, a different fold is treated as the validation set. This process results in $k$ estimates of the test error, $MSE_1, MSE_2, \ldots, MSE_k$. The k-fold CV estimate is computed by averaging these test errors.
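
The steps above can be sketched with NumPy as follows. The function name `k_fold_mse`, the polynomial model, the default of 5 folds, and the example data are assumptions for illustration only.

```python
import numpy as np

def k_fold_mse(x, y, k=5, degree=1, seed=0):
    """k-fold CV estimate of the test MSE for a polynomial fit."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(x))      # randomly shuffle the rows
    folds = np.array_split(indices, k)     # k folds of approximately equal size
    fold_mse = []
    for i in range(k):
        val_idx = folds[i]                 # fold i is the validation set
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        coefs = np.polyfit(x[train_idx], y[train_idx], degree)
        y_hat = np.polyval(coefs, x[val_idx])
        fold_mse.append(np.mean((y[val_idx] - y_hat) ** 2))
    return np.mean(fold_mse), fold_mse     # CV estimate and the k fold errors

# Made-up data: a linear signal plus noise.
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 50)
y = 3.0 * x + 2.0 + rng.normal(0, 1.0, 50)

cv, fold_mse = k_fold_mse(x, y, k=5)
print(f"5-fold CV estimate of test MSE = {cv:.3f}")
```

Only $k$ model fits are needed here, compared with $n$ for LOOCV. Setting `k` equal to the number of observations puts one observation in each fold, which recovers LOOCV.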

<img src="attachment:Screen%20Shot%202019-03-11%20at%207.34.04%20AM.png" style="width:500px;">