# Testing and Validating

# Testing
How well does your model generalize to new cases? The only way to find out is to try it on new cases!

## Method 1: Put it in production
- Put the model in production
- Monitor how well it is doing

### Not a good idea
If the model performs badly, the users will complain

## Method 2: Split the data
Split the data in two:
- **Training Set**: Used to train the model
- **Test Set**: Used to test how well the model is doing
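
The split above can be sketched in a few lines of plain Python. This is only an illustration: the function name `split_train_test`, the 80/20 ratio, and the fixed seed are assumptions, not something the notes prescribe. Libraries such as scikit-learn offer an equivalent helper (`train_test_split`).

```python
import random

def split_train_test(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(100))  # stand-in for 100 labeled examples
train_set, test_set = split_train_test(rows)
print(len(train_set), len(test_set))  # 80 20
```

Shuffling before splitting matters when the rows are stored in some meaningful order (for example sorted by date or by label).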

### Generalization Error
The error rate on the test set. It estimates how well the model will perform on cases it has never seen

## Defining Overfitting
The model is overfitting the training data if the training error is low but the generalization error is high <br>
![Overfitting](images/overfitting.png)
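
To see the definition in numbers, here is a minimal sketch using a 1-nearest-neighbor regressor, which simply memorizes the training set. The synthetic data (`y = 2x` plus noise) and the use of mean squared error instead of an error rate are assumptions made for illustration.

```python
import random

def predict_1nn(train, x):
    """Return the label of the training point closest to x."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]

def mean_squared_error(train, data):
    """Average squared error of the 1-NN model (fitted on train) over data."""
    return sum((predict_1nn(train, x) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
points = []
for _ in range(200):
    x = rng.uniform(0, 10)
    points.append((x, 2 * x + rng.gauss(0, 1)))  # made-up noisy line

train_set, test_set = points[:160], points[160:]

training_error = mean_squared_error(train_set, train_set)
generalization_error = mean_squared_error(train_set, test_set)
print("training error:", training_error)
print("generalization error:", generalization_error)
```

Each training point is its own nearest neighbor, so the training error is zero, while the error on the unseen test points is not. That gap is the signature of overfitting.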

# Validating
Which model is the best for your dataset?

## Can money buy happiness?
Say we have 5 models for predicting happiness from GDP:
- Linear Regression
- Linear Regression with tuned hyperparameters
- 3-Nearest Neighbors (3-NN)
- 4-Nearest Neighbors (4-NN)
- 5-Nearest Neighbors (5-NN)

Which is best? You have to test them!

### Approach 1
- Split the dataset in two: ***training set*** and ***test set***
- Train all the models on the training set
- Find the ***generalization error*** for all the models on the test set:<br>
  13%, 5%, 10%, 7%, 9%
- Whichever has the ***least*** generalization error is the best model:<br>
  Linear Regression with tuned hyperparameters (5%)

### Then you put it in production
And find out that the error rate is actually 12%<br>
What just happened?

We selected the model that performed best on one particular dataset (the test set). Because the choice itself was fitted to that set, the 5% we measured was an optimistic estimate, and the model is unlikely to perform as well on new data

### Approach 2: ***Holdout Validation***
- Split the dataset in two: ***training set*** and ***test set***
- Split the training set in two: ***reduced training set*** and ***validation set***
- Train all the models on the reduced training set
- Find the ***validation error*** for all the models on the ***validation set***:<br>
  13%, 5%, 10%, 7%, 9%
- Whichever has the ***least*** validation error is the best model:<br>
  Linear Regression with tuned hyperparameters
- Retrain the best model on the full training set (reduced training set plus validation set)
- Test this final model on the test set to estimate its generalization error: 12%
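
The holdout procedure can be sketched end to end as follows. Everything concrete here is an assumption for illustration: the synthetic data stands in for the GDP/happiness dataset, only the three k-NN candidates are compared, mean squared error replaces the error rate, and the split sizes are arbitrary.

```python
import random

def knn_predict(train, x, k):
    """Average the labels of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error on data of a k-NN model fitted on train."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(1)
data = []
for _ in range(300):
    x = rng.uniform(0, 10)
    data.append((x, 2 * x + rng.gauss(0, 1)))  # made-up noisy line

# Hold out a test set, then carve a validation set out of the training set.
train_full, test_set = data[:240], data[240:]
train_reduced, validation_set = train_full[:180], train_full[180:]

# Compare the candidates on the validation set only.
candidates = [3, 4, 5]
validation_errors = {k: mse(train_reduced, validation_set, k) for k in candidates}
best_k = min(validation_errors, key=validation_errors.get)

# "Retrain" the winner on the full training set and test it exactly once.
test_error = mse(train_full, test_set, best_k)
print("best k:", best_k)
print("test error:", test_error)
```

The test set is touched a single time, after the model has been chosen, so `test_error` is not biased by the selection step.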

## Size of Validation Set
- If it's too small: The model evaluations will be imprecise, so you may end up choosing a sub-optimal model
- If it's too large:
  - The size of the reduced training set and the full training set will be quite different
  - So, the candidate model (which is evaluated on the validation set) and the final model (which is evaluated on the test set) will be two very different models
  - "It would be like selecting the fastest sprinter to run a marathon"

Solution: cross-validation

### Cross-validation
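Which rows should go into the validation set? Instead of a single split, use many small validation sets: each model is evaluated once per validation set after being trained on the rest of the data, and the evaluations are averaged<br>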
![cross validation](images/cross_validation.jpg)<br>
More: https://www.youtube.com/watch?v=fSytzGwwBVw

### Drawback
Training time is multiplied by the number of folds, because the model is trained once per fold
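
A minimal k-fold sketch, written from scratch to show both the averaging and the cost. The helper names (`k_fold_splits`, `cross_val_error`), the synthetic data, and the choice of 5 folds are assumptions. For k-NN "training" only means storing the data, so the run counter stands in for what would be a full training pass with other models.

```python
import random

def k_fold_splits(n_rows, n_folds):
    """Yield (train_indices, validation_indices), one pair per fold."""
    indices = list(range(n_rows))
    base, extra = divmod(n_rows, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)  # spread the remainder
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

def knn_predict(train, x, k):
    """Average the labels of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def cross_val_error(data, k, n_folds=5):
    """Return (mean validation MSE, number of training runs)."""
    fold_errors = []
    for train_idx, val_idx in k_fold_splits(len(data), n_folds):
        train = [data[i] for i in train_idx]
        errors = [(knn_predict(train, data[i][0], k) - data[i][1]) ** 2
                  for i in val_idx]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / n_folds, len(fold_errors)

rng = random.Random(2)
data = []
for _ in range(100):  # rows are generated in random order, so no shuffle here
    x = rng.uniform(0, 10)
    data.append((x, 2 * x + rng.gauss(0, 1)))

for k in (3, 4, 5):
    error, runs = cross_val_error(data, k)
    print(f"{k}-NN: mean validation MSE {error:.3f} over {runs} training runs")
```

Every row serves as validation data exactly once, and each candidate model costs `n_folds` training runs instead of one.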

# Data Mismatch
Say you want to create an app that identifies flower species<br>
![flower identification](images/flower_detection.webp)

## How do you prepare a large dataset?
Download millions of photos from the web

### Drawback
They won’t be perfectly representative of the pictures that will actually be taken using the app on a mobile device

Suppose you have 1000 representative pictures (that are actually taken with the app)

In that case, the validation set and test set should be as representative as possible
- Training set: Web images from Google
- Validation set: Half of the representative set
- Test set: Other half of the representative set
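
As a rough sketch, the assignment above could look like this. The file names and the count of 5000 web images are hypothetical placeholders; only the 1000 representative pictures come from the example.

```python
import random

def split_for_mismatch(web_images, app_images, seed=0):
    """Train on web images; split the representative images in half
    between the validation set and the test set."""
    shuffled = list(app_images)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "train": list(web_images),
        "validation": shuffled[:half],
        "test": shuffled[half:],
    }

web_images = [f"web_{i}.jpg" for i in range(5000)]  # hypothetical file names
app_images = [f"app_{i}.jpg" for i in range(1000)]  # the representative pictures
splits = split_for_mismatch(web_images, app_images)
print({name: len(items) for name, items in splits.items()})
# {'train': 5000, 'validation': 500, 'test': 500}
```

The shuffle keeps the two halves from differing systematically, and no representative picture ends up in both the validation set and the test set.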

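If you do the opposite (representative pictures for training, web images for validation and testing), you cannot measure how the model will really perform in the app. Beware of customer complaints!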

### If your model performs badly on the validation set, what does it mean?
- Is the model overfitting?
- Or, is it because of the data mismatch?

We don't know

### Solution: Train-Dev Set
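- Hold out some data from the training set (the web images) and call it the train-dev set
- Train the model on the rest of the training set, then evaluate it on the train-dev set
- If the model performs well on the train-dev set but not on the validation set: It's data mismatch
- If the model performs poorly on the train-dev set: It's overfitting the training set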
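
The decision rule can be written down as a small function. This is a sketch under assumptions: the function name `diagnose`, the `gap` threshold for "noticeably worse", and the example error values are all made up for illustration, and "performs poorly on the train-dev set" is read as "noticeably worse than on the training set".

```python
def diagnose(train_error, train_dev_error, validation_error, gap=0.05):
    """Guess why validation performance is poor.

    gap is a hypothetical threshold for "noticeably worse".
    """
    if train_dev_error - train_error > gap:
        # Worse on unseen data from the SAME distribution as training.
        return "overfitting"
    if validation_error - train_dev_error > gap:
        # Fine on held-out web images, worse on the app pictures.
        return "data mismatch"
    return "no obvious problem"

print(diagnose(0.02, 0.15, 0.16))  # overfitting
print(diagnose(0.02, 0.03, 0.20))  # data mismatch
print(diagnose(0.02, 0.03, 0.04))  # no obvious problem
```

The train-dev set works as a control: it shares the training distribution but was never trained on, so it separates the two possible causes.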

# No Free Lunch Theorem
Which model performs the best, a priori?