# 1. Causes of Error

### Bias-variance dilemma and no. of features
High bias: Pays little attention to data, oversimplified
- High error on training set
- Low r^2, large SSE
High variance: Pays too much attention to data (does not generalise well), overfits.
- Much higher error on test set than on training data

E.g. 
- few features used (if you have access to lots more) -> high bias.
- Carefully minimised SSE (used lots of features, tuned parameters) -> High variance

Want min number of features (simplicity) to achieve good accuracy (goodness of fit)
- Few features, large r^2, low SSE.

![overfit regression](images/p1-4-1-1.png)
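
A minimal sketch of the trade-off above, using polynomial degree as a stand-in for "number of features". The sine-plus-noise data, the sample sizes and the degrees are all assumptions made for illustration; they are not from the course material.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Hypothetical ground truth: a sine wave plus Gaussian noise.
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = make_data(15)
x_test, y_test = make_data(200)

def mse(coeffs, x, y):
    # Mean squared error of a fitted polynomial on (x, y).
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

train_mse, test_mse = {}, {}
for degree in (1, 3, 9):
    # A higher degree means more polynomial features, i.e. a more complex model.
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse[degree] = mse(coeffs, x_train, y_train)
    test_mse[degree] = mse(coeffs, x_test, y_test)
    print(f"degree={degree}: train MSE={train_mse[degree]:.3f}, "
          f"test MSE={test_mse[degree]:.3f}")
```

Training error can only go down as the degree grows. The straight line (degree 1) should show high error on both sets (bias), while a large train/test gap at the highest degree would point to variance.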

# 2. Curse of Dimensionality
As the number of **features or dimensions grows**, the amount of **data** we need to **generalise accurately** grows **exponentially**.

e.g. KNN: its distance or similarity function assumes every feature is equally relevant, so you have to see a lot of data before the effect of the irrelevant features washes out.

e.g. 10 points uniformly distributed along a line segment. Each point "owns" an equal part of the segment (cf. KNN).
-> Move from a line segment to a 2D space. Each `x` still represents 1/10th of the space, but that share is now a much bigger region, so the farthest point each `x` represents is much farther away.
-> Q: How do we give each `x` the same farthest-point 'diameter' as before? -> Many more `x`s, e.g. 100.

Think of it as points covering a space. If you want to cover the same amount of hyperspace...
More features -> more volume to fill.
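
A rough sketch of the covering argument, assuming points in a unit hypercube. `points_needed` and `mean_nearest_neighbour_distance` are hypothetical helper names, and the 200-point sample is an arbitrary choice.

```python
import numpy as np

def points_needed(points_per_axis, n_dims):
    # Keeping the same spacing along every axis needs points_per_axis ** n_dims points.
    return points_per_axis ** n_dims

def mean_nearest_neighbour_distance(n_points, n_dims, rng):
    # Sample uniformly in the unit hypercube and average each point's
    # Euclidean distance to its closest other point.
    pts = rng.uniform(size=(n_points, n_dims))
    diffs = pts[:, None, :] - pts[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # ignore the zero distance to itself
    return float(dists.min(axis=1).mean())

rng = np.random.default_rng(0)
nn_dist = {}
for d in (1, 2, 5, 10):
    nn_dist[d] = mean_nearest_neighbour_distance(200, d, rng)
    print(f"dims={d}: points for same spacing={points_needed(10, d)}, "
          f"mean NN distance with 200 points={nn_dist[d]:.3f}")
```

With the number of points held fixed, the nearest neighbour gets farther away as dimensions are added, which is why a method like KNN needs exponentially more data to stay "local".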

# 3. Learning Curves

A **Learning Curve** is: a graph that compares the performance of a model on training and testing data over a varying number of training instances.

- Should generally see performance improve as the number of training points increases.

- By separating training and testing sets and graphing performance on each separately, we can get a better idea of how well the model can generalize to unseen data.

A learning curve allows us to **verify when a model has learned as much as it can about the data**. When this occurs, the performance on both the training and testing sets plateaus and there is a consistent gap between the two error rates.
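
One possible way to compute the numbers behind a learning curve is scikit-learn's `learning_curve`. The synthetic linear data, the `LinearRegression` model and the 5-fold split below are assumptions chosen to keep the sketch small.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve

# Hypothetical data: a noisy linear relationship with one feature.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.3, size=300)

sizes, train_scores, test_scores = learning_curve(
    LinearRegression(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),  # fractions of the available training data
    cv=5,
    scoring="neg_mean_squared_error",
)

# Scores are negated MSE, so flip the sign and average over the CV folds.
train_err = -train_scores.mean(axis=1)
test_err = -test_scores.mean(axis=1)
for n, tr, te in zip(sizes, train_err, test_err):
    print(f"n_train={n}: train MSE={tr:.3f}, test MSE={te:.3f}")
```

Plotting `train_err` and `test_err` against `sizes` gives the learning curve; the point where both flatten out is where more data stops helping.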

### Bias
When the training and testing errors converge and are quite high this usually means 
-> the model is biased. 
- No matter how much data we feed it, the model cannot represent the underlying relationship and therefore has systematic high errors.

### Variance
When there is a large gap between the training and testing error this generally means 
-> the model suffers from high variance. 
- Unlike a biased model, models that suffer from variance generally require more data to improve. 
- We can also limit variance by simplifying the model to represent only the most important features of the data.
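
The two rules of thumb above can be written as a small check on the plateau values of a learning curve. `diagnose` is a hypothetical helper, and both thresholds are assumptions you would have to pick per problem.

```python
def diagnose(train_error, test_error, acceptable_error, max_gap):
    """Rough reading of the final (plateau) errors on a learning curve.

    acceptable_error and max_gap are problem-specific assumptions,
    not values from any library or from the course.
    """
    gap = test_error - train_error
    if gap > max_gap:
        # Large train/test gap: the model fits the training set much better.
        return "high variance"
    if train_error > acceptable_error:
        # Curves have converged, but at a high error level.
        return "high bias"
    return "good fit"

print(diagnose(0.30, 0.32, acceptable_error=0.10, max_gap=0.05))  # high bias
print(diagnose(0.02, 0.25, acceptable_error=0.10, max_gap=0.05))  # high variance
print(diagnose(0.05, 0.07, acceptable_error=0.10, max_gap=0.05))  # good fit
```

This is only a heuristic: a model can suffer from both problems at once, in which case the sketch reports variance first.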

## Ideal Learning Curve
The ultimate goal for a model is one that **has good performance that generalizes well to unseen data**. In this case, both the **testing and training curves converge at similar values**. 
- The smaller the gap between the training and testing sets, the better our model generalizes. 
- The better the performance on the testing set, the better our model performs.

## Model Complexity
The visual technique of graphing performance is not limited to learning curves. With most models, we can change the complexity by changing the inputs or parameters.

A **model complexity graph** looks at training and testing curves as the model's complexity varies. The most common trend is that **as a model's complexity increases, bias will fall off and variance will rise.**

Scikit-learn provides a tool for validation curves which can be used to monitor model complexity by varying the parameters of a model. We'll explore the specifics of how these parameters affect complexity in the next course on supervised learning.
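
A minimal sketch of that tool, `sklearn.model_selection.validation_curve`, varying `max_depth` of a decision tree. The choice of model, the depth range and the synthetic data are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: a noisy sine wave with one feature.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 1))
y = np.sin(np.pi * X[:, 0]) + rng.normal(scale=0.2, size=300)

depths = list(range(1, 11))  # deeper tree = more complex model
train_scores, test_scores = validation_curve(
    DecisionTreeRegressor(random_state=0), X, y,
    param_name="max_depth", param_range=depths,
    cv=5, scoring="neg_mean_squared_error",
)

# Scores are negated MSE, so flip the sign and average over the CV folds.
train_err = -train_scores.mean(axis=1)
test_err = -test_scores.mean(axis=1)
best_depth = depths[int(np.argmin(test_err))]
for d, tr, te in zip(depths, train_err, test_err):
    print(f"max_depth={d}: train MSE={tr:.3f}, test MSE={te:.3f}")
print("depth with lowest test MSE:", best_depth)
```

Plotting `train_err` and `test_err` against `depths` gives the model complexity graph: training error keeps falling with depth, while test error typically bottoms out and then rises again.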

### Learning Curves and Model Complexity
So what is the relationship between learning curves and model complexity?

If we were to take the learning curves of the same machine learning algorithm with the same fixed set of data, but create several graphs at different levels of model complexity, all the learning curve graphs would fit together into a 3D model complexity graph.

If we took the final testing and training errors for each model complexity and visualized them along the complexity of the model we would be able to see how well the model performs as the model complexity increases.
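
As a sketch of how the two views fit together, the grid below stores errors for every (complexity, training size) pair; each row is one learning curve, and the last column is the model complexity curve. Polynomial degree as the complexity measure, the sine data and the grid values are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    # Hypothetical ground truth: a sine wave plus Gaussian noise.
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y

def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

x_pool, y_pool = make_data(80)    # fixed pool of training data
x_test, y_test = make_data(500)   # held-out test data

degrees = [1, 3, 6]   # model complexity axis
sizes = [20, 40, 80]  # training set size axis

train_err = np.zeros((len(degrees), len(sizes)))
test_err = np.zeros((len(degrees), len(sizes)))
for i, degree in enumerate(degrees):
    for j, n in enumerate(sizes):
        # Fit on the first n training points only.
        coeffs = np.polyfit(x_pool[:n], y_pool[:n], degree)
        train_err[i, j] = mse(coeffs, x_pool[:n], y_pool[:n])
        test_err[i, j] = mse(coeffs, x_test, y_test)

# "Final" errors: the largest training size, one value per complexity level.
complexity_curve_train = train_err[:, -1]
complexity_curve_test = test_err[:, -1]
print("test error vs degree at n=80:", np.round(complexity_curve_test, 3))
```

The full `test_err` grid is the 3D surface described above; slicing it along one axis recovers either a learning curve or the model complexity graph.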

## Practical use of Model Complexity
Knowing that **we can identify issues with bias and variance by analyzing a model complexity graph**, we now have a visual tool to help identify ways to optimize our models.

This will be one of the core tools we use in the upcoming project.

In the final section, we will introduce cross validation and grid search, which give us a concrete, systematic way of searching through different levels of complexity to find the optimal model. Model complexity graphs and learning curves then give us a holistic understanding of that model.