# Overfitting and Underfitting With Machine Learning Algorithms

Overfitting and underfitting are the two terms used to describe poor generalization in machine learning:

- **Overfitting**: Good performance on the training data, poor generalization to other data.
- **Underfitting**: Poor performance on the training data and poor generalization to other data.

[Source](https://machinelearningmastery.com/overfitting-and-underfitting-with-machine-learning-algorithms/)

# How to Avoid Overfitting in Deep Learning Neural Networks

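A model that performs poorly on data it was not trained on has not generalized. A trained model falls into one of three categories: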

- **Underfit Model**. A model that fails to sufficiently learn the problem: it performs poorly on the training dataset and on a holdout sample.
- **Overfit Model**. A model that learns the training dataset too well: it performs well on the training dataset but poorly on a holdout sample.
- **Good Fit Model**. A model that suitably learns the training dataset and generalizes well to the holdout dataset.


**An underfit model** has high bias and low variance, while an overfit model has low bias and high variance. We can address underfitting by increasing the capacity of the model. Capacity refers to the ability of a model to fit a variety of functions; more capacity means that a model can fit more types of functions for mapping inputs to outputs. Increasing the capacity of a model is easily achieved by changing its structure, such as adding more layers and/or more nodes to layers.

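**An overfit model** is easily diagnosed by monitoring its performance during training, evaluating it on both the training dataset and a holdout validation dataset. Line plots of this performance over the course of training, called learning curves, show a familiar pattern: performance on the training dataset keeps improving (for example, the loss keeps falling), while performance on the validation dataset improves up to a point and then starts to get worse.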
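The overfitting pattern in learning curves (training loss still falling while validation loss has turned upward) can also be checked programmatically once per-epoch losses are recorded. The following is a minimal sketch, assuming the training and validation losses are available as plain Python lists of equal length; the function `diagnose_learning_curves` and its `tolerance` parameter are hypothetical names chosen for illustration, not part of any library.

```python
def diagnose_learning_curves(train_loss, val_loss, tolerance=0.0):
    """Return (best_epoch, overfitting) for per-epoch loss histories.

    best_epoch is the index of the lowest validation loss. overfitting is
    True when the validation loss later rises above that minimum (by more
    than `tolerance`) while the training loss has kept falling.
    """
    best_epoch = min(range(len(val_loss)), key=lambda i: val_loss[i])
    after = range(best_epoch + 1, len(val_loss))
    # Did validation loss get worse at any epoch after its minimum?
    val_rising = any(val_loss[i] > val_loss[best_epoch] + tolerance for i in after)
    # Did training loss keep improving past that same epoch?
    train_falling = len(after) > 0 and train_loss[-1] < train_loss[best_epoch]
    return best_epoch, (val_rising and train_falling)


# Hypothetical per-epoch losses for illustration only (not real results).
train = [1.00, 0.70, 0.50, 0.35, 0.25, 0.18]
val = [1.05, 0.80, 0.65, 0.62, 0.68, 0.75]
print(diagnose_learning_curves(train, val))  # (3, True)
```

Here the validation loss bottoms out at epoch 3 and rises afterwards while the training loss keeps falling, so the sketch reports the overfitting pattern. Real curves are noisy, which is what the hypothetical `tolerance` parameter is for; smoothing the curves first is another option.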

## Reduce Overfitting by Constraining Model Complexity
There are two ways to approach an overfit model:

1. Reduce overfitting by training the network on more examples.
2. Reduce overfitting by changing the complexity of the network.

A model can overfit a training dataset because it has sufficient capacity to do so. Reducing the capacity of the model reduces the likelihood of it overfitting the training dataset; reduced far enough, the model no longer overfits.

The capacity of a neural network model, its complexity, is defined both by its structure in terms of nodes and layers and by its parameters in terms of weights. Therefore, we can reduce the complexity of a neural network to reduce overfitting in one of two ways:

1. Change network complexity by changing the network structure (number of weights).
2. Change network complexity by changing the network parameters (values of weights).
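To make the first option concrete: in a fully connected network the number of weights is fixed by the layer widths, so adding nodes or layers increases it. The sketch below counts those parameters, assuming dense layers that each have one bias per node; the helper `count_mlp_weights` is a hypothetical name. The second option leaves this count unchanged and instead limits the values the weights can take.

```python
def count_mlp_weights(layer_sizes):
    """Count trainable parameters (weights plus biases) of a fully connected
    network whose layer widths are listed in order, input layer first."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # One weight per connection, plus one bias per node in the next layer.
        total += n_in * n_out + n_out
    return total


print(count_mlp_weights([10, 32, 1]))      # 385
print(count_mlp_weights([10, 64, 64, 1]))  # 4929
```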

In practice, it is more common to constrain the complexity of the model by keeping its parameters (weights) small rather than by changing its structure. Small parameters suggest a less complex and, in turn, more stable model that is less sensitive to statistical fluctuations in the input data.

Techniques that seek to reduce overfitting (reduce generalization error) by keeping network weights small are referred to as regularization methods. More specifically, regularization refers to a class of approaches that add information to transform an ill-posed problem into a more stable, well-posed problem.


## Regularization Methods for Neural Networks

The most common regularization method is to add a penalty to the loss function in proportion to the size of the weights in the model.

1. [Weight Regularization (weight decay)](https://machinelearningmastery.com/weight-regularization-to-reduce-overfitting-of-deep-learning-models/): Penalize the model during training based on the magnitude of the weights.

This will encourage the model to map the inputs to the outputs of the training dataset in such a way that the weights of the model are kept small. This approach is called weight regularization or weight decay and has proven very effective for decades for both simpler linear models and neural networks.
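As a minimal sketch of weight decay, assume a one-feature linear model `y ≈ w * x + b` trained by batch gradient descent on mean squared error, with an added L2 penalty `l2 * w**2`. The names `fit_linear` and `l2` are hypothetical, and leaving the bias unpenalized is a common convention rather than a requirement. The penalty adds `2 * l2 * w` to the gradient, pulling the weight toward zero on every update.

```python
def fit_linear(xs, ys, l2=0.0, lr=0.05, epochs=500):
    """Fit y ~ w * x + b by batch gradient descent on mean squared error,
    optionally adding the L2 penalty l2 * w**2 (the bias is not penalized)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error data term.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # The penalty l2 * w**2 contributes 2 * l2 * w to the gradient,
        # which shrinks ("decays") w toward zero on every step.
        grad_w += 2 * l2 * w
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # exactly y = 2x + 1
w_plain, _ = fit_linear(xs, ys)
w_decay, _ = fit_linear(xs, ys, l2=1.0)
print(round(w_plain, 3), round(w_decay, 3))  # 2.0 1.111
```

The penalized weight (about 1.111) is smaller than the unpenalized fit of 2.0, at the cost of a worse fit to the training data; the value of `l2` controls that trade-off.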

Below is a list of five of the most common additional regularization methods.

1. [Activity Regularization](https://machinelearningmastery.com/how-to-reduce-generalization-error-in-deep-neural-networks-with-activity-regularization-in-keras/): Penalize the model during training based on the magnitude of the activations.
2. [Weight Constraint](https://machinelearningmastery.com/introduction-to-weight-constraints-to-reduce-generalization-error-in-deep-learning/): Constrain the magnitude of weights to be within a range or below a limit.
3. [Dropout](https://machinelearningmastery.com/dropout-for-regularizing-deep-neural-networks/): Probabilistically remove inputs during training.
4. [Noise](https://machinelearningmastery.com/train-neural-networks-with-noise-to-reduce-overfitting/): Add statistical noise to inputs during training.
5. [Early Stopping](https://machinelearningmastery.com/early-stopping-to-avoid-overtraining-neural-network-models/): Monitor model performance on a validation set and stop training when performance degrades.
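Dropout is often implemented as "inverted dropout", sketched below under the assumption that a layer's activations (the inputs to the next layer) are held in a plain Python list. During training each value is zeroed with probability `rate` and the survivors are scaled by `1 / (1 - rate)`, so no rescaling is needed at inference time. The function name and signature are hypothetical; deep learning libraries provide their own dropout layers.

```python
import random


def dropout(activations, rate, training=True, rng=random):
    """Inverted dropout (hypothetical helper); `rate` must satisfy 0 <= rate < 1."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # At inference (or with no dropout) the activations pass through unchanged.
        return list(activations)
    keep = 1.0 - rate
    # Keep each value with probability `keep`, scaling survivors so the
    # expected value of every output equals its input.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # seeded so the random mask is reproducible
print(dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, rng=rng))
print(dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, training=False))  # [1.0, 2.0, 3.0, 4.0]
```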

Most of these methods have been demonstrated (or proven) to approximate the effect of adding a penalty to the loss function.
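For one of these methods, added noise, the equivalence can be verified directly in the simplest setting. Assume a linear model with prediction $\mathbf{w}^\top\mathbf{x}$, squared-error loss, and Gaussian noise $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})$ added to each input during training. Expanding the square and using $\mathbb{E}[\boldsymbol{\epsilon}] = \mathbf{0}$ and $\mathbb{E}[\boldsymbol{\epsilon}\boldsymbol{\epsilon}^\top] = \sigma^2\mathbf{I}$ gives the expected loss:

$$
\mathbb{E}_{\boldsymbol{\epsilon}}\left[\left(\mathbf{w}^\top(\mathbf{x}+\boldsymbol{\epsilon}) - y\right)^2\right]
= (\mathbf{w}^\top\mathbf{x} - y)^2 + 2(\mathbf{w}^\top\mathbf{x} - y)\,\mathbf{w}^\top\mathbb{E}[\boldsymbol{\epsilon}] + \mathbf{w}^\top\mathbb{E}[\boldsymbol{\epsilon}\boldsymbol{\epsilon}^\top]\mathbf{w}
= (\mathbf{w}^\top\mathbf{x} - y)^2 + \sigma^2\lVert\mathbf{w}\rVert_2^2
$$

So, in expectation, training with this noise minimizes the clean loss plus an L2 weight penalty with coefficient $\sigma^2$. The equivalence is exact only for this linear, squared-error case; for deep networks the correspondence is approximate.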

Each method approaches the problem differently, offering benefits in terms of a mixture of generalization performance, configurability, and/or computational complexity.

## Regularization Recommendations
This section outlines some recommendations for using regularization methods for deep learning neural networks.

You should always consider using regularization unless you have a very large dataset, e.g. at big-data scale.

A good general recommendation is to design a neural network structure that is under-constrained and to use regularization to reduce the likelihood of overfitting.

Some more specific recommendations include:

- Classical: use early stopping and weight decay (L2 weight regularization).
- Alternate: use early stopping and added noise with a weight constraint.
- Modern: use early stopping and dropout, in addition to a weight constraint.
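Early stopping appears in every recommendation above. A common way to implement it is with a "patience" counter: training stops once the validation loss has failed to improve for a set number of consecutive epochs, and the weights from the best epoch are kept. The sketch below assumes the validation losses are already recorded in a list so the stopping logic can be shown in isolation; the function name and the `patience` and `min_delta` parameters are illustrative choices, not a specific library's API.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Replay per-epoch validation losses and return (stop_epoch, best_epoch):
    the epoch at which training would halt and the epoch whose weights
    should be restored."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # consecutive epochs without a sufficient improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best: remember it (a real loop would save the weights here).
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs without triggering the stop condition.
    return len(val_losses) - 1, best_epoch


losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.50]
print(early_stopping_epoch(losses, patience=3))  # (5, 2)
```

With `patience=3` the run halts at epoch 5 and restores epoch 2, never seeing the later improvement at epoch 6; a larger patience tolerates such plateaus at the cost of extra training.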

These recommendations would suit Multilayer Perceptrons and Convolutional Neural Networks.

Some recommendations for recurrent neural nets include:

- Classical: use early stopping with added weight noise and a weight constraint such as maximum norm.
- Modern: use early stopping with a backpropagation-through-time-aware version of dropout and a weight constraint.
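The maximum norm constraint mentioned above is typically applied after each weight update: if the vector of incoming weights to a node has an L2 norm larger than a limit, it is rescaled back onto that limit. A minimal sketch on a single weight vector follows; the helper name `max_norm` and the default limit of 3.0 are illustrative assumptions, not values taken from the source.

```python
import math


def max_norm(weights, max_value=3.0):
    """Rescale a weight vector so its L2 norm is at most max_value."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= max_value:
        # Already within the limit: leave the weights unchanged.
        return list(weights)
    # Shrink every weight by the same factor, preserving the vector's direction.
    scale = max_value / norm
    return [w * scale for w in weights]


print(max_norm([3.0, 4.0], max_value=1.0))  # roughly [0.6, 0.8]
print(max_norm([0.3, 0.4], max_value=1.0))  # [0.3, 0.4]
```

Unlike an L2 penalty, this constraint has no effect on weights that are already within the limit.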

There is no silver bullet when it comes to regularization, so systematic experimentation is strongly encouraged.

[Source](https://machinelearningmastery.com/introduction-to-regularization-to-reduce-overfitting-and-improve-generalization-error/)