# Overfitting and Underfitting

Reference: [Kaggle](https://www.kaggle.com/code/ryanholbrook/overfitting-and-underfitting/tutorial)

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn

import tensorflow as tf

- we'll look at learning curves for evidence of underfitting and overfitting, and at a couple of strategies for correcting them.

## Interpreting Learning Curves

- we have two kinds of information in *training data*: **signal** and **noise**.
    - ***signal***: the part that generalizes and helps your model make predictions from new data
    - ***noise***: the part that is *only* true of the training data
- we train a model by choosing weights or parameters that minimize the loss on a training set
- **learning curves** are plots of the loss on the training set and on the validation set, epoch by epoch.
- the training loss will go down when the model learns signal or when it learns noise, whereas the validation loss goes down only when the model learns signal
- we have to make a trade: we can get the model to learn more signal at the cost of learning more noise.
    - as long as the trade is in our favor, the validation loss will continue to decrease
- this trade-off means that two problems can occur when training a model: **not enough signal** or **too much noise**.
    - **Underfitting** the training set is when the loss is not as low as it could be because the model hasn't learned enough *signal*
    - **Overfitting** the training set is when the loss is not as low as it could be because the model learned too much *noise*
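A minimal sketch of how the two curves can be read in code. The helper name `summarize_learning_curves` and the loss values are made up for illustration; with Keras, the two lists would typically come from `history.history['loss']` and `history.history['val_loss']` after calling `fit` with validation data.

```python
def summarize_learning_curves(train_loss, val_loss):
    """Summarize per-epoch training and validation losses.

    Hypothetical helper: it reports the epoch with the lowest validation
    loss and whether the validation loss has risen since then.
    """
    if not train_loss or len(train_loss) != len(val_loss):
        raise ValueError("expected two non-empty loss lists of equal length")

    # Epoch (0-based) at which the validation loss was lowest.
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)

    return {
        "best_epoch": best_epoch,
        "best_val_loss": val_loss[best_epoch],
        # Gap between the curves at the last epoch.
        "final_gap": val_loss[-1] - train_loss[-1],
        # True when validation loss ended above its minimum.
        "val_rising_after_best": val_loss[-1] > val_loss[best_epoch],
    }


# Made-up losses: training loss keeps falling, validation loss turns upward.
example_train = [1.00, 0.60, 0.40, 0.30, 0.25, 0.22]
example_val = [1.10, 0.70, 0.55, 0.50, 0.53, 0.58]

summary = summarize_learning_curves(example_train, example_val)
print(summary)
```

In this made-up run the validation loss bottoms out at epoch 3 and then rises while the training loss keeps falling, which is the pattern the notes describe as learning more noise than signal.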

## Capacity

- a model's **capacity** refers to the *size* and *complexity* of the patterns it is able to learn
    - in a neural network (NN), this will largely be determined by how many neurons it has and how they are connected together
    - if the network is underfitting, you should try increasing its capacity
- to increase the capacity of a network you can either make it *wider* (adding more units to existing layers) or make it *deeper* (adding more layers)
    - wider networks have an easier time learning more linear relationships
    - deeper networks prefer more nonlinear ones

In [3]:
from tensorflow import keras
from tensorflow.keras import layers

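# baseline: one hidden layer with 16 units
model = keras.Sequential([
    layers.Dense(16, activation='relu'),
    layers.Dense(1),
])

# wider: same depth, more units in the hidden layer
wider = keras.Sequential([
    layers.Dense(32, activation='relu'),
    layers.Dense(1),
])

# deeper: same width, one additional hidden layer
deeper = keras.Sequential([
    layers.Dense(16, activation='relu'),
    layers.Dense(16, activation='relu'),
    layers.Dense(1),
])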
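One rough way to compare the three networks above is to count their trainable parameters. The sketch below does this by hand, using the fact that a `Dense` layer has one weight per input-unit pair plus one bias per unit. The helper `dense_param_count` and the input width `N_FEATURES = 11` are assumptions for illustration; the notes do not specify an input shape. Parameter count is only a crude proxy for capacity.

```python
def dense_param_count(n_inputs, layer_units):
    """Count weights and biases in a stack of fully connected layers.

    Hypothetical helper: n_inputs is the number of input features and
    layer_units lists the number of units in each Dense layer, in order.
    """
    total = 0
    fan_in = n_inputs
    for units in layer_units:
        # Weight matrix (fan_in x units) plus one bias per unit.
        total += fan_in * units + units
        fan_in = units
    return total


N_FEATURES = 11  # assumed input width, not given in the notes

counts = {
    "model": dense_param_count(N_FEATURES, [16, 1]),
    "wider": dense_param_count(N_FEATURES, [32, 1]),
    "deeper": dense_param_count(N_FEATURES, [16, 16, 1]),
}
print(counts)  # {'model': 209, 'wider': 417, 'deeper': 481}
```

Once a Keras model has been built with a matching input shape, `model.count_params()` should report the same totals.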