# Overfitting and Underfitting

In machine learning, **overfitting** and **underfitting** are two common problems that arise when training models. Understanding these concepts is crucial for building models that generalize well to new data.

## Underfitting

Underfitting occurs when a model is too simple to capture the underlying patterns in the training data. This results in poor performance on both the training data and unseen test data.

**Characteristics of Underfitting:**

*   High bias
*   Low variance
*   Poor performance on training and test data

**Causes of Underfitting:**

*   Model is too simple (e.g., linear model for non-linear data)
*   Insufficient training (e.g., too few epochs), or too little data for the pattern to be learnable
*   Features are not informative enough

## Overfitting

Overfitting occurs when a model is too complex and learns the training data too well, including the noise and random fluctuations. This results in excellent performance on the training data but poor performance on unseen test data.

**Characteristics of Overfitting:**

*   Low bias
*   High variance
*   Excellent performance on training data
*   Poor performance on test data

**Causes of Overfitting:**

*   Model is too complex (e.g., too many layers or neurons in a neural network)
*   Insufficient training data
*   Training for too many epochs
*   Lack of regularization techniques
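
The contrast between the two failure modes can be illustrated with a small curve-fitting experiment. The following is a minimal sketch, not part of the original notebook: it assumes a synthetic sine-shaped dataset with Gaussian noise and uses polynomial degree as a stand-in for model complexity. The names `make_data`, `mse`, and `results` are hypothetical helpers introduced for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    """Sample n noisy observations of a sine curve on [-1, 1] (synthetic data)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(np.pi * x) + rng.normal(scale=0.3, size=n)
    return x, y

def mse(model, x, y):
    """Mean squared error of a fitted polynomial on the points (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

# Degree 1 is too simple for a sine curve; degree 12 has enough
# freedom to chase the noise in only 20 training points.
results = {}
for degree in (1, 3, 12):
    model = Polynomial.fit(x_train, y_train, degree)  # least-squares fit
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree={degree:2d}  train MSE={results[degree][0]:.3f}  "
          f"test MSE={results[degree][1]:.3f}")
```

Training error can only go down as the degree grows, because each higher-degree polynomial contains the lower-degree ones as special cases. The test error is the quantity to watch: it is typically high for the degree-1 fit (underfitting) and tends to rise again for the degree-12 fit (overfitting), although the exact numbers depend on the random seed.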

## Strategies to Combat Overfitting

Several techniques can be employed to mitigate overfitting. We will explore three common methods:

*   Reducing the network's size
*   Adding weight regularization
*   Adding dropout

### Reducing the Network's Size

A large network with many parameters has a higher capacity to memorize the training data, which can lead to overfitting. Reducing the number of layers or neurons in a network can help to limit its capacity and encourage it to learn more generalizable patterns.

**Intuition:** A smaller model has fewer degrees of freedom and is less likely to fit the noise in the data.

**Example:** Consider a neural network with multiple hidden layers and a large number of neurons in each layer. This network has a high capacity.

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Example of a larger model prone to overfitting
def build_larger_model(input_shape):
    model = keras.Sequential([
        keras.Input(shape=input_shape),  # declare the input shape explicitly
        layers.Dense(128, activation='relu'),
        layers.Dense(64, activation='relu'),
        layers.Dense(1)  # linear output for the regression target
    ])
    return model

# Example of a smaller model with far fewer parameters
def build_smaller_model(input_shape):
    model = keras.Sequential([
        keras.Input(shape=input_shape),  # declare the input shape explicitly
        layers.Dense(32, activation='relu'),
        layers.Dense(1)  # linear output for the regression target
    ])
    return model

# Example usage (assuming you have training data X_train, y_train)
# input_shape = (X_train.shape[1],)
# larger_model = build_larger_model(input_shape)
# smaller_model = build_smaller_model(input_shape)
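
To make the difference in capacity concrete, the parameter counts of the two architectures above (128-64-1 and 32-1 units) can be worked out by hand. The sketch below assumes eight input features, which matches the housing dataset used in this notebook once the target column is dropped. `count_dense_params` is a hypothetical helper, not a Keras function.

```python
def count_dense_params(n_inputs, layer_units):
    """Count the weights and biases in a stack of fully connected layers."""
    total = 0
    fan_in = n_inputs
    for units in layer_units:
        total += fan_in * units + units  # weight matrix plus bias vector
        fan_in = units
    return total

n_features = 8  # assumption: eight input features

larger = count_dense_params(n_features, [128, 64, 1])
smaller = count_dense_params(n_features, [32, 1])

print(f"larger model:  {larger} parameters")   # 9473
print(f"smaller model: {smaller} parameters")  # 321
```

Under these assumptions the larger network has roughly 30 times as many trainable parameters, which is the extra capacity it can spend on memorizing noise. The same totals should appear in the output of `model.summary()`.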

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the dataset
df = pd.read_csv('/content/sample_data/california_housing_train.csv')
print("Dataset Head:")
display(df.head())

# Prepare data for the model
X = df.drop('median_house_value', axis=1)
y = df['median_house_value']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Determine input shape after splitting data
input_shape = (X_train.shape[1],)

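Dataset Head:

|   | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|-----------|----------|--------------------|-------------|----------------|------------|------------|---------------|--------------------|
| 0 | -114.31 | 34.19 | 15.0 | 5612.0 | 1283.0 | 1015.0 | 472.0 | 1.4936 | 66900.0 |
| 1 | -114.47 | 34.40 | 19.0 | 7650.0 | 1901.0 | 1129.0 | 463.0 | 1.8200 | 80100.0 |
| 2 | -114.56 | 33.69 | 17.0 | 720.0 | 174.0 | 333.0 | 117.0 | 1.6509 | 85700.0 |
| 3 | -114.57 | 33.64 | 14.0 | 1501.0 | 337.0 | 515.0 | 226.0 | 3.1917 | 73400.0 |
| 4 | -114.57 | 33.57 | 20.0 | 1454.0 | 326.0 | 624.0 | 262.0 | 1.9250 | 65500.0 |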


### Adding Weight Regularization

Weight regularization is a technique that adds a penalty to the loss function based on the magnitude of the model's weights. This encourages the model to use smaller weights, which can lead to simpler models and reduce overfitting.

There are two common types of weight regularization:

*   **L1 Regularization (Lasso):** Adds a penalty proportional to the absolute value of the weights. This can lead to sparse weights, effectively setting some weights to zero and performing feature selection.

    $$ \text{Penalty}_{L1} = \lambda \sum_{i} |w_i| $$

*   **L2 Regularization (Ridge):** Adds a penalty proportional to the square of the weights. This encourages weights to be small but rarely forces them to be exactly zero.

    $$ \text{Penalty}_{L2} = \lambda \sum_{i} w_i^2 $$

In both cases, $\lambda$ is a hyperparameter that controls the strength of the regularization. A larger $\lambda$ imposes a stronger penalty on the weights.

The regularized loss function becomes:

$$ \text{Loss}_{\text{regularized}} = \text{Loss}_{\text{original}} + \text{Regularization Penalty} $$
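
As a quick numerical check of the formulas above, the sketch below evaluates both penalties for a small weight vector. The weights, the value of $\lambda$, and the unregularized loss of 0.25 are made-up numbers chosen for illustration; `l1_penalty` and `l2_penalty` are hypothetical helper names.

```python
import numpy as np

def l1_penalty(weights, lam):
    """lambda times the sum of absolute weight values."""
    return lam * np.sum(np.abs(weights))

def l2_penalty(weights, lam):
    """lambda times the sum of squared weight values."""
    return lam * np.sum(weights ** 2)

weights = np.array([0.5, -1.0, 2.0])  # illustrative weights
lam = 0.01                            # regularization strength
original_loss = 0.25                  # hypothetical unregularized loss

l1 = l1_penalty(weights, lam)  # 0.01 * (0.5 + 1.0 + 2.0)  = 0.035
l2 = l2_penalty(weights, lam)  # 0.01 * (0.25 + 1.0 + 4.0) = 0.0525

print(f"L1 penalty: {l1:.4f}, regularized loss: {original_loss + l1:.4f}")
print(f"L2 penalty: {l2:.4f}, regularized loss: {original_loss + l2:.4f}")
```

The largest weight (2.0) accounts for about 57% of the L1 penalty but about 76% of the L2 penalty. This reflects how squaring makes L2 regularization push hardest on large weights, while L1 applies the same pressure per unit of magnitude to every weight.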

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import regularizers
from sklearn.model_selection import train_test_split
import pandas as pd

# Load the dataset
df = pd.read_csv('/content/sample_data/california_housing_train.csv')
print("Dataset Head:")
display(df.head())

# Prepare data for the model
X = df.drop('median_house_value', axis=1)
y = df['median_house_value']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Determine input shape after splitting data
input_shape = (X_train.shape[1],)


# Example of adding L2 regularization to the dense layers
def build_model_with_l2_regularization(input_shape, l2_lambda=0.001):
    model = keras.Sequential([
        keras.Input(shape=input_shape),  # declare the input shape explicitly
        layers.Dense(64, activation='relu',
                     kernel_regularizer=regularizers.l2(l2_lambda)),
        layers.Dense(64, activation='relu',
                     kernel_regularizer=regularizers.l2(l2_lambda)),
        layers.Dense(1)  # linear output for the regression target
    ])
    return model

# Example of adding L1 regularization to the dense layers
def build_model_with_l1_regularization(input_shape, l1_lambda=0.001):
    model = keras.Sequential([
        keras.Input(shape=input_shape),  # declare the input shape explicitly
        layers.Dense(64, activation='relu',
                     kernel_regularizer=regularizers.l1(l1_lambda)),
        layers.Dense(64, activation='relu',
                     kernel_regularizer=regularizers.l1(l1_lambda)),
        layers.Dense(1)  # linear output for the regression target
    ])
    return model

# Example usage

model_l2 = build_model_with_l2_regularization(input_shape)
model_l1 = build_model_with_l1_regularization(input_shape)

print("\nModel with L2 Regularization Summary:")
model_l2.summary()

print("\nModel with L1 Regularization Summary:")
model_l1.summary()


### Adding Dropout

Dropout is a regularization technique that randomly sets the outputs of a fraction of the neurons in a layer to zero during training. This prevents neurons from becoming too co-adapted and reliant on specific inputs, forcing the network to learn more robust features.

During training, a different random subset of neurons is dropped for each training example. During inference (testing), dropout is turned off and all neurons are active. To keep the expected activations the same in both modes, a rescaling is applied: either the surviving activations are scaled up during training (the "inverted dropout" used by Keras), or the weights are scaled down at inference.

**Intuition:** Dropout can be seen as training an ensemble of many different subnetworks.

**How it works:**

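For a layer with $n$ neurons, during training, each neuron has a probability $p$ of being kept active and a probability $1-p$ of being dropped out (set to zero). In the inverted dropout formulation, the outputs of the surviving neurons are then scaled by $1/p$ so that the expected output of the layer is unchanged.

During inference, all neurons are active and no further scaling is needed. The original formulation of dropout instead skips the training-time scaling and multiplies the weights by $p$ at inference; the two approaches are alternatives and are not applied together. Note that the `rate` argument of `layers.Dropout` is the probability of dropping a neuron, so `rate` $= 1 - p$.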

**Example:** Consider a dense layer with 128 neurons. With a dropout rate of 0.5, during training, approximately half of the neurons will be randomly deactivated for each training example.
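
The mechanics can be sketched in a few lines of NumPy. This is an illustrative re-implementation of inverted dropout and not the Keras source. The function `dropout` is a hypothetical helper, and its `rate` argument follows the `layers.Dropout` convention: it is the probability of dropping a unit, so the keep probability is `1 - rate`.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero units with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return x  # inference: every unit is active, nothing is rescaled
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob  # True where a unit survives
    return x * mask / keep_prob             # scale survivors by 1 / keep_prob

rng = np.random.default_rng(0)
activations = np.ones(100_000)  # stand-in for a layer's outputs

train_out = dropout(activations, rate=0.5, training=True, rng=rng)
infer_out = dropout(activations, rate=0.5, training=False, rng=rng)

print("distinct training values:", np.unique(train_out))  # [0. 2.]
print("mean during training:", train_out.mean())
print("mean during inference:", infer_out.mean())
```

With a rate of 0.5, each surviving activation is doubled, so the mean over many units stays close to the inference-time mean of 1.0. That is why no correction is needed when dropout is switched off.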

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split
import pandas as pd

# Load the dataset (already loaded in the previous cell, but keeping for standalone execution)
df = pd.read_csv('/content/sample_data/california_housing_train.csv')
print("Dataset Head:")
display(df.head())

# Prepare data for the model (already prepared in the previous cell, but keeping for standalone execution)
X = df.drop('median_house_value', axis=1)
y = df['median_house_value']


# Split data into training and testing sets (already split in the previous cell, but keeping for standalone execution)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Determine input shape after splitting data
input_shape = (X_train.shape[1],)


# Example of adding dropout to a neural network
def build_model_with_dropout(input_shape, dropout_rate=0.5):
    model = keras.Sequential([
        keras.Input(shape=input_shape),  # declare the input shape explicitly
        layers.Dense(128, activation='relu'),
        layers.Dropout(dropout_rate),  # drops this fraction of units during training
        layers.Dense(64, activation='relu'),
        layers.Dropout(dropout_rate),
        layers.Dense(1)  # linear output for the regression target
    ])
    return model

# Example usage
model_dropout = build_model_with_dropout(input_shape)

print("\nModel with Dropout Summary:")
model_dropout.summary()


## Summary and Conclusion

### Summary

*   **Underfitting:** Occurs when a model is too simple to capture data patterns, leading to poor performance on both training and test data.
*   **Overfitting:** Occurs when a model is too complex and learns the training data (including noise) too well, leading to excellent training performance but poor test performance.
*   **Strategies to Combat Overfitting:**
    *   **Reducing Network Size:** Decreasing layers or neurons limits model capacity.
    *   **Weight Regularization (L1 and L2):** Adds a penalty to the loss function based on weight magnitude, encouraging smaller weights.
    *   **Dropout:** Randomly deactivates neurons during training to prevent over-reliance on specific inputs and encourage robust feature learning.

### Conclusion

Understanding and addressing underfitting and overfitting are crucial for building machine learning models that generalize well to unseen data. By employing techniques like reducing network size, adding weight regularization, and implementing dropout, we can mitigate overfitting and improve model performance on new data. The choice of technique and hyperparameters often requires experimentation and validation to find the optimal balance.
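
The experimentation mentioned above usually takes the form of a search over hyperparameter values scored on held-out validation data. The sketch below shows the idea for the regularization strength $\lambda$. To keep it small and fast, it swaps the neural network for a linear model with a closed-form L2 (ridge) solution and uses synthetic data. The dataset, the grid of $\lambda$ values, and the helper names `make_split` and `fit_ridge` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression problem: 30 features, only the first 5 carry signal
n_train, n_val, n_features = 40, 200, 30
true_w = np.zeros(n_features)
true_w[:5] = [2.0, -1.0, 0.5, 1.5, -2.0]

def make_split(n):
    """Draw n samples from the synthetic linear model with unit-variance noise."""
    X = rng.normal(size=(n, n_features))
    y = X @ true_w + rng.normal(scale=1.0, size=n)
    return X, y

X_train, y_train = make_split(n_train)
X_val, y_val = make_split(n_val)

def fit_ridge(X, y, lam):
    """Closed-form L2-regularized least squares: (X^T X + lam I)^-1 X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

lambdas = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
val_mse = {}
for lam in lambdas:
    w = fit_ridge(X_train, y_train, lam)
    val_mse[lam] = float(np.mean((X_val @ w - y_val) ** 2))
    print(f"lambda={lam:<7} validation MSE={val_mse[lam]:.3f}")

# Keep the value that generalizes best to the held-out data
best_lam = min(val_mse, key=val_mse.get)
print("selected lambda:", best_lam)
```

The same loop applies to a Keras model: build it with each candidate `l2_lambda` or `dropout_rate`, train it, and compare validation losses. A final test set, untouched during this search, is still needed for an unbiased performance estimate.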