# Overfitting

Overfitting happens when a model fits the training data so closely that it captures noise or patterns that are specific to the training set and do not carry over to new, unseen data. An overfit model typically has high variance: it performs well on the training set but poorly on validation or test data. In effect it "memorizes" the training examples instead of learning the general underlying patterns. The telltale sign is a very low training error alongside a much higher validation or test error.
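The gap between training and test error described above can be reproduced on a toy problem. The following is a minimal sketch, not part of the original notebook: the sine target, the noise level, and the polynomial degree are all illustrative assumptions. A degree-9 polynomial is fit to 10 noisy points, so it can pass through every training point.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    # Assumed ground-truth relationship for this toy example
    return np.sin(2 * np.pi * x)

# Small noisy training set and a separate noisy test set
x_train = np.linspace(0.0, 1.0, 10)
y_train = true_fn(x_train) + rng.normal(scale=0.2, size=x_train.shape)
x_test = np.linspace(0.05, 0.95, 50)
y_test = true_fn(x_test) + rng.normal(scale=0.2, size=x_test.shape)

def mse(coeffs, x, y):
    # Mean squared error of the polynomial defined by coeffs
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# Degree 9 with 10 points: enough capacity to interpolate the noise
coeffs = np.polyfit(x_train, y_train, deg=9)
train_mse = mse(coeffs, x_train, y_train)
test_mse = mse(coeffs, x_test, y_test)

print(f"train MSE: {train_mse:.2e}")
print(f"test MSE:  {test_mse:.2e}")
```

The training error is essentially zero because the polynomial memorizes the ten points, while the test error stays well above it.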

# Underfitting

Underfitting occurs when a model is too simple to capture the underlying patterns in the training data. An underfit model typically has high bias and performs poorly on both the training data and new data. It oversimplifies the problem and lacks the capacity to learn the true relationship between the input features and the target variable. The telltale sign is that training error and test error are both high.
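As a rough illustration of the symptom above, the sketch below fits a straight line to data generated from a quadratic. The data, noise level, and model choices are assumptions made for the example and do not come from the notebook. A quadratic fit is included only as a reference point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy data: a quadratic relationship with a little noise
x_train = np.linspace(-1.0, 1.0, 40)
y_train = x_train ** 2 + rng.normal(scale=0.05, size=x_train.shape)
x_test = np.linspace(-1.0, 1.0, 101)
y_test = x_test ** 2 + rng.normal(scale=0.05, size=x_test.shape)

def fit_and_score(degree):
    # Fit a polynomial of the given degree, return (train MSE, test MSE)
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train = float(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
    test = float(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))
    return train, test

lin_train, lin_test = fit_and_score(1)    # too simple: a straight line
quad_train, quad_test = fit_and_score(2)  # matches the true relationship

print(f"linear:    train {lin_train:.4f}, test {lin_test:.4f}")
print(f"quadratic: train {quad_train:.4f}, test {quad_test:.4f}")
```

The linear model's error is high on both sets, and more training would not change that, because a line cannot represent the curve at all.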

# Regularization

Regularization is a technique used to address overfitting by adding a penalty term to the model's objective function. The penalty term discourages the model from learning overly complex or intricate patterns from the training data. It helps to prevent the model from relying too heavily on any particular feature or from fitting noise in the data. Regularization techniques commonly used include L1 regularization (Lasso), L2 regularization (Ridge), and Elastic Net, each with different ways of penalizing model complexity.
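The penalized objective can be written out as follows. The symbols are assumptions introduced here for illustration: $L(w)$ is the unregularized loss, $\lambda \ge 0$ the regularization strength, and $\alpha \in [0, 1]$ the Elastic Net mixing weight. Libraries differ in how they parameterize Elastic Net, so treat the last line as one common form rather than a canonical one.

```latex
\begin{aligned}
J(w) &= L(w) + \lambda\,\Omega(w) \\
\Omega_{\text{L1}}(w) &= \sum_i |w_i| \\
\Omega_{\text{L2}}(w) &= \sum_i w_i^2 \\
\Omega_{\text{EN}}(w) &= \alpha \sum_i |w_i| + (1 - \alpha) \sum_i w_i^2
\end{aligned}
```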

Regularization addresses overfitting, not underfitting. Underfitting occurs when a model is too simple and fails to capture the underlying patterns in the data, and a penalty term constrains the model further, so too strong a penalty can even cause underfitting. When a model underfits, the usual remedies are to increase its capacity, add more informative features, train longer, or reduce the regularization strength.
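The interaction between penalty strength and underfitting can be sketched with closed-form ridge regression in NumPy. This is an illustrative example on synthetic data, not the notebook's model: the helper `ridge_fit`, the data, and the values of `lam` are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear data with known weights (assumed for illustration)
X = rng.normal(size=(100, 5))
true_w = np.array([2.0, -1.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=100)

def ridge_fit(X, y, lam):
    # Closed-form minimizer of ||Xw - y||^2 + lam * ||w||^2
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

results = {}
for lam in (0.0, 1.0, 1e4):
    w = ridge_fit(X, y, lam)
    weight_norm = float(np.linalg.norm(w))
    train_mse = float(np.mean((X @ w - y) ** 2))
    results[lam] = (weight_norm, train_mse)
    print(f"lam={lam:g}: ||w||={weight_norm:.3f}, train MSE={train_mse:.4f}")
```

As `lam` grows, the weights shrink toward zero and the training error rises: the penalty trades fit for simplicity, which helps an overfit model and hurts an underfit one.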

# Movie Review Classification Example

In [1]:
from keras.datasets import imdb
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

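# Vectorize the input data: multi-hot encode each review
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    # One row per review, one column per word index
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        # Set every column whose word index occurs in this review to 1
        results[i, sequence] = 1.
    return results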

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)

# print(x_train[0])

# Vectorize the labels as float32 arrays
y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')
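
To see what `vectorize_sequences` produces, the sketch below runs the same function on two made-up sequences with a small `dimension`. The toy input is an assumption chosen so that the result fits on screen; it is not IMDB data.

```python
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    # Same multi-hot encoding as in the notebook cell above
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.0
    return results

# Two hypothetical "reviews" given as lists of word indices
toy_sequences = [[0, 3, 3], [1, 2]]
encoded = vectorize_sequences(toy_sequences, dimension=5)
print(encoded)
# [[1. 0. 0. 1. 0.]
#  [0. 1. 1. 0. 0.]]
```

Each row records only whether a word index occurs: the repeated index 3 in the first sequence still yields a single 1, so word counts and word order are discarded.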


In [None]:
# Original network: two hidden layers of 16 units each

from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

In [4]:
# Smaller network: 4 units per hidden layer (lower capacity, less prone to overfitting)

from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Dense(4, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(4, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

In [5]:
# Bigger network: 512 units per hidden layer (much higher capacity, more prone to overfitting)

model = models.Sequential()
model.add(layers.Dense(512, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
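
One way to compare the capacity of the three networks is to count their trainable parameters. The helper below is a hypothetical pure-Python sketch, not a Keras function. It assumes fully connected layers with a bias per unit and the 10000-dimensional input used above.

```python
def dense_param_count(layer_sizes, input_dim=10000):
    # Each Dense layer has fan_in * units weights plus units biases
    total = 0
    fan_in = input_dim
    for units in layer_sizes:
        total += fan_in * units + units
        fan_in = units
    return total

architectures = {
    "smaller": (4, 4, 1),
    "original": (16, 16, 1),
    "bigger": (512, 512, 1),
}

counts = {name: dense_param_count(sizes) for name, sizes in architectures.items()}
for name, count in counts.items():
    print(f"{name}: {count:,} parameters")
# smaller: 40,029 parameters
# original: 160,305 parameters
# bigger: 5,383,681 parameters
```

The bigger network has over 100 times as many parameters as the smaller one, which gives it far more room to memorize the 25,000 training reviews. Calling `model.summary()` on the Keras models should report the same totals.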

# L1 regularization

The cost added is proportional to the sum of the absolute values of the weight coefficients (the L1 norm of the weights).


# L2 regularization

The cost added is proportional to the sum of the squares of the weight coefficients (the squared L2 norm of the weights). L2 regularization is also called weight decay in the context of neural networks.

In [6]:
from keras import regularizers

model = models.Sequential()
model.add(layers.Dense(16, kernel_regularizer=regularizers.l2(0.001), activation='relu', input_shape=(10000,)))
model.add(layers.Dense(16, kernel_regularizer=regularizers.l2(0.001), activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
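
To make concrete what a weight penalty contributes to the loss, the sketch below computes the L1 and L2 terms by hand for a small made-up weight matrix. The matrix is an assumption for illustration, and the coefficient 0.001 mirrors the value passed to `regularizers.l2` above.

```python
import numpy as np

# Hypothetical kernel (weight matrix) of a tiny layer
weights = np.array([[0.5, -1.0],
                    [2.0, 0.0]])
coefficient = 0.001

# L1 penalty: coefficient times the sum of absolute weight values
l1_penalty = coefficient * float(np.sum(np.abs(weights)))

# L2 penalty: coefficient times the sum of squared weight values
l2_penalty = coefficient * float(np.sum(weights ** 2))

print(f"L1 penalty: {l1_penalty:.5f}")  # 0.00350
print(f"L2 penalty: {l2_penalty:.5f}")  # 0.00525
```

During training this amount is added to the loss for every regularized layer, so the reported training loss is higher than the loss measured at test time, when the penalty is not applied. Keras also provides `regularizers.l1` and `regularizers.l1_l2` for the L1 and combined penalties.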

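# Dropout

Dropout randomly sets a fraction of a layer's output features to zero during training (50% in the cell below). At test time nothing is dropped.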

In [None]:
model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(1, activation='sigmoid'))
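
The effect of a `Dropout(0.5)` layer can be sketched in NumPy. This is a simplified illustration of "inverted" dropout, where the surviving activations are scaled up during training so that their expected sum is unchanged. The function name `dropout` and its signature are assumptions for this example, not the Keras implementation.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    # At inference time (or with rate 0) the input passes through unchanged
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    # Keep each unit independently with probability keep_prob
    mask = rng.random(activations.shape) < keep_prob
    # Scale survivors by 1 / keep_prob to preserve the expected value
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((4, 16))

dropped = dropout(activations, rate=0.5, rng=rng)
unchanged = dropout(activations, rate=0.5, rng=rng, training=False)

print(dropped[0])
print("fraction zeroed:", float(np.mean(dropped == 0.0)))
```

With a rate of 0.5, each output is either 0 or twice its original value during training, and the layer is an identity at inference time.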