## 5.4.4 Regularizing your model
Regularization techniques are a set of best practices that actively impede the model’s ability to fit perfectly to the training data, with the goal of making the model perform better during validation. This is called “regularizing” the model, because it tends to make the model simpler, more “regular,” its curve smoother, more “generic”; thus it is less specific to the training set and better able to generalize by more closely approximating the latent manifold of the data.

Let’s review some of the most common regularization techniques and apply them in practice to improve the movie-classification model from chapter 4.

## 1- Reducing the network’s size
The simplest way to mitigate overfitting is to reduce the size of the model, that is, the number of learnable parameters it has, by:
1. Reducing the number of layers
2. Reducing the number of units per layer
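To see how much these two knobs matter, the sketch below counts the learnable parameters (kernel weights plus biases) of a stack of `Dense` layers in plain Python. It assumes 10,000-dimensional multi-hot inputs, as in the chapter 4 IMDB model. `dense_param_count` is a hypothetical helper, not a Keras function; for a built Keras model, `model.count_params()` reports the same kind of total.

```python
def dense_param_count(layer_sizes):
    """Count weights + biases in a stack of fully connected layers.

    layer_sizes[0] is the input dimension; each later entry is the
    number of units in one Dense layer.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # kernel matrix + bias vector
    return total


# Assumed input: 10,000-dimensional multi-hot vectors, as in chapter 4.
baseline = dense_param_count([10_000, 16, 16, 1])
smaller = dense_param_count([10_000, 4, 4, 1])
print(baseline, smaller)  # 160305 40029
```

Nearly all of the baseline's parameters (160,016 of 160,305) sit in the first layer, so narrowing that layer is what shrinks the model most.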

You have to find a balance so that your model has:
1. Limited memorization resources, so it can't simply memorize its training data.
2. Enough parameters that it doesn't underfit.
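One common way to look for this balance, sketched here as an assumption rather than a fixed recipe: start with a small model, grow it step by step, and stop once validation loss no longer improves. `pick_model_size` and `evaluate` are hypothetical; `evaluate` stands for a callable you would supply, for example one that builds and trains a Keras model with the given width and returns its lowest validation loss.

```python
def pick_model_size(candidate_widths, evaluate, tolerance=0.0):
    """Grow the model until validation loss stops improving.

    evaluate(width) is a hypothetical callable that trains a model with
    `width` units per hidden layer and returns its best validation loss.
    """
    best_width, best_loss = None, float("inf")
    for width in sorted(candidate_widths):
        loss = evaluate(width)
        if loss < best_loss - tolerance:
            best_width, best_loss = width, loss
        else:
            # Diminishing returns: a larger model no longer helps.
            break
    return best_width, best_loss


# Synthetic stand-in for evaluate, only to exercise the loop.
print(pick_model_size([2, 4, 8, 16, 32], lambda w: (w - 8) ** 2))  # (8, 0)
```

Stopping at the first non-improvement keeps the search cheap; with noisy validation curves, a small `tolerance` or checking one more size before stopping may be more robust.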

## 2- Adding weight regularization
- Weight regularization constrains the complexity of a model by forcing its weights to take only small values, which makes the distribution of weight values more regular.
- It works by adding to the model's loss function a cost associated with having large weights. This cost comes in two flavors:
    1. L1 regularization: the cost added is proportional to the absolute value of the weight coefficients (the L1 norm of the weights).
    2. L2 regularization (also called weight decay): the cost added is proportional to the square of the value of the weight coefficients (the squared L2 norm of the weights).
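To make the two costs concrete, the sketch below computes them by hand for a tiny weight matrix in plain Python. It follows the convention of Keras's `regularizers.l1(factor)` and `regularizers.l2(factor)`, which add `factor * sum(|w|)` and `factor * sum(w ** 2)` to the loss, respectively; the helper names, the toy matrix, and the data loss value are hypothetical.

```python
def l1_penalty(weights, factor):
    """factor * sum(|w|) over every coefficient of a 2-D weight matrix."""
    return factor * sum(abs(w) for row in weights for w in row)


def l2_penalty(weights, factor):
    """factor * sum(w ** 2): the squared L2 norm of the weights, scaled."""
    return factor * sum(w * w for row in weights for w in row)


kernel = [[0.5, -1.0],
          [2.0, 0.0]]    # toy 2x2 weight matrix
data_loss = 0.3          # hypothetical loss from the data term alone

print(l1_penalty(kernel, 0.01))              # ≈ 0.01 * 3.5  = 0.035
print(l2_penalty(kernel, 0.01))              # ≈ 0.01 * 5.25 = 0.0525
print(data_loss + l2_penalty(kernel, 0.01))  # total loss with L2, ≈ 0.3525
```

The largest coefficient (2.0) accounts for 4.0 of the 5.25 in the L2 sum but only 2.0 of the 3.5 in the L1 sum, so L2 punishes a few large weights much harder than L1 does.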

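```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

# train_data and train_labels are the multi-hot encoded IMDB reviews and
# labels prepared in chapter 4.

# Same architecture as the baseline, but each hidden layer's kernel adds
# 0.002 * sum(weight ** 2) to the total loss.
model = keras.Sequential([
    layers.Dense(16,
                 kernel_regularizer=regularizers.l2(0.002),
                 activation="relu"),
    layers.Dense(16,
                 kernel_regularizer=regularizers.l2(0.002),
                 activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
history_l2_reg = model.fit(
    train_data, train_labels,
    epochs=20, batch_size=512, validation_split=0.4)
```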
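The name "weight decay" comes from the gradient of the L2 penalty. With `regularizers.l2(0.002)`, each kernel coefficient `w` contributes `0.002 * w ** 2` to the loss, whose derivative is `0.004 * w`, so every update pulls the weight toward zero in proportion to its size. The sketch below shows this for a single scalar weight under plain gradient descent; `sgd_step_with_l2` is a hypothetical helper, and the learning rate of 0.1 is an assumption.

```python
def sgd_step_with_l2(weight, grad_data, lr, factor):
    """One plain gradient-descent step on data_loss + factor * weight ** 2.

    grad_data is the gradient of the data loss alone; the L2 term adds
    2 * factor * weight, which pulls the weight toward zero.
    """
    return weight - lr * (grad_data + 2 * factor * weight)


# With no signal from the data (grad_data = 0), each step multiplies the
# weight by (1 - 2 * lr * factor) = 1 - 2 * 0.1 * 0.002 = 0.9996.
w = 1.0
for _ in range(3):
    w = sgd_step_with_l2(w, grad_data=0.0, lr=0.1, factor=0.002)
print(w)  # ≈ 0.9996 ** 3 ≈ 0.9988
```

RMSprop, used in the model above, rescales gradients per parameter, so with it the shrinkage is not exactly the constant factor shown here, but the penalty still pushes weights toward zero.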