# Controlling the Learning Process

In this chapter we 
- observe the training progress of a neural network model
- recognize and diagnose unwanted situations during training

## Preamble

In [None]:
from IPython.display import HTML

## Interactive Demo: Learning Progress

The [TensorFlow Playground](https://playground.tensorflow.org/) provides an interactive demo of a simple neural network for classification.

In [None]:
HTML('<iframe src=https://playground.tensorflow.org/ width=100% height=700></iframe>')

## Preamble

In [None]:
from tensorflow import keras
import matplotlib.pyplot as plt
import numpy

## Built-In and Custom Losses

During training, a **loss function** or **objective function** is optimized. As discussed in previous chapters, choosing an appropriate criterion to optimize is an essential step in any machine learning process.

`keras` provides a few common losses as built-in functions, for example:

In [None]:
keras.losses.mae

If the desired loss function is not among them, it is straightforward to implement it efficiently with `tensorflow` operations, here accessed through `keras.backend`. The following example is the **Root Mean Squared Error** for two vectors $y$ and $\hat{y}$:

$$\mathrm{RMSE}(y, \hat{y}) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_{i} - \hat{y}_{i}\right)^{2}}$$


In [None]:
def root_mean_squared_error(y_true, y_pred):
    """Root mean squared error, averaged over all elements of the batch."""
    return keras.backend.sqrt(
        keras.backend.mean(
            keras.backend.square(y_pred - y_true)
        )
    )
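
As a quick sanity check of the formula, here is a minimal sketch in plain Python, independent of `keras`. The helper names `rmse` and `mae` and the input values are made up for illustration. The example suggests why one might prefer RMSE over the mean absolute error: a single large error weighs more heavily.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error of two equally long sequences of numbers."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def mae(y_true, y_pred):
    """Mean absolute error, for comparison."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative values (not from any dataset): three exact predictions, one off by 4.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 8.0]

print(mae(y_true, y_pred))   # 1.0
print(rmse(y_true, y_pred))  # 2.0
```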

## Visualizing Training History 

## Training and Validation Loss

Training loss is usually smaller than validation loss, since the model's weights were fitted to the training data. The validation loss should nevertheless follow it closely: a large gap between the two is a sign of overfitting.
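
The gap can be tracked per epoch. The sketch below assumes a dictionary shaped like the `history` attribute of the `History` object returned by `fit`, with the keys `"loss"` and `"val_loss"`. The helper `loss_gap` and the numbers are hypothetical and only illustrate the idea.

```python
def loss_gap(history_dict):
    """Per-epoch difference: validation loss minus training loss."""
    return [v - t for t, v in zip(history_dict["loss"], history_dict["val_loss"])]

# Made-up values for illustration; in a real run this would be `history.history`.
example_history = {
    "loss":     [0.90, 0.50, 0.30, 0.20, 0.10],
    "val_loss": [0.95, 0.60, 0.45, 0.50, 0.60],
}

gaps = loss_gap(example_history)
for epoch, gap in enumerate(gaps, start=1):
    print(f"epoch {epoch}: gap = {gap:.2f}")
```

A gap that keeps growing from epoch to epoch, as in these made-up numbers, is the pattern to watch for.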

## Unwanted Situations and Remedies

**Overfitting through Overtraining**

Observing the training loss and validation loss curves, we frequently encounter the following picture with prolonged training:

![](https://i.stack.imgur.com/1QU0m.png)

Training and validation loss decrease together up to a certain epoch, after which validation loss rises again while training loss continues to decrease. This is an indication of **overfitting**: the weights are adapted too strongly to the training set, and the model loses generalization power.

This can be avoided by stopping the training process at the right time, namely when validation loss reaches its minimum.
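
In hindsight, the right stopping point can be read off a recorded validation loss curve. A minimal sketch, with a hypothetical helper `best_epoch` and made-up loss values:

```python
def best_epoch(val_loss):
    """Return (1-based epoch, loss) at which validation loss is lowest."""
    index = min(range(len(val_loss)), key=val_loss.__getitem__)
    return index + 1, val_loss[index]

# Made-up validation losses: falling until epoch 3, then rising again.
val_loss = [0.95, 0.60, 0.45, 0.50, 0.60]

epoch, loss = best_epoch(val_loss)
print(epoch, loss)  # 3 0.45
```

During training the full curve is not yet known, so the minimum can only be recognized a few epochs after it has passed.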

## Automatic Stopping

Rather than estimating the number of epochs, we can rely on `keras` callbacks to monitor our target function and stop the training when no more significant improvement is made.
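
The idea behind such a callback can be sketched in plain Python. This is a simplified illustration, not the `keras` implementation: the function `stopping_epoch` and the accuracy values are made up, and the real callback offers further options such as `mode`, `baseline` and `restore_best_weights`. The sketch assumes a monitored quantity where larger is better.

```python
def stopping_epoch(values, patience, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None.

    An epoch counts as an improvement only if it beats the best value
    seen so far by more than `min_delta`.
    """
    best = float("-inf")
    wait = 0  # number of consecutive epochs without improvement
    for epoch, value in enumerate(values, start=1):
        if value - min_delta > best:
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up validation accuracies that flatten out after epoch 3.
val_accuracy = [0.90, 0.95, 0.97, 0.975, 0.976, 0.978, 0.979]

print(stopping_epoch(val_accuracy, patience=3, min_delta=0.01))  # 6
```

Epochs 4 to 6 still improve slightly, but by less than `min_delta`, so they count as three epochs without improvement and training stops after epoch 6.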

In [None]:
keras.callbacks.EarlyStopping?

### Example: Automatic Stopping

Here we configure the training process to stop if the accuracy on the validation set does not improve on its best value by at least 0.01 (`min_delta`) for 3 consecutive epochs (`patience`).

**Data**

In [None]:
(X_train, y_train),(X_test, y_test) = keras.datasets.mnist.load_data()

In [None]:
img_size = 28
n_classes = 10

In [None]:
X_train = X_train.reshape(-1, img_size, img_size, 1)
X_test = X_test.reshape(-1, img_size, img_size, 1)
input_shape = (img_size, img_size, 1)


**Network Architecture**

In [None]:
n1 = 8    # number of filters in first convolutional layer
n2 = 16   # number of filters in second convolutional layer
n3 = 16   # number of neurons in final dense layer

net = keras.models.Sequential(
    [
        keras.layers.Convolution2D(
            input_shape=input_shape,
            filters=n1,
            kernel_size=(5,5),
            activation="relu",
        ),
        keras.layers.MaxPooling2D(
            pool_size=(2,2)
        ),
        keras.layers.Convolution2D(
            filters=n2,
            kernel_size=(5,5),
            activation="relu"
        ),
        keras.layers.MaxPooling2D(
            pool_size=(2,2)
        ),
        keras.layers.Flatten(),
        keras.layers.Dense(
            units=n3,
            activation="relu"
        ),
        keras.layers.Dense(
            units=n_classes,
            activation="softmax"
        )
    ]
)
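
To see how the image size shrinks through the layers, the arithmetic can be written out by hand. The sketch below assumes the `keras` defaults for the layers above: convolutions without padding and with stride 1, and pooling with a stride equal to the pool size. The helpers `conv_out` and `pool_out` are hypothetical names.

```python
def conv_out(size, kernel):
    """Output side length of an unpadded convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool):
    """Output side length of non-overlapping max pooling."""
    return size // pool

size = 28
size = pool_out(conv_out(size, 5), 2)   # 28 -> 24 -> 12
size = pool_out(conv_out(size, 5), 2)   # 12 -> 8 -> 4
flat = size * size * 16                 # 4 * 4 * n2 values after flattening

# Trainable parameters: weights plus one bias per filter or neuron.
params = (
    (5 * 5 * 1 * 8 + 8)       # first convolutional layer: 208
    + (5 * 5 * 8 * 16 + 16)   # second convolutional layer: 3216
    + (flat * 16 + 16)        # dense layer: 4112
    + (16 * 10 + 10)          # output layer: 170
)

print(size, flat, params)  # 4 256 7706
```

Calling `net.summary()` should report the same output shapes and parameter counts.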

**Training**

In [None]:
net.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)

In [None]:
history = net.fit(
    X_train,
    y_train,
    batch_size=128,
    verbose=True,
    validation_data=(X_test, y_test),
    epochs=1000,
    callbacks=[
        keras.callbacks.EarlyStopping(
            monitor='val_accuracy',
            min_delta=0.01,
            patience=3,
            verbose=True
        )
    ]
)

---
_This notebook is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/). Copyright © 2018-2025 [Point 8 GmbH](https://point-8.de)_