<body style="font-family: Arial;font-size: 17px;">
    <div style="position: relative; max-width: 100%;margin: auto; vertical-align: middle;">
      <img src="images/0.jpg" alt="Notebook" style="width:100%;">
      <div style="position: absolute; top: 0; padding: 25px; font-size:28px; color: white;">
          <h1>Label <i>Smoothing</i></h1>
        </div>
      <div style="position: absolute;bottom: 0;background: rgb(0, 0, 0);background: rgba(0, 0, 0, 0.5);color: #f1f1f1;width: 100%;padding: 20px;">
        <h1>What is label smoothing and why would it be used?</h1>
        <p>A model should not become too confident in its predictions. Label smoothing lessens that confidence and helps keep the model from descending into deep crevices of the loss landscape where overfitting occurs. <i>Label smoothing is a form of regularization.</i></p>
      </div>
    </div>
    <p>
        There are two methods to implement <b>Label <i>Smoothing</i></b>:
        <ul>
            <li>Label smoothing by <b>explicitly updating your labels list</b>.</li>
            <li>Label smoothing by <b>using the loss function</b>.</li>
        </ul>
    <br/>
    Regularization methods are used to help combat overfitting and help our model generalize. Examples of regularization methods include:
        <ul>
            <li>Dropout.</li>
            <li>L1 and L2 weight decay.</li>
            <li>Data Augmentation.</li>
            <li>Synthetic Data.</li>
        </ul>
    Another regularization technique is <b><i>Label Smoothing</i></b>. It turns “hard” class label assignments into “soft” label assignments, operates directly on the labels themselves, is simple to implement, and can lead to a model that generalizes better.
    </p>
    <h2>Why would Label <i>Smoothing</i> be applied?</h2>
    <p>
       In image classification tasks, labels are typically thought of as hard, binary assignments.
    For example, consider the following image from the MNIST dataset:
    <br/>
    <img src='images/3.jpg' style='width: 15%; height: 15%;'>
    <br/>
    The digit in the image above is clearly a “3”, so its one-hot encoded label vector would look like the following:
    <br/>
    <br/>
    <div style='text-align: center;'>
        <code>[0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]</code>
    </div>
    <br/>
    Notice how we’re performing hard label assignment here: all entries in the vector are 0 except for the 4th entry (index 3, which corresponds to the digit 3), which is 1.
    <br/>
    Hard label assignment is natural to us and maps to how our brains want to efficiently categorize and store information in neatly labeled and packaged boxes.
    <br/>
    If we were to apply soft label assignment to our one-hot encoded vector above it would now look like this:
    <br/>
    <br/>
    <div style='text-align: center;'>
        <code>[0.01 0.01 0.01 0.91 0.01 0.01 0.01 0.01 0.01 0.01]</code>
    </div>
    <br/>
    <br/>
    Notice how summing the list of values equals <code>1</code>, just like in the original one-hot encoded vector.
    This type of label assignment is called <b>soft label assignment</b>.
    <br/>
    <br/>
    Unlike hard label assignments where class labels are binary (i.e., positive for one class and a negative example
    for all other classes), soft label assignment allows:
    <ul>
        <li>The positive class to have the largest probability.</li>
        <li>While all other classes have a very small probability.</li>
    </ul>
    <h3>Benefits from Label <i>Smoothing</i></h3>
    <br/>
    We don’t want our model to become too confident in its predictions. By applying label smoothing we can lessen the confidence of the model and help keep it from descending into deep crevices of the loss landscape where overfitting occurs.
    </p>
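<p>
    The soft label vector shown earlier for the digit “3” can be reproduced with a few lines of NumPy. This is a minimal sketch: the smoothing factor of 0.1 and the variable names are assumptions chosen to match the example vector, not values fixed by the technique.
</p>

```python
import numpy as np

num_classes = 10   # digits 0-9
factor = 0.1       # assumed smoothing factor, chosen to match the example

# hard (one-hot) label for the digit 3
hard = np.zeros(num_classes)
hard[3] = 1.0

# soft label: scale the one-hot vector down, then spread `factor`
# evenly across all classes
soft = hard * (1.0 - factor) + factor / num_classes

print(np.round(soft, 2))   # 0.91 at index 3, 0.01 everywhere else
print(soft.sum())          # the entries still sum to (approximately) 1
```

<p>
    The largest entry stays at the true class, so the label still identifies the digit while leaving a little probability for the other classes.
</p>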
    <hr>
    <h2>Time to Code</h2>
    <br/>
    First, import the necessary packages, classes, and libraries.
</body>

In [47]:
# Import the necessary packages
from learning_rate_schedulers import PolynomialDecay
from minigooglenet import MiniGoogLeNet
from sklearn.metrics import classification_report
from sklearn.preprocessing import LabelBinarizer
from IPython.display import SVG
from tensorflow.keras.utils import model_to_dot
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import LearningRateScheduler
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.datasets import cifar10
import matplotlib.pyplot as plt
import numpy as np

In [48]:
import matplotlib
matplotlib.use("Agg")

<h3>Label smoothing by explicitly updating the labels list</h3>
<br/>
<p>
    The <b>Label <i>Smoothing</i> by explicitly updating the labels list</b> implementation works by directly modifying the labels after one-hot encoding. All that is needed is a simple custom function.
    <br/>
    <br/>
    The function <code>smooth_labels(labels, factor=0.1)</code> is the core of this approach. Its parameters are:
    <ul>
        <li><code>labels</code>: Contains one-hot encoded labels for all data points in our dataset.</li>
        <li><code>factor</code>: The optional “smoothing factor” is set to 10% by default.</li>
    </ul>
    <br/>
    To start, let’s assume that the following one-hot encoded vector is supplied to our function:
    <br/>
    <br/>
    <div style='text-align: center;'>
        <code>[0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]</code>
    </div>
    <br/>
    Notice how there is a hard label assignment: the true class label is <code>1</code> while all others are <code>0</code>.
    <br/>
    <br/>
    The first step reduces the hard assignment label of <code>1</code> by the supplied <code>factor</code> amount. With <code>factor=0.1</code>, the operation <code>labels *= (1 - factor)</code> yields the following vector:
    <br/>
    <br/>
    <div style='text-align: center;'>
        <code>[0.0, 0.0, 0.0, 0.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]</code>
    </div>
    <br/>
    Notice how the hard assignment of <code>1.0</code> has been dropped to <code>0.9</code>.
    <br/>
    The next step is to apply a very small amount of confidence to the rest of the class labels in the vector.
    <br/>
    <br/>
    This is done by dividing <code>factor</code> by the total number of possible class labels. In this case there are 10 possible class labels, so with <code>factor=0.1</code> we have <code>0.1 / 10 = 0.01</code>. That value is added to every entry of the vector by <code>labels += (factor / labels.shape[1])</code>, resulting in:
    <br/>
    <br/>
    <div style='text-align: center;'>
        <code>[0.01 0.01 0.01 0.91 0.01 0.01 0.01 0.01 0.01 0.01]</code>
    </div>
    <br/>
    Notice how the “incorrect” classes here have a very small amount of confidence. It doesn’t seem like much, but in practice it can help keep our model from overfitting.
    <br/>
    <br/>
    Note: The <code>smooth_labels</code> function comes in part from <a href="https://www.dlology.com/blog/bag-of-tricks-for-image-classification-with-convolutional-neural-networks-in-keras/">Chengwei’s article</a>, which discusses the <b>Bag of Tricks for Image Classification with Convolutional Neural Networks paper</b>.
</p>

In [None]:
def smooth_labels(labels, factor=0.1):
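    # scale the one-hot labels down by the smoothing factor; note that
    # this modifies the `labels` array in place
    labels *= (1 - factor)
    # spread the removed confidence evenly across all classes
    labels += (factor / labels.shape[1])

    # return the smoothed labels
    return labels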
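<p>
    Because <code>smooth_labels</code> uses <code>*=</code> and <code>+=</code>, it changes the array that is passed in. If the original hard labels are still needed afterwards, a copy-based variant can be used instead. The sketch below is one possible version; the name <code>smooth_labels_copy</code> is hypothetical and not part of the original notebook.
</p>

```python
import numpy as np

def smooth_labels_copy(labels, factor=0.1):
    # hypothetical non-mutating variant of smooth_labels:
    # astype() returns a new array, so the caller's labels stay untouched
    smoothed = labels.astype("float64")
    smoothed *= (1 - factor)
    smoothed += factor / smoothed.shape[1]
    return smoothed

# two one-hot labels (classes 3 and 7 out of 10)
hard = np.eye(10)[[3, 7]]
soft = smooth_labels_copy(hard, factor=0.1)

print(hard[0])               # still a hard one-hot vector
print(np.round(soft[0], 2))  # smoothed copy
```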

In [None]:
def input_smoothing_value():
    print('Please input the Smoothing value:')
    SMOOTHING = input()
    print('Smoothing value of {}'.format(SMOOTHING))
    return SMOOTHING

<p>
    Ask for the <code>SMOOTHING</code> factor.
</p>

In [None]:
SMOOTHING = input_smoothing_value()
SMOOTHING = float(SMOOTHING)
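<p>
    Since the smoothing value is typed in by hand, it may be worth validating it before training starts. The helper below is a sketch only: the name <code>parse_smoothing</code> and the accepted range of <code>[0, 1)</code> are assumptions, not part of the original notebook.
</p>

```python
def parse_smoothing(raw):
    # hypothetical helper: convert the typed text to a float and make
    # sure it lies in the assumed valid range [0, 1)
    value = float(raw)
    if not 0.0 <= value < 1.0:
        raise ValueError("smoothing must be in [0, 1), got {}".format(value))
    return value

print(parse_smoothing("0.1"))
```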

<p>
    Initialize three training hyperparameters including the total number of epochs to train for, initial learning rate, and batch size.
</p>

In [None]:
# define the total number of epochs to train for, initial learning
# rate, and batch size
NUM_EPOCHS = 32
INIT_LR = 5e-3
BATCH_SIZE = 8

<p>
    Initialize the class <code>labelNames</code>  for the <a href='https://en.wikipedia.org/wiki/CIFAR-10'>CIFAR-10</a> dataset.
</p>

In [None]:
# initialize the label names for the CIFAR-10 dataset
labelNames = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]

<p>
    Load the <a href='https://en.wikipedia.org/wiki/CIFAR-10'>CIFAR-10</a> dataset.
</p>

In [None]:
# load the training and testing data, converting the images from
# integers to floats
print("[INFO] loading CIFAR-10 data...")
((trainX, trainY), (testX, testY)) = cifar10.load_data()
trainX = trainX.astype("float")
testX = testX.astype("float")

<p>
    Apply mean subtraction, a form of normalization that typically helps training converge faster.
</p>

In [None]:
# apply mean subtraction to the data
mean = np.mean(trainX, axis=0)
trainX -= mean
testX -= mean
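<p>
    To see what <code>np.mean(trainX, axis=0)</code> computes, the sketch below repeats the same steps on small random arrays. The array sizes and names are stand-ins chosen for illustration, not the real CIFAR-10 data.
</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# stand-in data: 8 "training" and 4 "test" images of size 32x32x3
train = rng.uniform(0, 255, size=(8, 32, 32, 3))
test = rng.uniform(0, 255, size=(4, 32, 32, 3))

# averaging over axis 0 gives one mean per pixel and channel
mean = np.mean(train, axis=0)
print(mean.shape)  # (32, 32, 3)

# subtract the *training* mean from both splits
train -= mean
test -= mean

# after subtraction the training data is centred at each pixel
print(np.allclose(train.mean(axis=0), 0.0))
```

<p>
    The test split reuses the mean computed on the training split, so no statistics from the test data are used during preprocessing.
</p>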

<p>
    One-hot encode the labels and convert them to floats.
</p>

In [None]:
# convert the labels from integers to vectors, converting the data
# type to floats so we can apply label smoothing
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)
trainY = trainY.astype("float")
testY = testY.astype("float")

<p>
    Apply label smoothing to the training labels only, using the <code>smooth_labels</code> function.
</p>

In [None]:
# apply label smoothing to the *training labels only*
print("[INFO] smoothing amount: {}".format(SMOOTHING))
print("[INFO] before smoothing: {}".format(trainY[0]))
trainY = smooth_labels(trainY, SMOOTHING)
print("[INFO] after smoothing: {}".format(trainY[0]))

<p>
     Instantiate the data augmentation object.
</p>

In [None]:
# construct the image generator for data augmentation
aug = ImageDataGenerator(
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    fill_mode="nearest")

<p>
    Initialize learning rate decay via a callback that will be executed at the start of each epoch. 
</p>

In [None]:
# construct the learning rate scheduler callback
schedule = PolynomialDecay(maxEpochs=NUM_EPOCHS,
                           initAlpha=INIT_LR,
                           power=1.0)
callbacks = [LearningRateScheduler(schedule)]
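<p>
    The <code>PolynomialDecay</code> class comes from the local <code>learning_rate_schedulers</code> module, which is not shown in this notebook. The sketch below assumes it follows the common polynomial decay formula; the function name <code>polynomial_decay</code> and its argument names are hypothetical. With <code>power=1.0</code> the decay is linear.
</p>

```python
def polynomial_decay(epoch, max_epochs=32, init_alpha=5e-3, power=1.0):
    # assumed formula: alpha = init_alpha * (1 - epoch / max_epochs) ** power
    decay = (1 - (epoch / float(max_epochs))) ** power
    return init_alpha * decay

# learning rate at the start, middle, and last epoch of a 32-epoch run
for epoch in (0, 16, 31):
    print(epoch, polynomial_decay(epoch))
```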

<p>
    Compile and train the MiniGoogLeNet model.
</p>

In [None]:
# initialize the optimizer and model
print("[INFO] compiling model...")
opt = SGD(learning_rate=INIT_LR, momentum=0.7)
model = MiniGoogLeNet.build(width=32, height=32, depth=3, classes=10)

In [None]:
model.compile(loss="categorical_crossentropy",
              optimizer=opt,
              metrics=["accuracy"])

In [None]:
# train the network (Model.fit accepts generators in TensorFlow 2.1+)
print("[INFO] training network...")
H = model.fit(
    aug.flow(trainX, trainY, batch_size=BATCH_SIZE),
    validation_data=(testX, testY),
    steps_per_epoch=len(trainX) // BATCH_SIZE,
    epochs=NUM_EPOCHS,
    callbacks=callbacks,
    verbose=1)

<p>
    Save the model.
</p>

In [None]:
model.save('minigooglenet_explicit_smooth_labels.h5')

<p>
    Once the model is fully trained and saved, generate a classification report as well as a training history plot.
</p>

In [28]:
# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=BATCH_SIZE)
print(classification_report(testY.argmax(axis=1),
                            predictions.argmax(axis=1),
                            target_names=labelNames))

[INFO] evaluating network...
              precision    recall  f1-score   support

    airplane       0.89      0.91      0.90      1000
  automobile       0.95      0.96      0.96      1000
        bird       0.83      0.83      0.83      1000
         cat       0.81      0.77      0.79      1000
        deer       0.89      0.88      0.89      1000
         dog       0.88      0.81      0.84      1000
        frog       0.87      0.94      0.91      1000
       horse       0.92      0.93      0.92      1000
        ship       0.93      0.95      0.94      1000
       truck       0.94      0.94      0.94      1000

    accuracy                           0.89     10000
   macro avg       0.89      0.89      0.89     10000
weighted avg       0.89      0.89      0.89     10000



In [49]:
# construct a plot that plots and saves the training history
def plot_history_metrics(metric, val_metric, lbl_metric, lbl_val_metric, title, ylabel, plt_file_name):
    N = np.arange(0, NUM_EPOCHS)
    plt.style.use("ggplot")
    plt.figure()
    plt.plot(N, metric, label=lbl_metric)
    plt.plot(N, val_metric, label=lbl_val_metric)
    plt.title(title)
    plt.xlabel("Epoch #")
    plt.ylabel(ylabel)
    plt.legend(loc="lower left")
    plt.savefig(plt_file_name)

In [None]:
plot_history_metrics(H.history["loss"],
                     H.history["val_loss"],
                     "train_loss",
                     "val_loss",
                     "Training Loss vs Validation Loss",
                     "Loss",
                     "loss_value_label_smoothing_explicitly_updating_labels_list")

<img src='loss_value_label_smoothing_explicitly_updating_labels_list.png' style='display: block; margin-left: auto; margin-right: auto; width: 50%;'>

In [None]:
plot_history_metrics(H.history["accuracy"],
                     H.history["val_accuracy"],
                     "train_accuracy",
                     "val_accuracy",
                     "Training Accuracy vs Validation Accuracy",
                     "Accuracy",
                     "Accuracy_value_label_smoothing_explicitly_updating_labels_list")

<img src='Accuracy_value_label_smoothing_explicitly_updating_labels_list.png' style='display: block; margin-left: auto; margin-right: auto; width: 50%;'>

<p>
    <h3>Label smoothing by using the loss function.</h3>
    <br/>
    <p>
    The second method to implement <b>Label <i>Smoothing</i></b> utilizes <b>Keras/TensorFlow’s</b> <code>CategoricalCrossentropy</code> class directly.
    <br/>
    <br/>
    The benefit here is that there is no need to implement any custom function. <b>Label <i>Smoothing</i> can be applied on the fly when instantiating the</b> <code>CategoricalCrossentropy</code> <b>class with the</b> <code>label_smoothing</code> parameter:
    <br/>
    <br/>
    <div style='text-align: center;'>
        <code> CategoricalCrossentropy(label_smoothing=0.1)</code>
    </div>
    <br/>
    <br/>
    <i><b>The downside is that we don’t have access to the smoothed labels list, which would be a problem if you need it to compute your own custom metrics when monitoring the training process.</b></i>
    <br/>
    <br/>
    With all that said, let’s learn how to use <code>CategoricalCrossentropy</code> for label smoothing. The process is very similar to the first method; only the few changes are explained below.
    </p>
</p>

<p>
    As usual, initialize the optimizer and loss. The core of this method is the loss object: notice how we’re passing the <code>label_smoothing</code> parameter to the <code>CategoricalCrossentropy</code> class. This class will automatically apply label smoothing for us.
</p>
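<p>
    To make clear what the loss does with <code>label_smoothing</code>, the sketch below computes the same kind of cross-entropy in plain NumPy. It is a stand-in for illustration, not the Keras class itself: the function name <code>smoothed_cross_entropy</code> and the example prediction are assumptions, and it assumes the targets are smoothed with the same formula used by <code>smooth_labels</code>.
</p>

```python
import numpy as np

def smoothed_cross_entropy(y_true, y_pred, label_smoothing=0.1):
    # hypothetical NumPy stand-in for a cross-entropy loss with smoothing
    num_classes = y_true.shape[1]
    # smooth the one-hot targets inside the loss
    y_soft = y_true * (1.0 - label_smoothing) + label_smoothing / num_classes
    # clip predictions to avoid log(0)
    y_pred = np.clip(y_pred, 1e-7, 1.0)
    return float(np.mean(-np.sum(y_soft * np.log(y_pred), axis=1)))

# one sample of class 3 and a fairly confident, correct prediction
y_true = np.eye(10)[[3]]
y_pred = np.full((1, 10), 0.02)
y_pred[0, 3] = 0.82

hard_loss = smoothed_cross_entropy(y_true, y_pred, label_smoothing=0.0)
smooth_loss = smoothed_cross_entropy(y_true, y_pred, label_smoothing=0.1)
print(hard_loss, smooth_loss)
```

<p>
    The smoothed loss is larger for the same prediction because the small probabilities assigned to the other classes now contribute to it.
</p>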

In [32]:
# initialize the Optimizer and Loss
print("[INFO] smoothing amount: {}".format(SMOOTHING))
opt = SGD(learning_rate=INIT_LR, momentum=0.9)
loss = CategoricalCrossentropy(label_smoothing=SMOOTHING)

[INFO] smoothing amount: 0.1


<p>
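    Then compile the model, passing in our loss with label smoothing. To wrap up, we’ll train our model, evaluate it, and plot the training history. If this section is run in the same session as the first method, <code>trainY</code> has already been smoothed by <code>smooth_labels</code>, so reload or re-encode the hard labels first to avoid applying smoothing twice.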
</p>
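<p>
    When both methods are run in one session, the training labels from the first method are already smoothed, and passing them to a loss that smooths again would apply the factor twice. One way to get hard labels back is sketched below on a small stand-in array; the variable names are hypothetical, and the approach assumes a smoothing factor below 1 so that the largest entry still marks the true class.
</p>

```python
import numpy as np

# stand-in for labels that were already smoothed with factor=0.1
smoothed = np.eye(10)[[3, 7, 0]] * 0.9 + 0.01

# the largest entry of each row still identifies the true class
class_ids = smoothed.argmax(axis=1)

# rebuild hard one-hot vectors from the class indices
hard = np.eye(smoothed.shape[1])[class_ids]

print(class_ids)
print(hard[0])
```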

In [None]:
print("[INFO] compiling model...")
model = MiniGoogLeNet.build(width=32, height=32, depth=3, classes=10)
model.compile(loss=loss, optimizer=opt, metrics=["accuracy"])

In [34]:
# Train the MiniGoogLeNet network (Model.fit accepts generators in TensorFlow 2.1+)
print("[INFO] training network...")
H = model.fit(
    aug.flow(trainX, trainY, batch_size=BATCH_SIZE),
    validation_data=(testX, testY),
    steps_per_epoch=len(trainX) // BATCH_SIZE,
    epochs=NUM_EPOCHS,
    callbacks=callbacks,
    verbose=1)

[INFO] training network...
Train for 6250 steps, validate on 10000 samples
Epoch 1/32
Epoch 2/32
Epoch 3/32
Epoch 4/32
Epoch 5/32
Epoch 6/32
Epoch 7/32
Epoch 8/32
Epoch 9/32
Epoch 10/32
Epoch 11/32
Epoch 12/32
Epoch 13/32
Epoch 14/32
Epoch 15/32
Epoch 16/32
Epoch 17/32
Epoch 18/32
Epoch 19/32
Epoch 20/32
Epoch 21/32
Epoch 22/32
Epoch 23/32
Epoch 24/32
Epoch 25/32
Epoch 26/32
Epoch 27/32
Epoch 28/32
Epoch 29/32
Epoch 30/32
Epoch 31/32
Epoch 32/32


In [52]:
# Save the model in case it is needed later
model.save("minigooglenet_by_loss_function.h5")

In [35]:
# Evaluate The Network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=BATCH_SIZE)
print(classification_report(testY.argmax(axis=1),
                            predictions.argmax(axis=1), target_names=labelNames))

[INFO] evaluating network...
              precision    recall  f1-score   support

    airplane       0.91      0.91      0.91      1000
  automobile       0.96      0.96      0.96      1000
        bird       0.85      0.86      0.85      1000
         cat       0.82      0.80      0.81      1000
        deer       0.90      0.90      0.90      1000
         dog       0.88      0.85      0.86      1000
        frog       0.90      0.94      0.92      1000
       horse       0.94      0.92      0.93      1000
        ship       0.94      0.94      0.94      1000
       truck       0.94      0.94      0.94      1000

    accuracy                           0.90     10000
   macro avg       0.90      0.90      0.90     10000
weighted avg       0.90      0.90      0.90     10000



In [None]:
plot_history_metrics(H.history["accuracy"],
                     H.history["val_accuracy"],
                     "train_accuracy",
                     "val_accuracy",
                     "Training Accuracy vs Validation Accuracy",
                     "Accuracy",
                     "accuracy_value_label_smoothing_by_loss_function")

<img src='accuracy_value_label_smoothing_by_loss_function.png' style='display: block; margin-left: auto; margin-right: auto; width: 50%;'>

In [None]:
plot_history_metrics(H.history["loss"],
                     H.history["val_loss"],
                     "train_loss",
                     "val_loss",
                     "Training Loss vs Validation Loss",
                     "Loss",
                     "loss_value_label_smoothing_by_loss_function")

<img src='loss_value_label_smoothing_by_loss_function.png' style='display: block; margin-left: auto; margin-right: auto; width: 50%;'>

<p>
    This run reaches ~90% accuracy, but that does not mean that the <code>CategoricalCrossentropy</code> method is “better” than the <code>smooth_labels</code> technique. For all intents and purposes these results are “equal” and would likely follow the same distribution if the results were averaged over multiple runs.
    <br/>
    <h2>A Strange Behaviour</h2> <h3>Validation Loss is much lower than Training Loss <i>but Training Accuracy is much higher than Validation Accuracy</i></h3>
    <br/>
    Note that the validation loss is lower than our training loss, yet our training accuracy is higher than our validation accuracy. <i><b>This is totally normal behavior when using label smoothing, so don’t be alarmed by it.</b></i>
    <br/>
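    One contributing factor, at least when only the training labels are smoothed explicitly, is that cross-entropy against a smoothed target cannot reach zero, while cross-entropy against a hard target can get arbitrarily close to it. The sketch below illustrates this with NumPy; the smoothing factor, class index, and example prediction are assumptions for illustration, and other effects such as data augmentation may also play a role.

```python
import numpy as np

factor, num_classes = 0.1, 10

# smoothed target for class 3 (0.91 at index 3, 0.01 elsewhere)
target = np.full(num_classes, factor / num_classes)
target[3] += 1.0 - factor

# best case for a smoothed target: the prediction equals the target,
# which leaves the entropy of the target as the lowest possible loss
floor_smoothed = -np.sum(target * np.log(target))

# hard target: a confident, correct prediction drives the loss toward 0
hard = np.zeros(num_classes)
hard[3] = 1.0
confident = np.full(num_classes, 1e-4)
confident[3] = 1.0 - 9e-4
loss_hard = -np.sum(hard * np.log(confident))

print(floor_smoothed)  # roughly 0.5
print(loss_hard)       # close to 0
```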
    <h3>Remember when to apply Label <i>Smoothing</i></h3>
    <br/>
    It is recommended to apply label smoothing when there is trouble getting the model to generalize and/or the model is overfitting to the training set. In those situations, regularization techniques should be applied. Label smoothing is just one type of regularization; other types include:
    <ul>
        <li>Dropout.</li>
        <li>L1, L2, etc. weight decay.</li>
        <li>Data augmentation.</li>
        <li>Decreasing model capacity.</li>
    </ul>
    <hr>
    <h2>Summary</h2>
    <p>
        This notebook gives a brief practical explanation of two ways to apply <i>Label Smoothing</i> using Keras, TensorFlow, and Deep Learning:
    <ol>
        <li>Label smoothing by updating your labels lists using a custom label parsing function</li>
        <li>Label smoothing using the loss function in TensorFlow/Keras.</li>
    </ol>
    Label smoothing can be seen as a form of regularization that improves the ability of the model to generalize to testing data, <b><i>but perhaps at the expense of accuracy on your training set. Typically this tradeoff is well worth it.</i></b>
    <br/>
    <br/>
    It is normally recommended to apply label smoothing by updating your labels list with a custom function when either:
    <ul>
        <li>The entire dataset fits into memory and you can smooth all labels in a single function call.</li>
        <li>You need direct access to the label variables.</li>
    </ul>
    
Otherwise, label smoothing using the loss function in TensorFlow/Keras tends to be easier to use, as it:
    <ol>
    <li>Is baked right into Keras/TensorFlow.</li>
    <li>Does not require any hand-implemented functions.</li>
    </ol>

Regardless of which method is chosen, both do the same thing, <b><i>smooth the labels</i></b>, thereby attempting to improve the ability of the model to generalize.
    </p>
    <hr>
    <h3>References</h3>
    <br/>
    <ul>
        <li><a href='https://www.dlology.com/blog/bag-of-tricks-for-image-classification-with-convolutional-neural-networks-in-keras/'>Bag of Tricks for Image Classification with Convolutional Neural Networks in Keras.</a></li>
        <li><a href='https://leimao.github.io/blog/Label-Smoothing/'>Label Smoothing by Lei Mao.</a></li>
        <li><a href='https://arxiv.org/abs/1906.02629'>When Does Label Smoothing Help?</a></li>
        <li><a href='https://arxiv.org/abs/1812.01187'>Bag of Tricks for Image Classification with Convolutional Neural Networks.</a></li>
        <li><a href='https://www.pyimagesearch.com/2019/12/30/label-smoothing-with-keras-tensorflow-and-deep-learning/'>Label Smoothing by Adrian Rosebrock.</a></li>
    </ul>
</p>