# Label Smoothing
This notebook is for training and understanding purposes only. All algorithms and credits go to pyimagesearch.com, specifically https://www.pyimagesearch.com/2019/12/30/label-smoothing-with-keras-tensorflow-and-deep-learning/, and to Adrian Rosebrock (a wonderful source and inspiration for Computer Vision and Deep Learning).

As this notebook is for training and understanding purposes, the code is typed out by hand rather than downloaded right away, in order to build "muscle memory". Author-readable comments will appear from time to time.

## What is label smoothing?
When we train a model, we hope that it generalizes well. Regularization techniques such as neuron dropout, data augmentation, and L2 weight decay serve this purpose, and label smoothing is another such technique. <br>
<br>
Put simply, label smoothing is a technique that changes a "hard-coded" outcome/label into a "soft-coded" one. The classical example of a "hard-coded" label is one-hot encoding, where the target is a sparse vector of 0s and 1s only (e.g. [0, 0, 0, 1, 0]). With one-hot encoding we are effectively saying that we are 100% confident the example belongs to the class we specified. Label smoothing moves a small share of that confidence to the other classes, transitioning from [0, 0, 0, 1, 0] to, for example, [0.01, 0.01, 0.01, 0.96, 0.01].<br>
<br>
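
As a small worked example of that transition, the sketch below applies the same arithmetic as the `smooth_labels` function defined later in this notebook (scale by `1 - factor`, then add `factor / K`) to a single 5-class one-hot vector. The helper name `smooth` and the factor of 0.05 are choices made for this illustration only.

```python
def smooth(one_hot, factor):
    # Keep (1 - factor) of the original mass on each entry, then spread
    # `factor` evenly over all K classes.
    k = len(one_hot)
    return [y * (1 - factor) + factor / k for y in one_hot]

hard = [0, 0, 0, 1, 0]
soft = smooth(hard, factor=0.05)

# Rounded for display; the smoothed vector still sums to 1.
print([round(v, 2) for v in soft])  # [0.01, 0.01, 0.01, 0.96, 0.01]
```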
This effectively changes our loss function (remember the cross-entropy $-[y \log(p) + (1-y) \log(1-p)]$): now that $y$ is no longer exactly 0 or 1, both terms contribute for every example.
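
To make the effect on the loss concrete, here is a minimal sketch using the multi-class form of cross-entropy, $-\sum_i y_i \log(p_i)$. The prediction vectors are made-up numbers for illustration, not model outputs. With hard targets the over-confident prediction gets the lower loss; with smoothed targets it gets the higher one.

```python
import math

def cross_entropy(target, probs):
    # Categorical cross-entropy: -sum_i y_i * log(p_i)
    return -sum(y * math.log(p) for y, p in zip(target, probs))

hard = [0.0, 0.0, 0.0, 1.0, 0.0]
soft = [0.01, 0.01, 0.01, 0.96, 0.01]

# Two illustrative predictions (assumed values): one extremely confident,
# one that matches the smoothed target exactly.
confident = [0.001, 0.001, 0.001, 0.996, 0.001]
matched = [0.01, 0.01, 0.01, 0.96, 0.01]

for name, probs in [("confident", confident), ("matched", matched)]:
    print(name,
          "hard-target loss:", round(cross_entropy(hard, probs), 4),
          "soft-target loss:", round(cross_entropy(soft, probs), 4))
```

With smoothed targets, pushing the predicted probability towards 1 is no longer rewarded, which is where the regularizing effect comes from.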

In [1]:
import matplotlib
%matplotlib inline

# import the necessary packages
from pyimagesearch.learning_rate_schedulers import PolynomialDecay
from pyimagesearch.minigooglenet import MiniGoogLeNet
from sklearn.metrics import classification_report
from sklearn.preprocessing import LabelBinarizer
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import LearningRateScheduler
from tensorflow.keras.optimizers import SGD,Adam
from tensorflow.keras.datasets import cifar10
import matplotlib.pyplot as plt
import numpy as np
import argparse

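def smooth_labels(labels, factor=0.1):
    # smooth the labels: keep (1 - factor) of each one-hot entry, then spread
    # `factor` evenly over all classes (note: `labels` is modified in place)
    labels *= (1 - factor)
    labels += (factor / labels.shape[1])
    # return the smoothed labels
    return labels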

# ap = argparse.ArgumentParser()
# ap.add_argument("-s", "--smoothing", type=float, default=0.1, help="amount of label smoothing to be applied")
# ap.add_argument("-p", "--plot", type=str, default="plot.png", help="path to output plot file")
# args = vars(ap.parse_args())
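
One detail of `smooth_labels` worth knowing: `labels *= ...` and `labels += ...` update the array in place, so the array passed in by the caller is changed as well. If the original one-hot array is still needed afterwards, a non-mutating variant can be sketched as follows. The name `smooth_labels_copy` is hypothetical and not part of the original tutorial.

```python
import numpy as np

def smooth_labels_copy(labels, factor=0.1):
    # Hypothetical non-mutating variant: the arithmetic below builds a new
    # array instead of updating `labels` in place.
    labels = np.asarray(labels, dtype="float64")
    return labels * (1 - factor) + factor / labels.shape[1]

# One one-hot row for class index 6 out of 10 classes.
one_hot = np.eye(10)[[6]]
smoothed = smooth_labels_copy(one_hot, factor=0.1)

print(one_hot[0])                # the input row is left unchanged
print(np.round(smoothed[0], 2))  # 0.91 for the true class, 0.01 elsewhere
```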

In [2]:
# define the total number of epochs to train for, initial learning rate, and batch size
NUM_EPOCHS = 10
INIT_LR = 5e-3
BATCH_SIZE = 64

# initialize the label names for the CIFAR-10 dataset
labelNames = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]

# load the training and testing data, converting the images from integers to floats
print("[INFO] loading CIFAR-10 data...")
((trainX, trainY), (testX, testY)) = cifar10.load_data()
trainX = trainX.astype("float")
testX = testX.astype("float")

# apply mean subtraction to the data
mean = np.mean(trainX, axis=0)
trainX -= mean
testX -= mean

# convert the labels from integers to vectors, converting the data type to floats so we can apply label smoothing
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)
trainY = trainY.astype("float")
testY = testY.astype("float")

# apply label smoothing to the *training labels only*
# print("[INFO] smoothing amount: {}".format(args["smoothing"]))
print("[INFO] smoothing amount: {}".format(0.1))
print("[INFO] before smoothing: {}".format(trainY[0]))
trainY = smooth_labels(trainY, 0.1)
print("[INFO] after smoothing: {}".format(trainY[0]))


# construct the image generator for data augmentation
# a Keras utility that varies the training images on the fly (here: random shifts and horizontal flips)
aug = ImageDataGenerator(width_shift_range=0.1, height_shift_range=0.1, horizontal_flip=True, fill_mode="nearest")

# construct the learning rate scheduler callback
# the callback changes the learning rate every epoch instead of keeping it fixed
schedule = PolynomialDecay(maxEpochs=NUM_EPOCHS, initAlpha=INIT_LR, power=1.0)
callbacks = [LearningRateScheduler(schedule)]

# initialize the optimizer and model
print("[INFO] compiling model...")
# `learning_rate` replaces the deprecated `lr` argument in TensorFlow 2.x
opt = SGD(learning_rate=INIT_LR, momentum=0.9)
model = MiniGoogLeNet.build(width=32, height=32, depth=3, classes=10)
model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])

# train the network
print("[INFO] training network...")
# Model.fit accepts generators directly; fit_generator is deprecated in newer TensorFlow 2.x releases
H = model.fit(
    aug.flow(trainX, trainY, batch_size=BATCH_SIZE),
    validation_data=(testX, testY),
    steps_per_epoch=len(trainX) // BATCH_SIZE,
    epochs=NUM_EPOCHS,
    callbacks=callbacks,
    verbose=1)

[INFO] loading CIFAR-10 data...
[INFO] smoothing amount: 0.1
[INFO] before smoothing: [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[INFO] after smoothing: [0.01 0.01 0.01 0.01 0.01 0.01 0.91 0.01 0.01 0.01]
[INFO] compiling model...
[INFO] training network...
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
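
The `PolynomialDecay` class used in the training cell comes from the tutorial's `pyimagesearch` package and is not reproduced in this notebook. As a rough sketch of what such a schedule computes, assuming the common polynomial-decay formula `initAlpha * (1 - epoch / maxEpochs) ** power`, the hypothetical function below prints the learning rate for each of the 10 epochs. With `power=1.0` the decay is linear.

```python
def polynomial_decay(epoch, max_epochs=10, init_alpha=5e-3, power=1.0):
    # Hypothetical stand-in for the tutorial's PolynomialDecay class;
    # the formula is an assumption, not copied from that package.
    decay = (1 - epoch / float(max_epochs)) ** power
    return init_alpha * decay

# LearningRateScheduler passes the epoch index, starting from 0.
for epoch in range(10):
    print(epoch, polynomial_decay(epoch))
```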


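Another method is to leave the one-hot labels untouched and let the loss function apply label smoothing directly.
<br>
i.e.
loss = CategoricalCrossentropy(label_smoothing=0.1)
<br>
Here `CategoricalCrossentropy` is imported from `tensorflow.keras.losses`, and `loss` is passed to `model.compile` in place of the string "categorical_crossentropy". In that case the call to `smooth_labels` is skipped, so that the labels are not smoothed twice.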
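
The sketch below shows what smoothing inside the loss amounts to: the targets stay one-hot, and the loss applies `y * (1 - factor) + factor / K` before computing cross-entropy. The function name `smoothed_cross_entropy` and the example probabilities are assumptions made for illustration. The last part cross-checks the value against `tf.keras.losses.CategoricalCrossentropy` and only runs if TensorFlow is installed.

```python
import math

def smoothed_cross_entropy(one_hot, probs, factor=0.1):
    # Smooth the one-hot target on the fly, then take -sum_i t_i * log(p_i).
    k = len(one_hot)
    soft = [y * (1 - factor) + factor / k for y in one_hot]
    return -sum(t * math.log(p) for t, p in zip(soft, probs))

one_hot = [0.0, 1.0, 0.0]   # labels stay hard; smoothing happens in the loss
probs = [0.1, 0.8, 0.1]     # illustrative prediction, not a model output

manual = smoothed_cross_entropy(one_hot, probs, factor=0.1)
print("manual smoothed loss:", round(manual, 4))

# Optional cross-check against Keras, skipped when TensorFlow is missing.
try:
    import tensorflow as tf
except ImportError:
    tf = None

if tf is not None:
    keras_loss = tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1)
    value = float(keras_loss(tf.constant([one_hot]), tf.constant([probs])))
    print("keras smoothed loss:", round(value, 4))
```

The two printed values should agree up to floating-point precision, since Keras works in float32 by default.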