# Introduction

In this notebook, I implement the LeNet-5 architecture.

In short, LeNet-5 is one of the first CNN architectures. It is a really simple architecture with just two convolutional layers, two average pooling layers, and three fully connected layers.
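To make the layer sizes concrete, here is a small sketch that traces the feature-map shape through the convolution and pooling layers. It assumes the settings used in the Keras implementation later in this notebook: a 32x32 single-channel input, 5x5 kernels with stride 1 and no padding, and 2x2 pooling with stride 2. The helper names `conv_out`, `pool_out` and `trace_shapes` are made up for this illustration.

```python
def conv_out(size, kernel, stride=1):
    """Output width of an unpadded ("valid") convolution."""
    return (size - kernel) // stride + 1


def pool_out(size, pool=2, stride=2):
    """Output width of a pooling layer."""
    return (size - pool) // stride + 1


def trace_shapes(size=32):
    """Return (layer name, (height, width, channels)) pairs for the conv/pool stack."""
    shapes = [("input", (size, size, 1))]
    size = conv_out(size, 5)
    shapes.append(("conv1", (size, size, 6)))
    size = pool_out(size)
    shapes.append(("pool1", (size, size, 6)))
    size = conv_out(size, 5)
    shapes.append(("conv2", (size, size, 16)))
    size = pool_out(size)
    shapes.append(("pool2", (size, size, 16)))
    return shapes


shapes = trace_shapes()
for name, shape in shapes:
    print(f"{name:>6}: {shape}")

# Number of values handed to the first fully connected layer after flattening
height, width, channels = shapes[-1][1]
flat_size = height * width * channels
print("flatten:", flat_size)
```

The flattened size is what the first fully connected layer receives as input.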

The model is trained on the MNIST dataset for 20 epochs. About 10 epochs are enough for convergence.

In [None]:
import os
import random
import gc
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

import tensorflow as tf
# Import through tensorflow.keras: the private keras.utils.vis_utils / np_utils
# paths are not available in newer Keras releases.
from tensorflow.keras.utils import plot_model, to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, AveragePooling2D, Dense, Flatten
from tensorflow.keras.callbacks import LearningRateScheduler

BATCH_SIZE = 64

In [None]:
seed = 666
tf.random.set_seed(seed)
np.random.seed(seed)
os.environ["PYTHONHASHSEED"] = str(seed)
random.seed(seed)

In [None]:
train = pd.read_csv("../input/digit-recognizer/train.csv")
test = pd.read_csv("../input/digit-recognizer/test.csv")

print(f"Training observations {train.shape[0]}, Test observations {test.shape[0]} \n")

train.head()

In [None]:
fig, ax = plt.subplots(figsize = (8, 4))

sns.countplot(data = train, x = "label", ax = ax, color = "#101820")

ax.set_title("Countplot for Train Labels")

sns.despine()
plt.show()

In [None]:
train_X = train.drop("label", axis = 1)
train_y = train["label"]

del train
_ = gc.collect()

In [None]:
train_X = train_X / 255
X_test = test / 255

del test
_ = gc.collect()

In [None]:
print(f"Training data shape: {train_X.shape} \nTest data shape: {X_test.shape}")

train_X = train_X.values.reshape(-1, 28, 28, 1)
X_test = X_test.values.reshape(-1, 28, 28, 1)

print(f"Training data shape after reshaping: {train_X.shape} \nTest data shape after reshaping: {X_test.shape}")

In [None]:
# Zero-pad height and width by 2 pixels per side: 28x28 -> 32x32, the input size the model expects
train_X2 = np.pad(train_X, ((0,0),(2,2),(2,2),(0,0)), "constant")
X_test = np.pad(X_test, ((0,0),(2,2),(2,2),(0,0)), "constant")
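The scaling, reshaping and padding steps above can be checked on synthetic data. This is only a sketch: the `preprocess` helper and the random `fake_pixels` array are made up for illustration and stand in for the real CSV rows.

```python
import numpy as np


def preprocess(flat_pixels):
    """Scale 784-pixel rows to [0, 1], reshape to 28x28x1, zero-pad to 32x32x1."""
    images = flat_pixels.astype("float32") / 255.0
    images = images.reshape(-1, 28, 28, 1)
    # Pad only the height and width axes, 2 pixels on each side
    return np.pad(images, ((0, 0), (2, 2), (2, 2), (0, 0)), mode="constant")


rng = np.random.default_rng(0)
fake_pixels = rng.integers(0, 256, size=(3, 784), dtype=np.uint8)  # stand-in for CSV rows

batch = preprocess(fake_pixels)
print(batch.shape)
print(batch.min(), batch.max())
```

The original 28x28 image sits in the middle of the padded array, and the 2-pixel border is all zeros.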

In [None]:
train_y = to_categorical(train_y, num_classes = 10)
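For readers who have not used one-hot encoding before, here is a minimal NumPy sketch of what the call above produces, and of how `argmax` turns it back into integer labels (this is used later for the predictions). The `labels` array is a made-up example, and `np.eye` indexing is just one way to build the encoding, not necessarily how Keras does it internally.

```python
import numpy as np

labels = np.array([3, 0, 9, 1])  # made-up digit labels

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(10, dtype="float32")[labels]
print(one_hot.shape)

# argmax along the class axis recovers the integer labels
recovered = one_hot.argmax(axis=1)
print(recovered)
```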

In [None]:
fig = plt.figure(1, figsize = (8, 8))
fig.suptitle("Training Set Images (Sample)")

for i in range(100):
    
    plt.subplot(10, 10, i + 1)
    plt.imshow(train_X[i], cmap = plt.cm.binary)
    plt.axis("off")
    
plt.tight_layout()
plt.show()

In [None]:
fig = plt.figure(1, figsize = (8, 8))
fig.suptitle("Test Set Images (Sample)")

for i in range(100):
    
    plt.subplot(10, 10, i + 1)
    plt.imshow(X_test[i], cmap = plt.cm.binary)
    plt.axis("off")
    
plt.tight_layout()
plt.show()

In [None]:
X_train, X_val, y_train, y_val = train_test_split(train_X2, 
                                                  train_y, 
                                                  test_size = 0.1, 
                                                  random_state = 666, 
                                                  stratify = train_y)

print(f"Training set shape: {X_train.shape} \nValidation set shape: {X_val.shape}")

# LeNet-5 Architecture & Implementation

## Paper

[Gradient-Based Learning Applied to Document Recognition](http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf)


and also, 

http://yann.lecun.com/exdb/lenet/index.html

## Architecture

![](https://www.researchgate.net/profile/Sheraz-Khan-14/publication/321586653/figure/fig4/AS:568546847014912@1512563539828/The-LeNet-5-Architecture-a-convolutional-neural-network.png)

## Keras Implementation

In [None]:
input_shape = (32, 32, 1)

def lenet5():
    
    model = Sequential(
        [
            Conv2D(filters = 6, kernel_size = (5, 5), strides = (1, 1), activation = "tanh", input_shape = input_shape),
            AveragePooling2D(pool_size = (2, 2)),
            Conv2D(filters = 16, kernel_size = (5, 5), strides = (1, 1), activation = "tanh"),
            AveragePooling2D(pool_size = (2, 2), strides = 2),
            
            Flatten(),
            Dense(units = 120, activation = "tanh"),
            Dense(units = 84, activation = "tanh"),
            Dense(units = 10, activation = "softmax")
        ]
    )
    
    return model
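As a sanity check against `model.summary()`, the trainable parameter count of the Keras model above can be worked out by hand. The sketch below assumes every layer has a bias term (the Keras default) and that the flattened input to the first dense layer is 5 x 5 x 16; `conv_params` and `dense_params` are helper names made up for this example.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel convolution."""
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return (inputs + 1) * units


layer_params = {
    "conv1": conv_params(5, 1, 6),
    "conv2": conv_params(5, 6, 16),
    "dense1": dense_params(5 * 5 * 16, 120),
    "dense2": dense_params(120, 84),
    "output": dense_params(84, 10),
}

for name, count in layer_params.items():
    print(f"{name:>6}: {count:>6,}")

total_params = sum(layer_params.values())
print(f" total: {total_params:,}")
```

The pooling and flatten layers have no parameters, so they do not appear in the count. Most of the parameters sit in the first fully connected layer.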

In [None]:
model = lenet5()

In [None]:
model.summary()

model.compile(loss = "categorical_crossentropy", metrics = ["accuracy"], optimizer = "SGD")

print("LeNet5 Architecture\n")
plot_model(model, to_file = "lenet5.png", show_shapes = True, show_layer_names = True)
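The loss used above, categorical cross-entropy, compares the softmax output with the one-hot label. Below is a plain-Python sketch for a single sample. The logits are made-up numbers, and the `eps` clipping is my own guard against `log(0)`, not necessarily the exact value Keras uses.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot, probs, eps=1e-7):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


probs = softmax([2.0, 1.0, 0.1])  # made-up logits for a 3-class toy problem
loss_correct = categorical_crossentropy([1, 0, 0], probs)
loss_wrong = categorical_crossentropy([0, 0, 1], probs)

print([round(p, 3) for p in probs])
print(round(loss_correct, 3), round(loss_wrong, 3))
```

The loss is small when the true class gets a high probability and grows as that probability shrinks.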

# LeNet-5 with SGD Optimizer
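The model in this section is compiled with `optimizer = "SGD"`, which uses the Keras default settings for stochastic gradient descent. As a reminder of what a plain SGD update does, here is a one-parameter sketch. It assumes a learning rate of 0.01 and no momentum, which I believe are the Keras defaults, and it minimises the toy function f(w) = w², which is made up for illustration.

```python
def sgd_step(weight, grad, lr=0.01):
    """Plain gradient descent update: move against the gradient."""
    return weight - lr * grad


w = 1.0
for _ in range(100):
    grad = 2 * w  # derivative of f(w) = w**2
    w = sgd_step(w, grad)

print(w)
```

Each step shrinks `w` by the same factor, which is why plain SGD can need many epochs to converge compared with adaptive optimizers.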

In [None]:
history = model.fit(
    X_train, y_train,
    epochs = 20, batch_size = BATCH_SIZE, validation_data = (X_val, y_val),
    steps_per_epoch = X_train.shape[0] // BATCH_SIZE
)
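The `steps_per_epoch` value above is plain integer arithmetic. The sketch below assumes the Kaggle `train.csv` has 42,000 rows (this number is not printed anywhere in this notebook, so treat it as an assumption) and that the 10% validation split takes exactly a tenth of them.

```python
BATCH_SIZE = 64
n_rows = 42_000          # assumed size of train.csv
n_val = n_rows // 10     # 10% held out for validation
n_train = n_rows - n_val

steps_per_epoch = n_train // BATCH_SIZE          # floor division: full batches only
leftover = n_train - steps_per_epoch * BATCH_SIZE

print(f"train: {n_train}, val: {n_val}")
print(f"steps per epoch: {steps_per_epoch}, samples outside full batches: {leftover}")
```

Because of the floor division, the step count only covers full batches; the few leftover samples do not fit into a full batch in a given epoch.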

In [None]:
fig, axes = plt.subplots(1, 2, figsize = (12, 4))

sns.lineplot(x = range(len(history.history["loss"])), 
             y = history.history["loss"], 
             ax = axes[0], label = "Training Loss")

sns.lineplot(x = range(len(history.history["loss"])), 
             y = history.history["val_loss"], 
             ax = axes[0], label = "Validation Loss")

sns.lineplot(x = range(len(history.history["accuracy"])), 
             y = history.history["accuracy"], 
             ax = axes[1], label = "Training Accuracy")

sns.lineplot(x = range(len(history.history["accuracy"])), 
             y = history.history["val_accuracy"], 
             ax = axes[1], label = "Validation Accuracy")

axes[0].set_title("Loss")
axes[1].set_title("Accuracy")
fig.suptitle("LeNet5 \nSGD Optimizer")

plt.tight_layout()

sns.despine()
plt.show()

# LeNet-5 with Adam Optimizer
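Adam keeps running averages of the gradient and of its square, and scales each update by them. Here is a one-parameter sketch of the textbook update rule. The default values for `lr`, `beta1`, `beta2` and `eps` are what I believe the Keras defaults to be, and the actual Keras implementation may arrange the computation differently. The toy function f(w) = w² is made up for illustration.

```python
import math


def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One textbook Adam update for a single parameter; t is the 1-based step."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# First step on f(w) = w**2 starting from w = 1.0
w1, m1, v1 = adam_step(1.0, 2 * 1.0, m=0.0, v=0.0, t=1)
print(w1)

# A few more steps
w, m, v = w1, m1, v1
for t in range(2, 6):
    w, m, v = adam_step(w, 2 * w, m, v, t)
print(w)
```

On the first step the update size is close to `lr` regardless of how large the gradient is, which is one reason Adam is less sensitive to gradient scale than plain SGD.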

In [None]:
tf.keras.backend.clear_session()

model = lenet5()

model.compile(loss = "categorical_crossentropy", metrics = ["accuracy"], optimizer = "adam")

history = model.fit(
    X_train, y_train,
    epochs = 20, batch_size = BATCH_SIZE, validation_data = (X_val, y_val),
    steps_per_epoch = X_train.shape[0] // BATCH_SIZE
)

In [None]:
fig, axes = plt.subplots(1, 2, figsize = (12, 4))

sns.lineplot(x = range(len(history.history["loss"])), 
             y = history.history["loss"], 
             ax = axes[0], label = "Training Loss")

sns.lineplot(x = range(len(history.history["loss"])), 
             y = history.history["val_loss"], 
             ax = axes[0], label = "Validation Loss")

sns.lineplot(x = range(len(history.history["accuracy"])), 
             y = history.history["accuracy"], 
             ax = axes[1], label = "Training Accuracy")

sns.lineplot(x = range(len(history.history["accuracy"])), 
             y = history.history["val_accuracy"], 
             ax = axes[1], label = "Validation Accuracy")

axes[0].set_title("Loss")
axes[1].set_title("Accuracy")
fig.suptitle("LeNet5 \nAdam Optimizer")

plt.tight_layout()

sns.despine()
plt.show()

# LeNet-5 with Adam Optimizer & Learning Rate Scheduler

In [None]:
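# Step-wise learning rate schedule adapted from the LeNet-5 paper.
# Keras passes the epoch index starting at 0.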
def lenet_scheduler(epoch):
    
    if epoch < 3: 
        return 0.0005
    
    elif epoch < 6: 
        return 0.0002
    
    elif epoch < 9: 
        return 0.0001
    
    elif epoch < 13: 
        return 0.00005
    
    else:
        return 0.00001
    
scheduler = LearningRateScheduler(lenet_scheduler, verbose = 1)
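The same step schedule can also be written as a lookup table, which makes the epoch boundaries easier to read and change. This is only an alternative sketch: `BOUNDARIES`, `RATES` and `table_scheduler` are names I made up, and the values are copied from `lenet_scheduler` above.

```python
from bisect import bisect_right

BOUNDARIES = [3, 6, 9, 13]                          # first epoch index of each new rate
RATES = [0.0005, 0.0002, 0.0001, 0.00005, 0.00001]  # one more rate than boundaries


def table_scheduler(epoch):
    """Return the learning rate for a 0-based epoch index."""
    return RATES[bisect_right(BOUNDARIES, epoch)]


for epoch in range(20):
    print(f"epoch {epoch:>2}: lr = {table_scheduler(epoch)}")
```

`table_scheduler` takes the same single `epoch` argument as `lenet_scheduler`, so it returns the same rate for every epoch.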

In [None]:
tf.keras.backend.clear_session()

model = lenet5()

model.compile(loss = "categorical_crossentropy", metrics = ["accuracy"], optimizer = "adam")

history = model.fit(
    X_train, y_train,
    epochs = 20, batch_size = BATCH_SIZE, validation_data = (X_val, y_val),
    steps_per_epoch = X_train.shape[0] // BATCH_SIZE,
    callbacks = [scheduler]
)

In [None]:
fig, axes = plt.subplots(1, 2, figsize = (12, 4))

sns.lineplot(x = range(len(history.history["loss"])), 
             y = history.history["loss"], 
             ax = axes[0], label = "Training Loss")

sns.lineplot(x = range(len(history.history["loss"])), 
             y = history.history["val_loss"], 
             ax = axes[0], label = "Validation Loss")

sns.lineplot(x = range(len(history.history["accuracy"])), 
             y = history.history["accuracy"], 
             ax = axes[1], label = "Training Accuracy")

sns.lineplot(x = range(len(history.history["accuracy"])), 
             y = history.history["val_accuracy"], 
             ax = axes[1], label = "Validation Accuracy")

axes[0].set_title("Loss")
axes[1].set_title("Accuracy")
fig.suptitle("LeNet5 \nAdam with learning rate scheduler")

plt.tight_layout()
sns.despine()
plt.show()

# Predictions & Metrics

In [None]:
val_preds = np.argmax(model.predict(X_val), axis = 1)
train_preds = np.argmax(model.predict(X_train), axis = 1)

In [None]:
fig, axes = plt.subplots(1, 2, figsize = (18, 6))

cm_train = confusion_matrix(np.argmax(y_train, axis = 1), train_preds)
disp = ConfusionMatrixDisplay(confusion_matrix = cm_train)
disp.plot(cmap = plt.cm.Blues, ax = axes[0])

cm_val = confusion_matrix(np.argmax(y_val, axis = 1), val_preds)
disp = ConfusionMatrixDisplay(confusion_matrix = cm_val)
disp.plot(cmap = plt.cm.Blues, ax = axes[1])

axes[0].set_title("Training Set")
axes[1].set_title("Validation Set")

plt.show()
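To read the confusion matrices above: rows are the true labels and columns are the predicted labels. The sketch below builds the same kind of matrix by hand with NumPy on a made-up 3-class example and derives per-class recall from it. `confusion_counts` is a helper name invented for this illustration, not part of scikit-learn.

```python
import numpy as np


def confusion_counts(y_true, y_pred, num_classes):
    """Count matrix with true labels on rows and predicted labels on columns."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # add.at accumulates correctly even when the same (row, col) pair repeats
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


y_true = np.array([0, 0, 1, 1, 2, 2, 2])  # made-up labels
y_pred = np.array([0, 1, 1, 1, 2, 0, 2])  # made-up predictions

cm = confusion_counts(y_true, y_pred, num_classes=3)
print(cm)

# Recall per class: correct predictions divided by the number of true samples
recall = cm.diagonal() / cm.sum(axis=1)
print(recall)
```

Off-diagonal cells show which digits get mistaken for which, so the largest off-diagonal values are the first place to look.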

In [None]:
errors = (val_preds - np.argmax(y_val, axis = 1) != 0)

pred_error = val_preds[errors]
observed_error = np.argmax(y_val, axis = 1)[errors]
image_error = X_val[errors]
len(pred_error)

In [None]:
fig = plt.figure(1, figsize=(15, 15))
fig.suptitle("Errors in Validation")

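# Near-square grid; max(1, ...) avoids a zero-row grid when there are only a few errors
n_errors = len(pred_error)
rows = max(1, int(n_errors ** 0.5))
cols = int(np.ceil(n_errors / rows))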

for i in range(len(pred_error)):
    
    plt.subplot(rows, cols, i + 1)
    plt.imshow(image_error[i], cmap = plt.cm.binary)
    plt.axis("off")
    plt.title(f"True Value: {observed_error[i]} \nPrediction: {pred_error[i]}")
    
plt.tight_layout()
plt.show()

In [None]:
submission = pd.read_csv("../input/digit-recognizer/sample_submission.csv")

preds = np.argmax(model.predict(X_test), axis = 1)
submission["Label"] = preds
submission.to_csv("cnn.csv",index = False)
submission

# Conclusion

I place importance on implementing and analysing different architectures. LeNet-5 is one of the first CNN architectures, and it is simple, with just a few layers. It won't give us the best results, but reading the paper, analysing its structure, and coding it yourself is likely to be beneficial.

I think the process of researching a topic and gathering other ideas about it is more important than building a model without enough knowledge.

# Readings

https://d2l.ai/chapter_convolutional-neural-networks/lenet.html

https://www.analyticsvidhya.com/blog/2021/03/the-architecture-of-lenet-5/

https://hackmd.io/@bouteille/S1WvJyqmI

https://debuggercafe.com/lenet-5-a-practical-approach/

http://yann.lecun.com/exdb/lenet/index.html


My similar work with image data:

https://www.kaggle.com/mustafacicek/mnist-cnn-data-augmentation

https://www.kaggle.com/mustafacicek/dogs-cats-vgg16-implementation-transfer-learning