# Basic Image Classification with Feedforward NN and LeNet5

All libraries we introduced in the last chapter provide support for convolutional layers. We are going to illustrate the LeNet5 architecture using the most basic MNIST handwritten digit dataset, and then use AlexNet on CIFAR10, a simplified version of the original ImageNet, to demonstrate the use of data augmentation. This notebook covers LeNet5 and MNIST using Flux.

## Imports

In [None]:
using Flux
using Flux.Data: DataLoader
using Flux.Optimise: Optimiser, WeightDecay
using Flux: onehotbatch, onecold, flatten
using Flux.Losses: logitcrossentropy
using Statistics, Random
using Logging: with_logger
using TensorBoardLogger: TBLogger, tb_overwrite, set_step!, set_step_increment!
using ProgressMeter: @showprogress
import MLDatasets
import BSON
using CUDA

## Load MNIST Database

The original MNIST dataset contains 60,000 training images in 28x28 pixel resolution with a single grayscale channel, each showing a handwritten digit from 0 to 9. A good alternative is the more challenging but structurally similar Fashion MNIST dataset that we encountered in Chapter 12 on Unsupervised Learning.

We can load it out of the box with MLDatasets:


In [None]:
# MLDatasets.MNIST.download("MNIST/", i_accept_the_terms_of_use=true)

In [None]:
using MLUtils: shuffleobs

xtrain, ytrain = MLDatasets.MNIST.traindata(Float32; dir="MNIST/");
xtest, ytest = MLDatasets.MNIST.testdata(Float32; dir="MNIST/");

xtrain = reshape(xtrain, 28, 28, 1, :)
xtest = reshape(xtest, 28, 28, 1, :)

xtrain = xtrain[:, :, :, 1:1000];
ytrain = ytrain[1:1000];
xtest = xtest[:, :, :, 1:500];
ytest = ytest[1:500];

n_train = size(xtrain, 4);
n_test = size(xtest, 4);

In [None]:
print("The MNIST database has a training set of $n_train examples.\n")
print("The MNIST database has a test set of $n_test examples.\n")

In [None]:
size(xtrain), size(xtest), size(ytrain), size(ytest)

## Visualize Data

### Visualize First 25 Training Images

The figure below shows the first 25 images in the dataset (a 5x5 grid) and highlights significant variation among instances of the same digit. The detail view that follows enlarges an individual image. Raw MNIST pixel values range from 0 to 255; loaded as `Float32`, as here, they are already scaled to [0, 1].

In [None]:
using CairoMakie, Images

In [None]:
W = H = 28;
scale = 10;

nrow, ncol = 5, 5;
f = Figure(backgroundcolor = RGBf(0.0, 0.0, 0.0), resolution = (ncol * W * scale, nrow * H * scale));

N = nrow * ncol;
n = 1;
for row ∈ 1:nrow
    for col ∈ 1:ncol
        gray_image = (reshape(xtrain[:, :, :, n], 28, 28))
        image(f[row, col], gray_image, axis = (aspect = DataAspect(), yreversed = true, title = "Digit: $(ytrain[n])", titlecolor=:white));
        n += 1;
    end
end

f

### Show random image in detail

In [None]:
random_ = rand(1:n_train)  # uniform random index; Julia arrays are 1-based, so 0 must be excluded

In [None]:
scale = 50;

f = Figure(backgroundcolor = RGBf(0.0, 0.0, 0.0), resolution = (ncol * W * scale, nrow * H * scale));


gray_image = (reshape(xtrain[:, :, :, random_], 28, 28))
image(f[1, 1], gray_image, axis = (aspect = DataAspect(), yreversed = true, title = "Digit: $(ytrain[random_])", titlecolor=:white, titlesize=200));

f

## Prepare Data

### Rescale pixel values

Pixel values are normally rescaled to the range [0, 1] to normalize the training data and facilitate the backpropagation process. The data is also converted to 32-bit floats, which reduce memory requirements and computational cost while providing sufficient precision for our use case.

With the MLDatasets API, pixel values loaded as `Float32` are already in the range [0, 1], so no further rescaling is needed here.
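
For raw 8-bit images, the rescaling step could look roughly like the following. This is a minimal sketch on a hypothetical `raw` array and is not part of the MNIST pipeline in this notebook:

```julia
# Hypothetical raw 8-bit grayscale pixels (0 = black, 255 = white)
raw = UInt8[0 64; 128 255]

# Convert to Float32 first, then divide so that all values land in [0, 1]
scaled = Float32.(raw) ./ 255f0

println(extrema(scaled))
```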

### One-Hot Label Encoding using Flux

Print the first ten labels:

In [None]:
print("Integer-valued labels:\n")
print(ytrain[1:10])

In [None]:
ytrain, ytest = onehotbatch(ytrain, 0:9), onehotbatch(ytest, 0:9);
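
To make the encoding concrete, here is a plain-Julia sketch of what a one-hot matrix contains and how it is decoded again. The `labels` vector is hypothetical, and the result is an ordinary `BitMatrix` rather than the specialised one-hot type that Flux returns:

```julia
labels  = [5, 0, 4, 1]      # hypothetical integer labels
classes = 0:9

# Entry (i, j) is true when sample j belongs to class classes[i]
onehot = classes .== permutedims(labels)     # 10×4 BitMatrix

# Decoding picks the class with the largest entry in each column
decoded = [classes[argmax(col)] for col in eachcol(onehot)]

println(size(onehot), " ", decoded)
```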

### MLflow Experiment Tracking

MLflow is a platform to streamline machine learning development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models. MLflow offers a set of lightweight APIs that can be used with any existing machine learning application or library (TensorFlow, PyTorch, XGBoost, etc.), wherever you currently run ML code (e.g. in notebooks, standalone applications, or the cloud).

In [None]:
using PyCall

mlflow = pyimport("mlflow")

MLF_EXPERIMENT_NAME = "Digit Classification With LeNet5"
MLF_EXPERIMENT_ID = 0

try
    MLF_EXPERIMENT_ID = mlflow.get_experiment_by_name(MLF_EXPERIMENT_NAME).experiment_id
catch e
    MLF_EXPERIMENT_ID = mlflow.create_experiment(MLF_EXPERIMENT_NAME)
end

mlflow.set_experiment(experiment_id=MLF_EXPERIMENT_ID)

## Feed-Forward NN

### Model Architecture

In [None]:
ffnn = Chain(
    Chain(
        Flux.flatten,
        Flux.Dense(H * W, 512, NNlib.relu),
        Flux.Dropout(0.2),
        Flux.Dense(512, 512, NNlib.relu),
        Flux.Dropout(0.2),
        Flux.Dense(512, 10),
    ), NNlib.softmax
);

In [None]:
ffnn
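
As a sanity check on the architecture, the number of trainable parameters can be worked out by hand. The sketch below assumes each dense layer holds one weight matrix and one bias vector; `dense_params` is a hypothetical helper, not a Flux function:

```julia
# A dense layer with n_in inputs and n_out outputs has n_in * n_out weights plus n_out biases
dense_params(n_in, n_out) = n_in * n_out + n_out

# (inputs, outputs) for the three dense layers of the network above
layers = [(28 * 28, 512), (512, 512), (512, 10)]

per_layer = [dense_params(i, o) for (i, o) in layers]
total = sum(per_layer)

println(per_layer, " -> ", total)
```

Dropout and the final softmax add no parameters, so the total should match what `sum(length, Flux.params(ffnn))` reports.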

### Train, Validation, Test Split

In [None]:
using MLUtils: splitobs

batchsize = 32;

(xtrain, ytrain), (xvalidation, yvalidation) = splitobs((xtrain, ytrain), at=0.80, shuffle=true);

train_loader = DataLoader((xtrain, ytrain), batchsize=batchsize, shuffle=true);
validation_loader = DataLoader((xvalidation, yvalidation), batchsize=batchsize, shuffle=true);

test_loader = DataLoader((xtest, ytest),  batchsize=batchsize);
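
Conceptually, the shuffled 80/20 split and the batching work as in the following plain-Julia sketch on indices only. The seed and the rounding rule are assumptions for illustration; `splitobs` and `DataLoader` may differ in detail:

```julia
using Random

n = 1000                         # observations before the split
rng = MersenneTwister(42)        # fixed seed, chosen only for reproducibility
idx = shuffle(rng, 1:n)          # random permutation of the observation indices

n_fit = round(Int, 0.80 * n)     # 80% of the observations go to training
train_idx, val_idx = idx[1:n_fit], idx[n_fit+1:end]

batchsize = 32
n_batches = cld(length(train_idx), batchsize)   # the last batch may be smaller

println(length(train_idx), " ", length(val_idx), " ", n_batches)
```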

### Define Components

In [None]:
epochs = 100;
device = cpu;

optimiser = Flux.RMSProp();
model = ffnn |> device;
loss = Flux.crossentropy;  # the model ends in softmax, so it outputs probabilities rather than logits
ps = Flux.params(model);
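
The choice of loss has to match what the model outputs. Cross-entropy on probabilities and cross-entropy on raw scores (logits) agree only when the softmax is applied exactly once. The following sketch uses hand-written helper functions (hypothetical names, not the Flux implementations) to show the difference:

```julia
# Numerically stable softmax and log-softmax for a single score vector
softmax_vec(z) = (e = exp.(z .- maximum(z)); e ./ sum(e))
logsoftmax_vec(z) = (s = z .- maximum(z); s .- log(sum(exp.(s))))

ce_from_probs(p, y)  = -sum(y .* log.(p))            # expects probabilities
ce_from_logits(z, y) = -sum(y .* logsoftmax_vec(z))  # expects raw scores

z = [2.0, 1.0, -1.0]     # hypothetical logits
y = [1.0, 0.0, 0.0]      # one-hot target

once  = ce_from_probs(softmax_vec(z), y)
fused = ce_from_logits(z, y)
twice = ce_from_logits(softmax_vec(z), y)   # softmax effectively applied twice

println((once, fused, twice))
```

The first two values coincide, while the third differs because the probabilities are squashed a second time, which distorts the loss and flattens the gradients.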

### Calculate Baseline Classification Accuracy

In [None]:
num_params(model) = sum(length, Flux.params(model)) 
round4(x) = round(x, digits=4)

In [None]:
#TODO
function eval_loss_accuracy(loader, model, device, phase)
    l = 0f0
    acc = 0
    ntot = 0
    for (x, y) in loader
        x, y = x |> device, y |> device
        ŷ = model(x)
        l += loss(ŷ, y) * size(x)[end]        
        acc += sum(onecold(ŷ |> cpu) .== onecold(y |> cpu))
        ntot += size(x)[end]
    end

    loss_value = l / ntot |> round4;
    accuracy = acc / ntot * 100 |> round4;

    metrics_dict = Dict(
        "$phase accuracy" => accuracy,
        "$phase loss" => loss_value,
    );

    return metrics_dict;
end;
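
The function multiplies each batch loss by the batch size before dividing by the total count. A small sketch with hypothetical per-example losses shows why: when the last batch is smaller, a plain average of batch means no longer equals the mean over all examples.

```julia
using Statistics

per_example = [0.2, 0.4, 0.6, 0.8, 1.0]                        # hypothetical losses
batches = [per_example[1:2], per_example[3:4], per_example[5:5]]  # last batch is smaller

# Weight each batch mean by its size, as eval_loss_accuracy does
weighted = sum(mean(b) * length(b) for b in batches) / sum(length, batches)

# Naive alternative: average the batch means directly
naive = mean(mean.(batches))

println((weighted, naive, mean(per_example)))
```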

In [None]:
train_metrics_dict = eval_loss_accuracy(train_loader, model, device, "train");
validation_metrics_dict = eval_loss_accuracy(validation_loader, model, device, "validation");

In [None]:
train_metrics_dict

In [None]:
validation_metrics_dict

### Train the Model

In [None]:
mlflow.end_run()
mlflow.start_run(run_name="FFNN");
for epoch ∈ 1:epochs

    for (x, y) in train_loader
        x, y = x |> device, y |> device;

        gs = Flux.gradient(ps) do
            ŷ = model(x);
            loss(ŷ, y);
        end

        Flux.Optimise.update!(optimiser, ps, gs);
    end

    train_metrics_dict = eval_loss_accuracy(train_loader, model, device, "train");
    validation_metrics_dict = eval_loss_accuracy(validation_loader, model, device, "validation"); 

    mlflow.log_metrics(train_metrics_dict, step=epoch);
    mlflow.log_metrics(validation_metrics_dict, step=epoch);

    # if args.checktime > 0 && epoch % args.checktime == 0
    #     !ispath(args.savepath) && mkpath(args.savepath)
    #     modelpath = joinpath(args.savepath, "model.bson") 
    #     let model = cpu(model) #return model to cpu before serialization
    #         BSON.@save modelpath model epoch
    #     end
    #     @info "Model saved in \"$(modelpath)\""
    # end
end
mlflow.end_run()

In [None]:
modelpath = "02_ffnn.bson";
let model = cpu(model) # return model to cpu before serialization
    BSON.@save modelpath model
end

### CV Results

Run `mlflow server -p 5001` to inspect the experiment-tracking results.

## LeNet5

### Model Architecture

We can define a simplified version of LeNet5 that omits the original final layer containing radial basis functions. It uses the default ‘valid’ padding (no padding) and single-step strides unless defined otherwise:

In [None]:
function LeNet5(; imgsize=(28,28,1), nclasses=10) 
  out_conv_size = (imgsize[1] ÷ 4 - 3, imgsize[2] ÷ 4 - 3, 16);

  return Chain(
    Chain(
    Conv((5, 5), imgsize[end]=>6, relu),
    MaxPool((2, 2)),
    Conv((5, 5), 6=>16, relu),
    MaxPool((2, 2)),
    flatten,
    Dense(prod(out_conv_size), 120, relu), 
    Dense(120, 84, relu), 
    Dense(84, nclasses)),
    softmax
  )
end
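
The expression for `out_conv_size` can be checked by tracing the spatial size through the layers. This sketch assumes unpadded convolutions with stride 1 and non-overlapping 2x2 pooling with flooring; `conv_out`, `pool_out`, and `lenet5_feature_side` are hypothetical helpers:

```julia
# Output side length of an unpadded, stride-1 convolution and of a non-overlapping pool
conv_out(n, k; pad=0, stride=1) = (n + 2 * pad - k) ÷ stride + 1
pool_out(n, window) = n ÷ window

function lenet5_feature_side(n)
    n = conv_out(n, 5)   # 28 -> 24
    n = pool_out(n, 2)   # 24 -> 12
    n = conv_out(n, 5)   # 12 -> 8
    n = pool_out(n, 2)   #  8 -> 4
    return n
end

side = lenet5_feature_side(28)
flat = side * side * 16      # 16 channels after the second convolution

println(side, " ", flat)
```

For a 28x28 input this agrees with `28 ÷ 4 - 3`, so the first dense layer receives 256 features.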

### Define Components

In [None]:
η = 3e-4;            # learning rate
λ = 0;               # L2 regularizer param, implemented as weight decay
epochs = 100;        # number of epochs
device = cpu;        # device to use
model = LeNet5();    # model to use

In [None]:
optimiser = ADAM(η) 
if λ > 0 # add weight decay, equivalent to L2 regularization
    optimiser = Optimiser(WeightDecay(λ), optimiser)
end
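
The comment above equates weight decay with L2 regularization. The sketch below illustrates the link under the assumption that weight decay adds `λ .* w` to the gradient before the update: this is exactly the gradient of the penalty `λ/2 * sum(abs2, w)`, checked here with finite differences. All names and values are illustrative.

```julia
λ = 0.01
η = 0.1
w = [1.0, -2.0, 0.5]     # hypothetical weights
g = [0.3, -0.1, 0.2]     # hypothetical gradient of the data loss

penalty(w) = λ / 2 * sum(abs2, w)

# Central finite-difference gradient, one coordinate at a time
function numeric_grad(f, w; h=1e-6)
    map(eachindex(w)) do i
        e = zeros(length(w))
        e[i] = h
        (f(w .+ e) - f(w .- e)) / (2 * h)
    end
end

g_pen = numeric_grad(penalty, w)     # should be close to λ .* w

# One plain gradient step with the decay term added to the gradient
w_new = w .- η .* (g .+ λ .* w)

println(g_pen, " ", w_new)
```

With an adaptive optimiser such as ADAM the decayed gradient is rescaled further, so the correspondence to a plain L2 penalty is exact only for this simple update rule.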

model = model |> device;
loss = Flux.crossentropy;  # LeNet5 ends in softmax, so it outputs probabilities rather than logits
ps = Flux.params(model);

### Train the Model

In [None]:
mlflow.end_run()
mlflow.start_run(run_name="LeNet5");
for epoch ∈ 1:epochs

    for (x, y) in train_loader
        x, y = x |> device, y |> device;

        gs = Flux.gradient(ps) do
            ŷ = model(x);
            loss(ŷ, y)
        end

        Flux.Optimise.update!(optimiser, ps, gs)
    end

    train_metrics_dict = eval_loss_accuracy(train_loader, model, device, "train")
    validation_metrics_dict = eval_loss_accuracy(validation_loader, model, device, "validation") 

    mlflow.log_metrics(train_metrics_dict, step=epoch);
    mlflow.log_metrics(validation_metrics_dict, step=epoch);

    # if args.checktime > 0 && epoch % args.checktime == 0
    #     !ispath(args.savepath) && mkpath(args.savepath)
    #     modelpath = joinpath(args.savepath, "model.bson") 
    #     let model = cpu(model) #return model to cpu before serialization
    #         BSON.@save modelpath model epoch
    #     end
    #     @info "Model saved in \"$(modelpath)\""
    # end
end
mlflow.end_run()

### CV Results

On a single GPU, 50 epochs take around 2.5 minutes and yield a test accuracy of 99.09%, slightly below the result reported for the original LeNet5.

Run `mlflow server -p 5001` to inspect the experiment-tracking results.

### Test Classification Accuracy

In [None]:
test_metrics_dict = eval_loss_accuracy(test_loader, model, device, "test");
test_metrics_dict

## Summary

For comparison, a simple two-layer feedforward network achieves only 37.36% test accuracy.

The LeNet5 improvement on MNIST is, in fact, modest. Non-neural methods, including K-Nearest Neighbours and Support Vector Machines, have also achieved classification accuracies of 99% or more. CNNs really shine with more challenging datasets, as we will see next.