# CIFAR10 Image Classification

Fast-forward to 2014, and we move on to the deeper and more modern VGG16 architecture. We will use the CIFAR10 dataset, which contains 60,000 color images (50,000 for training and 10,000 for testing) at a resolution of 32x32 pixels with three color channels. The images are much smaller than the 224x224 ImageNet inputs that VGG16 was designed for, and there are only 10 classes instead of ImageNet's 1,000.

## Imports

In [None]:
using Flux
using Flux.Data: DataLoader
using Flux.Optimise: Optimiser, WeightDecay
using Flux: onehotbatch, onecold, flatten
using Flux.Losses: logitcrossentropy
using Statistics, Random
using Logging: with_logger
using TensorBoardLogger: TBLogger, tb_overwrite, set_step!, set_step_increment!
using ProgressMeter: @showprogress
using BSON
using CUDA
using MLDatasets: CIFAR10

## Load CIFAR10 Dataset

CIFAR10 can also be downloaded through the MLDatasets library. As before, we make sure the pixel values are scaled and one-hot encode the ten class labels.

In [None]:
train = CIFAR10(Tx=Float32, split=:train);  # 32-bit floats match Flux's default parameter type
train

In [None]:
test = CIFAR10(Tx=Float32, split=:test);  # 32-bit floats match Flux's default parameter type
test

In [None]:
using MLUtils: shuffleobs

xtrain, ytrain = train[:];
xtest, ytest = test[:];

xtrain = xtrain[:, :, :, 1:1000];
ytrain = ytrain[1:1000];
xtest = xtest[:, :, :, 1:250];
ytest = ytest[1:250];

n_train = size(xtrain, 4);
n_test = size(xtest, 4);

In [None]:
print("The CIFAR10 training subset has $n_train examples.\n");
print("The CIFAR10 test subset has $n_test examples.\n");

In [None]:
size(xtrain), size(xtest), size(ytrain), size(ytest)

## Visualize Data

### Visualize the First 30 Training Images

In [None]:
cifar10_labels = Dict(
    0 => "airplane",
    1 => "automobile",
    2 => "bird",
    3 => "cat",
    4 => "deer",
    5 => "dog",
    6 => "frog",
    7 => "horse",
    8 => "ship",
    9 => "truck",
);

In [None]:
num_classes = length(cifar10_labels);
num_classes

In [None]:
using CairoMakie, Images
using Images: colorview, channelview

In [None]:
W = H = 32;
scale = 10;

nrow, ncol = 5, 6;
f = Figure(backgroundcolor = RGBf(0.0, 0.0, 0.0), resolution = (ncol * W * scale, nrow * H * scale));

N = nrow * ncol;
n = 1;
for row ∈ 1:nrow
    for col ∈ 1:ncol
        ch_view = xtrain[:, :, :, n];
        ch_view = permutedims(ch_view, (3, 1, 2));

        color_view = Images.colorview(RGB, ch_view);
        image(f[row, col], color_view, axis = (aspect = DataAspect(), yreversed = true, title = "Label: $(cifar10_labels[ytrain[n]])", titlecolor=:white, titlesize=24));
        n += 1;
    end
end

f

### Show random image in detail

In [None]:
random_ = rand(1:n_train)  # Julia arrays are 1-based, so draw directly from the valid index range

In [None]:
scale = 50;

f = Figure(backgroundcolor = RGBf(0.0, 0.0, 0.0), resolution = (ncol * W * scale, nrow * H * scale));

ch_view = xtrain[:, :, :, random_];
ch_view = permutedims(ch_view, (3, 1, 2));

color_view = Images.colorview(RGB, ch_view);

image(f[1, 1], color_view, axis = (aspect = DataAspect(), yreversed = true, title = "Label: $(cifar10_labels[ytrain[random_]])", titlecolor=:white, titlesize=200));

f

## Prepare Data

### Rescale pixel values

Rescaling pixel values to the range [0, 1] normalizes the training data and facilitates backpropagation. Storing the data as 32-bit floats reduces memory requirements and computational cost while providing sufficient precision for our use case.

The MLDatasets API already returns pixel values in the range [0, 1], so all that remains is to one-hot encode the labels:
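For data that ships as raw 8-bit intensities, the rescaling described above amounts to a division by 255. The following is a minimal sketch, assuming a hypothetical `raw` array of `UInt8` values. It is not needed for the arrays returned by MLDatasets.

```julia
# Hypothetical raw 8-bit pixel intensities (not what MLDatasets returns).
raw = UInt8[0 128; 255 64]

# Convert to 32-bit floats first, then scale into the range [0, 1].
scaled = Float32.(raw) ./ 255f0

println(eltype(scaled))
println(extrema(scaled))
```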

In [None]:
ytrain, ytest = onehotbatch(ytrain, 0:9), onehotbatch(ytest, 0:9);
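To make the encoding concrete, here is a plain-Julia sketch of the indicator matrix that one-hot encoding produces for the class ids `0:9`. The `labels` vector is made up for illustration, and the dense `Float32` matrix is an assumption of this sketch; `onehotbatch` returns a more compact representation with the same layout (one column per observation).

```julia
labels  = [3, 0, 9]          # hypothetical integer class ids
classes = 0:9

# Compare every class id (rows) with every label (columns): 10×3 indicator matrix.
onehot = Float32.(classes .== permutedims(labels))

# Decoding picks the row holding the largest entry in each column.
decoded = [classes[argmax(col)] for col in eachcol(onehot)]

println(size(onehot))
println(decoded)
```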

### MLflow Experiment Tracking

MLflow is a platform that streamlines machine learning development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models. MLflow offers a set of lightweight APIs that can be used with any existing machine learning application or library (TensorFlow, PyTorch, XGBoost, etc.), wherever you currently run ML code (e.g. in notebooks, standalone applications, or the cloud).

In [None]:
using PyCall

mlflow = pyimport("mlflow")

MLF_EXPERIMENT_NAME = "CIFAR10 Image Classification"
MLF_EXPERIMENT_ID = 0

try
    MLF_EXPERIMENT_ID = mlflow.get_experiment_by_name(MLF_EXPERIMENT_NAME).experiment_id
catch e
    MLF_EXPERIMENT_ID = mlflow.create_experiment(MLF_EXPERIMENT_NAME)
end

mlflow.set_experiment(experiment_id=MLF_EXPERIMENT_ID)

## Feed-Forward NN

### Model Architecture

In [None]:
ffnn = Chain(
    Chain(
        Flux.flatten,
        Flux.Dense(H * W * 3, 1000, NNlib.relu),
        Flux.Dropout(0.2),
        Flux.Dense(1000, 512, NNlib.relu),
        Flux.Dropout(0.2),
        Flux.Dense(512, 10),
    ), NNlib.softmax
);

In [None]:
ffnn

### Train, Validation, Test Split

In [None]:
size(xtrain), size(ytrain)

In [None]:
using MLUtils: splitobs

batchsize = 8;

(xtrain, ytrain), (xvalidation, yvalidation) = splitobs((xtrain, ytrain), at=0.80, shuffle=true);

train_loader = DataLoader((xtrain, ytrain), batchsize=batchsize, shuffle=true);
validation_loader = DataLoader((xvalidation, yvalidation), batchsize=batchsize, shuffle=true);

test_loader = DataLoader((xtest, ytest),  batchsize=batchsize);
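The shuffled 80/20 split can be pictured as a permutation of the observation indices that is cut at the 80% mark. The sketch below uses only the `Random` standard library. The seed and the variable names are assumptions for illustration, and it does not claim to reproduce the exact indices that `splitobs` selects.

```julia
using Random

n   = 1000                          # observations in the training subset
rng = MersenneTwister(42)           # arbitrary seed, for reproducibility
idx = shuffle(rng, 1:n)             # random permutation of the indices

n_fit = floor(Int, 0.80 * n)        # first 80% of the permutation for fitting
fit_idx, val_idx = idx[1:n_fit], idx[n_fit+1:end]

println((length(fit_idx), length(val_idx)))
```

Such index vectors would select observations along the last array dimension, for example `x[:, :, :, fit_idx]` for image batches.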

### Define Components

In [None]:
epochs = 100;
device = gpu;

optimiser = Flux.ADAM();
model = ffnn |> device;
loss = Flux.crossentropy;
ps = Flux.params(model);

### Calculate Baseline Classification Accuracy

In [None]:
num_params(model) = sum(length, Flux.params(model));
round4(x) = round(x, digits=4);

In [None]:
# Compute the sample-weighted average loss and the accuracy (in percent) over all batches of a loader.
function eval_loss_accuracy(loader, model, device, phase)
    l = 0f0
    acc = 0
    ntot = 0
    for (x, y) in loader
        x, y = x |> device, y |> device
        ŷ = model(x)
        l += loss(ŷ, y) * size(x)[end]        
        acc += sum(onecold(ŷ |> cpu) .== onecold(y |> cpu))
        ntot += size(x)[end]
    end

    loss_value = l / ntot |> round4;
    accuracy = acc / ntot * 100 |> round4;

    metrics_dict = Dict(
        "$phase accuracy" => accuracy,
        "$phase loss" => loss_value,
    );

    return metrics_dict;
end;
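The evaluation function multiplies each batch loss by the batch size before summing. The reason is that a batch loss is already a mean, so averaging batch means directly would overweight a smaller final batch. A small sketch with made-up per-sample losses (the helper name `weighted_mean_loss` is hypothetical) illustrates the difference:

```julia
using Statistics

# Accumulate (batch mean × batch size) and divide by the number of samples seen.
function weighted_mean_loss(batches)
    total, n_seen = 0.0, 0
    for b in batches
        total  += mean(b) * length(b)
        n_seen += length(b)
    end
    return total / n_seen
end

batches = [[0.2, 0.4, 0.6], [1.0]]          # hypothetical per-sample losses

weighted = weighted_mean_loss(batches)      # equals the mean over all four samples
naive    = mean(mean.(batches))             # overweights the single-sample batch

println((weighted, naive))
```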

In [None]:
train_metrics_dict = eval_loss_accuracy(train_loader, model, device, "train");
validation_metrics_dict = eval_loss_accuracy(validation_loader, model, device, "validation");

In [None]:
train_metrics_dict

In [None]:
validation_metrics_dict

### Train the Model

In [None]:
mlflow.end_run()
mlflow.start_run(run_name="FFNN");
for epoch ∈ 1:epochs

    for (x, y) in train_loader
        x, y = x |> device, y |> device;

        gs = Flux.gradient(ps) do
            ŷ = model(x);
            loss(ŷ, y);
        end

        Flux.Optimise.update!(optimiser, ps, gs);
    end

    train_metrics_dict = eval_loss_accuracy(train_loader, model, device, "train");
    validation_metrics_dict = eval_loss_accuracy(validation_loader, model, device, "validation"); 

    mlflow.log_metrics(train_metrics_dict, step=epoch);
    mlflow.log_metrics(validation_metrics_dict, step=epoch);

    # if args.checktime > 0 && epoch % args.checktime == 0
    #     !ispath(args.savepath) && mkpath(args.savepath)
    #     modelpath = joinpath(args.savepath, "model.bson") 
    #     let model = cpu(model) #return model to cpu before serialization
    #         BSON.@save modelpath model epoch
    #     end
    #     @info "Model saved in \"$(modelpath)\""
    # end
end
mlflow.end_run()

In [None]:
modelpath = "03_ffnn.bson";
let model = cpu(model) # return model to cpu before serialization
    BSON.@save modelpath model
end

### CV Results

Run `mlflow server -p 5001` to inspect the experiment tracking results.

### Test Classification Accuracy

In [None]:
test_metrics_dict = eval_loss_accuracy(test_loader, model, device, "test");
test_metrics_dict

## Convolutional Neural Network

### Model Architecture

In [None]:
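cnn = Chain(
    # Each Conv((2, 2), pad=1) grows the spatial size by one and each MaxPool halves it:
    # 32 -> 33 -> 16 -> 17 -> 8 -> 9 -> 4
    Conv((2, 2), 3 => 16, relu, pad=(1, 1), stride=(1, 1)),
    MaxPool((2,2)),
    Conv((2, 2), 16 => 32, relu, pad=(1, 1), stride=(1, 1)),
    MaxPool((2,2)),
    Conv((2, 2), 32 => 64, relu, pad=(1, 1), stride=(1, 1)),
    MaxPool((2,2)),
    Dropout(0.3),
    Flux.flatten,
    Dense(4 * 4 * 64, 500, relu),  # 4x4 feature maps with 64 channels
    Dropout(0.4),
    Dense(500, 10),                # raw logits: logitcrossentropy applies softmax internally
);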

In [None]:
cnn
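The number of features that reach the first `Dense` layer after `Flux.flatten` follows from the convolution and pooling settings. The sketch below applies the standard output-size formula, floor((n + 2·pad − k) / stride) + 1, to three rounds of a 2x2 convolution with padding 1 followed by 2x2 max-pooling with stride 2. The helper names are hypothetical.

```julia
# Output length along one spatial dimension for a convolution or pooling window.
out_size(n, k; pad=0, stride=1) = fld(n + 2pad - k, stride) + 1

function cnn_feature_count(n; channels=64)
    for _ in 1:3
        n = out_size(n, 2; pad=1, stride=1)   # Conv((2, 2), pad=(1, 1), stride=(1, 1))
        n = out_size(n, 2; pad=0, stride=2)   # MaxPool((2, 2))
    end
    return n, n * n * channels
end

side, features = cnn_feature_count(32)
println("spatial size: $side x $side, flattened features: $features")
# prints "spatial size: 4 x 4, flattened features: 1024"
```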

### Define Components

In [None]:
η = 3e-4;            # learning rate
λ = 1e-5;               # L2 regularizer param, implemented as weight decay
epochs = 100;        # number of epochs
device = gpu;        # device to use
model = cnn;      # model to use

In [None]:
optimiser = ADAM(η) 
if λ > 0 # add weight decay, equivalent to L2 regularization
    optimiser = Optimiser(WeightDecay(λ), optimiser)
end

model = model |> device;
loss = Flux.logitcrossentropy;
ps = Flux.params(model);
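`logitcrossentropy` expects raw scores (logits) rather than probabilities, because it applies the softmax itself. The plain-Julia sketch below, with hypothetical single-sample helpers and made-up numbers, checks that cross-entropy on softmax probabilities equals the logit form, and that applying softmax twice gives a different value.

```julia
# Numerically stable softmax and log-softmax for a single vector of logits.
softmax_vec(z)    = (e = exp.(z .- maximum(z)); e ./ sum(e))
logsoftmax_vec(z) = (s = z .- maximum(z); s .- log(sum(exp.(s))))

# Cross-entropy on probabilities vs. cross-entropy computed directly on logits.
ce_from_probs(p, y)  = -sum(y .* log.(p))
ce_from_logits(z, y) = -sum(y .* logsoftmax_vec(z))

z = Float32[2.0, -1.0, 0.5]     # hypothetical raw network outputs (logits)
y = Float32[1, 0, 0]            # one-hot target

same    = ce_from_probs(softmax_vec(z), y) ≈ ce_from_logits(z, y)
doubled = ce_from_logits(softmax_vec(z), y) ≈ ce_from_logits(z, y)

println((same, doubled))        # (true, false)
```

A model trained with the logit form should therefore end in a plain `Dense` layer without a trailing softmax.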

### Train the Model

In [None]:
mlflow.end_run()
mlflow.start_run(run_name="CNN");
for epoch ∈ 1:epochs

    for (x, y) in train_loader
        x, y = x |> device, y |> device;

        gs = Flux.gradient(ps) do
            ŷ = model(x);
            loss(ŷ, y)
        end

        Flux.Optimise.update!(optimiser, ps, gs)
    end

    train_metrics_dict = eval_loss_accuracy(train_loader, model, device, "train")
    validation_metrics_dict = eval_loss_accuracy(validation_loader, model, device, "validation") 

    mlflow.log_metrics(train_metrics_dict, step=epoch);
    mlflow.log_metrics(validation_metrics_dict, step=epoch);

    # if args.checktime > 0 && epoch % args.checktime == 0
    #     !ispath(args.savepath) && mkpath(args.savepath)
    #     modelpath = joinpath(args.savepath, "model.bson") 
    #     let model = cpu(model) #return model to cpu before serialization
    #         BSON.@save modelpath model epoch
    #     end
    #     @info "Model saved in \"$(modelpath)\""
    # end
end
mlflow.end_run()

### CV Results

Run `mlflow server -p 5001` to inspect the experiment tracking results.

### Test Classification Accuracy

In [None]:
test_metrics_dict = eval_loss_accuracy(test_loader, model, device, "test");
test_metrics_dict

## VGG16

We also need to adapt the VGG16 architecture to the lower dimensionality of CIFAR10 images relative to the ImageNet samples used in the competition. We keep the original number of filters and the 3x3 kernels, but the five 2x2 max-pooling steps reduce a 32x32 input to a single 1x1 position, so the first fully-connected layer receives only 512 features. The listing shows the 13 convolutional layers, each followed by batch normalization, and the three fully-connected layers, for a total of roughly 33.6 million trainable parameters:

### Model Architecture

```julia
Chain(
    Conv((3, 3), 3 => 64, relu, pad=(1, 1), stride=(1, 1)),
    BatchNorm(64),
    Conv((3, 3), 64 => 64, relu, pad=(1, 1), stride=(1, 1)),
    BatchNorm(64),
    MaxPool((2,2)),
    Conv((3, 3), 64 => 128, relu, pad=(1, 1), stride=(1, 1)),
    BatchNorm(128),
    Conv((3, 3), 128 => 128, relu, pad=(1, 1), stride=(1, 1)),
    BatchNorm(128),
    MaxPool((2,2)),
    Conv((3, 3), 128 => 256, relu, pad=(1, 1), stride=(1, 1)),
    BatchNorm(256),
    Conv((3, 3), 256 => 256, relu, pad=(1, 1), stride=(1, 1)),
    BatchNorm(256),
    Conv((3, 3), 256 => 256, relu, pad=(1, 1), stride=(1, 1)),
    BatchNorm(256),
    MaxPool((2,2)),
    Conv((3, 3), 256 => 512, relu, pad=(1, 1), stride=(1, 1)),
    BatchNorm(512),
    Conv((3, 3), 512 => 512, relu, pad=(1, 1), stride=(1, 1)),
    BatchNorm(512),
    Conv((3, 3), 512 => 512, relu, pad=(1, 1), stride=(1, 1)),
    BatchNorm(512),
    MaxPool((2,2)),
    Conv((3, 3), 512 => 512, relu, pad=(1, 1), stride=(1, 1)),
    BatchNorm(512),
    Conv((3, 3), 512 => 512, relu, pad=(1, 1), stride=(1, 1)),
    BatchNorm(512),
    Conv((3, 3), 512 => 512, relu, pad=(1, 1), stride=(1, 1)),
    BatchNorm(512),
    MaxPool((2,2)),
    flatten,
    Dense(512, 4096, relu),
    Dropout(0.5),
    Dense(4096, 4096, relu),
    Dropout(0.5),
    Dense(4096, 10)
)
```

In [None]:
# VGG16 model adapted to 32x32 inputs
function vgg16()
    Chain(
        Conv((3, 3), 3 => 64, relu, pad=(1, 1), stride=(1, 1)),
        BatchNorm(64),
        Conv((3, 3), 64 => 64, relu, pad=(1, 1), stride=(1, 1)),
        BatchNorm(64),
        MaxPool((2,2)),
        Conv((3, 3), 64 => 128, relu, pad=(1, 1), stride=(1, 1)),
        BatchNorm(128),
        Conv((3, 3), 128 => 128, relu, pad=(1, 1), stride=(1, 1)),
        BatchNorm(128),
        MaxPool((2,2)),
        Conv((3, 3), 128 => 256, relu, pad=(1, 1), stride=(1, 1)),
        BatchNorm(256),
        Conv((3, 3), 256 => 256, relu, pad=(1, 1), stride=(1, 1)),
        BatchNorm(256),
        Conv((3, 3), 256 => 256, relu, pad=(1, 1), stride=(1, 1)),
        BatchNorm(256),
        MaxPool((2,2)),
        Conv((3, 3), 256 => 512, relu, pad=(1, 1), stride=(1, 1)),
        BatchNorm(512),
        Conv((3, 3), 512 => 512, relu, pad=(1, 1), stride=(1, 1)),
        BatchNorm(512),
        Conv((3, 3), 512 => 512, relu, pad=(1, 1), stride=(1, 1)),
        BatchNorm(512),
        MaxPool((2,2)),
        Conv((3, 3), 512 => 512, relu, pad=(1, 1), stride=(1, 1)),
        BatchNorm(512),
        Conv((3, 3), 512 => 512, relu, pad=(1, 1), stride=(1, 1)),
        BatchNorm(512),
        Conv((3, 3), 512 => 512, relu, pad=(1, 1), stride=(1, 1)),
        BatchNorm(512),
        MaxPool((2,2)),
        flatten,
        Dense(512, 4096, relu),
        Dropout(0.5),
        Dense(4096, 4096, relu),
        Dropout(0.5),
        Dense(4096, 10)
    )
end;
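As a cross-check on the size of this network, the trainable parameters can be tallied by hand from the layer list. The sketch below is plain Julia with hypothetical helper names. It counts weights and biases for the convolutional and dense layers, plus the scale and shift vectors of each `BatchNorm`.

```julia
conv_params(k, cin, cout) = k * k * cin * cout + cout   # kernel weights + biases
dense_params(nin, nout)   = nin * nout + nout           # weights + biases
bn_params(c)              = 2c                          # scale γ and shift β

# Input => output channels of the 13 convolutional layers in vgg16().
convs  = [3=>64, 64=>64, 64=>128, 128=>128, 128=>256, 256=>256, 256=>256,
          256=>512, 512=>512, 512=>512, 512=>512, 512=>512, 512=>512]
denses = [512=>4096, 4096=>4096, 4096=>10]

n_conv  = sum(conv_params(3, i, o) for (i, o) in convs)
n_bn    = sum(bn_params(o) for (_, o) in convs)
n_dense = sum(dense_params(i, o) for (i, o) in denses)

println(n_conv + n_bn + n_dense)   # 33646666
```

The same figure should be reported by the `num_params` helper defined earlier when it is applied to `vgg16()`.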

### Define Components

In [None]:
η = 3e-4;            # learning rate
λ = 1e-5;               # L2 regularizer param, implemented as weight decay
epochs = 100;        # number of epochs
device = gpu;        # device to use
model = vgg16();       # model to use

In [None]:
optimiser = ADAM(η) 
if λ > 0 # add weight decay, equivalent to L2 regularization
    optimiser = Optimiser(WeightDecay(λ), optimiser)
end

model = model |> device;
loss = Flux.logitcrossentropy;
ps = Flux.params(model);
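The comment on weight decay can be unpacked with a small numeric sketch. Adding λ·w to the gradient is the same as differentiating an extra penalty (λ/2)·‖w‖², which is why chaining `WeightDecay(λ)` in front of the optimiser acts as L2 regularization. The weights and gradient below are made up, and the finite-difference check is only an illustration.

```julia
λ = 1e-5
w = [0.5, -2.0, 1.0]            # hypothetical weights
g = [0.1, 0.1, -0.3]            # hypothetical gradient of the data loss

# L2 penalty whose gradient with respect to w is λ .* w.
l2_penalty(w) = 0.5 * λ * sum(abs2, w)

# Weight decay: add λ .* w to the data-loss gradient before the optimiser step.
g_decayed = g .+ λ .* w

# Central finite difference of the penalty along the first coordinate.
h  = 1e-4
e1 = [1.0, 0.0, 0.0]
fd = (l2_penalty(w .+ h .* e1) - l2_penalty(w .- h .* e1)) / (2h)

println(isapprox(fd, λ * w[1]; rtol=1e-6))   # true
```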

### Train the Model

In [None]:
mlflow.end_run()
mlflow.start_run(run_name="VGG16");
for epoch ∈ 1:epochs

    for (x, y) in train_loader
        x, y = x |> device, y |> device;

        gs = Flux.gradient(ps) do
            ŷ = model(x);
            loss(ŷ, y)
        end

        Flux.Optimise.update!(optimiser, ps, gs)
    end

    train_metrics_dict = eval_loss_accuracy(train_loader, model, device, "train")
    validation_metrics_dict = eval_loss_accuracy(validation_loader, model, device, "validation") 

    mlflow.log_metrics(train_metrics_dict, step=epoch);
    mlflow.log_metrics(validation_metrics_dict, step=epoch);

    # if args.checktime > 0 && epoch % args.checktime == 0
    #     !ispath(args.savepath) && mkpath(args.savepath)
    #     modelpath = joinpath(args.savepath, "model.bson") 
    #     let model = cpu(model) #return model to cpu before serialization
    #         BSON.@save modelpath model epoch
    #     end
    #     @info "Model saved in \"$(modelpath)\""
    # end
end
mlflow.end_run()

### CV Results

Run `mlflow server -p 5001` to inspect the experiment tracking results.

### Test Classification Accuracy

In [None]:
test_metrics_dict = eval_loss_accuracy(test_loader, model, device, "test");
test_metrics_dict

## Summary

For comparison, a simple two-layer feedforward network achieves only 37.36% test accuracy. 

The LeNet5 improvement on MNIST is, in fact, modest. Non-neural methods have also achieved classification accuracies greater than or equal to 99%, including K-Nearest Neighbours or Support Vector Machines. CNNs really shine with more challenging datasets as we will see next.