In [91]:
using MLDatasets
using PyPlot
using Random, Statistics
using Flux

# Rapid intro to supervised learning with neural nets II: using Flux

This notebook gives a rapid introduction to supervised learning with neural networks. The example is based on [Chapter 1 of Nielsen's online book "Neural Networks and Deep Learning"](http://neuralnetworksanddeeplearning.com/chap1.html), and it guides you through setting up and training the neural network with the [Flux](https://fluxml.ai/Flux.jl/stable/) Julia package.

For further reading, I also recommend the review article ["A high-bias, low-variance introduction to Machine Learning for physicists"](https://arxiv.org/abs/1803.08823).

## A few words about Flux

## The MNIST hand-written digits data set

Let's first load a simple example data set: the MNIST hand-written digits. The following cell downloads both the training and the test part of the data set.

In [70]:
# load full training set (images as Float32 values in [0, 1], labels as integers)
train_x, train_y = MNIST.traindata(Float32)
# load full test set
test_x,  test_y  = MNIST.testdata(Float32);

`train_x` is now an array of shape `(28, 28, 60000)`, meaning that we have 60k grayscale images of 28$\times$28 pixels, each showing one hand-written digit. `train_y` holds the corresponding *labels*, i.e. an integer for each image, stating which digit it shows.
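The network defined below expects each image as a flat vector of 28·28 = 784 pixel values, so the notebook repeatedly reshapes `(28, 28, N)` arrays into `(784, N)` matrices. A minimal sketch of that step on random stand-in data (hypothetical, not the real MNIST arrays), showing that column `k` of the reshaped matrix is exactly image `k`, because Julia arrays are stored column-major:

```julia
# Hypothetical stand-in for MNIST images: random "pixels" with the same (28, 28, N) layout
imgs = rand(Float32, 28, 28, 5)

# Flatten every 28×28 image into one column of length 784.
# Julia arrays are column-major, so each image stays contiguous and becomes one column.
flat = reshape(imgs, 28 * 28, :)

size(flat)                          # (784, 5)
flat[:, 3] == vec(imgs[:, :, 3])    # true: column 3 is image 3, flattened
```

`reshape` does not copy the data, so this flattening is cheap even for all 60000 training images.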

## Defining a neural network model using Flux

In [48]:
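function build_model(layers; imgsize=(28,28))
    # first layer: flattened image (prod(imgsize) = 784 inputs) -> layers[1] sigmoid neurons
    input_layer = Dense(prod(imgsize), layers[1], sigmoid)
    # remaining layers: layers[j-1] -> layers[j] sigmoid neurons
    rest = [Dense(layers[j-1], layers[j], sigmoid) for j in 2:length(layers)]
    return Chain(input_layer, rest...)
end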

build_model (generic function with 1 method)
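`build_model` stacks fully connected `Dense` layers: with `net_layers = [100, 10]` (used in the next cell), the network maps 784 pixel inputs to 100 hidden sigmoid neurons and then to 10 output neurons. A dense layer from `n_in` to `n_out` neurons holds an `n_out × n_in` weight matrix plus `n_out` biases. A plain-Julia sketch (no Flux; the helper name `count_dense_params` is hypothetical) that counts the trainable parameters of such a stack:

```julia
# Hypothetical helper: number of trainable parameters in a stack of dense layers.
# sizes = [n_0, n_1, ..., n_L]; layer j has an n_j × n_{j-1} weight matrix and n_j biases.
function count_dense_params(sizes)
    total = 0
    for j in 2:length(sizes)
        total += sizes[j] * sizes[j-1] + sizes[j]
    end
    return total
end

# 784 input pixels, 100 hidden neurons, 10 output neurons
count_dense_params([28 * 28, 100, 10])   # 79510
```

Almost all of these parameters (78400) sit in the first weight matrix, which connects every pixel to every hidden neuron.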

Now we can again check what the (still untrained) network thinks about our images of digits. Analogous to part I of the tutorial, we create a `neural_network`, but this time it is assembled from Flux layers by `build_model`, and we apply it to the first training image:

In [49]:
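net_layers = [100, 10]   # 100 hidden neurons, 10 output neurons (one per digit)

neural_network = build_model(net_layers)
params = Flux.params(neural_network)   # the trainable arrays (weights and biases)

# apply the untrained network to the first training image, flattened to a 784-vector
neural_network(reshape(train_x[:,:,1], 28*28))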

Tracked 10-element Vector{Float32}:
 0.428681f0
 0.64658606f0
 0.52417076f0
 0.61604553f0
 0.4864485f0
 0.5423035f0
 0.25397527f0
 0.4249637f0
 0.73968923f0
 0.3823749f0
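The network's guess for an image is the digit whose output neuron is most active. Because Julia indexes from 1, output neuron `k` stands for digit `k - 1` (the same convention `Flux.onehotbatch(labels, 0:9)` uses below). A sketch applied to the ten activations printed above (the helper name `predicted_digit` is hypothetical):

```julia
# Output activations printed above for the first training image (untrained network)
acts = Float32[0.428681, 0.64658606, 0.52417076, 0.61604553, 0.4864485,
               0.5423035, 0.25397527, 0.4249637, 0.73968923, 0.3823749]

# Hypothetical helper: index of the largest activation, shifted to a digit 0..9
predicted_digit(a) = argmax(a) - 1

predicted_digit(acts)   # 8 (neuron 9 has the largest activation, 0.7397)
```

Since the weights are still random, this "prediction" carries no information yet.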

Next, we need a cost function. As in the previous notebook, we use the squared error between the output activations and the one-hot encoded labels, averaged over the batch.
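Written out, for a batch of $\mathcal T$ images $x_t$ with labels $y_t \in \{0,\dots,9\}$ and output activations $a_k(x_t)$ of the neuron for digit $k$, the cost computed by `cost_function` in the next cell is

$$
C = \frac{1}{\mathcal T} \sum_{t=1}^{\mathcal T} \sum_{k=0}^{9} \big( a_k(x_t) - \delta_{k, y_t} \big)^2 ,
$$

where $\delta_{k, y_t}$ is 1 if $k = y_t$ and 0 otherwise, i.e. the one-hot encoding of the label. Nielsen's Chapter 1 writes the same quadratic cost with a prefactor $1/(2n)$; the extra factor $1/2$ only rescales the gradient and thus, effectively, the learning rate.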

In [53]:
"""
    cost_function(predictions, labels)

Evaluate the cost function for given predictions and labels.

Args:
* `predictions`: predictions from the neural net, an array of shape 10 × T (one column per image).
* `labels`: correct labels for the corresponding images, a vector of T integers.

Returns: the squared error between predictions and one-hot encoded labels, averaged over the T images.
"""
function cost_function(predictions, labels)
    labels = Flux.onehotbatch(labels, 0:9)   # 10 × T one-hot matrix

    cost = sum((predictions-labels).^2)
    return cost / size(labels, 2)
end

cost_function (generic function with 1 method)
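`Flux.onehotbatch(labels, 0:9)` turns each label into a column containing a single 1. To make that step explicit, a plain-Julia sketch without Flux (the helper names `onehot_digits` and `mse_onehot` are hypothetical) that reproduces the same cost on a tiny example:

```julia
# Hypothetical plain-Julia analogue of Flux.onehotbatch(labels, 0:9):
# column t gets a single 1 in row labels[t] + 1 (digit d lives in row d + 1).
function onehot_digits(labels)
    Y = zeros(Float32, 10, length(labels))
    for (t, d) in enumerate(labels)
        Y[d + 1, t] = 1
    end
    return Y
end

# Same cost as cost_function: squared error summed over outputs, averaged over the batch
mse_onehot(predictions, labels) =
    sum((predictions .- onehot_digits(labels)) .^ 2) / length(labels)

# Two images with labels 0 and 3: all-zero predictions miss exactly one "1" per image,
# so each image contributes 1 and the average is 1.0
mse_onehot(zeros(Float32, 10, 2), [0, 3])   # 1.0
```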

With this, we can check the performance of our randomly initialized network in classifying some of our images:

In [74]:
batch = train_x[:,:,1:128]   # select a batch of images
labels = train_y[1:128]      # and the corresponding labels

# compute neural network predictions (one 784-pixel column per image)
predictions = neural_network(reshape(batch, 28*28, size(batch, 3)))

# evaluate the cost function
cost_function(predictions, labels)

2.7701077f0 (tracked)
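To put this initial cost into perspective: if every output neuron returned 0.5 for every image, each image would contribute 9 × 0.5² = 2.25 from the nine wrong digits plus (0.5 − 1)² = 0.25 from the correct one, i.e. 2.5 in total. A short sketch checking this arithmetic (the example labels are arbitrary and hypothetical); the untrained sigmoid network, whose outputs scatter around 0.5, lands in the same range:

```julia
# Hypothetical baseline: a "network" whose 10 outputs are all 0.5 for every image
labels_demo = [5, 0, 4, 1]                     # arbitrary example labels
preds_half = fill(0.5, 10, length(labels_demo))

# one-hot targets: digit d in row d + 1
Y_demo = zeros(10, length(labels_demo))
for (t, d) in enumerate(labels_demo)
    Y_demo[d + 1, t] = 1.0
end

baseline = sum((preds_half .- Y_demo) .^ 2) / length(labels_demo)   # 2.5
```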

Now, what is missing is a function that computes the gradients of the cost function with respect to the network parameters. Flux provides this through automatic differentiation with `Flux.gradient()`:

In [112]:
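function cost_function_gradient(net, params, batch, labels)
    # flatten the 28×28×T batch into a 784×T matrix, one column per image
    x = reshape(batch, 28*28, size(batch, 3))
    # gradient of the cost with respect to every array in `params`
    return Flux.gradient(() -> cost_function(net(x), labels), params)
end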

cost_function_gradient (generic function with 1 method)
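`Flux.gradient` returns $\partial C/\partial p$ for every parameter array $p$ in `params`. To see what such a derivative means, here is a plain-Julia sketch without Flux (the helper names `mse_cost`, `mse_grad` and `fd_grad` are hypothetical) that checks the hand-derived gradient of the cost with respect to the output activations, $\partial C/\partial a = 2(a - y)/\mathcal T$, against central finite differences. Automatic differentiation then uses the chain rule to carry this derivative back through the layers to the weights and biases.

```julia
using Random

# Squared-error cost of a 10×T prediction matrix `a` against one-hot targets `y`
mse_cost(a, y) = sum((a .- y) .^ 2) / size(a, 2)

# Hand-derived gradient with respect to the predictions: ∂C/∂a = 2(a - y)/T
mse_grad(a, y) = 2 .* (a .- y) ./ size(a, 2)

# Central finite differences, perturbing one entry at a time
function fd_grad(f, a; h = 1e-6)
    g = similar(a)
    for i in eachindex(a)
        ap = copy(a); ap[i] += h
        am = copy(a); am[i] -= h
        g[i] = (f(ap) - f(am)) / (2h)
    end
    return g
end

Random.seed!(0)
a = rand(10, 4)          # stand-in "activations" for T = 4 images
y = zeros(10, 4)
for t in 1:4
    y[t, t] = 1.0        # image t has label t - 1
end

max_err = maximum(abs.(fd_grad(x -> mse_cost(x, y), a) .- mse_grad(a, y)))
```

Because the cost is quadratic in `a`, central differences are exact up to floating-point rounding, so `max_err` should be tiny.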

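Finally, we are ready to train the network. The next cell defines a helper that counts correct predictions and then trains the network with mini-batch stochastic gradient descent, reporting cost and accuracy on the test set after every epoch: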

In [113]:
"""
    evaluate_predictions(predictions, labels)

Helper function that counts how many of the given predictions match the labels.

Args:
* `predictions`: predictions from the neural network (= activations of the output layer), shape 10 × T
* `labels`: correct labels

Returns: number of correct predictions, i.e., number of cases in which the index of the maximal
activation (minus one, since Julia indexes from 1) matches the given label.
"""
function evaluate_predictions(predictions, labels)
    pred_labels = [argmax(predictions[:, i]) - 1 for i in 1:size(predictions, 2)]
    return sum(pred_labels .== labels)
end


Random.seed!(1234)   # fix the seed so that initialization and shuffling are reproducible

neural_network = build_model(net_layers)
params = Flux.params(neural_network) 

# Here we define the hyperparameters
num_epochs = 10 # Number of epochs to loop over
learning_rate = 0.001 # Learning rate
batch_size = 128 # Size of mini-batches

# Number of full mini-batches; the leftover samples (a different random subset each epoch) are skipped
batch_number = div(size(train_x, 3), batch_size)

# Evaluate network and assess performance
predictions = neural_network(reshape(test_x,28*28,size(test_x)[3]))
current_cost = cost_function(predictions, test_y)
correct_predictions = evaluate_predictions(predictions, test_y)
println("Initial cost: $(current_cost)")
println("Correctly predicted labels: $(correct_predictions) / $(length(test_y))")

for n in 1:num_epochs
    
    println("Episode $(n)")
    order = shuffle(1:length(train_y))            # new random order every epoch
    used = order[1:batch_number*batch_size]       # drop the incomplete last mini-batch
    samples = reshape(train_x[:, :, used], 28, 28, batch_size, :)
    labels  = reshape(train_y[used], batch_size, :)

    
    for i in 1:batch_number

        # Compute gradients
        gs=cost_function_gradient(neural_network, params, samples[:,:,:,i], labels[:,i])
  
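        # Perform SGD parameter update step p <- p - learning_rate * ∂C/∂p.
        # In the Tracker-based Flux version used here, update!(p, Δ) adds Δ to p, hence the
        # minus sign; newer Flux versions subtract Δ instead, so check the convention of your version.
        for p in params
            Flux.Optimise.update!(p, -learning_rate*gs[p])
        end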
        
    end

    # Evaluate network and assess performance
    predictions = neural_network(reshape(test_x,28*28,size(test_x)[3]))
    current_cost = cost_function(predictions, test_y)    
    correct_predictions = evaluate_predictions(predictions, test_y)
    println("Current cost: $(current_cost)")
    println("Correctly predicted labels: $(correct_predictions)/$(length(test_y))")
end

Initial cost: 2.9444165f0 (tracked)
Correctly predicted labels: 1067 / 10000
Episode 1
Current cost: 1.0339497f0 (tracked)
Correctly predicted labels: 2503/10000
Episode 2
Current cost: 0.9229258f0 (tracked)
Correctly predicted labels: 3013/10000
Episode 3
Current cost: 0.89791876f0 (tracked)
Correctly predicted labels: 3280/10000
Episode 4
Current cost: 0.8880436f0 (tracked)
Correctly predicted labels: 3401/10000
Episode 5
Current cost: 0.8825424f0 (tracked)
Correctly predicted labels: 3452/10000
Episode 6
Current cost: 0.8786173f0 (tracked)
Correctly predicted labels: 3505/10000
Episode 7
Current cost: 0.87532216f0 (tracked)
Correctly predicted labels: 3533/10000
Episode 8
Current cost: 0.8722753f0 (tracked)
Correctly predicted labels: 3553/10000
Episode 9
Current cost: 0.8693074f0 (tracked)
Correctly predicted labels: 3572/10000
Episode 10
Current cost: 0.8663381f0 (tracked)
Correctly predicted labels: 3612/10000
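The training loop above is mini-batch stochastic gradient descent: reshuffle the training set every epoch, split it into `batch_number` mini-batches of `batch_size` samples, and move every parameter against its gradient, $p \leftarrow p - \eta\, \partial C/\partial p$ with $\eta$ = `learning_rate`. The same loop structure is sketched below on a deliberately tiny problem, a hypothetical noise-free linear fit $y = 3x + 0.5$ with hand-derived gradients instead of Flux (the function name `sgd_fit` is also hypothetical):

```julia
using Random

# Hypothetical toy data: y = 3x + 0.5 without noise
Random.seed!(42)
N = 1024
x = rand(N)
y = 3.0 .* x .+ 0.5

# Mini-batch SGD for the model w*x + b with cost C = mean((w*x + b - y)^2)
function sgd_fit(x, y; epochs = 200, η = 0.1, batch_size = 32)
    w, b = 0.0, 0.0
    nbatches = div(length(x), batch_size)     # number of full mini-batches
    for epoch in 1:epochs
        order = shuffle(1:length(x))           # new random order every epoch
        for k in 1:nbatches
            idx = order[(k-1)*batch_size+1 : k*batch_size]
            r = (w .* x[idx] .+ b) .- y[idx]   # residuals on this mini-batch
            gw = 2 * sum(r .* x[idx]) / batch_size   # ∂C/∂w
            gb = 2 * sum(r) / batch_size             # ∂C/∂b
            w -= η * gw                        # step against the gradient
            b -= η * gb
        end
    end
    return w, b
end

w, b = sgd_fit(x, y)
```

Because the toy data are noise-free and the cost is quadratic, the fit recovers `w ≈ 3.0` and `b ≈ 0.5`. In the MNIST loop the structure is the same; only the gradient comes from `Flux.gradient` and the parameters are the network's weight matrices and bias vectors.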
