# Machine Learning Exercise 3: Multi-class Classification and Neural Networks

From Week 4 of the Coursera course Machine Learning by Andrew Ng: https://www.coursera.org/learn/machine-learning/. The topics are one-vs-all logistic regression for multi-class classification and prediction with a pre-trained neural network.

Eric Nam, https://github.com/eric-nam, 2020

In [None]:
input_layer_size = 400  # 20x20 Input Images of Digits
num_labels = 10

## Read in the dataset from a MATLAB file
The dataset contains 5,000 images of handwritten digits stored as the rows of a matrix (X), with the labels in a vector (y). Each row is an unrolled 20 by 20 image.

In [None]:
import MAT
fpath_mat = "ex3data1.mat"
file = MAT.matopen(fpath_mat)
X = MAT.read(file, "X")
y = vec(MAT.read(file, "y"))
MAT.close(file)

## Plot 100 random examples

In [None]:
using Plots
using StatsBase
n = 10
rows = sample(1:size(X)[1], n * n, replace=false)  # Sample 100 randomly
px = isqrt(size(X)[2])
# Combine into one big image
image = zeros(px * n, px * n)
for i in 0:n-1
    for j in 0:n-1
        image[i * px + 1: (i + 1) * px, j * px + 1: (j + 1) * px] = X[rows[i * n + j + 1], :]
    end
end
heatmap(image[end:-1:1, :], c=cgrad(:grays))   # Needs to be flipped upside down
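
The loop above assigns a 400-element row directly to a 20 by 20 block, which Julia fills column by column. The following is a minimal sketch of the same idea made explicit with `reshape`, assuming the rows were unrolled column by column as MATLAB does. The names `row` and `tile` and the 3 by 3 size are illustrative only, not part of the exercise.

```julia
# Synthetic stand-in for one row of X: a 3x3 "image" unrolled column by column.
row = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
px = isqrt(length(row))        # side length of the square image (3 here)
tile = reshape(row, px, px)    # column-major: the first px entries become column 1
```

With the real data, `reshape(X[i, :], 20, 20)` would recover image `i` under the same assumption.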

# Part 1: Multi-class classification

## Define the cost function and its gradient
Both functions are copied from the previous exercise.

In [None]:
"""
    sigmoid(z)

Calculate the sigmoid function, ``g(z) = \\frac{1}{1 + e^{-z}}``

# Argument
- `z::Number`: input variable

# Return
`Number`
"""
function sigmoid(z)
    1.0 / (1.0 + exp(-z))
end
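
`sigmoid` is written for a scalar input, so applying it to a vector or matrix needs Julia's dot-broadcast syntax, which is how the cost functions call it. A brief sketch, with the one-line definition restated so the block runs on its own:

```julia
sigmoid(z) = 1.0 / (1.0 + exp(-z))

mid = sigmoid(0)                     # the midpoint of the curve
vals = sigmoid.([-10.0, 0.0, 10.0])  # broadcast elementwise over a vector
```

The outputs stay strictly between 0 and 1 and are symmetric about 0.5, that is, `sigmoid(z) + sigmoid(-z)` is 1 up to rounding.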

In [None]:
"""
    cost(theta, x, y, lambda)

Compute the regularized logistic regression cost for the given dataset and theta

# Arguments
- `theta::Array{Number, 1}`: the coefficients of the cost function
- `x::Array{Number, 2}` : the independent variable matrix. The rows are examples, the columns features.
This matrix has the first column filled with ones.
- `y::Array{Number, 1}` : the dependent vector.
- `lambda::Number` : regularization coefficient

# Returns
`::Number`: cost
"""
function cost(theta, x, y, lambda)
    m, _ = size(x)
    sig = sigmoid.(x * theta)
    (- y' * log.(sig) - (1.0 .- y)' * log.(1.0 .- sig)) / m + lambda * 0.5 / m * (theta[2:end]' * theta[2:end])
end
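
For reference, the expression in `cost` corresponds to the regularized logistic regression cost below. The notation is an assumption chosen to match the course: $m$ examples, $n$ features, $h_\theta(x) = g(\theta^T x)$ with $g$ the sigmoid, and $\theta_0$ the intercept, which is `theta[1]` in Julia's 1-based indexing.

$$
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2
$$

The regularization sum starts at $j = 1$, so the intercept is not penalized; this is why the code uses `theta[2:end]`.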

In [None]:
"""
    ∇cost(theta, x, y, lambda)

Compute the gradient of the regularized cost with respect to theta for the given dataset

# Arguments
- `theta::Array{Number, 1}`: the coefficients of the cost function
- `x::Array{Number, 2}` : the independent variable matrix. The rows are examples, the columns features.
This matrix has the first column filled with ones.
- `y::Array{Number, 1}` : the dependent vector.
- `lambda::Number` : regularization coefficient

# Returns
`::Array{Number, 1}`: cost gradient
"""
function ∇cost(theta, x, y, lambda)
    m, n = size(x)
    sig = sigmoid.(x * theta)
    lambdas = fill(lambda, n)
    lambdas[1] = 0.
    (x' * (sig - y) + lambdas .* theta) / m
end
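
In vectorized form, `∇cost` evaluates the expression below. The notation is an assumption for illustration: $X$ is the $m$-row design matrix including the column of ones, $g$ is the sigmoid applied elementwise, and $\tilde{\theta}$ is $\theta$ with its first (intercept) entry set to zero.

$$
\nabla J(\theta) = \frac{1}{m} \left( X^T \left( g(X\theta) - y \right) + \lambda \tilde{\theta} \right)
$$

Setting `lambdas[1] = 0.` in the code plays the role of $\tilde{\theta}$: the intercept receives no regularization term.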

In [None]:
theta_t = [-2, -1, 1, 2]
X_t = hcat(ones(5), reshape(1:15, (5, 3)) / 10.)
y_t = [1, 0, 1, 0, 1] .>= 0.5
lambda_t = 3;

In [None]:
cost(theta_t, X_t, y_t, lambda_t)

In [None]:
∇cost(theta_t, X_t, y_t, lambda_t)
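
One way to gain confidence in the two values above is to compare the analytic gradient with a central finite-difference estimate of the cost. The sketch below does that on the same test case. `numerical_gradient` is a hypothetical helper, not part of the exercise, and `sigmoid`, `cost`, and `∇cost` are compact restatements of the notebook's functions so that the block runs on its own.

```julia
# Compact restatements of the notebook's functions.
sigmoid(z) = 1.0 / (1.0 + exp(-z))

function cost(theta, x, y, lambda)
    m = size(x, 1)
    sig = sigmoid.(x * theta)
    (-y' * log.(sig) - (1.0 .- y)' * log.(1.0 .- sig)) / m +
        lambda * 0.5 / m * (theta[2:end]' * theta[2:end])
end

function ∇cost(theta, x, y, lambda)
    m, n = size(x)
    sig = sigmoid.(x * theta)
    lambdas = fill(float(lambda), n)
    lambdas[1] = 0.0                  # do not regularize the intercept
    (x' * (sig - y) + lambdas .* theta) / m
end

# Hypothetical helper: central finite differences, one coordinate at a time.
function numerical_gradient(f, theta; h=1e-6)
    grad = zeros(length(theta))
    for i in eachindex(theta)
        step = zeros(length(theta))
        step[i] = h
        grad[i] = (f(theta + step) - f(theta - step)) / (2h)
    end
    grad
end

theta_t = [-2.0, -1.0, 1.0, 2.0]
X_t = hcat(ones(5), reshape(1:15, (5, 3)) / 10.0)
y_t = [1, 0, 1, 0, 1] .>= 0.5
lambda_t = 3

analytic = ∇cost(theta_t, X_t, y_t, lambda_t)
numeric = numerical_gradient(t -> cost(t, X_t, y_t, lambda_t), theta_t)
max_diff = maximum(abs.(analytic - numeric))   # should be tiny if the gradient is right
```

For this test case the cost works out to about 2.5348, and the first gradient component to about 0.1466.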

## One-vs-all training
Train one regularized logistic regression classifier per label by minimizing the cost with the help of its gradient.

In [None]:
using Optim

### Optimization conditions

In [None]:
m, n = size(X)
x = hcat(ones(m), X)
lambda = 0.1
theta_init = zeros(n + 1);

### Optimizing ten one-vs-all cases

In [None]:
thetas = map(label -> Optim.minimizer(optimize(t -> cost(t, x, y .== label, lambda), 
                                               t -> ∇cost(t, x, y .== label, lambda),
                                               theta_init, 
                                               inplace=false)),
             1:10);

In [None]:
thetas = hcat(thetas...);
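
One-vs-all turns a single multi-class problem into one binary problem per label: for each label, the target is `true` where the example carries that label and `false` elsewhere, which is what `y .== label` produces in the optimization above. A small sketch with made-up labels (`y_demo` and `targets` are illustrative names):

```julia
y_demo = [1, 3, 2, 3]    # hypothetical labels for four examples, three classes

# One binary target column per label, stacked side by side.
targets = hcat([y_demo .== label for label in 1:3]...)
```

Each row of `targets` contains exactly one `true`, in the column of that example's label.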

## Apply the solution to the data and compare with the labels
For each example, predict the label whose classifier gives the highest score. The exercise states an expected training accuracy of about 94.9%.

In [None]:
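# argmax with dims=2 returns one CartesianIndex per row; its second component is the column, i.e. the label
sum(map(idx -> idx[2], argmax(x * thetas, dims=2)) .== y) / m * 100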
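
The prediction step relies on `argmax` with `dims=2`, which returns a column of `CartesianIndex` values rather than plain integers. A minimal sketch on a made-up score matrix (`scores`, `labels`, and the numbers are hypothetical) shows how the class index is extracted:

```julia
# Hypothetical scores: 3 examples (rows) by 4 classes (columns).
scores = [0.1 0.7 0.1  0.1;
          0.9 0.0 0.05 0.05;
          0.2 0.2 0.1  0.5]
labels = [2, 1, 3]

best = argmax(scores, dims=2)               # 3x1 Matrix of CartesianIndex
predicted = vec(map(idx -> idx[2], best))   # keep only the column (class) index
accuracy = sum(predicted .== labels) / length(labels) * 100
```

Here the third example is misclassified (predicted class 4, label 3), so the accuracy is two out of three.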

# Part 2: Neural Network

## Read the weights from the file

In [None]:
import MAT
fpath_mat = "ex3weights.mat"
file = MAT.matopen(fpath_mat)
println(MAT.names(file))
theta1 = MAT.read(file, "Theta1")
theta2 = MAT.read(file, "Theta2")
MAT.close(file)

## Set some parameters

In [None]:
input_layer_size  = 400 # 20x20 Input Images of Digits
hidden_layer_size = 25  # 25 hidden units
num_labels = 10         # 10 labels, from 1 to 10 (the digit 0 is labeled 10)

## Neural network calculation

### Calculate activation

In [None]:
activation1 = sigmoid.(x * theta1');

### Pick the maximum activations

In [None]:
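# Prepend the bias unit to the hidden activations, then take the best-scoring output unit per row
predictions = map(idx -> idx[2], argmax(hcat(ones(size(activation1, 1)), activation1) * theta2', dims=2));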

### Calculate the accuracy
The exercise instructions state an expected accuracy of about 97.5%.

In [None]:
accuracy = sum(predictions .== y) / m * 100
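
The cells in this part amount to a forward pass: add a bias unit to the input, apply the sigmoid to get the hidden layer, add a bias unit again, and pick the output unit with the highest score. Because the sigmoid is monotonic, taking `argmax` of the raw output scores gives the same prediction as taking it after the sigmoid. The sketch below wraps those steps in one function. `predict_nn` is a hypothetical helper, and the tiny weights are made up for illustration and unrelated to `ex3weights.mat`.

```julia
sigmoid(z) = 1.0 / (1.0 + exp(-z))

# Hypothetical helper: forward pass of a network with one hidden layer.
# X has one example per row and no bias column.
function predict_nn(theta1, theta2, X)
    m = size(X, 1)
    a1 = hcat(ones(m), X)                 # add the bias unit to the input layer
    a2 = sigmoid.(a1 * theta1')           # hidden-layer activations
    z3 = hcat(ones(m), a2) * theta2'      # output scores (sigmoid omitted: monotonic)
    vec(map(idx -> idx[2], argmax(z3, dims=2)))
end

# Tiny synthetic network: 2 inputs, 2 hidden units, 2 output classes.
theta1_demo = [0.0 10.0 0.0;     # hidden unit 1 responds to feature 1
               0.0 0.0 10.0]     # hidden unit 2 responds to feature 2
theta2_demo = [0.0 1.0 0.0;      # class 1 follows hidden unit 1
               0.0 0.0 1.0]      # class 2 follows hidden unit 2
X_demo = [1.0 0.0;
          0.0 1.0]

demo_predictions = predict_nn(theta1_demo, theta2_demo, X_demo)
```

With the exercise's variables, the equivalent call would be `predict_nn(theta1, theta2, X)`, passing `X` without the column of ones.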

The interactive part of the exercise is skipped.