# Welcome to the Flux Breakdown
## This is a work in progress - due for completion by the end of the week

In this demonstration, we take the most basic, but important, building block of deep learning, the perceptron, and build it up to classify images. Most of these blocks have brief explanations of their intent, but if something is confusing, feel free to let me know!

In [None]:
using Flux

## Part One: A Single Perceptron:
Here we will build a single node and demonstrate its properties. Start with an x and y that we will use as the input and expected output of our system. Notice that the dimensionality of y does not match the output of a single perceptron; we will make use of this later.

In [None]:
x = [1]
y = [0.6, 0.4]

Using the Dense layer constructor, we create an instance of a single perceptron with one input and one output. It is initialized with a sigmoid nonlinearity (σ).

In [None]:
single_perceptron_model = Dense(1,1,σ)

We can inspect the properties of this model, such as its weight and bias terms, by printing the arrays that hold these values.

In [None]:
single_perceptron_model.W

In [None]:
single_perceptron_model.b

Say we want to see what the model predicts for our input. Calling the model as a function with the argument x returns its prediction.

In [None]:
single_perceptron_model(x)

Notice that this is no different than applying the sigmoid function to a linear equation built from the weight and bias terms. We can verify this by computing it directly.

In [None]:
σ.(single_perceptron_model.W*[1] + single_perceptron_model.b)
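
To make the arithmetic concrete without relying on Flux, here is a minimal sketch in plain Julia. The names `logistic`, `w`, `b`, and `x_demo`, along with their values, are illustrative assumptions and not the randomly initialized parameters of the model above.

```julia
# The logistic sigmoid, written by hand; Flux provides an equivalent function as σ.
logistic(z) = 1 / (1 + exp(-z))

# Hypothetical parameters for a perceptron with one input and one output.
w = fill(0.5, 1, 1)   # 1×1 weight matrix
b = [0.1]             # 1-element bias vector
x_demo = [1.0]        # a single input value

# Linear step (w * x + b) followed by the element-wise nonlinearity.
perceptron(x) = logistic.(w * x .+ b)

out = perceptron(x_demo)   # a 1-element vector with a value between 0 and 1
```

Because the sigmoid squashes any real number into the interval (0, 1), the output always lands in that range regardless of the weight and bias values.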

## Part Two: Multi-Dimensionality:
Here we give the layer more than one output, and observe how the weight and bias terms change shape to accommodate this.

In [None]:
single_dense_model = Dense(1,2,σ)

The weight and bias arrays now hold more than a single element. We can now see that perceptrons are built on matrix-vector multiplication, and operations like these are exactly what hardware accelerators such as GPUs and TPUs are optimized for.

In [None]:
single_dense_model.W

In [None]:
single_dense_model.b

Just like before, we can pass the input to the model. The result should be the same as multiplying these arrays directly, where each output element is the inner product of one row of the weight matrix with the input, plus the matching bias, passed through σ.

In [None]:
single_dense_model(x)

In [None]:
σ.(single_dense_model.W*x + single_dense_model.b)
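
The inner-product view can be sketched in plain Julia as follows. The parameter values and the names `logistic`, `W`, `b`, and `x_demo` are assumptions chosen for illustration; they stand in for whatever Flux initializes.

```julia
using LinearAlgebra: dot

logistic(z) = 1 / (1 + exp(-z))

# Hypothetical parameters for a layer with 1 input and 2 outputs.
W = reshape([0.5, -0.3], 2, 1)   # 2×1 weight matrix
b = [0.1, 0.2]                   # one bias per output
x_demo = [1.0]

# Matrix form: a single multiplication covers every output at once.
matrix_form = logistic.(W * x_demo .+ b)

# Row-by-row form: output i is the inner product of row i of W with x, plus b[i].
row_form = [logistic(dot(W[i, :], x_demo) + b[i]) for i in 1:size(W, 1)]
```

Both forms produce the same two numbers; the matrix form is simply the one that maps well onto accelerator hardware.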

## Part Three: Chaining Operations Together:
What happens when we want to connect successive layers together to produce a model? This is where the Chain function comes in, which is the basis for building larger systems in Flux. One way to feed the output of the first perceptron into the second layer is to nest one model call inside the other.

In [None]:
single_dense_model(single_perceptron_model([1]))

But doing so gets unwieldy as models grow. We can get the same result in fewer lines, and keep the combined model as a single object, using Chain. Chain will also be helpful as we begin to introduce new deep learning elements to the party.

In [None]:
small_mlp = Chain(single_perceptron_model,single_dense_model)

In [None]:
small_mlp(x)
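
Conceptually, chaining layers is function composition. The sketch below shows the idea in plain Julia; it is not how Flux implements Chain, and `layer1`, `layer2`, and `apply_layers` are hypothetical names with made-up parameters.

```julia
logistic(z) = 1 / (1 + exp(-z))

# Two hypothetical layers written as plain functions.
layer1(x) = logistic.(fill(0.5, 1, 1) * x .+ [0.1])                    # 1 input, 1 output
layer2(x) = logistic.(reshape([0.5, -0.3], 2, 1) * x .+ [0.1, 0.2])   # 1 input, 2 outputs

# Nesting the calls by hand...
nested = layer2(layer1([1.0]))

# ...matches composing the functions first (∘ is Julia's composition operator).
composed_model = layer2 ∘ layer1
composed = composed_model([1.0])

# A small helper that feeds the input through any number of layers, left to right.
apply_layers(layers, x) = foldl((acc, f) -> f(acc), layers; init = x)
folded = apply_layers((layer1, layer2), [1.0])
```

The helper takes the layers in the order the data flows through them, which is also the order in which they are listed in the Chain call above.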

## Part Four: Getting The Right Results:
Our outputs aren't quite what we wanted. How do we quantify this, and how can we adjust the perceptrons we have to get closer to our desired output? To do this, we need to measure how far off we are and which direction to move in.

In [None]:
using Flux: mse

Let's start by defining the loss using mean squared error, which averages the squared differences between the expected and actual values. Using this, we can find the right direction to move in during training.

In [None]:
loss(x,y) = mse(small_mlp(x),y)

In [None]:
loss(x,y)
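
For reference, mean squared error can be written out by hand in one line. This is a sketch of the formula rather than Flux's own implementation; `mse_by_hand` and the `prediction` values are made up for illustration, while `target` reuses the y values from this notebook.

```julia
# Mean squared error: the average of the squared element-wise differences.
mse_by_hand(ŷ, y) = sum((ŷ .- y) .^ 2) / length(y)

prediction = [0.5, 0.5]   # hypothetical model output
target = [0.6, 0.4]       # the expected output used in this notebook

err = mse_by_hand(prediction, target)        # each element is off by 0.1
perfect = mse_by_hand(target, target)        # a perfect prediction gives zero loss
```

Squaring the differences means that overshooting and undershooting are penalized equally, and larger misses count for disproportionately more.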

Next, define an optimization algorithm. We will talk about this more in a moment.

In [None]:
opt = ADAM()

We can use the train! function to shift our perceptron parameters in the right direction. Notice that after one training step we are a little closer to the goal, but not quite there. Maybe if we do this enough times we can get there, but how will we do that?

In [None]:
Flux.train!(loss, params(small_mlp), [(x, y)], opt)

In [None]:
small_mlp(x)

In [None]:
loss(x,y)
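
To see why repeating the update helps, here is a hand-written sketch that uses plain gradient descent (not ADAM) on a single sigmoid neuron with a squared-error loss. Everything in it, including `descend`, the learning rate `η = 0.5`, and the step count, is an assumption chosen for illustration.

```julia
logistic(z) = 1 / (1 + exp(-z))

# Model: ŷ = logistic(w*x + b), loss = (ŷ - y)^2, with scalar w, b, x, and y.
function descend(w, b, x, y; η = 0.5, steps = 1000)
    for _ in 1:steps
        ŷ = logistic(w * x + b)
        # Chain rule: dL/dz = 2(ŷ - y) * ŷ(1 - ŷ), where z = w*x + b.
        δ = 2 * (ŷ - y) * ŷ * (1 - ŷ)
        w -= η * δ * x   # dL/dw = δ * x
        b -= η * δ       # dL/db = δ
    end
    return w, b
end

w0, b0 = 0.0, 0.0
x_demo, y_demo = 1.0, 0.6

loss_before = (logistic(w0 * x_demo + b0) - y_demo)^2
w1, b1 = descend(w0, b0, x_demo, y_demo)
loss_after = (logistic(w1 * x_demo + b1) - y_demo)^2
```

Each pass through the loop nudges the parameters a little further down the loss surface, so the loss after many steps ends up far smaller than it was at the start.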

In [None]:
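∂f_∂x(x) = gradient(v -> loss(v, y), float.(x))[1]  # assumed definition (none is given earlier): gradient of the loss with respect to the input
∂f_∂x(x)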

In [None]:
using Flux: @epochs

## Part X: Putting it All Together - Working with Images
In this final section, we take the MNIST dataset of handwritten digit images and use the building blocks of multilayer perceptrons to identify which of the 10 digits (0 through 9) appears in each image. Credit to the Flux Model Zoo GitHub repository for the original demonstration.

https://github.com/FluxML/model-zoo

In [None]:
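using Flux.Data: DataLoader
using Flux: onehotbatch, onecold, logitcrossentropy
using Parameters: @with_kw
using MLDatasets

@with_kw mutable struct Args
    η::Float64 = 3e-4       # learning rate
    batchsize::Int = 1024   # batch size
    epochs::Int = 10        # number of epochs
    device::Function = gpu  # set as gpu, if gpu available
end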

In [None]:
function getdata(args)
    ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"

    # Loading Dataset
    xtrain, ytrain = MLDatasets.MNIST.traindata(Float32)
    xtest, ytest = MLDatasets.MNIST.testdata(Float32)

    # Reshape Data in order to flatten each image into a linear array
    xtrain = Flux.flatten(xtrain)
    xtest = Flux.flatten(xtest)

    # One-hot-encode the labels
    ytrain, ytest = onehotbatch(ytrain, 0:9), onehotbatch(ytest, 0:9)

    # Batching
    train_data = DataLoader(xtrain, ytrain, batchsize=args.batchsize, shuffle=true)
    test_data = DataLoader(xtest, ytest, batchsize=args.batchsize)

    return train_data, test_data
end
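
The two preprocessing steps in getdata, flattening and one-hot encoding, can be sketched in plain Julia to show what shapes come out. The helpers `flatten_images` and `one_hot` are hypothetical stand-ins for illustration, and the images and labels here are made up rather than loaded from MNIST.

```julia
# Flatten: collapse every dimension except the last one (the batch dimension).
flatten_images(imgs) = reshape(imgs, :, size(imgs)[end])

# One-hot encode: column j holds a single `true`, in the row matching labels[j].
one_hot(labels, classes) = [c == l for c in classes, l in labels]

imgs = rand(Float32, 28, 28, 5)   # five fake 28×28 "images"
labels = [3, 0, 9, 1, 3]          # made-up digit labels

flat = flatten_images(imgs)       # 784×5 matrix, one column per image
encoded = one_hot(labels, 0:9)    # 10×5 Bool matrix, one column per label
```

Since the classes run from 0 to 9, the digit 3 lights up row 4 of its column, not row 3.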

In [None]:
function build_model(; imgsize=(28,28,1), nclasses=10)
    return Chain(
        Dense(prod(imgsize), 32, relu),
        Dense(32, nclasses))
end
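
To get a feel for the size of this network, the following plain-Julia sketch reproduces its shapes with random stand-in weights. The names `relu_demo`, `forward`, `W1`, `W2`, and so on are assumptions for illustration; Flux chooses its own initialization.

```julia
relu_demo(z) = max(zero(z), z)   # hand-written ReLU

imgsize, nhidden, nclasses = (28, 28, 1), 32, 10
nin = prod(imgsize)              # 784 inputs per flattened image

# Random stand-in parameters with the same shapes as the two Dense layers.
W1, b1 = randn(Float32, nhidden, nin), zeros(Float32, nhidden)
W2, b2 = randn(Float32, nclasses, nhidden), zeros(Float32, nclasses)

# The last layer has no activation, so the outputs are raw scores (logits).
forward(x) = W2 * relu_demo.(W1 * x .+ b1) .+ b2

batch = rand(Float32, nin, 4)    # four fake flattened images
logits = forward(batch)          # 10×4: one score per class per image
nparams = length(W1) + length(b1) + length(W2) + length(b2)
```

Almost all of the trainable parameters sit in the first layer, because it connects every one of the 784 pixels to each of the 32 hidden units.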

In [None]:
function loss_all(dataloader, model)
    l = 0f0
    for (x,y) in dataloader
        l += logitcrossentropy(model(x), y)
    end
    l/length(dataloader)
end
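
The loss used here works on raw logits. A hand-written sketch of the idea, assuming the usual definition (negative log-softmax of the true class, averaged over the batch), looks like this; `log_softmax` and `xent_from_logits` are hypothetical names, not Flux functions.

```julia
# Numerically stable log-softmax over each column.
function log_softmax(logits)
    shifted = logits .- maximum(logits, dims = 1)
    return shifted .- log.(sum(exp.(shifted), dims = 1))
end

# Cross entropy computed from raw logits, averaged over the batch (columns).
xent_from_logits(logits, onehot) = -sum(onehot .* log_softmax(logits)) / size(logits, 2)

logits = [2.0 0.0; 0.0 0.0; 0.0 3.0]   # 3 classes × 2 samples
targets = Bool[1 0; 0 0; 0 1]          # true classes: 1 and 3

loss_good = xent_from_logits(logits, targets)                 # labels agree with the largest logits
loss_bad = xent_from_logits(logits, Bool[0 1; 1 0; 0 0])      # labels disagree with the largest logits
```

The loss is small when the largest logit in each column lines up with the true class and grows when it does not, which is what training pushes against.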

In [None]:
function accuracy(data_loader, model)
    acc = 0
    for (x,y) in data_loader
        acc += sum(onecold(cpu(model(x))) .== onecold(cpu(y)))*1 / size(x,2)
    end
    acc/length(data_loader)
end
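
The accuracy calculation boils down to comparing the position of the largest score in each column with the position of the `true` entry in the matching one-hot column. A plain-Julia sketch with made-up scores follows; `col_argmax` and `batch_accuracy` are hypothetical helpers.

```julia
# Index of the largest entry in each column.
col_argmax(m) = [argmax(col) for col in eachcol(m)]

# Fraction of columns where the predicted index matches the target index.
batch_accuracy(scores, onehot) = sum(col_argmax(scores) .== col_argmax(onehot)) / size(scores, 2)

scores = [0.1 0.7 0.2;
          0.8 0.2 0.3;
          0.1 0.1 0.5]          # 3 classes × 3 samples
targets = Bool[0 1 0;
               1 0 1;
               0 0 0]           # the third sample's true class is 2, but class 3 scores highest

acc = batch_accuracy(scores, targets)   # two of the three samples are predicted correctly
```

Only the ranking of the scores within a column matters here, so the same function works on raw logits and on softmax probabilities alike.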

In [None]:
function train(; kws...)
    # Initializing Model parameters 
    args = Args(; kws...)

    # Load Data
    train_data,test_data = getdata(args)

    # Construct model
    m = build_model()
    train_data = args.device.(train_data)
    test_data = args.device.(test_data)
    m = args.device(m)
    loss(x,y) = logitcrossentropy(m(x), y)
    
    ## Training
    evalcb = () -> @show(loss_all(train_data, m))
    opt = ADAM(args.η)

    @epochs args.epochs Flux.train!(loss, params(m), train_data, opt, cb = evalcb)

    @show accuracy(train_data, m)

    @show accuracy(test_data, m)
end