# Flux for ML/RL

This notebook lays out some basics of [Flux.jl](https://fluxml.ai). Unfortunately, Flux currently takes quite a while to precompile and load. This is being worked on, but I recommend evaluating the next cell before digging into the text that follows it. Loading will be faster after the first time, as a precompiled version of the library is cached (much like Plots from before).


In [None]:
using Flux, Random, Statistics, BenchmarkTools

## What is Flux?

Flux is a deep learning framework that uses source-to-source automatic differentiation through Zygote.jl. The resulting library is incredibly flexible and can differentiate through many Julia functions right out of the box. The benefit of this is that _all_ of Flux's models are written in pure Julia (even GPU operations!), and the library can take full advantage of multiple dispatch. We will discuss the nice features that come from this down the road, but first let's start with a simple example (artificial regression with a small neural network). We can then move on to how Flux can be used in RL research.
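
Before the regression example, here is a small sketch of the differentiation claim above. The function `poly` is a hypothetical example written for this illustration; it is ordinary Julia with a loop and nothing Flux-specific, and `gradient` (re-exported by Flux from Zygote) differentiates straight through it.

```julia
using Flux  # re-exports `gradient` from Zygote

# Hypothetical example function: three Horner-style steps build x^2 + x + 1.
function poly(x)
    y = zero(x)
    for _ in 1:3
        y = y * x + 1
    end
    return y
end

# `gradient` returns a tuple with one entry per argument of `poly`.
dpoly(x) = gradient(poly, x)[1]

dpoly(2.0)  # the analytic derivative is 2x + 1
```
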


Because our problem is artificial, we will need to create a dataset.

In [None]:
Random.seed!(10293)

train_points = 2^14
val_points = 2^9
feature_size = 10
ϵ = 0.01f0

# Layer weights are drawn from the global RNG, so the seed above makes them reproducible.
target_model = Chain(Dense(feature_size, 256, relu), Dense(256, 1))

X_train = randn(Float32, feature_size, train_points)
Y_train = target_model(X_train) + ϵ*randn(Float32, train_points)'

X_val = randn(Float32, feature_size, val_points)
Y_val = target_model(X_val) + ϵ*randn(Float32, val_points)'

Now with the dataset created, we will set up a model and run a simple training loop with mini-batch gradient descent. We will decompose some of the Flux primitives afterwards.

In [None]:
batchsize = 64
opt = Descent(0.01)

model = Chain(Dense(feature_size, 64, relu), Dense(64, 1))
loss(x, y) = Flux.mse(model(x), y)

println("Initial:")
@show loss(X_train, Y_train)
@show loss(X_val, Y_val)
println()

for n ∈ 1:100
    train_loader = Flux.Data.DataLoader(X_train, Y_train, batchsize=batchsize, shuffle=true)
    Flux.train!(loss, Flux.params(model), train_loader, opt)
    if n % 10 == 0
        println("Epoch: $(n)")
        @show loss(X_train, Y_train)
        @show loss(X_val, Y_val)
        println()
    end
end
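
To make the mini-batching above concrete, the sketch below builds shuffled batches by hand using only the standard library. It approximates what a shuffling data loader yields and is not Flux's implementation; the names `Xd`, `Yd`, `bs`, and `batches` are made up for this illustration.

```julia
using Random

# Toy data: 2 features, 10 samples stored as columns (same layout as X_train).
Xd = reshape(collect(Float32, 1:20), 2, 10)
Yd = reshape(collect(Float32, 1:10), 1, 10)
bs = 4  # hypothetical batch size for the demo

# Shuffle the sample indices once per epoch, then cut them into chunks.
idx = shuffle(MersenneTwister(0), 1:size(Xd, 2))
batches = [(Xd[:, i], Yd[:, i]) for i in Iterators.partition(idx, bs)]

# Every sample appears in exactly one batch; the last batch may be smaller.
for (xb, yb) in batches
    println(size(xb), " ", size(yb))
end
```

Each `(xb, yb)` pair plays the role of `d` in a training loop that calls `loss(d...)`.
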



## Custom Training Loop

The first piece we need to decompose is the training loop. In the example above we used Flux's built-in `train!` function. The beauty of Julia and Flux is that `train!` is itself written in Julia, so we can write our own training loop without any extra computational cost. While this matters less for supervised ML, it is critical for RL, where the training loop contains interactions with the environment and various other processing and bookkeeping steps.



In [None]:
function cust_train!(loss::Function, m, ps, data, opt)
    for d in data
        gs = gradient(ps) do
            training_loss = loss(m, d...)
            # Insert whatever code you want here that needs the training loss, e.g. logging.
            return training_loss
        end
        # Insert whatever code you want here that needs the gradients,
        # e.g. logging them as histograms with TensorBoardLogger.jl to see whether they are becoming huge.
        Flux.Optimise.update!(opt, ps, gs)
        # Here you might check validation-set accuracy and break out of the loop for early stopping.
    end
end
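
To see what the `gradient(ps) do ... end` call in the loop above hands back, here is a minimal sketch. It assumes a Flux version that supports implicit parameters via `Flux.params`, as the rest of this notebook does. The arrays `W` and `b` and the function `demo_loss` are stand-ins invented for the illustration, not part of any model above.

```julia
using Flux

# Two plain arrays stand in for a model's weights (hypothetical names).
W = randn(Float32, 1, 3)
b = zeros(Float32, 1)
ps = Flux.params(W, b)  # implicit-parameter collection

x = randn(Float32, 3, 8)
y = randn(Float32, 1, 8)
demo_loss() = Flux.mse(W * x .+ b, y)

before = demo_loss()
gs = gradient(demo_loss, ps)  # same call as the do-block form, with a named function

# The result is indexed by the parameter arrays themselves.
println(size(gs[W]), " ", size(gs[b]))

# A plain gradient-descent step written by hand, with a small fixed step size.
for p in ps
    p .-= 0.01f0 .* gs[p]
end
after = demo_loss()
```

Because the gradients are ordinary arrays, anything you want to do between computing them and applying them (clipping, logging, skipping an update) is just regular Julia code.
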

## Custom Layer

Just like the training loop, all of Flux's layers are written in Julia. Below is an example of a Dense layer, but there are plenty of other examples and layers (all written in Julia) found [here](https://github.com/FluxML/Flux.jl/tree/master/src/layers).

In [None]:
struct CustDense{S, B, F}
    W::S
    b::B
    σ::F
end

CustDense(W, b) = CustDense(W, b, identity)

function CustDense(in::Integer, out::Integer, σ = identity;
               initW = Flux.glorot_uniform, initb = Flux.zeros)
    return CustDense(initW(out, in), initb(out), σ)
end

(l::CustDense)(X) = l.σ.(l.W*X .+ l.b)
Flux.@functor CustDense
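
The same three ingredients (a struct, a call overload, and `Flux.@functor`) are enough for other layers too. The sketch below defines `CustScale`, a hypothetical layer invented for this illustration and not part of Flux, and checks that its parameters are visible to `Flux.params` and receive gradients when it is composed inside a `Chain`. It assumes the same implicit-parameter API used throughout this notebook.

```julia
using Flux

# Hypothetical layer (not part of Flux): multiplies each feature by a learned scale.
struct CustScale{S}
    s::S
end

CustScale(n::Integer) = CustScale(ones(Float32, n))

(l::CustScale)(X) = l.s .* X
Flux.@functor CustScale  # lets Flux find the array stored in `s`

layer = CustScale(4)
m = Chain(layer, x -> sum(x; dims=1))  # custom layers compose like built-in ones

X = randn(Float32, 4, 5)
println(size(m(X)))                      # one output row per batch column
println(sum(length, Flux.params(m)))     # total number of trainable scalars

# Gradient of a scalar output with respect to the layer's parameters.
gs = gradient(() -> sum(m(X)), Flux.params(m))
println(size(gs[layer.s]))
```
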


## Custom Optimiser

Again, Flux ships with plenty of [optimisers](https://fluxml.ai/Flux.jl/stable/training/optimisers/), but you can also define your own with little to no extra overhead. The optimisers already implemented in the code base are good templates for more complex ones.

In [None]:
struct CustDescent
  eta::Float64
end

CustDescent() = CustDescent(0.1)

function Flux.Optimise.apply!(o::CustDescent, x, Δ)
  # Scale the gradient in place; `update!` subtracts the returned array from the parameter `x`.
  Δ .*= o.eta
end
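
Optimisers that need per-parameter state can follow the same pattern. The sketch below is a heavy-ball momentum variant, assuming the same `Flux.Optimise.apply!` interface that `CustDescent` extends. `CustMomentum`, its `velocity` field, and `mopt` are hypothetical names for this illustration. The two manual steps at the end subtract the returned array from the parameter by hand so that the arithmetic is easy to follow.

```julia
using Flux

# Hypothetical optimiser: gradient descent with heavy-ball momentum.
struct CustMomentum
    eta::Float64
    rho::Float64
    velocity::IdDict{Any, Any}  # per-parameter state, keyed by the parameter array itself
end

CustMomentum(eta = 0.01, rho = 0.9) = CustMomentum(eta, rho, IdDict{Any, Any}())

function Flux.Optimise.apply!(o::CustMomentum, x, Δ)
    v = get!(() -> zero(x), o.velocity, x)  # create the velocity on first use
    @. v = o.rho * v + o.eta * Δ            # decay the old velocity, add the scaled gradient
    @. Δ = v                                # the returned step is the velocity
    return Δ
end

mopt = CustMomentum(0.1, 0.5)
x = Float32[1, 2]
g = Float32[1, 1]

x .-= Flux.Optimise.apply!(mopt, x, copy(g))  # step 1: v = 0.1 * g
x .-= Flux.Optimise.apply!(mopt, x, copy(g))  # step 2: v = 0.5 * 0.1 * g + 0.1 * g
println(x)
```

Keying the state by the array's identity means one optimiser object can serve every parameter in a model without any extra bookkeeping.
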

## Putting it all together

Now we can put these custom pieces together into something that looks almost exactly like before.

In [None]:
cust_model = Chain(CustDense(feature_size, 64, relu), CustDense(64, 1))
opt = CustDescent(0.01)

println("Initial:")
@show Flux.mse(cust_model(X_train), Y_train)
@show Flux.mse(cust_model(X_val), Y_val)
println()

for n ∈ 1:100
    train_loader = Flux.Data.DataLoader(X_train, Y_train, batchsize=batchsize, shuffle=true)
    cust_train!(cust_model, Flux.params(cust_model), train_loader, opt) do m, X, Y
        Flux.mse(m(X), Y)
    end
    if n % 10 == 0
        println("Epoch: $(n)")
        @show Flux.mse(cust_model(X_train), Y_train)
        @show Flux.mse(cust_model(X_val), Y_val)
        println()
    end
end


## Wrap-up

Flux is a flexible yet powerful library for working with all types of deep networks, which makes it ideal for research code! There are some rough edges (that are being actively worked on), but the whole project is open source and written in a single language (unlike many other popular packages). With the above examples you should be able to start digging into your own custom models and optimizers!
