```julia
# Use the package versions pinned by this repository's environment.
import Pkg, Random
Pkg.activate(@__DIR__)
Pkg.instantiate()

# Load Flux, plus PlotlyJS for sweet interactive graphics
using Flux, PlotlyJS
```

[32m[1m  Updating[22m[39m registry at `~/.julia/registries/General`
[32m[1m  Updating[22m[39m git-repo `https://github.com/JuliaRegistries/General.git`
[?25l[2K[?25h

# Flux by Example: Layers

Continuing from the previous example: we ran into problems using polynomials as the fundamental unit of computation for our nonlinear function approximator. Although our results improved as we increased the order of the polynomial, we quickly ran out of floating-point precision. Inputs raised to very large powers produce enormous values, which in turn forces the coefficients to become extremely small, and so on.
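To make the precision problem concrete, here is a small plain-Julia check (no Flux needed). The input value `10.0` is an arbitrary choice for illustration; the point is how fast its powers grow, and how `Float64`, with roughly 16 significant decimal digits, stops resolving small additions at that scale.

```julia
# Arbitrary input magnitude, chosen only for illustration
x = 10.0

# Powers grow explosively; x^400 exceeds the largest Float64 and becomes Inf
for n in (5, 10, 20, 40, 400)
    println("x^", n, " = ", x^n)
end

# Near 1e20, adjacent Float64 values are eps(x^20) apart (much more than 1.0),
# so a small contribution added to a huge power term is rounded away entirely
big_term = x^20
println("spacing of Float64 values near x^20: ", eps(big_term))
println("x^20 + 1.0 == x^20: ", big_term + 1.0 == big_term)
```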

This brings us to the beginnings of modern deep learning: the humble fully connected layer and activation function. We will use an affine transformation followed by a simple nonlinearity as a building block, and then _compose_ those building blocks into a model in which each piece is very simple, but the model as a whole is expressive enough to approximate very complex functions. Stated mathematically, our basic building block (what we will call a "fully connected layer" with a "relu activation") is:

$$
    f(x) = \text{relu}(Wx + b)
$$

where $\text{relu}$ is a simple nonlinearity, applied element by element to its input:

$$
    \text{relu}(x) = \begin{cases}
        x & \quad x > 0 \\
        0 & \quad x \le 0
    \end{cases}
$$
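As a quick sanity check of the piecewise definition, here is a hypothetical standalone `myrelu` in plain Julia. Flux already exports its own `relu`; this version only mirrors the two cases above, and the trailing dot in `myrelu.(v)` is Julia's broadcasting syntax for applying it element by element.

```julia
# Hypothetical standalone ReLU mirroring the two cases above:
# x when x > 0, otherwise 0 (of the same numeric type as x)
myrelu(x::Real) = x > 0 ? x : zero(x)

v = [-2.0, -0.5, 0.0, 1.5, 3.0]

# The dot broadcasts the scalar function over every element of v
out = myrelu.(v)
println(out)  # [0.0, 0.0, 0.0, 1.5, 3.0]
```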

Defining this building block in Flux is, as always, very simple:

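```julia
# A fully connected layer: a weight matrix `W` and a bias vector `b`
struct FullyConnected
    W
    b
end

# Let Flux traverse the struct so that `params` can find `W` and `b`
Flux.@treelike FullyConnected

# Calling the layer computes relu(W*x + b), with relu applied elementwise
function (fc::FullyConnected)(x)
    return relu.(fc.W*x .+ fc.b)
end
```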
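The Flux version above leaves `W` and `b` untyped, which lets them hold Flux's tracked parameters. As a dependency-free sketch of the same computation, the hypothetical `PlainFullyConnected` below uses ordinary `Float64` arrays and makes the shapes explicit: a layer mapping 3 inputs to 2 outputs has a 2×3 `W` and a length-2 `b`. The parameter values are made up for illustration.

```julia
using LinearAlgebra  # matrix-vector products

# Hypothetical plain-Julia counterpart of `FullyConnected`: no Flux, no gradients
struct PlainFullyConnected
    W::Matrix{Float64}   # size (outputs, inputs)
    b::Vector{Float64}   # one bias per output
end

# Callable like the Flux layer; max.(0.0, z) is relu applied elementwise
(fc::PlainFullyConnected)(x) = max.(0.0, fc.W * x .+ fc.b)

# Made-up parameters for a layer mapping 3 inputs to 2 outputs
layer = PlainFullyConnected([1.0 -1.0 0.5; -2.0 0.0 1.0], [0.5, -3.0])

x = [1.0, 2.0, 4.0]
y = layer(x)
println(y)  # [1.5, 0.0]: the second pre-activation is -1.0, clipped to 0 by relu
```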

Our model will then be defined as a _composition_ of these building blocks, each with its own $W$ and $b$ parameters. For example, with a stack of three of these building blocks, our model would be:

$$
    \text{model}(x) = f_3(f_2(f_1(x)))
$$
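The nested composition above maps directly onto Julia's built-in `∘` (function composition) operator. In this sketch the three layers are hypothetical scalar functions of the form $\text{relu}(wx + b)$ with made-up weights, so it runs without Flux.

```julia
# Hypothetical scalar relu and three made-up scalar "layers" relu(w*x + b)
myrelu(z) = max(zero(z), z)
f1(x) = myrelu(2.0 * x + 1.0)
f2(x) = myrelu(-1.0 * x + 4.0)
f3(x) = myrelu(3.0 * x - 2.0)

# `∘` composes right to left: (f3 ∘ f2 ∘ f1)(x) == f3(f2(f1(x)))
toy_model = f3 ∘ f2 ∘ f1

println(toy_model(1.0))  # f1 gives 3.0, f2 gives 1.0, f3 gives 1.0
```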

Flux gives us a convenient abstraction for stacking multiple building blocks (often called "layers") on top of each other: `Chain`. Here we create a model with three layers.

```julia
# Three layers, each mapping 1 input to 1 output, with random initial parameters
model = Chain(
    FullyConnected(param(randn(1,1)), param(randn(1))),
    FullyConnected(param(randn(1,1)), param(randn(1))),
    FullyConnected(param(randn(1,1)), param(randn(1))),
)

model(1.0)
```

Tracked 1×1 Array{Float64,2}:
 0.0
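Conceptually, `Chain` applies its layers in order, feeding each output into the next. The hypothetical `MyChain` below is a simplified plain-Julia sketch of that idea, not Flux's actual implementation, and it uses made-up scalar layers so it runs without Flux. Note also that `model(1.0)` above returns a 1×1 array rather than a number: each `W` is 1×1, so `W*x` with a scalar `x` is a 1×1 matrix, and broadcasting `.+ b` keeps that shape.

```julia
# Hypothetical minimal stand-in for Flux's `Chain` (a sketch, not Flux's source)
struct MyChain{T<:Tuple}
    layers::T
end
MyChain(layers...) = MyChain(layers)

# Apply layers left to right: each layer's output is the next layer's input
(c::MyChain)(x) = foldl((acc, layer) -> layer(acc), c.layers; init = x)

# Made-up scalar layers of the form relu(w*x + b)
dense_relu(w, b) = x -> max(zero(x), w * x + b)

toy_chain = MyChain(dense_relu(2.0, 1.0), dense_relu(-1.0, 4.0), dense_relu(3.0, -2.0))
println(toy_chain(1.0))
```

Keeping the layers in a tuple rather than a `Vector` preserves each layer's concrete type, and `init = x` makes an empty chain behave as the identity.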

Once again, we define our training loop:

```julia
# Define training loop function; takes in a model to train, an
# optimizer and a list of tuples mapping input (`x`) to output (`y`).
function train(model, opt, training_data::Vector{T}) where {T <: Tuple}
    for (x, y) in training_data
        # Push `x` through the model
        y_hat = model(x)
        
        # Calculate the loss and backpropagate it
        loss = sum((y_hat .- y).^2)/length(y)
        Flux.back!(loss)
        
        # Update the weights by taking an optimizer step
        opt()
    end
end
```

train (generic function with 1 method)
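`Flux.back!` computes the gradient of the loss with respect to every tracked parameter, and calling the optimizer then updates each parameter using its gradient (for plain gradient descent, a small step against it). To show what that amounts to without relying on Flux's API, here is a hypothetical plain-Julia sketch of the same loop for a single linear unit `y_hat = w*x + b` (no relu, so the gradients are easy to write by hand). With a scalar `y`, the loss `err^2` matches the loop's `sum(...)/length(y)`. The learning rate `0.05` and the training data drawn from `y = 3x + 1` are assumptions for illustration.

```julia
# Hypothetical plain-Julia analogue of `train`: a single linear unit
# y_hat = w*x + b, trained with per-sample gradient descent on squared error.
function train_linear!(wb::Vector{Float64}, lr::Float64,
                       training_data::Vector{<:Tuple})
    for (x, y) in training_data
        w, b = wb
        y_hat = w * x + b          # forward pass
        err = y_hat - y            # loss for this sample is err^2
        # Gradients of err^2, derived by hand (the job Flux.back! automates)
        grad_w = 2 * err * x
        grad_b = 2 * err
        # Optimizer step: plain gradient descent with learning rate `lr`
        wb[1] -= lr * grad_w
        wb[2] -= lr * grad_b
    end
    return wb
end

# Made-up, noise-free training data from y = 3x + 1 on [-1, 1]
samples = [(x, 3x + 1) for x in range(-1.0, stop = 1.0, length = 21)]

wb = [0.0, 0.0]            # initial guess for [w, b]
for epoch in 1:200
    train_linear!(wb, 0.05, samples)
end
println(wb)                # should be very close to [3.0, 1.0]
```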