In [1]:
# Use the package versions bundled with this repository.
import Pkg, Random
Pkg.activate(@__DIR__)
Pkg.instantiate()

# Load Flux and PlotlyJS for sweet interactive graphics (well, once it's fixed.  :()
using Flux, PlotlyJS

[32m[1m  Updating[22m[39m registry at `~/.julia/registries/General`
[32m[1m  Updating[22m[39m git-repo `https://github.com/JuliaRegistries/General.git`

# Flux by Example: Models

Let's continue our example from earlier, doing polynomial approximation of arbitrary functions.  We are now going to structure things a bit more, making use of Julia syntax and operations.  We'll start by grouping our state and functionality together, building a custom type that we can use as a fundamental building block from here on out:

In [2]:
# Define a simple custom type that contains a single field, `coeffs`
struct Polynomial
    coeffs
end

# This tells Flux that Polynomial has children, i.e. fields that might
# have parameters inside of them. It enables introspection.
Flux.@treelike Polynomial

# Function to evaluate a polynomial at a certain point
function (p::Polynomial)(x)
    return sum([p.coeffs[idx] .* x^(idx-1) for idx in 1:length(p.coeffs)])
end

In [3]:
# Quick sanity check; create polynomial
P = Polynomial([1,5,3])

# Ensure that it calculates properly:
println("Polynomial-calculated: $(P(1.5))")
println("Manually-calculated:   $(1 * 1.5^0 + 5 * 1.5 + 3 * 1.5^2)")

Polynomial-calculated: 15.25
Manually-calculated:   15.25
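
The comprehension inside `Polynomial` recomputes a power of `x` for every coefficient. As an aside, the same value can be computed with Horner's rule, which needs only one multiply and one add per coefficient. This is a minimal sketch in plain Julia, assuming coefficients are stored lowest order first as above; `horner` is a hypothetical helper name and not part of Flux:

```julia
# Hypothetical helper (not part of Flux): evaluate a polynomial whose
# coefficients are stored lowest-order first, using Horner's rule.
function horner(coeffs, x)
    # Fold from the highest-order coefficient downward:
    # c1 + x*(c2 + x*(c3 + ...))
    acc = zero(x) * zero(eltype(coeffs))
    for c in reverse(coeffs)
        acc = acc * x + c
    end
    return acc
end

# Same polynomial and input as the sanity check above
println(horner([1, 5, 3], 1.5))
```

Both approaches give the same value; Horner's rule simply avoids the repeated exponentiation.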


## The Training Loop

Great; now let's write ourselves a training loop.  A training loop, at its heart, does the following:

* Accept a batch of inputs (`x`) and outputs (`y`).

* Push the inputs (`x`) through the model to generate estimated outputs (`y_hat`).

* Calculate the difference between the estimated outputs (`y_hat`) and true outputs (`y`), typically called the "loss" of the model.

* Propagate the loss back through the model and use that to update the weights of the model to be more correct.
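
To make these four steps concrete, here is a minimal sketch in plain Julia with no library support. It assumes a squared-error loss and plain gradient descent with a hand-derived gradient; the names `predict` and `sgd_step!`, the learning rate, and the toy dataset are illustrative choices, not Flux API:

```julia
# Plain-Julia sketch of one training step for a polynomial model.
# Assumptions: squared-error loss, vanilla gradient descent, scalar x and y.

# Steps 1-2: push an input through the model to get an estimate.
predict(coeffs, x) = sum(coeffs[i] * x^(i - 1) for i in 1:length(coeffs))

function sgd_step!(coeffs, x, y, lr)
    y_hat = predict(coeffs, x)
    # Step 3: the loss is (y_hat - y)^2.
    err = y_hat - y
    # Step 4: d(loss)/d(coeffs[i]) = 2 * err * x^(i-1); move against it.
    for i in 1:length(coeffs)
        coeffs[i] -= lr * 2 * err * x^(i - 1)
    end
    return err^2
end

coeffs = zeros(2)                                  # fit y = c1 + c2*x
data = [(x, 2.0 + 3.0 * x) for x in -1.0:0.1:1.0]  # samples of a known line
for epoch in 1:200, (x, y) in data
    sgd_step!(coeffs, x, y, 0.05)
end
println(coeffs)   # approaches [2.0, 3.0]
```

Deriving the gradient by hand is only practical for tiny models like this one, which is why the rest of this notebook lets Flux do it.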

This is quite easy with Flux; here we go:

In [4]:
# Define training loop function; takes in a model to train, an
# optimizer and a list of tuples mapping input (`x`) to output (`y`).
function train(model, opt, training_data::Vector{T}) where {T <: Tuple}
    for (x, y) in training_data
        # Push `x` through the model
        y_hat = model(x)
        
        # Calculate the loss and backpropagate it
        loss = sum((y_hat .- y).^2)/length(y)
        Flux.back!(loss)
        
        # Update the weights by taking an optimizer step
        opt()
    end
end

train (generic function with 1 method)

In order to use this `train()` function we need three things: a model (`model`), an optimizer (`opt`), and training data (`training_data`).  Let's start with the model.  For our model to "learn", we need to tell `Flux` that it can modify certain numbers within our model.  We do that using the `param()` function:

In [5]:
model = Polynomial(param([1,5,3]))

model(1.5)

15.25 (tracked)

Note that the model is functionally identical to `P` above.  The only differences are that we have wrapped our coefficients in a `param()` call and that the result now says `(tracked)` after it.  This is a hint that the data `model` generates is tracking the operations performed upon it, so that backpropagation can properly update the parameters within the model.
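
To build some intuition for what "tracked" means, here is a toy sketch of the idea in plain Julia. This is not how Flux implements tracking; the `Tracked` type and `backprop!` function are hypothetical names invented for this illustration. Each tracked value remembers the values it was computed from, along with the local derivative with respect to each, so a backward pass can push gradients back to the parameters:

```julia
# Toy reverse-mode tracking; an illustration only, not Flux's implementation.
mutable struct Tracked
    value::Float64
    grad::Float64
    # (parent, local derivative of this value with respect to that parent)
    parents::Vector{Tuple{Tracked,Float64}}
end
Tracked(v::Real) = Tracked(Float64(v), 0.0, Tuple{Tracked,Float64}[])

# Every operation records where its result came from.
Base.:+(a::Tracked, b::Tracked) =
    Tracked(a.value + b.value, 0.0, [(a, 1.0), (b, 1.0)])
Base.:*(a::Tracked, c::Real) =
    Tracked(a.value * c, 0.0, [(a, Float64(c))])

# Walk back through the recorded operations, applying the chain rule.
# (Naive recursion: fine for this tiny example, wasteful for large graphs.)
function backprop!(node::Tracked, upstream = 1.0)
    node.grad += upstream
    for (parent, local_deriv) in node.parents
        backprop!(parent, upstream * local_deriv)
    end
end

# The same polynomial as above, with tracked coefficients
c = [Tracked(1), Tracked(5), Tracked(3)]
x = 1.5
y = (c[1] * 1.0 + c[2] * x) + c[3] * x^2

backprop!(y)
println(y.value)                  # value of the polynomial at x
println([p.grad for p in c])      # d(y)/d(c[i]) = x^(i-1)
```

The gradient stored on each coefficient is what an optimizer would later use to decide how to adjust it.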

Next up, the optimizer.  An optimizer is an object that knows which parameters within a model should be tweaked, and has a simple algorithm for nudging each weight toward its optimal value.  Once the loss has been backpropagated, the resulting gradients indicate which direction each weight should be nudged.  We will use the `Momentum` optimizer, as it is a good default for many situations, and feed it the parameters of our model:
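
The "nudging" can be made concrete with a small sketch of the classical momentum update in plain Julia. This illustrates the general technique rather than Flux's internals; `momentum_step`, `minimize`, and the values of `lr` and `rho` are assumptions chosen for the example:

```julia
# Classical momentum: keep a running velocity that accumulates past
# gradients, then move the parameter by that velocity.
function momentum_step(p, v, grad, lr, rho)
    v_new = rho * v - lr * grad   # decay the old velocity, add the new nudge
    return p + v_new, v_new
end

# Minimize f(p) = (p - 3)^2, whose gradient is 2 * (p - 3).
function minimize(p0; lr = 0.1, rho = 0.9, steps = 200)
    p, v = p0, 0.0
    for _ in 1:steps
        p, v = momentum_step(p, v, 2 * (p - 3), lr, rho)
    end
    return p
end

println(minimize(0.0))   # ends close to the minimizer p = 3
```

Because the velocity carries over between steps, consecutive gradients that agree build up speed, while gradients that flip sign partly cancel.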

In [6]:
opt = Flux.Optimise.Momentum(params(model))

#43 (generic function with 1 method)

Note that here we use the `params()` function on our `model` to extract all parameters from it, and pass them to the `Momentum` constructor.  At this point we are finally ready to create our training dataset.  We'll test our model on a variety of nonlinear functions, but let's start with the step function:

In [8]:
# Given a function to generate true outputs from, a range to draw
# random inputs over, and a number of draws to perform, build a
# training data set.
function create_training_set(func, min_val, max_val, N)
    # Draw N uniform random numbers and rescale them to lie between min_val and max_val
    xs = rand(N).*(max_val - min_val) .+ min_val
    
    # Push each number through func(), save it as ys
    ys = func.(xs)
    
    # Return array of tuples
    return [(xs[i], ys[i]) for i in 1:N]
end


# Define the step function
step(x) = Float64(x > 0.0)

# Create 300 random correspondences from [-3, 3]
step_data = create_training_set(step, -3, 3, 300)

# Scatter plot the dataset
plot([
    scatter(;
        x=[d[1] for d in step_data],
        y=[d[2] for d in step_data],
        name="step(x)",
        mode="markers",
    ),
])

## Running the Training Loop

Let's start with training on the step data; we will always redefine our model and optimizer so as to make jumping around cells interactively less confusing.  We're also going to initialize our models with random parameters to show that there is nothing "special" about the parameters we start with (other than that they are random):

In [9]:
# Let's start with training on the step data.  We redefine our model and optimizer
# here to make it easier to jump around in cells in the future.  We also give the
# Momentum optimizer a very small learning rate, because polynomials are tricky beasts.
model = Polynomial(param(randn(3)))
opt = Flux.Optimise.Momentum(params(model), 1e-6)

for epoch in 1:1000
    train(model, opt, step_data)
end

In [11]:
# Scatter plot the target function against the model's output at random test points
function plot_model_performance(model, func, minval, maxval)
    x_test = rand(minval:.01:maxval, 200)
    plot([
        scatter(;
            x=x_test,
            y=func.(x_test),
            mode="markers",
            name="target"
        ),
        scatter(;
            x=x_test,
            y=Flux.Tracker.data.(model.(x_test)),
            mode="markers",
            name="model output"
        ),
    ])
end

plot_model_performance(model, step, -3, 3)

In [12]:
# Okay, that didn't work well at all.  Let's try with a higher-order polynomial:
model = Polynomial(param(randn(4)))
opt = Flux.Optimise.Momentum(params(model), 1e-6)

for epoch in 1:1000
    train(model, opt, step_data)
end

plot_model_performance(model, step, -3, 3)

In [14]:
# Four-parameter (cubic) model of a sinusoid
model = Polynomial(param(randn(4)))
opt = Flux.Optimise.Momentum(params(model), 1e-6)
sin_data = create_training_set(sin, -3, 3, 300)

for epoch in 1:1000
    train(model, opt, sin_data)
end

plot_model_performance(model, sin, -3, 3)
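
As a point of comparison for the fits above, the least-squares-optimal coefficients for a fixed number of coefficients can be computed directly rather than by gradient descent. The sketch below is not part of the workflow in this notebook; it assumes a Vandermonde-style design matrix and Julia's backslash operator, and `fit_polynomial` is a hypothetical helper name:

```julia
using LinearAlgebra

# Hypothetical helper: least-squares polynomial fit, with coefficients
# returned lowest-order first to match the Polynomial type above.
function fit_polynomial(xs, ys, ncoeffs)
    # Column j of the design matrix holds xs .^ (j - 1)
    A = [x^(j - 1) for x in xs, j in 1:ncoeffs]
    # For a tall matrix, `\` returns the least-squares solution
    return A \ ys
end

xs = collect(-3.0:0.05:3.0)
sin_coeffs = fit_polynomial(xs, sin.(xs), 4)
println(sin_coeffs)
```

Plotting a `Polynomial` built from these coefficients next to the trained model would show how close the training loop gets to the best achievable fit for that number of coefficients.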