In [1]:
import Pkg; Pkg.add(Pkg.PackageSpec(url="https://github.com/JuliaComputing/JuliaAcademyData.jl"))
using JuliaAcademyData; activate("Deep learning with Flux")


# Intro to Flux.jl

In the previous course, we learned how machine learning allows us to classify data as apples or bananas with a single neuron. However, some of those details are pretty fiddly! Fortunately, Julia has a powerful package that does much of the heavy lifting for us, called [`Flux.jl`](https://fluxml.github.io/).

*Using `Flux` will make classifying data and images much easier!*

## Using `Flux.jl`

We can get started with `Flux.jl` via:

In [None]:
# using Pkg; Pkg.add(["Flux", "Plots"])
using Flux, Plots



#### Helpful built-in functions

When working with `Flux`, we'll make use of built-in functionality that we had to create for ourselves in previous notebooks.

For example, the sigmoid function, σ, that we have been using already lives within `Flux`:

In [None]:
?σ

In [None]:
plot(σ, -5, 5, label="σ", xlabel="x", ylabel="σ(x)")
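For comparison, the logistic sigmoid is σ(x) = 1 / (1 + e^(−x)). The minimal plain-Julia sketch below writes it out by hand as `my_sigmoid`, a hypothetical name chosen so it does not clash with Flux's `σ`, and checks two features visible in the plot: the curve passes through 0.5 at x = 0 and is symmetric about that point.

```julia
# Hypothetical hand-written logistic sigmoid, for comparison with Flux's σ
my_sigmoid(x) = 1 / (1 + exp(-x))

println(my_sigmoid(0.0))                 # 0.5: the midpoint of the S-curve
println(my_sigmoid.([-5.0, 0.0, 5.0]))   # the `.` broadcasts it elementwise

# Symmetry: σ(-x) == 1 - σ(x), up to floating-point rounding
println(all(isapprox(my_sigmoid(-x), 1 - my_sigmoid(x)) for x in -5:0.5:5))
```

With Flux loaded, `my_sigmoid(1.0) ≈ σ(1.0)` should return `true`.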

Importantly, `Flux` allows us to *automatically create neurons* with the **`Dense`** function. For example, in the last notebook, we were looking at a neuron with 2 inputs and 1 output:

 <img src="https://raw.githubusercontent.com/JuliaComputing/JuliaAcademyData.jl/master/courses/Deep%20learning%20with%20Flux/data/single-neuron.png" alt="Drawing" style="width: 500px;"/>

We can create a neuron with two inputs and one output via:

In [None]:
model = Dense(2, 1, σ)

This `model` object comes with places to store weights and biases:

In [None]:
model.W

In [None]:
model.b

In [None]:
typeof(model.W)

In [None]:
x = rand(2)
model(x)

In [None]:
σ.(model.W*x + model.b)

Unlike in previous notebooks, note that `W` is no longer a `Vector` (1D `Array`) and `b` is no longer a number! Both are now stored in so-called `TrackedArray`s, which let Flux compute gradients for them (as we'll see below), and `W` is effectively treated as a matrix with a single row.
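To make the shapes concrete, here is a plain-Julia sketch of the computation `model(x)` performs, `σ.(W*x + b)`, using made-up values for `W` and `b` rather than the model's random initial weights. Because `W` is 1×2, the product `W*x` is a 1-element vector rather than a scalar, which is why `b` is a 1-element vector too.

```julia
my_sigmoid(z) = 1 / (1 + exp(-z))   # same function as Flux's σ

# Hypothetical parameters for a layer with 2 inputs and 1 output
W = [0.5 -1.0]   # 1×2 Matrix (space-separated, so this is a row, not a Vector)
b = [0.1]        # 1-element Vector: one bias per output neuron

x = [0.8, 0.3]   # one datapoint with two features

z = W * x + b          # (1×2) * (2-element Vector) gives a 1-element Vector
y = my_sigmoid.(z)     # broadcast the activation over every output

println(size(W), " ", size(z), " ", y)
```

A layer with `n` outputs would have an `n×2` weight matrix and an `n`-element bias and return `n` values at once, so the single neuron is just the case `n = 1`.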

Flux also provides other helpful built-in functionality, such as automatic gradient calculation and the cost function that we used in the previous course:

$$L(w, b) = \sum_i \left[y_i - f(x_i, w, b) \right]^2$$

If you normalize by dividing by the total number of elements, this becomes the "mean square error" function, which in `Flux` is named **`Flux.mse`**.
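As a rough sketch of how the two cost functions relate, the hypothetical helpers below compute the summed squared error L and its normalized version. They are written by hand to match the description above and are not Flux's own implementation.

```julia
# Hypothetical hand-written cost functions (not Flux's implementation)
sum_sq_loss(ŷ, y) = sum((y .- ŷ) .^ 2)            # L = Σᵢ [yᵢ - f(xᵢ, w, b)]²
my_mse(ŷ, y)      = sum_sq_loss(ŷ, y) / length(y)   # divide by the number of elements

ŷ = [0.9, 0.2, 0.4]   # made-up predictions
y = [1.0, 0.0, 1.0]   # made-up labels

println(sum_sq_loss(ŷ, y))   # ≈ 0.01 + 0.04 + 0.36 = 0.41
println(my_mse(ŷ, y))        # ≈ 0.41 / 3
```

Because `.-` broadcasts, these helpers also accept a 1-element prediction vector and a scalar label, such as `model(xs[1])` and `ys[1]` below; `length` of a plain number is 1.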

In [None]:
methods(Flux.mse)

### Bringing it all together

Load the datasets that contain the features of the apple and banana images.

In [None]:
using CSV, DataFrames

apples = DataFrame(CSV.File(datapath("data/apples.dat"), delim='\t', allowmissing=:none, normalizenames=true))
bananas = DataFrame(CSV.File(datapath("data/bananas.dat"), delim='\t', allowmissing=:none, normalizenames=true));

In [None]:
x_apples  = [ [row.red, row.green] for row in eachrow(apples)]
x_bananas = [ [row.red, row.green] for row in eachrow(bananas)];

Concatenate the feature vectors (x) together to create a single vector of all our datapoints, and create the corresponding vector of known labels (0 for apples, 1 for bananas):

In [None]:
xs = [x_apples; x_bananas]
ys = [fill(0, size(x_apples)); fill(1, size(x_bananas))];
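A toy version of these two lines, with made-up feature values standing in for the CSV data, shows what they produce: `[a; b]` stacks the two vectors of feature vectors into one (without flattening the inner vectors), and `fill(0, size(a))` builds a label vector of matching length.

```julia
# Made-up [red, green] features standing in for the CSV data
x_a = [[0.8, 0.3], [0.7, 0.4]]   # two "apples"
x_b = [[0.6, 0.7]]               # one "banana"

xs_toy = [x_a; x_b]                                # Vector of 3 feature vectors
ys_toy = [fill(0, size(x_a)); fill(1, size(x_b))]  # labels: 0 = apple, 1 = banana

println(length(xs_toy), " ", ys_toy)   # 3 [0, 0, 1]
```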

In [None]:
model = Dense(2, 1, σ)

We can evaluate the model (currently initialized with random weights) to see what the output value is for a given input:

In [None]:
model(xs[1])

And of course we can examine the current loss value for that datapoint:

In [None]:
loss = Flux.mse(model(xs[1]), ys[1])

In [None]:
typeof(loss)

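### Backpropagation

Because `W` and `b` are tracked, Flux can work out how the loss changes with respect to each of them. Compare `model.W.grad` before and after calling `back!(loss)`, which runs backpropagation from `loss` and stores the resulting gradient in the `.grad` field of each tracked parameter:
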
In [None]:
model.W

In [None]:
model.W.grad

In [None]:
using Flux.Tracker
back!(loss)

In [None]:
model.W.grad
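What `back!` stores in `model.W.grad` is the derivative of the loss with respect to each weight. For a single sigmoid neuron with squared-error loss, the chain rule gives it in closed form: with z = w·x + b and ŷ = σ(z), ∂loss/∂w = 2(ŷ − y) σ(z)(1 − σ(z)) x. The plain-Julia sketch below uses hypothetical helper names, made-up values, and a plain weight `Vector` instead of Flux's 1×2 tracked matrix; it computes this gradient and checks it against a central finite-difference estimate.

```julia
my_sigmoid(z) = 1 / (1 + exp(-z))

# Squared-error loss of one sigmoid neuron on one datapoint (x, y).
# For simplicity w is a plain Vector here, not Flux's 1×2 tracked matrix.
neuron_loss(w, b, x, y) = (my_sigmoid(sum(w .* x) + b) - y)^2

# Chain rule: ∂loss/∂w = 2(ŷ - y) * σ'(z) * x, with σ'(z) = σ(z)(1 - σ(z))
function grad_w(w, b, x, y)
    ŷ = my_sigmoid(sum(w .* x) + b)
    return 2 * (ŷ - y) * ŷ * (1 - ŷ) .* x
end

# Central finite differences: (loss(w + h*eᵢ) - loss(w - h*eᵢ)) / 2h for each weight i
function numeric_grad_w(w, b, x, y; h = 1e-6)
    g = similar(w)
    for i in eachindex(w)
        e = zeros(length(w))
        e[i] = h
        g[i] = (neuron_loss(w .+ e, b, x, y) - neuron_loss(w .- e, b, x, y)) / (2h)
    end
    return g
end

w, b = [0.5, -1.0], 0.1   # made-up parameters
x, y = [0.8, 0.3], 0.0    # made-up datapoint labelled 0 ("apple")

println(grad_w(w, b, x, y))
println(numeric_grad_w(w, b, x, y))   # should agree closely with the line above
```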

Now we have all the tools necessary to build a simple gradient descent algorithm!
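A minimal sketch of such a loop, written in plain Julia rather than with Flux's tracked arrays: it repeatedly averages the gradient of the mean squared error over the whole dataset and steps the parameters against it. The toy data, the learning rate `η`, the step count, and all helper names are made-up assumptions for illustration.

```julia
my_sigmoid(z) = 1 / (1 + exp(-z))

predict(w, b, x) = my_sigmoid(sum(w .* x) + b)

# Mean squared error over a whole dataset
mse_loss(w, b, xs, ys) = sum((predict(w, b, x) - y)^2 for (x, y) in zip(xs, ys)) / length(ys)

# Full-batch gradient descent: average the gradient over all datapoints, then step
function gradient_descent(w, b, xs, ys; η = 1.0, steps = 1000)
    n = length(ys)
    for _ in 1:steps
        gw, gb = zero(w), 0.0
        for (x, y) in zip(xs, ys)
            ŷ = predict(w, b, x)
            δ = 2 * (ŷ - y) * ŷ * (1 - ŷ) / n   # ∂(mse)/∂z contributed by this datapoint
            gw .+= δ .* x
            gb += δ
        end
        w = w .- η .* gw   # move against the gradient
        b = b - η * gb
    end
    return w, b
end

# Made-up [red, green] features and labels (0 = apple, 1 = banana)
xs_gd = [[0.8, 0.3], [0.7, 0.2], [0.3, 0.8], [0.4, 0.7]]
ys_gd = [0.0, 0.0, 1.0, 1.0]

w0, b0 = [0.0, 0.0], 0.0
w_gd, b_gd = gradient_descent(w0, b0, xs_gd, ys_gd)
println(mse_loss(w0, b0, xs_gd, ys_gd), " -> ", mse_loss(w_gd, b_gd, xs_gd, ys_gd))
```

The learning rate trades speed for stability: too large a value can make the loss oscillate instead of decreasing.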

### The easy way

You don't want to manually write out gradient descent algorithms every time! Flux, of course, also brings in lots of optimizers that can do this all for you.

In [None]:
?SGD

In [None]:
?Flux.train!

So we can simply define our loss function, an optimizer, and then call `train!`. That's basic machine learning with Flux.jl.
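Conceptually, `train!` makes one pass over `data`: for each `(x, y)` pair it evaluates the loss, backpropagates, and lets the optimizer adjust the parameters. The plain-Julia sketch below imitates that loop for a single sigmoid neuron with a fixed-step SGD update. The helper `sgd_epoch!`, the learning rate `η`, and the toy data are hypothetical, and Flux's real `train!` obtains gradients through tracking rather than from a hand-derived formula.

```julia
my_sigmoid(z) = 1 / (1 + exp(-z))

# One pass over the data with one update per (x, y) pair, mirroring train!
function sgd_epoch!(w, b, data; η = 0.5)
    for (x, y) in data
        ŷ = my_sigmoid(sum(w .* x) + b[1])
        δ = 2 * (ŷ - y) * ŷ * (1 - ŷ)   # ∂/∂z of the single-point loss (ŷ - y)^2
        w .-= η .* δ .* x               # mutate the parameters in place
        b[1] -= η * δ
    end
    return w, b
end

# Made-up [red, green] features and labels (0 = apple, 1 = banana)
xs_sgd = [[0.8, 0.3], [0.7, 0.2], [0.3, 0.8], [0.4, 0.7]]
ys_sgd = [0.0, 0.0, 1.0, 1.0]

w_sgd, b_sgd = zeros(2), [0.0]   # b is a 1-element Vector so it can be updated in place
for _ in 1:500                   # one train! call is a single pass; repeat for more epochs
    sgd_epoch!(w_sgd, b_sgd, zip(xs_sgd, ys_sgd))
end

preds = [my_sigmoid(sum(w_sgd .* x) + b_sgd[1]) for x in xs_sgd]
println(round.(preds; digits = 2))
```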

In [None]:
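model = Dense(2, 1, σ)             # fresh neuron with random weights
L(x,y) = Flux.mse(model(x), y)     # loss for a single (features, label) pair
opt = SGD(params(model))           # stochastic gradient descent over the model's parameters
Flux.train!(L, zip(xs, ys), opt)   # one pass over all (features, label) pairs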

## Visualize the result

In [None]:
contour(0:.1:1, 0:.1:1, (x, y) -> model([x,y])[].data, fill=true)  # [] takes the single output, .data strips the tracking
scatter!(first.(x_apples), last.(x_apples), label="apples")
scatter!(first.(x_bananas), last.(x_bananas), label="bananas")
xlabel!("mean red value")
ylabel!("mean green value")
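If outputs above 0.5 are read as "banana" (label 1) and outputs below 0.5 as "apple" (label 0), the boundary between the two predictions is the 0.5 level of this plot, where σ(W·x + b) = 0.5, that is, W·x + b = 0. For two inputs this is a straight line, green = −(W₁·red + b) / W₂ (when W₂ ≠ 0). The sketch below computes that line for made-up weights `W1`, `W2`, and `b_line`; substituting the trained values from `model.W` and `model.b` (via `.data`, as in the contour call) should trace the boundary seen above.

```julia
my_sigmoid(z) = 1 / (1 + exp(-z))

# Made-up weights for a 2-input neuron: output = σ(W1*red + W2*green + b_line)
W1, W2, b_line = -4.0, 5.0, -0.2

# Solve W1*red + W2*green + b_line = 0 for green (valid only when W2 != 0)
boundary_green(red) = -(W1 * red + b_line) / W2

reds = 0:0.25:1
greens = boundary_green.(reds)

# Every point on this line gives an output of (numerically) 0.5
outputs = [my_sigmoid(W1 * r + W2 * g + b_line) for (r, g) in zip(reds, greens)]
println(collect(zip(reds, greens)))
println(outputs)
```

In the notebook, `plot!(reds, greens, label="boundary")` would overlay such a line on the figure.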