# Machine Learning in Julia

## A neural network in ~10 lines of Julia code

Let's define a fully connected two-layer network.

![](imgs/network.png)

In [1]:
dense(W, b, σ = identity) = x -> σ.(W * x .+ b)

chain(layers...) = foldl(∘, reverse(layers))

network = chain(
    dense(randn(5,3), randn(5), tanh), # 3 input neurons  -> 5 hidden neurons
    dense(randn(2,5), randn(2)))       # 5 hidden neurons -> 2 output neurons

var"#1#2"{Matrix{Float64}, Vector{Float64}, typeof(identity)}([-1.0240762377060506 -1.2819325295117325 … -0.2740408115390764 -0.1850921098288674; 0.326120727701528 -1.763539716217972 … 0.3966862466089038 -0.05013472177249552], [-1.1133444413263527, -0.5947276781914773], identity) ∘ var"#1#2"{Matrix{Float64}, Vector{Float64}, typeof(tanh)}([-1.1309904765541536 0.8767624550654144 1.0114105689025537; -1.1577698643262149 -1.289108744963674 1.2703457217408483; … ; 0.2779846492258669 -0.6295128982264661 2.167848219368257; -2.060810155603256 0.06517671968177069 -0.8080143517657268], [-0.7864271126297795, -0.31309794603045304, 0.542174401969975, 0.3114129090871048, -1.2042857055671528], tanh)

In [2]:
x = rand(3); # some input

In [3]:
network(x)

2-element Vector{Float64}:
 0.8185020892599661
 0.039658469715307176

Let's unravel the compact definition of our neural network a bit:

*Dense layer:*
```julia
dense(W, b, σ = identity) = x -> σ.(W * x .+ b)
```
`σ` is the activation function, `W` is the weight matrix, `x` is the input to the layer, and `b` is the bias vector. The dot syntax (`σ.(…)` and `.+`) applies the operation elementwise.
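To make the layer concrete, here is a small worked example. The values of `W_demo`, `b_demo`, and `x_demo` are hand-picked for illustration and are not taken from the network above.

```julia
# Same one-line definition as above
dense(W, b, σ = identity) = x -> σ.(W * x .+ b)

# Hand-picked (hypothetical) parameters: 2 inputs -> 2 outputs
W_demo = [1.0 2.0;
          3.0 4.0]
b_demo = [0.5, -0.5]
x_demo = [1.0, 1.0]

linear_layer = dense(W_demo, b_demo)        # σ defaults to `identity`
tanh_layer   = dense(W_demo, b_demo, tanh)  # same affine map, then elementwise tanh

y_linear = linear_layer(x_demo)  # W * x .+ b = [3.0 + 0.5, 7.0 - 0.5] = [3.5, 6.5]
y_tanh   = tanh_layer(x_demo)    # tanh applied to each entry of y_linear
```

Note that `dense` returns a closure: the parameters are captured once, and the resulting function only takes the input `x`.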

*Chaining layers:*
```julia
chain(layers...) = foldl(∘, reverse(layers))
```
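The `layers...` means that the `chain` function takes an arbitrary number of layers (functions) as input. Because `∘` applies its right-hand function first, the layers are reversed (`reverse`) before being composed (`∘`) via `foldl`, so that they are applied in the order in which they are listed.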

In [4]:
f(x) = x^2
g(x) = 2x

g (generic function with 1 method)

In [5]:
(f ∘ g)(3) # == f(g(3))

36

In [6]:
f(g(3))

36

In [7]:
foldl(∘, [f,g])(3)

36
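The role of `reverse` in `chain` can be checked with the same two toy functions. The sketch below (the names `listed_order` and `reverse_order` are only for illustration) contrasts `chain`, which applies its arguments in the order they are listed, with a plain `foldl(∘, ...)`, which applies them back to front.

```julia
chain(layers...) = foldl(∘, reverse(layers))

f(x) = x^2
g(x) = 2x

listed_order  = chain(f, g)       # g ∘ f: first f, then g
reverse_order = foldl(∘, (f, g))  # f ∘ g: first g, then f

listed_order(3)   # g(f(3)) = 2 * 9 = 18
reverse_order(3)  # f(g(3)) = 6^2  = 36
```

This is why `chain(dense(...), dense(...))` can be read top to bottom: the first layer listed is the first one applied to the input.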

Our neural network `network` is now just a piece of code, a function to be specific. We learned how to use automatic differentiation (AD) to differentiate functions/code. So let's do it! **Let's (fake) train our neural network.**

In [8]:
using Zygote # reverse mode AD

# let's take `sum` as our cost function for now

dnetwork = gradient(model -> sum(model(x)), network)[1]

Training just amounts to updating the weights `W` and biases `b` according to the gradients. Since `network` is a `ComposedFunction`, its two layers are stored in the fields `outer` (the output layer, applied last) and `inner` (the hidden layer, applied first).

In [9]:
network.outer.W # weights of the output layer

In [10]:
dnetwork.outer.W # gradients with respect to those weights

In [11]:
η = 0.01 # learning rate

network.outer.W .-= η * dnetwork.outer.W # Gradient descent!
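A real training step needs a cost that compares the output with a target. As a minimal sketch that does not rely on any AD package, assume a single linear layer `y = W * x .+ b` and a squared-error cost against a made-up target. All names and values below (`W_lin`, `b_lin`, `x_in`, `target`) are illustrative assumptions. In this simple case the gradients can be written down by hand and plugged into the same update rule.

```julia
# Assumptions for illustration: one linear layer, squared-error cost, made-up data.
W_lin  = [0.1 -0.2  0.3;
          0.4  0.5 -0.6]    # 2×3 weight matrix
b_lin  = zeros(2)           # biases
x_in   = [1.0, 2.0, 3.0]    # a single input
target = [0.5, 0.5]         # hypothetical target output

predict(W, b, x) = W * x .+ b
cost(W, b, x, t) = sum(abs2, predict(W, b, x) .- t)

step_size = 0.01            # learning rate
initial_cost = cost(W_lin, b_lin, x_in, target)

for step in 1:200
    r  = predict(W_lin, b_lin, x_in) .- target  # residual
    dW = 2 .* r * x_in'                         # ∂cost/∂W (outer product, 2×3)
    db = 2 .* r                                 # ∂cost/∂b
    W_lin .-= step_size .* dW                   # gradient descent, in place
    b_lin .-= step_size .* db
end

final_cost = cost(W_lin, b_lin, x_in, target)
```

With these numbers the cost shrinks at every step, so `final_cost` ends up far below `initial_cost`. An AD tool such as Zygote replaces the hand-written `dW` and `db` once the model becomes too complex to differentiate manually.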

## Flux - A pure-Julia machine learning library

Web page: https://fluxml.ai/, Examples: [Model zoo](https://github.com/FluxML/model-zoo/)

<img src="https://fluxml.ai/logo.png" width=300>

<img src="imgs/flux.png" width=800>

In [19]:
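using Flux

# Assumption: the Flux counterpart of the hand-written network above
# (3 inputs -> 5 hidden neurons with tanh -> 2 outputs).
m = Chain(Dense(3 => 5, tanh), Dense(5 => 2));

In [20]:
data, labels = rand(Float32, 3, 100), fill(0.5f0, 2, 100); # fake data (Float32, Flux's default parameter type)

In [21]:
loss(model, x, y) = Flux.mse(model(x), y) # mean squared error

loss (generic function with 1 method)

In [22]:
opt = Descent(0.01) # optimizer, i.e. gradient descent

Descent(0.01)

In [23]:
opt_state = Flux.setup(opt, m) # optimizer state for the model's parameters (explicit-style API of recent Flux versions)
Flux.train!(loss, m, [(data, labels)], opt_state)

In [24]:
m(rand(Float32, 3)) # trained model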