# Deep Learning with Flux.jl

Julia is a general-purpose language designed to be both easy to write and fast at numerical computing. Few other languages aim for both of those goals at once.

Flux.jl is a machine learning library written in Julia.

There is a Model Zoo: https://github.com/FluxML/model-zoo

Flux runs on TPUs: https://medium.com/syncedreview/google-cloud-tpus-now-speak-julia-cefd15a2a060

In [1]:
using Pkg
Pkg.activate(".")
Pkg.add(["Flux"])

  Updating registry at `~/.julia/registries/General`
  Updating git-repo `https://github.com/JuliaRegistries/General.git`
 Resolving package versions...
  Updating `Project.toml`
 [no changes]
  Updating `Manifest.toml`
 [no changes]


## MNIST MLP with Flux.jl

MNIST handwritten digits can be classified from their images with simple methods such as a multilayer perceptron (MLP).

In [2]:
using Flux, Flux.Data.MNIST, Statistics
using Flux: onehotbatch, onecold, crossentropy, throttle
using Base.Iterators: repeated
# using CuArrays

┌ Info: Recompiling stale cache file /Users/jpf/.julia/compiled/v1.0/Flux/QdkVy.ji for Flux [587475ba-b771-5e3f-ad9e-33799f191a9c]
└ @ Base loading.jl:1184


In [3]:
# Classify MNIST digits with a simple multi-layer-perceptron

In [4]:
imgs = MNIST.images()
# Stack images into one large batch
X = hcat(float.(reshape.(imgs, :))...) |> gpu

784×60000 Array{Float64,2}:
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  
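The stacking step can be tried on toy data without downloading MNIST. This is a minimal sketch, assuming two made-up 2×2 "images"; the names `toy_imgs` and `toy_X` are illustrative and not part of the notebook.

```julia
# Two tiny stand-in "images" (hypothetical data, not MNIST).
toy_imgs = [[1 2; 3 4], [5 6; 7 8]]

# reshape.(toy_imgs, :) flattens each image column-major into a vector,
# float. converts the integer pixels to Float64,
# and hcat(...) places each flattened image in its own column.
toy_X = hcat(float.(reshape.(toy_imgs, :))...)

println(size(toy_X))   # (pixels per image, number of images)
println(toy_X[:, 1])   # first image, flattened column by column
```

With real MNIST data the same expression gives one 784-element column per image, which is where the 784×60000 shape above comes from.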

In [5]:
labels = MNIST.labels()
# One-hot-encode the labels
Y = onehotbatch(labels, 0:9) |> gpu

10×60000 Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}:
 false   true  false  false  false  …  false  false  false  false  false
 false  false  false   true  false     false  false  false  false  false
 false  false  false  false  false     false  false  false  false  false
 false  false  false  false  false     false   true  false  false  false
 false  false   true  false  false     false  false  false  false  false
  true  false  false  false  false  …  false  false   true  false  false
 false  false  false  false  false     false  false  false   true  false
 false  false  false  false  false     false  false  false  false  false
 false  false  false  false  false      true  false  false  false   true
 false  false  false  false   true     false  false  false  false  false
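To see what the encoding does, here is a plain-Julia sketch of the same idea using a dense `Bool` matrix. The helpers `toy_onehot` and `toy_onecold` are hypothetical and only mimic the behaviour; Flux stores the result more compactly. The five labels are the ones visible in the first five columns of the output above.

```julia
# Hypothetical helpers: one row per class, one column per label.
toy_onehot(labels, classes) = [c == l for c in classes, l in labels]

# Decode by finding the row of the single `true` in each column.
toy_onecold(Y, classes) = [classes[argmax(Y[:, j])] for j in 1:size(Y, 2)]

first_labels = [5, 0, 4, 1, 9]
Y_toy = toy_onehot(first_labels, 0:9)

println(size(Y_toy))              # (number of classes, number of labels)
println(toy_onecold(Y_toy, 0:9))  # decoding recovers the original labels
```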

In [6]:
m = Chain(
  Dense(28^2, 32, relu),
  Dense(32, 10),
  softmax) |> gpu

Chain(Dense(784, 32, NNlib.relu), Dense(32, 10), NNlib.softmax)

In [7]:
m

Chain(Dense(784, 32, NNlib.relu), Dense(32, 10), NNlib.softmax)
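The chain is just a composition of two affine maps, a ReLU, and a softmax. A rough plain-Julia sketch of the forward pass follows; the names `toy_relu`, `toy_softmax`, and `toy_mlp` and the random initialisation are assumptions for illustration, not Flux's implementation.

```julia
# Plain-Julia stand-ins for the layers (hypothetical names).
toy_relu(x) = max(zero(x), x)

function toy_softmax(z::AbstractMatrix)
    # Subtract the column-wise maximum for numerical stability,
    # then normalise each column so it sums to 1.
    e = exp.(z .- maximum(z, dims = 1))
    return e ./ sum(e, dims = 1)
end

# Randomly initialised weights with the same shapes as the model above.
W1, b1 = 0.01 .* randn(32, 784), zeros(32)
W2, b2 = 0.01 .* randn(10, 32), zeros(10)

toy_mlp(x) = toy_softmax(W2 * toy_relu.(W1 * x .+ b1) .+ b2)

x = rand(784, 3)           # three fake flattened images
p = toy_mlp(x)
println(size(p))           # one column of 10 class probabilities per image
println(sum(p, dims = 1))  # each column sums to approximately 1
```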

In [8]:
params(m)

4-element Array{Any,1}:
 Flux.Tracker.TrackedReal{Float64}[0.0199117 (tracked) 0.00191755 (tracked) … -0.0423476 (tracked) -0.0371253 (tracked); 0.0415548 (tracked) 0.0331267 (tracked) … -0.00308399 (tracked) 0.0646995 (tracked); … ; 0.00261522 (tracked) -0.0612279 (tracked) … -0.0293012 (tracked) -0.0276119 (tracked); 0.0450792 (tracked) 0.0725693 (tracked) … 0.0393874 (tracked) -0.0424292 (tracked)]
 Flux.Tracker.TrackedReal{Float64}[0.0 (tracked), 0.0 (tracked), 0.0 (tracked), 0.0 (tracked), 0.0 (tracked), 0.0 (tracked), 0.0 (tracked), 0.0 (tracked), 0.0 (tracked), 0.0 (tracked)  …  0.0 (tracked), 0.0 (tracked), 0.0 (tracked), 0.0 (tracked), 0.0 (tracked), 0.0 (tracked), 0.0 (tracked), 0.0 (tracked), 0.0 (tracked), 0.0 (tracked)]                                           
 Flux.Tracker.TrackedReal{Float64}[0.314871 (tracked) 0.274123 (tracked) … 0.017549 (tracked) -0.162626 (tracked); 0.0753589 (tracked) -0.127534 (tracked) … 0.263062 (tracked) -0.163508 (tracked); … ; -0.274863 (tr

In [9]:
loss(x, y) = crossentropy(m(x), y)

accuracy(x, y) = mean(onecold(m(x)) .== onecold(y))

dataset = repeated((X, Y), 200)
evalcb = () -> @show(loss(X, Y))
opt = ADAM(params(m))

#43 (generic function with 1 method)
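For intuition, the loss and accuracy can be written out by hand. The sketch below assumes the usual definition of mean categorical cross-entropy and uses hypothetical helper names (`toy_crossentropy`, `toy_accuracy`) on a made-up two-sample batch.

```julia
using Statistics: mean

# Mean categorical cross-entropy over the columns (samples) of a batch.
# p holds predicted probabilities, y holds one-hot targets.
toy_crossentropy(p, y) = -sum(y .* log.(p)) / size(y, 2)

# Fraction of columns whose largest predicted probability
# sits in the same row as the target's hot entry.
toy_accuracy(p, y) = mean([argmax(p[:, j]) == argmax(y[:, j]) for j in 1:size(y, 2)])

p = [0.7 0.2; 0.2 0.5; 0.1 0.3]            # predictions: two samples, three classes
y = [true false; false false; false true]  # targets: class 1, then class 3

println(toy_crossentropy(p, y))
println(toy_accuracy(p, y))   # the first sample is right, the second is wrong
```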

In [10]:
Flux.train!(loss, dataset, opt, cb = throttle(evalcb, 10))

accuracy(X, Y)

loss(X, Y) = 2.312386281511893 (tracked)
loss(X, Y) = 1.5286909848020354 (tracked)
loss(X, Y) = 0.9610744448267889 (tracked)
loss(X, Y) = 0.6355869086998852 (tracked)
loss(X, Y) = 0.5087488408718913 (tracked)
loss(X, Y) = 0.42951634445743014 (tracked)
loss(X, Y) = 0.38243854703592883 (tracked)
loss(X, Y) = 0.3459428071944185 (tracked)
loss(X, Y) = 0.3214095167352304 (tracked)
loss(X, Y) = 0.2991887212550725 (tracked)
loss(X, Y) = 0.28387892812728444 (tracked)


0.9258833333333333

In this run the logged loss decreases monotonically!
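The monotonic decrease can be checked directly from the log. This small sketch uses the loss values printed by the callback, rounded to four decimals. The callback is throttled, so it only covers the logged checkpoints, not every training step.

```julia
# Loss values copied from the training log above (rounded to four decimals).
logged_losses = [2.3124, 1.5287, 0.9611, 0.6356, 0.5087, 0.4295,
                 0.3824, 0.3459, 0.3214, 0.2992, 0.2839]

# diff gives the change between consecutive checkpoints;
# every entry is negative when the loss strictly decreases.
steps = diff(logged_losses)
println(all(steps .< 0))
println(issorted(logged_losses, rev = true))
```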

In [11]:
# Test set accuracy
tX = hcat(float.(reshape.(MNIST.images(:test), :))...) |> gpu
tY = onehotbatch(MNIST.labels(:test), 0:9) |> gpu

accuracy(tX, tY)

0.9265

## MLP summary

1. About 92.6% accuracy is pretty good and would make a handwritten digit recognizer almost usable.
2. Test accuracy (92.65%) and training accuracy (92.59%) are nearly identical, so the model is not overfitting and more training data alone is unlikely to improve it much.
3. This model is really simple: two dense layers and a softmax.
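One way to put a number on "simple" is to count the trainable parameters. A quick sketch, assuming each dense layer has one weight per input/output pair plus one bias per output (`dense_params` is a hypothetical helper):

```julia
# Number of trainable parameters in a fully connected layer:
# one weight per (input, output) pair plus one bias per output.
dense_params(n_in, n_out) = n_in * n_out + n_out

layer_sizes = [(28^2, 32), (32, 10)]   # the two Dense layers of the MLP above
total = sum(dense_params(i, o) for (i, o) in layer_sizes)
println(total)
```

These are the same four arrays (two weight matrices and two bias vectors) that `params(m)` listed earlier.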

In [12]:
using Flux, Flux.Data.MNIST, Statistics
using Flux: onehotbatch, onecold, crossentropy, throttle
using Base.Iterators: repeated, partition
# using CuArrays

# Classify MNIST digits with a convolutional network

imgs = MNIST.images()

labels = onehotbatch(MNIST.labels(), 0:9)

10×60000 Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}:
 false   true  false  false  false  …  false  false  false  false  false
 false  false  false   true  false     false  false  false  false  false
 false  false  false  false  false     false  false  false  false  false
 false  false  false  false  false     false   true  false  false  false
 false  false   true  false  false     false  false  false  false  false
  true  false  false  false  false  …  false  false   true  false  false
 false  false  false  false  false     false  false  false   true  false
 false  false  false  false  false     false  false  false  false  false
 false  false  false  false  false      true  false  false  false   true
 false  false  false  false   true     false  false  false  false  false

In [13]:
dump(labels)

Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}
  height: Int64 10
  data: Array{Flux.OneHotVector}((60000,))
    1: Flux.OneHotVector
      ix: UInt32 0x00000006
      of: UInt32 0x0000000a
    2: Flux.OneHotVector
      ix: UInt32 0x00000001
      of: UInt32 0x0000000a
    3: Flux.OneHotVector
      ix: UInt32 0x00000005
      of: UInt32 0x0000000a
    4: Flux.OneHotVector
      ix: UInt32 0x00000002
      of: UInt32 0x0000000a
    5: Flux.OneHotVector
      ix: UInt32 0x0000000a
      of: UInt32 0x0000000a
    ...
    59996: Flux.OneHotVector
      ix: UInt32 0x00000009
      of: UInt32 0x0000000a
    59997: Flux.OneHotVector
      ix: UInt32 0x00000004
      of: UInt32 0x0000000a
    59998: Flux.OneHotVector
      ix: UInt32 0x00000006
      of: UInt32 0x0000000a
    59999: Flux.OneHotVector
      ix: UInt32 0x00000007
      of: UInt32 0x0000000a
    60000: Flux.OneHotVector
      ix: UInt32 0x00000009
      of: UInt32 0x0000000a


## What is a OneHotMatrix?

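OneHotMatrix takes advantage of implicit data structures. You can store a one-hot vector as just two integers, the index of the hot entry (`ix`) and the vector length (`of`), instead of a full array of Booleans.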

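```julia
import Base: *  # needed so the `*` methods below extend Base's multiplication

# A one-hot vector stored implicitly: only the hot index and the length.
struct OneHotVector <: AbstractVector{Bool}
  ix::UInt32  # index of the single `true` entry
  of::UInt32  # length of the vector
end

Base.size(xs::OneHotVector) = (Int64(xs.of),)

Base.getindex(xs::OneHotVector, i::Integer) = i == xs.ix

# Multiplying by a one-hot vector selects a single column of A.
A::AbstractMatrix * b::OneHotVector = A[:, b.ix]

# A one-hot matrix is a vector of one-hot columns plus their shared height.
struct OneHotMatrix{A<:AbstractVector{OneHotVector}} <: AbstractMatrix{Bool}
  height::Int
  data::A
end

Base.size(xs::OneHotMatrix) = (Int64(xs.height),length(xs.data))

Base.getindex(xs::OneHotMatrix, i::Integer, j::Integer) = xs.data[j][i]
Base.getindex(xs::OneHotMatrix, ::Colon, i::Integer) = xs.data[i]
Base.getindex(xs::OneHotMatrix, ::Colon, i::AbstractArray) = OneHotMatrix(xs.height, xs.data[i])

# Multiplying by a one-hot matrix selects one column of A per column of B.
A::AbstractMatrix * B::OneHotMatrix = A[:, map(x->x.ix, B.data)]
```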
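The payoff of the implicit representation is easiest to see on a small example. The sketch below defines a separate, hypothetical `ToyOneHot` type (so it does not clash with the definitions above) and checks that multiplying by it gives the same result as an ordinary matrix-vector product.

```julia
import Base: size, getindex, *

# Hypothetical stand-in for the OneHotVector idea: only the hot index
# and the length are stored, never the full Bool vector.
struct ToyOneHot <: AbstractVector{Bool}
    ix::UInt32   # position of the single `true` entry
    of::UInt32   # total length of the vector
end

size(v::ToyOneHot) = (Int(v.of),)
getindex(v::ToyOneHot, i::Integer) = i == v.ix

# Multiplying by a one-hot vector just selects a column.
*(A::AbstractMatrix, v::ToyOneHot) = A[:, v.ix]

v = ToyOneHot(3, 5)
A = [10 20 30 40 50;
     11 21 31 41 51]

println(collect(v))         # materialise the implicit vector
println(A * v)              # column 3 of A, no multiplications needed
println(A * collect(v))     # same result via an ordinary matrix-vector product
println(sizeof(ToyOneHot))  # bytes per vector, independent of its length
```

Column selection replaces a full matrix product, and storage stays at two 32-bit integers no matter how many classes there are.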


In [14]:
# Partition into batches of size 1,000
train = [(cat(float.(imgs[i])..., dims = 4), labels[:,i])
         for i in partition(1:60_000, 1000)]

train = gpu.(train);
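The batching pattern is easier to follow on a tiny example. This sketch assumes six made-up 2×2 "images" with integer labels, all hypothetical stand-ins for the MNIST data, and splits them into batches of three.

```julia
using Base.Iterators: partition

# Six fake 2×2 "images" and their labels (hypothetical stand-ins for MNIST).
toy_imgs = [fill(Float64(k), 2, 2) for k in 1:6]
toy_labels = collect(1:6)

# Split the indices into chunks of 3, then stack each chunk's images
# along a fourth dimension: (width, height, channels, batch).
toy_batches = [(cat(toy_imgs[i]..., dims = 4), toy_labels[i])
               for i in partition(1:6, 3)]

println(length(toy_batches))      # number of mini-batches
println(size(toy_batches[1][1]))  # shape of the first image batch
println(toy_batches[2][2])        # labels in the second batch
```

With the real data, `partition(1:60_000, 1000)` gives 60 batches, each holding a 28×28×1×1000 image array.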

In [15]:
# Prepare test set (first 1,000 images)
tX = cat(float.(MNIST.images(:test)[1:1000])..., dims = 4) |> gpu
tY = onehotbatch(MNIST.labels(:test)[1:1000], 0:9) |> gpu

10×1000 Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}:
 false  false  false   true  false  …  false  false   true  false  false
 false  false   true  false  false     false  false  false  false  false
 false   true  false  false  false      true   true  false  false  false
 false  false  false  false  false     false  false  false  false  false
 false  false  false  false   true     false  false  false  false  false
 false  false  false  false  false  …  false  false  false  false  false
 false  false  false  false  false     false  false  false  false  false
  true  false  false  false  false     false  false  false  false  false
 false  false  false  false  false     false  false  false   true  false
 false  false  false  false  false     false  false  false  false   true

In [16]:
m = Chain(
  Conv((3,3), 1=>16, relu),
  x -> maxpool(x, (2,2)),
  Conv((3,3), 16=>10, relu),
  x -> maxpool(x, (2,2)),
  x -> reshape(x, :, size(x, 4)),
  Dense(250, 10), softmax) |> gpu
m

Chain(Conv((3, 3), 1=>16, NNlib.relu), getfield(Main, Symbol("##7#10"))(), Conv((3, 3), 16=>10, NNlib.relu), getfield(Main, Symbol("##8#11"))(), getfield(Main, Symbol("##9#12"))(), Dense(250, 10), NNlib.softmax)
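The `250` in `Dense(250, 10)` comes from tracking the image size through the layers. A small sketch of that arithmetic, assuming unpadded stride-1 convolutions and non-overlapping 2×2 pooling that drops incomplete windows (`conv_out` and `pool_out` are hypothetical helpers):

```julia
# Output side length of a convolution with no padding and stride 1.
conv_out(n, k) = n - k + 1

# Output side length of non-overlapping pooling; incomplete windows are dropped.
pool_out(n, k) = div(n, k)

side = 28                  # MNIST images are 28×28
side = conv_out(side, 3)   # first 3×3 convolution
side = pool_out(side, 2)   # first 2×2 max-pool
side = conv_out(side, 3)   # second 3×3 convolution
side = pool_out(side, 2)   # second 2×2 max-pool

channels = 10              # output channels of the second Conv layer
features = side * side * channels
println((side, features))
```

Under these assumptions the side length goes 28, 26, 13, 11, 5, so the reshape step hands 5 × 5 × 10 features per image to the dense layer.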

In [17]:
m(train[1][1])

Tracked 10×1000 Array{Float64,2}:
 0.0999303  0.099913   0.0998538  …  0.0997857  0.0999946  0.0997655
 0.100447   0.100414   0.100206      0.100547   0.100413   0.100389 
 0.0997646  0.0998514  0.099899      0.0997027  0.0998744  0.0998905
 0.0997413  0.099776   0.0997121     0.0997578  0.0996898  0.0997585
 0.100273   0.100215   0.100175      0.100169   0.100281   0.10028  
 0.0999464  0.0999115  0.09995    …  0.0999775  0.0998559  0.0999404
 0.0999679  0.0999881  0.100085      0.0999584  0.100095   0.0999822
 0.0998387  0.0996606  0.100018      0.0997282  0.0997235  0.0997341
 0.100175   0.10019    0.100081      0.100211   0.100107   0.100319 
 0.0999167  0.100081   0.100021      0.100162   0.0999663  0.0999411
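Every entry above is close to 0.1, which is what an untrained 10-class softmax tends to produce. As a sanity check, the cross-entropy of an exactly uniform prediction is log(10), close to the first loss logged for the MLP earlier (about 2.31). A minimal sketch with made-up variable names:

```julia
n_classes = 10

# A perfectly uniform prediction and an arbitrary one-hot target.
p_uniform = fill(1 / n_classes, n_classes)
target = zeros(n_classes)
target[3] = 1.0

# Cross-entropy for a single sample.
loss_uniform = -sum(target .* log.(p_uniform))

println(loss_uniform)
println(log(n_classes))   # the two values agree
```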

In [18]:
loss(x, y) = crossentropy(m(x), y)

accuracy(x, y) = mean(onecold(m(x)) .== onecold(y))

evalcb = throttle(() -> @show(accuracy(tX, tY)), 10)
opt = AMSGrad(params(m))

#43 (generic function with 1 method)

In [19]:
Flux.@epochs 5 begin
    Flux.train!(loss, train, opt, cb = evalcb)
end


┌ Info: Epoch 1
└ @ Main /Users/jpf/.julia/packages/Flux/rcN9D/src/optimise/train.jl:93


accuracy(tX, tY) = 0.164
accuracy(tX, tY) = 0.435
accuracy(tX, tY) = 0.799
accuracy(tX, tY) = 0.851
accuracy(tX, tY) = 0.888


┌ Info: Epoch 2
└ @ Main /Users/jpf/.julia/packages/Flux/rcN9D/src/optimise/train.jl:93


accuracy(tX, tY) = 0.903
accuracy(tX, tY) = 0.914
accuracy(tX, tY) = 0.92
accuracy(tX, tY) = 0.934
accuracy(tX, tY) = 0.941


┌ Info: Epoch 3
└ @ Main /Users/jpf/.julia/packages/Flux/rcN9D/src/optimise/train.jl:93


accuracy(tX, tY) = 0.94
accuracy(tX, tY) = 0.946
accuracy(tX, tY) = 0.95
accuracy(tX, tY) = 0.952


┌ Info: Epoch 4
└ @ Main /Users/jpf/.julia/packages/Flux/rcN9D/src/optimise/train.jl:93


accuracy(tX, tY) = 0.955
accuracy(tX, tY) = 0.951
accuracy(tX, tY) = 0.955
accuracy(tX, tY) = 0.953
accuracy(tX, tY) = 0.958


┌ Info: Epoch 5
└ @ Main /Users/jpf/.julia/packages/Flux/rcN9D/src/optimise/train.jl:93


accuracy(tX, tY) = 0.959
accuracy(tX, tY) = 0.954
accuracy(tX, tY) = 0.956
accuracy(tX, tY) = 0.954
accuracy(tX, tY) = 0.956


## Go Deeper

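1. Training time is much longer than for the MLP
2. Accuracy is better: about 95.6% on the first 1,000 test images, versus 92.65% on the full test set for the MLP
3. The code is almost as simple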

### Julia is the best language for ML
1. You can write complex code in a high-level language
2. No need for a C++ library under the hood
3. New approaches to auto-diff / backprop are being pioneered
4. The same code can run on CPU/GPU
5. Runs on TPUs

In [19]:
using Images

ArgumentError: ArgumentError: Package Images not found in current path:
- Run `Pkg.add("Images")` to install the Images package.
