![JuliaLogo](https://julialang.org/assets/infra/logo.svg)
# Welcome to the deep learning tutorial for Julia

Julia shines when it comes to letting packages interact seamlessly with one another. A prime example is Julia's deep learning ecosystem.


**Question to you:** What are the key ingredients for deep learning?

## Flux.jl

In Julia there are two actively maintained deep learning packages: [Knet.jl](https://github.com/denizyuret/Knet.jl) and [Flux.jl](https://github.com/FluxML/Flux.jl). Both are written in 100% Julia and hence interact seamlessly with everything else in Julia.

Flux.jl is considered the easier package for beginners, as its API is closer to Keras, while Knet.jl's API is more low-level.
Even more importantly, Flux.jl focuses on interoperability with other Julia packages, enabling neural differential equations and other advances in [scientific machine learning](https://sciml.ai/).

The most up-to-date blog post I could find comparing Knet.jl, Flux.jl and TensorFlow [is this one](https://estadistika.github.io//julia/python/packages/knet/flux/tensorflow/machine-learning/deep-learning/2019/06/20/Deep-Learning-Exploring-High-Level-APIs-of-Knet.jl-and-Flux.jl-in-comparison-to-Tensorflow-Keras.html). It is a nice and simple comparison with real code, and I can recommend it.

```julia
using Flux  # takes about a minute the first time it is run
```

Let me start by quoting the Flux.jl website:

**Flux: The Julia Machine Learning Library**

Flux is a library for machine learning. It comes "batteries-included" with many useful tools built in, but also lets you use the full power of the Julia language where you need it. We follow a few key principles:

* **Doing the obvious thing.** Flux has relatively few explicit APIs for features like regularisation or embeddings. Instead, writing down the mathematical form will work – and be fast.
* **You could have written Flux.** All of it, from LSTMs to GPU kernels, is straightforward Julia code. When in doubt, it’s well worth looking at the source. If you need something different, you can easily roll your own.
* **Play nicely with others.** Flux works well with Julia libraries from data frames and images to differential equation solvers, so you can easily build complex data processing pipelines that integrate Flux models.
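To make the first principle concrete, here is a minimal sketch of L2 regularisation written down directly as math instead of through a dedicated API. The names `W`, `λ` and `loss_reg` and the value of `λ` are made up for this illustration; the gradient Flux computes is compared against the hand-derived formula.

```julia
using Flux  # provides `gradient` (re-exported from Zygote.jl)

W = randn(2, 5)          # hypothetical weight matrix
x, y = rand(5), rand(2)  # dummy input and target
λ = 0.01                 # regularisation strength (arbitrary choice)

# Squared error plus an L2 penalty, written exactly as the formula reads.
loss_reg(W) = sum(abs2, y .- W * x) + λ * sum(abs2, W)

gW = gradient(loss_reg, W)[1]

# Hand-derived gradient: -2 (y - W x) xᵀ + 2λ W
gW ≈ -2 .* (y .- W * x) * x' .+ 2λ .* W   # true
```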

## Flux.jl Basics
Let's walk through the introductory example from the [Flux.jl documentation](https://fluxml.ai/Flux.jl/stable/models/basics/), which is very good and highly educational.

### Taking Gradients
Flux's core feature is taking gradients of Julia code. The gradient function takes another Julia function f and a set of arguments, and returns the gradient with respect to each argument.

```julia
f(x) = 3x^2 + 2x + 1;
```

```julia
# After `using Flux`, `f'` gives the derivative of `f` via automatic differentiation.
# (This is actually provided by Zygote.jl, part of the Flux ecosystem.)
# df/dx = 6x + 2
f'(2)
```

```julia
# d²f/dx² = 6
f''(2)
```

```julia
# d³f/dx³ = 0
f'''(2)  # takes quite long to compile, but works
```
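As a sanity check that does not rely on Flux at all, you can compare these values with central finite-difference approximations. The helpers `fd1` and `fd2` and their step sizes `h` are illustrative choices, not part of Flux.

```julia
f(x) = 3x^2 + 2x + 1

# Central-difference approximations of the first and second derivative.
fd1(g, x; h = 1e-5) = (g(x + h) - g(x - h)) / (2h)
fd2(g, x; h = 1e-4) = (g(x + h) - 2g(x) + g(x - h)) / h^2

fd1(f, 2.0)  # ≈ 14.0, matching df/dx = 6x + 2 at x = 2
fd2(f, 2.0)  # ≈ 6.0, matching d²f/dx² = 6
```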

`f'` is convenient syntax for the underlying core function `gradient`:

```julia
df(x) = gradient(f, x)[1]; # df/dx = 6x + 2
df(2)
```

```julia
d2f(x) = gradient(df, x)[1]; # d²f/dx² = 6
d2f(2)
```

You may ask how far this goes. Can everything be automatically differentiated? Almost everything: arbitrary control flow, recursion, loops and closures all work. Mutable data structures are only partially supported; in particular, in-place mutation of arrays (e.g. `v[i] = ...`) is not. See https://fluxml.ai/Zygote.jl/latest/#Taking-Gradients-1 for details.
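A small sketch of differentiating through a loop and a branch. `loop_pow` and `g` are hypothetical helpers for this illustration; note that the loop only reassigns a scalar (`r *= x`) rather than mutating an array, which keeps it within what Zygote supports.

```julia
using Flux  # `gradient` comes from Zygote.jl

# x^n computed with a loop instead of `^`.
function loop_pow(x, n)
    r = one(x)
    for _ in 1:n
        r *= x
    end
    return r
end

# A branch on the input value, differentiated like any other code.
g(x) = x > 0 ? loop_pow(x, 3) : -x

gradient(g, 2.0)[1]   # ≈ 12.0, i.e. 3x^2 at x = 2
gradient(g, -1.0)[1]  # ≈ -1.0, from the `-x` branch
```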

When a function has many parameters, we can get gradients of each one at the same time:

```julia
f(x, y) = sum((x .- y).^2);

@show gradient(f, 2, 2)
@show gradient(f, 1, 0)
gradient(f, [2, 1], [2, 0])
```
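For this `f`, the partial derivatives are ∂f/∂x = 2(x - y) and ∂f/∂y = -2(x - y), so the two gradients always have opposite signs. A short sketch comparing `gradient` against these formulas; `dfdx` and `dfdy` are illustrative names, and floating-point inputs are used to keep the comparison simple.

```julia
using Flux

f(x, y) = sum((x .- y) .^ 2)

# Hand-derived partial derivatives of f.
dfdx(x, y) = 2 .* (x .- y)
dfdy(x, y) = -2 .* (x .- y)

x, y = [2.0, 1.0], [2.0, 0.0]
gx, gy = gradient(f, x, y)

gx ≈ dfdx(x, y)  # true
gy ≈ dfdy(x, y)  # true
```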

But machine learning models can have hundreds of parameters! To handle this, Flux lets you work with collections of parameters, via params. You can get the gradient of all parameters used in a program without explicitly passing them in.

```julia
x = [2, 1];
y = [2, 0];
gs = gradient(params(x, y)) do
    f(x, y)
end

@show gs[x]
@show gs[y];
```

Here, gradient takes a zero-argument function; no arguments are necessary because the params tell it what to differentiate.
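For comparison, the same gradients can be obtained without `params` by bundling the arrays into a `NamedTuple` and differentiating with respect to it explicitly; the returned gradient then has the same field names. More recent Flux documentation favours this explicit style over `params`, so check the docs for your installed version. `ps`, `sqdist` and `grads` are illustrative names.

```julia
using Flux

# Parameters bundled explicitly instead of collected with `params`.
ps = (x = [2.0, 1.0], y = [2.0, 0.0])
sqdist(p) = sum((p.x .- p.y) .^ 2)

grads = gradient(sqdist, ps)[1]  # a NamedTuple with fields `x` and `y`
grads.x  # same values as gs[x] above
grads.y  # same values as gs[y] above
```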

Collecting parameters with `params` will come in really handy when dealing with big, complicated models. For now, though, let's start with something simple.

### Simple Models
Consider a simple linear regression, which tries to predict an output array y from an input x.

```julia
W = rand(2, 5)
b = rand(2)

predict(x) = W*x .+ b

function loss(x, y)
  ŷ = predict(x)
  sum((y .- ŷ).^2)
end

x, y = rand(5), rand(2) # Dummy data
loss(x, y) # ~ 3
```

To improve the prediction we can take the gradients of the loss with respect to W and b and perform gradient descent.

```julia
gs = gradient(() -> loss(x, y), params(W, b))
```

Now that we have gradients, we can pull them out and update W to train the model.

```julia
W̄ = gs[W]
W .-= 0.1 .* W̄
loss(x, y) # ~ 2.5
```

The loss has decreased a little, meaning that our prediction `predict(x)` is closer to the target `y`. If we have some data we can already try [training the model](https://fluxml.ai/Flux.jl/stable/training/training/).
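Repeating that single update in a loop is all that basic training is. A minimal sketch, assuming the explicit style (differentiating with respect to `W` and `b` directly rather than via `params`); `fit_linear`, the learning rate `η = 0.1` and the step count are illustrative choices, not Flux defaults.

```julia
using Flux  # provides `gradient`

# Fit y ≈ W*x .+ b for a single dummy sample by plain gradient descent.
function fit_linear(x, y; η = 0.1, steps = 100)
    W = rand(length(y), length(x))
    b = rand(length(y))
    sqerr(W, b) = sum((y .- (W * x .+ b)) .^ 2)
    for _ in 1:steps
        gW, gb = gradient(sqerr, W, b)  # gradients w.r.t. both arguments
        W .-= η .* gW                   # in-place update, as in the cell above
        b .-= η .* gb
    end
    return W, b, sqerr(W, b)
end

W_fit, b_fit, final_loss = fit_linear(rand(5), rand(2))
final_loss  # close to zero after 100 steps
```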

All deep learning in Flux, however complex, is a simple generalisation of this example. Of course, models can look very different – they might have millions of parameters or complex control flow. Let's see how Flux handles more complex models.

### Building Layers
It's common to create more complex models than the linear regression above. For example, we might want to have two linear layers with a nonlinearity like sigmoid (σ) in between them. In the above style we could write this as:

```julia
using Flux

W1 = rand(3, 5)
b1 = rand(3)
layer1(x) = W1 * x .+ b1

W2 = rand(2, 3)
b2 = rand(2)
layer2(x) = W2 * x .+ b2

model(x) = layer2(σ.(layer1(x)))  # TODO: run `?σ` to see what this is, and `methods(σ)` to see where it comes from

model(rand(5)) # => 2-element vector
```

This works but is fairly unwieldy, with a lot of repetition – especially as we add more layers. One way to factor this out is to create a function that returns linear layers.

```julia
function linear(in, out)
  W = randn(out, in)
  b = randn(out)
  x -> W * x .+ b
end

linear1 = linear(5, 3)  # captured variables are accessible as fields, e.g. linear1.W
linear2 = linear(3, 2)

model(x) = linear2(σ.(linear1(x)))

model(rand(5)) # => 2-element vector
```
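Accessing `linear1.W` works because Julia stores the variables a closure captures as fields of the closure object. This is an implementation detail of how closures are lowered, so treat it as a debugging aid rather than an API. The sketch below just inspects the captured arrays; `lin` is an illustrative name.

```julia
function linear(in, out)
  W = randn(out, in)
  b = randn(out)
  x -> W * x .+ b
end

lin = linear(5, 3)
size(lin.W)  # (3, 5)
size(lin.b)  # (3,)
lin(ones(5)) ≈ lin.W * ones(5) .+ lin.b  # true
```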

Another (equivalent) way is to create a struct that explicitly represents the affine layer.

```julia
struct Affine
  W
  b
end

Affine(in::Integer, out::Integer) =
  Affine(randn(out, in), randn(out))

# Overload call, so the object can be used as a function
(m::Affine)(x) = m.W * x .+ m.b

a = Affine(10, 5)

a(rand(10)) # => 5-element vector
```
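One refinement worth knowing: fields declared without a type, as in `Affine`, have type `Any`, which prevents the compiler from inferring the types involved in `m.W * x`. A common Julia idiom is to make the struct parametric instead. `TypedAffine` is a hypothetical name for this sketch; it behaves like `Affine`.

```julia
struct TypedAffine{M<:AbstractMatrix, V<:AbstractVector}
  W::M
  b::V
end

TypedAffine(in::Integer, out::Integer) =
  TypedAffine(randn(out, in), randn(out))

# Same call overload as before; W and b now have concrete types.
(m::TypedAffine)(x) = m.W * x .+ m.b

ta = TypedAffine(10, 5)
ta(rand(10))  # => 5-element vector
```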

Congratulations! You just built the Dense layer that comes with Flux. Flux has many interesting layers available, but they're all things you could have built yourself very easily.

(There is one small difference with `Dense`: for convenience it also takes an activation function, like `Dense(10, 5, σ)`.)
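You can check this directly by comparing a `Dense` layer with the affine formula. This sketch assumes a Flux version whose `Dense` fields are named `weight` and `bias` (older releases used `W` and `b`) and uses the same `Dense(in, out)` form as the rest of this notebook; newer versions also accept `Dense(10 => 5)`. `d` and `x10` are illustrative names.

```julia
using Flux

d = Dense(10, 5)         # no activation given, so it defaults to `identity`
x10 = rand(Float32, 10)  # Float32 input to match Flux's default weight type

d(x10) ≈ d.weight * x10 .+ d.bias   # true
```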

### Stacking It Up
It's pretty common to write models that look something like:
```julia
layer1 = Dense(10, 5, σ)
# ...
model(x) = layer3(layer2(layer1(x)))
```
For long chains, it might be a bit more intuitive to have a list of layers, like this:

```julia
layers = [Dense(10, 5, σ), Dense(5, 2), softmax]
model(x) = foldl((x, m) -> m(x), layers, init = x)

model(rand(10)) # => 2-element vector
```

Handily, Flux provides this as `Chain`:

```julia
model2 = Chain(
  Dense(10, 5, σ),
  Dense(5, 2),
  softmax)

model2(rand(10)) # => 2-element vector
```

This quickly starts to look like a high-level deep learning library; yet you can see how it falls out of simple abstractions, and we lose none of the power of Julia code.

A nice property of this approach is that because "models" are just functions (possibly with trainable parameters), you can also see this as simple function composition.

```julia
m = Dense(5, 2) ∘ Dense(10, 5, σ)

m(rand(10))
```

Likewise, Chain will happily work with any Julia function.

```julia
m = Chain(x -> x^2, x -> x+1)

m(5) # => 26
```
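One detail to keep in mind when switching between these styles: `Chain` applies its layers left to right, while `∘` composes right to left, since `(f ∘ g)(x) == f(g(x))`. That is why `Dense(10, 5, σ)` comes first in the `Chain` but last in the `∘` version. A small check with plain functions; `sq` and `inc` are illustrative names.

```julia
using Flux

sq(x) = x^2
inc(x) = x + 1

Chain(sq, inc)(5)  # 26: first square, then add one
(inc ∘ sq)(5)      # 26: ∘ applies the right-hand function first
(sq ∘ inc)(5)      # 36: reversing the order changes the result

# The foldl version from earlier behaves like Chain.
foldl((x, m) -> m(x), [sq, inc], init = 5)  # 26
```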

### To be continued...
For further details you can continue at the official [Flux documentation](https://fluxml.ai/Flux.jl/stable/models/basics/#Layer-helpers-1).

Flux.jl also comes with a [model zoo](https://github.com/FluxML/model-zoo) where you can find many ready-made examples.

## Scientific Machine Learning

Finally, I want to show you some cool new things you can do with Flux.jl that are hard to do with other deep learning frameworks.

### Differentiable Control Problems, Physical Systems, Chemistry, Biology ... include arbitrary domain knowledge

[This blog post about differentiable control problems in Julia](https://fluxml.ai/2019/03/05/dp-vs-rl.html) really blew me away. It demonstrates that with Flux.jl you can learn through arbitrary dynamical systems.

One such dynamical system is a trebuchet:

![trebuchet-visualization](https://fluxml.ai/assets/2019-03-05-dp-vs-rl/trebuchet-basic.gif)

And you can incorporate it directly into your machine learning model:
![architecture-diagram-how-to-learn-through-dynamical-system](https://fluxml.ai/assets/2019-03-05-dp-vs-rl/trebuchet-flow.png)

The same logic applies to other fields like physics, where differential equations describe your system, and likewise to chemistry or biology. In fact it is not bound to anything specific: any Julia code will do, and hence **you can include truly arbitrary domain-specific knowledge in your machine learning model**.
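As a toy version of this idea, here is a sketch that differentiates straight through a hand-written explicit Euler solver for exponential decay, dx/dt = -k x, and fits the physical rate `k` so that the state at t = 1 equals 0.5. The model, the names `simulate`, `objective` and `fit_rate`, the step size and the learning rate are all assumptions made for illustration; for real problems the SciML ecosystem provides proper solvers.

```julia
using Flux  # `gradient` differentiates through the solver loop below

# Explicit Euler integration of dx/dt = -k * x from t = 0 to t = steps * dt.
function simulate(k; x0 = 1.0, dt = 0.01, steps = 100)
    x = x0
    for _ in 1:steps
        x += dt * (-k * x)
    end
    return x
end

target = 0.5
objective(k) = (simulate(k) - target)^2

# Plain gradient descent on the physical parameter k.
function fit_rate(k; η = 0.5, iters = 500)
    for _ in 1:iters
        k -= η * gradient(objective, k)[1]
    end
    return k
end

k_fit = fit_rate(0.1)
simulate(k_fit)  # ≈ 0.5
k_fit            # close to log(2), the answer for the exact ODE
```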

### SciML

This idea is so far-reaching that a whole new community has formed around it, calling itself scientific machine learning, or SciML for short: https://sciml.ai/.

Here is a very good summary of their goals and current reach: https://sciml.ai/2020/03/29/SciML.html.

# Thanks for participating ;-)

If you have any questions, feel free to reach me at s.sahm@reply.de

If you are curious to learn more or want to do a Julia project, just tell me. I am always glad to meet new enthusiasts.

Believe me, it's the future of applied math, including data science.
![fans](https://images.unsplash.com/photo-1490078615078-3e40c20db36c?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=2125&q=80)
