# Interactive computing with Julia on Piz Daint

Julia is already over a decade old, and its first stable version, 1.0.0, was released in August 2018. We are currently at version 1.6.0. Judging by the release notes of this release, much of the work now goes into improving the user experience, as if most major features are more or less complete.

![old_commit.png](old_commit.png)

The goal of these notebooks is to show you how to use Julia on Piz Daint, and hopefully to leave you with the impression that Julia is a language in which you can prototype quickly without sacrificing performance.

Note: I will not really go into distributed computing. For that I'd refer you to Samuel Omlin's work on https://github.com/eth-cscs/ImplicitGlobalGrid.jl, which he also presented at JuliaCon (https://www.youtube.com/watch?v=vPsfZUqI4_0). The video has zero downvotes, so clearly it must be good!

What I will show are Julia's language features and CPU and GPU programming, all interactively in notebooks, so you can follow along!

## First steps

Some basic Julia commands to get started:

In [1]:
1 + 1

2

In [2]:
1 + 1 == 2

true

In [3]:
typeof(1)

Int64

In [4]:
0.1 + 0.2 == 0.3

false

In [5]:
0.1 + 0.2 ≈ 0.3

true

In [6]:
typeof(0.1)

Float64
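Because `0.1`, `0.2` and `0.3` are not exactly representable in binary floating point, exact comparison fails. `≈` (typed as `\approx` followed by TAB) is infix syntax for `isapprox`. A short sketch of what it checks by default, and why you need an absolute tolerance near zero:

```julia
# `≈` is the infix form of `isapprox`
@assert (0.1 + 0.2 ≈ 0.3) == isapprox(0.1 + 0.2, 0.3)

# 0.1 + 0.2 evaluates to the Float64 that prints as 0.30000000000000004
@assert 0.1 + 0.2 == 0.30000000000000004

# Default: relative tolerance sqrt(eps(Float64)) ≈ 1.5e-8, no absolute tolerance
@assert isapprox(1.0, 1.0 + 1e-9)
@assert !isapprox(1.0, 1.0 + 1e-7)

# Near zero a relative tolerance is useless, so pass `atol` explicitly
@assert !isapprox(0.0, 1e-20)
@assert isapprox(0.0, 1e-20; atol = 1e-12)
```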

## Using standard libraries

Julia comes with a few standard libraries, which can be brought into the current scope with `using`:

In [7]:
using LinearAlgebra

In [8]:
A = Diagonal(1:10)

10×10 Diagonal{Int64, UnitRange{Int64}}:
 1  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅   ⋅
 ⋅  2  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅   ⋅
 ⋅  ⋅  3  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅   ⋅
 ⋅  ⋅  ⋅  4  ⋅  ⋅  ⋅  ⋅  ⋅   ⋅
 ⋅  ⋅  ⋅  ⋅  5  ⋅  ⋅  ⋅  ⋅   ⋅
 ⋅  ⋅  ⋅  ⋅  ⋅  6  ⋅  ⋅  ⋅   ⋅
 ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  7  ⋅  ⋅   ⋅
 ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  8  ⋅   ⋅
 ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  9   ⋅
 ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  10

In [9]:
eigvals(A)

1:10
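`eigvals(A)` can return the range `1:10` because `Diagonal` is a type of its own, so `eigvals` and friends dispatch to specialized methods that never build a dense matrix. A small sketch of a few of these specializations (the exact set of specialized methods may differ between Julia versions):

```julia
using LinearAlgebra

D = Diagonal([2.0, 4.0, 8.0])

@assert eigvals(D) == [2.0, 4.0, 8.0]    # just the diagonal, no dense eigensolver
@assert D * D isa Diagonal               # products of diagonals stay diagonal
@assert D \ [2.0, 4.0, 8.0] == ones(3)   # solving is elementwise division
@assert inv(D) isa Diagonal
@assert inv(D).diag == [0.5, 0.25, 0.125]
```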

In [10]:
B = rand(10, 10) + 5I

10×10 Matrix{Float64}:
 5.17445    0.0565878  0.768612  0.49028    …  0.588398  0.227059   0.112751
 0.361142   5.90662    0.683257  0.711889      0.837828  0.919946   0.370709
 0.337357   0.91847    5.24899   0.0314777     0.238267  0.0578592  0.716263
 0.697695   0.320546   0.722656  5.71448       0.218815  0.659964   0.760069
 0.0451758  0.293261   0.557176  0.652809      0.582713  0.512084   0.176921
 0.488666   0.0282614  0.672091  0.0989394  …  0.403714  0.0956991  0.641458
 0.454971   0.936158   0.215611  0.776291      0.93223   0.0805882  0.819934
 0.783897   0.637117   0.174138  0.219187      5.13362   0.540986   0.314249
 0.411182   0.588639   0.440068  0.845902      0.881879  5.77491    0.970982
 0.453765   0.247235   0.981043  0.729701      0.621563  0.873831   5.62427

In [11]:
typeof(B)

Matrix{Float64} (alias for Array{Float64, 2})

In [12]:
eigvals(B)

10-element Vector{ComplexF64}:
  4.195679624249867 + 0.0im
  4.290202866284265 - 0.17358033694116112im
  4.290202866284265 + 0.17358033694116112im
  4.910378142016265 - 0.22643088322870217im
  4.910378142016265 + 0.22643088322870217im
  5.413238859626805 - 0.26049958210751645im
  5.413238859626805 + 0.26049958210751645im
  5.840002796174662 - 0.16716688735074997im
  5.840002796174662 + 0.16716688735074997im
 10.164862367168405 + 0.0im

In [13]:
# the expected value of rand(10, 10) + 5I is a matrix filled with .5 on the off-diagonals and 5.5 on the diagonals:
eigvals(fill(0.5, 10, 10) + 5I)

10-element Vector{Float64}:
  5.0
  5.0
  5.0
  5.0
  5.0
  5.0
  5.0
  5.0
  5.0
 10.0
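Why exactly these values: the expected value of `B` is a rank-one matrix plus a shift. Writing $\mathbf{1}$ for the all-ones vector of length $n$:

$$\mathbb{E}[B] = \tfrac{1}{2}\mathbf{1}\mathbf{1}^T + 5I, \qquad \tfrac{1}{2}\mathbf{1}\mathbf{1}^T\,\mathbf{1} = \tfrac{n}{2}\,\mathbf{1}, \qquad \tfrac{1}{2}\mathbf{1}\mathbf{1}^T\,v = 0 \;\text{ for } v \perp \mathbf{1}$$

So for $n = 10$ the eigenvalues are $\tfrac{n}{2} + 5 = 10$ (once) and $0 + 5 = 5$ ($n - 1 = 9$ times), matching the output; the eigenvalues of the random `B` scatter around these values.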

In [14]:
using SparseArrays

In [15]:
A_sparse = sprand(1000, 1000, 1 / 1000) + 10I

1000×1000 SparseMatrixCSC{Float64, Int64} with 2005 stored entries:
⠑⢄⢠⢀⡀⠀⠀⠕⡐⢠⠂⡲⠠⠀⠀⢀⠄⠀⠀⡠⠀⠐⠑⠀⠀⠀⠐⠨⠀⠒⠨⠀⠀⠄⠈⢄⢒⠐⠠⠠
⠃⠠⠝⢄⠤⡂⠈⡀⢔⠈⠐⢀⠲⠌⡠⠀⠀⠁⠀⠀⡨⠠⠀⠀⠀⡈⠀⠀⠀⠀⠥⠌⠆⠐⠁⠔⢘⠔⠡⠀
⠀⠄⡀⡺⠙⢄⠁⢀⠀⡀⡄⠂⢀⠁⢀⠀⢀⠠⠀⠄⠄⠀⢀⠈⢔⢀⢀⣄⡀⢠⠄⠀⡒⡁⢀⠠⠀⠀⠂⠄
⠀⠀⠃⠂⠐⠀⠓⢬⡜⢄⠀⠂⠀⢠⡀⠏⠂⠠⢀⠐⠂⠠⠀⡀⠠⠀⡂⡈⠘⠀⠀⠠⠂⠀⠡⠀⠈⡀⠠⡀
⡠⠠⡄⣈⡀⢄⠀⡣⠑⢼⠀⠀⠀⠀⠀⠀⠄⠆⡄⠀⡊⠀⠤⠐⢀⠁⠀⠡⢈⠀⠥⢀⠀⠈⠄⣂⠊⠀⡄⠄
⡀⠉⠁⠂⡀⠁⡀⡀⢀⠀⢗⢬⠀⠀⠸⢁⠀⠁⠓⢀⣜⠐⠐⠀⠂⠁⠀⠑⠀⢍⠢⠀⡁⠐⠠⠀⠄⠀⠤⠀
⠀⠡⢨⠄⡈⠂⢈⠊⠂⠠⢉⠄⡑⢔⠀⠁⠀⠈⠀⡀⠁⢓⠜⢒⠀⢢⠐⠈⠀⡀⠐⠡⠀⣢⢠⠠⠀⡀⢔⠀
⠐⠉⠐⠠⢁⠀⠨⠀⠀⠄⡦⠒⠀⠀⡑⣜⠄⣄⠡⠀⡀⡋⠚⠀⠄⠂⠀⠄⠂⠂⡀⠀⢀⠄⠂⢆⣘⠂⠄⠀
⠀⠑⠀⡁⠉⠠⠀⡂⠀⠈⠂⢄⠪⢀⢀⠐⠳⢌⠐⠠⡰⠁⠀⠀⠂⠄⠐⠐⠀⠀⠂⢀⠩⠠⢊⢀⢀⠀⠀⠈
⠀⠀⠀⡀⠀⠗⢐⠁⠡⠀⠀⢈⠊⠀⠐⠄⢀⠈⢕⢌⠀⠀⠄⡄⡂⡃⠈⠡⠀⠈⠀⠀⢀⠂⡀⢐⠐⠚⠐⠩
⠄⠀⡈⠀⠐⠀⠀⠄⠄⠀⠒⠀⢀⠈⠂⡀⣠⣠⡀⠍⡑⢄⠘⠀⠀⢑⠄⡀⠈⠁⠑⡀⢂⡈⠄⠀⡠⡀⠬⠀
⠁⢀⠂⠀⠐⠂⠀⠤⣈⢈⠀⠀⢐⡀⡰⢠⠀⡄⠀⢀⠀⠀⠕⢄⠑⢜⢂⠊⠢⠌⡀⠀⠡⠐⢈⢐⠀⠂⠠⢒
⠠⠂⠈⠠⠀⡀⠂⡁⠠⠀⠊⢀⠈⠅⠈⠀⠀⠠⠂⠤⠄⠁⢀⠠⠑⢄⠄⠀⢂⠐⡁⠄⠀⠖⠀⠂⡦⠈⢕⢀
⠠⠆⡠⠀⣡⠀⢀⠀⠐⠁⡐⠁⠀⠰⠠⠈⠀⠱⢃⢄⢑⢁⠠⠀⢠⠀⠑⢄⠀⠀⠀⢆⠨⢂⠀⠆⠂⡀⠰⠀
⠀⠤⠄⢐⢄⡢⠈⠀⠄⠐⠀⢄⢠⠀⠀⠂⠱⠀⠔⠀⠄⡂⠀⠈⡀⠀⠂⠉⠑⢔⠀⠁⣄⡈⠀⠁⠰⠈⠤⠄
⠈⠀⠀⠀⠒⠤⠀⠢⢠⠀⠀⠂⠀⠨⠐⡐⠄⠀⡊⠨⠀⠸⠈⠐⠂⠠⠀⠀⠈⠀⠑⢜⡀⠀⠀⡀⠀⠌⡤⠁
⠁⠠⠀⠢⢀⡀⠄⠰⠤⠀⠀⡐⠘⡀⠔⢂⠀⢈⠂⡀⠀⠰⠐⠁⠠⠈⠀⠘⠁⠄⠁⠀⠑⢆⠀⢰⠀⠈⠈⡒
⠀⠀⠠⠄⠰⠀⠂⣀⠂⠰⠂⡀⠀⠤⠐⠀⠀⠁⡀⠀⠲⡈⠐⠂⠀⠀⠀⢀⡀⠀⠀⠄⠀⠠⠵⢴⠀⢄⠁⠀
⠈⠀⢀⠠⠈⠑⠁⠀⠂⠀⠄⠰⢀⡐⠂⢠⠀⠐⡡⠠⠓⢈⢐⠁⠅⡎⠠⠉⡀⠖⠄⠐⠩⠀⠢⢀⠕⢤⠑⠀
⡀⠀⠐⠀⡀⠣⢠⠀⠈⠑⠂⠈⠈⠔⠀⢐⠀⢒⠒⠌⠄⡂⢀⢀⠠⠂⠀⠈⠁⠄⠁⠁⠰⠄⠀⠀⠰⠆⠕⢌

In [16]:
A_sparse * rand(1000)

1000-element Vector{Float64}:
 2.9385247542078083
 0.48594363720573464
 6.138050062254244
 2.4083624086583955
 1.4876286998296215
 6.608012725204055
 6.86949316394347
 4.1365746304881545
 9.766379992589348
 4.917861260630062
 3.371593275201896
 7.08714706607816
 2.8974144436299327
 ⋮
 6.060395090991673
 6.191962478117563
 4.870910160375369
 7.657550441077634
 8.536259803581736
 0.5546443853595124
 7.121487986038009
 6.4077117529451355
 4.8124286738176165
 4.539160937266617
 4.680368281983247
 6.125577997748812
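`sprand(m, n, p)` makes each entry nonzero independently with probability `p`, so `sprand(1000, 1000, 1 / 1000)` stores about 1000 random entries, and adding `10I` stores the 1000 diagonal entries as well, consistent with the 2005 stored entries. If you need a specific sparsity pattern instead, a sketch of assembling a sparse matrix from coordinate triplets (the small matrix here is just an example):

```julia
using LinearAlgebra, SparseArrays

# Coordinate (COO) triplets: entry (rows[k], cols[k]) gets value vals[k]
rows = [1, 2, 3]
cols = [2, 3, 1]
vals = [1.0, 2.0, 3.0]

S = sparse(rows, cols, vals, 3, 3)   # 3×3 SparseMatrixCSC with 3 stored entries
S += 10I                             # adds (and stores) the diagonal

@assert nnz(S) == 6
@assert S * ones(3) == [11.0, 12.0, 13.0]
@assert Matrix(S) == [10.0 1.0 0.0; 0.0 10.0 2.0; 3.0 0.0 10.0]
```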

But not everything is implemented in Julia's standard libraries! Sometimes you need external packages too:

In [17]:
eigvals(A_sparse)

LoadError: MethodError: no method matching eigvals!(::SparseMatrixCSC{Float64, Int64})
Closest candidates are:
  eigvals!(::SymTridiagonal{var"#s832", V} where {var"#s832"<:Union{Float32, Float64}, V<:AbstractVector{var"#s832"}}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/tridiag.jl:293
  eigvals!(::SymTridiagonal{var"#s832", V} where {var"#s832"<:Union{Float32, Float64}, V<:AbstractVector{var"#s832"}}, ::UnitRange) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/tridiag.jl:296
  eigvals!(::SymTridiagonal{var"#s832", V} where {var"#s832"<:Union{Float32, Float64}, V<:AbstractVector{var"#s832"}}, ::Real, ::Real) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/tridiag.jl:301
  ...
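As the error says, there is no `eigvals!` method for `SparseMatrixCSC`. For a matrix that is small enough, one workaround is to convert to a dense `Matrix` first. This costs $O(n^2)$ memory and $O(n^3)$ time, which is exactly why iterative methods that compute only a few eigenvalues matter for large sparse matrices. A sketch with a smaller random matrix:

```julia
using LinearAlgebra, SparseArrays

S = sprand(200, 200, 0.01) + 10I

# Densify first: fine for small n, but O(n^2) memory and O(n^3) time
λ = eigvals(Matrix(S))

@assert length(λ) == 200
```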

# Installing more packages

Julia has a built-in package manager called `Pkg`, which has the unique feature that it (almost surely) just works.

In [18]:
using Pkg

In [19]:
Pkg.add("ArnoldiMethod") # written by Lauri Nymann (then a master's student at Aalto University) and me (Harmen Stoppels)

[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/1.6.0/daint-gpu/environments/1.6.0-daint-gpu/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/1.6.0/daint-gpu/environments/1.6.0-daint-gpu/Manifest.toml`


In [20]:
using ArnoldiMethod: partialschur, partialeigen

In [21]:
# Compute the partial Schur decomposition for the invariant subspace of the 5 eigenvalues of largest magnitude;
# this is like MATLAB's eigs, but everything is written from scratch in Julia (except the BLAS routines)
schur, history = partialschur(A_sparse, nev=5);
history

Converged: 5 of 5 eigenvalues in 64 matrix-vector products

In [22]:
# Check if AQ = QR and Q'Q = I:
norm(A_sparse * schur.Q - schur.Q * schur.R), norm(schur.Q' * schur.Q - I)

(1.2619053693804693e-5, 6.1337236670035536e-15)

But enough with the Julia-in-Jupyter-as-a-Matlab-alternative examples! Let's take a look at some fun packages.

Another way to use the package manager is the `]` prefix (a REPL/Jupyter-specific syntax):

In [23]:
] add UnicodePlots Distributions

[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/1.6.0/daint-gpu/environments/1.6.0-daint-gpu/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/1.6.0/daint-gpu/environments/1.6.0-daint-gpu/Manifest.toml`


In [24]:
using UnicodePlots, Distributions

In [25]:
barplot(["x", "y", "z"], [10.0, 1.5, 3.0], title="My first barplot")

                 My first barplot
     ┌                                        ┐
   x ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 10.0
   y ┤■■■■■ 1.5
   z ┤■■■■■■■■■■ 3.0
     └                                        ┘

In [26]:
heatmap(exp.(-range(0, 1, length=20) * range(0, 1, length=20)' ./ 100), colormap=:inferno)

(Terminal heatmap: a 20×20 grid rendered with the `:inferno` colormap, with a colorbar labeled 1.0; output truncated.)

In [27]:
histogram(rand(Normal(10.0, 1.0), 1000), nbins=20)

                ┌                                        ┐
   [ 5.5,  6.0) ┤ 1
   [ 6.0,  6.5) ┤ 1
   [ 6.5,  7.0) ┤ 2
   [ 7.0,  7.5) ┤▇ 4
   [ 7.5,  8.0) ┤▇▇▇▇ 20
   [ 8.0,  8.5) ┤▇▇▇▇▇▇▇▇▇▇▇ 57
   [ 8.5,  9.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 86
   [ 9.0,  9.5) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇

These packages are installed in your home directory by default. If they get really big, you might want to move them to `$SCRATCH`: set `JULIA_DEPOT_PATH=...`, for instance in `~/.jupyterhub.env`.
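From within Julia you can check where packages end up: `DEPOT_PATH` lists the depots (it is initialized from `JULIA_DEPOT_PATH` when that environment variable is set), and new packages are installed into its first entry. A minimal sketch:

```julia
# Depots Julia searches; newly installed packages go into the first one
for depot in DEPOT_PATH
    println(depot)
end

# Path of the active environment's Project.toml
println(Base.active_project())
```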

Whenever you install a package, it becomes part of an environment. An environment consists of a Project.toml (human readable, listing the packages you added directly) and a Manifest.toml (not so readable, recording the exact versions of all dependencies). For these notebooks I've provided both files, so as a first step you can install exactly the same packages. It works like this:

In [28]:
] activate .

[32m[1m  Activating[22m[39m environment at `~/tutorial_julia_interactive/01_introduction/Project.toml`


In [29]:
] instantiate

## Where to find packages

There are many more registered and curated Julia packages available in the official registry. An overview of some of the popular ones can be found here: https://juliahub.com/ui/Packages

# Julia language features

## Free functions

Every function is a free function: functions are not "owned" by an object. You won't see `matrix.multiply(other_matrix)` in Julia, only `multiply(matrix, other_matrix)`.
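Because functions are not owned by types, you can add methods to an existing function such as `Base.:+` for your own type, from outside of `Base`, and define new functions that take several objects without any of them being "the" receiver. A small sketch with a hypothetical `Point2` type and `distance` function:

```julia
# A hypothetical 2D point type, used only for illustration
struct Point2
    x::Float64
    y::Float64
end

# Extend the existing generic function `+` with a new method for Point2
Base.:+(a::Point2, b::Point2) = Point2(a.x + b.x, a.y + b.y)

# A new free function of two points; neither argument "owns" it
distance(a::Point2, b::Point2) = hypot(a.x - b.x, a.y - b.y)

pt = Point2(1, 2) + Point2(3, 4)   # Int arguments are converted to Float64
@assert pt == Point2(4, 6)
@assert distance(Point2(0, 0), Point2(3, 4)) ≈ 5.0
```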

In [30]:
+

+ (generic function with 288 methods)

In [31]:
sqrt

sqrt (generic function with 20 methods)

In [32]:
length(methods(+))

288

In [33]:
methods(+).ms[1:10]

Bjarne Stroustrup reflecting on this feature in his paper "How can you be so certain?":

> Unified function call: The notational distinction between `x.f(y)` and `f(x,y)` comes from the flawed OO notion that there always is a single most important object for an operation. I made a mistake adopting that. It was a shallow understanding at the time (but extremely fashionable). Even then, I pointed to `sqrt(2)` and `x+y` as examples of problems caused by that view. With generic programming,  the `x.f(y)` vs. `f(x,y)` distinction becomes a library design and usage issue (an inflexibility). [...] Again, the issues and solutions go back decades. Allowing virtual arguments for `f(x,y,z)` gives us multimethods.

## Multiple dispatch, multimethods, open functions

Often when people are asked what they like most about Julia, the answer is 'multiple dispatch'. In my opinion it's not this one thing, but rather a combination of features that ultimately enables generic programming.

In [34]:
abstract type Animal end

struct Dog <: Animal
    name::String
end

struct Cat <: Animal
    name::String
end

meets(x::Dog, y::Cat) = "$(x.name) barks at $(y.name)"
meets(x::Dog, y::Dog) = "$(x.name) sniffs $(y.name)"

meets (generic function with 2 methods)

In [35]:
meets(Dog("Bo"), Cat("Lilly"))

"Bo barks at Lilly"

In [36]:
meets(Dog("Bo"), Dog("Rex"))

"Bo sniffs Rex"

In [37]:
meets(Cat("Lilly"), Cat("Lucy")) # we don't need to implement meets(::Cat, ::Cat) in a world where cats avoid each other anyway

LoadError: MethodError: no method matching meets(::Cat, ::Cat)
Closest candidates are:
  meets(::Dog, ::Cat) at In[34]:11

In [38]:
meets

meets (generic function with 2 methods)
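If cats should be able to meet each other after all, one option is a fallback method on the abstract type. Julia picks the most specific matching method, so the existing `Dog` methods keep winning. A self-contained sketch (the types are redefined so the block runs on its own, and the "ignores" message is made up):

```julia
abstract type Animal end
struct Dog <: Animal
    name::String
end
struct Cat <: Animal
    name::String
end

meets(x::Dog, y::Cat) = "$(x.name) barks at $(y.name)"
meets(x::Dog, y::Dog) = "$(x.name) sniffs $(y.name)"
# Fallback for every other combination of animals
meets(x::Animal, y::Animal) = "$(x.name) ignores $(y.name)"

@assert meets(Dog("Bo"), Dog("Rex")) == "Bo sniffs Rex"          # specific method wins
@assert meets(Cat("Lilly"), Cat("Lucy")) == "Lilly ignores Lucy"  # falls back
@assert meets(Cat("Lilly"), Dog("Bo")) == "Lilly ignores Bo"      # falls back
```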

You can go further and dispatch on the types of three (or more) arguments:

In [39]:
group_interaction(::Dog, ::Dog, ::Cat) = "chaos"
group_interaction(::Cat, ::Dog, ::Cat) = "even more chaos"

group_interaction (generic function with 2 methods)

Multiple dispatch also works as you would expect when the concrete types are only known at runtime, e.g. when type inference fails. (In C++ you could use the visitor pattern for double dispatch on 2 arguments, but how about $n$ arguments?)

In [40]:
vector_of_animals = [Dog("Bo"), Dog("Rex"), Cat("Lilly")]

3-element Vector{Animal}:
 Dog("Bo")
 Dog("Rex")
 Cat("Lilly")

In [41]:
group_interaction(vector_of_animals[1], vector_of_animals[2], vector_of_animals[3])

"chaos"

In [42]:
@code_warntype first(vector_of_animals)

Variables
  #self#[36m::Core.Const(first)[39m
  a[36m::Vector{Animal}[39m

Body[91m[1m::Animal[22m[39m
[90m1 ─[39m %1 = Base.eachindex(a)[36m::Base.OneTo{Int64}[39m
[90m│  [39m %2 = Base.first(%1)[36m::Core.Const(1)[39m
[90m│  [39m %3 = Base.getindex(a, %2)[91m[1m::Animal[22m[39m
[90m└──[39m      return %3


## Late, but native compilation, specialized on all function argument types

Usually, though, the types can be inferred, and then Julia is no different from a statically compiled language. It's just that compilation is deferred to a very late moment: the first call of a function with a particular combination of argument types!

In [43]:
f(x, y) = 2 * x * y

f (generic function with 1 method)

In [44]:
@code_typed debuginfo=:none f(1.0, 2.0)

CodeInfo(
[90m1 ─[39m %1 = Base.sitofp(Float64, 2)[36m::Float64[39m
[90m│  [39m %2 = Base.mul_float(%1, x)[36m::Float64[39m
[90m│  [39m %3 = Base.mul_float(%2, y)[36m::Float64[39m
[90m└──[39m      return %3
) => Float64

In [45]:
@code_typed debuginfo=:none f(1, 2)

CodeInfo(
[90m1 ─[39m %1 = Base.mul_int(2, x)[36m::Int64[39m
[90m│  [39m %2 = Base.mul_int(%1, y)[36m::Int64[39m
[90m└──[39m      return %2
) => Int64

In [46]:
@code_native debuginfo=:none f(1.0, 2.0) # uses floating-point registers; LLVM simplifies the multiplication by 2 to an addition

	.text
	vaddsd	%xmm0, %xmm0, %xmm0
	vmulsd	%xmm1, %xmm0, %xmm0
	retq
	nopl	(%rax)


In [47]:
@code_native debuginfo=:none f(1, 2)

	.text
	imulq	%rsi, %rdi
	leaq	(%rdi,%rdi), %rax
	retq
	nopl	(%rax)


In [48]:
f # even though we have only 1 method, codegen is specialized on input arguments

f (generic function with 1 method)

Note! Contrary to popular belief, it's unnecessary to add type annotations to function arguments for performance: the Julia compiler (almost always) specializes on the argument types anyway. The reasons to use annotations like `f(x::Real, y::Int) = ...` are to restrict the accepted types or to dispatch on them. Generally, generic programming is easier if you leave the types out.
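A small sketch of the difference (the function names are made up for illustration). The unannotated version works for rationals and complex numbers without any extra effort, while the annotated one rejects complex input; both get specialized code for the types they are called with:

```julia
# Generic: no annotations, specialized by the compiler for whatever types come in
scale_generic(x, y) = 2 * x * y

# Annotated: only accepts a Real first argument and an Int second argument
scale_restricted(x::Real, y::Int) = 2 * x * y

@assert scale_generic(1//3, 3) == 2//1                # Rational arithmetic
@assert scale_generic(2.0 + 1.0im, 1) == 4.0 + 2.0im  # Complex numbers work too
@assert scale_restricted(1.5, 2) == 6.0

# Complex is not a subtype of Real, so there is no matching method
threw = try
    scale_restricted(2.0 + 1.0im, 1)
    false
catch err
    err isa MethodError
end
@assert threw
```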

## Example: forward autodiff using dual numbers

Dual numbers are expressions of the form $a + b \varepsilon$, where $a$ and $b$ are real numbers, and $\varepsilon$ is a symbol taken to satisfy $\varepsilon^2 = 0$.

They are useful for computing derivatives: if you extend an analytic real function $f$ to dual numbers and formally Taylor-expand $f(a+\varepsilon)$ around $a$, every term with $\varepsilon^n$ for $n \geq 2$ vanishes:

$$f(a+\varepsilon)=\sum _{n=0}^{\infty }{\frac {f^{(n)}(a)\varepsilon ^{n}}{n!}}=f(a)+f'(a)\varepsilon$$

So $f$ maps $a + \varepsilon$ to another dual number whose $\varepsilon$ component is the derivative of $f$ at $a$.
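A worked example with $f(x) = x^2$, where the $\varepsilon$ component comes out as the familiar derivative $2a$:

$$(a + \varepsilon)^2 = a^2 + 2a\varepsilon + \varepsilon^2 = a^2 + 2a\,\varepsilon$$

More generally, $f(a + b\varepsilon) = f(a) + b\,f'(a)\,\varepsilon$, which is why composing functions applies the chain rule for free:

$$g\big(f(a+\varepsilon)\big) = g\big(f(a) + f'(a)\,\varepsilon\big) = g\big(f(a)\big) + g'\big(f(a)\big)\,f'(a)\,\varepsilon$$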

Implementing this in Julia is rather straightforward, and you can stay close to the maths:

In [49]:
struct Dual{T<:Real} <: Number
    x::T
    ε::T
end

In [50]:
Dual(1.0, 1.0)

Dual{Float64}(1.0, 1.0)

In [51]:
import Base: +, /, -, *, promote_rule, show, convert

To see why `/` is implemented like that, multiply $(a+b\varepsilon)/(c+d\varepsilon)$ by $1 = (c-d\varepsilon)/(c-d\varepsilon)$.
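Carrying out that multiplication for $x = a + b\varepsilon$ and $y = c + d\varepsilon$ with $c \neq 0$, the $\varepsilon^2$ terms vanish and the denominator becomes $c^2$:

$$\frac{a+b\varepsilon}{c+d\varepsilon} = \frac{(a+b\varepsilon)(c-d\varepsilon)}{(c+d\varepsilon)(c-d\varepsilon)} = \frac{ac + (bc - ad)\,\varepsilon - bd\,\varepsilon^2}{c^2 - d^2\varepsilon^2} = \frac{a}{c} + \frac{bc - ad}{c^2}\,\varepsilon$$

So the $\varepsilon$ component of `x / y` is `(y.x * x.ε - x.x * y.ε) / y.x^2`: the denominator is the square of the real part `y.x`, not of `y.ε`.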

In [52]:
+(x::Dual, y::Dual) = Dual(x.x + y.x, x.ε + y.ε)

# Or if you feel more fancy
x::Dual / y::Dual = Dual(x.x / y.x, (y.x * x.ε - x.x * y.ε) / y.x^2)
x::Dual * y::Dual = Dual(x.x * y.x, x.ε * y.x + x.x * y.ε)
x::Dual - y::Dual = Dual(x.x - y.x, x.ε - y.ε)
-(x::Dual) = Dual(-x.x, -x.ε)

- (generic function with 257 methods)

In [53]:
# conversion of values
convert(::Type{Dual{T}}, x::Real) where {T} = Dual{T}(x, zero(x))

# promotion of types: 2 * (3 + 4ε) -- only used for arithmetic
promote_rule(::Type{Dual{T}}, ::Type{<:Number}) where {T} = Dual{T}

promote_rule (generic function with 133 methods)
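These two methods hook into the same promotion machinery that Base uses for mixed arithmetic on built-in types: `promote_rule` decides the common type, `convert` moves values into it, and the generic fallback for `*(::Number, ::Number)` promotes both arguments before calling the same-type method. A quick sketch with built-in types only:

```julia
# The common type of Int and Float64
@assert promote_type(Int, Float64) == Float64

# promote converts all arguments to the common type
@assert promote(1, 2.5) === (1.0, 2.5)
@assert promote(1, 1//2) === (1//1, 1//2)

# Mixed arithmetic promotes both sides, then calls the same-type method
@assert 1 + 1//2 === 3//2
@assert convert(Float64, 2) === 2.0
```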

In [54]:
# add some constructors
Dual(x) = Dual(x, one(x))

Dual

In [55]:
# pretty print :rainbow:
show(io::IO, x::Dual) = print(io, x.x, " + ", x.ε, "ε")

show (generic function with 316 methods)

In [56]:
Dual(1.0)

1.0 + 1.0ε

In [57]:
p(x) = 2 * x * (3 * x - 1)

p (generic function with 1 method)

In [58]:
p′(x) = 12 * x - 2 # true derivative

p′ (generic function with 1 method)

In [59]:
p(Dual(4.0))

88.0 + 46.0ε

In [60]:
p′(4.0)

46.0

In [61]:
# And for simplicity, let's define a function that lifts x to Dual(x) before applying f:
∂(f, x) = f(Dual(x))

∂ (generic function with 1 method)

In [62]:
# Compute derivatives for a range of values x = 1.0, 2.0, ..., 10.0
# the . applies the function element-wise (nested dot calls are fused into a single loop)
∂.(p, 1.0:10.0)

10-element Vector{Dual{Float64}}:
    4.0 + 10.0ε
   20.0 + 22.0ε
   48.0 + 34.0ε
   88.0 + 46.0ε
  140.0 + 58.0ε
  204.0 + 70.0ε
  280.0 + 82.0ε
  368.0 + 94.0ε
 468.0 + 106.0ε
 580.0 + 118.0ε
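The dot syntax is worth a closer look: all dotted calls in one expression are fused into a single loop over the elements, and `.=` writes the result into an existing array. A small sketch (variable names chosen for this example):

```julia
pts = [0.0, 1.0, 2.0]
out = similar(pts)

# The dotted operations fuse into one loop, and `.=` fills `out` in place
# without allocating a temporary array for each intermediate result
out .= 2 .* pts .^ 2 .+ 1
@assert out == [1.0, 3.0, 9.0]

# `@.` adds the dots to every call and operator in the expression
@assert (@. 2 * pts^2 + 1) == out
```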

### Let's add a test

In [63]:
using Test

In [64]:
xs = range(-10.0, 10.0, length=100)
@test all(map(x -> ∂(p, x).ε ≈ p′(x), xs))

[32m[1mTest Passed[22m[39m

### Mixing our dual number type with dual-number-unaware packages

Suppose we need to compute the roots of a polynomial:

In [65]:
poly(x) = (x - 1.0) * (x - 2.0) * (x - 3.0) * (x - 4.0) + 0.5

poly (generic function with 1 method)

In [66]:
using UnicodePlots; lineplot(xs, poly.(xs), xlim=(0, 6), ylim=(-1, 1))

(Terminal line plot of `poly.(xs)` with `xlim=(0, 6)` and `ylim=(-1, 1)`; output truncated.)

In [67]:
"""
Newton's method for a scalar function f, initial guess x₀, and a fixed number of iterations `maxiter`
"""
function newton(f, x₀, maxiter = 15)
    x = x₀
    for i = 1:maxiter
        fₓ = ∂(f, x)
        x -= fₓ.x / fₓ.ε
    end
    return x
end

newton
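The update inside the loop is the textbook Newton step, with the dual number supplying both $f(x_k)$ (in `fₓ.x`) and $f'(x_k)$ (in `fₓ.ε`):

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}$$

Near a simple root the number of correct digits roughly doubles with each step, so a fixed `maxiter = 15` is plenty once the iteration has entered that regime.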

In [68]:
roots = map(x -> newton(poly, x), (1.0, 2.0, 3.0, 4.0))

(1.1010336740340934, 1.7631871208960497, 3.23681287910395, 3.898966325965907)

In [69]:
map(poly, roots)

(-1.1102230246251565e-16, 0.0, 3.885780586188048e-16, 6.106226635438361e-16)

## How about more precision?

MultiFloats.jl was released around Christmas 2020 (by David K. Zhang). It is a Julia package that generates Julia code for arithmetic on numbers represented as an unevaluated sum $x_1 + x_2 + \dots + x_N$, where the $x_i$ are IEEE floating-point numbers of the same type and $x_{i+1}$ lies within the roundoff of $x_i$.
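Arithmetic on such unevaluated sums is typically built on error-free transformations. The classic one is Knuth's TwoSum, which recovers the rounding error of a floating-point addition exactly, so that a two-limb number is just `(hi, lo)` with `lo` in the roundoff of `hi`. The sketch below is the textbook algorithm, not MultiFloats' own code:

```julia
"""
    two_sum(a, b)

Knuth's TwoSum: returns `(s, e)` with `s = fl(a + b)` and `e` the rounding error,
so that `s + e == a + b` holds exactly (barring overflow).
"""
function two_sum(a::Float64, b::Float64)
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e
end

hi, lo = two_sum(0.1, 0.2)
@assert hi == 0.1 + 0.2
@assert lo == -2.0^-55                            # the part of 0.1 + 0.2 lost to rounding
@assert abs(lo) <= eps(hi) / 2                    # lo lies within the roundoff of hi
@assert big(hi) + big(lo) == big(0.1) + big(0.2)  # exact, checked in BigFloat
```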

In [70]:
] add MultiFloats

[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/tutorial_julia_interactive/01_introduction/Project.toml`
[32m[1m  No Changes[22m[39m to `~/tutorial_julia_interactive/01_introduction/Manifest.toml`


In [71]:
using MultiFloats

In [72]:
sqrt(2.0), sqrt(Float64x2(2.0)), sqrt(Float64x4(2.0))

(1.4142135623730951, 1.414213562373095048801688724209682, 1.414213562373095048801688724209698078569671875376948073176679738104)

In [74]:
x = sqrt(Float64x4(2.0))
dump(x)

MultiFloat{Float64, 4}
  _limbs: NTuple{4, Float64}
    1: Float64 1.4142135623730951
    2: Float64 -9.667293313452913e-17
    3: Float64 4.1386753086994136e-33
    4: Float64 4.935546991468362e-50


In [75]:
# The basic rule here is that x._limbs[i+1] is in the roundoff of x._limbs[i]
eps(x._limbs[1]), x._limbs[2]

(2.220446049250313e-16, -9.667293313452913e-17)

In [76]:
roots_4x = map(x -> newton(poly, Float64x4(x)), (1.0, 2.0, 3.0, 4.0))

(1.10103367403409329796845946056800123532647743613376112006906367685, 1.763187120896049703422495432699541749624341579312803486771580059661, 3.23681287910395029657750456730045825037565842068719651322841994032, 3.898966325965906702031540539431998764673522563866238879930936323147)

In [77]:
map(poly, roots_4x)

(-4.7477838728798994e-66, -3.7982270983039195e-65, 7.596454196607839e-65, -3.7982270983039195e-65)

### Analyzing the generated code & nudging the compiler to inline things

Using the `@inline` macro we can mark functions so that they are very likely to be inlined (it is a hint to the compiler, not a guarantee):

In [78]:
@inline function newton(f, x₀, maxiter = 15)
    x = x₀
    for i = 1:maxiter
        fₓ = ∂(f, x)
        x -= fₓ.x / fₓ.ε
    end
    return x
end

@inline function poly(x) 
    (x - 1.0) * (x - 2.0) * (x - 3.0) * (x - 4.0) + 0.5
end

@inline function ∂(f, x)
    f(Dual(x))
end

@inline function +(x::Dual, y::Dual)
    Dual(x.x + y.x, x.ε + y.ε)
end

@inline function /(x::Dual, y::Dual)
    Dual(x.x / y.x, (y.x * x.ε - x.x * y.ε) / y.x^2)
end

@inline function *(x::Dual, y::Dual)
    Dual(x.x * y.x, x.ε * y.x + x.x * y.ε)
end

@inline function -(x::Dual, y::Dual)
    Dual(x.x - y.x, x.ε - y.ε)
end

@inline function -(x::Dual)
    Dual(-x.x, -x.ε)
end

- (generic function with 263 methods)

In [79]:
# Note that the number of iterations is now a compile-time constant (the N in Val{N}),
# so get_roots can have better codegen than newton(...) with a "dynamic" number of iterations
get_roots(x, iter::Val{N}) where {N} = newton(poly, x, N)
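The same trick in isolation: `Val(4)` is an instance of the type `Val{4}`, so the value 4 becomes part of the method signature and the compiler specializes on it. That means one compiled method instance per distinct `N`, so it is best kept to a few values. A minimal sketch with a made-up `powers` function:

```julia
# N is a type parameter, so it is a compile-time constant inside the method
powers(x, ::Val{N}) where {N} = ntuple(i -> x^i, Val(N))

@assert powers(2, Val(3)) == (2, 4, 8)
@assert powers(2, Val(3)) isa NTuple{3, Int}   # the length is known from the type
```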

get_roots (generic function with 1 method)

In [80]:
# First off: get_roots using Float64 - the loop is entirely unrolled and it looks pretty
@code_native debuginfo=:none get_roots(1.0, Val(4))

	.text
	movabsq	$.rodata.cst8, %rax
	vmovsd	(%rax), %xmm8                   # xmm8 = mem[0],zero
	vaddsd	%xmm0, %xmm8, %xmm5
	movabsq	$46913579698736, %rax           # imm = 0x2AAAEB40CA30
	vmovsd	(%rax), %xmm9                   # xmm9 = mem[0],zero
	vaddsd	%xmm0, %xmm9, %xmm6
	movabsq	$46913579698744, %rax           # imm = 0x2AAAEB40CA38
	vmovsd	(%rax), %xmm11                  # xmm11 = mem[0],zero
	vaddsd	%xmm0, %xmm11, %xmm7
	movabsq	$46913579698752, %rax           # imm = 0x2AAAEB40CA40
	vmovsd

In [81]:
@code_native debuginfo=:none get_roots(Float64x2(1.0), Val(4))

	.text
	movq	%rdi, %rax
	vmovsd	(%rsi), %xmm9                   # xmm9 = mem[0],zero
	vmovsd	8(%rsi), %xmm8                  # xmm8 = mem[0],zero
	movl	$4, %ecx
	movabsq	$.rodata.cst8, %rdx
	vmovsd	(%rdx), %xmm0                   # xmm0 = mem[0],zero
	vmovsd	%xmm0, -64(%rsp)
	movabsq	$46913579698880, %rdx           # imm = 0x2AAAEB40CAC0
	vmovsd	(%rdx), %xmm0                   # xmm0 = mem[0],zero
	vmovsd	%xmm0, -72(%rsp)
	movabsq	$46913579

## Vectorizing the autodiff root finder

In [82]:
] add VectorizationBase

[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/tutorial_julia_interactive/01_introduction/Project.toml`
[32m[1m  No Changes[22m[39m to `~/tutorial_julia_interactive/01_introduction/Manifest.toml`


In [83]:
using VectorizationBase

In [84]:
# Remember map(x -> newton(poly, x), (1.0, 2.0, 3.0, 4.0))? Instead of four scalar calls,
# we can store 1.0, 2.0, 3.0 and 4.0 in a single 256-bit (ymm) register
v = Vec(1.0, 2.0, 3.0, 4.0)

Vec{4, Float64}<1.0, 2.0, 3.0, 4.0>

In [85]:
roots_vec = newton(poly, v)

Vec{4, Float64}<1.1010336740340934, 1.7631871208960497, 3.23681287910395, 3.898966325965907>

In [86]:
poly(roots_vec)

Vec{4, Float64}<-1.586116450056502e-16, -4.3599143977964324e-17, 4.1563002304956903e-16, 6.113681357543031e-16>

The VectorizationBase package uses an undervalued yet incredibly powerful feature: generated functions. Without going into details: the body of the `Vec` constructor is only generated when you first call it with a given combination of argument types, based on those types. This allows the package to use certain constants (e.g. the SIMD width) as compile-time values.
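A tiny generated function illustrates the idea (a hypothetical `unrolled_sum`, not VectorizationBase's code). The body runs with the argument *types*, so the tuple length `N` is a plain integer there, and it returns an expression that becomes the specialized method body:

```julia
# Works on nonempty tuples. At generation time N and T are plain values;
# the returned expression is compiled as the method body for that tuple type.
@generated function unrolled_sum(t::NTuple{N, T}) where {N, T}
    ex = :(t[1])
    for i in 2:N
        ex = :($ex + t[$i])   # builds t[1] + t[2] + ... + t[N], no loop left at runtime
    end
    return ex
end

@assert unrolled_sum((1, 2, 3, 4)) == 10
@assert unrolled_sum((1.5, 2.5)) == 4.0
```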

In [87]:
# if you pass it more values than fit in one register, it stores and prints them as multiple hardware registers
w = Vec(1.0, 2.0, 3.0, 4.0, 1.2, 2.2, 3.2, 4.2)

2 x Vec{4, Float64}
Vec{4, Float64}<1.0, 2.0, 3.0, 4.0>
Vec{4, Float64}<1.2, 2.2, 3.2, 4.2>

In [88]:
poly(newton(poly, w))

2 x Vec{4, Float64}
Vec{4, Float64}<-1.586116450056502e-16, -4.3599143977964324e-17, 4.1563002304956903e-16, 6.113681357543031e-16>
Vec{4, Float64}<-1.586116450056502e-16, -4.3599143977964324e-17, 4.1563002304956903e-16, 6.113681357543031e-16>

In [89]:
v = Vec(1.0, 2.0, 3.0, 4.0)

# finally: dual-number-autodiff'ed, unrolled, vectorized, 😌
@code_native debuginfo=:none get_roots(v, Val(4))

	.text
	vmovupd	(%rsi), %ymm5
	movabsq	$.rodata.cst8, %rax
	vbroadcastsd	(%rax), %ymm0
	vaddpd	%ymm0, %ymm5, %ymm4
	movabsq	$46913579717624, %rax           # imm = 0x2AAAEB4113F8
	vbroadcastsd	(%rax), %ymm1
	vaddpd	%ymm1, %ymm5, %ymm6
	movabsq	$46913579717632, %rax           # imm = 0x2AAAEB411400
	vbroadcastsd	(%rax), %ymm2
	vaddpd	%ymm2, %ymm5, %ymm7
	movabsq	$46913579717640, %rax           # imm = 0x2AAAEB411408
	vbroadcastsd	(%rax), %ymm3

## To wrap this up: microbenchmarking

Apart from staring at assembly code, let's actually run this:

In [90]:
] add BenchmarkTools

[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/tutorial_julia_interactive/01_introduction/Project.toml`
[32m[1m  No Changes[22m[39m to `~/tutorial_julia_interactive/01_introduction/Manifest.toml`


In [91]:
using BenchmarkTools

In [92]:
# Non-vectorized double precision
@benchmark get_roots(start, iterations) setup=(start=2.0; iterations=Val(1000))

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     11.325 μs (0.00% GC)
  median time:      11.333 μs (0.00% GC)
  mean time:        11.357 μs (0.00% GC)
  maximum time:     29.736 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

In [93]:
# Double-double precision version
@benchmark get_roots(start, iterations) setup=(start=Float64x2(2.0); iterations=Val(1000))

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     128.537 μs (0.00% GC)
  median time:      128.564 μs (0.00% GC)
  mean time:        129.209 μs (0.00% GC)
  maximum time:     253.002 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

In [94]:
# Vectorized version
@benchmark get_roots(start, iterations) setup=(start=Vec(1.0, 2.0, 3.0, 4.0); iterations=Val(1000))

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     18.305 μs (0.00% GC)
  median time:      18.339 μs (0.00% GC)
  mean time:        18.891 μs (0.00% GC)
  maximum time:     43.056 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
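To read these numbers side by side, a bit of arithmetic on the reported medians (they are machine-specific, so rerun on your own node). The vectorized version computes four roots in roughly 1.6 times the time of one scalar root, and double-double costs about 11 times as much as plain `Float64`:

```julia
# Median times from the three benchmarks above, in μs, for 1000 Newton iterations each
t_float64 = 11.333      # one Float64 root
t_float64x2 = 128.564   # one Float64x2 (double-double) root
t_vec4 = 18.339         # four Float64 roots at once in a Vec{4, Float64}

per_root_vec4 = t_vec4 / 4                          # ≈ 4.58 μs per root
simd_throughput_gain = t_float64 / per_root_vec4    # ≈ 2.47× more roots per second
double_double_cost = t_float64x2 / t_float64        # ≈ 11.3× slower than Float64
```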

Julia is also well suited for implementing trigonometric and special functions. In fact, many trigonometric functions in Julia are themselves implemented in Julia.

In [95]:
@which sin(1.0) # for instance this guy
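A toy sketch of the kind of kernel such implementations evaluate (the function name is made up): a short polynomial evaluated with Horner's scheme through `evalpoly`. Base's actual `sin` additionally reduces the argument to a small interval and uses carefully tuned coefficients rather than plain Taylor ones, so treat this only as an illustration:

```julia
# Toy kernel for small |x|: sin(x) ≈ x * (1 - x²/3! + x⁴/5! - x⁶/7! + x⁸/9!),
# with the polynomial in x² evaluated by Horner's scheme via Base's evalpoly
sin_taylor(x::Float64) = x * evalpoly(x^2, (1.0, -1 / 6, 1 / 120, -1 / 5040, 1 / 362880))

@assert abs(sin_taylor(0.1) - sin(0.1)) < 1e-15
@assert abs(sin_taylor(0.5) - sin(0.5)) < 1e-9   # truncation error grows with |x|
```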