In [None]:
# Check multithreading config:
Base.Threads.nthreads()

In [None]:
# Check active package versions:
using Pkg; pkg"status"

<h1 style="text-align: center;">
    <span style="display: block; text-align: center;">
        Introduction to
    </span>
    <span style="display: block; text-align: center;">
        <img alt="Julia" src="images/logos/julia-logo.svg" style="height: 2em; display: inline-block; margin: 1em;"/>
    </span>
    <span style="display: block; text-align: center;">
        Part 1
    </span>
</h1>

<div style="text-align: center;">
    <div style="text-align: center; display: inline-block; vertical-align: middle;">
        Ludger Pähler <br/>
        <small>
        TUM, Chair of Aerodynamics and Fluid Mechanics <br/>
            <a href="mailto:ludger.paehler@tum.de" target="_blank">ludger.paehler@tum.de</a>
        </small>
    </div>
    <div style="text-align: center; display: inline-block;">
        <img src="images/logos/tum-logo.svg" style="height: 8em; display: inline-block;  vertical-align: middle; margin: 1em;"/>
    </div>
</div>

<div style="text-align: center;">
    <p style="text-align: center; display: inline-block; vertical-align: middle;">
        Oliver Schulz<br>
        <small>
            Max Planck Institute for Physics <br/>
            <a href="mailto:oschulz@mpp.mpg.de" target="_blank">oschulz@mpp.mpg.de</a>
        </small>
    </p>
    <p style="text-align: center; display: inline-block; vertical-align: middle;">
        <img src="images/logos/mpg-logo.svg" style="height: 5em; display: inline-block; vertical-align: middle; margin: 1em;"/>
        <img src="images/logos/mpp-logo.svg" style="height: 5em; display: inline-block; vertical-align: middle; margin: 1em;"/>
    </p>
</div>

<p style="text-align: center;">
    Leibniz Supercomputing Centre - LRZ, Jan. 27th 2019
</p>

## Scope of this course

TODO!

## Why Julia?

### Science needs code - but how to write it?

* Choice of programming language(s) matters!

* Need to balance:
    * Learning time
    * Productivity
    * Performance

* Usually involves compromises

### Programming Language Options

* C++:
    * Pro: Very fast (in expert hands)
    * Pro: Really cool new concepts (even literally) in C++11/14/17/...
    * Con: Complex, takes a long time to learn and much longer to master
    * Con: Straightforward tasks often result in lengthy code
    * Con: No automatic memory management (general protection faults)
    * Con: No universal package management
    * Con: Composability isn't great

### Programming Language Options

* Python:
    * Pro: Broad user base, popular first programming language
    * Pro: Easy to learn, good standard library
    * Con: Can't write time-critical loops in Python,  
      workarounds like Numba/Cython have many limitations,  
      don't compose well
    * Con: Language itself fairly primitive, not very expressive
    * Con: Duck typing necessitates lots of test code
    * Con: No effective multi-threading (global interpreter lock)
    * Con: Composability isn't great

### What else is there?

* Fortran:
    * Pro: Math can be really fast
    * Con: Old language, few modern concepts
    * Con: Shrinking user base
    * Con: Composability isn't great
    * Do you *really* want to ...?


* Scala, Go, Kotlin etc.:
    * Pro: Lots of individual strengths
    * Con: Math either fast *or* generic *or* complicated
    * Con: Calling C, Fortran or Python code often difficult
    * Con: Composability isn't great

### The 97 and the 3 Percent

> We should forget about small efficiencies, say about 97% of the time: *premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%*.

Donald E. Knuth

* Some programming languages (e.g. Python) are great for the 97% -  
  but can't make the 3% fast.
* Some other languages (e.g. C/C++, Fortran) can handle the 3% -  
  but make the 97% complicated.

### The Two-language Problem

* Common approach nowadays:  
  Write time-critical code in C/C++, the rest in Python

* Pro: End users can code comfortably in Python, with good performance

* Con: Complexity of C/C++ **plus** complexity of Python

* Con: Need proficiency in **two** languages, barrier that prevents  
  non-expert users from contributing to important parts of code

* Con: Limits generic implementation of algorithms

* Con: Severely limits metaprogramming, automatic differentiation, etc.

### The Expression Problem

> The expression problem is a new name for an old problem. The goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static type safety (e.g., no casts).

Philip Wadler

* In other words: the ability to add both new subtypes and new functions for a type defined in a package you don't own
* Object-oriented languages typically can't do this  
  (Ruby has a dirty way, Scala a clean workaround)
* If you have programming experience, you have felt this, even if you didn't name it
* Result: Packages tend not to compose well
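
A minimal sketch of how multiple dispatch addresses this in Julia, assuming a hypothetical package module `Shapes` that we do not own. Outside the module we add both a new subtype and a new function, and the package's existing generic code handles the new type without modification. (Julia is dynamically typed, so the "static type safety" part of Wadler's formulation does not carry over in the same form.)

```julia
# Hypothetical package we don't own: an abstract type, one subtype, two functions
module Shapes

abstract type Shape end

struct Circle <: Shape
    r::Float64
end

area(c::Circle) = π * c.r^2

# Generic package code: works for any Shape that has an `area` method
total_area(shapes) = sum(area, shapes)

end # module

# Our code, outside the package. New case: a subtype of the package's abstract type,
# plus a method that extends the package's own `area` function
struct Square <: Shapes.Shape
    side::Float64
end
Shapes.area(s::Square) = s.side^2

# New operation: a function covering the package's Circle as well as our Square
perimeter(c::Shapes.Circle) = 2π * c.r
perimeter(s::Square) = 4 * s.side

shapes = [Shapes.Circle(1.0), Square(2.0)]
Shapes.total_area(shapes)   # unchanged package code now handles Square
perimeter.(shapes)
```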


### We were looking for a language ...

* as fast as C/C++/Fortran
* as easy to learn and productive as Python
* with a solution for the expression problem
* with first class math support (vectors, matrices, etc.)
* with true functional programming
* with great Fortran/C/C++/Python integration
* with true metaprogramming (like Lisp or Scala)
* good at parallel and distributed programming
* suitable for interactive, small and large applications

### Julia

* Designed for scientific/technical computing
* Originated at MIT, first public version 2012
* Covers the whole wish-list
* Clear focus on user productivity and software quality
* Rapid growth of user base and software packages
* Current version: Julia v1.3

### Julia Language Properties

* Fast: JAOT (just-ahead-of-time) compilation to native CPU and GPU code
* Multiple-dispatch (more powerful than object-oriented):  
  solves the expression problem
* Dynamically typed
* Very powerful type system, types are first-class values
* Functional programming and metaprogramming
* First-class math support (like Fortran or Matlab)
* ...

### Julia Language Properties, cont.

* ...
* Local and distributed code execution
* State-of-the-art multi-threading: parallel code  
  can call parallel code that can call parallel code, ...,  
  without oversubscribing threads
* Software package management:  
  Trivial to create and install packages
* Excellent REPL (console)
* Easy to call Fortran, C/C++ and Python code

### Julia use cases in HPC

TODO!

Celeste, Clima, ...

### When (not) to use Julia

* *Do* use Julia for computations, visualization, data processing ... pretty much anything scientific/technical

* *Do not* use Julia for scripts that will only run for a second (code-generation overhead); use Python or shell scripts instead

* *Do not* use Julia for non-computing web apps, etc. (*at least not yet*), use Go or Node.js

## Installing Julia

TODO!

TODO, Julia, Anaconda, ...

## Julia 101

### Hello World

In [None]:
println("Hello, World!")

### Verbs and nouns - functions and types

* Julia is not Java: Verbs aren't owned by nouns

* Julia has types, functions and methods

* Methods belong to *functions*, not to types!
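
A small sketch of this (the function name `describe` is purely illustrative): one generic function carries several methods, one per argument type, and `methods` lists them on the function itself, not on any of the types.

```julia
# One generic function `describe` ...
describe(x::Integer) = "an integer"
describe(x::AbstractString) = "a string"
describe(x) = "something else"   # fallback method for any other type

# ... with three methods, all attached to `describe`, not to Integer or String
length(methods(describe))

describe(42), describe("hi"), describe(3.14)
```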

### Functions

Short one-line function:

In [None]:
f(x) = x^2

In [None]:
f(3)

Functions that need more than one line:

In [None]:
function f(x)
    # ... something ...
    x^2
end

is equivalent to

In [None]:
function f(x)
    # ... something ...
    return x^2
end

**Note:** `return` is optional and often omitted. The value of the last expression in a function, block, etc. is returned automatically (like in Mathematica).
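
The same rule applies to other blocks, because `if` and `begin ... end` are expressions in Julia. A short sketch (names are illustrative):

```julia
function sign_label(x)
    if x > 0
        "positive"
    elseif x < 0
        "negative"
    else
        "zero"
    end   # the `if` is the last expression, so its value is returned
end

# A `begin ... end` block evaluates to its last expression
block_value = begin
    a = 3
    b = 4
    a * b
end

sign_label(-2), block_value
```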

### Multiple dispatch and the expression problem

TODO!

use Unitful and Measurements as example, show off zero overhead of units too

In [None]:
# Add a two-argument method to `f` (the one-argument method above is kept)
f(x, y) = x * y
f(4.2, [1, 2, 3, 4])

In [None]:
@code_llvm debuginfo=:none f(20, 2.1)

In [None]:
@code_native debuginfo=:none f(20, 2.1)

In [None]:
foo(x::Integer, y::Number) = x * y
foo(x::Integer, y::AbstractString) = join(fill(y, x))

In [None]:
foo(3, 4)

In [None]:
foo(3, "abc")

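In [None]:
# No method of `foo` accepts a Float64 first argument, so this throws a MethodError:
foo(4.5, 3)

In [None]:
# Likewise, there is no method for two strings (MethodError):
foo("a", "b")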
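
The `foo` methods above differ only in the type of their second argument. Multiple dispatch selects a method based on the run-time types of *all* arguments, as this sketch with hypothetical types `Dog` and `Cat` shows. Since `pets` has the abstract element type `Pet`, the method choice really happens at run time:

```julia
abstract type Pet end
struct Dog <: Pet end
struct Cat <: Pet end

# The method is chosen from the types of *both* arguments
meets(a::Dog, b::Dog) = "sniffs"
meets(a::Dog, b::Cat) = "chases"
meets(a::Cat, b::Dog) = "hisses"
meets(a::Cat, b::Cat) = "ignores"

pets = Pet[Dog(), Cat()]

# Every combination: row = first argument, column = second argument
encounters = [meets(a, b) for a in pets, b in pets]
```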

### Syntax: Variables

TODO!

### Control flow

TODO!

for, while, etc.
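
A brief sketch of these constructs, wrapped in functions (all names are illustrative). Inside a function, loop counters and accumulators are ordinary local variables, which avoids the subtleties of assigning to globals from inside a loop:

```julia
# Sum of the even numbers in 1:n, using a `for` loop with `continue`
function sum_evens(n)
    total = 0
    for i in 1:n
        isodd(i) && continue   # skip odd numbers
        total += i
    end
    return total
end

# Number of halvings until n drops to 1, using a `while` loop
function count_halvings(n)
    steps = 0
    while n > 1
        n ÷= 2       # integer division
        steps += 1
    end
    return steps
end

# Ternary operator as a compact if/else
parity(k) = iseven(k) ? "even" : "odd"

sum_evens(10), count_halvings(64), parity(7)
```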

## Scoping

TODO!

## Arrays, memory layout and linear algebra

TODO!

### Array comprehension and generators

TODO!
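
A minimal sketch of both forms: a comprehension `[... for ...]` builds an array eagerly, while the same syntax in parentheses (or directly inside a function call) creates a lazy generator that yields values one at a time, so the `sum` below never materializes the 1000 squares:

```julia
# Array comprehension: builds a new Vector
squares = [i^2 for i in 1:5]

# With a filter clause
even_squares = [i^2 for i in 1:10 if iseven(i)]

# Two iterators give a matrix (the first iterator runs along the rows)
mult_table = [i * j for i in 1:3, j in 1:4]

# Generator: no intermediate array is allocated
sum_of_squares = sum(i^2 for i in 1:1000)
```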

### Broadcasting

In [None]:
A = [1.1, 2.2, 3.3]
B = [4.4, 5.5, 6.6]
broadcast((x, y) -> (x + y)^2, A, B)

Shorter broadcast syntax:

In [None]:
(A .+ B) .^ 2
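
Two further points, sketched with fresh variables: the `@.` macro adds a dot to every call and operator in an expression, and broadcasting expands singleton dimensions, so combining a column vector with a row matrix produces a full matrix:

```julia
x = [1.1, 2.2, 3.3]
y = [4.4, 5.5, 6.6]

# `@.` dots every operation in the expression
z = @. (x + y)^2          # same as (x .+ y) .^ 2

# A 3-element column times a 1×2 row is expanded to a 3×2 matrix
col = [1, 2, 3]
row = [10 20]
outer = col .* row
```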

#### Loop Fusion and SIMD Vectorization

In [None]:
foo(X, Y) = (X .+ Y) .^ 2
@code_llvm debuginfo=:none foo(A, B)

In [None]:
@code_native debuginfo=:none foo(A, B)

TODO!
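
The point of fusion is that nested dotted calls are merged into a single loop, so no temporary array is allocated for the intermediate `x .+ y`; combined with `.=`, the result can even be written into existing memory. A sketch (the helper `fused_loop!` only illustrates the equivalent loop, it is not literally what the compiler generates):

```julia
x = [1.1, 2.2, 3.3]
y = [4.4, 5.5, 6.6]

# One fused loop; only the result array is allocated
z = (x .+ y) .^ 2

# `.=` runs the fused loop into a preallocated array
out = similar(x)
out .= (x .+ y) .^ 2

# Roughly what the fused broadcast computes, as an explicit loop
function fused_loop!(out, x, y)
    for i in eachindex(out, x, y)
        out[i] = (x[i] + y[i])^2
    end
    return out
end
```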

### How Julia works

TODO!

compilation stages, `@code_...`, ...
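
A minimal sketch of the `@code_...` macros, one per compilation stage, applied to a hypothetical one-line function `sq`. Each call specializes `sq` for the concrete argument types, which is why the `Int` and `Float64` versions are compiled (and inferred) separately:

```julia
using InteractiveUtils   # provides the @code_* macros (loaded automatically in the REPL)

sq(x) = x^2

@code_lowered sq(3)     # lowered form, before type inference
@code_typed sq(3)       # after type inference, specialized for an Int argument
@code_llvm sq(3)        # LLVM IR
@code_native sq(3)      # native machine code

# Inferred return types for two different argument types
Base.return_types(sq, (Int,)), Base.return_types(sq, (Float64,))
```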

### Package management


TODO!

### No free lunch

TODO!

Package loading and code-gen time, mitigations (Revise and PackageCompiler)
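
The code-generation cost is easy to observe: the first call of a function for a new combination of argument types includes compilation, while later calls reuse the native code. A rough sketch with a hypothetical polynomial `poly`; absolute timings depend on the machine, and `@elapsed` is only a coarse measurement:

```julia
poly(x) = 3x^3 - 2x^2 + x - 1

t_first  = @elapsed poly(2.0)    # includes code generation for poly(::Float64)
t_second = @elapsed poly(2.0)    # reuses the compiled code
t_int    = @elapsed poly(2)      # new argument type (Int): compiled again

println("first: $t_first s, second: $t_second s, new type: $t_int s")
```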

### Performance tips

TODO!
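
One commonly given tip, also part of the Julia manual's performance tips, is to write type-stable functions, whose return type depends only on the argument types and not on their values. A sketch with two hypothetical functions: `Base.return_types` shows what the compiler infers, and `@code_warntype` flags non-concrete types:

```julia
using InteractiveUtils   # for @code_warntype outside the REPL

# Type-unstable: returns an Int (0) for negative input, a Float64 otherwise
unstable_half(x::Int) = x < 0 ? 0 : x / 2

# Type-stable: both branches return a Float64
stable_half(x::Int) = x < 0 ? 0.0 : x / 2

Base.return_types(unstable_half, (Int,))   # a Union of Float64 and Int64
Base.return_types(stable_half, (Int,))     # Float64

@code_warntype unstable_half(3)            # highlights the non-concrete return type
```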

Let's define a function

In [None]:
f(x, y) = x * y
f(20, 2.1)

Multiplication is also defined between a number and a vector, so a call like `f(4.2, [1, 2, 3, 4])` works, too.

### Functional Programming

In [None]:
A = rand(10)
idxs = findall(x -> 0.2 < x < 0.6, A)

In [None]:
A[idxs]
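
A few more functional-programming building blocks from Base, sketched on a small range: higher-order functions such as `map`, `filter`, `reduce` and `mapreduce`, closures, and function composition with `∘` (typed as `\circ` + Tab):

```julia
xs = 1:10

# Higher-order functions with anonymous or named functions as arguments
squares = map(x -> x^2, xs)
evens = filter(iseven, xs)
total = reduce(+, xs)
sum_sq_odd = mapreduce(x -> x^2, +, filter(isodd, xs))

# A closure captures `a` from the enclosing scope
make_scaler(a) = x -> a * x
triple = make_scaler(3)

# Composition: (f ∘ g)(x) == f(g(x))
inc_then_double = (x -> 2x) ∘ (x -> x + 1)

triple(5), inc_then_double(4)
```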

Even types are first-class values:

In [None]:
mytype = Number

In [None]:
subtypes(mytype)

The Julia type hierarchy extends all the way down to primitive types:

In [None]:
Float64 <: AbstractFloat <: Real <: Number <: Any
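
Since types are values, the hierarchy can also be walked programmatically. A small sketch using `supertype` (the helper `type_chain` is hypothetical):

```julia
# Collect the chain of supertypes from T up to Any
function type_chain(T::Type)
    chain = Any[T]
    S = T
    while S !== Any
        S = supertype(S)
        push!(chain, S)
    end
    return chain
end

type_chain(Float64)   # Float64, AbstractFloat, Real, Number, Any
type_chain(Int16)     # Int16, Signed, Integer, Real, Number, Any
```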

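This is efficient (no runtime reflection): the type argument `T` is known at compile time, so `typemax(T)` and `typemin(T)` can be resolved during compilation: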

In [None]:
half_dynrange(T::Type{<:Number}) = (Int(typemax(T)) - Int(typemin(T))) / 2
half_dynrange(Int16)

In [None]:
@code_llvm half_dynrange(Int16)

#### Let's Make a Plot

In [None]:
using Plots
xs = -π:0.01:π   # avoid the name `range`, which would shadow Base.range
plot(xs, sin.(xs) .+ rand(length(xs)))   # sine plus uniform noise

#### Histograms are easy, too

In [None]:
using Distributions
dist = Normal(0.0, 5.0)

In [None]:
stephist(rand(dist, 10000))

### Running Julia code on GPUs

### SIMD

TODO!
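
A minimal sketch of an explicit SIMD annotation (the function `simd_sum` is illustrative). `@simd` permits the compiler to reorder the floating-point additions so that it can vectorize the loop, and `@inbounds` removes bounds checks that would otherwise hinder vectorization. Because of the reordering, the result may differ in the last bits from a strictly sequential sum for general data; here all partial sums are exactly representable integers, so the result is exact:

```julia
function simd_sum(xs::AbstractVector{Float64})
    s = 0.0
    @inbounds @simd for i in eachindex(xs)
        s += xs[i]
    end
    return s
end

xs = collect(1.0:1000.0)
simd_sum(xs)
```

Inspecting `@code_llvm debuginfo=:none simd_sum(xs)` should typically show vector types such as `<4 x double>`, with the width depending on the CPU.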

### Threads, partr

TODO!
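
A sketch of nested task parallelism with `Threads.@spawn` (available since Julia 1.3), using a hypothetical recursive sum `psum`. Each level spawns a task for one half and recurses into the other; the scheduler maps all tasks onto the threads Julia was started with (e.g. set via the `JULIA_NUM_THREADS` environment variable), so the result is the same for any thread count:

```julia
using Base.Threads: @spawn, nthreads

function psum(xs, lo = firstindex(xs), hi = lastindex(xs); cutoff = 1000)
    if hi - lo < cutoff
        return sum(view(xs, lo:hi))                    # small chunk: sum serially
    end
    mid = (lo + hi) ÷ 2
    left = @spawn psum(xs, lo, mid; cutoff = cutoff)   # left half as a new task
    right = psum(xs, mid + 1, hi; cutoff = cutoff)     # right half in the current task
    return fetch(left) + right
end

data = collect(1:1_000_000)
println("threads available: ", nthreads())
psum(data) == sum(data)
```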

### Processes, Clusters, MPI

TODO!

### Heterogeneous computing, CuArrays and CUDAnative, outlook (TPUs, AMD GPU native), maybe keep very short and refer to part 2?

In [None]:
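ENV["JULIA_DEBUG"] = "CUDAnative"   # show debug log messages from CUDAnative

using GPUArrays, CuArrays, CUDAnative

f(a, b) = a^2 + b^2

# Broadcast and reduce on the CPU
A_cpu = rand(10^6); B_cpu = rand(10^6);
result_cpu = sum(f.(A_cpu, B_cpu))

# Copy the data to GPU memory; the same broadcast now runs as a GPU kernel
A_gpu = CuArray(A_cpu); B_gpu = CuArray(B_cpu);
result_gpu = sum(f.(A_gpu, B_gpu))

# Summation order differs on the GPU, so compare approximately
result_cpu ≈ result_gpu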

You can also write CUDA kernels by hand in Julia, via [CUDAnative.jl](https://github.com/JuliaGPU/CUDAnative.jl).

TODO!

## Benchmarking and profiling, digging deeper

TODO!

## Docs and help: docs, books, videos, Discourse, Slack, Stack Overflow, JuliaCon 2020

TODO!

## Statistics

TODO!

## Visualization/Plotting: Plots, Makie, plotting recipes

TODO!

## Calling code written in other languages, REPL modes

### Shell REPL mode

TODO!

### PyCall, RCall

TODO!

#### Python integration via PyCall.jl

In [None]:
using PyCall

In [None]:
numpy = pyimport("numpy")
numpy.zeros(5)

In [None]:
A = rand(5)

In [None]:
py"""type($A)"""

### Cxx

TODO!

#### C++ Integration via Cxx.jl

In [None]:
using Cxx
cxxinclude("iostream")

message = "Hello, World"
out_stream = icxx" std::cout << $message << std::endl; "

In [None]:
message = "Hello, World!"
l = signed(icxx""" std::string($message).size(); """)

In [None]:
icxx""" $out_stream << "Message is " << $l << " characters long." << std::endl; 0; """;

### GitCommand

TODO!

## Tour of useful libraries / ecosystems

### DiffEq

TODO!

### Optim, JuMP, etc.

TODO!

### TypedTables and DataFrames

TODO!

### ArraysOfArrays and StructArrays

TODO!

### ValueShapes and BAT

TODO!

### ForwardDiff, Zygote (more on day 2)

TODO!

### Flux (refer to part 2)

TODO!