<center>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/69/Julia_prog_language.svg/1280px-Julia_prog_language.svg.png" width=300>
</center>

# Introduction to Julia

## Dr. Josh Day

- GitHub: `@joshday`
- Email: josh@seqstat.com
- Slides: https://github.com/joshday/Talks


# Julia Resources

- [https://julialang.org](https://julialang.org)
- [https://juliabox.com](https://juliabox.com) (run Julia on the cloud, free tutorials)
- [https://juliaobserver.com/](https://juliaobserver.com/) (finding packages)
- [https://discourse.julialang.org](https://discourse.julialang.org) (ask for help)
- [http://julialang.slack.com/](http://julialang.slack.com/) (ask for help)
- [https://docs.julialang.org/en/](https://docs.julialang.org/en/) (documentation)


- Note: If looking for Julia tutorials, check the date they were created and the Julia version used.
    - Julia 0.7 is Julia 1.0 plus deprecation warnings, so use 0.7 as a stepping stone when updating older code
    - Some syntax from Julia 0.6 no longer works in Julia 1.0!

# Motivation

- Do we need another language?
- Let's start with the Sapir-Whorf Hypothesis

# Sapir-Whorf Hypothesis

- Your language influences/determines how you think

**How you solve problems is influenced by your tools**

- e.g. with R, avoid loops 

# The Two-Language Problem

- Write your prototype in an easy language (R)
- Write your final version in a fast language (C++)

# Julia

I claim that:

### 1) Julia is less controlling over how you solve problems
### 2) Julia solves the two-language problem

# What is Julia?
> Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments

- Julia is more than just "Fast R"
    - Performance comes from features that work well together.  
    - You can't just take the magic dust that makes Julia fast and sprinkle it on [language of choice]
    
## Julia Features

- Type system
- Multiple dispatch
- Type Inference
- Metaprogramming (macros)
- Just-in-time (JIT) compilation using LLVM
- Clean, familiar syntax
- Most of Julia is written in Julia!

# Benchmarks

<center><img src="https://julialang.org/images/benchmarks.svg" width=900></center>

# Julia is Just-In-Time Compiled

- The first time a function is run with a given combination of argument types, Julia compiles it

In [None]:
y = rand(10^6)

@time sum(y)

@time sum(y)

# Generic Code Gets Specialized

- Julia specializes on **types of arguments** (without you telling Julia what those types are)

In [None]:
f(x) = x + 4

@time f(1.0)
@time f(1.0)

In [None]:
@time f(1)
@time f(1)

# Most of Julia is Written in Julia

- Easy to find out what's going on inside a function 
    - `@edit`
- Also most Julia packages are 100% Julia
    - **Tensorflow**:
    ![](tensorflow.png)
    - **Flux.jl**
    ![](flux.png)

# Julia Makes Use of Metaprogramming

- A macro is a function of an expression: it can alter the expression before it is evaluated
- Code that writes code
- Can do everything a function can do plus much more

In [None]:
macro thing(x)
    println(typeof(x))
    :("This is a thing")
end

@thing 1 + 2 + 3

In [None]:
@code_llvm f(1)

- What is the difference between `show` and `@show`?

In [None]:
val = 100
@show val;

In [None]:
show(val)
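
One way to see the difference: `show` only receives the value, while `@show` receives the expression itself. A minimal sketch of a similar macro, assuming the made-up name `@myshow` (this is not how Base implements `@show`):

```julia
# Hypothetical macro, named to avoid clashing with Base's @show
macro myshow(ex)
    label = string(ex)              # source text of the expression, captured at expansion time
    quote
        value = $(esc(ex))          # evaluate the expression in the caller's scope
        println($label, " = ", repr(value))
        value                       # return the value, like @show does
    end
end

val = 100
@myshow val + 1
```

A function could not do this, because by the time a function is called, `val + 1` has already been evaluated.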

## One of the Most Useful Macros: `@time`

- Provides elapsed time **as well as allocations**
- Cleaning up temporary allocations (garbage collection) is expensive!
- It's impossible to oversell how useful this is

In [None]:
@time rand(10^6);
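
A small sketch of how the allocation numbers guide optimization. The names `doubled` and `doubled!` are made up for this example, and exact timings will vary by machine:

```julia
x = rand(10^6)
y = similar(x)                       # preallocated output buffer

doubled(x) = x .* 2                  # hypothetical: allocates a new array on each call
doubled!(y, x) = (y .= x .* 2; y)    # hypothetical: overwrites y, no new array

doubled(x); doubled!(y, x)           # warm-up calls so compilation is not timed

@time doubled(x);
@time doubled!(y, x);
```

The second `@time` line should report far fewer bytes allocated than the first.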

# Julia Has a Great Type System

In [None]:
rand(2, 2)

In [None]:
typeof(1.0)

In [None]:
typeof(1)

# Types Define Sets of Things

![](tree.png)

- Abstract types "don't exist".  They define a set of things that behave similarly.
- Concrete types "are real".  They exist in a set of things defined by an abstract type.
    - Concrete types do not have subtypes

Type tree from `Any` to `Float64`:
- Any (abstract)
    - Number (abstract)
        - Real (abstract)
            - AbstractFloat (abstract)
                - Float64 (concrete)
                
Is a `Float64` a `Number`? Yes

Is a `Float64` an `AbstractFloat`? Yes

One "set" is smaller than the other: `AbstractFloat <: Number`

In [None]:
supertype(Float64)

In [None]:
supertype(AbstractFloat)

In [None]:
supertype(Real)

In [None]:
supertype(Number)
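
The same questions can be asked in code. A short sketch using the subtype operator `<:` and `isa` (`subtypes` lives in `InteractiveUtils`, which the REPL and notebooks load automatically):

```julia
using InteractiveUtils   # provides subtypes outside the REPL

Float64 <: AbstractFloat      # is Float64 inside the set AbstractFloat?
AbstractFloat <: Number       # is one set contained in the other?
1.0 isa Number                # is this value a member of the set?

isabstracttype(Number)        # abstract types cannot be instantiated
isconcretetype(Float64)       # concrete types can

subtypes(AbstractFloat)       # list the direct subtypes
```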

# Programs are Organized Around Multiple Dispatch

- The idea that different code gets called depending on the types of the arguments
- Multiple dispatch is amazing

In [None]:
f(x::Number) = "This is a Number"
f(x::String) = "This is a String"
f(x) = "This is something else"

In [None]:
f(1)

In [None]:
f("asdf")

In [None]:
f([1, 2])
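
The "multiple" in multiple dispatch means that *all* arguments take part in choosing the method, not just the first. A minimal sketch, with `describe` as a made-up function name:

```julia
# Hypothetical function used only for illustration
describe(x::Number, y::Number) = "two Numbers"
describe(x::Number, y::String) = "a Number, then a String"
describe(x, y) = "something else"

describe(1, 2.0)    # both arguments are Numbers
describe(1, "a")    # the second argument selects a different method
describe("a", 1)    # no specific method matches, so the fallback is used
```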

## Quintessential R vs. Julia

- Consider R's `pnorm`, `dnorm`, `qnorm`, etc. family of functions.
- In Julia, multiple dispatch is used to create a grammar/interface for "how to talk about" a set of things
    - What is the interface for probability distributions?

In [None]:
using Distributions

d = Normal(0, 1)
d2 = Gamma(3, 5);

In [None]:
mean(d), var(d), cdf(d, 1)

In [None]:
mean(d2), var(d2), cdf(d2, 1)

## A Concrete Example

- Here is a very naive Newton's algorithm for finding the quantile `q` of a distribution `d`
- I haven't told Julia anything about types, but this will work as long as 
    1. `d` is something that I can calculate the `mean`, `cdf`, and `pdf` of
    2. `q` is a Number
- Also because of the JIT, I get specialized code for each distribution!

In [None]:
function my_quantile(d, q)
    θ = mean(d)
    for i in 1:20
        θ -= (cdf(d, θ) - q) / pdf(d, θ)
    end
    θ
end

In [None]:
my_quantile(Normal(), .5)

In [None]:
my_quantile(Gamma(5, 1), .3)
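
The same function also works for a type that Distributions has never heard of, as long as the interface is there. A self-contained sketch, assuming a made-up `MyExponential` type and stand-alone `cdf`/`pdf` functions (with Distributions loaded you would extend `Distributions.cdf` and `Distributions.pdf` instead):

```julia
import Statistics: mean   # extend the standard library's mean

# Hypothetical distribution type, not part of Distributions.jl
struct MyExponential
    rate::Float64
end

mean(d::MyExponential) = 1 / d.rate
cdf(d::MyExponential, x) = 1 - exp(-d.rate * x)
pdf(d::MyExponential, x) = d.rate * exp(-d.rate * x)

# Same naive Newton iteration as above, repeated so this block runs on its own
function my_quantile(d, q)
    θ = mean(d)
    for i in 1:20
        θ -= (cdf(d, θ) - q) / pdf(d, θ)
    end
    θ
end

my_quantile(MyExponential(2.0), 0.5)   # the exact median is log(2) / 2
```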

# Julia's Growth (Number of Packages)

![](https://pkg.julialang.org/img/allver.svg)

# Julia's Growth (GitHub Stars)

![](https://pkg.julialang.org/img/stars.svg)

# Pass by Reference

- R behaves as if function arguments are copied (copy-on-modify)
    - You can't do any damage
    - But you lose performance
- In Julia, you're free to really mess with objects inside a function
- By convention, if you are **mutating** an argument, end the function name with `!`

In [None]:
# Don't do this
function totally_safe_function(x)
    x .= 0
end

val = [1,2,3,4]

totally_safe_function(val)

val
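
A sketch of the `!` convention, and of the difference between mutating an argument and merely rebinding a local name. `zero_out!` and `rebind` are made-up names:

```julia
# Hypothetical helpers for illustration
function zero_out!(x)
    x .= 0                   # writes into the caller's array
    return x
end

function rebind(x)
    x = zeros(length(x))     # only rebinds the local name x
    return x
end

a = [1, 2, 3]
rebind(a)        # a is unchanged
zero_out!(a)     # a is now all zeros

v = [3, 1, 2]
w = sort(v)      # Base follows the same convention: sort returns a new array
sort!(v)         # sort! sorts v in place
```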

# Broadcasting

- `sin` of a vector is not defined
- Most languages use the syntactic sugar that `sin` of a vector means "apply `sin` to each element of the vector"
- Julia doesn't, because
    1. It's wrong
    2. It's unnecessary: element-wise application can be generalized to every function of a single value

In [None]:
sin(rand(5))

- Dot syntax does broadcasting/maps the function to each element

In [None]:
sin.(rand(5))

- Multiple broadcasting functions can be chained together (without creating temporary copies)

In [None]:
cos.(sin.(abs.(rand(2, 2))))
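
Dot syntax also works for functions you write yourself, and fused broadcasts can write straight into an existing array. A minimal sketch, with `g` as a made-up scalar function:

```julia
g(x) = x^2 + 1               # hypothetical function defined only for scalars

x = [1.0, 2.0, 3.0]
g.(x)                        # applied element-wise

y = similar(x)
y .= cos.(sin.(abs.(x)))     # one fused loop, result written into y

z = @. cos(sin(abs(x)))      # @. adds a dot to every call in the expression
```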

# Julia is Lazy

- Many types in Julia are lazy
- `AbstractRange` subtypes store the information for generating a range of numbers, not the numbers themselves.

In [None]:
rng = 1:100

In [None]:
typeof(rng)

In [None]:
fieldnames(typeof(rng))

In [None]:
rng.start, rng.stop

In [None]:
rng[50]

- You can typically turn a lazy type into a "real thing" with `collect`

In [None]:
collect(rng)
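
A quick sketch of what laziness buys you: the range stays tiny no matter how many numbers it represents, and generators let you reduce over values without ever storing them.

```julia
r = 1:10^6
v = collect(r)

sizeof(r)            # just the stored endpoints
sizeof(v)            # one slot per element

sum(r) == sum(v)     # same answer either way

gen = (x^2 for x in 1:10)   # a lazy generator, nothing computed yet
sum(gen)                    # squares are produced one at a time
```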

# For Loops

- **In Julia, loops are fast.  Don't avoid them.**
- It took me several weeks of Julia programming before I shook my R habit of vectorizing everything


- Tip: Use `eachindex` to iterate over the indices of a collection

In [None]:
x = rand(5)

for i in eachindex(x)
    println(x[i])
end

In [None]:
for xi in x
    println(xi)
end

In [None]:
for (i, xi) in enumerate(x)
    println("Element $i is $xi")
end
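
To make the "don't avoid loops" point concrete, here is a hand-written sum. The name `loop_sum` is made up; try comparing it to the built-in `sum` with `@time` on your own machine:

```julia
# Hypothetical hand-written reduction, for illustration
function loop_sum(x)
    s = zero(eltype(x))        # start from a zero of the element type (keeps the loop type-stable)
    for i in eachindex(x)
        s += x[i]
    end
    return s
end

x = rand(10^6)
loop_sum(x)
```

Note that the loop lives inside a function, which is what lets Julia compile specialized code for it.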

# Anonymous Functions and the `do` Syntax

- An **anonymous function** is a function without a name, typically one you probably won't use again
    - Created with syntax: `(x,y,z) -> x + y + z`
- `do` blocks:
    - An easy way of writing longer anonymous functions
    - For functions that accept a function as their first argument
- The following are different ways of doing the same thing

In [None]:
map(abs, [-1, -2, -3])

In [None]:
map(x -> abs(x), [-1, -2, -3])

In [None]:
map([-1, -2, -3]) do x
    abs(x)
end
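
`do` blocks show up often with `open`, which accepts a function as its first argument and closes the file when that function finishes. A small sketch using a temporary file:

```julia
path = tempname()                 # a fresh temporary file path

open(path, "w") do io             # the do block becomes open's first argument
    println(io, "written inside a do block")
end                               # the file is closed here, even if an error occurred

contents = read(path, String)
rm(path)                          # clean up

# The same pattern with filter
evens = filter(1:10) do x
    x % 2 == 0
end
```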

# Tuples and NamedTuples

- Efficient way to join heterogeneous objects together in a type-stable way

In [None]:
("I", "am", "a", "tuple", 1 , 2, 3.0)

- You can also give items a name

In [None]:
nt = (x = 1, y = 2)

In [None]:
nt.x
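
A common use is returning several values from a function. A minimal sketch, with `summary_stats` as a made-up name:

```julia
# Hypothetical function returning a NamedTuple
function summary_stats(x)
    (min = minimum(x), max = maximum(x), n = length(x))
end

s = summary_stats([3, 1, 2])
s.min, s.max, s.n

lo, hi = extrema([3, 1, 2])   # plain tuples can be unpacked on assignment
```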

# Creating Your Own Types

- `struct`: Contents that won't change
- `mutable struct`: Contents that may change

In [None]:
struct Population
    x::Vector{Int}
end

struct SampleWithReplacement
    x::Vector{Int}
end

In [None]:
SampleWithReplacement(pop::Population, n) = SampleWithReplacement(rand(pop.x, n))

In [None]:
pop = Population(collect(1:10))

In [None]:
pop.x

In [None]:
SampleWithReplacement(pop, 8)
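
A sketch contrasting the two kinds of struct. All type names here are made up. Note that "won't change" applies to the fields themselves: a `Vector` stored in an immutable struct can still have its elements changed.

```julia
# Hypothetical types for illustration
struct Point
    x::Float64
    y::Float64
end

mutable struct Counter
    n::Int
end

increment!(c::Counter) = (c.n += 1; c)

struct Bag
    items::Vector{Int}
end

p = Point(1, 2)          # arguments are converted to Float64
# p.x = 3.0              # would throw: Point is immutable

c = Counter(0)
increment!(c)            # fields of a mutable struct can be reassigned

b = Bag([1])
push!(b.items, 2)        # allowed: the vector is mutated, the field is not reassigned
```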

# Interop

- You don't need to leave your favorite R/Python/C/Fortran/C++ code behind
- All are easily callable from Julia

## R

You can "send" objects from Julia to R using interpolation syntax `$`

In [None]:
using RCall

x = randn(100)

R"hist($x)"

In [None]:
R"library(ggplot2); qplot($x)"

## Python

In [None]:
using PyCall

nr = pyimport("numpy.random")  # `@pyimport` is deprecated in newer PyCall versions

nr.rand(3, 4)

# REPL Modes

- Your first experience with Julia is probably through the REPL (read-eval-print loop)

There are several **REPL Modes** that can be activated by certain characters:

- `?` (help)
- `]` (package manager)
- `;` (shell)
- `$` (R via [RCall.jl](https://github.com/JuliaInterop/RCall.jl))

# Linear Algebra

- **I could do multiple lectures on numerical linear algebra in Julia.  It's fantastic.**
- Call `BLAS` functions directly
- In-place (mutating) operations lead to huge performance gains!
- Types for storing matrix factorizations for quickly solving linear systems, etc.

In [None]:
using LinearAlgebra

In [None]:
x = randn(10, 2)

lu(x)

In [None]:
c = cholesky(x'x)

In [None]:
inv(c)  # You very rarely need to do this

In [None]:
svd(x)

In [None]:
eigen(x'x)
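
A sketch of how factorization objects and in-place operations fit together, using least squares as the example. The variable names are arbitrary:

```julia
using LinearAlgebra

A = randn(100, 5)
y = randn(100)

F = qr(A)                        # factorization object, reusable for many right-hand sides
β = F \ y                        # least-squares solution

C = cholesky(Symmetric(A'A))     # alternative: Cholesky factor of the normal equations
β2 = C \ (A'y)                   # solve without ever forming an inverse

out = zeros(5)
mul!(out, A', y)                 # in-place multiply: the result is written into out
```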

# Standard Library

- Some things you may expect to already be in Julia need to be loaded
    - `using Statistics`
    - `using LinearAlgebra`
    - `using DelimitedFiles`

# [State of the Art Packages in Julia 1.0](http://www.stochasticlifestyle.com/some-state-of-the-art-packages-in-julia-v1-0/) (Read this.  Seriously.)

# [OnlineStats.jl](https://github.com/joshday/OnlineStats.jl) (Single-pass algorithms for statistics)

- Essentially my PhD research
- Handles data that is streaming and/or larger than memory

In [None]:
using OnlineStats

x = randn(10^6)

o = Series(Mean(), Variance(), P2Quantile(.5))

for xi in x
    fit!(o, xi)
end

o

In [None]:
using Plots

h = fit!(Hist(100), randn(10^6))

plot(h)
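
To illustrate what "single-pass" means, here is a sketch of Welford's online algorithm for the mean and variance. This is not OnlineStats' implementation; `RunningStats`, `update!`, and `variance` are made-up names.

```julia
# Hypothetical type, not part of OnlineStats
mutable struct RunningStats
    n::Int
    mean::Float64
    m2::Float64      # running sum of squared deviations from the mean
end
RunningStats() = RunningStats(0, 0.0, 0.0)

# Update the summary with one observation; the data itself is never stored
function update!(s::RunningStats, x)
    s.n += 1
    δ = x - s.mean
    s.mean += δ / s.n
    s.m2 += δ * (x - s.mean)
    return s
end

variance(s::RunningStats) = s.m2 / (s.n - 1)   # unbiased sample variance

s = RunningStats()
for xi in [1.0, 2.0, 3.0, 4.0]
    update!(s, xi)
end
s.mean, variance(s)
```

Each observation is seen once and then discarded, so memory use stays constant however long the stream runs.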

# [Flux.jl](https://github.com/FluxML/Flux.jl) (Neural Networks)

Flux has powerful high-level features, and common architectures can be defined in a few lines.

```julia
model = Chain(
    Dense(768, 128, σ),
    LSTM(128, 256),
    LSTM(256, 128),
    Dense(128, 10),
    softmax
)

loss(x, y) = crossentropy(model(x), y)

Flux.train!(loss, data, ADAM(...))
```

# [DifferentialEquations.jl](https://github.com/JuliaDiffEq/DifferentialEquations.jl)

![](diffeq.png)

# [JuMP.jl](https://github.com/JuliaOpt/JuMP.jl) (Optimization)

```julia
using JuMP
using Clp

m = Model(solver = ClpSolver())
@variable(m, 0 <= x <= 2 )
@variable(m, 0 <= y <= 30 )

@objective(m, Max, 5x + 3*y )
@constraint(m, 1x + 5y <= 3.0 )

print(m)

status = solve(m)

println("Objective value: ", getobjectivevalue(m))
println("x = ", getvalue(x))
println("y = ", getvalue(y))
```

# [Interact.jl](https://github.com/JuliaGizmos/Interact.jl)

In [1]:
using Interact, Plots, Random

@manipulate for i in 1:50
    Random.seed!(123)
    scatter(rand(i), alpha = i/50, ylim=(0,1), xlim=(0,50))
    plot!(sin, 0, i)
end
