*Why should you care about all this?*

# Composability

* lightweight parametric type system
* multiple dispatch
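As a minimal illustration of how these two features interact, here is a toy parametric type plus a generic function whose method is picked from the types of *all* its arguments. `Point` and `combine` are hypothetical names made up for this sketch:

```julia
# Hypothetical parametric type: T is a type parameter, so Point{Int64}
# and Point{Float64} are distinct concrete types sharing one definition.
struct Point{T}
    x::T
    y::T
end

# Multiple dispatch: Julia selects the method from the types of every argument.
combine(a::Number, b::Number) = a + b
combine(a::Point, b::Point)   = Point(combine(a.x, b.x), combine(a.y, b.y))
combine(a::Point, s::Number)  = Point(combine(a.x, s), combine(a.y, s))

p = Point(1, 2)        # Point{Int64}
q = Point(1.5, 2.5)    # Point{Float64}

pq = combine(p, q)     # Point{Float64}(2.5, 4.5)
ps = combine(p, 10)    # Point{Int64}(11, 12)
```

Neither `combine` method for `Point` mentions a concrete element type, so any element type that supports `+` works; this is the same property the examples below exploit.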

Let's say we've developed some application code:

In [1]:
foo(x, y) = x*y + x - y

foo (generic function with 1 method)

In [2]:
foo(1, 2)

1

The code was intended to be used on regular CPU arrays:

In [4]:
a = [1,2]
b = [3,4]
foo.(a, b)

2-element Array{Int64,1}:
 1
 6

Because Julia is dynamically typed and `foo` places no restrictions on its argument types, we can **change the container type**, as long as all the necessary operations are available:

In [9]:
using CuArrays
gpu_a = CuArray(a)
gpu_b = CuArray(b)
foo.(gpu_a, gpu_b)

2-element CuArray{Int64,1}:
 1
 6

We can also **change the element type**, e.g. to work with `Dual` numbers that track derivatives of operations:

In [6]:
using ForwardDiff
diff_a = ForwardDiff.Dual.(a, 1)
diff_b = ForwardDiff.Dual.(b, 1)
foo.(diff_a, diff_b)

2-element Array{ForwardDiff.Dual{Nothing,Int64,1},1}:
 Dual{Nothing}(1,4)
 Dual{Nothing}(6,6)
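To see why this works without touching `foo`, here is a stripped-down dual number written from scratch. `MyDual` is a hypothetical type for illustration only; it is not ForwardDiff's `Dual`, which is far more complete. It only defines the three operators `foo` uses, each encoding the corresponding derivative rule:

```julia
# Hypothetical minimal dual number: a value plus the derivative carried along.
struct MyDual{T<:Real}
    val::T   # primal value
    der::T   # derivative with respect to the seeded input
end

# Extend Base arithmetic for MyDual; each method is a differentiation rule.
Base.:+(a::MyDual, b::MyDual) = MyDual(a.val + b.val, a.der + b.der)
Base.:-(a::MyDual, b::MyDual) = MyDual(a.val - b.val, a.der - b.der)
Base.:*(a::MyDual, b::MyDual) = MyDual(a.val * b.val, a.der * b.val + a.val * b.der)  # product rule

foo(x, y) = x*y + x - y   # the unchanged application code

# Seed every input with derivative 1, mirroring ForwardDiff.Dual.(a, 1) above.
r = foo.(MyDual.([1, 2], 1), MyDual.([3, 4], 1))
```

The values and derivatives in `r` match the `Dual{Nothing}(1,4)` and `Dual{Nothing}(6,6)` printed above.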

Dual numbers underpin ForwardDiff.jl, Julia's forward-mode automatic differentiation package, so we can take gradients directly:

In [7]:
bar = x -> sum(foo.(x, b))
grad = x -> ForwardDiff.gradient(bar, x)
grad(a)

2-element Array{Int64,1}:
 4
 5
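As a quick check of that output, the gradient can be worked out by hand. With `b` held fixed at $(3, 4)$, each term of the sum depends on only one $x_i$:

```latex
\mathrm{bar}(x) = \sum_i \left( x_i b_i + x_i - b_i \right),
\qquad
\frac{\partial\,\mathrm{bar}}{\partial x_i} = b_i + 1,
\qquad
\nabla\mathrm{bar}(x) = (3 + 1,\; 4 + 1) = (4,\; 5).
```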

Going even further, we can **compose the infrastructure** and run dual-number arithmetic on the GPU:

In [8]:
gpu_diff_a = CuArray(diff_a)
gpu_diff_b = CuArray(diff_b)
foo.(gpu_diff_a, gpu_diff_b)

2-element CuArray{ForwardDiff.Dual{Nothing,Int64,1},1}:
 Dual{Nothing}(1,4)
 Dual{Nothing}(6,6)

The package ecosystem is still working out the interface designs that maximize these opportunities, but this is already very powerful!

# Speed

Language designed around a JIT compiler:
* fine-grained specialization
* avoiding program uncertainty (type stability)
* compilable abstractions
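A small sketch of what "program uncertainty" means in practice. The functions `stable` and `unstable` are hypothetical examples; `Base.return_types` queries the compiler's type inference (interactively, `@code_warntype unstable(1)` shows the same information):

```julia
# Type-stable: the return type depends only on the argument types.
stable(x::Int) = x > 0 ? x : 0

# Type-unstable: returns an Int or a Float64 depending on the *value* of x,
# so the compiler can only infer a Union and must handle both cases.
unstable(x::Int) = x > 0 ? x : 0.0

rt_stable   = Base.return_types(stable, (Int,))    # [Int64]
rt_unstable = Base.return_types(unstable, (Int,))  # [Union{Float64, Int64}]
```

Code that infers to a single concrete type lets the JIT emit specialized machine code without runtime type checks.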

Reuse of high-quality libraries:
* BLAS
* LAPACK
* ARPACK
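A minimal sketch of that reuse, assuming Julia 0.7 or later where linear algebra lives in the `LinearAlgebra` standard library. For dense `Float64` matrices, the generic calls below are backed by BLAS and LAPACK routines, and the BLAS wrapper can also be called directly:

```julia
using LinearAlgebra

A = [4.0 1.0; 1.0 3.0]
x = [1.0, 2.0]

# In-place matrix-vector product; for dense Float64 data this goes to BLAS.
y = similar(x)
mul!(y, A, x)

# Calling the BLAS gemv wrapper explicitly ('N' = no transpose).
y_blas = LinearAlgebra.BLAS.gemv('N', A, x)

# For a general square dense matrix, `\` solves via an LU factorization (LAPACK).
z = A \ y
```

ARPACK is not part of the standard library; it is reached through a separate package (Arpack.jl).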

![Julia performance](img/julia_perf.png)

# First-class compiler
*Note: personal bias.*

## CUDAnative.jl

![Compiler interfaces](img/interfaces.png)

## SIMD.jl

Julia already has `@simd` for annotating `for` loops, which helps the compiler autovectorize them.
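For comparison, a minimal `@simd` loop. `simd_sum` is a hypothetical example; the macro only permits reordering, and whether vectorization actually happens is up to the compiler:

```julia
# @simd allows the loop iterations to be reordered, which lets the compiler
# vectorize this floating-point reduction (results may differ in the last bits
# from a strictly sequential sum for general inputs).
function simd_sum(xs::Vector{Float64})
    s = 0.0
    @inbounds @simd for i in eachindex(xs)
        s += xs[i]
    end
    return s
end

simd_sum([1.0, 2.0, 3.0, 4.0])   # 10.0
```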

With SIMD.jl, we can express SIMD-vectorized operations naturally.

```julia
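using SIMD

# Add ys into xs in place, N elements of type T per iteration.
function vadd!(xs::Vector{T}, ys::Vector{T}, ::Type{Vec{N,T}}) where {N,T}
    @assert length(ys) == length(xs)
    @assert length(xs) % N == 0          # no scalar remainder loop
    @inbounds for i in 1:N:length(xs)
        xv = vload(Vec{N,T}, xs, i)      # load N contiguous elements
        yv = vload(Vec{N,T}, ys, i)
        xv += yv                         # a single vector addition
        vstore(xv, xs, i)                # write N elements back
    end
    return xs
end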
```

## Cassette.jl
*Cassette lets you easily extend the Julia language by directly injecting the Julia compiler with new, context-specific behaviors.*
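A minimal sketch of the idea, assuming Cassette.jl is installed on a Julia version it supports. `SinToCosCtx` and `g` are hypothetical names; the overdub method below reroutes every `sin` call made while executing under that context, including calls nested inside other functions:

```julia
using Cassette

# Define a context type; behaviors attached to it apply only under overdub.
Cassette.@context SinToCosCtx

# Within SinToCosCtx, replace calls to sin with cos.
Cassette.overdub(::SinToCosCtx, ::typeof(sin), x) = cos(x)

g(x) = sin(x) + 1.0

plain     = g(0.0)                                  # 1.0, normal execution
contextual = Cassette.overdub(SinToCosCtx(), g, 0.0) # 2.0, sin swapped for cos
```

Note that `g` itself is never modified; the new behavior is injected at compile time only for calls made through the context.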