# Julia's secret sauce

Julia looks/feels like Matlab/Python but runs like C/Fortran. How?

  * **Just-in-time compilation** (JIT)
    - user-level code is compiled to machine code on-the-fly  
      
      
      
  * **Meticulous type system**
    - designed to maximize impact of JIT
    - type inference
    - type stability
    - multiple dispatch



## Just-in-time compilation 

In [3]:
f(x) = x^3 - 2     # define a simple function
@time f(0.3);      # run it once
@time f(0.3);      # run it again

  0.012669 seconds (299 allocations: 17.557 KiB)
  0.000005 seconds (5 allocations: 176 bytes)


The second evaluation is thousands of times faster than the first! Why? 

  * first run includes compilation of user code to machine code
  * second run just executes the machine code
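
A minimal sketch of the same effect, assuming a hypothetical function `g` that stands in for `f` (it is not part of the original notebook). Compilation happens once per combination of argument types, so the first call with a new argument type should pay the compilation cost again. Timings depend on the machine and Julia version, so none are quoted here.

```julia
# g is a hypothetical stand-in for the f defined above
g(x) = x^3 - 2

t_first   = @elapsed g(0.3)   # first Float64 call: includes compilation
t_second  = @elapsed g(0.3)   # second Float64 call: reuses the compiled code
t_newtype = @elapsed g(3)     # first Int call: a new specialization is compiled

println("first Float64 call:  ", t_first, " s")
println("second Float64 call: ", t_second, " s")
println("first Int call:      ", t_newtype, " s")
```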

### Compilation to machine code, in stages

In [5]:
@code_lowered f(7.0)   # show f(x) in Julia's lowered (desugared) form

CodeInfo(:(begin 
        nothing
        return (Base.literal_pow)(Main.^, x, (Core.apply_type)(Base.Val, 3)) - 2
    end))

In [6]:
@code_typed f(7.0)     # show the lowered code for f(x) with inferred types

CodeInfo(:(begin 
        return (Base.sub_float)((Base.mul_float)((Base.mul_float)(x, x)::Float64, x)::Float64, (Base.sitofp)(Float64, 2)::Float64)::Float64
    end))=>Float64

In [7]:
@code_llvm f(7.0)       # show f(x) in LLVM (compiler) intermediate language


define double @julia_f_60969(double) #0 !dbg !5 {
top:
  %1 = fmul double %0, %0
  %2 = fmul double %1, %0
  %3 = fadd double %2, -2.000000e+00
  ret double %3
}


In [8]:
@code_native f(7.0)     # show f(x) in x86-64 assembly language

	.text
Filename: In[1]
	pushq	%rbp
	movq	%rsp, %rbp
Source line: 1
	movapd	%xmm0, %xmm1
	mulsd	%xmm1, %xmm1
	mulsd	%xmm0, %xmm1
	movabsq	$140452003107272, %rax  # imm = 0x7FBD87C0D1C8
	addsd	(%rax), %xmm1
	movapd	%xmm1, %xmm0
	popq	%rbp
	retq
	nopw	%cs:(%rax,%rax)
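
The macros above also have function forms, which can be handy for capturing each stage as a value instead of printing it. The sketch below assumes a hypothetical function `g` standing in for `f`; the exact text of the LLVM and native output varies with Julia version and CPU.

```julia
using InteractiveUtils   # provides code_llvm and code_native outside the REPL/notebook

# g is a hypothetical stand-in for the f defined above
g(x) = x^3 - 2

lowered = code_lowered(g, (Float64,))          # vector of lowered CodeInfo objects
typed   = code_typed(g, (Float64,))            # vector of CodeInfo => return-type pairs
llvm    = sprint(code_llvm, g, (Float64,))     # LLVM IR captured as a String
native  = sprint(code_native, g, (Float64,))   # assembly captured as a String

println("inferred return type: ", last(first(typed)))
println(llvm)
```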


## Type inference

Julia infers the types of untyped variables, which is crucial for compiling to machine code!

In [4]:
@code_llvm f(7.0)   # f applied to a Float64 (C's "double" type)


define double @julia_f_60989(double) #0 !dbg !5 {
top:
  %1 = fmul double %0, %0
  %2 = fmul double %1, %0
  %3 = fadd double %2, -2.000000e+00
  ret double %3
}


In [5]:
@code_llvm f(7f0)   # f applied to a Float32  (C's "float" type)


define float @julia_f_60993(float) #0 !dbg !5 {
top:
  %1 = fmul float %0, %0
  %2 = fmul float %1, %0
  %3 = fadd float %2, -2.000000e+00
  ret float %3
}


In [6]:
@code_llvm f(7)     # f applied to an Int64   (C's "int64_t" type)


define i64 @julia_f_60995(i64) #0 !dbg !5 {
top:
  %1 = mul i64 %0, %0
  %2 = mul i64 %1, %0
  %3 = add i64 %2, -2
  ret i64 %3
}


In [7]:
@code_llvm f(7 + 2im) # f applied to a Complex is more complex


define void @julia_f_60996(%Complex.62* noalias nocapture sret, %Complex.62* nocapture readonly dereferenceable(16)) #0 !dbg !5 {
top:
  %2 = getelementptr inbounds %Complex.62, %Complex.62* %1, i64 0, i32 0
  %3 = load i64, i64* %2, align 8
  %4 = mul i64 %3, %3
  %5 = getelementptr inbounds %Complex.62, %Complex.62* %1, i64 0, i32 1
  %6 = load i64, i64* %5, align 8
  %7 = mul i64 %6, %6
  %8 = sub i64 %4, %7
  %9 = shl i64 %6, 1
  %10 = mul i64 %3, %9
  %11 = mul i64 %8, %3
  %12 = mul i64 %10, %6
  %13 = mul i64 %8, %6
  %14 = mul i64 %3, %10
  %15 = add i64 %14, %13
  %16 = add i64 %11, -2
  %17 = sub i64 %16, %12
  %.sroa.0.0..sroa_idx = getelementptr inbounds %Complex.62, %Complex.62* %0, i64 0, i32 0
  store i64 %17, i64* %.sroa.0.0..sroa_idx, align 8
  %.sroa.2.0..sroa_idx1 = getelementptr inbounds %Complex.62, %Complex.62* %0, i64 0, i32 1
  store i64 %15, i64* %.sroa.2.0..sroa_idx1, align 8
  ret void
}
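
The inferred return type can also be queried directly rather than read off the LLVM output. This is a small sketch that assumes a hypothetical function `g` standing in for `f` and uses `Base.return_types`, an introspection helper rather than something to rely on in production code.

```julia
# g is a hypothetical stand-in for the f defined above
g(x) = x^3 - 2

# Ask the compiler which return type it infers for each argument type
for T in (Float64, Float32, Int64, Complex{Int64})
    R = first(Base.return_types(g, (T,)))
    println("g(::", T, ") returns ", R)
end
```

For this particular function the inferred return type should match the argument type in each case, which is why each call above compiles to arithmetic on a single machine type.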


## Type stability

Just-in-time compilation produces the most efficient machine code when the types of temporaries and return values can be determined at compile time. 

In Julia, `sqrt` of a real argument returns a real result, and `sqrt(-1)` is an error! Matlab avoids the error by returning a complex result for negative input, so the type of the result depends on the value of the argument rather than on its type, and downstream code must be prepared for either case.

In [13]:
sqrt(1.0)      

1.0

In [14]:
sqrt(-1.0)

LoadError: DomainError:
sqrt will only return a complex result if called with a complex argument. Try sqrt(complex(x)).

In [27]:
sqrt(-1.0 + 0.0im)   # In Julia, if you want a complex sqrt result, use the complex sqrt function

0.0 + 1.0im
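
To make the idea concrete, here is a hedged sketch contrasting a type-unstable square root with a type-stable one. Both `unstable_sqrt` and `stable_sqrt` are hypothetical names invented for this illustration, and `Base.return_types` is used only as an introspection aid.

```julia
# Hypothetical example: the return type depends on the *value* of x
unstable_sqrt(x::Float64) = x < 0 ? sqrt(complex(x)) : sqrt(x)

# Hypothetical example: the return type depends only on the *type* of x
stable_sqrt(x::Float64) = sqrt(complex(x))

T_unstable = first(Base.return_types(unstable_sqrt, (Float64,)))
T_stable   = first(Base.return_types(stable_sqrt, (Float64,)))

println("unstable_sqrt infers ", T_unstable, ", concrete? ", isconcretetype(T_unstable))
println("stable_sqrt infers   ", T_stable, ", concrete? ", isconcretetype(T_stable))
```

When the inferred type is not concrete, the compiler has to generate code that handles more than one possibility at run time, which is the cost Julia's `sqrt` design avoids.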

## Multiple dispatch

Many built-in functions have multiple versions specialized for particular input types. Selection is done at compile time if types can be inferred, run time if they can't.

In [15]:
methods(\)

 Type inference and multiple dispatch allow the just-in-time compiler to translate complex chains of type-stable Julia code into efficient machine code.
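
User-defined functions get multiple dispatch in the same way as built-ins. The sketch below uses a hypothetical function `describe` (not from the original notebook) with three methods; Julia picks the most specific method that matches the types of all arguments.

```julia
# describe is a hypothetical function defined only for this illustration
describe(x::Integer, y::Integer)             = "two integers"
describe(x::AbstractFloat, y::AbstractFloat) = "two floats"
describe(x::Number, y::Number)               = "mixed numbers"   # generic fallback

println(describe(1, 2))       # both arguments are Integers
println(describe(1.0, 2.0))   # both arguments are AbstractFloats
println(describe(1, 2.0))     # only the generic Number method matches
println(length(methods(describe)), " methods defined")
```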

## Iterated logistic map in Julia, C++, and Matlab

Define $f(x) = 4x(1-x)$ and generate its millionth iterate $f^N(x)$, with $N = 10^6$.

In [1]:
# define function that, given an f, returns iterated function f^N
function iterator(f, N)
    
    # construct f^N
    function fN(x)
      for i ∈ 1:N             
        x = f(x)
      end
      x
    end    
    
    fN     # return f^N
end

# define logistic map function
f(x) = 4*x*(1-x)

# use iterator function to construct millionth iterate of logistic map
fᴺ = iterator(f, 10^6)  

(::fN) (generic function with 1 method)

In [4]:
@time fᴺ(0.67);
@time fᴺ(0.67);

  0.003977 seconds (5 allocations: 176 bytes)
  0.003629 seconds (5 allocations: 176 bytes)


0.10116885334547539
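
A quick sanity check of the closure construction, on small cases where the answer can be verified by hand. The names `iterate_map`, `logistic`, `h3`, and `h0` are hypothetical and chosen to avoid clashing with the notebook's `iterator`, `f`, and `fᴺ`.

```julia
# Same construction as iterator(f, N) above, under hypothetical names
function iterate_map(h, N)
    function hN(x)
        for i in 1:N
            x = h(x)
        end
        x
    end
    hN   # return the N-fold iterate of h
end

logistic(x) = 4*x*(1-x)

h3 = iterate_map(logistic, 3)   # three applications of the map
h0 = iterate_map(logistic, 0)   # zero applications: the identity

@assert h3(0.67) ≈ logistic(logistic(logistic(0.67)))
@assert h0(0.67) == 0.67
```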

### Equivalent C++ code

Note: a leading semicolon tells Julia to run the rest of the line as a Unix shell command.

In [16]:
; pwd

/home/gibson/gitworking/whyjulia


In [18]:
; cat fmillion.cpp

#include <stdlib.h>
#include <iostream>
#include <iomanip>
#include <ctime>

using namespace std;

double f(double x) {
  return 4*x*(1-x);
}

int main(int argc, char* argv[]) {
  double x = argc > 1 ? atof(argv[1]) : 0.0;

  double t0 = clock();
  for (int n=0; n<1000000; ++n)
    x = f(x);
  double t1 = clock();

  cout << "t = " << (t1-t0)/CLOCKS_PER_SEC << " seconds" << endl;
  cout << setprecision(17);
  cout << "x = " << x << endl;
  
  return 0;
}
  


In [19]:
; g++ -O3 -o fmillion fmillion.cpp

### Execution time for C++

In [5]:
; fmillion 0.67

t = 0.003234 seconds
x = 0.10116885334547539


### Execution time for Julia

In [7]:
print("t="); 
@time x = fᴺ(0.67);
@show x;

t=  0.003590 seconds (5 allocations: 176 bytes)
x = 0.10116885334547539


Execution times $t$ are comparable: sometimes Julia is faster, sometimes C++. The millionth iterates $x$ are identical, which indicates that both programs perform the same sequence of floating-point operations. 
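
Because single timings at the millisecond scale are noisy, one way to compare more fairly is to repeat the measurement and look at the minimum. This is only a sketch using `@elapsed` from Base, with hypothetical names `iterate_map`, `logistic`, `logisticN`, and `times`; the number of repetitions is arbitrary.

```julia
# Same construction as iterator(f, N) above, under hypothetical names
function iterate_map(h, N)
    function hN(x)
        for i in 1:N
            x = h(x)
        end
        x
    end
    hN
end

logistic(x) = 4*x*(1-x)
logisticN = iterate_map(logistic, 10^6)

logisticN(0.67)   # warm-up call, so compilation is excluded from the timings

# Repeat the measurement and report the spread
times = [@elapsed logisticN(0.67) for _ in 1:10]
println("min = ", minimum(times), " s, max = ", maximum(times), " s")
```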

### Execution time in Matlab

```
>> tic(); x=fN(0.67); t=toc();
>> t,x
t = 0.048889000000000
x = 0.101168853345475
```
Same number to the digits shown, but roughly 13 to 15 times slower than the Julia and C++ runs above.