# Julia's secret sauce

Julia looks and feels like Matlab/Python but runs like C/Fortran. How?

  * **Just-in-time compilation** (JIT)
    - user-level code is compiled to machine code on-the-fly

  * **Meticulous type system**
    - designed to maximize impact of JIT
    - type inference
    - type stability
    - multiple dispatch



## Just-in-time compilation 

In [1]:
f(x) = x^3 - 2     # define a simple function
@time f(0.3);      # run it once
@time f(0.3);      # run it again

  0.004629 seconds (1.20 k allocations: 74.862 KiB)
  0.000002 seconds (5 allocations: 176 bytes)


The second evaluation is thousands of times faster than the first! Why? 

  * the first run includes compiling the user code to machine code
  * the second run just executes the already-compiled machine code
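
Compilation happens once per combination of argument types, so calling the same function with a new argument type triggers another compile. The following is a minimal sketch, assuming Julia 1.x. `cube_shift` is a hypothetical name chosen to avoid clashing with `f` above, and the measured times will differ between machines and Julia versions.

```julia
# Hypothetical stand-in for f; a fresh name guarantees nothing is compiled yet.
cube_shift(x) = x^3 - 2

t_first  = @elapsed cube_shift(0.3)   # compiles the Float64 specialization, then runs it
t_second = @elapsed cube_shift(0.3)   # reuses the compiled Float64 code
t_int    = @elapsed cube_shift(3)     # new argument type (Int): compiles again

println("Float64, first call:  ", t_first,  " s")
println("Float64, second call: ", t_second, " s")
println("Int, first call:      ", t_int,    " s")
```

On a typical run the first and third timings should be noticeably larger than the second, since only they include compilation.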

### Compilation to machine code, in stages

In [2]:
@code_lowered f(7.0)   # show f(x) in Julia's lowered (desugared) form

CodeInfo(:(begin 
        nothing
        return (Base.literal_pow)(Main.^, x, (Core.apply_type)(Base.Val, 3)) - 2
    end))

In [3]:
@code_typed f(7.0)     # show f(x) in lowered form after type inference, with types annotated

CodeInfo(:(begin 
        return (Base.sub_float)((Base.mul_float)((Base.mul_float)(x, x)::Float64, x)::Float64, (Base.sitofp)(Float64, 2)::Float64)::Float64
    end))=>Float64

In [4]:
@code_llvm f(7.0)       # show f(x) in LLVM (compiler) intermediate language


define double @julia_f_60926(double) #0 !dbg !5 {
top:
  %1 = fmul double %0, %0
  %2 = fmul double %1, %0
  %3 = fadd double %2, -2.000000e+00
  ret double %3
}


In [5]:
@code_native f(7.0)     # show f(x) in x86-64 assembly language

	.text
Filename: In[1]
	pushq	%rbp
	movq	%rsp, %rbp
Source line: 1
	movapd	%xmm0, %xmm1
	mulsd	%xmm1, %xmm1
	mulsd	%xmm0, %xmm1
	movabsq	$140280141402224, %rax  # imm = 0x7F9583FF5070
	addsd	(%rax), %xmm1
	movapd	%xmm1, %xmm0
	popq	%rbp
	retq
	nopw	%cs:(%rax,%rax)


## Type inference

Julia infers the types of untyped variables, which is crucial for compiling to efficient machine code!

In [6]:
@code_llvm f(7.0)   # f applied to a Float64 (C's "double" type)


define double @julia_f_60926(double) #0 !dbg !5 {
top:
  %1 = fmul double %0, %0
  %2 = fmul double %1, %0
  %3 = fadd double %2, -2.000000e+00
  ret double %3
}


In [7]:
@code_llvm f(7f0)   # f applied to a Float32  (C's "float" type)


define float @julia_f_61104(float) #0 !dbg !5 {
top:
  %1 = fmul float %0, %0
  %2 = fmul float %1, %0
  %3 = fadd float %2, -2.000000e+00
  ret float %3
}


In [8]:
@code_llvm f(7)     # f applied to an Int64   (C's "int64_t" type)


define i64 @julia_f_61106(i64) #0 !dbg !5 {
top:
  %1 = mul i64 %0, %0
  %2 = mul i64 %1, %0
  %3 = add i64 %2, -2
  ret i64 %3
}


In [9]:
@code_llvm f(7 + 2im) # f applied to a Complex is more complex


define void @julia_f_61110(%Complex.62* noalias nocapture sret, %Complex.62* nocapture readonly dereferenceable(16)) #0 !dbg !5 {
top:
  %2 = getelementptr inbounds %Complex.62, %Complex.62* %1, i64 0, i32 0
  %3 = load i64, i64* %2, align 8
  %4 = mul i64 %3, %3
  %5 = getelementptr inbounds %Complex.62, %Complex.62* %1, i64 0, i32 1
  %6 = load i64, i64* %5, align 8
  %7 = mul i64 %6, %6
  %8 = sub i64 %4, %7
  %9 = shl i64 %6, 1
  %10 = mul i64 %3, %9
  %11 = mul i64 %8, %3
  %12 = mul i64 %10, %6
  %13 = mul i64 %8, %6
  %14 = mul i64 %3, %10
  %15 = add i64 %14, %13
  %16 = add i64 %11, -2
  %17 = sub i64 %16, %12
  %.sroa.0.0..sroa_idx = getelementptr inbounds %Complex.62, %Complex.62* %0, i64 0, i32 0
  store i64 %17, i64* %.sroa.0.0..sroa_idx, align 8
  %.sroa.2.0..sroa_idx1 = getelementptr inbounds %Complex.62, %Complex.62* %0, i64 0, i32 1
  store i64 %15, i64* %.sroa.2.0..sroa_idx1, align 8
  ret void
}
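
The inferred return type can also be queried directly, without reading LLVM output. This is a minimal sketch, assuming Julia 1.x. It uses `Base.return_types`, an unexported reflection helper whose behavior may change between versions, and `g` is a hypothetical copy of `f`.

```julia
g(x) = x^3 - 2    # hypothetical copy of f from the cells above

# Ask the compiler what g returns for each argument type, without calling g.
for T in (Float64, Float32, Int64, Complex{Int64})
    inferred = Base.return_types(g, (T,))[1]
    println(rpad(string(T), 16), " -> ", inferred)
end
```

Each argument type should map to a single concrete return type, which is what lets the compiler emit the specialized code shown above.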


## Type stability

Just-in-time compilation produces the most efficient machine code when the types of temporaries and return values can be determined at compile time. That is, the return type of a function should depend only on the *types* of its arguments, and not on their *values*. This is called *type stability*.

For example, `sqrt(x)` always returns a real number for a real argument, so `sqrt(-1)` is an error!

In [10]:
sqrt(2)      

1.4142135623730951

In [11]:
sqrt(-1.0)

LoadError: DomainError:
sqrt will only return a complex result if called with a complex argument. Try sqrt(complex(x)).

In Julia, if you want a complex `sqrt` result, pass a complex argument:

In [12]:
sqrt(-1.0 + 0.0im)  

0.0 + 1.0im

Matlab avoids this problem and other type problems by treating all numbers as potentially complex 1 x 1 matrices, which adds size and run-time cost to real-valued mathematics.

Note that you can write functions in Julia whose return type depends on the value of the arguments, but they won't compile efficiently. In Julia this is known as *type instability*.
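
To make the distinction concrete, here is a small sketch of a type-unstable function next to a type-stable rewrite. The names `clamp_unstable` and `clamp_stable` are hypothetical, and `Base.return_types` is an unexported reflection helper used here only for illustration (Julia 1.x assumed).

```julia
# Unstable: for a Float64 argument this returns either a Float64 (x) or an Int (0),
# depending on the *value* of x.
clamp_unstable(x) = x > 0 ? x : 0

# Stable: zero(x) has the same type as x, so the return type depends only on typeof(x).
clamp_stable(x) = x > 0 ? x : zero(x)

unstable_type = Base.return_types(clamp_unstable, (Float64,))[1]
stable_type   = Base.return_types(clamp_stable,   (Float64,))[1]

println("unstable: ", unstable_type, "  concrete? ", isconcretetype(unstable_type))
println("stable:   ", stable_type,   "  concrete? ", isconcretetype(stable_type))
```

In an interactive session, `@code_warntype clamp_unstable(1.0)` should flag the same non-concrete return type.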

## Multiple dispatch

Many built-in functions have multiple versions (*methods*), each specialized for particular argument types. The method is selected at compile time if the types of the arguments can be inferred, and at run time if they can't.

In [13]:
methods(+)

Type inference and multiple dispatch allow the just-in-time compiler to translate complex chains of type-stable Julia code into efficient machine code.
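
User-defined functions work the same way as built-in ones. The sketch below defines three methods of one function and lets Julia pick among them by argument types. `combine` is a hypothetical name used only for illustration.

```julia
# Three methods of one hypothetical function, distinguished by argument types.
combine(a::Integer, b::Integer)             = "two integers"
combine(a::AbstractFloat, b::AbstractFloat) = "two floats"
combine(a, b)                               = "fallback for anything else"

println(combine(1, 2))        # dispatches on (Int, Int)
println(combine(1.0, 2.0))    # dispatches on (Float64, Float64)
println(combine(1, 2.0))      # no specific match, so the fallback method is used
println(length(methods(combine)), " methods defined")
```

Dispatch considers the types of all arguments, not just the first, which is what distinguishes multiple dispatch from single-dispatch object-oriented method calls.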

## Iterated logistic map in Julia, C++, and Matlab

Define $f(x) = 4x(1-x)$ and generate its millionth iterate, the function $f^N(x) = f(f(\ldots f(f(x)) \ldots))$ with $N = 10^6$.

In [14]:
# define function that, given an f, returns iterated function f^N
function iterator(f, N)

    # construct f^N
    function fN(x)
        for i ∈ 1:N
            x = f(x)
        end
        x
    end

    fN     # return f^N
end

# define logistic map function
f(x) = 4*x*(1-x)

# use iterator function to construct millionth iterate of logistic map
fᴺ = iterator(f, 10^6)  

(::fN) (generic function with 1 method)
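
A small case that can be checked by hand is a useful sanity test for this construction. The sketch below re-declares the two functions under the hypothetical names `iterate_map` and `logistic` so that it is self-contained. Since $f(0.25) = 0.75$ and $f(0.75) = 0.75$, every iterate of $0.25$ from the first onward should equal $0.75$.

```julia
# Return the N-fold composition of f (same idea as `iterator` above).
function iterate_map(f, N)
    function fN(x)
        for i in 1:N
            x = f(x)
        end
        return x
    end
    return fN
end

logistic(x) = 4*x*(1-x)

f1 = iterate_map(logistic, 1)
f5 = iterate_map(logistic, 5)

@show f1(0.25)    # one application of the map
@show f5(0.25)    # 0.75 is a fixed point, so further applications leave it unchanged
```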

In [15]:
@time fᴺ(0.67);
@time fᴺ(0.67);

  0.015250 seconds (1.79 k allocations: 100.209 KiB)
  0.004310 seconds (5 allocations: 176 bytes)


### Equivalent C++ code

Note: a leading semicolon tells Julia to run the rest of the line as a Unix shell command.

In [16]:
; pwd

/home/gibson/gitworking/whyjulia


In [17]:
; cat fmillion.cpp

#include <stdlib.h>
#include <iostream>
#include <iomanip>
#include <ctime>

using namespace std;

double f(double x) {
  return 4*x*(1-x);
}

int main(int argc, char* argv[]) {
  double x = argc > 1 ? atof(argv[1]) : 0.0;

  double t0 = clock();
  for (int n=0; n<1000000; ++n)
    x = f(x);
  double t1 = clock();

  cout << "t = " << (t1-t0)/CLOCKS_PER_SEC << " seconds" << endl;
  cout << setprecision(17);
  cout << "x = " << x << endl;
  
  return 0;
}
  


In [18]:
; g++ -O3 -o fmillion fmillion.cpp

### Execution time for C++

In [19]:
; fmillion 0.67

t = 0.009918 seconds
x = 0.10116885334547539


### Execution time for Julia

In [20]:
print("t="); 
@time x = fᴺ(0.67);
@show x;

t=  0.005099 seconds (6 allocations: 224 bytes)
x = 0.10116885334547539


Execution times $t$ are comparable. Sometimes Julia is faster, sometimes C++. The millionth iterates $x$ are identical, which indicates that both programs perform the same sequence of floating-point operations.
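
A single `@time` call is a noisy measurement. One way to make the Julia number more robust is to warm up the function first and take the minimum over several runs. This is a sketch rather than the procedure used for the timings above. `logistic_n` and `min_time` are hypothetical helpers, and packages such as BenchmarkTools.jl automate the same idea.

```julia
# Apply the logistic map n times, starting from x.
function logistic_n(x, n)
    for i in 1:n
        x = 4*x*(1-x)
    end
    return x
end

# Minimum wall-clock time of f() over `runs` calls, after one warm-up call
# that absorbs the compilation cost.
function min_time(f; runs = 10)
    f()                                   # warm-up: triggers compilation
    return minimum(@elapsed(f()) for _ in 1:runs)
end

t = min_time(() -> logistic_n(0.67, 10^6))
println("t = ", t, " seconds")
println("x = ", logistic_n(0.67, 10^6))
```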

### Execution time in Matlab

```
>> tic(); x=fN(0.67); t=toc();
>> t,x
t = 0.048889000000000
x = 0.101168853345475
```
Same result $x$ to the digits shown, but about five to ten times slower than Julia or C++.