## Specialization

The first time a newly defined function is called, it is compiled to native code and then executed.  On subsequent calls, the compiled native code is executed directly, without the overhead of the compilation process.

You can clearly observe this with the function defined below.

In [1]:
quadratic(x) = x.^2 .+ 2x .+ 5

quadratic (generic function with 1 method)

To provide some work for the compiler, you can call the function with a vector.

In [2]:
x = rand(10_000);

In [3]:
@time quadratic(x);
@time quadratic(x);

  0.203708 seconds (720.81 k allocations: 37.400 MiB, 40.49% gc time, 99.96% compilation time)
  0.000051 seconds (4 allocations: 156.344 KiB)


The first call indeed takes orders of magnitude longer than the second or any subsequent calls for that matter.

However, note that when you call the function with an argument of a different type, it will be compiled again.

In [4]:
x = rand(Int, (10_000, ));

In [5]:
@time quadratic(x);
@time quadratic(x);

  0.137798 seconds (623.40 k allocations: 31.789 MiB, 99.94% compilation time)
  0.000040 seconds (4 allocations: 156.344 KiB)
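
The same effect can be measured programmatically.  The sketch below is only an illustration: it uses `@elapsed` from Base instead of `@time`, the variable names are chosen here for the example, and the actual timings will differ from machine to machine.

```julia
quadratic(x) = x.^2 .+ 2x .+ 5

# First call with a Vector{Float64}: the measurement includes compilation.
x_float = rand(10_000)
t_first_float = @elapsed quadratic(x_float)
# Second call with the same argument type: the compiled code is reused.
t_second_float = @elapsed quadratic(x_float)

# A vector of Int is a new argument type, so the function is compiled again.
x_int = rand(Int, 10_000)
t_first_int = @elapsed quadratic(x_int)
t_second_int = @elapsed quadratic(x_int)

println("Float64 elements: first call $(t_first_float) s, second call $(t_second_float) s")
println("Int elements:     first call $(t_first_int) s, second call $(t_second_int) s")
```

In a fresh session, the first call for each element type should be considerably slower than the second one.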


You can use the `MethodAnalysis` package to gain insight into the instances that Julia generated for your functions by calling the `methodinstances` function.

In [6]:
using MethodAnalysis

In [9]:
methods(quadratic)

In [10]:
methodinstances(quadratic)

2-element Vector{Core.MethodInstance}:
 MethodInstance for quadratic(::Vector{Float64})
 MethodInstance for quadratic(::Vector{Int64})

Clearly, two instances of the function have been produced, one for a `Vector` of `Float64`, the other for `Int64` elements.

When you call the function with a scalar floating point value, a third instance is created, but given the fact that this is a very simple function, the compilation hardly takes any time at all.

In [11]:
@time quadratic(3.0);
@time quadratic(12.0);

  0.000001 seconds
  0.000001 seconds


In [12]:
methodinstances(quadratic)

3-element Vector{Core.MethodInstance}:
 MethodInstance for quadratic(::Vector{Float64})
 MethodInstance for quadratic(::Vector{Int64})
 MethodInstance for quadratic(::Float64)

## Just Ahead Of Time compilation

When you call a function for the first time, the source code is converted into native machine code in a process with many steps.  The result of these steps can be inspected using macros.

To illustrate this, you can define another function.

In [1]:
axpy(alpha, x, y) = alpha*x + y

axpy (generic function with 1 method)

One of the first steps you can inspect is the result of macro expansion, e.g., what happens when you use the `@time` macro.

In [19]:
@macroexpand @time axpy(2.0, 3.0, 5.0)

quote
    #= timing.jl:253 =#
    begin
        #= timing.jl:258 =#
        $(Expr(:meta, :force_compile))
        #= timing.jl:259 =#
        local var"#86#stats" = Base.gc_num()
        #= timing.jl:260 =#
        local var"#88#elapsedtime" = Base.time_ns()
        #= timing.jl:261 =#
        Base.cumulative_compile_timing(true)
        #= timing.jl:262 =#
        local var"#89#compile_elapsedtimes" = Base.cumulative_compile_time_ns()
        #= timing.jl:263 =#
        local var"#87#val" = $(Expr(:tryfinally, :(axpy(2.0, 3.0, 5.0)), quote
    var"#88#elapsedtime" = Base.time_ns() - var"#88#elapsedtime"
    #= timing.jl:265 =#
    Base.cumulative_compile_timing(false)
    #= timing.jl:266 =#
    var"#89#compile_elapsedtimes" = Base.cumulative_compile_time_ns() .- var"#89#compile_elapsedtimes"
end))
        #= timing.jl:268 =#
        local var"#90#diff" = Base.GC_Diff(Base.gc_num(), va

As you can see, a lot of bookkeeping is required to keep track of the time and the number of allocations accurately.  After the timing results are written, the result of the Julia expression is returned.

The `@macroexpand` macro is of course most useful when you develop your own macros.
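
As a minimal sketch of that use case, the example below defines a small macro and inspects its expansion.  The macro `@twice` is hypothetical and introduced here purely for illustration; it is not part of Julia or of any package.

```julia
# Hypothetical macro, defined only for this illustration:
# it evaluates the given expression twice.
macro twice(ex)
    quote
        $(esc(ex))
        $(esc(ex))
    end
end

# Inspect the generated code without running it.
expanded = @macroexpand @twice println("hello")
println(expanded)

# Run the macro to confirm that it behaves as the expansion suggests.
counter = Ref(0)
@twice counter[] += 1
println(counter[])
```

The printed expansion contains the `println` call twice, and the counter is incremented twice when the macro is actually used.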

The code is subsequently converted into an internal representation, i.e., the code is "lowered".

In [26]:
@code_lowered axpy(2.0, 3.0, 5.0)

CodeInfo(
1 ─ %1 = alpha * x
│   %2 = %1 + y
└──      return %2
)

In the next phase, types are derived for this code, which you can inspect as well.

In [27]:
@code_typed axpy(2.0, 3.0, 5.0)

CodeInfo(
1 ─ %1 = Base.mul_float(alpha, x)::Float64
│   %2 = Base.add_float(%1, y)::Float64
└──      return %2
) => Float64

Since all function arguments are `Float64` in this case, all intermediate types as well as the return type of the function will be `Float64` as well.

On the other hand, if the first two arguments are `Int`, you observe a different intermediate type, as well as some type conversions.

In [28]:
@code_typed axpy(2, 3, 5.0)

CodeInfo(
1 ─ %1 = Base.mul_int(alpha, x)::Int64
│   %2 = Base.sitofp(Float64, %1)::Float64
│   %3 = Base.add_float(%2, y)::Float64
└──      return %3
) => Float64

Obviously, when the function is called with arguments that are all `Int`, the result type of the function will be `Int64`.
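
The inferred return types for these combinations of argument types can also be queried directly.  The sketch below uses `Base.return_types`, which is not exported from Base and may therefore behave differently between Julia versions; it assumes a 64-bit platform, where `Int` is `Int64`.

```julia
axpy(alpha, x, y) = alpha*x + y

# Base.return_types returns a vector with the inferred return type
# of each method that matches the given argument types.
all_float = Base.return_types(axpy, (Float64, Float64, Float64))
mixed     = Base.return_types(axpy, (Int, Int, Float64))
all_int   = Base.return_types(axpy, (Int, Int, Int))

println(all_float)
println(mixed)
println(all_int)
```

Since `axpy` has a single method, each result holds exactly one type.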

In a subsequent phase, the typed code is translated to the LLVM intermediate representation (IR), the platform-independent code used by the underlying compiler.

In [29]:
@code_llvm axpy(2.0, 3.0, 5.0)

;  @ In[13]:1 within `axpy`
define double @julia_axpy_2860(double %0, double %1, double %2) #0 {
top:
; ┌ @ float.jl:385 within `*`
   %3 = fmul double %0, %1
; └
; ┌ @ float.jl:383 within `+`
   %4 = fadd double %3, %2
; └
  ret double %4
}


Finally, native code is produced in which you recognize the familiar machine instructions.

In [30]:
@code_native axpy(2.0, 3.0, 5.0)

	.text
	.file	"axpy"
	.globl	julia_axpy_2894                 # -- Begin function julia_axpy_2894
	.p2align	4, 0x90
	.type	julia_axpy_2894,@function
julia_axpy_2894:                        # @julia_axpy_2894
; ┌ @ In[13]:1 within `axpy`
	.cfi_startproc
# %bb.0:                                # %top
; │┌ @ float.jl:385 within `*`
	vmulsd	%xmm1, %xmm0, %xmm0
; │└
; │┌ @ float.jl:383 within `+`
	vaddsd	%xmm2, %xmm0, %xmm0
; │└
	retq
.Lfunc_end0:
	.size	julia_axpy_2894, .Lfunc_end0-julia_axpy_2894
	.cfi_endproc
; └
                                        # -- End function
	.section	".note.GNU-stack","",@progbits


## Type stability



The performance of your code can suffer significantly when type instability occurs.  This happens when the compiler cannot infer a single concrete return type for a function from the types of its arguments.  As a trivial example, consider calling the function `distance` first with vectors of floating point values, and then with vectors whose element type is `Any`.

In [4]:
distance(v1, v2) = sqrt((v1[1] - v2[1])^2 + (v1[2] - v2[2])^2)

distance (generic function with 1 method)

In [5]:
using BenchmarkTools

In [11]:
@btime distance(v1, v2) setup=(v1=rand(2); v2=rand(2));
@btime distance(v1, v2) setup=(v1=Any[rand(), rand()]; v2=Any[rand(), rand()]);

  1.884 ns (0 allocations: 0 bytes)
  74.452 ns (6 allocations: 96 bytes)


The difference in performance between the two computations is substantial.

Fortunately, the `@code_warntype` macro can be used to check for this.  It prints the types that cause type instability, such as `Any`, in red.

In [None]:
@code_warntype distance(rand(2), rand(2))

MethodInstance for distance(::Vector{Float64}, ::Vector{Float64})
  from distance(v1, v2) in Main at In[4]:1
Arguments
  #self#::Core.Const(distance)
  v1::Vector{Float64}
  v2::Vector{Float64}
Body::Float64
1 ─ %1  = Base.getindex(v1, 1)::Float64
│   %2  = Base.getindex(v2, 1)::Float64
│   %3  = (%1 - %2)::Float64
│   %4  = Core.apply_type(Base.Val, 2)::Core.Const(Val{2})
│   %5  = (%4)()::Core.Const(Val{2}())
│   %6  = Base.literal_pow(Main.:^, %3, %5)::Float64
│   %7  = Base.getindex(v1, 2)::Float64
│   %8  = Base.getindex(v2, 2)::Float64
│   %9  = (%7 - %8)::Float64
│   %10 = Core.apply_type(Base.Val, 2)::Core.Const(Val{2})
│   %11 = (%10)()::Core.Const(Val{2}())
│   %12 = Base.literal_pow(Main.:^, %9, %11)::Flo

In [12]:
@code_warntype distance(Any[rand(), rand()], Any[rand(), rand()])

MethodInstance for distance(::Vector{Any}, ::Vector{Any})
  from distance(v1, v2) in Main at In[4]:1
Arguments
  #self#::Core.Const(distance)
  v1::Vector{Any}
  v2::Vector{Any}
Body::Any
1 ─ %1  = Base.getindex(v1, 1)::Any
│   %2  = Base.getindex(v2, 1)::Any
│   %3  = (%1 - %2)::Any
│   %4  = Core.apply_type(Base.Val, 2)::Core.Const(Val{2})
│   %5  = (%4)()::Core.Const(Val{2}())
│   %6  = Base.literal_pow(Main.:^, %3, %5)::Any
│   %7  = Base.getindex(v1, 2)::Any
│   %8  = Base.getindex(v2, 2)::Any
│   %9  = (%7 - %8)::Any
│   %10 = Core.apply_type(Base.Val, 2)::Core.Const(Val{2})
│   %11 = (%10)()::Core.Const(Val{2}())
│   %12 = Base.literal_pow(Mai
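
A check like this can also be automated rather than read off by eye.  As a sketch, the `@inferred` macro from the `Test` standard library throws an error when the inferred return type of a call differs from the type of the value actually returned; the variable names `d` and `any_is_stable` are chosen here for illustration.

```julia
using Test

distance(v1, v2) = sqrt((v1[1] - v2[1])^2 + (v1[2] - v2[2])^2)

# Concretely typed vectors: the inferred return type matches the actual
# one, so @inferred simply returns the result of the call.
d = @inferred distance([0.0, 0.0], [3.0, 4.0])
println(d)

# Vectors of Any: inference can only conclude Any, while the actual
# result is a Float64, so @inferred throws an error.
any_is_stable = try
    @inferred distance(Any[0.0, 0.0], Any[3.0, 4.0])
    true
catch
    false
end
println(any_is_stable)
```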

The example above is of course contrived; a more realistic example is given below.  The function simply computes the sum of the elements of a vector.

In [10]:
sum_all(v) = begin
    total = 0
    for value in v
        total += value
    end
    return total
end

sum_all (generic function with 1 method)

When you call the function with a vector of floating point values, the type of the variable `total` cannot be determined unambiguously.  It is initialized with `0`, an `Int64` value.  If the vector is not empty, floating point values are added to it and `total` becomes a `Float64`.  At the return statement, `total` may therefore be either `Int64` or `Float64`.  This is indicated by the `Union` type.

In [11]:
v = rand(1000);
@code_warntype sum_all(v)

MethodInstance for sum_all(::Vector{Float64})
  from sum_all(v) in Main at In[10]:1
Arguments
  #self#::Core.Const(sum_all)
  v::Vector{Float64}
Locals
  @_3::Union{Nothing, Tuple{Float64, Int64}}
  total::Union{Float64, Int64}
  value::Float64
Body::Union{Float64, Int64}
1 ─       (total = 0)
│   %2  = v::Vector{Float64}
│         (@_3 = Base.iterate(%2))
│   %4  = (@_3 === nothing)::Bool
│   %5  = Base.not_int(%4)::Bool
└──       goto #4 if not %5
2 ┄ %7  = @_3::Tuple{Float64, Int64}
│         (value = Core.getfield(%7, 1))
│   %9  = Core.getfield(%7, 2)::Int64
│         (total = total + value)
│         (@_3 = Base.iterate(%2, %9))
│   %12 = (@_3 === nothing)::Bool
│   %13 = Base.not_int(%12)::Bool

This can be easily remedied by making sure that `total` is initialized with a zero value of the same type as the elements of the vector `v`, the argument of the function.

In [1]:
sum_all_2(v) = begin
    total = zero(eltype(v))
    for value in v
        total += value
    end
    return total
end

sum_all_2 (generic function with 1 method)

When calling `sum_all_2` with a vector containing `Float64` values, `total` will be initialized to a `Float64` 0.0 and its type is unambiguous.

In [13]:
v = rand(1000);
@code_warntype sum_all_2(v)

MethodInstance for sum_all_2(::Vector{Float64})
  from sum_all_2(v) in Main at In[12]:1
Arguments
  #self#::Core.Const(sum_all_2)
  v::Vector{Float64}
Locals
  @_3::Union{Nothing, Tuple{Float64, Int64}}
  total::Float64
  value::Float64
Body::Float64
1 ─ %1  = Main.eltype(v)::Core.Const(Float64)
│         (total = Main.zero(%1))
│   %3  = v::Vector{Float64}
│         (@_3 = Base.iterate(%3))
│   %5  = (@_3 === nothing)::Bool
│   %6  = Base.not_int(%5)::Bool
└──       goto #4 if not %6
2 ┄ %8  = @_3::Tuple{Float64, Int64}
│         (value = Core.getfield(%8, 1))
│   %10 = Core.getfield(%8, 2)::Int64
│         (total = total + value)
│         (@_3 = Base.iterate(%3, %10))
│   %13 = (@_3 === nothing)::Bool
│   %14

Similarly, when the function `sum_all_2` is called with a vector of `Int32`, the code that is generated initializes `total` to an `Int32` 0 and again, its type is unambiguous.

In [2]:
v = rand(Int32, 1000);
@code_warntype sum_all_2(v)

MethodInstance for sum_all_2(::Vector{Int32})
  from sum_all_2(v) in Main at In[1]:1
Arguments
  #self#::Core.Const(sum_all_2)
  v::Vector{Int32}
Locals
  @_3::Union{Nothing, Tuple{Int32, Int64}}
  total::Int32
  value::Int32
Body::Int32
1 ─ %1  = Main.eltype(v)::Core.Const(Int32)
│         (total = Main.zero(%1))
│   %3  = v::Vector{Int32}
│         (@_3 = Base.iterate(%3))
│   %5  = (@_3 === nothing)::Bool
│   %6  = Base.not_int(%5)::Bool
└──       goto #4 if not %6
2 ┄ %8  = @_3::Tuple{Int32, Int64}
│         (value = Core.getfield(%8, 1))
│   %10 = Core.getfield(%8, 2)::Int64
│         (total = total + value)
│         (@_3 = Base.iterate(%3, %10))
│   %13 = (@_3 === nothing)::Bool
│   %14 = Base.not_int(%13

Some care is required to ensure type stability, so the JAOT compilation can generate efficient code.
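
The difference between the two implementations can also be verified without reading the `@code_warntype` output.  The sketch below repeats both definitions so that it is self-contained, and uses `Base.return_types`, which is not exported from Base and may change between Julia versions; it assumes a 64-bit platform, where the literal `0` is an `Int64`.

```julia
sum_all(v) = begin
    total = 0
    for value in v
        total += value
    end
    return total
end

sum_all_2(v) = begin
    total = zero(eltype(v))
    for value in v
        total += value
    end
    return total
end

# Inferred return types for a Vector{Float64} argument.
unstable_type = Base.return_types(sum_all, (Vector{Float64},))[1]
stable_type = Base.return_types(sum_all_2, (Vector{Float64},))[1]
println(isconcretetype(unstable_type))
println(isconcretetype(stable_type))

# For an empty vector the loop body never runs, so the type of the
# initial value of total is the type of the result.
println(typeof(sum_all(Float64[])))
println(typeof(sum_all_2(Float64[])))
```

Only `sum_all_2` has a concrete inferred return type, and only `sum_all_2` returns a `Float64` for an empty `Vector{Float64}`.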