In [2]:
using Pkg;
Pkg.activate(".")
Pkg.add("BenchmarkTools")
Pkg.add("DataFrames")

using BenchmarkTools
using DataFrames

  Activating project at `~/Documents/GitHub/Phys215-202223-1/04-Fast-Calls`
   Resolving package versions...
  No Changes to `~/Documents/GitHub/Phys215-202223-1/04-Fast-Calls/Project.toml`
  No Changes to `~/Documents/GitHub/Phys215-202223-1/04-Fast-Calls/Manifest.toml`
   Resolving package versions...
  No Changes to `~/Documents/GitHub/Phys215-202223-1/04-Fast-Calls/Project.toml`
  No Changes to `~/Documents/GitHub/Phys215-202223-1/04-Fast-Calls/Manifest.toml`


# Session 4: Fast function calls

Functions are the basic procedural structure in Julia, so it pays to understand how different ways of implementing the same function behave.
Implementation and coding style can drastically change the performance of Julia code.
This session covers the following topics.

- Using globals
- Inlining
- Constant propagation
- Macros
- Generated functions (optional)
- Named parameters (optional)

## Session 4 OKR

**OBJECTIVE**: Compare benchmark times of different implementations of functions that can be expressed as a recursion relation.
 - [ ] **KR1:** Benchmarked at least two (2) different implementations of the same function or process (e.g. raising each element of an array to some power `p`; a random array may be used) that use some parameter that can be treated as a constant or declared globally.
 Typical methods: (1) global variable, (2) constant global variable, and (3) named parameter variable.
 - [ ] **KR2:** Replicated the naive implementation of the polynomial in the textbook.
 - [ ] **KR3:** Replicated the naive implementation of Horner's method for the same polynomial.
 - [ ] **KR4:** Replicated the macro implementation of Horner's method for the same polynomial.
 - [ ] **KR5:** Table showing to how many _minutes_ the runtime of the function evaluations in KR3 and KR4 would be reduced if KR2 requires 24 hours of runtime.

# Global variables

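The performance of a function is affected by the global variables it references.
A non-constant global can change its value and type at any time, so the compiler cannot specialize code that uses it.
If a global is needed, the best option is to declare it as a constant.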

In [3]:
p = 2;

function raisetop(x::Vector)
    s = zero(eltype(x));
    for xel in x
        s = s + xel^p
    end
    return s
end

raisetop (generic function with 1 method)

In [4]:
data = rand(100_000);

In [5]:
mark0 = @benchmark raisetop($data)

BenchmarkTools.Trial: 854 samples with 1 evaluation.
 Range (min … max):  5.606 ms …   7.402 ms  ┊ GC (min … max): 0.00% … 17.73%
 Time  (median):     5.723 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.851 ms ± 364.007 μs  ┊ GC (mean ± σ):  1.80% ±  4.66%

In [6]:
@code_warntype raisetop(data)

MethodInstance for raisetop(::Vector{Float64})
  from raisetop(x::Vector) in Main at In[3]:3
Arguments
  #self#::Core.Const(raisetop)
  x::Vector{Float64}
Locals
  @_3::Union{Nothing, Tuple{Float64, Int64}}
  s::Any
  xel::Float64
Body::Any
1 ─ %1  = Main.eltype(x)::Core.Const(Float64)
│         (s = Main.zero(%1))
│   %3  = x::Vector{Float64}
│         (@_3 = Base.iterate(%3))
│   %5  = (@_3 === nothing)::Bool
│   %6  = Base.not_int(%5)::Bool
└──       goto #4 if not %6
2 ┄ %8  = @_3::Tuple{Float64, Int64}
│         (xel = Core.getfield(%8, 1))
│   %10 = Core.getfield(%8, 2)::Int64
│   %11 = s::Any
│   %12 = (xel ^ Main.p)::Any
│         (s = %11 + %12)
│         (@

In [7]:
const pconst = 2;

function raisetop_const(x::Vector)
    s = zero(eltype(x));
    for xel in x
        s = s + xel^pconst # <<the only difference!
    end
    return s
end

raisetop_const (generic function with 1 method)

In [8]:
mark1 = @benchmark raisetop_const($data)

BenchmarkTools.Trial: 8266 samples with 1 evaluation.
 Range (min … max):  584.707 μs …  1.066 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     585.081 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   600.909 μs ± 45.292 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [9]:
speedup1 = median(mark0.times) / median(mark1.times);
table = DataFrame("Method"=>["Global","Constant"],"Speedup" => [1.0, speedup1]);

print(table)

2×2 DataFrame
 Row │ Method    Speedup
     │ String    Float64
─────┼───────────────────
   1 │ Global    1.0
   2 │ Constant  9.78071

In [10]:
@code_warntype raisetop_const(data)

MethodInstance for raisetop_const(::Vector{Float64})
  from raisetop_const(x::Vector) in Main at In[7]:3
Arguments
  #self#::Core.Const(raisetop_const)
  x::Vector{Float64}
Locals
  @_3::Union{Nothing, Tuple{Float64, Int64}}
  s::Float64
  xel::Float64
Body::Float64
1 ─ %1  = Main.eltype(x)::Core.Const(Float64)
│         (s = Main.zero(%1))
│   %3  = x::Vector{Float64}
│         (@_3 = Base.iterate(%3))
│   %5  = (@_3 === nothing)::Bool
│   %6  = Base.not_int(%5)::Bool
└──       goto #4 if not %6
2 ┄ %8  = @_3::Tuple{Float64, Int64}
│         (xel = Core.getfield(%8, 1))
│   %10 = Core.getfield(%8, 2)::Int64
│   %11 = s::Float64
│   %12 = (xel ^ Main.pconst)::Float64
│         (s = %11 + %12)

**Untyped, non-constant global variables are inferred as type `Any`, because their value and type can change at any time.**
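
Another way around the `Any` problem, sketched here on the assumption that the exponent can simply be passed in, is to hand the global to the function as a positional argument. The names `p_arg` and `raisetop_arg` are hypothetical and not part of the original notebook. Inside the function, `pw` is a local with a concrete type, so the compiler can specialize the loop.

```julia
p_arg = 2  # non-constant global, like `p` above (hypothetical name)

# Hypothetical variant: the exponent arrives as an argument, so the
# compiler specializes the method on the concrete type of `pw`.
function raisetop_arg(x::Vector, pw)
    s = zero(eltype(x))
    for xel in x
        s += xel^pw
    end
    return s
end

raisetop_arg([1.0, 2.0, 3.0], p_arg)  # 1 + 4 + 9 = 14.0
```

This version has not been benchmarked here, so no timing is claimed for it.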

In [11]:
const pInt::Int64 = 2 # typed constant; type annotations on globals are available from Julia 1.8 onwards

function raisetop_Int64(x::Vector)
    s = zero(eltype(x));
    for xel in x
        s = s + xel^pInt # <<the only difference!
    end
    return s
end

raisetop_Int64 (generic function with 1 method)

In [12]:
@code_warntype raisetop_Int64(data)

MethodInstance for raisetop_Int64(::Vector{Float64})
  from raisetop_Int64(x::Vector) in Main at In[11]:3
Arguments
  #self#::Core.Const(raisetop_Int64)
  x::Vector{Float64}
Locals
  @_3::Union{Nothing, Tuple{Float64, Int64}}
  s::Float64
  xel::Float64
Body::Float64
1 ─ %1  = Main.eltype(x)::Core.Const(Float64)
│         (s = Main.zero(%1))
│   %3  = x::Vector{Float64}
│         (@_3 = Base.iterate(%3))
│   %5  = (@_3 === nothing)::Bool
│   %6  = Base.not_int(%5)::Bool
└──       goto #4 if not %6
2 ┄ %8  = @_3::Tuple{Float64, Int64}
│         (xel = Core.getfield(%8, 1))
│   %10 = Core.getfield(%8, 2)::Int64
│   %11 = s::Float64
│   %12 = (xel ^ Main.pInt)::Float64
│         (s = %11 + %12)

In [13]:
mark1a = @benchmark raisetop_Int64($data)

BenchmarkTools.Trial: 8249 samples with 1 evaluation.
 Range (min … max):  584.694 μs …  1.102 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     585.313 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   601.814 μs ± 41.651 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [14]:
speedup1a = median(mark0.times) / median(mark1a.times)
push!(table, ["ConstTyped", speedup1a]);

print(table)

3×2 DataFrame
 Row │ Method      Speedup
     │ String      Float64
─────┼─────────────────────
   1 │ Global      1.0
   2 │ Constant    9.78071
   3 │ ConstTyped  9.77683

In [15]:
function raisetop_param(x::Vector; pow=2)
    s = zero(eltype(x));
    for xel in x
        s = s + xel^pow # <<the only difference!
    end
    return s    
end

raisetop_param (generic function with 1 method)

In [16]:
mark2 = @benchmark raisetop_param($data, pow=p)

BenchmarkTools.Trial: 9748 samples with 1 evaluation.
 Range (min … max):  503.014 μs … 843.231 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     508.281 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   510.265 μs ±  13.298 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [17]:
mark2a = @benchmark raisetop_param($data, pow=2)

BenchmarkTools.Trial: 8446 samples with 1 evaluation.
 Range (min … max):  584.690 μs … 995.246 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     584.812 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   589.314 μs ±  19.534 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [18]:
mark2b = @benchmark raisetop_param($data, pow=pconst)

BenchmarkTools.Trial: 8430 samples with 1 evaluation.
 Range (min … max):  584.689 μs …  1.065 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     584.807 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   590.417 μs ± 27.836 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [19]:
speedup2 = median(mark0.times) / median(mark2.times)
speedup2a = median(mark0.times) / median(mark2a.times)
speedup2b = median(mark0.times) / median(mark2b.times)

push!(table, ["Parametrized", speedup2]);
push!(table, ["Parametrized const.exp", speedup2a]);
push!(table, ["Parametrized const.imp", speedup2b]);

print(table)

6×2 DataFrame
 Row │ Method                  Speedup
     │ String                  Float64
─────┼──────────────────────────────────
   1 │ Global                   1.0
   2 │ Constant                 9.78071
   3 │ ConstTyped               9.77683
   4 │ Parametrized            11.2585
   5 │ Parametrized const.exp   9.78521
   6 │ Parametrized const.imp   9.78529

In [20]:
@code_warntype raisetop_param(data; pow=pconst)

MethodInstance for (::var"#raisetop_param##kw")(::NamedTuple{(:pow,), Tuple{Int64}}, ::typeof(raisetop_param), ::Vector{Float64})
  from (::var"#raisetop_param##kw")(::Any, ::typeof(raisetop_param), x::Vector) in Main at In[15]:1
Arguments
  _::Core.Const(var"#raisetop_param##kw"())
  @_2::NamedTuple{(:pow,), Tuple{Int64}}
  @_3::Core.Const(raisetop_param)
  x::Vector{Float64}
Locals
  pow::Int64
  @_6::Int64
Body::Float64
1 ─ %1  = Base.haskey(@_2, :pow)::Core.Const(true)
│         Core.typeassert(%1, Core.Bool)
│         (@_6 = Base.getindex(@_2, :pow))
└──       goto #3
2 ─       Core.Const(:(@_6 = 2))
3 ┄ %6  = @_6::Int64
│         (pow = %6)
│   %8  = (:pow,)::Core.Const((:pow,))
│   %9  = Core.apply_type(Core.NamedTuple, %8)::Core.Const(NamedTuple{(:pow,)})
│   %10 = Base.s

## Using `Base.map!()`

In [21]:
function raisetop_map(data; pow=pconst)
    raised = zeros(size(data))
    map!(x->x^pow, raised, data)
    return sum(raised)
end

raisetop_map (generic function with 1 method)

In [22]:
mark3 = @benchmark raisetop_map($data, pow=pconst)

BenchmarkTools.Trial: 4870 samples with 1 evaluation.
 Range (min … max):  601.702 μs …   6.707 ms  ┊ GC (min … max): 0.00% … 63.26%
 Time  (median):       1.046 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.019 ms ± 444.079 μs  ┊ GC (mean ± σ):  4.57% ±  9.05%

In [23]:
speedup3 = median(mark0.times) / median(mark3.times)

push!(table, ["Base.mapped!", speedup3]);

print(table)

7×2 DataFrame
 Row │ Method                  Speedup
     │ String                  Float64
─────┼──────────────────────────────────
   1 │ Global                   1.0
   2 │ Constant                 9.78071
   3 │ ConstTyped               9.77683
   4 │ Parametrized            11.2585
   5 │ Parametrized const.exp   9.78521
   6 │ Parametrized const.imp   9.78529
   7 │ Base.mapped!             5.4683
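
The `map!` version allocates a temporary array on every call, which is consistent with the garbage-collection time reported in its benchmark. A possible alternative, sketched below with the hypothetical name `raisetop_sum`, passes the function directly to `sum` so that no intermediate array is needed. It was not benchmarked in this notebook, so no speedup is claimed.

```julia
# Hypothetical helper: apply x -> x^pow while summing, without storing
# the raised values in a temporary array.
raisetop_sum(data; pow=2) = sum(x -> x^pow, data)

raisetop_sum([1.0, 2.0, 3.0])          # 1 + 4 + 9 = 14.0
raisetop_sum([1.0, 2.0, 3.0]; pow=3)   # 1 + 8 + 27 = 36.0
```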

# Inlining

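Inlining replaces a function call with the body of the called function at compile time.
This removes the call overhead and lets the compiler simplify the combined expression, which is visible in the typed code and in the LLVM Intermediate Representation (IR).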

In [24]:
function f(x)
    a = 5x
    b = a + 2
end

g(x) = f(4x)

g (generic function with 1 method)

In [25]:
@code_typed g(3)

CodeInfo(
1 ─ %1 = Base.mul_int(4, x)::Int64
│   %2 = Base.mul_int(5, %1)::Int64
│   %3 = Base.add_int(%2, 2)::Int64
└──      return %3
) => Int64

In [26]:
@code_llvm g(3)

;  @ In[24]:6 within `g`
define i64 @julia_g_4497(i64 signext %0) #0 {
top:
; ┌ @ In[24]:2 within `f`
; │┌ @ int.jl:88 within `*`
    %1 = mul i64 %0, 20
; │└
; │ @ In[24]:3 within `f`
; │┌ @ int.jl:87 within `+`
    %2 = or i64 %1, 2
; └└
  ret i64 %2
}


## Inlining can be controlled

Julia inlines automatically, using a heuristic based on the estimated cost of the function body, so small functions such as `f` above are usually inlined without any annotation.

The macro `@inline` gives the compiler an extra hint that a function is worth inlining, which can help with larger functions, especially when constants are involved.
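
As a minimal sketch of the hint, the hypothetical functions `f_in` and `g_in` below mirror `f` and `g` from the previous cell. For a body this small the annotation is probably redundant, since the compiler would inline it anyway.

```julia
# Hypothetical example: @inline asks the compiler to inline f_in at call sites.
@inline function f_in(x)
    a = 5x
    return a + 2
end

g_in(x) = f_in(4x)

g_in(3)  # 5 * (4 * 3) + 2 = 62
```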

Inlining can be disabled in Julia using the `@noinline` macro.

In [27]:
@noinline function f_ni(x)
    a = x*5
    b = a + 3
end

g_ni(x) = f_ni(2*x)

g_ni (generic function with 1 method)

In [28]:
@code_typed g_ni(3)

CodeInfo(
1 ─ %1 = Base.mul_int(2, x)::Int64
│   %2 = invoke Main.f_ni(%1::Int64)::Int64
└──      return %2
) => Int64

In [29]:
@code_llvm g_ni(3)

;  @ In[27]:6 within `g_ni`
define i64 @julia_g_ni_4549(i64 signext %0) #0 {
top:
; ┌ @ int.jl:88 within `*`
   %1 = shl i64 %0, 1
; └
  %2 = call i64 @j_f_ni_4551(i64 signext %1) #0
  ret i64 %2
}


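The main difference is the `invoke` in the typed code, which becomes a `call` instruction in the LLVM IR. Without inlining, the compiler cannot merge the multiplications and the addition into one simplified expression, as it did for `g`.

However, the benefit of inlining depends on the complexity of the function and on the call overhead. In the two benchmarks below, both versions report about 1.7 ns for a loop of 10 million calls, which suggests that the compiler removed most of the loop work in both cases, so this particular test does not separate them.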

In [54]:
mark0 = @benchmark for _ in 1:10_000_000 g_ni(3) end

BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.698 ns … 64.792 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.710 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.722 ns ±  0.635 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [55]:
mark1 = @benchmark for _ in 1:10_000_000 g(3) end

BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.700 ns … 8.283 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.714 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.722 ns ± 0.199 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

# Propagation of constant value

Sometimes constants are evaluated at compile time (by the JIT compiler), resulting in more efficient execution at runtime.
> When a function is called with an argument that is known at compile time, the invocation can happen once at compile time, and the call is then replaced with a constant value at runtime.

Such constant propagation happens even across function calls, resulting in further optimization.

Consider the two functions that follow.

In [33]:
sqr(x) = x*x
sqr2() = sqr(2)

sqrlog(x) = log(x*x)
sqrlog2() = sqrlog(2.0)

sqrlog2 (generic function with 1 method)

In [34]:
@code_typed sqr2()

CodeInfo(
1 ─     return 4
) => Int64

In [35]:
sqr2() === 4

true

In [36]:
@code_llvm sqr2()

;  @ In[33]:2 within `sqr2`
define i64 @julia_sqr2_4704() #0 {
top:
  ret i64 4
}


In [37]:
@code_typed sqrlog2()

CodeInfo(
1 ─     return 1.3862943611198906
) => Float64

The result is a true compile-time constant: the LLVM code returns the 64-bit pattern of this `Float64` value directly.

In [38]:
@code_llvm sqrlog2()

;  @ In[33]:5 within `sqrlog2`
define double @julia_sqrlog2_4710() #0 {
top:
  ret double 0x3FF62E42FEFA39EF
}


In [39]:
sqrlog2() === log(4.0)

true

## Notes

> Constant propagation is likely to be applied only to functions that are pure – in other words, functions that do not have side effects. This means that these functions not only do not modify or mutate any of its arguments, but they also do not change any global state.

# Julia macro

Macros are used to make a sequence of computing instructions available to the programmer as a single program statement, making the programming task less tedious and less error-prone.
(Thus, they are called "macros" because a "big" block of code can be expanded from a "small" sequence of characters.)[^1]

In general, a macro is a way to generate the code before compilation.
> Macros are usually used as a means to reduce repetitive code, whereby large volumes of code with a common pattern can be generated from a smaller set of primitives. [textbook]
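
As a small illustration of code generation, the hypothetical macro `@twice` below receives an expression and returns a block that contains it two times. It is not from the textbook and only sketches the idea that a macro transforms code before that code is compiled and run.

```julia
# Hypothetical macro: expands `@twice ex` into a block that runs `ex` twice.
# `esc` makes the expression refer to variables at the call site.
macro twice(ex)
    quote
        $(esc(ex))
        $(esc(ex))
    end
end

counter = Ref(0)
@twice counter[] += 1
counter[]  # 2, because the increment was emitted twice
```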

[^1]:https://en.wikipedia.org/wiki/Macro_(computer_science)

# Polynomial evaluation

Many special functions are already implemented in packages.
Even so, a more efficient implementation can often be obtained with macros.

Consider a simple polynomial evaluation via the naive expression.
$$
p(x) = \sum_{i=0}^{n} a_i x^i = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n
$$

A simplistic implementation is then as follows.

In [40]:
function poly_naive(x, a...) #uses splat operator `...`
    p = zero(x)
    for i in eachindex(a)
        p = p + a[i] * x^(i-1)
    end
    return p
end

poly_naive (generic function with 1 method)

In [41]:
f_naive(x) = poly_naive(x, 1,2,3,4,5,6,7,8,9)

f_naive (generic function with 1 method)

In [42]:
x = 3.5

3.5

In [43]:
mark0 = @benchmark f_naive($x)

BenchmarkTools.Trial: 10000 samples with 990 evaluations.
 Range (min … max):  43.345 ns … 155.839 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     44.796 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   45.245 ns ±   3.663 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

## Horner's method for polynomial evaluation

This method uses the following recursion relation
$$\begin{eqnarray}
b_n &=& a_n \\
b_{n-1} &=& a_{n-1} + b_n x \\
b_{n-2} &=& a_{n-2} + b_{n-1} x \\
&\vdots& \\
b_0 &=& a_0 + b_1 x
\end{eqnarray}$$
such that $p(x) = b_0$.
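
Unrolling the recursion gives the nested form of the same polynomial. Written this way, the evaluation needs only $n$ multiplications and $n$ additions and no explicit powers of $x$:
$$
p(x) = a_0 + x\bigl(a_1 + x\bigl(a_2 + \cdots + x\,(a_{n-1} + x\,a_n)\cdots\bigr)\bigr)
$$
For $n = 2$, for example, $b_2 = a_2$, $b_1 = a_1 + a_2 x$, and $b_0 = a_0 + (a_1 + a_2 x)\,x = a_0 + a_1 x + a_2 x^2$.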

In [44]:
function poly_h(x, a...)
    b = zero(x)
    for i in reverse(eachindex(a))
        b = a[i] + b*x
    end
    return b
end

poly_h (generic function with 1 method)

In [45]:
f_h(x) = poly_h(x, 1,2,3,4,5,6,7,8,9)

f_h (generic function with 1 method)

In [46]:
mark1 = @benchmark f_h($x)

BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  4.886 ns … 65.340 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.018 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.105 ns ±  1.522 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [47]:
speedup1 = median(mark0.times) / median(mark1.times)

table = DataFrame("Method"=>["Naive","Horner"],"Speedup" => [1.0, speedup1]);

print(table)

2×2 DataFrame
 Row │ Method  Speedup
     │ String  Float64
─────┼─────────────────
   1 │ Naive   1.0
   2 │ Horner  8.92705

## Using macro with Horner's method

This relies on recognizing that Julia already provides `muladd()`, where `muladd(x, y, z)` computes `x*y + z`, and on exploiting it to generate the expanded code for Horner's method.
With `muladd()` we have the following recursion.
```
b = a[n]
b = muladd( x, b, a[n-1] ) = muladd( x, a[n], a[n-1] )
b = muladd( x, b, a[n-2] ) = muladd( x, muladd( x, a[n], a[n-1] ), a[n-2] )
b = muladd( x, b, a[n-3] ) = muladd( x, muladd( x, muladd( x, a[n], a[n-1] ), a[n-2] ), a[n-3] )
...
b = muladd( x, muladd( x, ... muladd( x, muladd( x, a[n], a[n-1] ), a[n-2] ) ..., a[2] ), a[1] )
```

In [48]:
macro horner(x, p...)
    ex = esc(p[end])
    for i in length(p)-1:-1:1
        ex = :(muladd(t,$(ex), $(esc(p[i]))))
    end
    Expr(:block, :(t=$(esc(x))), ex)
end

@horner (macro with 1 method)
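
To see what the macro generates and to check its result, the sketch below repeats the macro definition so that the block runs on its own, expands it for a hypothetical three-coefficient polynomial, and compares the value with `evalpoly` from Base (available since Julia 1.4). The function name `f3` and the coefficients are illustrative assumptions.

```julia
# Copy of the macro above so that this block is self-contained.
macro horner(x, p...)
    ex = esc(p[end])
    for i in length(p)-1:-1:1
        ex = :(muladd(t, $(ex), $(esc(p[i]))))
    end
    Expr(:block, :(t = $(esc(x))), ex)
end

# Inspect the generated nested muladd expression without running it.
expanded = @macroexpand @horner(x, 1, 2, 3)

# Hypothetical test polynomial 1 + 2x + 3x^2.
f3(x) = @horner(x, 1, 2, 3)

f3(2.0)                                # 1 + 2*2 + 3*4 = 17.0
f3(2.0) == evalpoly(2.0, (1, 2, 3))    # true
```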

In [49]:
f_h_macro(x) = @horner(x, 1,2,3,4,5,6,7,8,9)

f_h_macro (generic function with 1 method)

In [50]:
mark2 = @benchmark f_h_macro($x)

BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  0.041 ns … 0.125 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     0.046 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   0.049 ns ± 0.005 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [51]:
speedup2 = median(mark0.times) / median(mark2.times)

push!(table, ["Macro", speedup2]);

print(table)

3×2 DataFrame
 Row │ Method  Speedup
     │ String  Float64
─────┼───────────────────
   1 │ Naive     1.0
   2 │ Horner    8.92705
   3 │ Macro   973.825

The median time of 0.046 ns for the macro version is shorter than a single CPU clock cycle, so it most likely means that the compiler evaluated the whole expression at compile time rather than at run time. The speedup of roughly 974 should therefore be read as an upper bound, not as the cost of one polynomial evaluation. The BenchmarkTools manual suggests interpolating a reference and dereferencing it inside the benchmark, e.g. `@benchmark f_h_macro($(Ref(x))[])`, to prevent this.

In [52]:
for r in eachrow(table)
    println("$(24*60/r.Speedup) mins for $(r.Method) method.")
end

1440.0 mins for Naive method.
161.3074050689997 mins for Horner method.
1.4787047893929828 mins for Macro method.


In [53]:
transform!(table,:Speedup=>ByRow(x->24*60/x)=>:"Time(mins)")
print(table)

3×3 DataFrame
 Row │ Method  Speedup    Time(mins)
     │ String  Float64    Float64
─────┼───────────────────────────────
   1 │ Naive     1.0       1440.0
   2 │ Horner    8.92705    161.307
   3 │ Macro   973.825        1.4787

# Fin. [Back](./).