# Session 4: Fast function calls

**OBJECTIVE: Compare benchmark times of different implementations of functions that can be expressed as a recursion relation.**

\begin{itemize}
\item KR1: Benchmarked at least two (2) different implementations of the same function or process (e.g. raising each element of an array to some power `p`; a random array may be used) that use a parameter that can be considered a constant or declared globally. Typical methods: (1) global variable, (2) constant global variable, and (3) named parameter variable.
\item KR2: Replicated the naive implementation of the polynomial in the textbook.
\item KR3: Replicated the naive implementation of Horner’s method for the same polynomial.
\item KR4: Replicated the macro implementation of Horner’s method for the same polynomial.
\item KR5: Table showing by how many minutes the function evaluations in both KR3 and KR4 are reduced if KR2 requires 24 hours of runtime.
\end{itemize}

## KR1

**Benchmarked at least two (2) different implementations of the same function or process (e.g. raising each element of an array to some power `p`; a random array may be used) that use a parameter that can be considered a constant or declared globally. Typical methods: (1) global variable, (2) constant global variable, and (3) named parameter variable.**

In [14]:
using BenchmarkTools
using DataFrames


Here we examine how global variables affect performance in Julia, using `BenchmarkTools` to measure the differences. The baseline function below sums `y^p` over a vector, reading the exponent from the non-constant global `p`.

In [57]:
p = 2;

function raisetop(x::Vector)
    s = zero(eltype(x));
    for y in x
        s = s + y^p
    end
    return s
end

raisetop (generic function with 1 method)

In [58]:
data = rand(100_000);

In [59]:
mark0 = @benchmark raisetop($data)

BenchmarkTools.Trial: 857 samples with 1 evaluation.
 Range (min … max):  4.129 ms … 12.051 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.490 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.809 ms ±  1.301 ms  ┊ GC (mean ± σ):  2.14% ± 5.51%

As we can see, each execution takes about 5.5 milliseconds (median). That may seem fast, but for such a basic function the cost becomes very noticeable once it is called millions of times.

In [60]:
@code_warntype raisetop(data)

Variables
  #self#::Core.Const(raisetop)
  x::Vector{Float64}
  @_3::Union{Nothing, Tuple{Float64, Int64}}
  s::Any
  y::Float64

Body::Any
1 ─ %1  = Main.eltype(x)::Core.Const(Float64)
│         (s = Main.zero(%1))
│   %3  = x::Vector{Float64}
│         (@_3 = Base.iterate(%3))
│   %5  = (@_3 === nothing)::Bool
│   %6  = Base.not_int(%5)::Bool
└──       goto #4 if not %6
2 ┄ %8  = @_3::Tuple{Float64, Int64}::Tuple{Float64, Int64}
│         (y = Core.getfield(%8, 1))
│   %10 = Core.getfield(%8, 2)::Int64
│   %11 = s::Any
│   %12 = (y ^ Main.p)::Any
│         (s = %11 + %12)
│         (@_3 = Base.iterate(%3, %10))
│   %15 = (@_3 === nothing)::Bool


Looking closer, we can see a type instability, which is the main contributor to the unnecessary runtime. We showed in the previous exercise that type instability generally hurts performance. Because `p` is a non-constant global, the compiler cannot know its type, so `y^p`, the accumulator `s`, and the return value of the function are all inferred as `Any`.

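One way to fix this is to declare the global as a constant with `const`, as shown in the first line below. Note that a `const` declaration in Julia fixes the type of the global: rebinding it to a value of another type is an error. Changing the value while keeping the type may be tolerated with a warning, but should not be relied on.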

In [75]:
const p2 = 2;

function raisetop_const(x::Vector)
    s = zero(eltype(x));
    for y in x
        s = s + y^p2 # <<the only difference!
    end
    return s
end

raisetop_const (generic function with 1 method)

In [76]:
mark1 = @benchmark raisetop_const($data)

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   97.600 μs …  16.174 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     102.800 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   125.652 μs ± 214.624 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [77]:
speedup1 = median(mark0.times) / median(mark1.times);
table = DataFrame("Method"=>["Global","Constant"],"Speedup" => [1.0, speedup1]);

print(table)

2×2 DataFrame
 Row │ Method    Speedup
     │ String    Float64
─────┼───────────────────
   1 │ Global     1.0
   2 │ Constant  53.4076

With one little tweak, the latter implementation was around 53x faster.
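As a quick check of the figure in the table, the speedup is simply the ratio of the two median times reported by `@benchmark` above, converted to the same unit:

$\text{speedup} = \dfrac{t_{\text{global}}}{t_{\text{const}}} = \dfrac{5.490\ \text{ms}}{102.8\ \mu\text{s}} = \dfrac{5490\ \mu\text{s}}{102.8\ \mu\text{s}} \approx 53.4$

Dividing each median by the 100,000 elements of `data` gives a rough per-element cost: about $54.9\ \text{ns}$ with the non-constant global and about $1.03\ \text{ns}$ with the constant one.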

In [78]:
@code_warntype raisetop_const(data)

Variables
  #self#::Core.Const(raisetop_const)
  x::Vector{Float64}
  @_3::Union{Nothing, Tuple{Float64, Int64}}
  s::Float64
  y::Float64

Body::Float64
1 ─ %1  = Main.eltype(x)::Core.Const(Float64)
│         (s = Main.zero(%1))
│   %3  = x::Vector{Float64}
│         (@_3 = Base.iterate(%3))
│   %5  = (@_3 === nothing)::Bool
│   %6  = Base.not_int(%5)::Bool
└──       goto #4 if not %6
2 ┄ %8  = @_3::Tuple{Float64, Int64}::Tuple{Float64, Int64}
│         (y = Core.getfield(%8, 1))
│   %10 = Core.getfield(%8, 2)::Int64
│   %11 = s::Float64
│   %12 = (y ^ Main.p2)::Float64
│         (s = %11 + %12)
│         (@_3 = Base.iterate(%3, %10))
│   %15 = (@_3 === nothing)::Bool

As we can see, both the accumulator and the function body are now inferred as `Float64` (`s::Float64`, `Body::Float64`), which means the function is type-stable.
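The same comparison can be made programmatically instead of reading the `@code_warntype` listings by eye. The sketch below is only illustrative: `p_demo`, `P_DEMO`, `sumpow_global` and `sumpow_const` are hypothetical stand-ins for `p`, `p2`, `raisetop` and `raisetop_const`, and it relies on `Base.return_types`, which reports what the compiler infers for a given argument signature.

```julia
# Hypothetical names for illustration; they mirror `p` and `p2` in the text.
p_demo = 2            # non-constant global: its type could change at any time
const P_DEMO = 2      # constant global: the compiler can rely on it being an Int

function sumpow_global(x::Vector)
    s = zero(eltype(x))
    for y in x
        s += y^p_demo     # type of `p_demo` is unknown at compile time
    end
    return s
end

function sumpow_const(x::Vector)
    s = zero(eltype(x))
    for y in x
        s += y^P_DEMO     # type of `P_DEMO` is known at compile time
    end
    return s
end

# Base.return_types lists the inferred return type for each matching method.
inferred_global = only(Base.return_types(sumpow_global, (Vector{Float64},)))
inferred_const  = only(Base.return_types(sumpow_const,  (Vector{Float64},)))

println("global: ", inferred_global, "  concrete? ", isconcretetype(inferred_global))
println("const:  ", inferred_const,  "  concrete? ", isconcretetype(inferred_const))
```

A non-concrete inferred type for the first function and a concrete one for the second would be consistent with the `Any` and `Float64` annotations in the listings.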

Lastly, we can turn the variable into a keyword argument of the function, written as `pow=2` below. This lets us do away with global variables inside the function altogether.

In [83]:
function raisetop_param(x::Vector; pow=2)
    s = zero(eltype(x));
    for y in x
        s = s + y^pow # <<the only difference!
    end
    return s    
end

raisetop_param (generic function with 1 method)

In [84]:
mark2 = @benchmark raisetop_param($data, pow=p)

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   98.300 μs … 337.400 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):      98.500 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   107.805 μs ±  20.188 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [85]:
mark2a = @benchmark raisetop_param($data, pow=2)

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   97.600 μs … 445.300 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):      98.000 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   112.182 μs ±  25.377 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [86]:
mark2b = @benchmark raisetop_param($data, pow=p2)

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   97.600 μs …  2.325 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):      97.700 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   104.550 μs ± 30.374 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [89]:
speedup2 = median(mark0.times) / median(mark2.times)
speedup2a = median(mark0.times) / median(mark2a.times)
speedup2b = median(mark0.times) / median(mark2b.times)

push!(table, ["Parametrized", speedup2]);
push!(table, ["Parametrized const.exp", speedup2a]);
push!(table, ["Parametrized const.imp", speedup2b]);

print(table)

5×2 DataFrame
 Row │ Method                  Speedup
     │ String                  Float64
─────┼─────────────────────────────────
   1 │ Global                   1.0
   2 │ Constant                53.4076
   3 │ Parametrized            55.7391
   4 │ Parametrized const.exp  56.0235
   5 │ Parametrized const.imp  56.1955

The table summarizes how the runtimes compare when the exponent is (1) a global variable, (2) a global variable declared as a constant, and (3) passed as a parameter. As we can see, avoiding type instability reduces the runtime by a factor of roughly 50.
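One plausible reading of why the "Parametrized" row is fast even when the non-constant global `p` is passed in: once the value arrives as an argument, Julia compiles the loop for the concrete type of that argument, so the untyped global is looked up only once at the call site. This pattern is often called a function barrier. The sketch below uses hypothetical names (`exponent_demo`, `sumpow_kernel`, `sumpow_barrier`) and a positional rather than keyword argument to keep it minimal.

```julia
exponent_demo = 2   # hypothetical non-constant global

# Kernel: `pow` is an argument, so a version specialized on its type is compiled.
function sumpow_kernel(x::Vector, pow)
    s = zero(eltype(x))
    for y in x
        s += y^pow
    end
    return s
end

# Barrier: the untyped global is read once and handed to the kernel.
sumpow_barrier(x::Vector) = sumpow_kernel(x, exponent_demo)

xs = [1.0, 2.0, 3.0]
println(sumpow_barrier(xs))   # sum of squares: 1 + 4 + 9
```

The dynamic lookup still happens, but only once per call instead of once per loop iteration.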

In [90]:
@code_warntype raisetop_param(data; pow=p2)

Variables
  #unused#::Core.Const(var"#raisetop_param##kw"())
  @_2::NamedTuple{(:pow,), Tuple{Int64}}
  @_3::Core.Const(raisetop_param)
  x::Vector{Float64}
  pow::Int64
  @_6::Int64

Body::Float64
1 ─ %1  = Base.haskey(@_2, :pow)::Core.Const(true)
│         %1
│         (@_6 = Base.getindex(@_2, :pow))
└──       goto #3
2 ─       Core.Const(:(@_6 = 2))
3 ┄ %6  = @_6::Int64
│         (pow = %6)
│   %8  = (:pow,)::Core.Const((:pow,))
│   %9  = Core.apply_type(Core.NamedTuple, %8)::Core.Const(NamedTuple{(:pow,), T} where T<:Tuple)
│   %10 = Base.structdiff(@_2, %9)::Core.Const(NamedTuple())
│   %11 = Base.pairs(%10)::Core.Const(Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}())
│   %12 = Base.isempty(%11)::Core.C

In [91]:
function raisetop_map(data; pow=p2) # `p2` is the constant global defined earlier
    raised = zeros(size(data))
    map!(x->x^pow, raised, data)
    return sum(raised)
end

raisetop_map (generic function with 1 method)

In [92]:
mark3 = @benchmark raisetop_map($data, pow=p2)

BenchmarkTools.Trial: 9625 samples with 1 evaluation.
 Range (min … max):  141.100 μs …   5.357 ms  ┊ GC (min … max): 0.00% … 91.66%
 Time  (median):     445.400 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   511.417 μs ± 435.871 μs  ┊ GC (mean ± σ):  9.41% ± 10.88%

In [93]:
speedup3 = median(mark0.times) / median(mark3.times)

push!(table, ["Base.mapped!", speedup3]);

print(table)

6×2 DataFrame
 Row │ Method                  Speedup
     │ String                  Float64
─────┼─────────────────────────────────
   1 │ Global                   1.0
   2 │ Constant                53.4076
   3 │ Parametrized            55.7391
   4 │ Parametrized const.exp  56.0235
   5 │ Parametrized const.imp  56.1955
   6 │ Base.mapped!            12.3267
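The `map!` variant is type-stable yet only about 12x faster than the baseline, and its benchmark is the only one above with a noticeable GC share. A likely explanation, offered here as an assumption rather than a measured result, is the temporary array `raised` that is allocated on every call. The sketch below contrasts that shape with `sum(f, itr)`, which applies the function while reducing and so needs no temporary. The names `sumpow_map` and `sumpow_fused` are hypothetical.

```julia
# Allocating version, same shape as the map! cell (hypothetical name).
function sumpow_map(data; pow=2)
    raised = zeros(size(data))          # temporary array, one slot per element
    map!(x -> x^pow, raised, data)
    return sum(raised)
end

# Alternative without the temporary: `sum(f, itr)` applies f while reducing.
sumpow_fused(data; pow=2) = sum(x -> x^pow, data)

xs = rand(1_000)
println(sumpow_map(xs) ≈ sumpow_fused(xs))   # both compute the same sum of squares
```

Benchmarking `sumpow_fused` with `@benchmark` in the same session would show whether the allocation really accounts for the gap.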

## KR2

**Replicated the naive implementation of the polynomial in the textbook.**

$p(x) = \sum_{i=0}^n a_ix^i = a_0 + a_1x + a_2x^2 + \cdots + a_nx^n$

In [95]:
function poly_naive(x, a...) #uses splat operator `...`
    p = zero(x) #for type stability
    for i in eachindex(a)
        p = p + a[i] * x^(i-1)
    end
    return p
end

poly_naive (generic function with 1 method)

In [96]:
f_naive(x) = poly_naive(x, 1,2,3,4,5,6,7,8,9)

f_naive (generic function with 1 method)

In [97]:
x = 3.5

3.5

In [98]:
mark0 = @benchmark f_naive($x)

BenchmarkTools.Trial: 10000 samples with 719 evaluations.
 Range (min … max):  175.382 ns …  2.126 μs  ┊ GC (min … max): 0.00% … 89.18%
 Time  (median):     190.542 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   217.567 ns ± 67.172 ns  ┊ GC (mean ± σ):  0.34% ±  1.97%

In [100]:
using LinearAlgebra # provides peakflops
peakflops()

1.343783489604155e11

## KR3

**Replicated the naive implementation of Horner's method for the same polynomial.**

$b_n = a_n$
<br> $b_{n-1} = a_{n-1} + b_nx$
<br> $b_{n-2} = a_{n-2} + b_{n-1}x$
<br> $\vdots$
<br> $b_{0} = a_{0} + b_1x$

such that $p(x) = b_0$
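Unrolling the recurrence shows that it is just the polynomial rewritten in nested form:

$p(x) = a_0 + x\bigl(a_1 + x\bigl(a_2 + \cdots + x\,(a_{n-1} + x\,a_n)\cdots\bigr)\bigr)$

For example, with $n = 2$:

$b_2 = a_2$
<br> $b_1 = a_1 + b_2x = a_1 + a_2x$
<br> $b_0 = a_0 + b_1x = a_0 + a_1x + a_2x^2 = p(x)$

Each step costs one multiplication and one addition, so the whole evaluation needs $n$ multiplications and $n$ additions, whereas the term-by-term sum also has to compute every power $x^i$.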

In [101]:
function poly_h(x, a...)
    b = zero(x)
    for i in reverse(eachindex(a))
        b = a[i] + b*x
    end
    return b
end

poly_h (generic function with 1 method)

In [102]:
f_h(x) = poly_h(x, 1,2,3,4,5,6,7,8,9)

f_h (generic function with 1 method)

In [103]:
mark1 = @benchmark f_h($x)

BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  3.300 ns … 119.100 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.500 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.101 ns ±   2.863 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [104]:
speedup1 = median(mark0.times) / median(mark1.times)

table = DataFrame("Method"=>["Naive","Horner"],"Speedup" => [1.0, speedup1]);

print(table)

2×2 DataFrame
 Row │ Method  Speedup
     │ String  Float64
─────┼─────────────────
   1 │ Naive    1.0
   2 │ Horner  54.4407
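Before trusting a speedup, it is worth confirming that the faster routine returns the same value. The sketch below is a stand-alone check under a few assumptions: `horner_loop` is a hypothetical copy of the loop-based Horner evaluation, the reference value is a direct power sum, and `evalpoly` is the Horner-based evaluator that ships in Base since Julia 1.4.

```julia
# Hypothetical stand-alone copy of the loop-based Horner evaluation.
function horner_loop(x, a...)
    b = zero(x)
    for i in reverse(eachindex(a))
        b = a[i] + b * x
    end
    return b
end

coeffs = (1, 2, 3, 4, 5, 6, 7, 8, 9)   # same coefficients as in the text
x0 = 3.5

# Direct power-sum evaluation as an independent reference.
direct = sum(c * x0^(i - 1) for (i, c) in enumerate(coeffs))

println(horner_loop(x0, coeffs...) ≈ direct)
println(evalpoly(x0, coeffs) ≈ direct)   # Base.evalpoly takes a tuple of coefficients
```

Comparing with `≈` rather than `==` allows for the small rounding differences between the two evaluation orders.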

## KR4

**Replicated the macro implementation of Horner's method for the same polynomial.**

b = a[n]
<br> b = muladd( x, b, a[n-1] ) = muladd( x, a[n], a[n-1] )
<br> b = muladd( x, b, a[n-2] ) = muladd( x, muladd( x, a[n], a[n-1] ), a[n-2] )
<br> b = muladd( x, b, a[n-3] ) = muladd( x, muladd( x, muladd( x, a[n], a[n-1] ), a[n-2] ), a[n-3] )
<br> ...
<br> b = muladd( x, ..., muladd( x, muladd( x, a[n], a[n-1] ), a[n-2] ), a[n-3], ..., a[1] )

In [109]:
macro horner(x, p...)
    ex = esc(p[end])
    for i in length(p)-1:-1:1
        ex = :(muladd(t,$(ex), $(esc(p[i]))))
    end
    Expr(:block, :(t=$(esc(x))), ex)
end

@horner (macro with 1 method)
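To see what the macro actually generates, `@macroexpand` can be applied to a small call. The sketch below redefines the macro under the hypothetical name `@horner_demo` so that it runs on its own. The expansion should be a block that binds a temporary to `x` and then nests one `muladd` call per coefficient, with no loop left at run time.

```julia
# Hypothetical copy of the macro, renamed so this snippet is self-contained.
macro horner_demo(x, p...)
    ex = esc(p[end])                          # innermost term: the last coefficient
    for i in length(p)-1:-1:1
        ex = :(muladd(t, $ex, $(esc(p[i]))))  # wrap one level: t * ex + p[i]
    end
    Expr(:block, :(t = $(esc(x))), ex)        # bind x once, then the nested expression
end

# Inspect the generated code for a three-coefficient polynomial.
println(@macroexpand @horner_demo(x, 1, 2, 3))

# Evaluate 1 + 2x + 3x^2 at x = 2.0.
value = let x = 2.0
    @horner_demo(x, 1, 2, 3)
end
println(value)
```

The printed expansion contains generated variable names, so its exact text varies between sessions.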

In [110]:
f_h_macro(x) = @horner(x, 1,2,3,4,5,6,7,8,9)

f_h_macro (generic function with 1 method)

In [111]:
mark2 = @benchmark f_h_macro($x)

BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  0.001 ns … 2.400 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     0.001 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   0.027 ns ± 0.051 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

A median of 0.001 ns is far below a single clock cycle, so this figure most likely means the compiler constant-folded the whole call rather than that the evaluation really took that long. The BenchmarkTools manual suggests interpolating the argument through a `Ref`, e.g. `@benchmark f_h_macro($(Ref(x))[])`, to prevent this. The macro speedup reported below should be read with that caveat.

## KR5

**Table showing by how many _minutes_ the function evaluations in both KR3 and KR4 are reduced if KR2 requires 24 hours of runtime.**

In [108]:
speedup2 = median(mark0.times) / median(mark2.times)

push!(table, ["Macro", speedup2]);

print(table)

3×2 DataFrame
 Row │ Method  Speedup
     │ String  Float64
─────┼────────────────────
   1 │ Naive    1.0
   2 │ Horner  54.4407
   3 │ Macro    1.90542e5

In [54]:
for r in eachrow(table)
    println("$(24*60/r.Speedup) mins for $(r.Method) method.")
end

1440.0 mins for Naive method.
26.36307692307692 mins for Horner method.
0.007532307692307692 mins for Macro method.


In [55]:
transform!(table,:Speedup=>ByRow(x->24*60/x)=>:"Time(mins)")
print(table)

3×3 DataFrame
 Row │ Method  Speedup     Time(mins)
     │ String  Float64     Float64
─────┼───────────────────────────────────
   1 │ Naive    1.0        1440.0
   2 │ Horner  54.6218       26.3631
   3 │ Macro    1.91176e5     0.00753231
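The table lists the remaining runtime, while KR5 asks for the reduction. The reduction follows from the same numbers as baseline × (1 − 1/speedup). The sketch below is a minimal helper with a hypothetical name, `minutes_saved`. The speedups are copied by hand from the final table rather than recomputed from the benchmarks.

```julia
# Minutes saved relative to a baseline run, given a speedup factor.
# `baseline_min` defaults to 24 hours expressed in minutes.
minutes_saved(speedup; baseline_min = 24 * 60) = baseline_min * (1 - 1 / speedup)

# Speedups copied from the final table above.
speedups = ["Naive" => 1.0, "Horner" => 54.6218, "Macro" => 1.91176e5]

for (method, s) in speedups
    println(rpad(method, 8), round(minutes_saved(s); digits = 4), " min saved")
end
```

With these inputs the Horner row comes out at roughly 1413.6 minutes saved, i.e. 1440 minus the 26.36 minutes listed in the table.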