# Lab 05: Practical performance debugging tools
Performance is crucial in scientific computing: it makes a big difference whether your experiments run for one minute or one hour. We have already developed quite a bit of code, both inside and outside packages, on which we are going to demonstrate some of the tooling that Julia provides for finding performance bottlenecks. The performance of your code, or more precisely its speed of execution, is of course relative (to preference, expectation, existing code), and it is hard to find the exact threshold at which we should start to care about it. When starting out with Julia, we recommend not getting bogged down by the performance side of things straight away, but designing the code in the way that feels natural to you. As opposed to other languages, Julia lets you write things "like you are used to" (depending on your background), e.g. `for` loops are as fast as in C, and vectorization of mathematical operators works the same as or even better than in MATLAB or NumPy.


Once you have tested the functionality, you can start exploring the performance of your code by different means:
- manual code inspection - identifying performance gotchas (tedious, requires skill)
- automatic code inspection - `JET.jl` (probably not as powerful as in statically typed languages)
- benchmarking - measuring variability in execution time, comparing with some baseline (only a statistic, non-specific)
- profiling - measuring the execution time at "each line of code" (no easy way to handle advanced parallelism, ...)
- allocation tracking - similar to profiling but specifically looking at allocations (one sided statistic)

## Checking type stability
Recall that a type stable function is written in a way that allows Julia's compiler to infer the types of all variables and produce an efficient native code implementation, without the need to box some variables in a structure whose type is known only at runtime. Probably unbeknownst to you, we have already seen an example of a type unstable function (at least in some situations) in the first lab, where we defined the `polynomial` function:


In [1]:
using Pkg; Pkg.activate("../")

  Activating new project at `c:\Users\peter\Documents\Skola\SPJ`


In [2]:
function polynomial(a, x)
    accumulator = 0
    for i in length(a):-1:1
        accumulator += x^(i-1) * a[i] # ! 1-based indexing for arrays
    end
    return accumulator
end

polynomial (generic function with 1 method)

The exact form of compiled code and also the type stability depends on the arguments of the function. Let's explore the following two examples of calling the function:

- Integer number valued arguments

In [3]:
a = [-19, 7, -4, 6]
x = 3
polynomial(a, x)

128

- Float number valued arguments

In [4]:
xf = 3.0
polynomial(a, xf)

128.0

The results are numerically the "same", however they differ in the output type. Though you have probably not noticed it, there should be a difference in runtime (assuming that you have run the function once more after its compilation). It is probably a surprise to no one that one of the compiled methods is type unstable. This can be checked with the `@code_warntype` macro:

In [5]:
using InteractiveUtils # provides @code_warntype outside the REPL

In [6]:
@code_warntype polynomial(a, x)  # type stable

MethodInstance for polynomial(::Vector{Int64}, ::Int64)
  from polynomial(a, x) @ Main c:\Users\peter\Documents\Skola\SPJ\cv05\jl_notebook_cell_df34fa98e69747e1a8f8a730347b8e2f_W2sZmlsZQ==.jl:1
Arguments
  #self#::Core.Const(polynomial)
  a::Vector{Int64}
  x::Int64
Locals
  @_4::Union{Nothing, Tuple{Int64, Int64}}
  accumulator::Int64
  i::Int64
Body::Int64
1 ─       (accumulator = 0)
│   %2  = Main.length(a)::Int64
│   %3  = (%2:-1:1)::Core.PartialStruct(StepRange{Int64, Int64}, Any[Int64, Core.Const(-1), Int64])
│         (@_4 = Base.iterate(%3))
│   %5  = (@_4 === nothing)::Bool
│   %6  = Base.not_int(%5)::Bool
└──       goto #4 if not %6
2 ┄ %8  = @_4::Tuple{Int64, Int64}
│         (i = Core.getfie

In [7]:
@code_warntype polynomial(a, xf) # type unstable

MethodInstance for polynomial(::Vector{Int64}, ::Float64)
  from polynomial(a, x) @ Main c:\Users\peter\Documents\Skola\SPJ\cv05\jl_notebook_cell_df34fa98e69747e1a8f8a730347b8e2f_W2sZmlsZQ==.jl:1
Arguments
  #self#::Core.Const(polynomial)
  a::Vector{Int64}
  x::Float64
Locals
  @_4::Union{Nothing, Tuple{Int64, Int64}}
  accumulator::Union{Float64, Int64}
  i::Int64
Body::Union{Float64, Int64}
1 ─       (accumulator = 0)
│   %2  = Main.length(a)::Int64
│   %3  = (%2:-1:1)::Core.PartialStruct(StepRange{Int64, Int64}, Any[Int64, Core.Const(-1), Int64])
│         (@_4 = Base.iterate(%3))
│   %5  = (@_4 === nothing)::Bool
│   %6  = Base.not_int(%5)::Bool
└──       goto #4 if not %6
2 ┄ %8  = @_4::Tuple{I

We are getting a little ahead of ourselves in this lab, as understanding these expressions is part of a future lecture. Anyway, the output basically shows what the compiler thinks of each variable in the code, albeit in a less readable form than the original code. In the terminal the type annotations are colored: the more red the color, the less certain the inferred type is. Our main focus should be on the return type of the function, which is shown just before the start of the code after the keyword `Body`. In the first case the return type is `Int64`, whereas in the second example the compiler is unsure whether the type is `Float64` or `Int64`, which is marked as the `Union` of the two. Fortunately for us, this type instability can be fixed with a single line edit, but we will see later that this is not always the case.
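The inferred return type can also be queried programmatically. The following is a minimal sketch that repeats the `polynomial` definition from above and uses `Base.return_types`, an unexported reflection helper, so treat it as a debugging aid rather than a stable interface. `isconcretetype` then flags the unstable case.

```julia
function polynomial(a, x)
    accumulator = 0
    for i in length(a):-1:1
        accumulator += x^(i-1) * a[i]
    end
    return accumulator
end

# inferred return type for integer valued arguments
rt_int = only(Base.return_types(polynomial, (Vector{Int}, Int)))
# inferred return type for a floating point `x`
rt_float = only(Base.return_types(polynomial, (Vector{Int}, Float64)))

println(rt_int, " is concrete: ", isconcretetype(rt_int))
println(rt_float, " is concrete: ", isconcretetype(rt_float))
```

A non-concrete return type such as a `Union` is a hint that the function deserves a closer look with `@code_warntype`.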


<div class="alert alert-block alert-warning">
<b>Note:</b> "Type stability"
    Having a variable represented as a `Union` of multiple types in a function is a lesser evil than having `Any`, as we can at least statically enumerate the methods to which to dispatch dynamically, and in some cases the penalty may be low.
</div>



<div class="alert alert-block alert-success">
<b>Exercise:</b> 
Create a new function `polynomial_stable`, which is type stable, and measure the difference in evaluation time.

**HINTS**:
- Ask for help on the `one` and `zero` functions, which are often used as a shorthand for this kind of problem.
- Run the function with the given arguments once before running `@time`, or use `@btime` if you have `BenchmarkTools` readily available in your environment.
- To see some measurable difference with this simple function, a longer vector of coefficients may be needed.

</div>

<div class="alert alert-block alert-info">
<b>Solution</b>: </div>

####

In [34]:
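function polynomial_stable(a, x)
    # multiplying by one(x) gives the accumulator its final type from the start
    accumulator = a[end] * one(x)
    # go from the highest coefficient down, reusing the partial result
    for i in (length(a) - 1):-1:1
        accumulator = accumulator * x + a[i]
    end
    return accumulator
end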

polynomial_stable (generic function with 1 method)

In [26]:
@code_warntype polynomial_stable(a, x)  # type stable

MethodInstance for polynomial_stable(::Vector{Int64}, ::Int64)
  from polynomial_stable(a, x) @ Main c:\Users\peter\Documents\Skola\SPJ\cv05\jl_notebook_cell_df34fa98e69747e1a8f8a730347b8e2f_X21sZmlsZQ==.jl:1
Arguments
  #self#::Core.Const(polynomial_stable)
  a::Vector{Int64}
  x::Int64
Locals
  @_4::Union{Nothing, Tuple{Int64, Int64}}
  delta_x::Int64
  accumulator::Int64
  i::Int64
Body::Int64
1 ─ %1  = Main.eltype(x)::Core.Const(Int64)
│         (accumulator = Main.zero(%1))
│   %3  = Main.eltype(x)::Core.Const(Int64)
│         (delta_x = Main.one(%3))
│   %5  = Main.length(a)::Int64
│   %6  = (1:1:%5)::Core.PartialStruct(StepRange{Int64, Int64}, Any[Core.Const(1), Core.Const(1), Int64])
│         (@_4 = Base.iterate(%

In [27]:
@code_warntype polynomial_stable(a, xf) # type stable

MethodInstance for polynomial_stable(::Vector{Int64}, ::Float64)
  from polynomial_stable(a, x) @ Main c:\Users\peter\Documents\Skola\SPJ\cv05\jl_notebook_cell_df34fa98e69747e1a8f8a730347b8e2f_X21sZmlsZQ==.jl:1
Arguments
  #self#::Core.Const(polynomial_stable)
  a::Vector{Int64}
  x::Float64
Locals
  @_4::Union{Nothing, Tuple{Int64, Int64}}
  delta_x::Float64
  accumulator::Float64
  i::Int64
Body::Float64
1 ─ %1  = Main.eltype(x)::Core.Const(Float64)
│         (accumulator = Main.zero(%1))
│   %3  = Main.eltype(x)::Core.Const(Float64)
│         (delta_x = Main.one(%3))
│   %5  = Main.length(a)::Int64
│   %6  = (1:1:%5)::Core.PartialStruct(StepRange{Int64, Int64}, Any[Core.Const(1), Core.Const(1), Int64])
│         (@_4 = 

In [35]:
polynomial_stable(a, xf)

128.0

In [23]:
@time polynomial_stable(a, xf)

  0.000010 seconds (1 allocation: 16 bytes)


128.0

The difference is only really visible when the function is evaluated many times.

In [22]:
@time polynomial(a, xf)

  0.000009 seconds (1 allocation: 16 bytes)


128.0

In [11]:
using BenchmarkTools

In [24]:
@btime polynomial(a, xf)

  39.174 ns (1 allocation: 16 bytes)


128.0

In [25]:
@btime polynomial_stable(a, xf)

  22.969 ns (1 allocation: 16 bytes)


128.0

####
Type stability issues are something rather specific to Julia, as its JIT compilation allows it to produce code that contains boxed variables, whose types are known only at runtime. Such runtime type lookups are one of the reasons why interpreted languages are slow to run but fast to write. Julia's way of solving this is based on compiling functions for specific argument types; however, for this to work without falling back to runtime lookups, the compiler has to be able to infer the types.

There are other problems (such as unnecessary allocations) that you can learn to spot in your code; however, type stability issues are by far the most common problems encountered by beginner Julia users who want to squeeze more performance out of their code.

<div class="alert alert-block alert-warning">
<b>Note:</b> 
"Advanced tooling"

Sometimes `@code_warntype` shows that the function's return type is unstable without any hint of the possible cause. Fortunately, for such cases more advanced tools such as [`Cthulhu.jl`](https://github.com/JuliaDebug/Cthulhu.jl) or [`JET.jl`](https://github.com/aviatesk/JET.jl) have been developed.
</div>


## Benchmarking with `BenchmarkTools`
In the last exercise we encountered the problem of timing code to see whether we have made any progress in speeding it up. Throughout the course we will advertise the use of the `BenchmarkTools` package, which provides an easy way to test your code multiple times. In this lab we will focus on some advanced usage tips and gotchas that you may encounter while using it.

There are a few concepts to know in order to understand how the package works:
- evaluation - a single execution of a benchmark expression (default `1` per sample)
- sample - a single time/memory measurement obtained by running one or more evaluations (default `10000` per trial)
- trial - an experiment in which multiple samples are gathered

The result of a benchmark is a trial in which we collect multiple samples of time/memory measurements, which in turn may be composed of multiple executions of the code in question. This layering of repetition is required to allow for benchmarking code at different runtime magnitudes. Imagine having to benchmark operations which are faster than the act of measuring itself - clock initialization, dispatch of an operation and subsequent time subtraction.
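The layering can be illustrated with a few lines of plain Julia. This is only a toy sketch of the idea, not how `BenchmarkTools` is actually implemented; the function name `toy_trial` and its keyword defaults are made up for this example.

```julia
# Toy illustration of the evaluation / sample / trial layering.
# NOT the BenchmarkTools implementation, only the idea behind it.
function toy_trial(f, arg; samples = 100, evals = 1_000)
    times = Vector{Float64}(undef, samples)      # one entry per sample
    result = f(arg)                              # warm-up call, triggers compilation
    for s in 1:samples
        start = time_ns()
        for _ in 1:evals                         # several evaluations per sample
            result = f(arg)
        end
        times[s] = (time_ns() - start) / evals   # nanoseconds per evaluation
    end
    return times, result
end

times, _ = toy_trial(sum, rand(1000))
println("minimum over ", length(times), " samples: ", minimum(times), " ns")
```

Averaging over many evaluations inside one sample is what makes it possible to resolve operations that are faster than a single clock reading.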

The number of samples/evaluations can be set manually; however, most of the time you won't need to know about them, thanks to the tuning method `tune!`, which tries running the code to estimate the correct ratio of evaluations to samples.
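For illustration, the tuning step can also be run by hand. The sketch below assumes that `BenchmarkTools` is installed; it reads the tuned value through the `params` field of the benchmark object and the raw measurements through the `times` field of the trial, which is field access rather than documented accessor functions and may differ between package versions.

```julia
using BenchmarkTools

x = rand(1000)
b = @benchmarkable sum($x)   # define the benchmark without running it
tune!(b)                     # estimate a suitable number of evaluations per sample
trial = run(b)               # gather the samples

println("evaluations per sample: ", b.params.evals)
println("samples gathered:       ", length(trial.times))
println("minimum time (ns):      ", time(minimum(trial)))
```

The macros `@btime` and `@benchmark` perform the same define, tune and run steps automatically.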

The most commonly used interface of `BenchmarkTools` is the `@btime` macro, which returns output similar to the regular `@time` macro, but aggregated over samples by taking their minimum. The minimum is a robust estimator of the location parameter of the time distribution and should not be considered an outlier: noise from other processes/tasks usually pushes the results towards the other tail of the distribution, and miraculous noisy speedups are uncommon. To see the underlying sampling better, there is also the `@benchmark` macro, which runs in the same way as `@btime` but prints more detailed statistics, which are also returned as a `Trial` object instead of the actual code output.


In [28]:
@btime sum($(rand(1000)))

  48.638 ns (0 allocations: 0 bytes)


504.90582944365

In [29]:
@benchmark sum($(rand(1000)))

BenchmarkTools.Trial: 10000 samples with 990 evaluations.
 Range (min … max):  48.788 ns … 195.455 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     49.798 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   50.545 ns ±   4.697 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

<div class="alert alert-block alert-warning">
<b>Note:</b> "Interpolation ~ `$` in BenchmarkTools"


In the previous example we used the interpolation sign `$` to indicate that the code inside should be evaluated once and stored in a local variable. This allows us to focus on benchmarking the code itself rather than the input generation. A more subtle case where this crops up is the use of a previously defined global variable: without interpolation we would also measure the runtime lookup of the global's type at each evaluation, which is usually not what we want. The following list will help you decide when to use interpolation.
```julia
@btime sum($(rand(1000)))   # rand(1000) is stored as local variable, which is used in each evaluation
@btime sum(rand(1000))      # rand(1000) is called in each evaluation
A = rand(1000)
@btime sum($A)              # global variable A is inferred and stored as local, which is used in each evaluation
@btime sum(A)               # global variable A has to be inferred in each evaluation
```


</div>




## Profiling
Profiling in Julia is part of the standard library, in the `Profile` module. It implements a fairly simple sampling-based profiler, which in a nutshell asks at regular intervals where the code execution currently is. As a result we get an array of stacktraces (= chains of function calls), which allow us to make sense of where the execution spent most of its time. The number of samples that can be stored and the sampling period in seconds can be checked after loading `Profile` into the session with the `init()` function.

```julia
using Profile
Profile.init()
```

The same function, but with keyword arguments, can be used to change these settings; however, these settings are system dependent. For example, on Windows there is a known issue that does not allow sampling faster than every `0.003s`, and even on Linux-based systems changing them may not do much. There are some further caveats specific to Julia:
- When running the profiler from the REPL, the trace is usually dominated by the interactive part, which spawns the task and waits for its completion.
- Code has to be run before profiling in order to filter out all the type inference and compilation work. (Unless compilation is what we want to profile.)
- When the execution time is short, the sampling may be insufficient -> run multiple times.
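As a small sketch of the keyword form mentioned above, the settings can be changed and read back as follows. The values `10^6` and `0.005` are arbitrary choices for this example, and the reported buffer size may be adjusted by Julia (for instance depending on the number of threads).

```julia
using Profile

# current buffer size and sampling period in seconds
n_before, delay_before = Profile.init()
println("buffer: ", n_before, " entries, period: ", delay_before, " s")

# request a different buffer size and a 5 ms sampling period
Profile.init(n = 10^6, delay = 0.005)
n_after, delay_after = Profile.init()
println("buffer: ", n_after, " entries, period: ", delay_after, " s")
```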

### Polynomial with scalars
Let's look at our favorite `polynomial` function, or rather its type stable variant `polynomial_stable`, under the profiling lens.

```julia
# clear the last trace (does not have to be run on fresh start)
Profile.clear()

@profile polynomial_stable(a, xf)

# text based output of the profiler
# not shown here because it is not incredibly informative
Profile.print()
```
Unless the machine that you run the code on is really slow, the resulting output contains nothing or only some internals of Julia's interactive REPL. This is because our `polynomial` function takes only a few nanoseconds to run. When we want to profile something that takes only a few nanoseconds, we have to execute the function repeatedly.


In [31]:

function run_polynomial_stable(a, x, n) 
    for _ in 1:n
        polynomial_stable(a, x)
    end
end

a = rand(-10:10, 10) # using longer polynomial

run_polynomial_stable(a, xf, 10) # warm-up run, so that compilation is not profiled
Profile.clear()
@profile run_polynomial_stable(a, xf, Int(1e5))
Profile.print()

Overhead ╎ [+additional indent] Count File:Line; Function
 ╎4 @Base\client.jl:552; _start()
 ╎ 4 @Base\client.jl:318; exec_options(opts::Base.JLOptions)
 ╎  4 @Base\Base.jl:495; include(mod::Module, _path::String)
 ╎   4 @Base\loading.jl:2136; _include(mapexpr::Function, mod::Module, _path::S…
 ╎    4 @Base\loading.jl:2076; include_string(mapexpr::typeof(identity), mod::M…
 ╎     4 @Base\boot.jl:385; eval
 ╎    ╎ 4 …s\notebook\notebook.jl:35; top-level scope
 ╎    ╎  4 …src\serve_notebook.jl:81; kwcall(::@NamedTuple{error_handler::var"#…
 ╎    ╎   4 …src\serve_notebook.jl:147; serve_notebook(pipename::String, debugg…
 ╎    ╎    4 @JSONRPC\src\typed.jl:67; dispatch_msg(x::VSCodeServer.JSONRPC.JSO…
 ╎    ╎     4 …rc\serve_notebook.jl:13; notebook_runcell_request(conn::VSCodeSe…
 ╎    ╎    ╎ 4 …eServer\src\repl.jl:276; withpath(f::VSCodeServer.var"#217#218"…
 ╎    ╎    ╎  4 …c\serve_notebook.jl:24; (::VSCodeServer.var"#217#218"{VSCodeSe…
 ╎    ╎    ╎   4 @Base\essentials.jl:889; invokelat

In order to get more of a visual feel for profiling, there are packages that allow you to generate interactive plots or graphs. In this lab we will use [`ProfileSVG.jl`](https://github.com/timholy/ProfileSVG.jl), which does not require any fancy IDE or GUI libraries.

```julia
using ProfileSVG
@profview run_polynomial_stable(a, xf, Int(1e5))
```

<img src="./poly_stable.png" align="center">

<div class="alert alert-block alert-success">
<b>Exercise:</b> 

Let's compare this with the type unstable situation.
</div>

<div class="alert alert-block alert-info">
<b>Solution</b>: </div>

####



First let's define the function that allows us to run the `polynomial` multiple times.
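A possible sketch of such a wrapper, mirroring `run_polynomial_stable` from above, is shown below. It repeats the type unstable `polynomial` so that the snippet runs on its own and uses the text output of the profiler; `@profview` can be used on the same call instead.

```julia
using Profile

# type unstable `polynomial` from the beginning of the lab
function polynomial(a, x)
    accumulator = 0
    for i in length(a):-1:1
        accumulator += x^(i-1) * a[i]
    end
    return accumulator
end

# evaluate the polynomial `n` times, so that the sampling profiler has something to catch
function run_polynomial(a, x, n)
    for _ in 1:n
        polynomial(a, x)
    end
end

a = rand(-10:10, 10)   # longer polynomial, as in the stable case
xf = 3.0

run_polynomial(a, xf, 10)   # warm-up run, so that compilation is not profiled
Profile.clear()
@profile run_polynomial(a, xf, Int(1e5))
Profile.print()
```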

####


Other options for viewing profiler outputs:
- [ProfileView](https://github.com/timholy/ProfileView.jl) - a close cousin of `ProfileSVG`, spawns a GTK window with an interactive flame graph
- [VSCode](https://www.julia-vscode.org/docs/stable/release-notes/v0_17/#Profile-viewing-support-1) - `@profview` macro that is always imported, flame graphs (JS extension required), filtering, one-click access to source code
- [PProf](https://github.com/vchuravy/PProf.jl) - serializes the profiler output to protobuf and loads it in the `pprof` web app, graph visualization of stacktraces


## Applying fixes
We have noticed that, no matter whether the function is type stable or unstable, the majority of the computation falls on the power function `^`. There is a way to avoid this using a clever technique called Horner's scheme[^1], which uses the distributive and associative rules to convert the sum of powers into an incremental multiplication of partial results.
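To make the idea concrete, here is a small sketch for the four coefficients used at the beginning of the lab. It compares the sum of powers with the nested form by hand and with `evalpoly` from `Base`, which takes the coefficients in ascending order and is documented to use Horner's method.

```julia
a = [-19, 7, -4, 6]
x = 3

# sum of powers: every term needs its own call to `^`
naive = a[1] + a[2]*x + a[3]*x^2 + a[4]*x^3
# the same polynomial with `x` factored out step by step
nested = a[1] + x*(a[2] + x*(a[3] + x*a[4]))
# built-in evaluation of the same coefficients
builtin = evalpoly(x, a)

println((naive, nested, builtin))
```

The nested form needs only one multiplication and one addition per coefficient.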




<div class="alert alert-block alert-success">
<b>Exercise:</b> 

Rewrite the `polynomial` function using Horner's scheme/method[^1]. Moreover, include the type stability fixes from `polynomial_stable`. You should get more than a 3x speedup when measured against the old implementation (measure `polynomial` against `polynomial_stable`).

**BONUS**: Profile the new method and compare the differences in traces.

[^1]: An explanation of Horner's scheme can be found at [https://en.wikipedia.org/wiki/Horner%27s_method](https://en.wikipedia.org/wiki/Horner%27s_method).

</div>

<div class="alert alert-block alert-info">
<b>Solution</b>: </div>

####

In [None]:
function polynomial(a, x)
    #code here
end

Speed up:
- 49ns -> 8ns ~ 6x on integer valued input 
- 59ns -> 8ns ~ 7x on real valued input

In [None]:
@btime polynomial($a, $x)
@btime polynomial_stable($a, $x)
@btime polynomial($a, $xf)
@btime polynomial_stable($a, $xf)

These numbers will differ depending on the hardware.

**BONUS**: The profile trace does not even contain the calling of mathematical operators and is mainly dominated by the iteration utilities. In this case we had to increase the number of runs to `1e6` to get some meaningful trace.

```julia
@profview run_polynomial(a, xf, Int(1e6))
```
<img src="./poly_horner.png" align="center">


####


### Where to find source code?
As most of Julia is written in Julia itself, it is sometimes helpful to look inside for some details or inspiration. The code of `Base` and the stdlib packages is located right next to Julia's installation in the `./share/julia` subdirectory:
```bash
./julia-1.6.2/
    ├── bin
    ├── etc
    │   └── julia
    ├── include
    │   └── julia
    │       └── uv
    ├── lib
    │   └── julia
    ├── libexec
    └── share
        ├── appdata
        ├── applications
        ├── doc
        │   └── julia       # offline documentation (https://docs.julialang.org/en/v1/)
        └── julia
            ├── base        # base library
            ├── stdlib      # standard library
            └── test
```
Other packages installed through the Pkg interface are located in the `.julia/` directory, which lives in your home directory, e.g. `/home/$(user)/.julia/` on Linux and `C:\Users\$(user)\.julia\` on Windows.
```bash
~/.julia/
    ├── artifacts
    ├── compiled
    ├── config          # startup.jl lives here
    ├── environments
    ├── logs
    ├── packages        # packages are here
    └── registries
```
If you are using VSCode, the paths visible in the REPL can be clicked through to the actual source code. Moreover, in that environment the documentation is usually available upon hovering over code.
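The same locations can also be looked up from within a running session. The following sketch uses only `Base` and the `InteractiveUtils` standard library; the printed paths depend on your installation.

```julia
using InteractiveUtils   # provides @which outside the REPL

# sources of Base and the standard library, relative to the Julia binary
println(normpath(joinpath(Sys.BINDIR, "..", "share", "julia")))

# first depot, where Pkg installs packages by default
println(first(DEPOT_PATH))

# method (with file and line) that a particular call dispatches to
m = @which sum([1, 2, 3])
println(m.file, ":", m.line)
```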

### Setting up benchmarks to our liking
In order to control the number of samples/evaluations and the amount of time allotted to a given benchmark, we can simply append these as keyword arguments to `@btime` or `@benchmark` in the following way:


In [36]:
@benchmark sum($(rand(1000))) evals=100 samples=10 seconds=1

BenchmarkTools.Trial: 10 samples with 100 evaluations.
 Range (min … max):  49.000 ns … 50.000 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     50.000 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   49.900 ns ±  0.316 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

which runs the code repeatedly for up to `1s`, where each of the `10` samples in the trial is composed of `100` evaluations. Setting up these parameters ourselves creates a more controlled environment in which performance regressions can be more easily identified.

Another axis of customization is needed when we are benchmarking mutable operations such as `sort!`, which sorts an array in-place. One way of achieving a consistent benchmark is by omitting the interpolation such as

In [37]:
@benchmark sort!(rand(1000))

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  11.600 μs … 181.800 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     14.600 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   17.911 μs ±  10.736 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

however, now we are again measuring the data generation as well. A better way of doing such timing is to use the built-in `setup` keyword, into which you can put code that has to be run before each sample and which won't be measured.

In [38]:
@benchmark sort!(y) setup=(y=rand(1000))

BenchmarkTools.Trial: 10000 samples with 45 evaluations.
 Range (min … max):  1.104 μs …   4.902 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.140 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.169 μs ± 161.990 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

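One caveat is worth keeping in mind with mutating functions: `setup` is executed once per sample, not once per evaluation. A sample with many evaluations therefore sorts an already sorted array in all but its first evaluation, which is a likely explanation for the roughly tenfold difference between the two `sort!` outputs above. A hedged sketch of how to check this, assuming `BenchmarkTools` is installed, is to fix the number of evaluations explicitly:

```julia
using BenchmarkTools

# `setup` runs once per sample, so with `evals=1` every call to `sort!`
# receives a fresh, unsorted array
t_fresh = @benchmark sort!(y) setup=(y = rand(1000)) evals=1

# with several evaluations per sample only the first one sees unsorted data
t_reused = @benchmark sort!(y) setup=(y = rand(1000)) evals=10

println("evals=1:  ", time(minimum(t_fresh)), " ns")
println("evals=10: ", time(minimum(t_reused)), " ns")
```

For in-place operations, `evals=1` combined with `setup` is usually the safer choice.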

## Ecosystem debugging

Let's now apply what we have learned so far on the much bigger codebase of our
`Ecosystem`.


In [40]:
import Pkg; Pkg.add("StatsBase") # StatsBase has to be installed before Ecosystem is loaded
include("../Ecosystem/src/Ecosystem.jl")
function make_counter()
    n = 0
    counter() = n += 1
end

function create_world()
    n_grass  = 1_000
    n_sheep  = 40
    n_wolves = 4

    nextid = make_counter()

    World(vcat(
        [Grass(nextid()) for _ in 1:n_grass],
        [Sheep(nextid()) for _ in 1:n_sheep],
        [Wolf(nextid()) for _ in 1:n_wolves],
    ))
end
world = create_world();

<div class="alert alert-block alert-success">
<b>Exercise:</b> 

Use `@profview` and `@code_warntype` to find the type unstable and slow parts of
our simulation.

Precompile everything by running one step of our simulation and run the profiler
like this:

```julia
world_step!(world)
@profview for i=1:100 world_step!(world) end
```

You should get a flamegraph similar to the one below:

<img src="../../docs/src/lecture_05/ecosystems/lab04-worldstep.png" align="center">

</div>

<div class="alert alert-block alert-info">
<b>Solution</b>: </div>

####

####

## Different `Ecosystem.jl` versions

In order to fix the type instability in the `Vector{Agent}` we somehow have to
rethink our world such that we get a vector of a concrete type. Optimally we would have one
vector for each type of agent that populates our world. Before we completely
redesign how our world works we can try a simple hack that might already improve
things. Instead of letting Julia figure out which types of agents we have (which
could be infinitely many), we can tell the compiler at least that we have only
three of them: `Wolf`, `Sheep`, and `Grass`.

We can do this with a tiny change in the constructor of our `World`:


In [None]:
function World(agents::Vector{<:Agent})
    ids = [a.id for a in agents]
    length(unique(ids)) == length(agents) || error("Not all agents have unique IDs!")

    # construct Dict{Int,Union{Animal{Wolf}, Animal{Sheep}, Plant{Grass}}}
    # instead of Dict{Int,Agent}
    types = unique(typeof.(agents))
    dict = Dict{Int,Union{types...}}(a.id => a for a in agents)

    World(dict, maximum(ids))
end
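The effect of `Union{types...}` can be tried out in isolation. The sketch below uses made-up stand-in types (`ToyAgent`, `ToyGrass`, `ToySheep`) instead of the real `Ecosystem` types, so it only illustrates the construction, not the actual speedup.

```julia
# hypothetical stand-ins for the agent types of the Ecosystem
abstract type ToyAgent end
struct ToyGrass <: ToyAgent
    id::Int
end
struct ToySheep <: ToyAgent
    id::Int
end

agents = ToyAgent[ToyGrass(1), ToySheep(2), ToyGrass(3)]

# concrete types that actually occur in the collection
types = unique(typeof.(agents))
# splat them into a Union, which is much narrower than the abstract ToyAgent
dict = Dict{Int,Union{types...}}(a.id => a for a in agents)

println(valtype(dict))
```

With such a small `Union` as the value type, the compiler can enumerate the possible concrete types instead of treating the values as an open-ended abstract type.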

<div class="alert alert-block alert-success">
<b>Exercise:</b> 

1. Run the benchmark script provided [here](ecosystems/lab04/bench.jl) to get
       timings for `find_food` and `reproduce!` for the original ecosystem.
2. Run the same benchmark with the modified `World` constructor.

Which differences can you observe? Why is one version faster than the other?
</div>

<div class="alert alert-block alert-info">
<b>Solution</b>: </div>

####