<a href="https://colab.research.google.com/github/PhilipFackler/julia-basics/blob/main/distinctions/06_Performance.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Performance Notes

Manual: [Performance Tips](https://docs.julialang.org/en/v1/manual/performance-tips/)

## Timing

The `@time` macro
- useful for showing run times and memory allocations
- resolution on the order of microseconds, so a single measurement may not be useful for very fast functions

For more accurate timing, packages like [`BenchmarkTools.jl`](https://github.com/JuliaCI/BenchmarkTools.jl) and [`Chairmarks.jl`](https://github.com/LilithHafner/Chairmarks.jl) run the code many times and report statistics.

In [None]:
using Pkg
Pkg.add("BenchmarkTools")

In [None]:
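using LinearAlgebra
A = randn(100);
norm(A) # warm up: the first call includes compilation time
@time norm(A)

using BenchmarkTools
# `setup` creates a fresh `A` for each sample; it is not included in the timing
@benchmark norm(A) setup=(A = randn(100))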
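
When the numbers are needed programmatically rather than printed, the Base macros `@elapsed` and `@timed` return what `@time` would display. The following is a minimal sketch that uses only Base; `sumsq` is a hypothetical helper introduced for illustration. It also shows why a warm-up call matters, since the first call of a function usually includes compilation time.

```julia
# Hypothetical helper used only for illustration.
sumsq(v) = sum(abs2, v)

x = randn(1_000)

# The first call compiles `sumsq` for Vector{Float64}, so it is usually slower.
t_first = @elapsed sumsq(x)
t_second = @elapsed sumsq(x)

# `@timed` returns a named tuple with the result, elapsed time, and allocated bytes.
stats = @timed sumsq(x)

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
println("value = ", stats.value, ", time = ", stats.time, " s, bytes = ", stats.bytes)
```

A single measurement like this is still noisy, so a benchmarking package remains the better choice for comparing fast functions.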

## Profiling

In [None]:
using Profile, LinearAlgebra # `mul!` comes from LinearAlgebra

N = 4_000
A = randn(N, N)
B = randn(N, N)
C = randn(N, N)

Profile.clear()
@profile mul!(C, A, B)
Profile.print()
;

In [None]:
Profile.print(; C=true)
;
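
The profiler is often more informative on your own code than on a library call. The sketch below profiles a deliberately naive matrix product and prints a flat listing sorted by sample count. The function `naive_matmul!`, the arrays `X`, `Y`, `Z`, and the problem size are assumptions chosen for illustration, and the output depends on the machine.

```julia
using Profile

# Hypothetical workload: a deliberately naive triple-loop matrix product.
function naive_matmul!(Z, X, Y)
    fill!(Z, zero(eltype(Z)))
    for j in axes(Y, 2), k in axes(X, 2), i in axes(X, 1)
        Z[i, j] += X[i, k] * Y[k, j]
    end
    return Z
end

n = 300
X = randn(n, n)
Y = randn(n, n)
Z = similar(X)

naive_matmul!(Z, X, Y)   # warm up so that compilation is not profiled

Profile.clear()
@profile for _ in 1:20   # repeat the call so the sampler collects enough samples
    naive_matmul!(Z, X, Y)
end

# Flat listing sorted by sample count; `mincount` hides rarely sampled lines.
Profile.print(; format = :flat, sortedby = :count, mincount = 10)
```

The lines with the highest counts are where the program spent most of its time, here most likely the innermost update of `Z[i, j]`.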

Julia is also compatible with third-party profilers:

* Linux perf, also via [`LinuxPerf.jl`](https://github.com/JuliaPerf/LinuxPerf.jl)
* [Intel VTune](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler-download.html) (on x86-64), also for MPI programs, [`IntelITT.jl`](https://github.com/JuliaPerf/IntelITT.jl) for instrumentation
* [NVIDIA Nsight Systems](https://developer.nvidia.com/nsight-systems/) (on x86-64 and aarch64), also for MPI and GPU programs, [`NVTX.jl`](https://github.com/JuliaGPU/NVTX.jl) for instrumentation
* [Tracy](https://github.com/wolfpld/tracy)
* Other profilers that have been used: [`LIKWID.jl`](https://github.com/JuliaPerf/LIKWID.jl), [`Extrae.jl`](https://github.com/bsc-quantic/Extrae.jl), [`ScoreP.jl`](https://github.com/JuliaPerf/ScoreP.jl), and more

## Tips

* avoid accessing (untyped) global variables: **put code in functions**, don't work in global scope
* Julia has a garbage collector (GC) for safe automatic memory management, but in some cases it can get in the way of performance: use fully **in-place operations in hot loops** so that nothing is allocated there and the GC is not triggered
* the compiler cannot always prove that array indexing operations are in bounds: you may want to use [`@inbounds`](https://docs.julialang.org/en/v1/base/base/#Base.@inbounds) to forcibly disable bounds checking (use with caution!).
  Check with [`@code_llvm`](https://docs.julialang.org/en/v1/stdlib/InteractiveUtils/#InteractiveUtils.@code_llvm) that this is actually necessary, and use generic abstractions (like [`eachindex`](https://docs.julialang.org/en/v1/base/arrays/#Base.eachindex)) whenever possible
* slicing an array makes a copy; to take a view instead, use [`@view`](https://docs.julialang.org/en/v1/base/arrays/#Base.@view)
* [avoid type-instabilities](https://docs.julialang.org/en/v1/manual/performance-tips/#Write-%22type-stable%22-functions)
  - don't change the type of a local variable
  - don't use abstract types for `struct` fields
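
Several of the tips above can be checked with `@allocated`, which reports the number of bytes allocated by an expression. The following is a minimal sketch using only Base. All function and type names (`col_sum_copy`, `col_sum_view`, `scale`, `scale!`, `LoosePoint`, `TightPoint`) are hypothetical and exist only for illustration, and the exact byte counts vary between Julia versions.

```julia
# Hypothetical helpers, wrapped in functions so the work does not run in global scope.
col_sum_copy(M, j) = sum(M[:, j])         # slicing allocates a copy of the column
col_sum_view(M, j) = sum(@view M[:, j])   # a view reuses the parent array's memory

scale(x, a) = a .* x                      # allocating: builds a new array on every call
scale!(y, x, a) = (y .= a .* x; y)        # in-place: writes into preallocated storage

# Abstract field type: the concrete type of `p.x` is unknown to the compiler.
struct LoosePoint
    x::Real
end

# Parametric field type: each concrete `TightPoint{T}` has a concrete field type.
struct TightPoint{T<:Real}
    x::T
end

M = randn(1_000, 10)
x = randn(1_000)
y = similar(x)

# Warm up first so that compilation does not count toward the allocations.
col_sum_copy(M, 1); col_sum_view(M, 1); scale(x, 2.0); scale!(y, x, 2.0)

bytes_copy = @allocated col_sum_copy(M, 1)
bytes_view = @allocated col_sum_view(M, 1)
bytes_alloc = @allocated scale(x, 2.0)
bytes_inplace = @allocated scale!(y, x, 2.0)

println("copy slice: ", bytes_copy, " bytes, view: ", bytes_view, " bytes")
println("allocating: ", bytes_alloc, " bytes, in-place: ", bytes_inplace, " bytes")
println("LoosePoint field concrete?          ", isconcretetype(fieldtype(LoosePoint, :x)))
println("TightPoint{Float64} field concrete? ", isconcretetype(fieldtype(TightPoint{Float64}, :x)))
```

The copying and allocating variants should report at least the size of the data they duplicate, while the view and in-place variants should report far less. For a closer look at type stability, `@code_warntype` from `InteractiveUtils` highlights values whose type could not be inferred.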