# Julia is fast

Very often, benchmarks are used to compare languages. These benchmarks can lead to long discussions, first as to exactly what is being benchmarked and second as to what explains the differences. These simple questions can turn out to be more complicated than you might at first imagine.

The purpose of this notebook is for you to see a simple benchmark for yourself. One can read the notebook and see what happened on the author's MacBook Pro with a 4-core Intel Core i7, or run the notebook yourself.

(This material began life as a wonderful lecture by Steven Johnson at MIT: https://github.com/stevengj/18S096/blob/master/lectures/lecture1/Boxes-and-registers.ipynb.)

# Outline of this notebook

- Define the sum function
- Implementations & benchmarking of sum in...
    - C (hand-written)
    - C (hand-written with -ffast-math)
    - Python (built-in)
    - Python (numpy)
    - Python (hand-written)
    - Julia (built-in)
    - Julia (hand-written)
    - Julia (hand-written with SIMD)
- Summary of benchmarks

# `sum`: An easy enough function to understand

Consider the **sum** function `sum(a)`, which computes
$$
\mathrm{sum}(a) = \sum_{i=1}^n a_i,
$$
where $n$ is the length of `a`.

In [1]:
a = rand(10^7) # 1D vector of random numbers, uniform on [0,1)

10000000-element Vector{Float64}:
 0.5490460030061002
 0.21152308538883957
 0.4211604760652997
 0.24076031222261052
 0.6399972190235906
 0.29156129931543995
 0.5914275605629259
 0.10600308620788346
 0.0946717908673097
 0.15087196303032246
 ⋮
 0.9983140652072142
 0.3733820512029551
 0.7133868471394686
 0.9839939443707294
 0.7591089368638609
 0.6663232372986172
 0.5153250123154138
 0.027994116646096634
 0.1692840804519279

In [2]:
sum(a)

4.999882072827285e6

The expected result is about 0.5 × 10^7 = 5 × 10^6, since the mean of each entry is 0.5.
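
How close should "about" be? Each entry is uniform on [0,1), with mean 1/2 and variance 1/12, so a sum of n independent entries has mean n/2 and standard deviation √(n/12), roughly 913 for n = 10^7. A quick check (the seed is arbitrary and only makes the run repeatable):

```julia
using Random

n = 10^7
expected = n / 2          # each entry has mean 1/2
σ_sum = sqrt(n / 12)      # each entry has variance 1/12; variances of independent terms add

Random.seed!(1234)        # arbitrary seed, only for reproducibility
x = rand(n)
deviation = sum(x) - expected
println("deviation = ", deviation, " (", deviation / σ_sum, " standard deviations)")
```

By this yardstick, the 4.999882072827285e6 above is about 118 below 5 × 10^6, well within one standard deviation.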

# Benchmarking a few ways in a few languages

In [3]:
@time sum(a)

  0.005350 seconds (1 allocation: 16 bytes)


4.999882072827285e6

In [4]:
@time sum(a)

  0.005903 seconds (1 allocation: 16 bytes)


4.999882072827285e6

In [5]:
@time sum(a)

  0.003498 seconds (1 allocation: 16 bytes)


5.000406093458154e6

The `@time` macro can yield noisy results, so it's not our best choice for benchmarking!
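
`@time` measures a single call, so it also picks up one-off costs (the first call to a function includes JIT compilation) and whatever else the machine happens to be doing at that moment. A common remedy is to repeat the measurement and keep the minimum, which is also the statistic this notebook extracts from its benchmark results. A minimal hand-rolled sketch; `min_time_ms` is a hypothetical helper, not part of any package:

```julia
# Hypothetical helper: time `f(x)` many times and keep the fastest run.
function min_time_ms(f, x; samples::Int = 100)
    f(x)                          # warm-up call, so JIT compilation is not timed
    best = typemax(UInt64)
    acc = 0.0                     # accumulate results so the calls are not optimized away
    for _ in 1:samples
        t0 = time_ns()
        acc += f(x)
        best = min(best, time_ns() - t0)
    end
    return best / 1e6, acc        # (fastest time in milliseconds, checksum)
end

x = rand(10^6)
t_ms, _ = min_time_ms(sum, x)
println("fastest of 100 runs: ", t_ms, " ms")
```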

Luckily, Julia has a `BenchmarkTools.jl` package to make benchmarking easy and accurate:

In [6]:
using Pkg
Pkg.add("BenchmarkTools")

    Updating registry at `~/.julia/registries/General`
    Updating git-repo `https://github.com/JuliaRegistries/General.git`
    Updating registry at `~/.julia/registries/JuliaPOMDP`
    Updating git-repo `https://github.com/JuliaPOMDP/Registry`
   Resolving package versions...
   Installed BenchmarkTools ─ v1.1.1
    Updating `~/Desktop/Introduction-to-Julia/Project.toml`
  [6e4b80f9] + BenchmarkTools v1.1.1
    Updating `~/Desktop/Introduction-to-Julia/Manifest.toml`
  [6e4b80f9] + BenchmarkTools v1.1.1
Precompiling project...
  ✓ BenchmarkTools
1 dependency successfully precompiled in 2 seconds (118 already precompiled)


In [5]:
using BenchmarkTools  
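
Two BenchmarkTools details are used throughout this notebook: the `$` in front of arguments, and the units of the `.times` field. A short sketch of both (the data `x` is made up for illustration):

```julia
using BenchmarkTools

x = rand(10^6)

# `$x` interpolates the value into the benchmark expression, so the timing does not
# include the overhead of accessing the untyped global variable `x`.
trial = @benchmark sum($x)

# `trial.times` holds one entry per sample, in nanoseconds.
fastest_ms = minimum(trial.times) / 1e6
println("fastest sample: ", fastest_ms, " ms over ", length(trial.times), " samples")

# `@btime` prints a one-line summary (including the minimum time) and returns the value.
s = @btime sum($x)
```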

# 1. The C language

C is often considered the gold standard: difficult on the human, nice for the machine. Getting within a factor of 2 of C is often satisfying. Nonetheless, even within C, there are many kinds of optimizations possible that a naive C writer may or may not take advantage of.

The current author does not speak C, so he does not read the cell below, but he is happy to know that you can put C code in a Julia session, compile it, and run it. Note that the triple quotes `"""` delimit a multi-line string.
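
Compiling the C code requires a `gcc` executable on the `PATH`; without one, the compile step fails and none of the C benchmarks can run. The sketch below checks for it, and also shows the layout of `ccall` on a C standard-library function. Calling `strlen` by bare name assumes it can be found in the running Julia process, which holds on typical Linux and macOS setups but is an assumption elsewhere:

```julia
# `Sys.which` returns the full path of an executable found on the PATH, or `nothing`.
gcc_path = Sys.which("gcc")
if gcc_path === nothing
    @warn "gcc not found on the PATH; the C benchmarks will fail"
else
    println("gcc found at ", gcc_path)
end

# ccall(function, return type, tuple of argument types, arguments...)
# Here: size_t strlen(const char *s) from the C standard library.
len = ccall(:strlen, Csize_t, (Cstring,), "hello")
println(len)   # 5
```

The `c_sum` wrapper in the next cell passes a `(function name, library path)` tuple instead of a bare symbol, because `c_sum` lives in the freshly compiled library rather than in the running process.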

In [6]:
using Libdl
C_code = """
#include <stddef.h>
double c_sum(size_t n, double *X) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        s += X[i];
    }
    return s;
}
"""

const Clib = tempname()   # make a temporary file


# compile to a shared library by piping C_code to gcc
# (works only if you have gcc installed):

open(`gcc -fPIC -O3 -msse3 -xc -shared -o $(Clib * "." * Libdl.dlext) -`, "w") do f
    print(f, C_code) 
end

# define a Julia function that calls the C function:
c_sum(X::Array{Float64}) = ccall(("c_sum", Clib), Float64, (Csize_t, Ptr{Float64}), length(X), X)

In [7]:
c_sum(a)

In [8]:
c_sum(a) ≈ sum(a) # type \approx and then <TAB> to get the ≈ symbol

In [9]:
c_sum(a) - sum(a)  

In [10]:
≈  # alias for the `isapprox` function

isapprox (generic function with 13 methods)
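
Comparing with `≈` rather than `==` matters because different implementations may add the numbers in different orders and so round differently in the last few bits. With no keyword arguments, `isapprox` uses a relative tolerance of `sqrt(eps(Float64))` (about 1.5e-8) and no absolute tolerance, as this short check illustrates:

```julia
# Default: atol = 0, rtol = sqrt(eps(Float64)), so the test is relative to the magnitudes.
@show 1.0 ≈ 1.0 + 1e-10                   # true:  relative difference 1e-10
@show 1.0 ≈ 1.001                         # false: relative difference 1e-3
@show isapprox(1.0, 1.001; rtol = 1e-2)   # true with a looser relative tolerance

# Pitfall: comparing against zero needs an absolute tolerance.
@show 1e-20 ≈ 0.0                         # false, since rtol * max(|x|, |y|) is tiny
@show isapprox(1e-20, 0.0; atol = 1e-12)  # true
```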

In [11]:
@doc isapprox   # equivalent to typing ?isapprox in the REPL help mode

We can now benchmark the C code directly from Julia:

In [12]:
c_bench = @benchmark c_sum($a)

In [13]:
println("C: Fastest time was $(minimum(c_bench.times) / 1e6) msec")

In [14]:
d = Dict()  # a "dictionary", i.e. an associative array
d["C"] = minimum(c_bench.times) / 1e6  # in milliseconds
d

In [15]:
using Plots
gr()

Plots.GRBackend()

In [16]:
using Statistics # bring in statistical support for standard deviations
t = c_bench.times / 1e6 # times in milliseconds
m, σ = minimum(t), std(t)

histogram(t, bins=500,
    xlim=(m - 0.01, m + σ),
    xlabel="milliseconds", ylabel="count", label="")

# 2. C with -ffast-math

If we allow C to rearrange the floating-point operations, the compiler can vectorize the loop with SIMD (single instruction, multiple data) instructions.
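
Rearranging needs explicit permission because floating-point addition is not associative: grouping the same numbers differently can change the rounded result, so a strict compiler must add in source order. The snippet below demonstrates this and then sketches the kind of reordering vectorization implies. Four partial sums is an illustrative assumption (an SSE register holds two doubles, an AVX register four, and compilers may use several registers); `sum4` is a hypothetical name:

```julia
# Floating-point addition is not associative: the grouping changes the rounded result.
left  = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
println(left, " vs ", right)   # 0.6000000000000001 vs 0.6

# Keep four independent partial sums (one per SIMD lane) and combine them at the end.
function sum4(X::AbstractVector{Float64})
    s1 = s2 = s3 = s4 = 0.0
    i, n = firstindex(X), lastindex(X)
    @inbounds while i + 3 <= n
        s1 += X[i]; s2 += X[i+1]; s3 += X[i+2]; s4 += X[i+3]
        i += 4
    end
    s = (s1 + s2) + (s3 + s4)
    @inbounds while i <= n          # leftover 0 to 3 elements
        s += X[i]
        i += 1
    end
    return s
end

x = rand(10^6)
println(sum4(x) - foldl(+, x))     # small, possibly nonzero: a different order of additions
```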

In [17]:
const Clib_fastmath = tempname()   # make a temporary file

# The same as above but with a -ffast-math flag added
open(`gcc -fPIC -O3 -msse3 -xc -shared -ffast-math -o $(Clib_fastmath * "." * Libdl.dlext) -`, "w") do f
    print(f, C_code) 
end

# define a Julia function that calls the C function:
c_sum_fastmath(X::Array{Float64}) = ccall(("c_sum", Clib_fastmath), Float64, (Csize_t, Ptr{Float64}), length(X), X)

In [18]:
c_fastmath_bench = @benchmark $c_sum_fastmath($a)

In [19]:
d["C -ffast-math"] = minimum(c_fastmath_bench.times) / 1e6  # in milliseconds

# 3. Python's built-in `sum`

The `PyCall` package provides a Julia interface to Python:

In [20]:
using Pkg; Pkg.add("PyCall")
using PyCall

In [21]:
# get the Python built-in "sum" function:
pysum = pybuiltin("sum")

In [22]:
pysum(a)

In [23]:
pysum(a) ≈ sum(a)

In [24]:
py_list_bench = @benchmark $pysum($a)

In [25]:
d["Python built-in"] = minimum(py_list_bench.times) / 1e6
d

# 4. Python: `numpy` 

## Takes advantage of hardware "SIMD", but only works when it works.

`numpy` is an optimized C library, callable from Python.
It may be installed within Julia as follows:

In [26]:
using Pkg; Pkg.add("Conda")
using Conda

In [27]:
Conda.add("numpy")

In [28]:
numpy_sum = pyimport("numpy").sum   # attribute access; indexing with ["sum"] is deprecated in recent PyCall

py_numpy_bench = @benchmark $numpy_sum($a)

In [29]:
numpy_sum(a)

In [30]:
numpy_sum(a) ≈ sum(a)

In [31]:
d["Python numpy"] = minimum(py_numpy_bench.times) / 1e6
d

# 5. Python, hand-written 

In [32]:
py"""
def py_sum(A):
    s = 0.0
    for a in A:
        s += a
    return s
"""

sum_py = py"py_sum"

In [33]:
py_hand = @benchmark $sum_py($a)

In [34]:
sum_py(a)

In [35]:
sum_py(a) ≈ sum(a)

In [36]:
d["Python hand-written"] = minimum(py_hand.times) / 1e6
d

# 6. Julia (built-in) 

## Written directly in Julia, not in C!

In [37]:
@which sum(a)
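
`@which` reports the method a call dispatches to. The returned `Method` object also records its module and source location, which is one way to confirm that the built-in `sum` is ordinary Julia code in `Base`:

```julia
using InteractiveUtils   # provides @which (loaded automatically in the REPL and IJulia)

m = @which sum(rand(10))
println(m.module)               # Base
println(Base.functionloc(m))    # (path of the Julia source file, line number)
```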

In [38]:
j_bench = @benchmark sum($a)

BenchmarkTools.Trial: 751 samples with 1 evaluation per sample.
 Range (min … max):  4.057 ms …  9.818 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.657 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.605 ms ± 640.156 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [39]:
d["Julia built-in"] = minimum(j_bench.times) / 1e6
d

Dict{Any, Any} with 1 entry:
  "Julia built-in" => 4.0568

# 7. Julia (hand-written) 

In [40]:
function mysum(A)   
    s = 0.0 # s = zero(eltype(A))
    for a in A
        s += a
    end
    s
end

mysum (generic function with 1 method)
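
The comment in `mysum` hints at a more generic accumulator. Starting from `zero(eltype(A))` makes the result type follow the element type, whereas a literal such as `s = 0` (an `Int`) summed over `Float64` data would force `s` to change type during the loop. A sketch with the hypothetical name `mysum_generic`:

```julia
# Start the accumulator at the zero of the element type, so the result type
# follows the input (Int for Int arrays, Float32 for Float32 arrays, ...).
function mysum_generic(A)
    s = zero(eltype(A))
    for a in A
        s += a
    end
    return s
end

mysum_generic([1, 2, 3])            # 6 (an Int)
mysum_generic(Float32[0.5, 0.25])   # 0.75f0 (a Float32)
```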

In [41]:
j_bench_hand = @benchmark mysum($a)

BenchmarkTools.Trial: 372 samples with 1 evaluation per sample.
 Range (min … max):   9.980 ms … 23.576 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     12.312 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   13.432 ms ±  2.859 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [42]:
d["Julia hand-written"] = minimum(j_bench_hand.times) / 1e6
d

Dict{Any, Any} with 2 entries:
  "Julia hand-written" => 9.9797
  "Julia built-in"     => 4.0568

# 8. Julia (hand-written with SIMD)

In [43]:
function mysum_simd(A)   
    s = 0.0 # s = zero(eltype(A))
    @simd for a in A
        s += a
    end
    s
end

mysum_simd (generic function with 1 method)

In [44]:
j_bench_hand_simd = @benchmark mysum_simd($a)

BenchmarkTools.Trial: 799 samples with 1 evaluation per sample.
 Range (min … max):  3.890 ms …  9.062 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.428 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.238 ms ± 814.785 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [45]:
mysum_simd(a)

4.999882072827319e6
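
This result and `sum(a)` (4.999882072827285e6) differ only from the 14th significant digit on: with `@simd` the compiler is allowed to reassociate the additions, much as `-ffast-math` permits in C. Julia's built-in `sum` over arrays does not add strictly left to right either; it uses pairwise (divide-and-conquer) summation, whose worst-case rounding error grows roughly like log n rather than n. A simplified sketch of the idea, not Base's actual implementation; the name `pairwise_sum` and the block size of 128 are arbitrary choices for illustration:

```julia
# Simplified pairwise summation: split the range in half recursively and
# fall back to a plain SIMD-friendly loop on short blocks.
function pairwise_sum(A::AbstractVector{Float64}, lo::Int = firstindex(A), hi::Int = lastindex(A))
    if hi - lo < 128                     # short block (or empty range): plain loop
        s = 0.0
        @simd for i in lo:hi
            @inbounds s += A[i]
        end
        return s
    end
    mid = (lo + hi) >>> 1                # midpoint without signed overflow
    return pairwise_sum(A, lo, mid) + pairwise_sum(A, mid + 1, hi)
end

x = rand(10^5)
naive = foldl(+, x)                      # strictly left to right
exact = Float64(sum(big.(x)))            # reference computed in extended precision
println("pairwise error: ", abs(pairwise_sum(x) - exact))
println("naive error:    ", abs(naive - exact))
```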

In [46]:
d["Julia hand-written simd"] = minimum(j_bench_hand_simd.times) / 1e6
d

Dict{Any, Any} with 3 entries:
  "Julia hand-written simd" => 3.8897
  "Julia hand-written"      => 9.9797
  "Julia built-in"          => 4.0568

# Summary

In [47]:
for (key, value) in sort(collect(d), by=last)
    println(rpad(key, 25, "."), lpad(round(value; digits=1), 6, "."))
end

Julia hand-written simd.....3.9
Julia built-in..............4.1
Julia hand-written.........10.0
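
Absolute times depend on the machine, so ratios to the fastest entry are often easier to compare across runs. A small sketch; `relative_times` is a hypothetical helper, and the numbers below are the minimum times (in milliseconds) recorded in this notebook run. In the notebook itself one would call `relative_times(d)`:

```julia
# Hypothetical helper: express each timing as a multiple of the fastest one,
# sorted from fastest to slowest.
function relative_times(d::AbstractDict)
    fastest = minimum(values(d))
    return [k => v / fastest for (k, v) in sort(collect(d), by = last)]
end

timings = Dict("Julia hand-written simd" => 3.8897,
               "Julia built-in"          => 4.0568,
               "Julia hand-written"      => 9.9797)

for (k, r) in relative_times(timings)
    println(rpad(k, 25, "."), lpad(round(r; digits = 2), 6, "."))
end
```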
