# Julia is fast: Comparing performance using benchmarking techniques


Benchmarks are a common way to compare languages. In this notebook we explore benchmarking using a simple `sum` function as the example.

## `sum` function

Consider the function `sum(a)`, which adds the elements of the sequence supplied to it, i.e.:
$$
\mathrm{sum}(a) = \sum_{i=1}^n a_i,
$$

where $n$ is the length of the sequence $a$.

In [1]:
# Vector to store random numbers between 0-1.
a = rand(10^6)

1000000-element Array{Float64,1}:
 0.08705364460495324
 0.7930448158242029
 0.5490600547756832
 0.32802035406416397
 0.7634176909249002
 0.5187306491888561
 0.9455408768708209
 0.9097727311507622
 0.3316496181089985
 0.27181569886761303
 0.2574728978408143
 0.8267245829526333
 0.16609820293163913
 ⋮
 0.7945897326470248
 0.019332576319549766
 0.8991274034971959
 0.1541851749469909
 0.22306811579255248
 0.40685665882655786
 0.31814366116506165
 0.7929774084109056
 0.690977428943341
 0.15349026600824645
 0.7707953396293554
 0.32271611677727896

In [2]:
sum(a)

500028.72163173417


This should be close to $0.5 \times 10^6$, since each entry has mean $0.5$.
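As a quick check of that estimate, the sketch below compares the sum of a fresh random vector with $0.5 \times n$. The names `x`, `expected` and `relative_gap` are hypothetical and not part of the notebook, and because the data are random the printed numbers will differ on every run.

```julia
using Statistics  # standard library; provides `mean`

x = rand(10^6)          # fresh sample, separate from the notebook's `a`
n = length(x)

expected = 0.5 * n      # mean of a uniform(0, 1) entry times the number of entries
relative_gap = abs(sum(x) - expected) / expected

println("mean(x)      = ", mean(x))
println("relative gap = ", relative_gap)
```

The relative gap should be small, which is all that "nearly" means here.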

# Benchmarking in a few ways in a few languages

We can use the `@time` macro to get the time required to execute that function.

In [3]:
@time sum(a)

  0.000818 seconds (1 allocation: 16 bytes)


500028.72163173417

In [4]:
@time sum(a)

  0.001751 seconds (1 allocation: 16 bytes)


500028.72163173417

In [5]:
@time sum(a)

  0.000783 seconds (1 allocation: 16 bytes)


500028.72163173417

Here we can see that the time reported by `@time` varies from run to run, so it is not our best choice for benchmarking!

Instead, we can use the `BenchmarkTools.jl` package to get more reliable results easily.
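The idea behind repeated measurement can be sketched by hand with only Base Julia: time the same call many times and look at the spread. This is a rough illustration under simple assumptions (a hypothetical vector `x`, 50 repetitions), not how `BenchmarkTools.jl` is implemented.

```julia
x = rand(10^6)

sum(x)  # warm-up call so compilation time is not measured

# Time the same call repeatedly; each entry is in seconds.
timings = [@elapsed sum(x) for _ in 1:50]

println("fastest: ", minimum(timings) * 1e3, " ms")
println("slowest: ", maximum(timings) * 1e3, " ms")
```

The fastest sample is usually the most stable figure, since noise from the operating system can only make a run slower.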

In [6]:
# using Pkg
# Pkg.add("BenchmarkTools")

In [7]:
using BenchmarkTools

## 1. `sum` function in C

In [8]:
using Libdl
C_code = """
#include <stddef.h>
double c_sum(size_t n,double *X){
    double s = 0.0;
    for(size_t i = 0;i<n;i++){
        s += *(X+i);
    }
    return s;
}
"""
# make a temporary file
const Clib = tempname()

open(`gcc -fPIC -O3 -msse3 -xc -shared -o $(Clib * "." * Libdl.dlext) -`, "w") do f
    print(f,C_code)
end

# define Julia function to call the C function:
c_sum(X::Array{Float64}) = ccall(("c_sum", Clib), Float64, (Csize_t, Ptr{Float64}), length(X),X)

c_sum (generic function with 1 method)

In [9]:
c_sum(a)

500028.7216317387

In [10]:
# Comparing the sum calculated by Julia's sum function and C's sum function
c_sum(a) ≈ sum(a) # type \approx and then <TAB> to get the ≈ symbol

true

$\approx$ is an alias for `isapprox`, i.e. it checks whether the supplied values are approximately equal.

Let's check the difference between the two `sum` functions: 

In [11]:
c_sum(a) - sum(a)

4.540197551250458e-9
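A small difference like this is expected because floating-point addition is not associative, so two implementations that add the same numbers in a different order can disagree in the last digits. The sketch below illustrates this with a tiny example and with a strict left-to-right sum (`foldl`, which mirrors the order of the C loop); `x`, `left`, `right` and `sequential` are hypothetical names.

```julia
# Floating-point addition is not associative, so the grouping matters.
left  = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
println(left == right)   # exact comparison
println(left ≈ right)    # comparison with a relative tolerance (isapprox)

x = rand(10^6)
sequential = foldl(+, x)                # strict left-to-right accumulation
println(sequential - sum(x))            # tiny, and it varies with the data
println(isapprox(sequential, sum(x)))
```

This is why the comparison in the notebook uses `≈` rather than `==`.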

We can benchmark the C code directly from Julia:

In [33]:
c_bench = @benchmark $c_sum($a)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.704 ms (0.00% GC)
  median time:      1.738 ms (0.00% GC)
  mean time:        1.740 ms (0.00% GC)
  maximum time:     3.174 ms (0.00% GC)
  --------------
  samples:          2852
  evals/sample:     1

**BenchmarkTools** executes the same code several times to get an accurate result. The summary of the benchmark is shown above.
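The `$` in front of a name interpolates its value into the benchmark, so the cost of looking up a global variable is not part of the measurement. The following is a minimal sketch of reading the raw samples from a trial, assuming `BenchmarkTools` is installed; `x`, `trial` and `fastest_ms` are hypothetical names, and the `samples` and `seconds` limits are chosen only to keep the run short.

```julia
using BenchmarkTools

x = rand(10^6)

# Collect at most 100 samples within roughly one second.
trial = @benchmark sum($x) samples=100 seconds=1

# `trial.times` holds one entry per sample, in nanoseconds.
fastest_ms = minimum(trial.times) / 1e6
println("samples collected: ", length(trial.times))
println("fastest sample:    ", fastest_ms, " ms")
```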

In [13]:
println("Fastest time for C: $(minimum(c_bench.times)/10^6) ms")

Fastest time for C: 1.706173 ms


Now let's create a dictionary to store the fastest time for the different scenarios:

In [84]:
sum_times = Dict()
sum_times["C"] = minimum(c_bench.times)/1e6 # time in milliseconds
sum_times

Dict{Any,Any} with 1 entry:
  "C" => 1.70617
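A `Dict` is built from `key => value` pairs rather than from a key and a value separated by a comma. The sketch below shows the syntax with a hypothetical dictionary `times_ms` and made-up values; it is not the notebook's `sum_times`.

```julia
# Hypothetical values, in milliseconds, used only to show the syntax.
times_ms = Dict("C" => 1.7)          # Dict{String,Float64} built from one pair
times_ms["Python built-in"] = 8.6    # add a second entry by key

for (name, t) in times_ms
    println(name, " => ", t, " ms")
end
```

Calling `Dict()` with no arguments instead creates an empty `Dict{Any,Any}` that accepts keys and values of any type.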

## 2. `sum` function in Python
We can use the `PyCall` package to call Python functions from Julia:

In [15]:
#using Pkg
#Pkg.add("PyCall")

In [16]:
using PyCall

In [17]:
# Convert the Julia array into a Python list
a_list = PyCall.array2py(a)

In [18]:
pysum = pybuiltin("sum")

PyObject <built-in function sum>

In [58]:
pysum(a_list)

500028.7216317387

In [59]:
pysum(a_list) ≈ sum(a)

true

In [28]:
py_list_bench = @benchmark $pysum($a_list)

BenchmarkTools.Trial: 
  memory estimate:  48 bytes
  allocs estimate:  3
  --------------
  minimum time:     8.649 ms (0.00% GC)
  median time:      8.803 ms (0.00% GC)
  mean time:        8.871 ms (0.00% GC)
  maximum time:     14.262 ms (0.00% GC)
  --------------
  samples:          563
  evals/sample:     1

In [35]:
sum_times["Python built-in"] = minimum(py_list_bench.times)/1e6
sum_times

Dict{Any,Any} with 2 entries:
  "C"               => 1.70617
  "Python built-in" => 8.65179

## 3. Python `numpy`
### Takes advantage of hardware "SIMD", but only works when it works.

`numpy` is an optimized C library, callable from Python. It may be installed within Julia as follows:

In [37]:
using Pkg
Pkg.add("Conda")

[32m[1m   Updating[22m[39m registry at `~/.juliapro/JuliaPro_v1.4.1-1/registries/JuliaPro`
[32m[1m  Resolving[22m[39m package versions...
[32m[1m   Updating[22m[39m `~/.juliapro/JuliaPro_v1.4.1-1/environments/v1.4/Project.toml`
 [90m [8f4d0f93][39m[92m + Conda v1.4.1[39m
[32m[1m   Updating[22m[39m `~/.juliapro/JuliaPro_v1.4.1-1/environments/v1.4/Manifest.toml`
[90m [no changes][39m


In [40]:
using Conda
#Conda.add("numpy")

In [51]:
numpy_sum = pyimport("numpy")."sum"
a_numpy = PyObject(a)

py_numpy_bench = @benchmark $numpy_sum($a_numpy)

BenchmarkTools.Trial: 
  memory estimate:  48 bytes
  allocs estimate:  3
  --------------
  minimum time:     785.514 μs (0.00% GC)
  median time:      1.083 ms (0.00% GC)
  mean time:        1.108 ms (0.00% GC)
  maximum time:     5.941 ms (0.00% GC)
  --------------
  samples:          4405
  evals/sample:     1

In [57]:
numpy_sum(a_numpy)

500028.72163173446

In [56]:
numpy_sum(a_numpy) ≈ sum(a)

true

In [63]:
sum_times["Python numpy"] = minimum(py_numpy_bench.times)/1e6
sum_times

Dict{Any,Any} with 3 entries:
  "C"               => 1.70617
  "Python numpy"    => 0.785514
  "Python built-in" => 8.65179

## 4. Python, user-defined

In [48]:
py"""
def py_sum(a):
    sum = 0.0
    for i in a:
        sum+=i
    return sum
"""

usrpy_sum = py"py_sum"

PyObject <function py_sum at 0x7f4db0072268>

In [49]:
py_usr = @benchmark $usrpy_sum($a_list)

BenchmarkTools.Trial: 
  memory estimate:  48 bytes
  allocs estimate:  3
  --------------
  minimum time:     56.372 ms (0.00% GC)
  median time:      58.517 ms (0.00% GC)
  mean time:        59.845 ms (0.00% GC)
  maximum time:     81.006 ms (0.00% GC)
  --------------
  samples:          84
  evals/sample:     1

In [52]:
usrpy_sum(a_list)

500028.7216317387

In [55]:
usrpy_sum(a_list) ≈ sum(a)

true

In [62]:
sum_times["Python user-defined"] = minimum(py_usr.times)/1e6
sum_times

Dict{Any,Any} with 4 entries:
  "C"                   => 1.70617
  "Python numpy"        => 0.785514
  "Python user-defined" => 56.3724
  "Python built-in"     => 8.65179

## 5. Julia (built-in)

Now let's try Julia's built-in `sum` function:

In [64]:
@which sum(a)

In [65]:
jl_bench = @benchmark sum($a)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     682.417 μs (0.00% GC)
  median time:      748.769 μs (0.00% GC)
  mean time:        784.588 μs (0.00% GC)
  maximum time:     2.353 ms (0.00% GC)
  --------------
  samples:          6224
  evals/sample:     1

In [66]:
sum_times["Julia built-in"] = minimum(jl_bench.times)/1e6
sum_times

Dict{Any,Any} with 5 entries:
  "C"                   => 1.70617
  "Python numpy"        => 0.785514
  "Python user-defined" => 56.3724
  "Python built-in"     => 8.65179
  "Julia built-in"      => 0.682417

## 6. Julia user-defined

In [68]:
function mysum(a)
    s = 0.0          # running total (avoids shadowing the built-in `sum`)
    for x in a
        s += x
    end
    return s
end

mysum (generic function with 1 method)

In [69]:
usrjl_bench = @benchmark mysum($a)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.729 ms (0.00% GC)
  median time:      1.762 ms (0.00% GC)
  mean time:        1.772 ms (0.00% GC)
  maximum time:     2.538 ms (0.00% GC)
  --------------
  samples:          2791
  evals/sample:     1
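The hand-written loop lands close to the C time, while the built-in `sum` is faster. A likely contributor (an assumption here, not something this notebook measures) is that a plain floating-point loop must add the elements strictly in order, which prevents the compiler from using SIMD instructions for the reduction. Julia's `@simd` macro gives the compiler permission to reorder the additions. The sketch below uses a hypothetical function `mysum_simd`; whether it is actually faster on a given machine would have to be benchmarked.

```julia
function mysum_simd(a)
    s = zero(eltype(a))
    # @simd allows the additions to be reordered and vectorized;
    # @inbounds skips bounds checks, which is safe with eachindex.
    @simd for i in eachindex(a)
        @inbounds s += a[i]
    end
    return s
end

x = rand(10^6)
println(mysum_simd(x) ≈ sum(x))
```

Because the order of the additions may change, the result can differ from the plain loop in the last few digits, so it is compared with `≈`.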

In [70]:
sum_times["Julia hand-written"] = minimum(usrjl_bench.times)/1e6
sum_times

Dict{Any,Any} with 6 entries:
  "C"                   => 1.70617
  "Python numpy"        => 0.785514
  "Python user-defined" => 56.3724
  "Julia hand-written"  => 1.7291
  "Python built-in"     => 8.65179
  "Julia built-in"      => 0.682417

## Summary

In [176]:
key = []
val = []
for (k,v) in sort(collect(sum_times), by=x -> x[2])
    push!(key,k); push!(val,v)
    println(rpad(k,20,"."),lpad(round(v,digits=2),10,"."))
end

Julia built-in............0.68
Python numpy..............0.79
C.........................1.71
Julia hand-written........1.73
Python built-in...........8.65
Python user-defined......56.37
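The same table can be read as slowdown factors relative to the fastest entry. The sketch below copies the fastest times (in milliseconds) from the `sum_times` output into a hypothetical dictionary `sum_times_ms`, so that it runs on its own; in the notebook, `sum_times` could be used directly.

```julia
# Fastest times in milliseconds, copied from the `sum_times` output above.
sum_times_ms = Dict(
    "C"                   => 1.70617,
    "Python numpy"        => 0.785514,
    "Python user-defined" => 56.3724,
    "Julia hand-written"  => 1.7291,
    "Python built-in"     => 8.65179,
    "Julia built-in"      => 0.682417,
)

fastest = minimum(values(sum_times_ms))

# Print each entry as a multiple of the fastest time, fastest first.
for (name, t) in sort(collect(sum_times_ms), by = last)
    println(rpad(name, 22, "."), lpad(round(t / fastest, digits = 1), 8, "."), "x")
end
```

These ratios describe one machine and one run, so they are best read as a rough ordering rather than as exact factors.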
