# Writing fast code

1. Basic compute architecture
2. Study of a basic program
3. Timescales of common instructions
4. Allocating memory
5. Type stability

# Timescales for common instructions

In [62]:
using BenchmarkTools

### Add, multiply, divide, function call
Each of these operations takes on the order of 1 ns.

In [95]:
a, b = -72.0, 55.0
f(x) = x
@btime $a + $b
@btime $a * $b
@btime $a / $b
@btime sin($a)
@btime sqrt(($b)^2 + ($a)^2)
@btime f($a)
@btime rand()

  1.130 ns (0 allocations: 0 bytes)
  1.130 ns (0 allocations: 0 bytes)
  1.130 ns (0 allocations: 0 bytes)
  6.597 ns (0 allocations: 0 bytes)
  1.353 ns (0 allocations: 0 bytes)
  1.129 ns (0 allocations: 0 bytes)
  2.289 ns (0 allocations: 0 bytes)


0.9829889939419227

In [90]:
function timeadd(n)
    s = 0.0
    for i in 1:n
        s += i 
    end
    return s
end

n = 1_000_000_000
@time timeadd(n)

  0.962951 seconds (21.40 k allocations: 1.225 MiB, 0.83% compilation time)


5.00000000067109e17
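The `@time` result above can be turned into a per-iteration cost with one division. The short sketch below just redoes that arithmetic with the numbers printed by the run above; the variable names are illustrative only.

```julia
# Worked arithmetic: convert the @time result above into a cost per loop iteration.
total_seconds = 0.962951       # wall time reported by @time timeadd(n)
n = 1_000_000_000              # number of loop iterations
ns_per_iteration = total_seconds / n * 1e9
println(ns_per_iteration, " ns per iteration")
```

The result is a little under 1 ns per iteration, which agrees with the roughly 1 ns that `@btime` reports for a single addition.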

# Allocating memory
The standard practice is to allocate memory once, before a loop, rather than inside it.

In [141]:
# zeros(n) makes an array of n elements, each set to 0.0
@btime zeros(3);
@btime zeros(100);
@btime zeros(10000);

  20.884 ns (1 allocation: 80 bytes)
  44.034 ns (1 allocation: 896 bytes)
  3.213 μs (2 allocations: 78.17 KiB)


In [142]:
# Allocate memory without initializing it; the contents are arbitrary until written.
@btime Vector{Float64}(undef, 3);
@btime Vector{Float64}(undef, 100);
@btime Vector{Float64}(undef, 10000);

  19.810 ns (1 allocation: 80 bytes)
  26.095 ns (1 allocation: 896 bytes)
  406.843 ns (2 allocations: 78.17 KiB)
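To illustrate why allocation is usually moved out of a loop, here is a minimal sketch that does the same work twice: once with a fresh buffer on every iteration, and once with a single buffer that is reused. The function names `work_alloc_inside` and `work_prealloc` are hypothetical and not part of the original notebook. The sketch uses `@allocated` from Base rather than `@btime`, so it runs without extra packages.

```julia
# Allocates a new buffer on every pass through the outer loop.
function work_alloc_inside(iters, m)
    total = 0.0
    for k in 1:iters
        buf = zeros(m)                  # fresh allocation each iteration
        for j in 1:m
            buf[j] = k * j
        end
        total += sum(buf)
    end
    return total
end

# Allocates the buffer once, before the loop, and reuses it.
function work_prealloc(iters, m)
    buf = Vector{Float64}(undef, m)     # allocated once
    total = 0.0
    for k in 1:iters
        for j in 1:m
            buf[j] = k * j              # every element is overwritten, so undef is safe here
        end
        total += sum(buf)
    end
    return total
end

# Warm up first so compilation is not counted, then compare allocated bytes.
work_alloc_inside(1, 100); work_prealloc(1, 100)
bytes_inside = @allocated work_alloc_inside(1_000, 100)
bytes_before = @allocated work_prealloc(1_000, 100)
println("allocated inside loop: ", bytes_inside, " bytes")
println("allocated before loop: ", bytes_before, " bytes")
```

Both functions return the same value. The exact byte counts depend on the Julia version, but the preallocated version should allocate far less, because the buffer is created once instead of `iters` times.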


# Writing and Reading from an Array



In [143]:
# Make an n×n matrix (10^6 entries) of random values.
n = 1000
A = rand(n,n)

1000×1000 Matrix{Float64}:
 0.795366  0.0384061  0.0348913  0.648985  …  0.883733  0.846151  0.0675269
 0.755251  0.495458   0.857065   0.147801     0.966296  0.875652  0.0770365
 0.515005  0.331872   0.59601    0.149106     0.718047  0.344901  0.694572
 0.389452  0.338261   0.989912   0.603325     0.851661  0.576765  0.180598
 0.428939  0.188249   0.442514   0.196765     0.36888   0.748419  0.509066
 0.882728  0.0217228  0.0509673  0.187772  …  0.818283  0.222228  0.909681
 0.230071  0.142156   0.309118   0.1174       0.912287  0.631966  0.209213
 0.472333  0.260234   0.278668   0.529968     0.942357  0.678492  0.270195
 0.354433  0.7078     0.249605   0.230445     0.810704  0.115324  0.26999
 0.780548  0.836304   0.134223   0.447171     0.340376  0.513662  0.64809
 ⋮                                         ⋱                      
 0.114609  0.815257   0.0229175  0.586042     0.817291  0.340336  0.215728
 0.100825  0.239202   0.651607   0.942422     0.250643  0.163928  0.863014
 0.735

In [149]:
function mysum(A)
    s = 0.0
    for i in eachindex(A)
        s += A[i]
    end
    return s
end

mysum (generic function with 1 method)

In [150]:
@btime mysum(A)

  891.611 μs (1 allocation: 16 bytes)


500357.4535311616

In [146]:
@btime sum(A)

  296.977 μs (1 allocation: 16 bytes)


500357.45353116875
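One plausible contributor to the gap between `mysum` and the built-in `sum` is vectorization. A plain `+=` loop over `Float64` values must add strictly in order, which generally stops the compiler from using SIMD instructions. The sketch below, with the hypothetical name `mysum_simd`, gives the compiler permission to reorder the additions with `@simd` and skips bounds checks with `@inbounds`. This is an assumption about where the time goes, not a measured result from the original notebook.

```julia
# Hand-written sum that allows the compiler to vectorize the reduction.
function mysum_simd(A)
    s = zero(eltype(A))
    # @simd allows reordering of the floating-point additions;
    # @inbounds removes bounds checks (safe here because eachindex(A) is always in range).
    @inbounds @simd for i in eachindex(A)
        s += A[i]
    end
    return s
end

A = rand(1000, 1000)
println(mysum_simd(A) ≈ sum(A))
```

Because the additions may happen in a different order, the result can differ from `mysum` in the last few digits, just as `mysum(A)` and `sum(A)` differ slightly in the outputs above.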

In [135]:
n = 1000
A = rand(n,n)
@btime myfill(A, 0.0)
@btime fill!(A,0.0);

  305.895 μs (0 allocations: 0 bytes)
  305.964 μs (0 allocations: 0 bytes)
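The cell above calls `myfill`, but its definition does not appear in this section. A minimal sketch, assuming it is a plain loop that writes one value into every element of the array, might look like this. The body is a guess for illustration, not the original definition.

```julia
# Hypothetical definition: `myfill` is not defined in this section.
# Writes the value x into every element of A, in memory order, and returns A.
function myfill(A, x)
    for i in eachindex(A)
        A[i] = x
    end
    return A
end

A = rand(4, 4)
myfill(A, 0.0)
println(all(==(0.0), A))
```

A loop like this allocates nothing and touches each element once, which would be consistent with it timing the same as `fill!` in the output above.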


  0.000008 seconds (1 allocation: 16 bytes)


500000000500000000