# Session 5: Fast Data

In this session, we cover the following major topics:
1. Fast Numbers, and
2. Using `Arrays`.

In [2]:
using Pkg;
Pkg.activate(".");
Pkg.add("BenchmarkTools");

using BenchmarkTools

[32m[1m  Activating[22m[39m environment at `~/Documents/GitHub/Phys215-202122-1/05-Fast-Data/Project.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/Documents/GitHub/Phys215-202122-1/05-Fast-Data/Project.toml`
[32m[1m  No Changes[22m[39m to `~/Documents/GitHub/Phys215-202122-1/05-Fast-Data/Manifest.toml`


# Session 5 OKR

**OBJECTIVE:** Examine fast data types and structures in Julia

 - [ ] **KR1:** Shown that the default integer type of your Julia instance has the same size as the machine word. Note that most computers today operate with 64-bit processors, so `WORD_SIZE` is now usually 64.
 - [ ] **KR2:** Explored the different sizes of known _basic data types_ such as `Int8`, `Int16`, ..., `Float16`, `Float32`, `Float64`, and their corresponding `Big*` types (i.e. `BigInt` and `BigFloat`) in terms of the following:
     + Explain the result of `sizeof()` for `Int` in relation to your machine word size.
     + `typemax()`, `typemin()` and `eps()` values
     + `maxintfloat()` value
     + `bitstring()` output and `length(bitstring())` output
     + Differentiate between the results of `bitstring(3)` and `bitstring(3.0)`
 - [ ] **KR3:** Discuss the practicality of using `Float64` in many ordinary computational problems. Can we use `Float64` to represent numbers beyond its `typemax()`? The functions in KR2 may be useful in this regard.
 
The IEEE 754 standard for double-precision floating-point format[^1] is implemented by virtually all modern processors.

[^1]:https://en.wikipedia.org/wiki/Double-precision_floating-point_format#IEEE_754_double-precision_binary_floating-point_format:_binary64

# Fast numbers in Julia

> Integers in Julia are stored as system integers.... The `Int` type alias represents the actual integer type used by the system. [`Int32` for 32-bit machines; `Int64` for 64-bit machines.] [^2]

[^2]: Sengupta, Avik. Julia High Performance: Optimizations, distributed computing, multithreading, and GPU programming with Julia 1.0 and beyond, 2nd Edition (p. 77). Packt Publishing.
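
Because `Int` is a fixed-size system integer, its arithmetic wraps around at the ends of its range. The following is a minimal sketch of that behavior and of how `BigInt` avoids it; the variable names are only illustrative.

```julia
imax = typemax(Int)

# Native integer arithmetic wraps around silently instead of raising an error.
wrapped = imax + 1
println(wrapped == typemin(Int))   # true

# Promoting to BigInt first gives arbitrary-precision arithmetic (slower, but no wraparound).
bigger = big(imax) + 1
println(bigger > imax)             # true
println(typeof(bigger))            # BigInt
```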

## Machine/Processor bit size

On a bash-like command line, you can use `uname -m` to examine the processor type of your machine.
The command `uname -a` prints `a`ll the relevant machine information.

In [3]:
;uname -m

x86_64


**Note** that the leading semicolon tells Julia to run the rest of the line as a shell command.
You may need to modify the command for a non-bash CLI.

Thus, the system word size (`Sys.WORD_SIZE`) sets the size (_in bits_) of the default `Int` type of the installed Julia.
Note that `sizeof()` reports sizes in bytes (see `?sizeof`), so a result of 8 bytes corresponds to 64 bits.

In [4]:
Sys.WORD_SIZE

64

In [5]:
sizeof(Int)

8

In [6]:
sizeof(Int) == sizeof(Int64)

true
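
The three checks above can be tied together in one place. This is a small sketch, assuming 8 bits per byte; `bits_per_byte` is a name introduced here for illustration.

```julia
bits_per_byte = 8   # assumption stated explicitly; sizeof() reports bytes

println("Int is ", bits_per_byte * sizeof(Int), " bits; Sys.WORD_SIZE is ", Sys.WORD_SIZE)

# `Int` is an alias, so it is identical to Int64 or Int32 depending on the machine.
println(Int === (Sys.WORD_SIZE == 64 ? Int64 : Int32))
```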

## Mem size and allocation scheme

Compare the bit patterns of a simple `Int` value and a `FloatX` value.

In [7]:
println(bitstring(3))
length(bitstring(3))

0000000000000000000000000000000000000000000000000000000000000011


64

In [8]:
println(bitstring(3.0))
length(bitstring(3.0))

0100000000001000000000000000000000000000000000000000000000000000


64

- 💡 Same length; different meaning.
- 📖 Check out floating point representation standards for the bit assignment for `Float64`
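
To see what the 64 bits of `3.0` mean, the string can be split into the IEEE 754 binary64 fields: 1 sign bit, 11 exponent bits (bias 1023), and 52 fraction bits. The sketch below reconstructs the value from those fields; it only handles normal numbers (not zero, subnormals, `Inf`, or `NaN`), and the variable names are illustrative.

```julia
x = 3.0
bits = bitstring(x)

sign_bits     = bits[1:1]     # 1 bit
exponent_bits = bits[2:12]    # 11 bits, stored with a bias of 1023
fraction_bits = bits[13:64]   # 52 bits, with an implicit leading 1

s = parse(Int64, sign_bits; base = 2)
e = parse(Int64, exponent_bits; base = 2)
f = parse(Int64, fraction_bits; base = 2)

# value = (-1)^s * (1 + f / 2^52) * 2^(e - 1023), valid for normal numbers only
value = (-1.0)^s * (1 + f / 2.0^52) * 2.0^(e - 1023)

println("sign = ", s, ", exponent = ", e, ", fraction = ", f)
println("reconstructed value = ", value)   # 3.0
```

For `3.0` the stored exponent is 1024 and the fraction encodes 0.5, so the value is `1.5 * 2^1`.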

## The more `const` the better?

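Julia can generate faster code for global variables that are declared `const`, because the compiler can rely on their type never changing.
In `c/c++`, static variables are often preferred for speed.

Immutable objects are generally easier to optimize than mutable ones.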

In [9]:
p_CONST = 1.0
p = 1.0

println("Same value? ", p == p_CONST)
println("Same reference? ", p === p_CONST)

Same value? true
Same reference? true


In [10]:
markconst = @benchmark for _ in 1:1_000_000 x = p_CONST end
markvarbl = @benchmark for _ in 1:1_000_000 x = p end

println("Const is faster by: ", round( median(markvarbl.times)/median(markconst.times) ,digits=3))

Const is faster by: 0.948


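Thus, there's not much difference! Note, however, that `p_CONST` above is not declared with the `const` keyword, so both loops read ordinary global variables and similar timings are to be expected. Also, `===` returns `true` here because two `Float64` values with identical bits are indistinguishable, not because they share a memory reference.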
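
The cell above assigns `p_CONST` without the `const` keyword. A fairer comparison could look like the sketch below, which uses the hypothetical names `Q_CONST` and `q_var` (new names, so they do not clash with the bindings already defined) and reads each global inside a function.

```julia
const Q_CONST = 1.0   # hypothetical name; `const` fixes the type of the binding
q_var = 1.0           # ordinary, non-constant global

# Add a global to an accumulator n times; the global is read inside the function body.
function sum_const(n)
    s = 0.0
    for _ in 1:n
        s += Q_CONST
    end
    return s
end

function sum_var(n)
    s = 0.0
    for _ in 1:n
        s += q_var    # type of q_var is not known at compile time
    end
    return s
end

println("Q_CONST is const? ", isconst(@__MODULE__, :Q_CONST))
println("q_var is const?   ", isconst(@__MODULE__, :q_var))
println("same result? ", sum_const(1_000) == sum_var(1_000))
```

Timing `sum_const(1_000_000)` against `sum_var(1_000_000)` with `@benchmark` should then show whether `const` matters on a given machine; no timings are quoted here because they were not measured for this notebook.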

# Fast Array operations

Specific types:
1. `Vector{T}`: Alias of `Array{T,1}`
2. `Matrix{T}`: Alias of `Array{T,2}`

In [11]:
Vector

Vector{T} where T (alias for Array{T, 1} where T)

In [12]:
Matrix

Matrix{T} where T (alias for Array{T, 2} where T)

## Memory arrangement (column major)

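Julia stores arrays in column-major order: the elements of each column are contiguous in memory, so the first (row) index of `A[row,col]` varies fastest.

Access is fastest when the inner loop runs over `row` and the outer loop runs over `col`.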

In [13]:
A = rand(4,5)

4×5 Matrix{Float64}:
 0.773986  0.0607674  0.555536  0.481478    0.676385
 0.890183  0.882493   0.505842  0.0892964   0.298351
 0.657413  0.916998   0.684062  0.74147     0.00115984
 0.51517   0.403219   0.946533  0.00211842  0.571155

In [14]:
A[1,2]

0.0607673802450337

## `Array`s behave like strided 1D arrays

In [15]:
A[:]

20-element Vector{Float64}:
 0.773985633498228
 0.8901833282400999
 0.6574128340826788
 0.5151695175457838
 0.0607673802450337
 0.8824928834811963
 0.9169978560490541
 0.40321870930746906
 0.555535704032674
 0.505841963806104
 0.6840618238541907
 0.9465327075479595
 0.48147841952693615
 0.08929639754408236
 0.7414696497347621
 0.0021184176915254316
 0.6763848269555364
 0.29835087718167186
 0.0011598410956996652
 0.5711547788662754
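
The flattened output above follows the columns of `A` one after another. As a sketch of the underlying rule, element `[i, j]` of a matrix with `nrows` rows sits at linear index `i + (j - 1) * nrows`. The matrix `B` below is a hypothetical example with known entries, used instead of the random `A`.

```julia
B = reshape(collect(1.0:20.0), 4, 5)   # hypothetical 4×5 matrix filled column by column
nr, nc = size(B)

# Check the column-major rule for every element.
for j in 1:nc, i in 1:nr
    k = i + (j - 1) * nr               # linear index of element (i, j)
    @assert B[i, j] == B[k] == vec(B)[k]
end

println(B[2, 3], " ", B[2 + (3 - 1) * nr])   # 10.0 10.0
println(LinearIndices(B)[2, 3])              # 10
```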

## Different `Array` "modes"

In [16]:
(nrows,ncols) = size(A)

println("nrows = ", nrows)
println("ncols = ", ncols)
println("length(A) = ", length(A))

nrows = 4
ncols = 5
length(A) = 20


### ASIDE: `size(::Array) ::Tuple`

A `Tuple` is **like** an ordered list, but it has a fixed length and is immutable.
Look it up via `?Tuple`.

In [17]:
tt = (1,2,3)
println("typeof(tt) : ", typeof(tt))

pp = 1 => 2
println("typeof(pp) : ", typeof(pp))
println("        pp : ", pp)

typeof(tt) : Tuple{Int64, Int64, Int64}
typeof(pp) : Pair{Int64, Int64}
        pp : 1 => 2
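
A short sketch of the two tuple properties that matter here: a tuple can be unpacked into separate variables (as done with `size(A)` earlier), and its elements cannot be reassigned.

```julia
tt = (1, 2, 3)

# Destructuring, as in `(nrows, ncols) = size(A)`.
a, b, c = tt
println(a + b + c)           # 6

# Tuples are immutable: indexed assignment is not defined for them.
try
    tt[1] = 10
catch err
    println(typeof(err))     # MethodError
end
```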


## Different `Array` "modes"

In [47]:
display(A)
display(A[:,1])
display(A[1,:])

4×5 Matrix{Float64}:
 0.0133033  0.555081   0.0967906  0.149183  0.504905
 0.458832   0.30625    0.99031    0.763878  0.0997921
 0.240462   0.936124   0.0803992  0.504495  0.603295
 0.45288    0.0465646  0.0555126  0.574368  0.729041

4-element Vector{Float64}:
 0.013303303793349652
 0.4588317808692648
 0.24046219556526394
 0.4528802542164465

5-element Vector{Float64}:
 0.013303303793349652
 0.5550808839989219
 0.09679063279937639
 0.14918315796445736
 0.5049052299806962

## Speed of accessing `Array` elements

Because the column-major storage arrangement (`MATLAB`, Julia, etc.) differs from the row-major arrangement (`c/c++`, etc.), access speed depends on the order in which the elements are traversed.
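
One way to observe this is to sum a matrix with the two possible loop orders. The sketch below uses the hypothetical helpers `sum_by_columns` and `sum_by_rows` and Base's `@elapsed` for a rough timing; `@benchmark` would give more reliable statistics. No timings are quoted because they depend on the machine.

```julia
# Inner loop over rows: consecutive iterations touch adjacent memory.
function sum_by_columns(M::AbstractMatrix)
    s = zero(eltype(M))
    for j in axes(M, 2)
        for i in axes(M, 1)
            s += M[i, j]
        end
    end
    return s
end

# Inner loop over columns: consecutive iterations jump by one full column length.
function sum_by_rows(M::AbstractMatrix)
    s = zero(eltype(M))
    for i in axes(M, 1)
        for j in axes(M, 2)
            s += M[i, j]
        end
    end
    return s
end

M = rand(2_000, 2_000)

# Warm-up calls so that compilation time is not included in the timings.
sum_by_columns(M); sum_by_rows(M)

t_cols = @elapsed sum_by_columns(M)
t_rows = @elapsed sum_by_rows(M)
println("row-order time / column-order time ≈ ", round(t_rows / t_cols, digits = 2))
```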

## Pass by reference within pass-by-sharing

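Passing a reference is often faster than passing by value because no copy is made.
Julia uses _pass by sharing_: a function argument is a new local name bound to the same object, which turned out to be more enigmatic than I thought!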

In [145]:
vold = rand(5);

In [146]:
function assign(vou::Array, vin::Array)
    vou = vin
end

assign (generic function with 2 methods)

In [147]:
println("vold = ", round.(vold,digits=5))
println("assign(vnew<-vold):")
vnew = zeros(Float64,5)
assign(vnew,vold)
println("new:")
println("    ", round.(vnew,digits=5))
println("same values? ", vnew == vold)
println("same reference? ", vnew === vold)

vold = [0.85893, 0.73287, 0.23025, 0.6831, 0.37286]
assign(vnew<-vold):
new:
    [0.0, 0.0, 0.0, 0.0, 0.0]
same values? false
same reference? false


+ 👍 Simple assignment **within** the function only rebinds the local name `vou`; it does not affect the array bound to `vnew` outside.

## Forced mutation

Assigning through an index modifies the values stored in the array that the argument refers to.

In [148]:
function assign!(vou::Array, vin::Array)
    vou[:] = vin[:]
end

assign! (generic function with 2 methods)

In [149]:
println("vold = ", round.(vold,digits=5))
println("assign!(vnew[:]<-vold[:]):")
vnew = zeros(Float64,5)
assign!(vnew,vold)
println("new:")
println("    ", round.(vnew,digits=5))
println("same values? ", vnew == vold)
println("same reference? ", vnew === vold) 

vold = [0.85893, 0.73287, 0.23025, 0.6831, 0.37286]
assign!(vnew[:]<-vold[:]):
new:
    [0.85893, 0.73287, 0.23025, 0.6831, 0.37286]
same values? true
same reference? false


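- 👍 Values are copied into the existing array when indexed assignment (`vou[:] = ...`) is used, so `vnew` and `vold` remain distinct objects.
- 💡 The left-hand side `vou[:] = ...` writes into the array that `vou` refers to, while the right-hand side `vin[:]` first creates a temporary copy of `vin`.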
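
The roles of the two slices can be checked directly. In this sketch, `copy_with_slice!` mirrors the body of `assign!`, while `copy_with_broadcast!` is a hypothetical alternative that avoids the right-hand-side slice; `@allocated` reports the bytes allocated by each call.

```julia
function copy_with_slice!(dst, src)
    dst[:] = src[:]    # `src[:]` builds a temporary copy, then the values are written into dst
    return dst
end

function copy_with_broadcast!(dst, src)
    dst .= src         # writes element by element into dst, no temporary array
    return dst
end

src = rand(1_000)
dst = zeros(1_000)

# A slice on the right-hand side is a new array, not the original one.
println("src[:] === src ? ", src[:] === src)   # false

# Warm-up calls so that compilation is not counted.
copy_with_slice!(dst, src); copy_with_broadcast!(dst, src)

bytes_slice = @allocated copy_with_slice!(dst, src)
bytes_bcast = @allocated copy_with_broadcast!(dst, src)
println("bytes allocated with slice:     ", bytes_slice)
println("bytes allocated with broadcast: ", bytes_bcast)
```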

## Naive implementations

`Base` functions:
 + In-place `copy!()`
 + In-place `map!()`

In [152]:
?copy!

search: copy! copyto! circcopy! unsafe_copyto! copy copysign deepcopy



```
copy!(dst, src) -> dst
```

In-place [`copy`](@ref) of `src` into `dst`, discarding any pre-existing elements in `dst`. If `dst` and `src` are of the same type, `dst == src` should hold after the call. If `dst` and `src` are multidimensional arrays, they must have equal [`axes`](@ref). See also [`copyto!`](@ref).

!!! compat "Julia 1.1"
    This method requires at least Julia 1.1. In Julia 1.0 this method is available from the `Future` standard library as `Future.copy!`.



In [153]:
?map!

search: map! asyncmap! map mapfoldr mapfoldl mapslices mapreduce asyncmap



```
map!(function, destination, collection...)
```

Like [`map`](@ref), but stores the result in `destination` rather than a new collection. `destination` must be at least as large as the first collection.

# Examples

```jldoctest
julia> a = zeros(3);

julia> map!(x -> x * 2, a, [1, 2, 3]);

julia> a
3-element Vector{Float64}:
 2.0
 4.0
 6.0
```

---

```
map!(f, values(dict::AbstractDict))
```

Modifies `dict` by transforming each value from `val` to `f(val)`. Note that the type of `dict` cannot be changed: if `f(val)` is not an instance of the value type of `dict` then it will be converted to the value type if possible and otherwise raise an error.

!!! compat "Julia 1.2"
    `map!(f, values(dict::AbstractDict))` requires Julia 1.2 or later.


# Examples

```jldoctest
julia> d = Dict(:a => 1, :b => 2)
Dict{Symbol, Int64} with 2 entries:
  :a => 1
  :b => 2

julia> map!(v -> v-1, values(d))
ValueIterator for a Dict{Symbol, Int64} with 2 entries. Values:
  0
  1
```


In [154]:
mvold = rand(5_000);
mvnew = zeros(Float64, 5_000);
mark0 = @benchmark copy!($mvnew,$mvold)
mark0a = @benchmark map!(x->x,mvnew,mvold)
mark1 = @benchmark assign!($mvnew,$mvold)

println("map!(): mark0a/mark0 ≈ ", round( median(mark0a.times)/median(mark0.times), digits=5 ))
println("Naive: mark1/mark0 ≈ ", round( median(mark1.times)/median(mark0.times), digits=5 ))

map!(): mark0a/mark0 ≈ 1.05375
Naive: mark1/mark0 ≈ 23.75206


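 - 💡 `map!()` performs about the same as `copy!()`.
 - ❗ The naive implementation via `assign!()` **fares much worse**; one likely contributor is that the right-hand side `vin[:]` allocates a temporary copy before the values are written.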

## Bang or no bang

❗: That is the question.

In [155]:
function assign_nobang(vou::Array, vin::Array)
    vou[:] = vin[:]
end

assign_nobang (generic function with 1 method)

❗The effect is the same:

In [156]:
println("assign_nobang(vnew[:]<-vold[:]):")
vnew = zeros(Float64,5)
assign_nobang(vnew,vold)
println("new:")
println("    ", round.(vnew,digits=5))
println("same values? ", vnew == vold)
println("same reference? ", vnew === vold)

assign_nobang(vnew[:]<-vold[:]):
new:
    [0.85893, 0.73287, 0.23025, 0.6831, 0.37286]
same values? true
same reference? false


- 📓 The bang `!` after the function name is **only** a naming convention that indicates (**not forces**) changes in the arguments.
- 💡 The function name has nothing to do with whether the argument is modified.
- 💡 No special argument naming or style is needed to make a function modify its argument.

## _Manual_ loop ends up faster

An explicit loop over the elements is often as fast as, or faster than, the alternatives in Julia, because loops are compiled to native code.
The effects can be the same.

In [157]:
function assign_loop!(vou::Array, vin::Array)
    for i in eachindex(vin)
        vou[i] = vin[i]
    end
end

function assign_vect!(vou::Array, vin::Array)
    vou .= vin # broadcasting loop, a.k.a. "vectorized"
end

assign_vect! (generic function with 1 method)

In [158]:
println("assign_loop!(for loop):")
vnew = zeros(Float64,5)
assign_loop!(vnew,vold)
println("new:")
println("    ", round.(vnew,digits=5))
println("same values? ", vnew == vold)
println("same reference? ", vnew === vold)

assign_loop!(for loop):
new:
    [0.85893, 0.73287, 0.23025, 0.6831, 0.37286]
same values? true
same reference? false


In [159]:
mark2 = @benchmark assign_loop!($vnew,$vold)
mark3 = @benchmark assign_vect!($vnew,$vold)

println("map!(): mark0a/mark0 ≈ ", round( median(mark0a.times)/median(mark0.times), digits=5 ))
println("Naive: mark1/mark0 ≈ ", round( median(mark1.times)/median(mark0.times), digits=5 ))
println("Loops: mark2/mark0 ≈ ", round( median(mark2.times)/median(mark0.times), digits=5 ))
println("Broadcast: mark3/mark0 ≈ ", round( median(mark3.times)/median(mark0.times), digits=5 ))
println("Loop vs Broadcast ≈ ", round( median(mark3.times)/median(mark2.times), digits=5 ))

map!(): mark0a/mark0 ≈ 1.05375
Naive: mark1/mark0 ≈ 23.75206
Loops: mark2/mark0 ≈ 0.00665
Broadcast: mark3/mark0 ≈ 0.02003
Loop vs Broadcast ≈ 3.01387


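❗Looping can be more efficient than broadcasting, at least for these 5-element vectors. Or are there broadcasting forms faster than what's used here? Note that `mark2` and `mark3` were measured on the 5-element `vnew` and `vold`, whereas `mark0`, `mark0a`, and `mark1` used the 5,000-element `mvnew` and `mvold`, so the ratios relative to `mark0` do not compare like with like.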
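
A like-for-like comparison would time every variant on arrays of the same size. The sketch below does this with Base only, using the hypothetical helper `avg_time` and the hypothetical names `loop_copy!` and `bcast_copy!` (same bodies as `assign_loop!` and `assign_vect!`). `@benchmark` with `$` interpolation would be the more careful tool; no results are quoted because none were measured for this notebook.

```julia
function loop_copy!(dst::Vector, src::Vector)
    for i in eachindex(dst, src)   # errors if the two arrays have different indices
        dst[i] = src[i]
    end
    return dst
end

bcast_copy!(dst::Vector, src::Vector) = (dst .= src)

# Hypothetical helper: average seconds per call, after one warm-up call.
function avg_time(f!, dst, src; reps = 2_000)
    f!(dst, src)                   # warm-up so compilation is not timed
    t = @elapsed for _ in 1:reps
        f!(dst, src)
    end
    return t / reps
end

src5k = rand(5_000)
dst5k = zeros(5_000)

t_copy  = avg_time(copy!, dst5k, src5k)
t_loop  = avg_time(loop_copy!, dst5k, src5k)
t_bcast = avg_time(bcast_copy!, dst5k, src5k)

println("loop / copy!      ≈ ", round(t_loop / t_copy, digits = 3))
println("broadcast / copy! ≈ ", round(t_bcast / t_copy, digits = 3))
```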

In [162]:
function assign_return(vin::Array)
    vou = vin;
    return vou
end

assign_return (generic function with 1 method)

In [164]:
mark4 = @benchmark vnew = assign_return($vold)

println("map!(): mark0a/mark0 ≈ ", round( median(mark0a.times)/median(mark0.times), digits=5 ))
println("Naive: mark1/mark0 ≈ ", round( median(mark1.times)/median(mark0.times), digits=5 ))
println("Loops: mark2/mark0 ≈ ", round( median(mark2.times)/median(mark0.times), digits=5 ))
println("Broadcast: mark3/mark0 ≈ ", round( median(mark3.times)/median(mark0.times), digits=5 ))
println("Loop vs Broadcast ≈ ", round( median(mark3.times)/median(mark2.times), digits=5 ))
println("Return: mark4/mark0 ≈ ",  round( median(mark4.times)/median(mark0.times), digits=5 ))
println("Return vs Loop ≈ ",  round( median(mark2.times)/median(mark4.times), digits=5 ))

map!(): mark0a/mark0 ≈ 1.05375
Naive: mark1/mark0 ≈ 23.75206
Loops: mark2/mark0 ≈ 0.00665
Broadcast: mark3/mark0 ≈ 0.02003
Loop vs Broadcast ≈ 3.01387
Return: mark4/mark0 ≈ 0.0018
Return vs Loop ≈ 3.69117
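
The very small `mark4` time has a simple explanation: `assign_return` copies nothing and returns the same array it was given. The sketch below shows this with `return_same`, a hypothetical function with the same body, and contrasts it with `copy`.

```julia
function return_same(vin::Array)   # same body as assign_return
    vou = vin                      # binds a second name to the same array
    return vou
end

v = rand(5)
w = return_same(v)

println("same reference? ", w === v)   # true

w[1] = -1.0                            # mutating w also changes v
println("v[1] = ", v[1])               # -1.0

c = copy(v)                            # an actual copy is a different object
println("copy is same reference? ", c === v)   # false
```

So `mark4` measures the cost of a function call, not of copying five values.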


# Structured data

