# Session 2: Fast machine representation

In this session, we cover the use of fast `Numbers` in Julia.
Particularly:[^1]
- [ ] Demonstrate tradeoff between runtime speed and over- or underflow checks in [number representations in Julia](https://docs.julialang.org/en/v1/manual/integers-and-floating-point-numbers/).
- [ ] Analyze the floating point layout or architecture used by your Julia installation.
- [ ] Show how much the `@fastmath` macro speeds up computation at the cost of some accuracy. The `sum_diff()` function in the main book reference may be replicated for this purpose.

----
[^1]: Covers Chapter 5 of Sengupta, _Julia High Performance, 2nd Ed._ (Packt Publishing, 2019).
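
A minimal sketch of the `@fastmath` comparison named in the objectives. The two functions below are hypothetical stand-ins for the book's `sum_diff()` (its exact body is not reproduced here); they sum central finite differences, once with strict IEEE arithmetic and once with `@fastmath` applied to the update.

```julia
# Hypothetical stand-in for the book's sum_diff(): sums central finite differences.
function sum_diff_strict(x::Vector{Float64})
    d = 1 / (length(x) - 1)
    s = 0.0
    for i in 2:length(x)-1
        s = s + (x[i+1] - x[i-1]) / (2d)
    end
    return s
end

# Same loop; @fastmath allows the compiler to reorder and simplify the arithmetic.
function sum_diff_fast(x::Vector{Float64})
    d = 1 / (length(x) - 1)
    s = 0.0
    for i in 2:length(x)-1
        s = @fastmath s + (x[i+1] - x[i-1]) / (2d)
    end
    return s
end

x = collect(range(0.0, 1.0; length=1_001)) .^ 2
strict = sum_diff_strict(x)
fast = sum_diff_fast(x)
println("strict = ", strict, ", fast = ", fast, ", |difference| = ", abs(strict - fast))
```

Timing both with `@benchmark` and comparing the printed difference gives the speed-versus-accuracy tradeoff; the size of either effect depends on the machine and Julia version.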

In [9]:
using Pkg;
Pkg.activate(".");
Pkg.add([
     "Plots"
    ,"BenchmarkTools"
]);

using Plots, BenchmarkTools;

[32m[1m  Activating[22m[39m project at `~/Documents/GitHub/Phys215-202324-2/02-Performance`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/Documents/GitHub/Phys215-202324-2/02-Performance/Project.toml`
[32m[1m  No Changes[22m[39m to `~/Documents/GitHub/Phys215-202324-2/02-Performance/Manifest.toml`


In [10]:
include("Phys215Tools.jl") #insert pre-typed tool functions, fast and dirty style

floatbits (generic function with 2 methods)

## Fast numbers in Julia

> Integers in Julia are stored as system integers.... The `Int` type alias represents the actual integer type used by the system. `Int32` for 32-bit machines; `Int64` for 64-bit machines.[^2]

----
[^2]: Sengupta, _Julia High Performance, 2nd Ed._ (Packt Publishing, 2019).
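
Because `Int` arithmetic maps to native machine instructions, it is fast but does not check for overflow. The sketch below contrasts the silent wraparound with `Base.checked_add`, which throws instead, and with `BigInt`, which avoids overflow at a speed cost.

```julia
x = typemax(Int64)          # largest Int64
wrapped = x + 1             # native addition wraps around silently
println(wrapped == typemin(Int64))

# Base.checked_add performs the same addition but throws on overflow.
overflowed = try
    Base.checked_add(x, 1)
    false
catch err
    err isa OverflowError
end
println("checked_add threw OverflowError: ", overflowed)

# Widening to BigInt avoids overflow, but BigInt arithmetic is much slower.
println(big(x) + 1)
```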

## Machine bit size and representation

- FOR BASH-like CLI: Use `uname -m` to examine the processor type of your machine.
    - The command `uname -a` provides `a`ll the relevant machine information.
- Default integer representation depends on machine word size.

In [29]:
; uname -vpm

Darwin Kernel Version 23.3.0: Wed Dec 20 21:28:58 PST 2023; root:xnu-10002.81.5~7/RELEASE_X86_64 x86_64 i386


**Note** that the leading semicolon switches the Julia REPL (or notebook cell) to shell mode, so the rest of the line is run as a shell command.
You may need to modify the command for a non-bash CLI.

### System `WORD_SIZE`

- The system `WORD_SIZE` sets the size (_in bits_) of `Int` in the installed Julia.
- See `? sizeof()` for the corresponding size in bytes.
- The `Sys.` prefix scopes the name to the `Sys` module.

In [30]:
@show Sys.WORD_SIZE;

Sys.WORD_SIZE = 64


### Use `sizeof()` for byte size

- One bit = 1 two-state unit in physical memory
- One byte = 8 bits, 2^8 states in physical memory

In [31]:
? sizeof()

```
sizeof(T::DataType)
sizeof(obj)
```

Size, in bytes, of the canonical binary representation of the given `DataType` `T`, if any. Or the size, in bytes, of object `obj` if it is not a `DataType`.

See also [`Base.summarysize`](@ref).

# Examples

```jldoctest
julia> sizeof(Float32)
4

julia> sizeof(ComplexF64)
16

julia> sizeof(1.0)
8

julia> sizeof(collect(1.0:10.0))
80

julia> struct StructWithPadding
           x::Int64
           flag::Bool
       end

julia> sizeof(StructWithPadding) # not the sum of `sizeof` of fields due to padding
16

julia> sizeof(Int64) + sizeof(Bool) # different from above
9
```

If `DataType` `T` does not have a specific size, an error is thrown.

```jldoctest
julia> sizeof(AbstractArray)
ERROR: Abstract type AbstractArray does not have a definite size.
Stacktrace:
[...]
```

---

```
sizeof(str::AbstractString)
```

Size, in bytes, of the string `str`. Equal to the number of code units in `str` multiplied by the size, in bytes, of one code unit in `str`.

# Examples

```jldoctest
julia> sizeof("")
0

julia> sizeof("∀")
3
```


### `sizeof()` for different `Int` types

- `Int` uses the machine default integer size
- Larger fixed-size integers are available up to `Int128`: 128 bits (16 bytes), i.e. 2^128 distinct states

In [32]:
@show sizeof(Int); # uses machine's default integer representation
@show sizeof(Int32);
@show sizeof(Int64);
@show sizeof(Int128);

sizeof(Int) = 8
sizeof(Int32) = 4
sizeof(Int64) = 8
sizeof(Int128) = 16
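
Since `sizeof()` reports bytes, the bit width is eight times that value. A short sketch that tabulates width and range for the signed integer types:

```julia
for T in (Int8, Int16, Int32, Int64, Int128)
    nbits = 8 * sizeof(T)                 # sizeof() reports bytes
    println(rpad(string(T), 7), nbits, " bits, range ", typemin(T), " to ", typemax(T))
end
println("Int is ", Int, " because Sys.WORD_SIZE = ", Sys.WORD_SIZE)
```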


## Machine bit representation of `Int`s

- Similar to base-10 representation for whole numbers
- Applicable only for whole numbers
- Different scheme used for numbers with fractional part: floating-point representation

## Algorithm for finding bit representation

- [Divide by two method](https://en.wikipedia.org/wiki/Binary_number#Decimal_to_binary)
- `bitstring()` function exists within Julia.
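
The divide-by-two method can be sketched in a few lines and checked against `bitstring()`. The function name `divide_by_two_bits` is made up for this illustration, and the sketch handles non-negative integers only.

```julia
# Repeatedly divide by 2; the remainders, read in reverse, are the binary digits.
function divide_by_two_bits(n::Integer, width::Integer)
    n >= 0 || throw(ArgumentError("n must be non-negative"))
    bits = Char[]
    while n > 0
        push!(bits, isodd(n) ? '1' : '0')   # remainder of n ÷ 2
        n = n ÷ 2
    end
    return lpad(String(reverse(bits)), width, '0')   # pad to the full word width
end

println(divide_by_two_bits(3, 64))
println(divide_by_two_bits(3, 64) == bitstring(3))
```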

In [28]:
? bitstring

search: bitstring SubstitutionString



```
bitstring(n)
```

A string giving the literal bit representation of a primitive type.

See also [`count_ones`](@ref), [`count_zeros`](@ref), [`digits`](@ref).

# Examples

```jldoctest
julia> bitstring(Int32(4))
"00000000000000000000000000000100"

julia> bitstring(2.2)
"0100000000000001100110011001100110011001100110011001100110011010"
```


## Machine representation (IEEE 754 standards)

- Not all numbers can be represented exactly in a machine
- Limits of the binary representation result in underflow and overflow
- Floating-point representation in base 2 is used for real numbers
- Machine representation is covered by [the IEEE Standard for Floating-Point Arithmetic (IEEE 754)](https://en.wikipedia.org/wiki/IEEE_754)
- An illustration can be found on the [GeeksForGeeks page (:warning: with paid ads)](https://www.geeksforgeeks.org/ieee-standard-754-floating-point-numbers/).
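
The points above can be seen directly at the REPL. This is a small sketch using only `Base` functions:

```julia
println(0.1 + 0.2 == 0.3)            # false: these decimals are not exact in base 2
println(0.1 + 0.2 ≈ 0.3)             # isapprox compares within a tolerance
println(eps(Float64))                # gap between 1.0 and the next Float64
println(floatmax(Float64) * 2)       # overflow saturates to Inf
println(floatmin(Float64) / 2.0^60)  # underflow rounds to 0.0
```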

## Memory size and allocation scheme

Compare the bit patterns of a plain `Int` and a `Float64` holding the same value.

In [19]:
println(bitstring(3))
@show length(bitstring(3));

0000000000000000000000000000000000000000000000000000000000000011
length(bitstring(3)) = 64


In [20]:
println(bitstring(3.0))
@show length(bitstring(3.0));

0100000000001000000000000000000000000000000000000000000000000000
length(bitstring(3.0)) = 64


- 💡 Same length; different meaning.
- 📖 Check out floating point representation standards for the bit assignment for `Float64`
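
As a sketch of the `Float64` bit assignment (1 sign bit, 11 exponent bits with a bias of 1023, 52 fraction bits), the string for `3.0` can be decoded by hand. This simple decoding assumes a normal number; zero, subnormals, `Inf`, and `NaN` follow special rules.

```julia
bits = bitstring(3.0)
sign_bit      = bits[1:1]      # 1 bit
exponent_bits = bits[2:12]     # 11 bits, stored with a bias of 1023
fraction_bits = bits[13:64]    # 52 bits, with an implicit leading 1

sgn      = sign_bit == "1" ? -1.0 : 1.0
expo     = parse(Int, exponent_bits; base=2) - 1023
mantissa = 1 + parse(Int, fraction_bits; base=2) / 2^52
println(sgn * mantissa * 2.0^expo)
```

`Base` also provides `exponent()` and `significand()`, which return the same two pieces without string handling.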

## The more `const` the better?

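Declaring a global as `const` tells Julia that its type will not change, which lets the compiler generate specialized code for functions that read it.
In `c/c++` static variables are similarly preferred for speed.

Immutable objects are generally easier for the compiler to optimize than mutable ones.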

In [21]:
const p_CONST = 1.0
p = 1.0

println("Same value? ", p == p_CONST)
println("Same reference? ", p === p_CONST)

Same value? true
Same reference? true


In [23]:
markconst = @benchmark for _ in 1:1_000_000 x = p_CONST end
markvarbl = @benchmark for _ in 1:1_000_000 x = p end

println("Const is faster by: ", round( median(markvarbl.times)/median(markconst.times) ,digits=5))

Const is faster by: 1.0


Thus, there's not much difference, at least for this microbenchmark, where the loop body does no real work with the value.
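
Where `const` tends to matter is in functions that read a global: the compiler can infer a concrete type only when the global is constant. A minimal sketch, with names invented for illustration, that inspects the inferred return types through the unexported `Base.return_types`:

```julia
const RATE_CONST = 1.5     # type is fixed, so the compiler can specialize
rate_global = 1.5          # may later be rebound to a value of any type

scale_const(x) = RATE_CONST * x
scale_global(x) = rate_global * x

# Inferred return types; the non-const version is typically inferred as Any.
println(Base.return_types(scale_const, (Float64,)))
println(Base.return_types(scale_global, (Float64,)))
println(scale_const(2.0) == scale_global(2.0))
```

Both functions return the same value; the difference lies in how much the compiler knows in advance.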

# Fast Array operations

Specific types:
1. `Vector{T}`: Alias of `Array{T,1}`
2. `Matrix{T}`: Alias of `Array{T,2}`

In [24]:
Vector

Vector (alias for Array{T, 1} where T)

In [25]:
Matrix

Matrix (alias for Array{T, 2} where T)

## Memory arrangement (column major)

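Elements are indexed as `A[row,col]`, and each column is stored contiguously in memory.

Access is fastest when the row index varies fastest, that is, when a loop walks down one column before moving to the next.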

In [31]:
A = rand(100,100);

In [32]:
A[1,2]

0.6625887274501059

In [33]:
A[:,1]

100-element Vector{Float64}:
 0.5632868221427215
 0.23967356660664552
 0.6521297832109703
 0.6843482571753418
 0.7547811970441951
 0.9423338532711044
 0.33352337317656233
 0.11566987920798566
 0.5724909420297839
 0.8826047851751873
 0.8265203916791758
 0.02831831976880783
 0.847785301741143
 ⋮
 0.7496527234005037
 0.260354874165182
 0.9017603932187297
 0.49478355384994144
 0.612275169602041
 0.33630416596386914
 0.88011739903376
 0.40083644510767125
 0.44822939113747096
 0.7214554849846266
 0.4132649951538304
 0.0035678336340345673

In [35]:
mark1 = @benchmark for a in A[:,2] a=rand() end
mark0 = @benchmark for a in A[2,:] a=rand() end

println("Column/row slice time ratio: ", round( median(mark1.times)/median(mark0.times) ,digits=5))

Column/row slice time ratio: 1.0


## `Array`s behave like strided 1D array

In [36]:
A[:]

10000-element Vector{Float64}:
 0.5632868221427215
 0.23967356660664552
 0.6521297832109703
 0.6843482571753418
 0.7547811970441951
 0.9423338532711044
 0.33352337317656233
 0.11566987920798566
 0.5724909420297839
 0.8826047851751873
 0.8265203916791758
 0.02831831976880783
 0.847785301741143
 ⋮
 0.5575807296192441
 0.5536318166779869
 0.36181216124260696
 0.6194528928249751
 0.6419300976546262
 0.9331663311777869
 0.2959521283862042
 0.6293802951781041
 0.5973047345025132
 0.31104633733970866
 0.6038211801949094
 0.5649186730385296
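
The flattened order is column by column, so a Cartesian index `(i, j)` corresponds to the linear index `i + (j - 1) * nrows`. A small sketch on a 3×2 matrix:

```julia
B = [1 4; 2 5; 3 6]            # 3×2 matrix
nr, nc = size(B)
println(vec(B))                 # columns laid end to end
println(strides(B))             # memory step per unit index along each dimension
i, j = 2, 2
println(B[i, j] == B[i + (j - 1) * nr])   # Cartesian vs linear index
```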

## Different `Array` "modes"

In [39]:
(nrows,ncols) = size(A)

@show nrows
@show ncols;
@show length(A);

nrows = 100
ncols = 100
length(A) = 10000


### ASIDE: `size(::Array) ::Tuple`

`Tuple` is **like** an ordered list.
Look for it via `?Tuple`.

In [43]:
tt = (1,2,3)
println("typeof(tt) : ", typeof(tt))

pp = 1 => 2
println("typeof(pp) : ", typeof(pp))
println("        pp : ", pp)

@show typeof(tt);
@show typeof(pp);
@show tt;
@show pp;

typeof(tt) : Tuple{Int64, Int64, Int64}
typeof(pp) : Pair{Int64, Int64}
        pp : 1 => 2
typeof(tt) = Tuple{Int64, Int64, Int64}
typeof(pp) = Pair{Int64, Int64}
tt = (1, 2, 3)
pp = 1 => 2
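
A tuple can be indexed and destructured like a list, but unlike an array it cannot be modified. A short sketch:

```julia
dims = (100, 100)
nr, nc = dims                  # destructuring, as with size(A)
println(dims[1], " ", length(dims))

# Tuples are immutable: setindex! is not defined for them.
rejected = try
    dims[1] = 5
    false
catch err
    err isa MethodError
end
println("assignment rejected: ", rejected)
```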


## Different `Array` "modes"

In [None]:
#display(A)
#display(A[:,1])
#display(A[1,:])

@show A;
@show A[:,1];
@show A[1,:];

## Speed of accessing `Array` elements

Because Julia (like `MATLAB`) stores arrays in column-major order, unlike the row-major order of `c/c++`, access speed depends on the order in which the elements are visited.
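
A minimal sketch of the effect of loop order, using two hypothetical summation functions that differ only in which index runs in the inner loop. Timings come from `Base`'s `@elapsed` and will vary by machine; `@benchmark` would give steadier numbers.

```julia
# Inner loop over rows: walks memory contiguously in a column-major array.
function sum_by_columns(A::Matrix{Float64})
    s = 0.0
    for j in axes(A, 2)
        for i in axes(A, 1)
            s += A[i, j]
        end
    end
    return s
end

# Inner loop over columns: jumps by one full column length at every step.
function sum_by_rows(A::Matrix{Float64})
    s = 0.0
    for i in axes(A, 1)
        for j in axes(A, 2)
            s += A[i, j]
        end
    end
    return s
end

M = rand(500, 500)
sum_by_columns(M); sum_by_rows(M)          # warm up (compile) before timing
t_col = @elapsed sum_by_columns(M)
t_row = @elapsed sum_by_rows(M)
println("row-order / column-order time ≈ ", t_row / t_col)
```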

## Pass by reference within pass-by-sharing

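Julia passes arguments by sharing: the function receives a reference to the same object, so arrays are not copied on the call, which is usually faster than passing by value.
_Pass by sharing_ turns out to be more enigmatic than I thought!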

In [145]:
vold = rand(5);

In [146]:
function assign(vou::Array, vin::Array)
    vou = vin
end

assign (generic function with 2 methods)

In [147]:
println("vold = ", round.(vold,digits=5))
println("assign(vnew<-vold):")
vnew = zeros(Float64,5)
assign(vnew,vold)
println("new:")
println("    ", round.(vnew,digits=5))
println("same values? ", vnew == vold)
println("same reference? ", vnew === vold)

vold = [0.85893, 0.73287, 0.23025, 0.6831, 0.37286]
assign(vnew<-vold):
new:
    [0.0, 0.0, 0.0, 0.0, 0.0]
same values? false
same reference? false


+ 👍 Simple assignment **within** the function only rebinds the local name `vou`; it does not affect the array outside.

## Forced mutation

The values that the reference points to get modified.

In [148]:
function assign!(vou::Array, vin::Array)
    vou[:] = vin[:]
end

assign! (generic function with 2 methods)

In [149]:
println("vold = ", round.(vold,digits=5))
println("assign!(vnew[:]<-vold[:]):")
vnew = zeros(Float64,5)
assign!(vnew,vold)
println("new:")
println("    ", round.(vnew,digits=5))
println("same values? ", vnew == vold)
println("same reference? ", vnew === vold) 

vold = [0.85893, 0.73287, 0.23025, 0.6831, 0.37286]
assign!(vnew[:]<-vold[:]):
new:
    [0.85893, 0.73287, 0.23025, 0.6831, 0.37286]
same values? true
same reference? false


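- 👍 Values are copied into the existing array when indexed assignment is used.
- 💡 `vou[:] = ...` writes into the array that `vou` refers to (it calls `setindex!`), while `vin[:]` on the right-hand side first allocates a temporary copy of `vin`.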
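
The cost of the right-hand-side slice can be observed with `Base`'s `@allocated`. The sketch below uses two hypothetical helper functions and compares slice assignment with in-place broadcasting; exact byte counts depend on the Julia version.

```julia
function copy_slices!(dst, src)
    dst[:] = src[:]      # src[:] allocates a temporary copy, then dst is filled
    return dst
end

function copy_broadcast!(dst, src)
    dst .= src           # in-place broadcast, no temporary array
    return dst
end

src = rand(1_000); dst = zeros(1_000)
copy_slices!(dst, src); copy_broadcast!(dst, src)   # warm up (compile) first
bytes_slices = @allocated copy_slices!(dst, src)
bytes_broadcast = @allocated copy_broadcast!(dst, src)
println("slices: ", bytes_slices, " bytes; broadcast: ", bytes_broadcast, " bytes")
```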

## Naive implementations

`Base` functions:
 + In-place `copy!()`
 + In-place `map!()`

In [152]:
?copy!

search: copy! copyto! circcopy! unsafe_copyto! copy copysign deepcopy



```
copy!(dst, src) -> dst
```

In-place [`copy`](@ref) of `src` into `dst`, discarding any pre-existing elements in `dst`. If `dst` and `src` are of the same type, `dst == src` should hold after the call. If `dst` and `src` are multidimensional arrays, they must have equal [`axes`](@ref). See also [`copyto!`](@ref).

!!! compat "Julia 1.1"
    This method requires at least Julia 1.1. In Julia 1.0 this method is available from the `Future` standard library as `Future.copy!`.



In [153]:
?map!

search: map! asyncmap! map mapfoldr mapfoldl mapslices mapreduce asyncmap



```
map!(function, destination, collection...)
```

Like [`map`](@ref), but stores the result in `destination` rather than a new collection. `destination` must be at least as large as the first collection.

# Examples

```jldoctest
julia> a = zeros(3);

julia> map!(x -> x * 2, a, [1, 2, 3]);

julia> a
3-element Vector{Float64}:
 2.0
 4.0
 6.0
```

---

```
map!(f, values(dict::AbstractDict))
```

Modifies `dict` by transforming each value from `val` to `f(val)`. Note that the type of `dict` cannot be changed: if `f(val)` is not an instance of the value type of `dict` then it will be converted to the value type if possible and otherwise raise an error.

!!! compat "Julia 1.2"
    `map!(f, values(dict::AbstractDict))` requires Julia 1.2 or later.


# Examples

```jldoctest
julia> d = Dict(:a => 1, :b => 2)
Dict{Symbol, Int64} with 2 entries:
  :a => 1
  :b => 2

julia> map!(v -> v-1, values(d))
ValueIterator for a Dict{Symbol, Int64} with 2 entries. Values:
  0
  1
```


In [154]:
mvold = rand(5_000);
mvnew = zeros(Float64, 5_000);
mark0 = @benchmark copy!($mvnew,$mvold)
mark0a = @benchmark map!(x->x,mvnew,mvold)
mark1 = @benchmark assign!($mvnew,$mvold)

println("map!(): mark0a/mark0 ≈ ", round( median(mark0a.times)/median(mark0.times), digits=5 ))
println("Naive: mark1/mark0 ≈ ", round( median(mark1.times)/median(mark0.times), digits=5 ))

map!(): mark0a/mark0 ≈ 1.05375
Naive: mark1/mark0 ≈ 23.75206


 - 💡 `map!()` performs about the same as `copy!()`.
 - ❗ The naive `assign!()`, which slices both sides to force a copy, **fares much worse** (about 24× slower here).

## Bang or no bang

❗: That is the question.

In [155]:
function assign_nobang(vou::Array, vin::Array)
    vou[:] = vin[:]
end

assign_nobang (generic function with 1 method)

❗ The effect is the same:

In [156]:
println("assign_nobang(vnew[:]<-vold[:]):")
vnew = zeros(Float64,5)
assign_nobang(vnew,vold)
println("new:")
println("    ", round.(vnew,digits=5))
println("same values? ", vnew == vold)
println("same reference? ", vnew === vold)

assign_nobang(vnew[:]<-vold[:]):
new:
    [0.85893, 0.73287, 0.23025, 0.6831, 0.37286]
same values? true
same reference? false


- 📓 The bang `!` after a function name is **only** a convention that signals, but does not force, changes to the arguments.
- 💡 The function name has nothing to do with whether the argument is modified.
- 💡 No special argument naming or syntax is needed for a function to modify its arguments.
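
The convention is easiest to see with a `Base` pair such as `sort` and `sort!`. The helper `zero_first` is made up here to show that a name without a bang can still mutate.

```julia
v = [3, 1, 2]
w = sort(v)        # no bang: returns a new array, v is untouched
println(v, " ", w)
sort!(v)           # bang: by convention, mutates its argument
println(v)

# The name itself enforces nothing: this bang-less function still mutates.
zero_first(a) = (a[1] = 0; a)
zero_first(v)
println(v)
```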

## _Manual_ loop ends up faster

An explicit element-by-element loop is compiled to fast machine code in Julia, so it is often as fast as, or faster than, a vectorized form.
The effect on the arrays is the same.

In [157]:
function assign_loop!(vou::Array, vin::Array)
    for i in eachindex(vin)
        vou[i] = vin[i]
    end
end

function assign_vect!(vou::Array, vin::Array)
    vou .= vin # broadcasting loop, a.k.a. "vectorized"
end

assign_vect! (generic function with 1 method)

In [158]:
println("assign_loop!(for loop):")
vnew = zeros(Float64,5)
assign_loop!(vnew,vold)
println("new:")
println("    ", round.(vnew,digits=5))
println("same values? ", vnew == vold)
println("same reference? ", vnew === vold)

assign_loop!(for loop):
new:
    [0.85893, 0.73287, 0.23025, 0.6831, 0.37286]
same values? true
same reference? false


In [159]:
mark2 = @benchmark assign_loop!($vnew,$vold)
mark3 = @benchmark assign_vect!($vnew,$vold)

println("map!(): mark0a/mark0 ≈ ", round( median(mark0a.times)/median(mark0.times), digits=5 ))
println("Naive: mark1/mark0 ≈ ", round( median(mark1.times)/median(mark0.times), digits=5 ))
println("Loops: mark2/mark0 ≈ ", round( median(mark2.times)/median(mark0.times), digits=5 ))
println("Broadcast: mark3/mark0 ≈ ", round( median(mark3.times)/median(mark0.times), digits=5 ))
println("Loop vs Broadcast ≈ ", round( median(mark3.times)/median(mark2.times), digits=5 ))

map!(): mark0a/mark0 ≈ 1.05375
Naive: mark1/mark0 ≈ 23.75206
Loops: mark2/mark0 ≈ 0.00665
Broadcast: mark3/mark0 ≈ 0.02003
Loop vs Broadcast ≈ 3.01387


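❗ In this run the explicit loop was about three times faster than broadcasting, though faster broadcasting forms than the one used here may exist. Note that `mark2` and `mark3` were measured on the 5-element `vnew`/`vold`, whereas `mark0`, `mark0a`, and `mark1` used the 5,000-element `mvnew`/`mvold`, so the ratios against `mark0` are not like-for-like.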
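
Since `mark0` was taken on 5,000-element arrays and `mark2`/`mark3` on 5-element ones, a fairer comparison times every variant on the same data. The sketch below does so with `Base`'s `@elapsed`; `best_time`, `loop_copy!`, and `bcast_copy!` are names invented for this illustration, and `@benchmark` with `$`-interpolated arguments would be the more careful tool.

```julia
function loop_copy!(dst, src)
    for i in eachindex(dst, src)
        dst[i] = src[i]
    end
    return dst
end

bcast_copy!(dst, src) = (dst .= src)

# Hypothetical helper: best wall-clock time of f(dst, src) over `reps` runs.
function best_time(f, dst, src; reps=200)
    f(dst, src)                                   # warm up (compile)
    return minimum(@elapsed(f(dst, src)) for _ in 1:reps)
end

src = rand(5_000); dst = zeros(5_000)
for (name, f) in (("copy!", copy!), ("loop", loop_copy!), ("broadcast", bcast_copy!))
    println(rpad(name, 10), best_time(f, dst, src), " s")
end
```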

In [162]:
function assign_return(vin::Array)
    vou = vin;
    return vou
end

assign_return (generic function with 1 method)

In [164]:
mark4 = @benchmark vnew = assign_return($vold)

println("map!(): mark0a/mark0 ≈ ", round( median(mark0a.times)/median(mark0.times), digits=5 ))
println("Naive: mark1/mark0 ≈ ", round( median(mark1.times)/median(mark0.times), digits=5 ))
println("Loops: mark2/mark0 ≈ ", round( median(mark2.times)/median(mark0.times), digits=5 ))
println("Broadcast: mark3/mark0 ≈ ", round( median(mark3.times)/median(mark0.times), digits=5 ))
println("Loop vs Broadcast ≈ ", round( median(mark3.times)/median(mark2.times), digits=5 ))
println("Return: mark4/mark0 ≈ ",  round( median(mark4.times)/median(mark0.times), digits=5 ))
println("Return vs Loop ≈ ",  round( median(mark2.times)/median(mark4.times), digits=5 ))

map!(): mark0a/mark0 ≈ 1.05375
Naive: mark1/mark0 ≈ 23.75206
Loops: mark2/mark0 ≈ 0.00665
Broadcast: mark3/mark0 ≈ 0.02003
Loop vs Broadcast ≈ 3.01387
Return: mark4/mark0 ≈ 0.0018
Return vs Loop ≈ 3.69117
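
Note that `assign_return` performs no copy: `vou = vin` only binds a second name to the same array, so its timing measures a function call rather than a copy of the data. A small sketch of the aliasing, with `return_same` as a hypothetical stand-in:

```julia
return_same(v) = v            # hypothetical stand-in for assign_return
a = [1.0, 2.0, 3.0]
b = return_same(a)            # no copy: b is another name for the same array
c = copy(a)                   # independent array with equal values
b[1] = 99.0                   # also changes a, but not c
println(a === b, " ", a[1])
println(a === c, " ", c[1])
```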


# Structured data

