# Random Number Generator




## What does it mean to generate random numbers? Why do we need it?

There are many cases where we need to generate random numbers or draw random values from distributions.

- random draws from a sample: pick a lottery number; draw survey samples
- resample a dataset (e.g., for bootstrapping)
- do numerical integration (e.g., Monte Carlo integration)
- draw values from a distribution to simulate it (when would we use this?)
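Two of the uses listed above can be sketched in a few lines. This is a minimal illustration, not part of the original notes: the integrand `f`, the seed `2024`, and the stand-in dataset `data` are all assumptions chosen for the example, and `Xoshiro` requires Julia 1.7 or later.

```julia
using Random, Statistics

rng = Xoshiro(2024)              # a local RNG with an arbitrary seed (assumption)

# Monte Carlo integration: the average of f(U), with U ~ U(0,1),
# approximates the integral of f over [0,1]
f(x) = x^2                       # exact integral on [0,1] is 1/3
mc_estimate = mean(f.(rand(rng, 100_000)))

# Bootstrap: resample the data with replacement and recompute the statistic
data = randn(rng, 50)            # stand-in for an observed sample (assumption)
boot_means = [mean(rand(rng, data, length(data))) for _ in 1:1000]
boot_se = std(boot_means)        # bootstrap standard error of the sample mean

@show mc_estimate
@show boot_se
```

`rand(rng, data, n)` draws `n` elements from `data` with replacement, which is exactly what one bootstrap resample needs.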


## Is it truly *random*? 

- truly random: you cannot repeat it
  - not good for reproducibility
- pseudorandom numbers   
  - use an algorithm to generate numbers
  - usually require a *seed*, from which the numbers are generated recursively
  - pseudo: "false" or "fake"; quasi: "resembling"
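The "seed plus recursion" idea can be sketched with a toy multiplicative congruential (Lehmer-style) generator. The function names `lehmer_next` and `toy_uniforms` are hypothetical, and the constants are the classic "minimal standard" choice used here only for illustration; this is not the generator Julia uses.

```julia
# Toy recursion: x_{k+1} = (a * x_k) mod m
# Default constants are an illustrative assumption, not Julia's RNG.
lehmer_next(x; a = 48271, m = 2^31 - 1) = (a * x) % m

function toy_uniforms(seed, n; m = 2^31 - 1)
    x = seed
    out = Float64[]
    for _ in 1:n
        x = lehmer_next(x; m = m)   # new state computed from the old state
        push!(out, x / m)           # scale the state to (0, 1)
    end
    return out
end

u1 = toy_uniforms(1234, 5)
u2 = toy_uniforms(1234, 5)   # same seed, so the same sequence
u3 = toy_uniforms(4321, 5)   # different seed, so a different sequence

@show u1 == u2
@show u1 == u3
```

Because every number is a deterministic function of the previous state, knowing the seed is enough to replay the whole sequence.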

## random number generation vs. random number generator (RNG)

- A *random number generator* (RNG) is the algorithm (or object) that produces the numbers; in practice it is a pseudorandom number generator (PRNG). 

- Mersenne Twister algorithm 
  - named after the Mersenne (*[mer-'sen]? well, it's French*) prime numbers; its period is the Mersenne prime $2^{19937}-1$

- xoshiro algorithm
  - based on xor (*exclusive or*; "xo"), shift ("shi"), and rotation ("ro") functions

- Lehmer algorithm
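To make the "xor, shift, rotate" naming concrete, here is a sketch of one state update in the style of xoshiro256++. The function name `xoshiro_step` and the starting state are assumptions for illustration; Julia's own `Xoshiro` seeds its state differently, so this sketch is not expected to reproduce Julia's output.

```julia
# One step of a xoshiro256++-style update on a 4-word state (illustrative sketch).
function xoshiro_step(s::NTuple{4,UInt64})
    s0, s1, s2, s3 = s
    result = bitrotate(s0 + s3, 23) + s0   # rotate, then add: the output word
    t = s1 << 17                           # shift
    s2 ⊻= s0                               # xor
    s3 ⊻= s1
    s1 ⊻= s2
    s0 ⊻= s3
    s2 ⊻= t
    s3 = bitrotate(s3, 45)                 # rotate
    return result, (s0, s1, s2, s3)
end

state = UInt64.((1, 2, 3, 4))              # arbitrary nonzero state (assumption)
r1, state1 = xoshiro_step(state)
r2, state2 = xoshiro_step(state1)

@show r1
@show r2
```

Only cheap bit operations and additions are involved, which is one reason this family of generators is fast.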


In [2]:
using Random                     # standard library; no need to Pkg.add it

myrng1 = MersenneTwister(1234);  # create an RNG for task-specific use; 1234 is the seed
myrng2 = Xoshiro(1234);          # new in Julia 1.7; uses the Xoshiro256++ algorithm (the current default)

# using Pkg; Pkg.add("StableRNGs")
using StableRNGs
myrng3 = StableRNG(1234)         # based on LehmerRNG 

StableRNGs.LehmerRNG(state=0x000000000000000000000000000009a5)

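The line `myrng2 = Xoshiro(1234)` creates a random number generator (RNG) with the seed `1234`, but the line itself does not generate any random numbers or change which RNG `rand()` uses by default. There are different ways to put an RNG into effect, each with its own purpose.

Put differently, `myrng2 = Xoshiro(1234)` has not produced any random numbers yet: you still need to either pass the RNG to a sampling function (e.g., `rand(myrng2)`) or seed the default RNG with `Random.seed!`.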
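A minimal sketch of the difference between creating an RNG and drawing from it. The variable names are illustrative, and `Xoshiro` assumes Julia 1.7 or later.

```julia
using Random

myrng = Xoshiro(1234)        # creates the generator; nothing is drawn yet
x = rand(myrng, 3)           # draws 3 numbers and advances myrng's state
y = rand(myrng, 3)           # the next 3 numbers from the same stream

fresh = Xoshiro(1234)        # same seed, fresh state
z = rand(fresh, 3)           # replays the first 3 numbers of the stream

@show x == z
@show x == y
```

Passing the RNG as the first argument of `rand` is what puts it to work; the default RNG is not involved in any of these calls.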

### Put random seeds in "global" scope using `Random.seed!(integer_here)`

Here, "global" means the seed affects every later call in the script that does not pass its own RNG.

In [3]:
Random.seed!(1234)    # seed Julia's default (global) RNG with 1234
Random.seed!(myrng1)  # no seed given: reseeds myrng1 itself with a fresh random seed (the global RNG is untouched)
Random.seed!(myrng2)  # likewise for myrng2
Random.seed!(MersenneTwister(1234)) |> display  # reseeds this temporary RNG; the seed displayed is no longer 1234
# |> is the pipe operator 
# equivalently, you can write 
# display(Random.seed!(MersenneTwister(1234)))

Random.seed!(Xoshiro(87)) |> display


Random.seed!(myrng1, 5678)  # override the seed of myrng1: 5678 replaces 1234

MersenneTwister(0x712ecbebb6f5e8db26e6d4377d854689)

Xoshiro(0x126984a90d71337e, 0x4ba6ed3540bc5c9b, 0xd219884643a6b61b, 0xbb87890869a0560e)

MersenneTwister(5678)

###### lecture notes:

Which is Julia's default algorithm? How do you figure it out?

In [7]:
Random.seed!(123)==MersenneTwister(123)

false

In [8]:
Random.seed!(123)==Xoshiro(123)  # so the default algorithm is Xoshiro

true
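Another way to inspect the default RNG is to ask for it directly with `Random.default_rng()`. The sketch below assumes Julia 1.7 or later; the printed type and the final comparison may differ on other versions.

```julia
using Random

rng = Random.default_rng()   # the RNG used when no rng argument is passed
@show typeof(rng)

Random.seed!(123)            # seed the default RNG
a = rand(3)                  # drawn from the default RNG
b = rand(Xoshiro(123), 3)    # drawn from an explicit Xoshiro with the same seed

@show a == b                 # compares the two streams on your Julia version
```

If the two streams agree, that is consistent with the comparison above showing that the default algorithm is Xoshiro.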

In [12]:
# Let's see some examples.

Random.seed!(123)  # seed the global RNG (affect the global scope)

# a0 = rand(1)  # Note: whether this line runs changes all the numbers drawn afterwards
#               # (think of a running stopwatch: once a0 has advanced it, later readings differ)

a1 = rand(4)    # a 4-element vector of random numbers from U(0,1)
a2 = rand(4,1)  # a 4x1 matrix
a3 = rand(4,2)  # a 4x2 matrix of random numbers from U(0,1)
a4 = randn(4,3) # a 4x3 matrix of random numbers from N(0,1)

# a5 = rand(4)  # Note: a5 != a1

@show a1
@show a2  # note the `;` separators (and trailing `;;`): a2 is a 4x1 matrix, while a1 is a vector
@show a3
@show a4

a1 = [0.521213795535383, 0.5868067574533484, 0.8908786980927811, 0.19090669902576285]
a2 = [0.5256623915420473; 0.3905882754313441; 0.044818005017491114; 0.933353287277165;;]
a3 = [0.08026576094597515 0.25643321529948804; 0.04902841674350844 0.08627888341903334; 0.9158663552785268 0.27163601818691985; 0.6541013048231016 0.6885748828439957]
a4 = [0.12412397060725551 -1.1759655690989972 0.5187435249130559; 0.032114510283638494 -0.1383989248694004 -0.5255958028703693; 0.2322909372677087 -0.7901058199112866 1.0006949009778057; -1.2653140705580803 1.926394978510115 -1.245738127915438]


4×3 Matrix{Float64}:
  0.124124   -1.17597    0.518744
  0.0321145  -0.138399  -0.525596
  0.232291   -0.790106   1.00069
 -1.26531     1.92639   -1.24574

###### side notes:

**Why is there an exclamation mark ("!") at the end of some function names?**

- By convention, a function name ending with "!" means that the function modifies (mutates) one or more of its arguments.
- Some functions have both the "!" and non-"!" versions.


In [16]:
# Example

list1 = rand(4)
@show list1

# aaa stop 1


sort(list1) |> display # returns a sorted copy without modifying "list1"
@show list1  # same list1 as above

# aaa stop 2

sort!(list1) # sorts "list1" in place and returns it
@show list1 # not the same list1 as above

list1 = [0.6211826338964892, 0.3332600250411273, 0.7048760768344424, 0.889575603392557]


4-element Vector{Float64}:
 0.3332600250411273
 0.6211826338964892
 0.7048760768344424
 0.889575603392557

list1 = [0.6211826338964892, 0.3332600250411273, 0.7048760768344424, 0.889575603392557]
list1 = [0.3332600250411273, 0.6211826338964892, 0.7048760768344424, 0.889575603392557]


4-element Vector{Float64}:
 0.3332600250411273
 0.6211826338964892
 0.7048760768344424
 0.889575603392557

###### lecture notes:
- show `size(a2)`, `size(a2,1)`, `b1, b2 = size(a2)`, etc.; introduce `typeof()`
  - important for debugging
  ```julia
  a1 = rand(4)
  a2 = rand(4,1)
  # the numbers are not the same; so add an RNG and compare; `a1 == a2` is still false; use typeof() to check why
  ```


- global seed vs. task-specific seed; why global random seed may not be enough for reproducibility
  - careful about the "shared" RNG 

In [18]:
@show a1
@show a2
@show size(a1)
@show size(a2)

@show typeof(a1)
@show typeof(a2)

a1 = [0.521213795535383, 0.5868067574533484, 0.8908786980927811, 0.19090669902576285]
a2 = [0.5256623915420473; 0.3905882754313441; 0.044818005017491114; 0.933353287277165;;]
size(a1) = (4,)
size(a2) = (4, 1)
typeof(a1) = Vector{Float64}
typeof(a2) = Matrix{Float64}


Matrix{Float64} (alias for Array{Float64, 2})

In [25]:
# It would be better to show this script in VScode.
# println("#############")

using Random
Random.seed!(123)  # global seed, i.e., it affects the code below


# axx = rand(10) # an intruding call, which draws from the global RNG
# Whether axx runs changes a1 and a2 below (recall the stopwatch), because they share the global RNG

a1 = rand(2) 
a2 = randn(2) 

@show a1;
@show a2;

# aaaa stop 1



# bxx = rand(10) # an intruding call, which draws from the global RNG
# Whether bxx runs does not change b1 and b2 below, because each is given its own RNG (local; task-specific)

b1 = rand(MersenneTwister(123), 2)
b2 = randn(MersenneTwister(123), 2)

@show b1;
@show b2;

# aaaaa stop 2



myrng = Xoshiro(2333)   # for task-specific use; re-creating it with the same seed restores the same sequence

# cxx = rand(11)          # an intruding call, which draws from the global RNG, not from myrng
# Whether cxx runs does not change c1 and c2 below

c1 = rand(myrng, 2)
c2 = randn(myrng, 2)

@show c1; 
@show c2; 

a1 = [0.521213795535383, 0.5868067574533484]
a2 = [-1.6236037455860806, -0.21766510678354617]
b1 = [0.7684476751965699, 0.940515000715187]
b2 = [1.1902678809862768, 2.04817970778924]
c1 = [0.2923977715754691, 0.4166292994124188]
c2 = [0.2609962536607125, -0.2163406590182754]
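One way to keep the task-specific pattern tidy is to write functions that receive the RNG as an argument, so all of their randomness comes from a generator the caller controls. The helper `simulate_mean` below is hypothetical and only illustrates the idea.

```julia
using Random

# Hypothetical helper: every random draw uses the rng passed in,
# never the global RNG.
function simulate_mean(rng::AbstractRNG, n::Integer)
    draws = randn(rng, n)
    return sum(draws) / n
end

Random.seed!(123)
r1 = simulate_mean(Xoshiro(2333), 1_000)
noise = rand(10)                          # unrelated draws on the global RNG
r2 = simulate_mean(Xoshiro(2333), 1_000)  # unaffected by the line above

@show r1 == r2
```

With this design, inserting or removing unrelated `rand` calls elsewhere in the script cannot change the function's result.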


In [23]:
# You can write a multi-line comment as follows

#=
your code
=#

### Class Exercises

- Write code to convert `a1` (a vector) to a matrix (you may have to google the method). 

- Write code to draw a set of 10,000 random numbers that are uniformly distributed in (-2,3). (Hint: Stretch $U(0,1)$ to fit the bounds of $U(-2,3)$.)  Show the mean and the standard deviation of the series. What are the theoretical mean and standard deviation of a $U(-2,3)$? Are your answers close to the theoretical values?

- Write code to draw a 10x2 matrix of random numbers from $N(2,3)$ which is a normal distribution with mean=2 and variance=3:

  - use `randn()`; (Hint: `randn()` generates N(0,1) random variables; you have to scale it to the appropriate mean and variance.)
  - use `rand()`. (Hint: `rand()` can take a distribution as an argument. See the help file: `?rand`.)

### Solutions to Class Exercises

- Write code to convert `a1` (a vector) to a matrix.

In [3]:
using Random
Random.seed!(123) #default Xoshiro

a1 = rand(2)  # 2-elt vector

reshape(a1, length(a1), 1)




2×1 Matrix{Float64}:
 0.521213795535383
 0.5868067574533484

In [5]:
using Random
Random.seed!(123)

a1 = rand(4) # 4-elt vector

reshape(a1, 1, length(a1))

1×4 Matrix{Float64}:
 0.521214  0.586807  0.890879  0.190907

- Write code to draw a set of 10,000 random numbers that are uniformly distributed in (-2,3). (Hint: Stretch $U(0,1)$ to fit the bounds of $U(-2,3)$.) Show the mean and the standard deviation of the series. 
  What are the theoretical mean and standard deviation of a $U(-2,3)$? Are your answers close to the theoretical values?
  
- Note that the theoretical mean of $U(a,b)$ is $\frac{a+b}{2}$ and the theoretical standard deviation is $\frac{b-a}{\sqrt{12}}$, so for $U(-2,3)$ they are $0.5$ and $5/\sqrt{12} \approx 1.443$.



In [28]:
using Random, Statistics   # Statistics provides mean() and std()

Random.seed!(123)

vec1 = rand(10000)   # draw 10000 random numbers from U(0,1)

vec2 = vec1 * 5      # U(0,5)

vec3 = vec2 .- 2     # U(-2,3)

@show mean(vec3)   
@show std(vec3)


mean(vec3) = 0.5052948303444671
std(vec3) = 1.440600975309783


1.440600975309783
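To answer the last part of the exercise, the sample statistics can be placed next to the theoretical values $(a+b)/2$ and $(b-a)/\sqrt{12}$ for $U(a,b)$. This is a sketch using a local `Xoshiro(123)` RNG (an assumption, so the draws differ from the cell above).

```julia
using Random, Statistics

a, b = -2, 3
theory_mean = (a + b) / 2          # mean of U(a, b)
theory_std  = (b - a) / sqrt(12)   # standard deviation of U(a, b)

rng = Xoshiro(123)                 # local RNG; seed chosen arbitrarily
draws = a .+ (b - a) .* rand(rng, 10_000)   # stretch U(0,1) to U(a, b)

@show theory_mean
@show mean(draws)
@show theory_std
@show std(draws)
```

With 10,000 draws, both sample statistics should land close to the theoretical values, as they do in the output above.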

- Write code to draw a 10x2 matrix of random numbers from $N(2,3)$ which is a normal distribution with mean=2 and variance=3:

  - use `randn()`; (Hint: `randn()` generates N(0,1) random variables; you have to scale it to the appropriate mean and variance.)
  

In [25]:
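using Random, Statistics   # Statistics provides mean() and var()
Random.seed!(123)

vec1 = randn(10,2)         # N(0,1)

vec2 = vec1 * sqrt(3)      # N(0,3): scale by the standard deviation sqrt(3)

vec3 = vec2 .+ 2           # N(2,3): shift the mean to 2

@show mean(vec3)
@show var(vec3)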

mean(vec3) = 1.6110192027125454
var(vec3) = 2.104853600147445


2.104853600147445
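The sample mean and variance above are some distance from 2 and 3 because a 10x2 matrix holds only 20 draws. A sketch of how the estimates behave with a much larger sample follows; the helper `draw_normal` and the sample sizes are assumptions for illustration.

```julia
using Random, Statistics

# Hypothetical helper: n draws from N(2, 3) using the same randn() scaling
draw_normal(rng, n) = 2 .+ sqrt(3) .* randn(rng, n)

rng = Xoshiro(123)
small = draw_normal(rng, 20)        # as many draws as the 10x2 exercise
large = draw_normal(rng, 200_000)   # a much larger sample

println("n = 20:     mean = ", mean(small), ", var = ", var(small))
println("n = 200000: mean = ", mean(large), ", var = ", var(large))
```

The larger sample should give a mean near 2 and a variance near 3, while the small one can be noticeably off in either direction.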

- use `rand()`. (Hint: `rand()` can take a distribution as an argument. See the help file: `?rand`.)

In [1]:
using Distributions

d = Normal(2, sqrt(3))   # Normal() takes the standard deviation, so pass sqrt(3) for variance 3




Normal{Float64}(μ=2.0, σ=1.7320508075688772)

In [2]:
using Random

mat = rand(d, 10, 2)  # 10x2

@show mean(mat)
@show var(mat)

mean(mat) = 2.700925811919641
var(mat) = 1.8208689464014136


1.8208689464014136

Now that you have generated random numbers from a normal random variable, let's see how the generated values match the true distribution by drawing histograms.

In [1]:
#using Pkg; Pkg.add(["Distributions", "Plots", "Interact", "WebIO", "StatsPlots", "LaTeXStrings"])
using Distributions, Plots, Interact, WebIO, StatsPlots, LaTeXStrings

d = Normal(-1,2) #note that 2 is std dev. not variance

@manipulate for N in (100:100:5000)  # 100,200,300,...,5000  
    histogram(rand(d,N), normalize=true,  bins=100)  # hjw
    plot!(d)
end

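# When you drag the N slider back and forth, the same N gives a different plot each time, because the draws are random
# normalize=false shows counts
# normalize=true rescales the bars to a density (total bar area equals 1), so they can be compared with the pdf drawn by plot!(d)
# bins=100 splits the range into 100 bins

# Question: What if I want to show the "exact" same graphs every time I run the code?
# Ans: use a random seed. Where should I write it?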

In [None]:
using Distributions, Plots, Interact, WebIO, StatsPlots, LaTeXStrings, Random

d = Normal(-1,2) #note that 2 is std dev. not variance

@manipulate for N in (100:100:5000)  # 100,200,300,...,5000 
    Random.seed!(123) # Set the random number generator seed HERE
    histogram(rand(d,N), normalize=true,  bins=100)  # hjw
    plot!(d)
end
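Seeding inside the loop body resets the global RNG on every redraw. A variant, sketched here without the plotting packages, creates a fresh local RNG for each draw so the global stream is left alone. The helper `draw_sample` is hypothetical, and it uses `randn` scaling in place of `rand(d, N)`, so its numbers will not match the cell above.

```julia
using Random

# Hypothetical helper: N draws from a normal with mean -1 and std dev 2,
# always starting from a fresh local RNG with the given seed.
draw_sample(N; seed = 123) = -1 .+ 2 .* randn(Xoshiro(seed), N)

s1 = draw_sample(500)
s2 = draw_sample(500)            # same N and seed: identical sample
s3 = draw_sample(500; seed = 7)  # different seed: different sample

@show s1 == s2
@show s1 == s3
```

Either approach makes the histogram for a given `N` identical on every redraw; the local-RNG version has the extra property of not disturbing other code that relies on the global RNG.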

# Other Comments

- Don't assume random numbers will be the same between Julia versions. See the [doc](https://docs.julialang.org/en/v1.5/stdlib/Random/) here. That is, if you run the same code `myrandom = rand(MersenneTwister(123), 10)` on different versions of Julia, you may get a different `myrandom`, even though you've specified a local RNG. This can cause problems, because you may not be able to reproduce the exact results of your program after Julia is upgraded. So, at the very least, document the Julia version used to produce your results. (BTW, different operating systems and different types of CPUs may also influence numerical details. Documentation is important.)


- If you want random numbers to be the same between versions, use [StableRNGs](https://juliahub.com/ui/Packages/StableRNGs/fu6AW/1.0.0). For instance, `rng = StableRNG(seed::Integer)`.

  ```julia
  using StableRNGs, Test   # Test provides @test
  rng = StableRNG(123)
  A = randn(rng, 10, 10) # instead of randn(10, 10)
  @test inv(inv(A)) ≈ A  # if not random, A may not be invertible because of deficient rank
  x = [1.1, 2.2, 3.1, 4.5, 5.3, 6.1, 4.4, 3.2, 2.9, 9.0] # any vector of length 10
  @test A \ (A*x) ≈ x   # another test of RNG
  ```

- StableRNG is currently an alias for LehmerRNG, and implements a well understood linear congruential generator (LCG); an LCG is not state of the art, but is fast and is believed to have reasonably good statistical properties.


- StableRNG is not as good as MersenneTwister or Xoshiro, but it is simple and less prone to problems.


- Starting from Julia 1.7, the default RNG switched from MersenneTwister to Xoshiro (a much faster pseudo RNG that is easier to parallelize and has better statistical properties). Julia 1.7 also uses a separate RNG object per task, which also changes the stream of random numbers. 


- Also note that, due to performance improvements and improvements to numerical accuracy, exact bit patterns for floating-point results are not guaranteed between versions.


[//]: # "If students have learned Stata, ask some of them to do a presentation on DataFrames vs. Stata, also introducing DataFramesMeta (and something like that). Resources [here](https://dataframes.juliadata.org/stable/man/comparisons/), [here](https://pandas.pydata.org/docs/getting_started/comparison/comparison_with_stata.html), [here](https://ahsmart.com/assets/pages/data-wrangling-with-data-frames-jl-cheat-sheet/DataFramesCheatSheet_v0.21_rev3.pdf), and [here](https://towardsdatascience.com/going-from-stata-to-pandas-706888525acf)."
