Commit c231b2d

colinleach (Colin Leach) authored
Randomness concept (#816)
* WIP draft of randomness concept
* added more on StatsBase
* moved to `concepts/`, added `config.json` entry
* response to reviewer comments

---------

Co-authored-by: Colin Leach <colin.leach@omcast.net>
Co-authored-by: Colin Leach <colin.ex@pm.me>
1 parent 98822eb commit c231b2d

10 files changed

+583
-18
lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -4,3 +4,5 @@ bin/configlet
 bin/configlet.exe
 tmp/
 Manifest.toml
+
+.idea/

concepts.wip/randomness/.meta/config.json

Lines changed: 0 additions & 6 deletions
This file was deleted.

concepts.wip/randomness/about.md

Lines changed: 0 additions & 3 deletions
This file was deleted.

concepts.wip/randomness/introduction.md

Lines changed: 0 additions & 3 deletions
This file was deleted.

concepts.wip/randomness/links.json

Lines changed: 0 additions & 6 deletions
This file was deleted.
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+{
+  "authors": [
+    "colinleach"
+  ],
+  "contributors": [],
+  "blurb": "Julia has a wide range of ways to generate random values for modelling and simulations."
+}

concepts/randomness/about.md

Lines changed: 311 additions & 0 deletions
@@ -0,0 +1,311 @@
1+
# About
2+
3+
Many programs need (apparently) random values to simulate real-world events.
4+
5+
Common and familiar examples include:
6+
- A coin toss: a random value from `('H', 'T')`.
7+
- The roll of a die: a random integer from 1 to 6.
8+
- Shuffling a deck of cards: a random ordering of a card list.
9+
10+
Generating truly random values with a computer is a [surprisingly difficult technical challenge][truly-random], so you may see these results referred to as "pseudorandom".
11+
12+
***Important***: This Concept does _not_ cover cryptographically secure random numbers, which are a much more difficult challenge.
13+
14+
However, well-designed libraries like the [`Random`][Random] module in the Julia standard library are fast, flexible, and give results that are more than good enough for most applications in modelling, simulation and games.
15+
16+
Julia divides random functionality into multiple locations:
17+
18+
- Just a few basic but very versatile functions in `Base`, which are always available.
19+
- A wider range of options in the `Random` module.
20+
- More specialized functionality in packages which need to be installed before use (and are not available in Exercism).
21+
22+
`Random` is part of the standard library and likely to be pre-installed, but you will need to add `using Random` at the top of your program to bring its contents into the namespace.
23+
24+
## The `rand()` function
25+
26+
What this function does depends on the arguments you give it.
27+
There are _many_ options.
28+
29+
With no arguments, it generates a `Float64` in the half-open interval from 0 (inclusive) to 1 (exclusive).
30+
This is a [`uniform`][uniform-distribution] distribution, with all values equally likely, as discussed in the Working with Distributions section below.
31+
32+
A single integer argument generates a vector of that length.
33+
34+
```julia-repl
35+
julia> rand()
36+
0.10261774967264703
37+
38+
julia> rand(5)
39+
5-element Vector{Float64}:
40+
0.24134501977563894
41+
0.5664193284851202
42+
0.9804412082089355
43+
0.6229551330613335
44+
0.47589221741904664
45+
```
46+
47+
For a different range, just shift and scale the result appropriately.
48+
49+
The example below uses [broadcasting][broadcasting] for the subtraction, covered in the [Vector Operations][vector-ops] Concept.
50+
The `.-` simply applies this arithmetic to each vector element.
51+
52+
```julia-repl
53+
# numbers between -1.0 and +1.0
54+
julia> (rand(5) .- 0.5) * 2
55+
5-element Vector{Float64}:
56+
-0.5303906759076336
57+
0.9635682226775855
58+
-0.048823697086981754
59+
0.465842804648374
60+
0.9880834344780736
61+
```
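
The same shift-and-scale idea works for any interval. A minimal sketch, assuming a hypothetical helper name `uniform_between` (it is not part of `Base` or `Random`):

```julia
# Hypothetical helper (not part of Base or Random): n uniform floats from lo up to hi.
# rand(n) yields values in [0, 1), so scale by the interval width, then shift by lo.
uniform_between(lo, hi, n) = lo .+ (hi - lo) .* rand(n)

# Illustrative use: five values between 15.0 and 25.0.
temps = uniform_between(15.0, 25.0, 5)
```

Scaling before shifting keeps the arithmetic easy to read: the width `hi - lo` stretches the unit interval, and `lo` moves it into place.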
62+
63+
With an integer type as the only argument, `rand` will use that type's `typemin` and `typemax` as limits.
64+
This is probably not what you want!
65+
66+
For random integers, we can supply a range, plus optionally how many values to generate.
67+
68+
```julia-repl
69+
julia> rand(Int64)
70+
-9159538335234594326 # not very useful
71+
72+
julia> rand(1:10, 5)
73+
5-element Vector{Int64}:
74+
1
75+
1
76+
1
77+
4
78+
7
79+
```
80+
81+
In the `rand(1:10, 5)` example above, notice that there are (coincidentally) repeating values, because each pick is independent.
82+
This is "sampling with replacement", discussed in more detail below.
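
As a small illustration of the range form, here is a sketch of the die roll mentioned at the start of this document, plus a look at the limits used for an integer type. The variable names are only for illustration.

```julia
# Roll one six-sided die: every integer from 1 to 6 is equally likely.
roll = rand(1:6)

# Roll two dice and add them; each roll is an independent pick from 1:6.
two_dice = sum(rand(1:6, 2))

# With only an integer type, the limits come from typemin and typemax.
limits = (typemin(Int8), typemax(Int8))   # (-128, 127)
small = rand(Int8)                        # some value in -128:127
```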
83+
84+
Alternatively, supply an array or tuple, and `rand` will return a random entry:
85+
86+
```julia-repl
87+
julia> rand([4, 9, 16, 25])
88+
16
89+
90+
# coin flip
91+
julia> rand(['H', 'T'])
92+
'H': ASCII/Unicode U+0048 (category Lu: Letter, uppercase)
93+
94+
# mixed types in tuple
95+
julia> rand( (1, 3.2, "name"), 2 )
96+
2-element Vector{Any}:
97+
1
98+
"name"
99+
```
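
Picking from a collection also accepts a count, which makes repeated experiments easy to simulate. A rough sketch of 1000 coin tosses (the count of 1000 is an arbitrary choice for illustration):

```julia
# Toss a fair coin 1000 times by picking from a tuple of outcomes.
tosses = rand(('H', 'T'), 1000)

# Count the heads; for a fair coin this should be near 500, but rarely exactly 500.
heads = count(==('H'), tosses)
tails = length(tosses) - heads
```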
100+
101+
### Sampling with or without replacement
102+
103+
Imagine that we have a bag containing 3 red balls and 4 green balls, and we randomly pull a ball from the bag.
104+
To get a second ball, there are two possibilities:
105+
106+
1. Replace the first ball in the bag and give everything a good shake before pulling out another.
107+
The number of balls is now the same as before (7), and _the ratio of red to green is also the same_.
108+
2. Put the first ball on the table before pulling out a second.
109+
Now there are only 6 balls in the bag, and _the red:green ratio depends on the color of the first ball_.
110+
111+
Scenario 1 is with replacement, scenario 2 is without, and _they give different results_.
112+
113+
To simulate sampling without replacement in Julia, there are a couple of options.
114+
115+
Simplest (_and within Exercism the only option_), use `Random.shuffle()` to put the entries in random order, then use the first `n` elements.
116+
This is fine for small problems but may not scale well to large collections: `shuffle` needs to generate the full array, even if you only want a small fraction of it.
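
The bag of balls described above can be sketched with both approaches. This assumes a hypothetical helper name, `draw_without_replacement`; only `rand` and `shuffle` come from Julia itself.

```julia
using Random

# The bag from the example: 3 red balls and 4 green balls.
bag = vcat(fill(:red, 3), fill(:green, 4))

# With replacement: each pick is independent, so the same ball can come up twice.
with_replacement = rand(bag, 2)

# Without replacement: shuffle a copy of the bag, then take the first n entries.
# Hypothetical helper name; shuffle itself is from the Random module.
draw_without_replacement(items, n) = shuffle(items)[1:n]

# Drawing all 7 balls must give back exactly 3 red and 4 green, in random order.
drawn = draw_without_replacement(bag, 7)
```

Because `shuffle` returns a new vector, the original `bag` is left unchanged and can be reused for the next simulated draw.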
117+
118+
To do sampling without replacement "properly", install the `StatsBase.jl` package.
119+
This provides the `sample()` function with a full range of options.
120+
121+
We can reasonably hope that similar functionality will be added into `Random` in a future release, to make it part of the standard library (code samples in this document were tested with Julia 1.11).
122+
123+
## Working with Distributions
124+
125+
Until now, we have concentrated on cases where all outcomes are equally likely.
126+
For example, `rand(1:100)` is equally likely to give any integer from 1 to 100.
127+
128+
Many real-world situations are far less simple than this.
129+
As a result, statisticians have created a wide variety of [`distributions`][probability-distribution] to describe "real world" results mathematically.
130+
131+
### Uniform distributions
132+
133+
The `rand()` function described above is used when all probabilities are equal.
134+
This is called a [`uniform`][uniform-distribution] distribution.
135+
136+
### Gaussian distribution
137+
138+
Also called the "normal" distribution or the "bell-shaped" curve, this is a very common way to describe imprecision in measured values.
139+
140+
For example, suppose the factory where you work has just bought 10,000 bolts which should be identical.
141+
You want to set up the factory robot to handle them, so you weigh a sample of 100 and find that they have an average (or `mean`) weight of 4.731g.
142+
This is extremely unlikely to mean that they all weigh exactly 4.731g.
143+
Perhaps you find that values range from 4.627 to 4.794g but cluster around 4.731g.
144+
145+
This is the [`Gaussian distribution`][gaussian-distribution], for which probabilities peak at the mean and tail off symmetrically on both sides (hence "bell-shaped").
146+
To simulate this in software, we need some way to specify the width of the curve (_typically, expensive bolts will cluster more tightly around the mean than cheap bolts!_).
147+
148+
By convention, this is done with the [`standard deviation`][standard-deviation]: small values for a sharp, narrow curve, large for a low, broad curve.
149+
Mathematicians love Greek letters, so we use `μ` ('mu') to represent the mean and `σ` ('sigma') to represent the standard deviation.
150+
Thus, if you read that "95% of values are within 2σ of μ" or "the Higgs boson has been detected with 5-sigma confidence", such comments relate to the standard deviation.
151+
152+
There will be more to say about this in the [`Statistics`][statistics] Concept.
153+
154+
## The `randn()` function
155+
156+
Short for "random normal", this is similar to the floating-point variant of `rand()` except that values are distributed as a Gaussian with mean 0 and standard deviation 1.
157+
158+
Again, you may want to scale the raw output from `randn` for standard deviation, and displace it for the mean.
159+
The example below converts to mean 30 and standard deviation 5.
160+
161+
```julia-repl
162+
julia> raw = randn(5)
163+
5-element Vector{Float64}:
164+
3.0762588867281475
165+
1.5101100620253902
166+
-0.5914858221637778
167+
0.684175554069735
168+
-0.8416433926114673
169+
170+
julia> raw * 5 .+ 30
171+
5-element Vector{Float64}:
172+
45.38129443364074
173+
37.55055031012695
174+
27.04257088918111
175+
33.420877770348675
176+
25.791783036942665
177+
```
178+
179+
With only five values, it is hard to see that the raw output clusters around zero instead of being spread evenly, as it would be for a uniform distribution.
180+
If you doubt it, generate 1000 or more and plot them to make it more obvious.
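
If plotting is not convenient, a numerical check gives similar reassurance. The sketch below assumes the `Statistics` standard library is available for `mean` and `std`, and the sample size of 100,000 is an arbitrary choice.

```julia
using Statistics   # standard library: mean and std

# Many samples, scaled to standard deviation 5 and shifted to mean 30.
samples = randn(100_000) .* 5 .+ 30

sample_mean = mean(samples)   # should be close to 30
sample_std = std(samples)     # should be close to 5

# Fraction of samples within 2 standard deviations of the mean (roughly 95%).
within_2sigma = count(x -> abs(x - 30) <= 2 * 5, samples) / length(samples)
```

The exact numbers change on every run, but with this many samples they should stay close to the target values.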
181+
182+
## The `Random` module
183+
184+
[This module][randommod] contains the next tier of functionality, omitted from `Base` to help minimize the size of Julia's default configuration.
185+
186+
`Random` supplements `rand` and `randn` in `Base` with mutating versions, `rand!` and `randn!`.
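
A brief sketch of the mutating versions, which overwrite an existing array instead of allocating a new one. The buffer size of 5 is arbitrary.

```julia
using Random

# Preallocate a buffer once, then refill it in place as often as needed.
buffer = zeros(5)

rand!(buffer)                 # overwrite with uniform values from [0, 1)
uniform_copy = copy(buffer)   # keep a copy before the next overwrite

randn!(buffer)                # overwrite the same buffer with standard-normal values
```

This can be useful inside loops, where reusing one array avoids repeated allocation.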
187+
188+
A useful addition is [`randstring`][randstring], which generates a string of given length.
189+
By default, this uses upper- and lowercase letters plus digits 0 to 9, but other choices can be specified.
190+
191+
```julia-repl
192+
julia> using Random
193+
194+
julia> randstring(20)
195+
"BoJnIxrS33pJiWggXZQV"
196+
```
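
To illustrate the "other choices", the allowed characters can be passed before the length. The alphabets and lengths below are arbitrary examples.

```julia
using Random

# Restrict the alphabet by passing the allowed characters before the length.
dna = randstring("ACGT", 12)      # 12 characters drawn from A, C, G, T
lower = randstring('a':'z', 6)    # 6 lowercase letters
plain = randstring()              # with no arguments, the length defaults to 8
```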
197+
198+
Additionally, there is a [`bitrand`][bitrand] function to generate a random [`BitArray`][bitarray] of specified length.
199+
200+
```julia-repl
201+
julia> bitrand(5)
203+
5-element BitVector:
204+
1
205+
1
206+
0
207+
0
208+
1
209+
```
210+
211+
### Shuffles and permutations
212+
213+
To randomly shuffle entries in a `Vector` we have [`shuffle`][shuffle]; also `shuffle!` to mutate the input vector in-place.
214+
215+
```julia-repl
216+
julia> v = ['A', '1', '2', 'J', 'Q', 'K'];
217+
218+
julia> shuffle(v)
219+
6-element Vector{Char}:
220+
'K': ASCII/Unicode U+004B (category Lu: Letter, uppercase)
221+
'1': ASCII/Unicode U+0031 (category Nd: Number, decimal digit)
222+
'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)
223+
'J': ASCII/Unicode U+004A (category Lu: Letter, uppercase)
224+
'2': ASCII/Unicode U+0032 (category Nd: Number, decimal digit)
225+
'Q': ASCII/Unicode U+0051 (category Lu: Letter, uppercase)
226+
227+
# shuffles are random:
228+
julia> shuffle(v)
229+
6-element Vector{Char}:
230+
'2': ASCII/Unicode U+0032 (category Nd: Number, decimal digit)
231+
'K': ASCII/Unicode U+004B (category Lu: Letter, uppercase)
232+
'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)
233+
'Q': ASCII/Unicode U+0051 (category Lu: Letter, uppercase)
234+
'J': ASCII/Unicode U+004A (category Lu: Letter, uppercase)
235+
'1': ASCII/Unicode U+0031 (category Nd: Number, decimal digit)
236+
```
237+
238+
Sometimes it is useful to have the shuffled indices instead.
239+
For this, use [`randperm(n)`][randperm] where `n` is the length of the sequence.
240+
241+
```julia-repl
242+
julia> randperm(6)
243+
6-element Vector{Int64}:
244+
6
245+
2
246+
4
247+
1
248+
3
249+
5
250+
```
251+
252+
In effect, `randperm(6)` behaves like `shuffle(1:6)`: both return a random ordering of the integers 1 to 6.
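
One situation where shuffled indices help is reordering two related vectors in the same way. A minimal sketch with made-up data (the names and scores are purely illustrative):

```julia
using Random

players = ["Ada", "Grace", "Linus", "Margaret"]
scores = [91, 85, 78, 88]

# One random ordering of the indices 1:4, applied to both vectors,
# so each player stays paired with their own score.
order = randperm(length(players))
shuffled_players = players[order]
shuffled_scores = scores[order]
```

Shuffling each vector separately would break the pairing, because the two shuffles would be independent.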
253+
254+
Related functions include [`randsubseq`][randsubseq] for pulling out entries with fixed probability, and [`randcycle`][randcycle] for cyclic permutations.
255+
These require some specialized knowledge, so check the documentation if they are of interest to you.
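
For a first impression only, here is a small sketch of both functions. The probability 0.25 and the sizes are arbitrary, and the documentation remains the place to look for details.

```julia
using Random

# randsubseq: keep each element independently with probability p (here 0.25).
# The survivors stay in their original order.
kept = randsubseq(1:20, 0.25)

# randcycle: a random permutation of 1:5 that forms one single cycle,
# so no index maps to itself.
cycle = randcycle(5)
```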
256+
257+
### Seeds and algorithms
258+
259+
Several random number generator (RNG) algorithms are built into `Random` as standard, and anyone with appropriate mathematical skills can add more.
260+
Such things are well beyond the scope of this document!
261+
262+
A more common reason for working with RNGs is to specify a [`seed`][seed], which has the effect of making the sequence of "random" outputs reproducible from one run to the next.
263+
264+
Such reproducibility is not appropriate in production code, but it can help with testing and debugging.
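
A short sketch of seeding, useful in a test suite. The seed values 1234 and 42 are arbitrary, and `Xoshiro` is one of the generators provided by `Random` in recent Julia versions.

```julia
using Random

# Seeding the default generator makes the following values repeatable.
Random.seed!(1234)
first_run = rand(1:6, 5)

Random.seed!(1234)           # same seed again ...
second_run = rand(1:6, 5)    # ... same five "random" rolls

# Alternatively, keep a separate generator object and pass it explicitly.
rng = Xoshiro(42)
roll = rand(rng, 1:6)
```

Passing an explicit generator keeps the reproducible sequence local to the code under test, without touching the default generator used elsewhere.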
265+
266+
## Other packages
267+
268+
Outside Exercism, there are many installable packages relating to randomness, probability and statistics.
269+
For some more information, see the [`Statistics`][statistics] Concept.
270+
271+
### The `StatsBase.jl` package
272+
273+
Most of the [`StatsBase`][statsbase] functions are quite technical and not relevant to this document.
274+
275+
The exception is [`StatsBase.sample`][sbsample], which provides a full implementation of sampling with or without replacement (see an earlier section, above).
276+
There are also functions for weighted (non-uniform) sampling.
277+
278+
### The `Distributions.jl` package
279+
280+
The `uniform` and `normal` (or Gaussian) distributions were described above.
281+
282+
The `Random` module also contains `randexp` to sample from the [Exponential Distribution][expdist], which is related to the (very common) [Poisson Distribution][poisson].
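
As with `randn`, the raw output of `randexp` can be scaled. The sketch below assumes the `Statistics` standard library for `mean`, and the "3 minutes between arrivals" figure is invented for illustration.

```julia
using Random, Statistics

# randexp draws from the exponential distribution with mean 1.
waits = randexp(100_000)

# Scale to a different mean, e.g. an average of 3 minutes between arrivals.
minutes = waits .* 3

mean_wait = mean(waits)         # close to 1
mean_minutes = mean(minutes)    # close to 3
```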
283+
284+
For a much wider range of options, there is the [`Distributions.jl`][distributions] package for those with an appropriate background in statistics.
285+
286+
287+
[gaussian-distribution]: https://simple.wikipedia.org/wiki/Normal_distribution
288+
[probability-distribution]: https://simple.wikipedia.org/wiki/Probability_distribution
289+
[Random]: https://docs.julialang.org/en/v1/stdlib/Random/
290+
[sampling-with-replacement]: https://www.youtube.com/watch?v=LnGFL_A6A6A
291+
[standard-deviation]: https://simple.wikipedia.org/wiki/Standard_deviation
292+
[truly-random]: https://www.malwarebytes.com/blog/news/2013/09/in-computers-are-random-numbers-really-random
293+
[uniform-distribution]: https://www.investopedia.com/terms/u/uniform-distribution.asp#:~:text=In%20statistics%2C%20uniform%20distribution%20refers,a%20spade%20is%20equally%20likely.
294+
[reproducibility]: https://docs.julialang.org/en/v1/stdlib/Random/#Reproducibility
295+
[statsbase]: https://juliastats.org/StatsBase.jl/stable/
296+
[sbsample]: https://juliastats.org/StatsBase.jl/stable/sampling/#StatsBase.sample
297+
[broadcasting]: https://docs.julialang.org/en/v1/manual/arrays/#Broadcasting
298+
[bitrand]: https://docs.julialang.org/en/v1/stdlib/Random/#Random.bitrand
299+
[bitarray]: https://docs.julialang.org/en/v1/base/arrays/#Base.BitArray-Tuple{Any}
300+
[randperm]: https://docs.julialang.org/en/v1/stdlib/Random/#Random.randperm
301+
[shuffle]: https://docs.julialang.org/en/v1/stdlib/Random/#Random.shuffle
302+
[randommod]: https://docs.julialang.org/en/v1/stdlib/Random/#Random-numbers-module
303+
[randstring]: https://docs.julialang.org/en/v1/stdlib/Random/#Random.randstring
304+
[randsubseq]: https://docs.julialang.org/en/v1/stdlib/Random/#Random.randsubseq
305+
[randcycle]: https://docs.julialang.org/en/v1/stdlib/Random/#Random.randcycle
306+
[seed]: https://en.wikipedia.org/wiki/Random_seed
307+
[expdist]: https://en.wikipedia.org/wiki/Exponential_distribution
308+
[poisson]: https://en.wikipedia.org/wiki/Poisson_distribution
309+
[distributions]: https://juliastats.org/Distributions.jl/latest/
310+
[statistics]: https://exercism.org/tracks/julia/concepts/statistics
311+
[vector-ops]: https://exercism.org/tracks/julia/concepts/vector-operations
