<a href="https://colab.research.google.com/github/finesketch/statistics/blob/main/06_Random_Numbers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Randomness is a big part of machine learning. Randomness is used as a tool or a feature in preparing data and in learning algorithms that map input data to output data in order to make predictions.

The source of randomness in machine learning is a mathematical trick called a *pseudorandom* number generator.
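To make the idea concrete, here is a minimal sketch of a linear congruential generator, one of the simplest pseudorandom schemes. It is not the algorithm Python uses. The class name `ToyLCG` is hypothetical, and the constants are commonly quoted illustrative values rather than a recommendation.

```python
class ToyLCG:
    """Toy linear congruential generator (illustration only, not for real use)."""

    def __init__(self, seed):
        self.modulus = 2 ** 32
        self.multiplier = 1664525     # illustrative constant
        self.increment = 1013904223   # illustrative constant
        self.state = seed % self.modulus

    def random(self):
        # advance the state deterministically, then scale it into [0, 1)
        self.state = (self.multiplier * self.state + self.increment) % self.modulus
        return self.state / self.modulus


# two generators started from the same seed
first = ToyLCG(seed=1)
second = ToyLCG(seed=1)
run_a = [first.random() for _ in range(3)]
run_b = [second.random() for _ in range(3)]

# the numbers look random, but the same seed always gives the same sequence
print(run_a)
print(run_a == run_b)
```

The sequence is fully determined by the seed, which is what makes results that depend on "random" numbers repeatable.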

## Randomness in Machine Learning



## Random Numbers with Python

The Python standard library provides a module called random that offers a suite of functions for generating random numbers. Python uses a popular and robust pseudorandom number generator called the **Mersenne Twister**.

In [1]:
# seed the pseudorandom number generator
# given the same seed, it produces the same sequence of numbers every time
# if seed() is never called, Python 3 seeds the generator from an operating system
# randomness source when one is available, falling back to the current system time
from random import seed
from random import random

# seed random number generator
seed(1)

# generate some random numbers
print(random(), random(), random())

# reset the seed
seed(1)

# generate some random numbers
print(random(), random(), random())

0.13436424411240122 0.8474337369372327 0.763774618976614
0.13436424411240122 0.8474337369372327 0.763774618976614


In [3]:
# generate random floating point values
from random import seed
from random import random

# seed random number generator
seed(1)

# generate random numbers between 0 and 1
print('Set to Seed of "1":')
for _ in range(10):
  value = random()
  print(value)

# seed random number generator
seed(5)

# generate random numbers between 0 and 1
print('Set to Seed of "5":')
for _ in range(10):
  value = random()
  print(value)  

Set to Seed of "1":
0.02290419382011777
0.013767871906265272
0.8639210321685548
0.0027216333430962747
0.24500058428666904
0.9356752277156742
0.4320107345788229
0.3843831733376474
0.5639504770506025
0.19097356937127596
Set to Seed of "5":
0.6695318129236498
0.584080028382301
0.9298930975041293
0.8912685509262611
0.7734944153048311
0.353833039553328
0.9876097772883745
0.48281410594123697
0.7648321593659458
0.890608582155942


The floating point values can be rescaled to a desired range by multiplying them by the width of the new range and adding the minimum value:

*scaledValue = min + (value × (max − min))*
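As a minimal sketch of the formula above, the snippet below rescales values from random() into the range 5 to 10. The helper name `rescale` and the chosen range are hypothetical, picked only for illustration. The standard library's uniform() function is shown alongside because it draws a float from a given range directly.

```python
from random import seed, random, uniform


def rescale(value, low, high):
    # hypothetical helper: map a value from [0, 1) onto the range low..high
    return low + value * (high - low)


# rescale values from random() by hand
seed(1)
scaled = [rescale(random(), 5.0, 10.0) for _ in range(5)]
print(scaled)

# uniform() draws a float from the requested range directly
seed(1)
direct = [uniform(5.0, 10.0) for _ in range(5)]
print(direct)
```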

In [5]:
# generate random integer values
from random import seed
from random import randint

# seed random number generator
seed(1)

# generate some integers
for _ in range(10):
  value = randint(0, 10)
  print(value)

2
9
1
4
1
7
7
7
10
6


**Random Gaussian Values:** Random floating point values can be drawn from a Gaussian distribution using the gauss() function. This function takes two arguments that set the location and spread of the distribution: the mean and the standard deviation.

In [7]:
# generate random Gaussian values
from random import seed
from random import gauss

# seed random number generator
seed(1)

# generate some Gaussian values
for _ in range(10):
  value = gauss(0, 1)
  print(value)

1.2881847531554629
1.449445608699771
0.06633580893826191
-0.7645436509716318
-1.0921732151041414
0.03133451683171687
-1.022103170010873
-1.4368294451025299
0.19931197648375384
0.13337460465860485


In [8]:
# choose a random element from a list
from random import seed
from random import choice

# seed random number generator
seed(1)

# prepare a sequence
sequence = [i for i in range(20)]
print(sequence)

# make choice from the sequence
for _ in range(5):
  selection = choice(sequence)
  print(selection)


[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
4
18
2
8
3


In [9]:
# select a random sample without replacement
from random import seed
from random import sample

# seed random number generator
seed(1)

# prepare a sequence
sequence = [i for i in range(20)]
print(sequence)

# select a subset without replacement
subset = sample(sequence, 5)
print(subset)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[4, 18, 2, 8, 3]


In [13]:
# randomly shuffle a sequence
from random import seed
from random import shuffle

# seed random number generator
seed(1)

# prepare a sequence
sequence = [i for i in range(20)]
print(sequence)

# randomly shuffle the sequence (in-place shuffle)
shuffle(sequence)
print(sequence)


[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[11, 5, 17, 19, 9, 0, 16, 1, 15, 6, 10, 13, 14, 12, 7, 3, 8, 2, 18, 4]


## Random Numbers with NumPy

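In machine learning, you are likely using libraries such as scikit-learn and Keras. These libraries make use of NumPy under the covers, a library that makes working with vectors and matrices of numbers very efficient. NumPy has its own pseudorandom number generator, separate from the one in the Python standard library, so it must be seeded separately through numpy.random.seed().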

In [14]:
# seed the pseudorandom number generator
from numpy.random import seed
from numpy.random import rand

# seed random number generator
seed(1)

# generate some random numbers
print(rand(3))

# reset the seed
seed(1)

# generate some random numbers
print(rand(3))

[4.17022005e-01 7.20324493e-01 1.14374817e-04]
[4.17022005e-01 7.20324493e-01 1.14374817e-04]


In [15]:
# generate random floating point values
from numpy.random import seed
from numpy.random import rand

# seed random number generator
seed(1)

# generate random numbers between 0 and 1
values = rand(10)
print(values)


[4.17022005e-01 7.20324493e-01 1.14374817e-04 3.02332573e-01
 1.46755891e-01 9.23385948e-02 1.86260211e-01 3.45560727e-01
 3.96767474e-01 5.38816734e-01]


In [16]:
# generate random integer values
from numpy.random import seed
from numpy.random import randint

# seed random number generator
seed(1)

# generate some integers
values = randint(0, 10, 20) # low (inclusive), high (exclusive), number of integer values to generate
print(values)

[5 8 9 5 0 0 1 7 6 9 2 4 5 2 4 2 4 7 7 9]


In [19]:
# generate random Gaussian values
from numpy.random import seed
from numpy.random import randn

# seed random number generator
seed(1)

# generate some Gaussian values
values = randn(10) # mean=0, standard deviation=1
print(values)

[ 1.62434536 -0.61175641 -0.52817175 -1.07296862  0.86540763 -2.3015387
  1.74481176 -0.7612069   0.3190391  -0.24937038]


In [21]:
# randomly shuffle a sequence
from numpy.random import seed
from numpy.random import shuffle

# seed random number generator
seed(1)

# prepare a sequence
sequence = [i for i in range(20)]
print(sequence)

# randomly shuffle the sequence
shuffle(sequence)
print(sequence)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[3, 16, 6, 10, 2, 14, 4, 17, 7, 1, 13, 0, 19, 18, 9, 15, 8, 12, 11, 5]


## When to Seed the Random Number Generator

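There are times during a predictive modeling project when you should consider seeding the random number generator. Let’s look at two cases:
* Data Preparation. Steps that use randomness, such as shuffling the data, should be seeded so that the data is prepared the same way every time.
* Data Splits. Train/test splits and cross-validation folds should be made consistently so that every algorithm is trained and evaluated on the same subsets of data.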
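A seeded data split can be sketched as follows, using only the standard library. The helper name `split_rows`, the 25% test fraction, and the toy data are assumptions made for illustration; a real project would more likely pass a seed to the splitting utility of its machine learning library.

```python
from random import Random


def split_rows(rows, test_fraction=0.25, seed=1):
    # hypothetical helper: shuffle a copy with a dedicated, seeded generator
    rng = Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # first n_test rows become the test set, the rest the training set
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(20))

# the same seed reproduces the same split on every call
train_a, test_a = split_rows(data, seed=1)
train_b, test_b = split_rows(data, seed=1)
print(test_a)
print(test_a == test_b)
```

Using a separate Random instance, rather than the module-level seed(), keeps the split repeatable even if other code draws random numbers in between.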

## How to Control for Randomness

A stochastic machine learning algorithm will learn slightly differently each time it is run on the same data. This will result in a model with slightly different performance each time it is trained.

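One way to get a repeatable result is to fix the seed before every run, but a score from a single seed hides how much the performance varies. A better approach is to evaluate the algorithm in such a way that the reported performance includes the measured uncertainty in the performance of the algorithm. We can do that by repeating the evaluation of the algorithm multiple times with different sequences of random numbers and summarizing the results, for example as a mean and standard deviation.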

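There are two sources of uncertainty to consider:
* Data uncertainty: how performance varies with changes to the training and test data, measured by evaluating the algorithm on multiple splits of the data.
* Algorithm uncertainty: how performance varies because of the algorithm's own randomness, measured by evaluating it multiple times on the same splits of the data.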
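Repeated evaluation can be sketched as below. The function `evaluate_once` is a hypothetical stand-in for a full "split, train, score" run: it returns a made-up accuracy near 0.80 with seeded Gaussian noise, so the numbers it produces are purely illustrative and not results from any real model.

```python
from random import Random
from statistics import mean, stdev


def evaluate_once(seed):
    # hypothetical stand-in for training and scoring a stochastic model;
    # the 0.80 baseline and 0.02 noise level are invented for illustration
    rng = Random(seed)
    return 0.80 + rng.gauss(0, 0.02)


# repeat the evaluation with a different seed each time
scores = [evaluate_once(seed) for seed in range(10)]

# report the average score together with its spread
print(f'accuracy: {mean(scores):.3f} (+/- {stdev(scores):.3f})')
```

Because each run uses a known seed, the whole experiment stays repeatable while still showing how much the score moves from run to run.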

