### Random Seeds

The `random` module provides a variety of functions related to (pseudo) random numbers.

The problem with using random numbers in your code is that it can be difficult to debug, because the random number sequence is not the same from run to run of your program. If your code fails somewhere in the middle of a run, it is difficult to make the problem **repeatable**. Debugging intermittent and non-repeatable failures is one of the worst things to do!

Fortunately, when using the `random` module, we can set the `seed` for the underlying random number generator.

Random numbers are not truly random - they are generated in such a way that the numbers *appear* random and evenly distributed, but in fact they are being generated using a specific algorithm.

That algorithm depends on a **seed** value. That seed value will determine the exact sequence of randomly generated numbers (so as you can see, it's not truly random). Setting different seeds will result in different random sequences, but setting the seed to the same value will result in the same sequence being generated.
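
To make the role of the seed concrete, here is a minimal sketch of a toy generator. The `TinyLCG` class and the `first_values` helper are hypothetical names used only for illustration, and the linear congruential update shown is *not* the algorithm behind Python's `random` module (which uses the Mersenne Twister). It only shows how a seed fully determines the sequence that follows.

```python
class TinyLCG:
    """Toy linear congruential generator, for illustration only.

    This is NOT the algorithm used by Python's random module.
    """

    def __init__(self, seed):
        # The seed is simply the starting state of the generator
        self.state = seed

    def next_int(self):
        # Classic LCG update: state = (a * state + c) mod m
        self.state = (1103515245 * self.state + 12345) % 2**31
        return self.state


def first_values(seed, n=5):
    # Build a fresh generator from the seed and take its first n values
    gen = TinyLCG(seed)
    return [gen.next_int() for _ in range(n)]


same_a = first_values(42)
same_b = first_values(42)
different = first_values(43)

print(same_a == same_b)     # True: same seed, same sequence
print(same_a == different)  # False: different seed, different sequence
```

Nothing in `next_int` is random: each value is computed from the previous state, so the starting state (the seed) fixes everything that follows.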

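By default, the seed is taken from a source of randomness provided by the operating system (or from the system time if no such source is available), hence every time you run your program a different seed is set. But we can easily set the seed to something specific - very useful for debugging purposes.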

In [2]:
import random

In [3]:
for _ in range(10):
    print(random.randint(10, 20), random.random())

11 0.3941432306671926
15 0.8812140790499596
16 0.37282943577477967
20 0.005513098797397142
11 0.3725669786178152
11 0.1196637438444097
10 0.4462043488173808
20 0.7780315394948791
10 0.44184689599720195
18 0.7151196875741429


In [4]:
for _ in range(10):
    print(random.randint(10, 20), random.random())

18 0.2038383050034298
16 0.7440139968470417
15 0.707850464217125
18 0.14114568897542168
11 0.8880311967769577
11 0.14594783916143372
20 0.15605491687220174
11 0.5514178424953686
10 0.5972628604162272
14 0.6761330443296901


As you can see the sequence of numbers is not the same (and even restarting the kernel will result in different numbers).

We can set the **seed** as follows:

In [27]:
random.seed(0)
for i in range(10):
    print(random.randint(10, 20), random.random())

16 0.7579544029403025
16 0.04048437818077755
18 0.48592769656281265
14 0.9677999949201714
15 0.5833820394550312
13 0.5046868558173903
14 0.1397457849666789
11 0.6183689966753316
14 0.9872592010330129
18 0.9827854760376531


If we run the loop again without resetting the seed, the generator simply continues from where it left off, so the sequence will still be different:

In [6]:
for i in range(10):
    print(random.randint(10, 20), random.random())

19 0.9021659504395827
14 0.09876334465914771
11 0.8988382879679935
20 0.33019721859799855
18 0.1007012080683658
16 0.31619669952159346
20 0.9130110532378982
18 0.47700977655271704
18 0.2604923103919594
18 0.9159944803568847


Instead, what we have to do is reset the seed. (This is what happens if you set the seed to a specific number at the start of your program: every random number generated will then be repeatable from run to run.)

Here, we just need to reset the seed before running that loop to get the same effect:

In [7]:
random.seed(0)
for i in range(20):
    print(random.randint(10, 20), random.random())

16 0.7579544029403025
16 0.04048437818077755
18 0.48592769656281265
14 0.9677999949201714
15 0.5833820394550312
13 0.5046868558173903
14 0.1397457849666789
11 0.6183689966753316
14 0.9872592010330129
18 0.9827854760376531
19 0.9021659504395827
14 0.09876334465914771
11 0.8988382879679935
20 0.33019721859799855
18 0.1007012080683658
16 0.31619669952159346
20 0.9130110532378982
18 0.47700977655271704
18 0.2604923103919594
18 0.9159944803568847


In [8]:
random.seed(0)
for i in range(20):
    print(random.randint(10, 20), random.random())

16 0.7579544029403025
16 0.04048437818077755
18 0.48592769656281265
14 0.9677999949201714
15 0.5833820394550312
13 0.5046868558173903
14 0.1397457849666789
11 0.6183689966753316
14 0.9872592010330129
18 0.9827854760376531
19 0.9021659504395827
14 0.09876334465914771
11 0.8988382879679935
20 0.33019721859799855
18 0.1007012080683658
16 0.31619669952159346
20 0.9130110532378982
18 0.47700977655271704
18 0.2604923103919594
18 0.9159944803568847


As you can see, the sequence of random numbers generated is now the same every time.
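
For debugging, one possible pattern is to pick the seed explicitly at startup and record it, so that a failing run can be replayed later. The sketch below is only an illustration under that assumption: `run_simulation` is a hypothetical stand-in for your program, and using `time.time_ns()` to choose a seed is just one option.

```python
import random
import time


def run_simulation(seed=None):
    """Hypothetical program entry point that reports the seed it used."""
    if seed is None:
        # Choose a seed ourselves so that we know (and can log) its value
        seed = time.time_ns()
    random.seed(seed)
    values = [random.randint(10, 20) for _ in range(5)]
    return seed, values


# A normal run: the seed changes every time, but we keep a record of it
seed_used, first_run = run_simulation()
print(f"seed used: {seed_used}")

# Replaying with the recorded seed reproduces the run exactly
_, replay = run_simulation(seed_used)
print(first_run == replay)  # True
```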

What's interesting is that even functions like `shuffle` will shuffle in the same order!

Let's see this:

In [9]:
def generate_random_stuff(seed=None):
    random.seed(seed)
    results = []
    
    # randint will generate the same sequence (for same seed)
    for _ in range(5):
        results.append(random.randint(0, 5))
    
    # even shuffling generates in the same way (for same seed)
    characters = ['a', 'b', 'c']
    random.shuffle(characters)
    results.append(characters)
    
    # same with the Gaussian distribution
    for _ in range(5):
        results.append(random.gauss(0, 1))
        
    return results

In [10]:
print(generate_random_stuff())

[4, 1, 1, 1, 4, ['a', 'c', 'b'], -0.31512889621846685, 0.5092099345420326, -0.3512169360103597, 0.36741636205390116, -0.5768283643428594]


In [11]:
print(generate_random_stuff())

[0, 0, 0, 5, 5, ['a', 'b', 'c'], 1.1744703949526212, -0.7761467288153496, -1.3751954194033316, -0.9610094283809831, -0.9073440410135304]


Now let's use a seed value:

In [12]:
print(generate_random_stuff(0))

[3, 3, 0, 2, 4, ['a', 'c', 'b'], 1.6391095109274887, -0.9249345372119703, 0.9223306019157185, -0.1891931090669293, 0.5456115709634167]


In [13]:
print(generate_random_stuff(0))

[3, 3, 0, 2, 4, ['a', 'c', 'b'], 1.6391095109274887, -0.9249345372119703, 0.9223306019157185, -0.1891931090669293, 0.5456115709634167]


As long as we use the same seed value the results are repeatable. But if we set different seed values the sequences will be different (but still be the same for the same seed):

In [14]:
print(generate_random_stuff(100))

[1, 3, 3, 1, 5, ['a', 'c', 'b'], -1.639893943131093, 0.7278930291928232, -0.4000719319137612, -0.08390378703116254, -0.3013546798244102]


In [15]:
print(generate_random_stuff(100))

[1, 3, 3, 1, 5, ['a', 'c', 'b'], -1.639893943131093, 0.7278930291928232, -0.4000719319137612, -0.08390378703116254, -0.3013546798244102]
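
Note that `generate_random_stuff` reseeds the module-level generator, which affects every other user of `random` in the program. As a hedged alternative, the sketch below uses a separate `random.Random` instance with its own state; the function name `generate_with_own_generator` is hypothetical and only mirrors part of the function above.

```python
import random


def generate_with_own_generator(seed=None):
    # A private generator: seeding it does not affect random.random() etc.
    rng = random.Random(seed)
    results = [rng.randint(0, 5) for _ in range(5)]
    characters = ['a', 'b', 'c']
    rng.shuffle(characters)
    results.append(characters)
    return results


# Record what the module-level generator produces right after seed(0)
random.seed(0)
expected_next = random.random()

# Seed again, call our function, and check the module-level state is untouched
random.seed(0)
generate_with_own_generator(100)
print(random.random() == expected_next)  # True

# The private generator is still repeatable for a given seed
print(generate_with_own_generator(100) == generate_with_own_generator(100))  # True
```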


Lastly let's see how we would calculate the frequency of randomly generated integers, just to see how even the distribution is.

Basically, given a sequence of random integers, we are going to create a dictionary that contains the integers as keys, and the values will be the frequency of each:

In [16]:
def freq_analysis(lst):
    return {k: lst.count(k) for k in set(lst)}

In [17]:
lst = [random.randint(0, 10) for _ in range(100)]

In [18]:
print(lst)

[10, 3, 5, 3, 4, 3, 2, 2, 3, 5, 5, 10, 6, 3, 6, 7, 8, 4, 6, 2, 10, 10, 1, 2, 0, 9, 6, 2, 9, 2, 3, 2, 0, 10, 3, 7, 10, 6, 2, 9, 8, 0, 9, 3, 9, 8, 4, 9, 5, 8, 10, 5, 8, 5, 7, 6, 0, 1, 4, 10, 6, 4, 3, 6, 8, 7, 9, 4, 2, 0, 10, 9, 5, 2, 0, 4, 1, 0, 9, 3, 4, 8, 9, 3, 10, 9, 5, 5, 7, 0, 1, 7, 3, 8, 1, 9, 3, 0, 6, 1]


In [19]:
# lst was already generated above, so setting a seed here would not change this result
freq_analysis(lst)

{0: 9, 1: 6, 2: 10, 3: 13, 4: 8, 5: 9, 6: 9, 7: 6, 8: 8, 9: 12, 10: 10}

In [20]:
random.seed(0)
freq_analysis([random.randint(0, 10) for _ in range(1_000_000)])

{0: 90935,
 1: 91184,
 2: 91002,
 3: 91042,
 4: 90766,
 5: 91072,
 6: 90678,
 7: 90985,
 8: 90409,
 9: 91383,
 10: 90544}

Of course, it usually pays to know what's in the standard library :-)

The `collections` module has a `Counter` class that can be used to do this precise thing!

In [21]:
from collections import Counter

In [22]:
random.seed(0)
Counter([random.randint(0, 10) for _ in range(1_000_000)])

Counter({6: 90678,
         0: 90935,
         4: 90766,
         8: 90409,
         7: 90985,
         5: 91072,
         9: 91383,
         3: 91042,
         2: 91002,
         1: 91184,
         10: 90544})
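
As a quick check, the sketch below compares the two approaches on a smaller seeded sample (the sample size of 10,000 is an arbitrary choice for illustration). It repeats the `freq_analysis` definition from above so that it can run on its own.

```python
import random
from collections import Counter


def freq_analysis(lst):
    # Same definition as above: one full scan of lst per distinct value
    return {k: lst.count(k) for k in set(lst)}


random.seed(0)
sample = [random.randint(0, 10) for _ in range(10_000)]

by_count = freq_analysis(sample)
by_counter = Counter(sample)  # a single pass over the data

print(by_count == dict(by_counter))  # True: both give the same frequencies
print(sum(by_counter.values()))      # 10000: every item is counted once

# Share of each value; an even distribution would give roughly 1/11 each
for value in sorted(by_counter):
    print(value, round(by_counter[value] / len(sample), 4))
```

Both give the same counts, but `lst.count` rescans the whole list for every distinct value, whereas `Counter` tallies everything in one pass, which matters for large inputs.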