In this chapter, we begin our analysis of randomness. To start off, we will use Python to make choices at random. NumPy has a submodule called random that contains many functions involving random selection. One of these functions is called choice. It picks one item at random from an array, and it is equally likely to pick any of the items. The function call is np.random.choice(array_name), where array_name is the name of the array from which to make the choice.

Thus the following code evaluates to treatment with chance 50%, and control with chance 50%.

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

In [2]:
two_groups = np.array(('treatment', 'control'))
np.random.choice(two_groups)

'control'

The big difference between the code above and all the other code we have run thus far is that it doesn't always return the same value. It can return either treatment or control, and we don't know ahead of time which one it will pick. To repeat the process, provide a second argument: the number of choices to make.

In [3]:
np.random.choice(two_groups, 10)

array(['treatment', 'treatment', 'treatment', 'treatment', 'control',
       'control', 'treatment', 'treatment', 'control', 'control'],
      dtype='<U9')
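
To see the 50% chance at work, one can make many choices and tally the results. The following is a minimal sketch, assuming 10,000 repetitions (an arbitrary number) and using np.unique with return_counts=True to do the tallying. The names num_draws, draws, labels, counts, and proportions are introduced here for illustration.

```python
import numpy as np

two_groups = np.array(('treatment', 'control'))

# Make 10,000 random choices; each one is equally likely
# to be 'treatment' or 'control'.
num_draws = 10000
draws = np.random.choice(two_groups, num_draws)

# Tally how often each label appeared.
labels, counts = np.unique(draws, return_counts=True)
proportions = counts / num_draws

print(labels)       # the distinct labels, in sorted order
print(proportions)  # each should be close to 0.5, but varies from run to run
```

The proportions will rarely be exactly 0.5, which is itself a small illustration of random variability.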

## Booleans and Comparison
In Python, Boolean values, named for the logician George Boole, represent truth and take only two possible values: True and False. Whether problems involve randomness or not, Boolean values most often arise from comparison operators. Python includes a variety of operators that compare values. For example, 3 is larger than 1 + 1.

In [4]:
3 > 1 + 1

True

In [6]:
5 == 10/2

True
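
The equality test above uses two equal signs because a single = is reserved for assignment. As a brief sketch, here are the common comparison operators applied to small numbers. The values and the name results are arbitrary illustrations.

```python
# Each comparison evaluates to a Boolean value.
results = {
    '2 < 3': 2 < 3,            # less than
    '3 > 2': 3 > 2,            # greater than
    '2 <= 2': 2 <= 2,          # less than or equal to
    '3 >= 4': 3 >= 4,          # greater than or equal to
    '5 == 10/2': 5 == 10/2,    # equal: the int 5 equals the float 5.0
    '5 != 10/2': 5 != 10/2,    # not equal
}

for expression, value in results.items():
    print(expression, '->', value)
```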

In [7]:
x = 12
y = 5
min(x, y) <= (x+y)/2 <= max(x, y)

True
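
The last expression chains two comparisons. Python treats a <= b <= c as a <= b and b <= c, evaluating the middle expression only once. A minimal sketch of that equivalence, reusing x and y from above (the names average, chained, and spelled_out are introduced here for illustration):

```python
x = 12
y = 5
average = (x + y) / 2

# A chained comparison...
chained = min(x, y) <= average <= max(x, y)

# ...gives the same result as joining the two comparisons with `and`.
spelled_out = (min(x, y) <= average) and (average <= max(x, y))

print(chained, spelled_out)  # True True
```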

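## Comparing Strings
Strings can also be compared. Their order is alphabetical, with one caveat: Python compares strings character by character, and an uppercase letter such as 'Z' comes before a lowercase letter such as 'a'. A shorter string is less than a longer string that begins with the shorter string.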

In [9]:
'Dog' > 'Catastrophe' > 'Cat'

True
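
A few more cases may help. This sketch relies on Python's standard behavior of comparing strings character by character using Unicode code points, under which uppercase English letters sort before lowercase ones. The example words and variable names are arbitrary.

```python
# A string that is a prefix of a longer string is the smaller of the two.
prefix_is_smaller = 'Cat' < 'Catastrophe'

# Comparison goes character by character: 'D' comes after 'C'.
first_letter_decides = 'Dog' > 'Catastrophe'

# Case matters: uppercase English letters sort before lowercase ones,
# so 'Zebra' is less than 'apple' even though z follows a in the alphabet.
uppercase_first = 'Zebra' < 'apple'

# Lowercasing both sides gives a case-insensitive comparison.
ignoring_case = 'Zebra'.lower() < 'apple'.lower()

print(prefix_is_smaller, first_letter_decides, uppercase_first, ignoring_case)
# True True True False
```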

Let's return to random selection. Recall the array two_groups, which consists of just two elements, treatment and control. To see whether a randomly assigned individual went to the treatment group, you can use a comparison:

In [10]:
np.random.choice(two_groups) == 'treatment'

False

As before, the random choice will not always be the same, so the result of the comparison won't always be the same either. It will depend on whether treatment or control was chosen. With any cell that involves random selection, it is a good idea to run the cell several times to get a sense of the variability in the result.
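
One way to follow this advice without re-running the cell by hand is to repeat the comparison in a loop. A minimal sketch, with the five repetitions chosen arbitrarily and the names outcomes and assigned_to_treatment introduced for illustration:

```python
import numpy as np

two_groups = np.array(('treatment', 'control'))

# Repeat the random assignment five times and record each comparison.
outcomes = []
for _ in range(5):
    # bool(...) converts NumPy's Boolean type to a plain Python Boolean.
    assigned_to_treatment = bool(np.random.choice(two_groups) == 'treatment')
    outcomes.append(assigned_to_treatment)

print(outcomes)  # five Booleans; the list changes from run to run
```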

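## Comparing an Array and a Value

When an array is compared to a single value, each element of the array is compared to that value, and the result is an array of Booleans. The count_nonzero function then counts the True values in that array, because True counts as 1 and False as 0.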

In [12]:
tosses = np.array(('Tails', 'Heads', 'Tails', 'Heads', 'Heads'))
tosses == 'Heads'

array([False,  True, False,  True,  True])

In [13]:
np.count_nonzero(tosses == 'Heads')

3
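
Combining random selection with an array comparison gives a simple way to count outcomes. The sketch below assumes a fair coin and 100 tosses (both are illustrative choices, as are the variable names) and counts the heads and tails:

```python
import numpy as np

coin = np.array(('Heads', 'Tails'))

# Simulate 100 tosses of a fair coin.
num_tosses = 100
simulated_tosses = np.random.choice(coin, num_tosses)

# Comparing the array to 'Heads' gives an array of Booleans;
# count_nonzero counts the True entries.
num_heads = np.count_nonzero(simulated_tosses == 'Heads')
num_tails = np.count_nonzero(simulated_tosses == 'Tails')

print(num_heads, num_tails)  # varies from run to run; the two counts sum to 100
```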