© Jacob White 2025, All Rights Reserved

# Statistics and Data Analysis

# Chapter 2 - Review of Probability

## 2.1 Basic ideas

- **Random experiment**: Procedure or an operation whose outcome is uncertain and cannot be predicted in advance.
- **Sample space**: Set of all possible outcomes of a random experiment. Denoted in this course by $S$. 

### Simulating different random experiments in Python

#### 1. Single coin toss

To simulate tossing a single fair coin in Python, and more generally to simulate any random experiment, we can use the `random` module or the `numpy.random` module. The latter can also generate whole arrays of samples in a single call, so we'll use that.

In [2]:
import numpy as np

flip = np.random.randint(0, 2) # Generates 0 or 1, corresponding to heads or tails respectively.
print(f'Result: {flip}')

Result: 0


Alternatively, we can pick a random element from a list containing the strings 'Heads' and 'Tails':

In [3]:
sample_space = ['Heads', 'Tails']
print(f'Result: {np.random.choice(sample_space)}')

Result: Tails


Note that, in the latter case using `np.random.choice`, we have to explicitly write out the entire sample space. 

#### 2. Multiple coin tosses

We can simulate multiple coin tosses in the following fashion. Using `np.random.randint`:

In [4]:
num_flips = 2 # Toss a fair coin twice, or equivalently two fair coins at once.
flips = np.random.randint(0, 2, size = num_flips)
print(f'Result for multiple flips: {flips}')

Result for multiple flips: [0 1]


Alternatively, again, selecting a random choice from a list describing the sample space for a single coin toss:

In [5]:
num_flips = 2
sample_space = ['Heads', 'Tails']
flips = np.random.choice(sample_space, size = num_flips)
print(f'Result for multiple flips: {flips}')

Result for multiple flips: ['Heads' 'Heads']
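
With many tosses, the proportion of heads should settle near 0.5. The following is a minimal sketch of that check. It assumes a seeded generator from `np.random.default_rng` (not used elsewhere in these notes) purely so the run is repeatable, and the name `prop_heads` is illustrative.

```python
import numpy as np

# Assumption: a seeded Generator is used only to make this sketch repeatable.
rng = np.random.default_rng(seed=0)

num_flips = 10_000
# integers(0, 2) draws from {0, 1}; as above, 0 stands for heads and 1 for tails.
flips = rng.integers(0, 2, size=num_flips)

# Proportion of tosses that came up heads (encoded as 0).
prop_heads = np.mean(flips == 0)
print(f'Proportion of heads in {num_flips} flips: {prop_heads}')
```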


#### 3. Lifetime of a car battery

The first two examples above have **discrete sample spaces**; in fact, both are finite. We can also consider random experiments with **continuous sample spaces**, such as the observation of the lifetime of a car battery. Suppose a car battery lasts at most 100 months, so that its lifetime lies somewhere in the interval $[0, 100]$, where a lifetime of 0 corresponds to a battery that is defective at the point of manufacture.

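We can use the `numpy.random` module to study continuous sample spaces as well. In the code below we assume, purely for illustration, that the lifetime is uniformly distributed over the interval.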

In [6]:
lifetime = np.random.uniform(low = 0.0, high = 100.0, size = None) # Generates a random float in the half-open interval [0, 100)
print(f'Lifetime: {lifetime}')

Lifetime: 29.699747724266402


Again, we can also generate random lifetimes for multiple batteries:

In [7]:
num_batteries = 10
lifetimes = np.random.uniform(low = 0.0, high = 100.0, size = num_batteries)
print(f'Lifetimes: {lifetimes}')

Lifetimes: [78.85831564 48.42787598 28.50435607 30.50484688 25.03149858 26.83189939
 91.46914471 20.6550159  71.44702701  6.48744607]
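
With a larger sample we can estimate the probability of an event in this continuous sample space, for example that a battery lasts more than 50 months. This is only a sketch under the uniform model used above: the cutoff `threshold` is a hypothetical choice, and the seeded generator `rng` is an assumption made for repeatability.

```python
import numpy as np

# Assumption: a seeded Generator is used only to make this sketch repeatable.
rng = np.random.default_rng(seed=1)

num_batteries = 100_000
# Under the uniform model, each lifetime is a float in [0, 100).
lifetimes = rng.uniform(low=0.0, high=100.0, size=num_batteries)

threshold = 50.0  # hypothetical cutoff, in months
# Fraction of simulated batteries whose lifetime exceeds the cutoff.
frac_long_lived = np.mean(lifetimes > threshold)
print(f'Fraction lasting more than {threshold} months: {frac_long_lived}')
```

Under this model the exact probability is $(100 - 50)/100 = 0.5$, so the printed fraction should be close to that value.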


### Axioms of Probability

Let $S$ denote the sample space of an experiment. Associated with each event $A$ in $S$ is a number, $P(A)$, called the **probability** of $A$, which satisfies three axioms:
1. $P(A) \geq 0$,
2. $P(S) = 1$,
3. If $A$ and $B$ are mutually exclusive events, i.e. $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$. 

We have some basic results by virtue of the above 3 axioms:
- $P(A^c) = 1 - P(A)$,
- $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ for any two events $A, B$,
- For any two events $A$ and $B$, $P(A) = P(A \cap B) + P(A \cap B^c)$ (Caratheodory's criterion from GMT!),
- If $B \subset A$, then $A \cap B = B$. Therefore, $P(A) - P(B) = P(A \cap B^c)$ and $P(A) \geq P(B)$.
- **Bonferroni inequality** (also known as Boole's inequality or the union bound): $$P(A \cup B) \leq P(A) + P(B).$$
- **Inclusion-exclusion principle**: $$P(A_1 \cup \cdots \cup A_n) = \sum_{i = 1}^n P(A_i) - \sum_{i < j} P(A_i \cap A_j) + \sum_{i < j < k} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n - 1} P(A_1 \cap A_2 \cap \cdots \cap A_n).$$
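
These identities can be checked exactly on a small finite sample space. The sketch below uses the 36 equally likely outcomes of two fair dice and Python's `fractions.Fraction` for exact arithmetic. The events `A`, `B`, `C` and the helper `prob` are chosen for this illustration only and are not taken from the text above.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair dice: 36 equally likely ordered pairs.
S = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a subset of S) under equally likely outcomes."""
    return Fraction(len(event), len(S))

# Illustrative events (assumptions made for this sketch).
A = {s for s in S if sum(s) % 2 == 0}   # sum is even
B = {s for s in S if s[0] == 6}         # first die shows 6
C = {s for s in S if s[0] == s[1]}      # doubles

# Complement rule and the two-event union formula.
assert prob(S - A) == 1 - prob(A)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
# P(A) = P(A and B) + P(A and not B)
assert prob(A) == prob(A & B) + prob(A - B)
# Union bound.
assert prob(A | B) <= prob(A) + prob(B)

# Inclusion-exclusion for three events.
lhs = prob(A | B | C)
rhs = (prob(A) + prob(B) + prob(C)
       - prob(A & B) - prob(A & C) - prob(B & C)
       + prob(A & B & C))
print(lhs, rhs)
```

Here Python's set operators `|`, `&` and `-` play the roles of union, intersection and set difference.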

### Counting with Python

**Counting formula 1:** A **permutation** is an ordered arrangement of distinct items. The total number of permutations of $n$ distinct items is $$n(n - 1) \cdots (2)(1) = n!$$

In Python, we call the `factorial` function from the `math` module:

In [8]:
import math
n = 5
# Compute n!
result = math.factorial(n)
print(result)

120


We can also generate permutations of a given set using the `permutations` function from the `itertools` module. Say I want to generate all ordered arrangements of a list of 3 flavors of vitaminwater®:

In [9]:
import itertools
flavors = ['Orange', 'Focus Kiwi Strawberry', 'Refresh Tropical Mango'] # Refresh tropical mango is probably my favorite
# Since there are 3 flavors, we should expect to see a list of 3! = 6 permutations.
permutations = list(itertools.permutations(flavors))
for perm in permutations:
    print(perm)


('Orange', 'Focus Kiwi Strawberry', 'Refresh Tropical Mango')
('Orange', 'Refresh Tropical Mango', 'Focus Kiwi Strawberry')
('Focus Kiwi Strawberry', 'Orange', 'Refresh Tropical Mango')
('Focus Kiwi Strawberry', 'Refresh Tropical Mango', 'Orange')
('Refresh Tropical Mango', 'Orange', 'Focus Kiwi Strawberry')
('Refresh Tropical Mango', 'Focus Kiwi Strawberry', 'Orange')


**Counting formula 2**: The number of permutations of $r$ items out of $n$ distinct items is $$n(n - 1)(n - 2) \cdots (n - r + 1) = \frac{n(n - 1) \cdots (n - r + 1)(n - r) \cdots 1}{(n - r)(n - r - 1) \cdots 1} = \frac{n!}{(n - r)!}.$$

There is also the "pick" notation $_{n}P_r = \frac{n!}{(n - r)!}$. This quantity is computed in Python with `math.perm` from the `math` module (available in Python 3.8 and later):

In [10]:
# Number of distinct items
n = 3
# Number of items being picked
r = 2

nPr = math.perm(n, r)
print(nPr)

6


We can again use the `itertools` module to generate the different $_nP_r$ permutations of $r$ items out of $n$ distinct items. Back to our vitaminwater® example above:

In [11]:
# Let's generate all permutations of 2 flavors out of 3 flavors of vitamin water, of which there are 3 P 2 = 6:
perms = itertools.permutations(flavors, 2)
for perm in perms:
    print(perm)

('Orange', 'Focus Kiwi Strawberry')
('Orange', 'Refresh Tropical Mango')
('Focus Kiwi Strawberry', 'Orange')
('Focus Kiwi Strawberry', 'Refresh Tropical Mango')
('Refresh Tropical Mango', 'Orange')
('Refresh Tropical Mango', 'Focus Kiwi Strawberry')


**Counting formula 3**: The number of *unordered* arrangements of $r$ items out of $n$ (number of **combinations**) is $$\frac{n!}{r!(n - r)!} = \binom{n}{r}.$$ 

In Python, we compute this using `math.comb()` after importing the math module:

In [12]:
# Number of distinct items
n = 3
# Number of items being picked
r = 2

nCr = math.comb(n, r)
print(nCr)

3


Using `itertools` to generate the different $_nC_r$ combinations of $r$ items out of $n$ distinct items, we have:

In [13]:
combs = itertools.combinations(flavors, 2)
for comb in combs:
    print(comb)

('Orange', 'Focus Kiwi Strawberry')
('Orange', 'Refresh Tropical Mango')
('Focus Kiwi Strawberry', 'Refresh Tropical Mango')
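
Counting formulas 2 and 3 are linked: each unordered selection of $r$ items can be ordered in $r!$ ways, so $_nP_r = \binom{n}{r} \, r!$. A short sketch that checks this, and checks the enumerated counts against the closed-form ones, on the same list of flavors (the names `num_perms` and `num_combs` are illustrative):

```python
import itertools
import math

flavors = ['Orange', 'Focus Kiwi Strawberry', 'Refresh Tropical Mango']
n, r = len(flavors), 2

# Count the arrangements by brute-force enumeration.
num_perms = len(list(itertools.permutations(flavors, r)))
num_combs = len(list(itertools.combinations(flavors, r)))

# Enumerated counts agree with the closed-form counts.
assert num_perms == math.perm(n, r) == math.factorial(n) // math.factorial(n - r)
assert num_combs == math.comb(n, r)
# Each unordered selection corresponds to r! ordered ones.
assert num_perms == num_combs * math.factorial(r)
print(num_perms, num_combs)
```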


**Counting formula 4**: This is the **multinomial coefficient**, which is computed when we need to count the number of ways of classifying $n$ objects into $k$ groups ($k \geq 2$) so that there are $r_i$ in group $i$ ($1 \leq i \leq k$), $r_1 + r_2 + \cdots + r_k = n$. It is given by $$\binom{n}{r_1, r_2, \dots, r_k} := \frac{n!}{r_1! r_2! \cdots r_k!}.$$

We can either code our own multinomial coefficient calculator, or we can use the `scipy` module. Unfortunately, there's some version issue right now with `scipy` (of course there is) and I'll have to come back to fix it later.
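
In the meantime, here is a minimal hand-written calculator that applies the formula above directly using `math.factorial`. The function name `multinomial` and its call signature are choices made for this sketch, not a library API.

```python
import math

def multinomial(*group_sizes):
    """Multinomial coefficient n! / (r_1! r_2! ... r_k!), where n is the sum of the group sizes."""
    if any(r < 0 for r in group_sizes):
        raise ValueError('group sizes must be non-negative')
    n = sum(group_sizes)
    result = math.factorial(n)
    # Divide out each r_i!; every intermediate quotient is an integer.
    for r in group_sizes:
        result //= math.factorial(r)
    return result

# Example: classify 10 objects into groups of sizes 5, 3 and 2.
print(multinomial(5, 3, 2))
```

With $k = 2$ groups this reduces to the binomial coefficient, so `multinomial(2, 1)` agrees with `math.comb(3, 2)`.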

## 2.2 Conditional Probability and Independence

The **conditional probability** of an event $A$ given an event $B$ with $P(B) > 0$ is defined as $$P(A | B) = \frac{P(A \cap B)}{P(B)}.$$

We can check this formula against a simulation. Consider tossing two fair dice, and let $$A = \{\text{Sum of the numbers on the dice is 4 or 8}\}, \qquad B = \{\text{Sum of the numbers on the dice is even}\}.$$ Our sample space $S$ has 36 elements, and $$A = \{(1, 3), (2, 2), (3, 1), (2, 6), (3, 5), (4, 4), (5, 3), (6, 2)\}$$ while $$B = \{(1, 1), (1, 3), (1, 5), (3, 1), (3, 3), (3, 5), (5, 1), (5, 3), (5, 5), (2, 2), (2, 4), (2, 6), (4, 2), (4, 4), (4, 6), (6, 2), (6, 4), (6, 6)\}.$$ Since $A \subset B$, we have $A \cap B = A$, and with $|A| = 8$ and $|B| = 18$, $$P(A | B) = \frac{P(A \cap B)}{P(B)} = \frac{8/36}{18/36} = \frac{4}{9} \approx 0.44.$$
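
The counts $|A| = 8$ and $|B| = 18$ and the value $4/9$ can be double-checked by enumerating all 36 outcomes. The following is a small sketch using `itertools.product` and exact fractions. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # all 36 outcomes of two dice
A = [s for s in S if sum(s) in (4, 8)]     # sum is 4 or 8
B = [s for s in S if sum(s) % 2 == 0]      # sum is even
A_and_B = [s for s in A if s in B]         # outcomes in both A and B

p_B = Fraction(len(B), len(S))
p_A_and_B = Fraction(len(A_and_B), len(S))
p_A_given_B = p_A_and_B / p_B
print(len(A), len(B), p_A_given_B)
```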

Let's generate a random sample of two-dice tosses using Python. From this sample, we will compute the (experimental) conditional probability that a toss belongs to $A$, given that it belongs to $B$.

In [132]:
import numpy as np
n = 10000 # Number of tosses
tosses = [tuple(np.random.randint(1, 7, size=2)) for _ in range(n)]

# To compute the experimental probability of B, we see what proportion of tosses sum to an even number:
n_B = 0
for toss in tosses:
    if sum(toss)%2 == 0:
        n_B += 1
print(f'Experimental probability of B: {n_B/n}')

# Now, compute the experimental probability of A and B:
n_AaB = 0

for toss in tosses:
    if sum(toss)%2 == 0 and (sum(toss) == 4 or sum(toss) == 8):
        n_AaB += 1

print(f'Experimental probability of A and B: {n_AaB/n}')

# Conditional probability of A given B

print(f'Conditional (experimental) probability of A given B = {n_AaB/n_B}')

Experimental probability of B: 0.5098
Experimental probability of A and B: 0.2267
Conditional (experimental) probability of A given B = 0.4446841898783837


For large $n$, the experimental probabilities closely match the theoretical values computed above ($P(B) = 1/2$, $P(A \cap B) = 8/36 \approx 0.22$, and $P(A | B) = 4/9 \approx 0.44$). We can also compute the conditional probability another way, which corresponds to computing the left-hand side of the conditional probability formula directly. To do this, we filter `tosses` for those tosses that lie in $B$; this becomes our new sample space, from which we compute the probability of $A$.

In [142]:
B = []
# Generate B based off all samples of two-dice tosses.
for toss in tosses:
    if sum(toss)%2 == 0:
        B.append(toss)
# Now, we compute P(A) "relative" to the sample space B.

n_A = 0
for toss in B:
    if sum(toss) == 4 or sum(toss) == 8:
        n_A += 1

print(f'Conditional (experimental) probability of A given B, computed directly: {n_A/len(B)}')

Conditional (experimental) probability of A given B, computed directly: 0.4446841898783837


Either way, we get the same answer. The first way requires more code, but it only keeps running counts and never stores the filtered list of tosses, which makes it the more practical of the two for large samples.
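
Both computations can also be written with NumPy boolean masks instead of explicit loops. The sketch below draws a fresh seeded sample rather than reusing the `tosses` list above. The generator `rng` and the mask names `in_A` and `in_B` are assumptions made for this illustration.

```python
import numpy as np

# Assumption: a seeded Generator is used only to make this sketch repeatable.
rng = np.random.default_rng(seed=2)

n = 10_000
# Each row is one toss of two dice; integers(1, 7) draws from {1, ..., 6}.
tosses = rng.integers(1, 7, size=(n, 2))
sums = tosses.sum(axis=1)

in_B = sums % 2 == 0                # sum is even
in_A = (sums == 4) | (sums == 8)    # sum is 4 or 8

# First way: ratio of the two experimental probabilities.
ratio = (in_A & in_B).mean() / in_B.mean()
# Second way: restrict to tosses in B, then take the proportion lying in A.
direct = in_A[in_B].mean()
print(ratio, direct)
```

The two values agree up to floating-point rounding, and both should be close to $4/9$.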