# Probability Axioms Lab

## Introduction

Now that you know what sets are, we can work with two sets that are of key importance in probability: the sample space and the event space. These two concepts are the foundation for calculating probabilities when each outcome in the sample space *has the same probability of occurring*. Typical examples are rolling a die (if the die is "fair", the chance of throwing each number from 1 to 6 is 1/6) and flipping a coin (1/2 for heads and 1/2 for tails). You'll get a better sense of how all of this works in this lab.

## Objectives

In this lab, you'll:
- learn how defining an event space and a sample space can help you calculate the probability of a certain event.
- learn how to create a function to test some of the probability axioms.

## Exercise 1

#### a. Let's throw a die once: the formula of Laplace

First, create a set `roll_dice` that holds the sample space.

In [1]:
roll_dice = set(range(1,7))
roll_dice

{1, 2, 3, 4, 5, 6}

Now, let's assume that the event is defined as "throwing a number higher than 4". This means that we consider the outcome "successful" if a 5 or a 6 is thrown. Create a set that holds these values.

In [2]:
event = {5,6}

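Now use the formula $P(E) = \dfrac{\# E}{\#\Omega}$ to calculate the probability, where $\# E$ is the number of outcomes in the event and $\#\Omega$ is the number of outcomes in the sample space.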

In [4]:
prob_5_6 = len(event)/len(roll_dice)
prob_5_6  # 0.3333333333333333

0.3333333333333333

Using this formula, it should be clear that the answer is 1/3 or 0.3333....  
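
The same calculation can be wrapped in a small helper. The sketch below is one possible way to do it: the function name `laplace_probability` is an illustrative choice and not part of the lab, and it relies on `fractions.Fraction` from the standard library so that the result stays an exact 1/3 rather than a rounded float. It only applies when all outcomes are equally likely.

```python
from fractions import Fraction


def laplace_probability(event, sample_space):
    """Return #event / #sample_space as an exact fraction (hypothetical helper).

    Assumes every outcome in sample_space is equally likely.
    """
    event = set(event)
    sample_space = set(sample_space)
    if not sample_space:
        raise ValueError("sample space must not be empty")
    if not event <= sample_space:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))


roll_dice = set(range(1, 7))
# Build the event "higher than 4" from the sample space itself
event = {outcome for outcome in roll_dice if outcome > 4}

prob = laplace_probability(event, roll_dice)
print(prob)         # 1/3
print(float(prob))  # 0.3333333333333333
```

Building the event with a set comprehension keeps it a subset of the sample space by construction, which is what the formula requires.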

#### b. Now, let's simulate rolling dice to see how the law of relative frequency works.

As mentioned in the lecture, the law of relative frequency can be used to approximate certain probabilities through repeated experiments. But how does this work exactly? You're about to find out!

$$P(E) = \lim_{n\rightarrow\infty} \dfrac{S{(n)}}{n}$$

As the formula shows, the law states that when you repeat an experiment $n$ times, where $n$ is very large, and divide the number of "good" outcomes $S(n)$ (the number of times event $E$ occurs) by the number of repetitions $n$, you approach the probability of event $E$. The estimate of $P(E)$ generally becomes more accurate as $n$ grows.

Let's see how this works. First, let's randomly generate values between 1 and 6. You can use `numpy` (imported as `np`) to generate random integers between 1 and 6 by setting the correct arguments.

In [19]:
import numpy as np
# Random integer from 1 to 6 (the upper bound 7 is exclusive); the value changes on each rerun
np.random.randint(1,7)


6

Now, let's repeat this experiment 10 times, then 1000 times, then 1 million times, then 100 million times.
You can do this by specifying the argument `size` within the numpy function used above. Store the values in the pre-defined variables below.

In [55]:
np.random.seed(12345) # fix the seed so the results are reproducible

dice_10 = np.random.randint(1,7,size=10)
dice_1k = np.random.randint(1,7,size=1000)
dice_1m = np.random.randint(1,7,size=1000000)
dice_100m = np.random.randint(1,7,size=100000000)
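
A practical note on the largest array: with the default integer type, 100 million rolls can take several hundred megabytes of memory. The sketch below, using a smaller `n` so it runs quickly, shows one possible way to reduce that by passing the `dtype` argument of `np.random.randint`. The variable names are illustrative, and changing the `dtype` may change which random values are drawn for a given seed, so the outputs shown later in this lab assume the default.

```python
import numpy as np

n = 1_000_000  # kept small here; the lab goes up to 100 million

default_rolls = np.random.randint(1, 7, size=n)                 # platform default integer
compact_rolls = np.random.randint(1, 7, size=n, dtype=np.int8)  # 1 byte per roll

print(default_rolls.dtype, default_rolls.nbytes)
print(compact_rolls.dtype, compact_rolls.nbytes)
```

Values from 1 to 6 fit comfortably in an 8-bit integer, so nothing is lost by storing them this way.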

Next, let's count how often the event occurs. Remember that the event here is defined as throwing a 5 or a 6. Store the counts in the variables below.

In [56]:
event_10 = np.sum(dice_10>4)
event_1k = np.sum(dice_1k>4)
event_1m = np.sum(dice_1m>4)
event_100m = np.sum(dice_100m>4)

Now divide the number of events for each $n$ by the respective value of $n$. What do you see?

In [57]:
prob_10 = event_10/10
prob_1k = event_1k/1000
prob_1m = event_1m/1000000
prob_100m = event_100m/100000000
prob_10, prob_1k, prob_1m, prob_100m  # 0.5 0.331 0.333657 0.33329752

(0.5, 0.331, 0.333657, 0.33329752)

You see that the relative frequency gets closer to the theoretical probability 0.3333333... for higher values of $n$.
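
To watch the relative frequency $S(n)/n$ evolve within a single run, you can compute it for every $n$ at once with a cumulative sum. This is a minimal sketch, not the lab's own solution: it uses NumPy's newer `default_rng` generator instead of `np.random.randint`, so its numbers will not match the cells above, and the names `rolls`, `hits` and `running_freq` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(12345)  # seeded generator so the run is reproducible

n = 100_000
rolls = rng.integers(1, 7, size=n)  # upper bound 7 is exclusive, so values are 1..6
hits = rolls > 4                    # True where a 5 or a 6 was thrown

# S(k)/k for every k = 1..n: cumulative hits divided by the number of throws so far
running_freq = np.cumsum(hits) / np.arange(1, n + 1)

for k in (10, 1_000, 100_000):
    print(k, running_freq[k - 1])
```

The early values can be far from 1/3, while the later ones should settle close to it.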

## Exercise 2

You're working at the United Nations, and want to get a better sense of the world population. 

You come across some numbers and find the list of probabilities that a randomly chosen person is an inhabitant of each of the seven continents (rounded to 3 decimal places):

- P(Africa) = 0.161
- P(Antarctica) = 0.000
- P(Asia) = 0.598
- P(Europe) = 0.10
- P(North-America) = 0.078
- P(Australia) = 0.006
- P(South-America) = 0.057

Store these values using the variable names below:

In [87]:
P_afr = 0.161
P_ant = 0.000
P_as = 0.598
P_eur = 0.10
P_na = 0.078
P_aus = 0.006
P_sa = 0.057

Now create a numpy array named `continents` that holds the probability of each outcome in the sample space.

In [88]:
continents = np.array([P_afr, P_ant, P_as, P_eur, P_na, P_aus, P_sa])
print(continents)

[0.161 0.    0.598 0.1   0.078 0.006 0.057]


We want to make sure that the three probability axioms are fulfilled, because they assure us that $(\Omega,E,P)$ is a **probability space**. That is the case:

- if we have a sample space $S$ (or $\Omega$),
- if we have an event space $E$ and a probability measure $P$,
- **and** the three probability axioms are fulfilled.

The third axiom is harder to check in code: you have to deduce from the context whether the individual events are mutually exclusive, in which case the probability of their union is the sum of their probabilities. It is fairly straightforward, however, that people cannot be inhabitants of two continents at the same time, so for now, we will assume that we're good for axiom three.

However, we can use the numpy array `continents` to verify whether axioms 1 and 2 are fulfilled, that is, whether every probability lies between 0 and 1 and whether the probabilities sum to 1. Create a function `check_axioms` that returns the message "We're good!" if both axiom 1 and 2 are fulfilled, and "Not quite!" if that's not the case.
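
To make the third axiom concrete: for mutually exclusive events, the probability of the union equals the sum of the individual probabilities. The sketch below illustrates this with the fair die from Exercise 1. The helper `prob` and the event names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # fair six-sided die


def prob(event):
    """Laplace probability of an event on the fair die (illustrative helper)."""
    return Fraction(len(event & sample_space), len(sample_space))


low = {1, 2}      # "throw a 1 or a 2"
high = {5, 6}     # "throw a 5 or a 6"
even = {2, 4, 6}  # "throw an even number"

# low and high share no outcomes, so their probabilities simply add up
assert low.isdisjoint(high)
print(prob(low | high) == prob(low) + prob(high))    # True

# high and even overlap in {6}, so plain addition counts that outcome twice
print(prob(high | even) == prob(high) + prob(even))  # False
print(prob(high | even) == prob(high) + prob(even) - prob(high & even))  # True
```

The continents behave like `low` and `high`: no person is counted on two continents, so the probabilities can be added.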

In [89]:
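def check_axioms(sample_space):
    # Axiom 1: every probability lies between 0 and 1
    in_range = (sample_space >= 0).all() and (sample_space <= 1).all()
    # Axiom 2: the probabilities sum to 1 (np.isclose tolerates floating-point rounding)
    sums_to_one = np.isclose(np.sum(sample_space), 1)
    if in_range and sums_to_one:
        return "We're good!"
    else:
        return 'Not quite!'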
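
One caveat when checking that probabilities sum to 1: floating-point addition is not exact, so values that add up to 1 on paper can miss it by a tiny rounding error when compared with `==`. The sketch below illustrates this with ten values of 0.1 and shows a tolerance-based comparison using `math.isclose` and `np.isclose`. The variable names are illustrative.

```python
import math

import numpy as np

# Ten probabilities of 0.1 add up to exactly 1 on paper
probs = [0.1] * 10

total = 0.0
for p in probs:
    total += p  # plain left-to-right floating-point addition

print(total == 1.0)            # False: rounding error accumulates
print(math.isclose(total, 1))  # True: equal within a relative tolerance
print(np.isclose(total, 1))    # True: NumPy's tolerance-based comparison
```

Whether a particular set of probabilities hits 1 exactly depends on the values and on the order of summation, which is why a tolerance is the safer test.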

Now test your newly created function on `continents`:

In [90]:
check_axioms(continents)

"We're good!"

You want to make sure your test returns `"Not quite!"` for the following numpy arrays. Go ahead and test away!

In [91]:
test_1 = np.array([0.05, 0.2, 0.3, 1.01])
test_2 = np.array([0.05, 0.5, 0.6, -0.15])
test_3 = np.array([0.043, 0.05,.02,0.3,0.2])

In [92]:
check_axioms(test_1), check_axioms(test_2), check_axioms(test_3)

('Not quite!', 'Not quite!', 'Not quite!')

Great! We tested it, and it seems that `continents` satisfies axioms 1 and 2. Together with our assumption about axiom 3, that makes it a valid probability space.

## Exercise 3 (extra)

Harry Potter example: the probability of being assigned to a certain "house" (Gryffindor, etc.) is 0.25 for each student.

Each student is assigned to exactly one house, so the events are mutually exclusive, and the probability axioms can be tested here as well.
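
A minimal sketch of how this could look, assuming the four houses from the books and the 0.25 probability stated above. The helper `satisfies_axioms_1_and_2` is a hypothetical name, defined here so that the example is self-contained.

```python
import numpy as np

# Assumption from the exercise text: each house has probability 0.25
houses = {
    "Gryffindor": 0.25,
    "Hufflepuff": 0.25,
    "Ravenclaw": 0.25,
    "Slytherin": 0.25,
}


def satisfies_axioms_1_and_2(probabilities):
    """Hypothetical helper: True if all values lie in [0, 1] and sum to 1."""
    p = np.asarray(list(probabilities), dtype=float)
    in_range = bool(((p >= 0) & (p <= 1)).all())
    sums_to_one = bool(np.isclose(p.sum(), 1.0))
    return in_range and sums_to_one


print(satisfies_axioms_1_and_2(houses.values()))  # True

# A student belongs to exactly one house, so the events are mutually
# exclusive and axiom 3 lets us add their probabilities.
p_gryffindor_or_slytherin = houses["Gryffindor"] + houses["Slytherin"]
print(p_gryffindor_or_slytherin)  # 0.5
```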

# Sources

https://en.wikipedia.org/wiki/Probability_axioms

https://www.datacamp.com/community/tutorials/statistics-python-tutorial-probability-1
