Descriptive statistics describes data, as I showed in the previous chapter. We can understand the data at hand by checking its measures and observing its plots. In descriptive statistics, we only draw conclusions about the data as it presents itself. Inferential statistics allows you to make predictions, also known as inferences, from that data. With inferential statistics, you take data from samples and make generalizations about a population.

# Probability

Probability is the basis for statistical methods.

Just like measures of location and spread, probability is a quantifiable measure. It measures our degree of belief that a particular event will happen. In other words, probability is a language to quantify and describe uncertainty.

## Experiment

An experiment is any procedure that can be repeated indefinitely and has a well-defined set of possible outcomes, known as the sample space. For example, tossing a coin.

## Sample space $ S $

The sample space is the set that contains all the possible outcomes of an experiment. For example, for a coin toss, $ S = \{Heads, Tails\} $.

## Event

An event is a subset of the sample space $ S $: a collection of elementary outcomes of an experiment.

Of all the possible outcomes of an experiment, an event contains the ones that match a specific interest. It's a slice of the sample space.

### Examples

- **Experiment**: Die roll

**Sample space**: $ S = \{1,2,3,4,5,6 \} $

**Event**: $A = \text{an even score} = \{2, 4, 6 \} $ 

- **Experiment**: Die roll

**Sample space**: $ S = \{1,2,3,4,5,6 \} $

**Event**: $A = \text{score greater than 4} = \{5, 6 \} $ 

- **Experiment**: Coin toss

**Sample space**: $ S = \{Heads, Tails \} $

**Event**: $A = \text{Heads} = \{Heads\} $ 

- **Experiment**: Number of goals in a soccer match

**Sample space**: $ S = \{0, 1, 2, 3, 4, \dots \} $

**Event**: $A = \text{Less than 3 goals} = \{0, 1, 2\} $ 
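Events are subsets of the sample space, so Python's built-in `set` type is a natural way to represent them. The sketch below covers the die examples only. The variable names are illustrative.

```python
# Sample space for a single die roll
sample_space = {1, 2, 3, 4, 5, 6}

# Events are subsets of the sample space
even_score = {2, 4, 6}
greater_than_4 = {5, 6}

# The <= operator on sets tests "is a subset of"
print(even_score <= sample_space)      # True
print(greater_than_4 <= sample_space)  # True

# An event can also be built from a condition on each outcome
even_from_rule = {x for x in sample_space if x % 2 == 0}
print(even_from_rule == even_score)    # True
```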




## $ [0, 1] $ : the probability interval

By convention, probability is measured on the unit interval $ [0, 1] $:

- An impossible event has a probability of 0.

- A certain event has a probability of 1.

- If the probability of event A is greater than the probability of event B, then A is more likely to happen than B.

- Multiply a probability by 100 to express it as a percentage.

## The classical approach to probability

- Each outcome (member from the sample space) is equally likely to occur. 

- No bias towards a specific event.

For example, if the sample space has N members, the probability of any outcome in the sample space happening is 1 divided by N.

Let's say we're interested in the probability of event A happening:

# $$ P(A) = \frac{n}{N} $$

where:

- $ n $ is the number of outcomes that favor event A, i.e., the number of members of the subset of the sample space that makes up A.
- $ N $ is the number of all outcomes in a given sample space.

```python
# Experiment: die roll
sample_space = [1, 2, 3, 4, 5, 6]
# Event A: an even score
event = [2, 4, 6]
probability = len(event) / len(sample_space)

print(f"The probability of getting an even score when rolling a die is: {probability} or {probability * 100}%")
```

```
The probability of getting an even score when rolling a die is: 0.5 or 50.0%
```


```python
# Experiment: coin toss
sample_space = ["Heads", "Tails"]
# Event A: Heads
event = ["Heads"]
probability = len(event) / len(sample_space)

print(
    f"The probability of getting Heads when tossing a coin is: {probability} or {probability * 100}%"
)
```

```
The probability of getting Heads when tossing a coin is: 0.5 or 50.0%
```


```python
# Experiment: die roll
sample_space = [1, 2, 3, 4, 5, 6]
# Event A: score greater than 4
event = [5, 6]
probability = len(event) / len(sample_space)

# :.4f shows 4 decimal places; :.2% multiplies by 100 and appends the % sign
print(
    f"The probability of getting a score > 4 when rolling a die is: {probability:.4f} or {probability:.2%}"
)
```

```
The probability of getting a score > 4 when rolling a die is: 0.3333 or 33.33%
```
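The three cells above repeat the same computation, $ n / N $. Here's a minimal sketch of a reusable version. It assumes every outcome is equally likely, as the classical approach requires. `classical_probability` is a hypothetical helper name. `fractions.Fraction` keeps the results exact instead of rounding them.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """P(A) = n / N, assuming every outcome in sample_space is equally likely."""
    event, sample_space = set(event), set(sample_space)
    if not event <= sample_space:
        raise ValueError("an event must be a subset of the sample space")
    # Fraction keeps the result exact (e.g. 1/3) instead of a rounded float
    return Fraction(len(event), len(sample_space))


die = range(1, 7)
p_even = classical_probability({2, 4, 6}, die)
p_gt_4 = classical_probability({5, 6}, die)
p_heads = classical_probability({"Heads"}, {"Heads", "Tails"})

print(p_even, p_gt_4, p_heads)                        # 1/2 1/3 1/2
print(f"{float(p_gt_4):.4f} or {float(p_gt_4):.2%}")  # 0.3333 or 33.33%
```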


## Relative frequency

Another way to assign probabilities is the relative frequency approach, also known as the frequentist approach. Take an event, let's say event $ B $. On each trial there are only two options:

- event B does or does not occur.

You then repeat the experiment $ F $ times. For example, think of a coin toss experiment where event $B$ is getting heads. You toss the coin $ F $ times and count $ f $, the number of trials in which event $ B $ = heads occurred. The relative frequency $ f / F $ approximates the probability of $ B $, and the approximation tends to improve as $ F $ grows:


# $$ P(B) \approx \frac{f}{F} $$

💡 Randomness is a fact. We can't control it. We can repeat the same experiment and get different results under similar circumstances. That's why probability comes in handy. It's a way to describe that uncertainty.
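Here's a short simulation of the relative frequency approach. It models a fair coin with Python's `random` module and seeds the generator so the run is reproducible. `relative_frequency` is a hypothetical helper name. As $ F $ grows, $ f / F $ tends to settle near 0.5, but any single run still varies.

```python
import random

rng = random.Random(42)  # seeded so the run is reproducible


def relative_frequency(trials):
    """Toss a fair coin `trials` times; return f / F for the event B = heads."""
    f = sum(rng.choice(["Heads", "Tails"]) == "Heads" for _ in range(trials))
    return f / trials


for F in (10, 100, 1_000, 100_000):
    print(f"F = {F:>6}: f/F = {relative_frequency(F):.4f}")
```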



## Probability is a language to describe uncertainty

Inferential statistics draws conclusions from data, and that data is subject to randomness. This randomness can arise for different reasons, e.g., how the sample was drawn, observational errors, etc.

Take an experiment such as throwing a die. You can't know for certain which number will show up, even if the conditions are the same on every throw. We can't control the random effects in this experiment, and often these random factors are unknown to us. These unknown variables, which we can neither control nor measure, have a cumulative effect on the result. That's where probability comes in handy: it's a language to describe that uncertainty.

## Axioms of probability

An axiom is a statement or proposition which is regarded as being established, accepted, or self-evidently true.

- For any event $ A $, the probability of its occurrence is greater than or equal to 0. 💡 Also remember that it can't be greater than 1.

# $$ P(A) \geq 0 $$

- The probability of the sample space $ S $ is 1.

# $$ P(S) = 1 $$

💡 Since the sample space contains all possible outcomes of a given experiment, one of them must happen. The outcomes in the sample space are collectively exhaustive (and mutually exclusive), so their probabilities sum up to 1.

- If events are mutually exclusive, the probability of their union is the sum of their individual probabilities. Events are mutually exclusive when no two of them can happen at the same time.


# $$ P(\cup^n_{i=1} A_i) =  \sum^n_{i=1} P(A_i) $$
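As a sanity check, here's a sketch that builds a probability for a fair die and asserts the three axioms. It assumes classical probabilities of $ 1/6 $ per face and defines an event's probability as the sum of its outcomes' probabilities. `prob` is a hypothetical helper.

```python
from fractions import Fraction

# Probability of each outcome of a fair die (classical approach)
p = {outcome: Fraction(1, 6) for outcome in range(1, 7)}
sample_space = set(p)


def prob(event):
    """P(event) as the sum of the probabilities of its outcomes."""
    return sum((p[o] for o in event), Fraction(0))


# Axiom 1: every probability is non-negative
assert all(prob({o}) >= 0 for o in sample_space)

# Axiom 2: the probability of the sample space is 1
assert prob(sample_space) == 1

# Axiom 3: for mutually exclusive events, P(union) = sum of the probabilities
A1, A2, A3 = {1}, {2, 3}, {6}  # pairwise disjoint events
assert prob(A1 | A2 | A3) == prob(A1) + prob(A2) + prob(A3)
print(prob(A1 | A2 | A3))  # 2/3
```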




## Venn diagram

A Venn diagram represents the sample space and its events geometrically. For example, you throw two dice and sum the numbers on the upward faces.

- $ A$  = the sum is even

- $B$ = the sum is less than 6

- $C$ = the sum is greater than 5, but less than 11

$$ S = \{ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 \} $$

$$ A \cap B = \{ 2, 4 \} $$

$$ A \cap C = \{ 6, 8, 10 \} $$

$$ A \cap B \cap C = \emptyset $$

$$ (A \cup B \cup C)^c = \{ 11 \} $$

$$ A \cup B \cup C = \{ 2, 3, 4, 5, 6, 7, 8, 9, 10, 12 \} $$

$$ A|C =  \{ 6, 8, 10 \} $$

<p align="center">
  <img src="./imgs/venn_diagram.png" alt="Venn Diagram"/>
</p>
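Python sets make it easy to check the set operations above. The listed intersections correspond to $ C $ containing the sums 6 through 10 (greater than 5 and less than 11), and the sketch assumes that definition. `&`, `|` and `-` compute intersection, union and difference. The complement is taken relative to $ S $.

```python
# Possible sums of two dice
S = set(range(2, 13))

A = {s for s in S if s % 2 == 0}  # the sum is even
B = {s for s in S if s < 6}       # the sum is less than 6
C = {s for s in S if 5 < s < 11}  # assumption: C = {6, ..., 10}

print(sorted(A & B))            # [2, 4]
print(sorted(A & C))            # [6, 8, 10]
print(sorted(A & B & C))        # []
print(sorted(S - (A | B | C)))  # [11]
print(sorted(A | B | C))        # [2, 3, 4, 5, 6, 7, 8, 9, 10, 12]
```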



## The additive law

What's the probability that at least one of two events occurs? What's the probability of A or B occurring, or that both occur?

# $$ P(A \cup B) = P(A) + P(B) - P(A \cap B) $$

- 💡 We subtract the intersection of $A$ and $B$ because adding $P(A)$ and $P(B)$ counts that slice twice.

- 💡 The probability of an event not occurring is the probability of its complement $ A^c $, i.e., the probability of the sample space minus the probability of the event happening:

# $$ P(A^c) = 1 - P(A) $$
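Here's a sketch that checks the additive law and the complement rule by counting. It uses two events from the Venn diagram example: $ A $ (the sum is even) and $ B $ (the sum is less than 6). It counts the 36 equally likely ordered outcomes of two dice rather than the 11 sums, because the sums aren't equally likely. `prob` and the predicate names are hypothetical helpers.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes
outcomes = list(product(range(1, 7), repeat=2))


def prob(condition):
    """Classical probability of the event defined by `condition`."""
    favourable = sum(1 for o in outcomes if condition(o))
    return Fraction(favourable, len(outcomes))


def is_even(o):
    return sum(o) % 2 == 0


def is_less_than_6(o):
    return sum(o) < 6


# Additive law: P(A or B) = P(A) + P(B) - P(A and B)
p_union = prob(lambda o: is_even(o) or is_less_than_6(o))
p_formula = (
    prob(is_even) + prob(is_less_than_6)
    - prob(lambda o: is_even(o) and is_less_than_6(o))
)
print(p_union, p_formula)  # 2/3 2/3

# Complement rule: P(A^c) = 1 - P(A)
print(prob(lambda o: not is_even(o)) == 1 - prob(is_even))  # True
```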


#### Mutually exclusive events

Two events are mutually exclusive when there is no way for both to happen at the same time. Then $ P(A \cap B) = 0 $, and the additive law becomes:

# $$ P(A \cup B) = P(A) + P(B) - 0 $$


<p align="center">
  <img src="./imgs/mutually_exclusive.png" alt="Mutually exclusive events"/>
</p>

#### Collectively exhaustive events

A set of events is collectively exhaustive when, together, they cover the whole sample space of an experiment: every time the experiment is run, at least one of the events must happen.

For example, when rolling a die we might have the events:

### $$ A = \text{an even score} $$
### $$ B = \text{an odd score} $$

Events A and B are collectively exhaustive because, whatever the outcome of the experiment, one of these events must happen.
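A quick set-based check of both properties for the die example, using Python's built-in sets:

```python
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # an even score
B = {1, 3, 5}  # an odd score

# Mutually exclusive: no outcome belongs to both events
mutually_exclusive = not (A & B)
# Collectively exhaustive: together the events cover the whole sample space
collectively_exhaustive = (A | B) == sample_space

print(mutually_exclusive, collectively_exhaustive)  # True True
```

Because these two events are both mutually exclusive and collectively exhaustive, $ P(A) + P(B) = P(S) = 1 $.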


## The multiplicative law

What's the probability that both of two events occur? When the events are independent, i.e., they do not influence one another, it's the product of their probabilities:

# $$ P(A \cap B) = P(A) P(B) $$
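Here's a sketch that checks the multiplicative law by enumerating two dice. It uses "the first die shows a six" and "the second die is even", which are independent. A counterexample follows: "the sum is 12" depends on the first die, so the product rule gives the wrong answer. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first, second) outcomes
outcomes = list(product(range(1, 7), repeat=2))


def prob(condition):
    return Fraction(sum(1 for o in outcomes if condition(o)), len(outcomes))


def first_is_six(o):
    return o[0] == 6


def second_is_even(o):
    return o[1] % 2 == 0


def sum_is_12(o):
    return sum(o) == 12


# Independent events: the product rule holds
p_both = prob(lambda o: first_is_six(o) and second_is_even(o))
print(p_both, prob(first_is_six) * prob(second_is_even))  # 1/12 1/12

# Dependent events: the product rule fails
print(prob(lambda o: first_is_six(o) and sum_is_12(o)))  # 1/36
print(prob(first_is_six) * prob(sum_is_12))              # 1/216
```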


## 1. Probability problem

Two fair dice are rolled and the numbers on the upward faces are summed. What is the probability that the sum is:

- $ A $: odd
- $ B $: less than 7
- $ C $: exactly 10
- $ D $: exactly 3

<p align="center">
  <img src="./imgs/prob_problem_1.png" alt="Sample space"/>
</p>

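# $$ |S| = 36 $$

# $$ P(A) = \frac{18}{36} = \frac{1}{2} = 0.5 $$

# $$ P(B) = \frac{15}{36} \approx 0.4167 $$

# $$ P(C) = \frac{3}{36} = \frac{1}{12} \approx 0.0833 $$

# $$ P(D) = \frac{2}{36} = \frac{1}{18} \approx 0.0556 $$

A sum of 10 comes from three outcomes, (4, 6), (5, 5) and (6, 4). A sum of 3 comes from two, (1, 2) and (2, 1).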
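Counting the 36 outcomes directly is a simple way to double-check these values. Here's a sketch using `itertools.product` and exact fractions. `prob` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

# Sum of the upward faces for each of the 36 equally likely outcomes
sums = [a + b for a, b in product(range(1, 7), repeat=2)]


def prob(condition):
    return Fraction(sum(1 for s in sums if condition(s)), len(sums))


answers = {
    "A: odd": prob(lambda s: s % 2 == 1),
    "B: less than 7": prob(lambda s: s < 7),
    "C: exactly 10": prob(lambda s: s == 10),
    "D: exactly 3": prob(lambda s: s == 3),
}
for name, p in answers.items():
    print(f"{name}: {p} = {float(p):.4f}")
```

It prints 1/2, 5/12, 1/12 and 1/18. `Fraction` reduces 15/36 to 5/12 automatically.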


## 2. Probability problem

Let's imagine a game: 4 fair coins and one fair die are thrown together. You win if the number of tails obtained is greater than or equal to the score obtained from the die roll. What is the probability that you will win?

#### Sample space

- Possible outcomes from a coin toss: Heads or tails = 2
- Number of coins: 4
- Possible outcomes of a die roll: 6

# $$ |S| = 2 \times 2 \times 2 \times 2 \times 6 = 2^4 \times 6 = 96 $$

<p align="center">
  <img src="./imgs/prob_problem_2.png" alt="Sample space"/>
</p>

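# $$ |S| = 96 $$

Let $ A $ be the event that you win. Four coins can show at most 4 tails, so die scores of 5 and 6 always lose. For die scores 1, 2, 3 and 4, there are 15, 11, 5 and 1 coin outcomes (out of 16) with enough tails:

# $$ |A| = 15 + 11 + 5 + 1 = 32 $$

# $$ P(A) = \frac{32}{96} = \frac{1}{3} \approx 0.3333 $$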
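Here's a sketch that enumerates all 96 outcomes with `itertools.product` and counts the winning ones. Coins are written as the characters "H" and "T". You win when the number of tails is greater than or equal to the die score.

```python
from fractions import Fraction
from itertools import product

# Every (coins, die) combination: 2**4 coin outcomes x 6 die scores = 96
outcomes = list(product(product("HT", repeat=4), range(1, 7)))

# You win when the number of tails is >= the die score
wins = [(coins, die) for coins, die in outcomes if coins.count("T") >= die]

print(len(outcomes), len(wins))            # 96 32
print(Fraction(len(wins), len(outcomes)))  # 1/3
```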



## 3. Probability problem

- Let $ A $ be the event that I'll wake up late tomorrow: $ P(A) = 0.4 $.

- Let $ B $ be the event that my phone will run out of battery during the night: $ P(B) = 0.05 $.

- The probability that I'll wake up late and my phone will run out of battery is $ P(A \cap B) = 0.01 $.

What is the probability that I'll wake up late or that my phone will run out of battery?

# $$ P(A \cup B) = P(A) + P(B) - P(A \cap B) $$


# $$ P(A \cup B) = 0.4 + 0.05 - 0.01 = 0.44 $$
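The same arithmetic in Python. Building `Fraction` values from decimal strings avoids floating-point rounding noise. The last line also applies the complement rule to get the probability that neither event happens.

```python
from fractions import Fraction

p_late = Fraction("0.4")      # P(A): I wake up late
p_battery = Fraction("0.05")  # P(B): phone runs out of battery
p_both = Fraction("0.01")     # P(A and B)

# Additive law
p_either = p_late + p_battery - p_both
print(p_either, float(p_either))  # 11/25 0.44

# Complement: probability that neither event happens
print(1 - p_either)  # 14/25
```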


## 4. Probability problem

You roll two dice. What's the probability of:

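- Getting 2 sixes?

Let $ A $ be the event that the first die shows a six and $ B $ the event that the second die shows a six. **The events are independent!** One die doesn't influence the other, so the multiplicative law applies:

# $$ P(A \cap B) = P(A)P(B) $$

# $$ P(A \cap B) = \frac{1}{6} \times \frac{1}{6} = \frac{1}{36} \approx 0.0278 $$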


- Getting at least one six?

# $$ P(A \cup B) = P(A) + P(B) - P(A \cap B) $$

# $$ P(A \cup B) = \frac{1}{6} + \frac{1}{6} - \frac{1}{36} = \frac{11}{36} \approx 0.3056 $$
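Both answers can be checked by enumerating the 36 outcomes. The last line reaches the second answer through the complement rule instead: "at least one six" is the complement of "no six on either die".

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))
N = len(outcomes)  # 36

two_sixes = Fraction(sum(1 for a, b in outcomes if a == 6 and b == 6), N)
at_least_one_six = Fraction(sum(1 for a, b in outcomes if a == 6 or b == 6), N)

print(two_sixes)         # 1/36
print(at_least_one_six)  # 11/36

# Complement rule: 1 - P(no six on either die)
print(1 - Fraction(5, 6) ** 2)  # 11/36
```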


## More resources

I created a deck summarising what I cover in this Notebook:

https://pitch.com/public/86519e11-42a1-4f52-bd1e-037a93b88001

<iframe src="https://pitch.com/embed/86519e11-42a1-4f52-bd1e-037a93b88001" allow="fullscreen" allowfullscreen="" width="560" height="368" style="border:0"></iframe>