**Table of contents**<a id='toc0_'></a>    
- [Probability](#toc1_)    
  - [Key concepts](#toc1_1_)    
    - [Sample space](#toc1_1_1_)    
    - [Events](#toc1_1_2_)    
    - [Let's get some numbers!](#toc1_1_3_)    
- [Normal distribution](#toc2_)    
  - [Height: A bimodal normal distribution](#toc2_1_)    
- [**Key Takeaway: When we look at data distribution (i.e. the sample space), we are basically looking at the probability of different events happening**](#toc3_)    
- [Extra: Resources](#toc4_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc1_'></a>[Probability](#toc0_)

> Probability is the branch of mathematics concerning numerical descriptions of **how likely an event is to occur**, or **how likely it is that a proposition is true**. The probability of an event is a number between 0 and 1, where, roughly speaking, 0 indicates impossibility of the event and 1 indicates certainty. ([Wikipedia](https://en.wikipedia.org/wiki/Probability))

## <a id='toc1_1_'></a>[Key concepts](#toc0_)

> **Sample space** = the set of all possible outcomes or results of an experiment ([Wikipedia](https://en.wikipedia.org/wiki/Sample_space))

**Event** - a subset of the sample space, i.e. one/more possible outcomes of an experiment

### <a id='toc1_1_1_'></a>[Sample space](#toc0_)

What is the sample space of an experiment involving coin tosses?

![Coin](https://imgs.search.brave.com/M8qJHJ7NOlyVk1yfwVkmXQVPBaiCVrkeRnTQOIwVHvo/rs:fit:860:0:0/g:ce/aHR0cHM6Ly93d3cu/dGhlc3BydWNlY3Jh/ZnRzLmNvbS90aG1i/L0ltV25lVkgyY2hE/T2JUc3FtLXJNRlhz/aUY5OD0vMTUwMHgw/L2ZpbHRlcnM6bm9f/dXBzY2FsZSgpOm1h/eF9ieXRlcygxNTAw/MDApOnN0cmlwX2lj/YygpL1VTMDA1MC0y/MDE0LVctS2VubmVk/eS1IYWxmLURvbGxh/ci1SZXZlcnNlLVBy/b29mLVNpbHZlci01/NmExNzhkNjNkZjc4/Y2Y3NzI2YWZkMzcu/anBn)

What about dice throwing?

![Dice](https://imgs.search.brave.com/3bZx1ow2ARu-HK-gaYQg2Rs65e3mlyCDCTwNHw48o64/rs:fit:860:0:0/g:ce/aHR0cHM6Ly90My5m/dGNkbi5uZXQvanBn/LzAzLzk4LzEyLzU2/LzM2MF9GXzM5ODEy/NTY5Ml9IUk1OaVRX/OURGWElrRW9ob3pG/OEJCWGVwYXZYTGpx/VC5qcGc)

What if we had 2 dice instead?

![Dice](https://imgs.search.brave.com/3bZx1ow2ARu-HK-gaYQg2Rs65e3mlyCDCTwNHw48o64/rs:fit:860:0:0/g:ce/aHR0cHM6Ly90My5m/dGNkbi5uZXQvanBn/LzAzLzk4LzEyLzU2/LzM2MF9GXzM5ODEy/NTY5Ml9IUk1OaVRX/OURGWElrRW9ob3pG/OEJCWGVwYXZYTGpx/VC5qcGc) ![Dice](https://imgs.search.brave.com/3bZx1ow2ARu-HK-gaYQg2Rs65e3mlyCDCTwNHw48o64/rs:fit:860:0:0/g:ce/aHR0cHM6Ly90My5m/dGNkbi5uZXQvanBn/LzAzLzk4LzEyLzU2/LzM2MF9GXzM5ODEy/NTY5Ml9IUk1OaVRX/OURGWElrRW9ob3pG/OEJCWGVwYXZYTGpx/VC5qcGc)

### <a id='toc1_1_2_'></a>[Events](#toc0_)

What are the events that can occur when tossing a coin?

What about throwing a die?

What about throwing 2 dice?
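
As a rough sketch of the two definitions above, sample spaces and events can be written as Python sets, which makes "an event is a subset of the sample space" something we can check directly. The variable names and the two example events (`even_roll`, `doubles`) are chosen here for illustration only.

```python
from itertools import product

# Sample spaces: every possible outcome of each experiment
coin_space = {'heads', 'tails'}
die_space = {1, 2, 3, 4, 5, 6}
two_dice_space = set(product(die_space, repeat=2))

# Events: subsets of the sample space (illustrative examples)
even_roll = {n for n in die_space if n % 2 == 0}
doubles = {(a, b) for (a, b) in two_dice_space if a == b}

# Size of each sample space
print(len(coin_space), len(die_space), len(two_dice_space))

# `<=` on sets tests "is a subset of"
print(even_roll <= die_space, doubles <= two_dice_space)
```

Any other subset works the same way, including the empty set (an impossible event) and the full sample space (a certain event).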

### <a id='toc1_1_3_'></a>[Let's get some numbers!](#toc0_)

**Event:** Landing heads when tossing a coin once?  
**Probability:** ?

In [None]:
possibilities = ['heads', 'tails']
event = 'heads'

**Assumption:** There is no bias, i.e. getting heads or tails is just as likely.  
**Chance:** 1 in 2

In [None]:
1/2 * 100

**Event:** Landing an even number when throwing a die?  
**Probability:** ?

In [None]:
possibilities = [1, 2, 3, 4, 5, 6]
event = [2, 4, 6]

**Assumption:** There is no bias, i.e. getting any of the numbers is just as likely.  
**Chance:** 3 in 6

In [None]:
3/6 * 100
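
One way to sanity-check the "3 in 6" figure is to simulate many throws of a fair die and see how often an even number comes up. This is only a sketch: the seed and the number of throws (`n_rolls`) are arbitrary choices, and the simulated percentage will land near 50 rather than exactly on it.

```python
import random

rng = random.Random(42)  # fixed seed (arbitrary) so the run is repeatable
n_rolls = 10_000         # hypothetical number of simulated throws

# Each throw picks one of the six faces with equal chance (the no-bias assumption)
rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
even_count = sum(1 for roll in rolls if roll % 2 == 0)

print('Simulated chance of an even number:', even_count / n_rolls * 100, '%')
```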

**Event:** Landing two even numbers when throwing two dice?  
**Probability:** ?

In [None]:
from itertools import product

In [None]:
possibilities = list(product([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6]))
print(possibilities)
print('There are', len(possibilities), 'possibilities')

In [None]:
event = [(2, 2), (2, 4), (2, 6), (4, 2), (4, 4), (4, 6), (6, 2), (6, 4), (6, 6)]
print('There are', len(event), 'event possibilities')

In [None]:
# Count how many outcomes in the sample space belong to the event
chance = 0
for combo in event:
    chance += possibilities.count(combo)

print('There is a', chance, 'in', len(possibilities), 'chance that the dice will show 2 even numbers')

**Assumptions:** 
- There is no bias, i.e. getting any of the numbers is just as likely.  
- The roll of the first die doesn't influence the roll of the second die, i.e. they are independent of each other.  

**Chance:** 9 in 36

In [None]:
9/36 * 100
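
The independence assumption can be checked with a short sketch: if the two dice don't influence each other, the chance of two even numbers should equal the chance of one even number multiplied by itself. Here the event is built by filtering the sample space instead of typing it out, and `Fraction` is used so the comparison is exact. The names `faces`, `p_both_even` and `p_one_even` are illustrative.

```python
from fractions import Fraction
from itertools import product

faces = [1, 2, 3, 4, 5, 6]
possibilities = list(product(faces, repeat=2))

# Keep only the outcomes where both dice show an even number
event = [(a, b) for (a, b) in possibilities if a % 2 == 0 and b % 2 == 0]

p_both_even = Fraction(len(event), len(possibilities))  # 9 in 36
p_one_even = Fraction(3, 6)                             # 3 in 6, from the single die

print(p_both_even)                             # prints 1/4
print(p_both_even == p_one_even * p_one_even)  # prints True
```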

# <a id='toc2_'></a>[Normal distribution](#toc0_)

We've already covered the normal distribution in terms of shape:

![image.png](https://imgs.search.brave.com/890nJAyetvjXKqV3Dcfu4-Pj-BUk56L5MSP-zc5wArg/rs:fit:860:0:0/g:ce/aHR0cHM6Ly90NC5m/dGNkbi5uZXQvanBn/LzA1LzY4Lzk1LzU5/LzM2MF9GXzU2ODk1/NTk2MV9Pc0dkYmpo/MXFQa1N5czlsVU1z/bTVQa3VzTTdGR1B2/SC5qcGc)

But we haven't looked at any examples of a normal distribution and what drives it! 

## <a id='toc2_1_'></a>[Height: A bimodal normal distribution](#toc0_)

Height doesn't follow a single normal distribution exactly. It is closer to a bimodal normal distribution: 2 normal distributions superimposed on each other, in this case one for each sex.

![](https://imgs.search.brave.com/IaGCI0_UBGutcGU-GpfLRhtrKJUeN7Gsnvs9tU9Ihso/rs:fit:860:0:0/g:ce/aHR0cHM6Ly90YWxs/LmxpZmUvd3AtY29u/dGVudC91cGxvYWRz/LzIwMTYvMDEvTmV3/LWhlaWdodC1kaXN0/cmlidXRpb24td2l0/aC1sZWdlbmQuanBn)

**Event:** Being 175cm tall when you're a woman/man?  
**Probability (Likelihood if we talk about a single individual):** We can read it from the chart!
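
Because height is continuous, what we read off the curve at 175 cm is a density, and a probability comes from the area over a range of heights. The sketch below uses `statistics.NormalDist` from the standard library with made-up parameters (a mean of 170 cm and a standard deviation of 7 cm are assumptions for illustration, not values taken from the chart).

```python
from statistics import NormalDist

# Hypothetical parameters for illustration only, not read from the chart
heights = NormalDist(mu=170, sigma=7)

# Probability of a height that rounds to 175 cm (area between 174.5 and 175.5)
p_175 = heights.cdf(175.5) - heights.cdf(174.5)

print('Density at 175 cm:', heights.pdf(175))
print('Chance of measuring 175 cm (to the nearest cm):', p_175 * 100, '%')
```

For a 1 cm wide range the area and the density are nearly the same number, which is why reading the value off the chart works as an approximation.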

But why does height follow a normal distribution? 

It's estimated that height is [80% determined by genetics](https://medlineplus.gov/genetics/understanding/traits/height/) and there are **more than 700 genes** involved in determining height!

In a very simplified picture, each gene chooses between two options: make the person slightly taller or slightly shorter. All 700 decisions are then added together to determine the person's height.

**Assumptions:**
- There is no bias, i.e. each gene is equally likely to push height up or down and contributes the same amount to height (not necessarily true, but an approximation)  
- The decision of a gene doesn't affect the decision of another gene (not necessarily true, but an approximation)

**Chance:** Let's look at a [Galton Board](https://www.compadre.org/osp/EJSS/3965/109.htm) to get an intuition for what 700 independent decisions would look like!
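
The same idea can be sketched in code under the two assumptions above: each simulated person gets 700 independent, unbiased "taller or shorter" decisions, and we count how many came out as "taller". The seed, the number of simulated people (`n_people`) and the bin width of the text histogram are arbitrary choices for illustration.

```python
import random
from collections import Counter

rng = random.Random(0)   # fixed seed (arbitrary) so the run is repeatable
n_genes = 700            # number of independent taller/shorter decisions
n_people = 2_000         # hypothetical number of simulated individuals

# For each person, count how many genes "decided" taller (a fair coin flip each)
taller_counts = [
    sum(rng.random() < 0.5 for _ in range(n_genes))
    for _ in range(n_people)
]

# Crude text histogram: group the counts into bins of width 10
bins = Counter(count // 10 * 10 for count in taller_counts)
for start in sorted(bins):
    print(f'{start:3d}-{start + 9:3d} {"#" * (bins[start] // 20)}')
```

Most simulated people end up close to 350 "taller" decisions, with fewer and fewer towards either extreme, which is the bell shape the Galton Board produces.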

# <a id='toc3_'></a>[**Key Takeaway: When we look at data distribution (i.e. the sample space), we are basically looking at the probability of different events happening**](#toc0_)

# <a id='toc4_'></a>[Extra: Resources](#toc0_)

If you're interested in learning more about the awesome world of statistics & probability, below are some great resources:

**From the ex-Chief Decision Scientist @Google:**
- [Bayesian vs Frequentist Probability](https://www.youtube.com/watch?v=GEFxFVESQXc) - basic intuition
- [Bayesian vs Frequentist Probability](https://towardsdatascience.com/statistics-are-you-bayesian-or-frequentist-4943f953f21b) - deep dive

**StatQuest:**
- [Expected Values](https://www.youtube.com/watch?v=KLs_7b7SKi4)
- [Conditional Probabilities](https://www.youtube.com/watch?v=_IgyaD7vOOA)
- [Bayes Theorem](https://www.youtube.com/watch?v=9wCnvr7Xw4E)