# Probability Fundamentals

![xkcd](increased_risk_2x.png)

[xkcd comic 1252](https://xkcd.com/1252/)

Learning goals for today:
- Develop a definition of probability beyond coin tosses  
- Describe set theory and its terminology
- Define the size of sets with permutations and combinations


## Learning Goal 1:

### Opening Task:

As a group, make a list of commonalities among the three scenarios.

What does each one have to do with probability?

### Scenario 1

> Paul has the option between a high deductible plan and a low deductible plan for health insurance. 
> If Paul chooses the low deductible plan he will pay the first 1000 dollars of any medical costs. The low deductible plan costs 8000 dollars.<br>
> If Paul chooses the high deductible plan he will pay the first 2500 dollars of any medical costs. The high deductible plan costs 7500 dollars. <br>
> Paul found a table of data on the frequency of medical costs. Based on this table, which should he choose?

| Cost | Probability|
|:----:|:-------:|
|0 | 30% |
|1000 | 25%|
|4000 | 20% |
|7000 | 20% |
|15000 | 5% |
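
One way to compare the two plans is by expected total cost: the premium plus the expected out-of-pocket spending. The sketch below assumes that Paul pays medical costs only up to the deductible and that the insurer covers everything above it. The scenario does not say this explicitly, so treat it as a simplifying assumption. The name `expected_total_cost` is made up for illustration.

```python
# Cost distribution from the table: (medical cost in dollars, probability in percent).
cost_table = [(0, 30), (1000, 25), (4000, 20), (7000, 20), (15000, 5)]


def expected_total_cost(premium, deductible):
    """Premium plus expected out-of-pocket cost.

    Assumption: Paul pays min(cost, deductible); the insurer pays the rest.
    """
    expected_out_of_pocket = sum(
        min(cost, deductible) * pct for cost, pct in cost_table
    ) / 100
    return premium + expected_out_of_pocket


low_deductible_total = expected_total_cost(premium=8000, deductible=1000)
high_deductible_total = expected_total_cost(premium=7500, deductible=2500)
print(low_deductible_total, high_deductible_total)
```

Under that assumption, the decision comes down to whether the 500 dollar saving on the premium outweighs the extra expected out-of-pocket cost of the higher deductible.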

### Scenario 2

>A drawer contains red socks and black socks. When two socks are drawn at random, the probability that both are red is 1/2. <br> 
>a) How small can the number of socks in the drawer be?<br>
>b) How small if the number of black socks is even?

[the sock problem](https://engineering-math.org/2017/05/10/the-sock-drawer-probability-and-statistics-problem/)
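
A brute-force search is one way to explore this puzzle. The sketch below tries every drawer size in increasing order and returns the first mix of red and black socks for which the probability of drawing two red socks is exactly 1/2. The function name `smallest_drawer`, the search limit of 1000 socks, and the requirement of at least one black sock are assumptions made for illustration.

```python
def smallest_drawer(black_must_be_even=False, max_total=1000):
    """Return (red, black) for the smallest drawer where P(both red) == 1/2."""
    for total in range(2, max_total + 1):
        # At least one black sock and at least two red socks.
        for black in range(1, total - 1):
            if black_must_be_even and black % 2 != 0:
                continue
            red = total - black
            # P(both red) = red*(red-1) / (total*(total-1)).
            # Compare with 1/2 using integers to avoid rounding issues.
            if 2 * red * (red - 1) == total * (total - 1):
                return red, black
    return None  # nothing found within the search limit


print(smallest_drawer())
print(smallest_drawer(black_must_be_even=True))
```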

### Scenario 3

>Among engineers, risk is defined as a product of probability of the occurrence of an undesired event and the expected consequences in terms of human, economic, and environmental loss. These two components are equally important; therefore, the appropriate estimation of these values is a matter of great significance. This paper deals with one of these two components—the assessment of the probability of vessels colliding, presenting a new approach for the geometrical probability of collision estimation on the basis of maritime and aviation experience. The geometrical model that is being introduced in this paper takes into account registered vessel traffic data and generalised vessel dynamics and uses advanced statistical and optimisation methods (Monte Carlo and genetic algorithms). The results obtained from the model are compared with registered data for maritime traffic in the Gulf of Finland and a good agreement is found.

[Probability modelling of vessel collisions](https://www.sciencedirect.com/science/article/pii/S0951832010000256)

__What is probability?__

__Why should I care about probabilities?__<br>
Studying probabilities allows us to make better, more informed decisions based on previously collected data. For example, understanding that winning the lottery is nearly impossible from a probabilistic standpoint deters us from relying on it as a source of income. <br>
Probability theory also lies at the heart of making inferences from data, which is what statistics is all about!

## II. Set Theory
In probability theory, a set is defined as a well-defined collection of objects.
Mathematically, you can denote a set by $S$. If an element $x$ belongs to a set $S$, then you'd write $x \in S$. On the other hand, if $x$ does not belong to a set $S$, then you'd write $x\notin S$.

__2.1 Subsets__ <br>
Set $T$ is a subset of set $S$ if every element in set $T$ is also in set $S$. The mathematical notation for a subset is $T \subset S$.

The empty set, $\emptyset$ or { }, is a subset of every set. Do you see why?
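
Python's built-in `set` type can check the subset relation directly. In this small sketch the sets `S` and `T` are made-up examples:

```python
S = {"art", "traveling", "wine"}
T = {"wine", "traveling"}

print(T.issubset(S))      # every element of T is also in S
print(S.issubset(T))      # "art" is in S but not in T
print(set().issubset(S))  # the empty set has no element that could fail the test
```

The operator form `T <= S` is equivalent to `T.issubset(S)`.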

__2.2 Set Operations__ <br>

- The union of two sets $S$ and $T$ is the set of elements that belong either to $S$ or to $T$ (or to both).
- The intersection of two sets $S$ and $T$ is the set of all elements that belong *both* to $S$ and to $T$.

We are trying to create rooming arrangements based on staff interest for a staff trip. <br>
Who should room with whom based on interests?

This is another way to look at sets.<br>
And we can still use the math!

In [26]:
Robin = set(["art", "traveling", "wine", "doodling", "tech", "gadgets"])
Rob = set(["rock-climbing", "traveling", "dad jokes", "ice cream"])
Alison = set(["wine", "traveling", "schitts creek", "dogs"])
Su = set(["schitts creek", "dogs", "tarot card reading", "croquet", "taxonomy"])
Molly = set(["wine", "ice cream", "dogs", "zookeeping", "traveling"])

In [1]:
Robin.intersection(Alison)

In [2]:
Rob.intersection(Alison)

In [3]:
Alison.intersection(Su)

In [4]:
Su.union(Molly)

In [5]:
Molly.intersection(Rob).intersection(Robin)
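
Rather than checking pairs one at a time, the intersections can be computed for every pair at once. This is a sketch of one possible approach; the dictionary `staff` simply repeats the interest sets defined above, and ranking pairs by the number of shared interests is an assumption about what makes a good rooming match.

```python
from itertools import combinations

staff = {
    "Robin": {"art", "traveling", "wine", "doodling", "tech", "gadgets"},
    "Rob": {"rock-climbing", "traveling", "dad jokes", "ice cream"},
    "Alison": {"wine", "traveling", "schitts creek", "dogs"},
    "Su": {"schitts creek", "dogs", "tarot card reading", "croquet", "taxonomy"},
    "Molly": {"wine", "ice cream", "dogs", "zookeeping", "traveling"},
}

# Shared interests for every unordered pair of staff members.
shared = {(a, b): staff[a] & staff[b] for a, b in combinations(staff, 2)}

# List pairs from most to fewest shared interests.
ranked = sorted(shared.items(), key=lambda item: len(item[1]), reverse=True)
for (a, b), common in ranked:
    print(a, b, len(common), sorted(common))
```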

**Task**:

- Try drawing the Venn diagram of interests for Robin and Rob. Now add in a circle for Alison. How many 'sections' should the new diagram have? What about adding in a fourth circle for Su? And a final circle for Molly?


## Foundations of (Independent) Probabilities 

What's the probability that a staff person likes wine?

That's a very specific probability example.

But there are other applications and terminology that are important for probability. 


In this section, we will introduce you to the foundation of independent probability theory. Later on in the course, you will be introduced to concepts such as conditional probability and probability of dependent events.

__Terminology Alert__ 
- Random Variable
    - A random variable is a variable whose value is the outcome of a random phenomenon, so it can take on different values
    - A random variable can be either discrete or continuous
        - __Discrete__ : the variable takes values from a countable set, such as the integers
        - __Continuous__ : the variable can take any value within an interval
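
As a small illustration of the two kinds of random variable, the sketch below simulates one of each with Python's `random` module. The variable names and the ranges are made up for the example.

```python
import random

random.seed(0)  # fixed seed so that repeated runs give the same values

die_roll = random.randint(1, 6)        # discrete: one of the integers 1 through 6
wait_time = random.uniform(0.0, 10.0)  # continuous: any real number from 0 to 10
print(die_roll, wait_time)
```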

####  Probability of A and B 
<center>$P(A \text{ and } B) = P(A) \cdot P(B)$</center>

This product rule holds only when $A$ and $B$ are independent events.

What's the probability that a staff person likes wine *and* likes dogs?
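
The question can be checked against the staff interest sets. The sketch below assumes that "a staff person" means one of the five people listed earlier, picked uniformly at random. The helper `prob` is a made-up name. Comparing the directly counted probability with the product $P(A) \cdot P(B)$ shows whether the two events behave as independent in this small group.

```python
from fractions import Fraction

staff = {
    "Robin": {"art", "traveling", "wine", "doodling", "tech", "gadgets"},
    "Rob": {"rock-climbing", "traveling", "dad jokes", "ice cream"},
    "Alison": {"wine", "traveling", "schitts creek", "dogs"},
    "Su": {"schitts creek", "dogs", "tarot card reading", "croquet", "taxonomy"},
    "Molly": {"wine", "ice cream", "dogs", "zookeeping", "traveling"},
}


def prob(event):
    """Fraction of staff members whose interest set satisfies `event`."""
    hits = sum(1 for interests in staff.values() if event(interests))
    return Fraction(hits, len(staff))


p_wine = prob(lambda s: "wine" in s)
p_dogs = prob(lambda s: "dogs" in s)
p_both = prob(lambda s: "wine" in s and "dogs" in s)

print(p_wine, p_dogs, p_both)
print(p_both == p_wine * p_dogs)  # True only if the events are independent here
```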

#### Probabilities of A or B
<center>$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$</center>

What's the probability that someone likes ice cream *or* traveling?
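
Here is a sketch of how the addition rule plays out for this question. The two name sets below are read off the staff interest sets defined earlier, and "someone" is assumed to be one of those five people picked uniformly at random.

```python
from fractions import Fraction

n_staff = 5
likes_ice_cream = {"Rob", "Molly"}
likes_traveling = {"Robin", "Rob", "Alison", "Molly"}

p_ice_cream = Fraction(len(likes_ice_cream), n_staff)
p_traveling = Fraction(len(likes_traveling), n_staff)
p_both = Fraction(len(likes_ice_cream & likes_traveling), n_staff)

# Counting the union directly and applying the formula should agree.
p_either_direct = Fraction(len(likes_ice_cream | likes_traveling), n_staff)
p_either_formula = p_ice_cream + p_traveling - p_both
print(p_either_direct, p_either_formula)
```

Subtracting `p_both` corrects for the people who would otherwise be counted twice, once in each set.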

What happens when you have multiple events? 

$$ P(A \text{ or } B \text{ or } C) = P(A) + P(B) + P(C) - P(A \text{ and } B) - P(A \text{ and } C) - P(B \text{ and } C) + P(A \text{ and } B \text{ and } C) $$

Can you explain the above formula?
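
One way to build intuition for the three-event formula is to check it on the staff data. In this sketch the events are "likes wine", "likes dogs", and "likes ice cream", chosen arbitrarily for illustration. The helpers `fans` and `prob` are made-up names.

```python
from fractions import Fraction

staff = {
    "Robin": {"art", "traveling", "wine", "doodling", "tech", "gadgets"},
    "Rob": {"rock-climbing", "traveling", "dad jokes", "ice cream"},
    "Alison": {"wine", "traveling", "schitts creek", "dogs"},
    "Su": {"schitts creek", "dogs", "tarot card reading", "croquet", "taxonomy"},
    "Molly": {"wine", "ice cream", "dogs", "zookeeping", "traveling"},
}


def fans(interest):
    """Set of staff members who list the given interest."""
    return {name for name, interests in staff.items() if interest in interests}


def prob(people):
    """Probability that a uniformly chosen staff member is in `people`."""
    return Fraction(len(people), len(staff))


A, B, C = fans("wine"), fans("dogs"), fans("ice cream")

direct = prob(A | B | C)
by_formula = (
    prob(A) + prob(B) + prob(C)
    - prob(A & B) - prob(A & C) - prob(B & C)
    + prob(A & B & C)
)
print(direct, by_formula)
```

Anyone in all three sets is added three times and subtracted three times, so the last term adds them back once.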

## Permutations & Combinations
Permutations and combinations help us define the full *set* of options related to a probability.

**Permutation**
- ordering matters
- how many different arrangements can you get out of a number of elements?
- the number of possible arrangements of $r$ elements out of a total of $n$ elements is given by: <br/>
$n! / (n - r)!$

```
from itertools import permutations

# All orderings of the three elements 1, 2, 3
l = list(permutations(range(1, 4)))
print(l)
```
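
The formula $n! / (n - r)!$ can be checked against a direct enumeration. The values `n = 5` and `r = 3` below are arbitrary, and `math.perm` assumes Python 3.8 or newer.

```python
from itertools import permutations
from math import factorial, perm

n, r = 5, 3

by_formula = factorial(n) // factorial(n - r)
by_enumeration = len(list(permutations(range(n), r)))

# All three approaches should give the same count.
print(by_formula, by_enumeration, perm(n, r))
```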

#### Scenario:

You are trying to break the code to hack into the mainframe and stop the KGB from launching US missiles remotely.

You know the password is some 5-letter arrangement of letters drawn from the word "pochemuchka".

How many potential passwords are there? That is, how large is the **set** of password options?

In [33]:
from itertools import permutations 
l = list(permutations("pochemuchka", 5)) 

In [34]:
len(l)

55440

In [35]:
len(set(l))

22050

What's the probability that the password starts with p?
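
The answer depends on what is assumed to be equally likely. The sketch below computes the probability under two different assumptions, neither of which is stated in the scenario: that every distinct 5-letter password is equally likely, or that every ordered draw of five letter tiles from the word is equally likely. The two differ because "c" and "h" each appear twice in "pochemuchka".

```python
from fractions import Fraction
from itertools import permutations

draws = list(permutations("pochemuchka", 5))  # ordered draws, by letter position
passwords = set(draws)                        # distinct 5-letter strings

# Assumption 1: every distinct password is equally likely.
p_distinct = Fraction(sum(1 for pw in passwords if pw[0] == "p"), len(passwords))

# Assumption 2: every ordered draw of five tiles is equally likely.
p_draws = Fraction(sum(1 for d in draws if d[0] == "p"), len(draws))

print(p_distinct, float(p_distinct))
print(p_draws, float(p_draws))
```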

**Combination:**
- ordering does not matter
- how many different selections can you get out of a number of elements?
- the number of possible selections of $r$ elements out of a total of $n$ elements is given by: <br/>
$n! / (r! \cdot (n - r)!)$
- Example: How many ways are there of choosing a five-card poker hand from a deck of 52 cards?
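
The poker-hand example can be worked through with the formula directly. This sketch also compares the result with `math.comb`, which assumes Python 3.8 or newer.

```python
from math import comb, factorial

n, r = 52, 5

# n! / (r! * (n - r)!) with integer division, since the result is a whole number.
hands = factorial(n) // (factorial(r) * factorial(n - r))
print(hands, comb(n, r))
```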

#### Scenario A
- Combinatorics in a specific scenario
    - What is the probability of getting exactly 3 heads out of 5 fair coin tosses?
    - What is the probability of getting at least 3 heads out of 5 tosses?
- The binomial distribution gives the probability of $k$ successes in $n$ independent binary trials, each with success probability $p$:

$P(n, k, p) = \binom{n}{k} p^k (1-p)^{n-k}$

Can you explain this equation?
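
The two coin questions in Scenario A can be sketched with this formula. The function name `binomial_pmf` is made up for illustration, and the tosses are assumed to be independent with $p = 0.5$.

```python
from math import comb


def binomial_pmf(n, k, p):
    """Probability of exactly k successes in n independent trials."""
    # comb(n, k) counts which trials succeed; p**k * (1-p)**(n-k) is the
    # probability of any one such arrangement.
    return comb(n, k) * p**k * (1 - p) ** (n - k)


exactly_3 = binomial_pmf(5, 3, 0.5)
at_least_3 = sum(binomial_pmf(5, k, 0.5) for k in range(3, 6))
print(exactly_3, at_least_3)
```

"At least 3" is the sum over the mutually exclusive cases of exactly 3, 4, and 5 heads.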

#### Scenario B
If you used combinations rather than permutations to figure out the KGB password, how many passwords would you potentially miss?

In [36]:
from itertools import combinations 
l = list(combinations("pochemuchka", 5)) 

In [37]:
len(set(l))

452
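
One caveat when reading the count above: `itertools.combinations` treats letters as distinct by position, so with the repeated "c" and "h" the same five letters can show up as tuples in different orders. The sketch below compares the raw tuple count with the count of order-independent letter selections, obtained by sorting each tuple. Sorting is an assumed way of canonicalising a selection, not something the original cells do.

```python
from itertools import combinations, permutations

word = "pochemuchka"

ordered = set(permutations(word, 5))    # distinct ordered passwords
raw_tuples = set(combinations(word, 5)) # distinct tuples as emitted by combinations
letter_selections = {tuple(sorted(c)) for c in combinations(word, 5)}

print(len(ordered), len(raw_tuples), len(letter_selections))
print(len(ordered) - len(raw_tuples))   # passwords missed if only raw tuples are tried
```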