<img src="https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png" style="float: left; margin: 15px;">
### Counting and Probability

Week 3 | Lesson 2.1

---
| TIMING | TYPE |
|:-:|---|
| 25 min| [Review: Linear Algebra](#review) |
| 10 min| [Counting](#counting) |
| 45 min| [Counting Lab](#counting_lab) |
| 20 min| [Probability](#probability) |
| 20 min| [Probability Lab](#probability_lab) |

---

### LEARNING OBJECTIVES
*After this lesson, you will be able to:*
- Solve counting problems using permutations and combinations 
- Tame uncertainty with classical, frequentist, and Bayesian probability

### STUDENT PRE-WORK
*Before this lesson, you should already be able to:*
- Write basic Python syntax
- Use Python packages comfortably

### INSTRUCTOR PREP
*Before this lesson, instructors will need to:*
- Review Learning Objectives 
- Review problems discussed during lesson

---
<a name="review"></a>
### Review:  
<a href=https://docs.google.com/presentation/d/1BV3IZdDuuPWtd1BwFgiKC_MZ54A-cxMOgt82GZWvmo4/edit> Linear Algebra for ML </a>

<a href=http://math.arizona.edu/~brio/VIGRE/ThursdayTalk.pdf> Application of SVD to Image Compression </a>

---
<a name="counting"> </a>

### The Counting Principle

---

The counting principle says that:

If $A$ can occur in $n$ ways 

and $B$ can occur in $m$ ways

then $A$ and $B$ can occur in $n * m$ ways.
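As a rough illustration of the principle, we can list every pairing with `itertools.product` and check that the count matches $n * m$. The shirts and pants below are made-up stand-ins for $A$ and $B$, not part of the lesson.

```python
from itertools import product

# hypothetical example: A = pick a shirt (n = 3 ways), B = pick pants (m = 2 ways)
shirts = ['red', 'green', 'blue']
pants = ['jeans', 'khakis']

# every (shirt, pants) pairing is one way for A and B to occur together
outfits = list(product(shirts, pants))

print(len(outfits))               # count found by enumeration
print(len(shirts) * len(pants))   # n * m from the counting principle
```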

Counting is a fundamental component of probability. From counting, we can start to examine combinations and permutations of events and their associated probabilities.

**Q: If you have two groups of students, 10 freshmen and 11 sophomores, how many teams of 2 can be formed that include one member from each class?**

In [1]:
#A: 

**Q: How many ways can we award 1st, 2nd, and 3rd prize among eight contestants?**

<img src=https://betterexplained.com/wp-content/uploads/math/permuations/permuation_medals.png>

### The factorial function

---

The factorial function is defined as: 

### $$ n! = n \cdot (n-1) \cdot (n-2) \; ... \; 2 \cdot 1 $$

The factorial is the building block of permutations. When $k = n$, the number of permutations of $n$ items taken $k$ at a time is equal to $n!$

The permutation function can be rewritten using factorials:

### $$  \text{permutations}(n, k) = \frac{n!}{(n-k)!} $$

In [None]:
#A: 

**Q: If there are 9 players on a baseball team, how many possible permutations are there for the first three players in the batting lineup?**

In [2]:
from math import factorial

# A:

**Q: Code the factorial function.**

Try writing a function that will compute the factorial of a number.

In [3]:
#A: 

<a id='permutation'></a>

### Permutations

---

Permutations ask "How many different ordered arrangements can be generated from a set?" An example of this type of question is:

- "How many distinct ordered groups of three students (first, second, third) can I make from a class of 6?"

This assumes that the **order matters**: the same three students in a different order count as a different arrangement!

A permutation is an arrangement of objects where the order is important. The permutations of a set of items or events are the total number of order-dependent arrangements possible.

Given a set of items of size $n$ and arrangements of length $k$, the permutations are calculated like so:

### $$ \text{permutations}(n, k) = n \cdot (n - 1) \cdot (n - 2) \; ... \; (n - k + 1)$$

which is 0 when $k > n$

This can be generalized out to the following equation for **permutation without replacement**:

$$ P(n, k) = \frac{n!}{(n - k)!} $$

Let's assume that we have six wonderful students in our class:

- George Washington
- John Adams
- Thomas Jefferson
- James Madison
- James Monroe
- John Quincy Adams

We want to give out 10 points of extra credit to one student, 5 points to the next, and 1 point to a third student. We want a permutation of these students!

Every time we take out a person, we can't choose them a second time. If we ordered all six students, we would have $6!$ arrangements: first we have 6 choices, then 5 choices, then 4 choices, etc.

However, we only want to fill 3 positions. The $6 - 3 = 3$ students left over could be ordered in $3!$ ways that we don't care about, so we divide them out. This gives us $6!$/$3!$

What we want is something that equates to $6 \cdot 5 \cdot 4$ -- first we choose from 6 things, then we choose from 5 things, then we choose from 4 things.
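A quick sanity check of this count, as a sketch: `itertools.permutations` lists every ordered choice of three students, and the length of that list should agree with both $6!/3!$ and $6 \cdot 5 \cdot 4$. The variable names here are only illustrative.

```python
from itertools import permutations
from math import factorial

students = ['George Washington', 'John Adams', 'Thomas Jefferson',
            'James Madison', 'James Monroe', 'John Quincy Adams']

# each tuple is (10-point winner, 5-point winner, 1-point winner)
awards = list(permutations(students, 3))

n, k = len(students), 3
formula = factorial(n) // factorial(n - k)   # n! / (n - k)!

print(len(awards), formula, 6 * 5 * 4)
```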


**Q: There are 5 qualified candidates to fill 3 positions.  How many ways can the roles be filled, assuming only one person is hired for each position?**

In [4]:
# A:

<a id='combination'></a>

### Combinations

---

Like permutations, combinations are arrangements of objects. However, with combinations, the order of objects does not matter.

Given a set of items of size $n$ and arrangements of length $k$, the combinations are calculated like so:

### $$ \text{combinations}(n, k) = \frac{ \text{permutations}(n,k) }{ \text{permutations}(k, k) } = \frac{n!}{(n-k)!k!}$$

You'll _also_ see this referred to as the binomial coefficient and written as:

$$ \binom{n}{k}$$

which is read "n choose k": from a group of n, choose a group of k.

An intuitive way to think about this is that we take the number of permutations, then we divide that by the number of possible orderings we could have in the available slots $k$.
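To make that intuition concrete, here is a small sketch using the six students from the permutations example. It enumerates the unordered groups of three with `itertools.combinations` and compares the count with the number of permutations divided by $k!$.

```python
from itertools import combinations
from math import factorial

students = ['George Washington', 'John Adams', 'Thomas Jefferson',
            'James Madison', 'James Monroe', 'John Quincy Adams']
n, k = len(students), 3

# unordered groups: (Adams, Jefferson, Madison) appears only once
groups = list(combinations(students, k))

perms = factorial(n) // factorial(n - k)   # ordered arrangements
combos = perms // factorial(k)             # divide out the k! orderings of each group

print(len(groups), perms, combos)
```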

### With Replacement

The examples above assume that once we've chosen something, it's gone for good. That's sometimes not the case! Sometimes we choose something and then put it back and choose again.

For permutation with replacement, we represent that as:

$$ P^R(n, k) = n^k $$

We want all the different possibilities that we can make $k$ draws out of $n$ -- you can imagine it like $6 \cdot 6 \cdot 6$ (which is, of course, $6^3$).

If we were replacing the students back and assigning students some extra credit, we'd have 216 different permutations of students.

For combinations with replacement, we represent that as:

$$ C^R(n, k) = \frac{(n + k -1)!}{k! (n-1)!}$$

If we wanted to know how many different ways we could group 3 students together _including replacing them_, we would have $56$ different groups (including one of "Washington, Washington, Washington")
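Both with-replacement counts can be checked by brute force. The sketch below uses the integers 0 to 5 as stand-ins for the six students: `itertools.product` enumerates ordered draws with replacement and `itertools.combinations_with_replacement` enumerates unordered ones.

```python
from itertools import product, combinations_with_replacement
from math import factorial

n, k = 6, 3
students = range(n)   # stand-ins for the six students

# ordered draws, the same student may repeat
ordered = list(product(students, repeat=k))

# unordered draws, the same student may repeat
unordered = list(combinations_with_replacement(students, k))

# (n + k - 1)! / (k! (n - 1)!)
formula = factorial(n + k - 1) // (factorial(k) * factorial(n - 1))

print(len(ordered), n ** k)
print(len(unordered), formula)
```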

**Q: How many possible five-card hands are there with a deck of 52 cards?**

In [5]:
#A: 

---
### **Counting Practice Problems (60 Minutes)**
<a name="counting_lab"></a>

<a id='practice'></a>

### Practice Problems with Counting, Permutations, and Combinations

---

A restaurant has 22 employees, including:

- Six chefs
- Five waiters
- Seven busboys 

The owner has requested a few things from you, the manager:

1. A private party is coming on Tuesday and wants to know how many teams of one chef and two waiters can be formed.
2. Each busboy works just one day of the week. The owner wants to know how many different versions of the weekly busboy assignments are possible.
3. He wants his favorite waiter to serve him and his wife for their anniversary on Sunday, along with two other waiters. How many teams of three waiters can serve him?

In [None]:
#A.1

In [None]:
#A.2

In [None]:
#A.3

**Question:** How many ways can you split 12 people into 3 teams of 4?

In [24]:
#A:

#### Distinguishable urns and indistinguishable balls

---

Suppose we have three labeled urns and two balls that are indistinguishable; there is no way to tell the balls apart in arrangements. We can only tell arrangements apart by how many balls are in each labeled urn.

Imagine our balls are represented by the character `o`. We also have a "separator" character `|` that divides the balls between urns. So, if all balls were in the first urn, it would look like this:

    'oo||'
    
Or, if one ball was in the first and one was in the second:
    
    'o|o|'
    
Or, if one was in the first and one was in the third:

    'o||o'
    
To find all the distinct arrangements of the two indistinguishable balls among the three distinguishable urns, we need to count the ways to choose which 2 of the 4 positions in the string hold separators, or `comb(4, 2)`.

The general formula for the arrangements of $n$ indistinguishable items among $k$ distinguishable places is:

### $$ \text{combinations}(n+k-1, k-1)$$
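The urn strings can be generated directly. This sketch builds every distinct ordering of `'oo||'` and compares the count with the formula. The `comb` helper defined here is an assumption, a small stand-in written with `math.factorial` for whatever `comb` function the notebook has in mind.

```python
from itertools import permutations
from math import factorial


def comb(n, k):
    '''n choose k, written with factorials (stand-in helper)'''
    return factorial(n) // (factorial(k) * factorial(n - k))


balls, urns = 2, 3
symbols = 'o' * balls + '|' * (urns - 1)   # 'oo||'

# a set removes orderings that look identical because the balls are indistinguishable
arrangements = sorted(set(''.join(p) for p in permutations(symbols)))

print(arrangements)
print(len(arrangements), comb(balls + urns - 1, urns - 1))
```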


**Q: Suppose you have a game in which you can toss a ball and it will land in one of five buckets.**

**You toss two balls and they each land in a bucket. If the balls are indistinguishable from each other, how many different outcomes (distributions of balls in buckets) are possible?**



    1.|BB |  2.|   |  3.|   |  4.|   |  5.|   |
    1.|   |  2.|BB |  3.|   |  4.|   |  5.|   |
    1.|   |  2.|   |  3.|BB |  4.|   |  5.|   |
    1.|   |  2.|   |  3.|   |  4.|BB |  5.|   |
    1.|   |  2.|   |  3.|   |  4.|   |  5.|BB |
    1.|B  |  2.|B  |  3.|   |  4.|   |  5.|   |
    1.|B  |  2.|   |  3.|B  |  4.|   |  5.|   |
    1.|B  |  2.|   |  3.|   |  4.|B  |  5.|   |
    1.|B  |  2.|   |  3.|   |  4.|   |  5.|B  |
    1.|   |  2.|B  |  3.|B  |  4.|   |  5.|   |
    1.|   |  2.|B  |  3.|   |  4.|B  |  5.|   |
    1.|   |  2.|B  |  3.|   |  4.|   |  5.|B  |
    1.|   |  2.|   |  3.|B  |  4.|B  |  5.|   |
    1.|   |  2.|   |  3.|B  |  4.|   |  5.|B  |
    1.|   |  2.|   |  3.|   |  4.|B  |  5.|B  |
    

In [None]:
#A:

**Q: How many non-negative integer solutions are there to this equation:**

$X_1 + X_2 + X_3 + X_4 = 15$

What about when the order of numbers matters?

Hint: think of the 15 units as indistinguishable balls and the four variables as distinguishable urns (just because $X_1$ is three doesn't mean there are three versions, or, in other words, one $1$ is indistinguishable from another $1$ in $1+1+1$).

In [8]:
#A: 

**Q: HackerRank: <a href=https://www.hackerrank.com/challenges/choose-and-calculate> Choose and Calculate </a>**

**Q: HackerRank: <a href=https://www.hackerrank.com/challenges/permutation-problem> Permutations </a>**

---
### **Probability**
<a name="probability"></a>

Probability is the mathematical framework for representing **uncertainty in the world.** It provides a means of quantifying uncertainty and axioms for deriving new uncertain statements. There are three possible sources of uncertainty (Deep Learning Book, Goodfellow, Bengio, and Courville):

1. **Stochasticity in the System Modeled**: Examples include behaviors of subatomic particles in quantum mechanics, card games where we assume shuffling has been truly random.

2. **Incomplete Observability**: Some systems can appear random if we can't observe everything. Examples are: Monty Hall Problem, Donald Trump's tweets. 

3. **Incomplete Modeling**: When we discard some information during modeling, which results in uncertain predictions. Examples include: Uber pickup locations, Roomba robots.

### Probability Axioms

---

**First Axiom**

Probabilities are non-negative, real numbers. The lowest probability possible is zero, and probability cannot be infinite.

**Second Axiom**

The probability of the sample space $S$ of events is one:

### $$ P(S) = 1 $$

**Third Axiom**

The probability of two _mutually exclusive_ events, here denoted $E_1$ and $E_2$, is equal to the sum of their individual probabilities:

### $$ P(E_1 \cup E_2) = P(E_1) + P(E_2) $$

This is also known as additivity. Hence, if $E_1 = E$ and $E_2 = E^c$ are complements, then $1 = P(S) = P(E_1) + P(E_2) = P(E) + P(E^c)$. This leads to:
$$ P(E^c) = 1 - P(E) $$

This axiom can be understood as saying "the probability that either event $E_1$ or event $E_2$ occurs is the sum of their individual probabilities, provided they cannot both occur."

### Random Variables

A random variable (usually denoted as **X**) is a variable whose value is determined by the outcome of a random process. In other words, we don't know the value of the variable until we check it. Also, every time we check it, we expect its value to change randomly from one value to another.

A **discrete random variable** can take on a finite or countably infinite number of states. A **continuous random variable** takes on real values.

Let's go through some probability examples in order to better understand random variables. 

## 1. Classical probability

---

<div style="font-size:25px">
\\[P(X=x) = \frac{\text{number of outcomes in which x occurs}}{\text{total possible outcomes in sample space S}}\\]
</div>

<br>
<div style="font-size:18px">


**Defined as:** The probability of x occurring is equal to the number of outcomes in which event x happens divided by the total number of possible outcomes.
</div>

Classical probability is an assessment of **possible** outcomes of elementary events. Elementary events are assumed to be equally likely.

The set up to a classical probability problem is as follows:

**Experiment** any action or process that generates observations

**Sample Space** the set of all possible outcomes **S**

**Event** a subset of the sample space **X**, where **X** is a random variable

Let's go through some examples to better understand how to apply these ideas. 

> **Example:** Toss two coins. What is the probability of getting heads twice?

In [None]:
#A: compute it analytically

In [10]:
#A: simulate it using np.random.choice

def get_prob_for_two_coins(x, sims=100):
    '''Get probability that event x happens'''
    pass

>**Example**: Roll a fair die. What is the probability of rolling an even number?

In [12]:
#A: compute it analytically

#A: simulate it 
def get_prob_for_even_die_roll(x, sims=100):
    '''Get probability that event x happens'''
    pass

> **Example:** What is the probability of tossing 3 coins and getting 2 heads and 1 tails in any order? 

In [None]:
#A: compute it analytically

#A: simulate it 
def get_three_coin_tosses(x, sims=100):
    '''Get probability that event x happens'''
    pass

### Conditional and Joint Probability 
---


From the axioms outlined above, we can derive a variety of essential properties of probability. Many of these are outlined below.

**The probability of no event**

The probability of the empty set, denoted $\emptyset$, is zero.

### $$ P\left(\emptyset \right) = 0 $$

**The probability of A or B occurring (union)**

The probability of event $A$ or event $B$ occurring is equivalent to the sum of their individual probabilities, minus the probability of their intersection (the probability they both occur).

### $$ P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
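A small check of the union rule on a fair six-sided die. The events chosen here ("at most 3" and "prime") are illustrative assumptions, and `prob` is a hypothetical helper that applies the classical definition.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3}   # roll is at most 3
B = {2, 3, 5}   # roll is prime


def prob(event):
    '''Classical probability: favourable outcomes / total outcomes'''
    return Fraction(len(event), len(sample_space))


lhs = prob(A | B)                        # P(A or B), counted directly
rhs = prob(A) + prob(B) - prob(A & B)    # union rule

print(lhs, rhs)
```

Without subtracting $P(A \cap B)$, the outcomes 2 and 3 would be counted twice.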

**Conditional probability**

The probability of an event that's conditional on another event is written using a vertical bar between the two events. The probability of event $A$ occurring _given_ event $B$ occurs is calculated like so:

### $$ P(A | B) = \frac{P(A \cap B)}{P(B)} $$

Meaning the probability of both $A$ and $B$ occurring is divided by the probability that $B$ occurs at all.


<img src=https://i.ytimg.com/vi/H02B3aMNKzE/maxresdefault.jpg>

**Joint probability**

The joint probability of two events $A$ and $B$ is a reformulation of the above equation.

### $$ P(A \cap B) = P(A|B) \; P(B) $$

Verbally, if we want to know the probability that both $A$ and $B$ occur, we can multiply the probability that $B$ happens by the probability that $A$ happens given that $B$ happens.
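The conditional and joint formulas can be checked against each other on a fair die. As a sketch with illustrative events (A = "roll is prime", B = "roll is at most 3"), where `prob` is a hypothetical helper:

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 3, 5}   # roll is prime
B = {1, 2, 3}   # roll is at most 3


def prob(event):
    '''Classical probability: favourable outcomes / total outcomes'''
    return Fraction(len(event), len(sample_space))


# conditional probability: restrict attention to the outcomes where B happened
p_a_given_b = prob(A & B) / prob(B)

# joint probability rebuilt from the conditional
p_joint = p_a_given_b * prob(B)

print(p_a_given_b, p_joint, prob(A & B))
```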

<img src=https://image.slidesharecdn.com/chap005-111020122418-phpapp01/95/chap005-18-728.jpg?cb=1319113519>

**The law of total probability**

Let's say we want to know the probability of the event $B$ occurring across _all_ different events $A$. For example, let's say that we are a judge presiding over a murder trial. $B$ is the event that the suspect's wallet was found at the scene of the murder. We could have many hypotheses or possible scenarios in which the wallet is found there, one being that the suspect was actually at the scene at the time of the murder.

These different events $A_i$ (our scenarios) are disjoint and together cover every possibility. The _total probability_ of $B$ is the probability across all of these scenarios that the wallet is found at the murder scene. In other words, regardless of which scenario $A_i$ holds, it's the probability that the wallet is found at the scene.

### $$ P(B) = \sum_{i=1}^n P(B \cap A_i) $$
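A minimal sketch of the law of total probability on a fair die. The three-way partition and the event $B$ ("roll is prime") are assumptions chosen only for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
B = {2, 3, 5}                          # roll is prime
partition = [{1, 2}, {3, 4}, {5, 6}]   # disjoint scenarios A_1, A_2, A_3 covering everything


def prob(event):
    '''Classical probability: favourable outcomes / total outcomes'''
    return Fraction(len(event), len(sample_space))


# sum of P(B and A_i) over every scenario
total = sum(prob(B & A_i) for A_i in partition)

print(total, prob(B))
```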


### Contingency Table 
**Slides: <a href=http://ocw.metu.edu.tr/pluginfile.php/2277/mod_resource/content/0/ocw_iam530/2.Conditional%20Probability%20and%20Bayes%20Theorem.pdf> Contingency Tables </a>**

### 2. Frequentist probability

---

<div style="font-size:25px">
\\[P(X=x | ~\text{some condition}) =  \frac{\text{# times x has occurred}}{\text{# independent and identical trials}}\\]
</div>

<br>
<div style="font-size:18px">
**Defined as:** The probability of x occurring (given that the condition is true) is equal to the number of times that the event x happens divided by the total number of independent and identical trials.
</div>


Unlike classical probability, frequentist probability is an EMPIRICAL definition. It is an objective statement describing events that have occurred in the real world.

In order to assign a probability to an event we must:

1. Go out into the world and execute an experiment
2. Make measurements of some kind 
3. Collect data
4. Return to our computers and calculate the probability of some event happening based on the data that we collected.

You can think of frequentist probability as classical probability **plus empiricism**. However, unlike classical probability, we have to be very careful in our experimental design to **ensure that the following conditions are met**:

1. Events occur independently of each other
2. Trials/experiments used to measure events are all identical
3. Randomization is built into the experiment; otherwise our random variable is not random.

In order to better understand frequentist probability let's go through an example. 

### Male Heights Experiment

We are interested in finding the frequentist probability that an American male is 6 ft tall, so we conduct an experiment. We travel the United States and:

1. Randomly approach one million males and measure their heights. This ensures that events (measuring height) are independent of each other.
2. Always measure people's heights using the same measuring tape and between the hours of 9 am and 5 pm. This ensures that the experiments are identical.

Consider the following cases: 

#### Case 1.

1. The first male we measure is 6 feet tall. 
2. We go back to our computers, calculate the frequentist probability that a male is 6 ft tall, and get a probability of 100%

$$P(X= \text{6 ft}~ |~\text{ male}) =  \frac{\text{# of 6 ft   males measured}}{\text{total # of measurements}} = \frac{1}{1} = 100\%$$


Can we trust this result?

In [None]:
# switch cell to Markdown mode and write your answer here

#### Case 2.

1. We measure 100 males and none are 6 feet tall. 
2. We go back to our computers, calculate the frequentist probability that a male is 6 ft tall, and get a probability of 0%

$$P(X= \text{6 ft}~ |~\text{ male})=  \frac{\text{# of 6 ft males measured}}{\text{total # of measurements}} = \frac{0}{100} = 0\%$$


Can we trust this result?


In [None]:
# switch cell to Markdown mode and write your answer here

#### Case 3.

1. We measure 1,000,000 males and 100,000 are 6 feet tall. 
2. We go back to our computers, calculate the frequentist probability that a male is 6 ft tall, and get a probability of 10%

$$P(X= \text{6 ft}~ |~\text{ male}) =  \frac{\text{# of 6 ft males measured}}{\text{total # of measurements}} = \frac{100,000}{1,000,000} = 10\%$$


Can we trust this result?

In [14]:
# switch cell to Markdown mode and write your answer here

### Sample Size: how many samples is enough?


In [15]:
# switch cell to Markdown mode and write your answer here

In order to help cement the idea of why sample size is important let's run through a simulation. 

Imagine that we don't actually know what the probability is for getting heads or tails, so we start flipping a coin.

In [16]:
import random

import pandas as pd


def get_prob_for_coin_flip(num_events):
    '''Get the running estimate of P(heads) after each of num_events coin flips'''
    # create dataframe with one empty row per flip, indexed 1..num_events
    df = pd.DataFrame(index=range(1, num_events + 1),
                      columns=['P(heads)', 'P(tails)'], dtype=float)

    # initialize counters
    heads = 0
    tails = 0

    for tries in range(1, num_events + 1):
        # randomly sample between values 1 and 2 to simulate a coin flip
        coin = random.randint(1, 2)

        if coin == 1:   # heads
            heads += 1
        else:           # tails
            tails += 1

        # relative frequencies after this flip
        df.loc[tries, 'P(heads)'] = heads / tries
        df.loc[tries, 'P(tails)'] = tails / tries

    # return the P(heads) column, which is what the plot below is labeled with
    return df['P(heads)'].values

In [17]:
# num_events = 10000
# results = get_prob_for_coin_flip(num_events)

# plt.figure(figsize = (12,4))
# plt.plot(results)
# plt.title("Probability of getting heads vs. Coin flips")
# plt.ylabel("Probability of getting heads");
# plt.xlabel("Number of coin flips")
# #plt.xlim(0,1)
# plt.show()

### 3. Bayesian probability

---

<div style="font-size:25px">
\\[ P(A|B) = \frac{\text{P(B|A)}\cdot \text{P(A)}}{\text{P(B)}} \\]
</div>

<br>
<div style="font-size:18px">
**Defined as:** The posterior probability of event A happening, given that event B has occurred, is equal to the likelihood of event B happening, given that event A has occurred, times the prior probability of event A, all divided by the total probability of event B happening.
</div>


Bayesian probability gives us an updated probability that something is true based on new observations. Unlike classical or frequentist probability, Bayesian probability incorporates our prior belief $\text{P(A)}$ that something is true into the probability model.

![](https://nflinjuryanalyticscom.files.wordpress.com/2016/12/bayes-rule-e1350930203949.png)

Let's look at the reformulation of Bayes' Law in the colored image above.

**Hypothesis: H** The hypothesis whose probability of being true we want to determine

**Evidence: e** The data/evidence that we have collected in order to help determine how likely it is that our hypothesis is true

**Prior Probability: P(H)** The probability that our hypothesis is true before we make any measurements

**Likelihood: P(e|H)** The probability of seeing the data that we have collected if the hypothesis is true (i.e. if the hypothesis were true, how likely is it that we would see the data that we see)

**Posterior Probability: P(H|e)** The updated probability that our hypothesis is true, given the probability we initially assigned to it and the probability that we would see the evidence that we have collected if our hypothesis were true.

**Marginal Probability: P(e)** The probability of seeing the evidence across all cases (i.e. whether the hypothesis is true or false). Mathematically, this is used as a normalization term.

### Male Height Problem Revisited

In the male height problem, we assumed that all height measurements were done on males (and they were). But what if we are forced to design an experiment in which we don't actually know if the person whose height we are measuring is male or female, and our goal is to determine the probability that this person is male based on the measurement?

Using frequentist probability as described above, we can't directly address that problem, because the calculation has to assume that a hypothesis is true; it gives us no way of incorporating uncertainty about the hypothesis into the model. Bayesian probability does!


In the male height problem we were calculating the likelihood of getting the data that we got (measurements of 6 feet) assuming that the hypothesis is true (that all measurements are done on males).

$$ \text{Likelihood}~~ P(X= \text{6 ft}~ |~\text{ male})$$

In the case that we don't know if the person being measured is male or female, Bayesians can quantify their uncertainty with the prior probability:

$$\text{Prior}~~P(\text{male}) = P(\text{person is a male}) = 0.5$$

Leading to a probability that takes uncertainty into account and allowing us to address this problem. 

$$ P(\text{male}~|~6 \text{ft})   = \frac{P(X= \text{6 ft}~ |~\text{ male})~~P(\text{male}) } {P(~X= \text{6 ft}) }  $$
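To see the formula in action, here is a sketch with numbers plugged in. The prior of 0.5 comes from the text and the likelihood of 0.10 is the Case 3 estimate. The value for $P(X = \text{6 ft} \mid \text{female})$ is a made-up placeholder, since the lesson gives no such figure, and `posterior` is a hypothetical helper name.

```python
def posterior(prior, likelihood, likelihood_if_false):
    '''P(H|e) from P(H), P(e|H) and P(e|not H), using Bayes' rule'''
    # marginal P(e): the evidence under both "H true" and "H false"
    evidence = likelihood * prior + likelihood_if_false * (1 - prior)
    return likelihood * prior / evidence


p_male = 0.5                 # prior from the text
p_6ft_given_male = 0.10      # frequentist estimate from Case 3
p_6ft_given_female = 0.01    # hypothetical placeholder, NOT from the lesson

p_male_given_6ft = posterior(p_male, p_6ft_given_male, p_6ft_given_female)
print(round(p_male_given_6ft, 3))
```

With a different placeholder for the female likelihood the posterior changes, which is exactly the point: the marginal $P(X = \text{6 ft})$ depends on both cases.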


### Ex. One 

A family has two children. Given that one of the children is a boy, what is the probability that both children are boys?



### Ex. Two

Bag one contains 4 white and 6 black balls, while Bag two contains 4 white and 3 black balls. One ball is drawn at random from one of the bags and it is found to be black. Find the probability that it was drawn from Bag one.

### Ex. Three
A disease test is advertised as being 99% accurate: if you have the disease, you will test positive 99% of the time, and if you don't have the disease, you will test negative 99% of the time. If 1% of all people have this disease and you test positive, what is the probability that you actually have the disease?

### Ex. Four

You go to see the doctor about an ingrowing toenail. The doctor selects you at random to have
a blood test for swine flu, which for the purposes of this exercise we will say is currently suspected
to affect 1 in 10,000 people in Australia. The test is 99% accurate, in the sense that the probability
of a false positive is 1%. The probability of a false negative is zero. You test positive. What is the
new probability that you have swine flu?

### Ex. 4 (continued)

Now imagine that you went to a friend’s wedding in Mexico recently, and (for the purposes of this
exercise) it is known that 1 in 200 people who visited Mexico recently come back with swine flu.
Given the same test result as above, what should your revised estimate be for the probability you
have the disease?

## Expected Values

---

>The expected value of a random variable, intuitively, is the long-run average value of repetitions of the experiment it represents. -Wikipedia


$$ E[X] = \sum_{i = 1}^{n}{x_{i}p_{i}} = \sum_{x} x\cdot P(X=x) $$


Think of the expression $x\cdot P(X=x)$ as the value of the outcome **x** times the probability of that outcome happening, $P(X=x) = p_{x}$.

You might also see the expected value in the following form:

$$ E[X] = \sum_{i = 1}^{n}{x_{i}p_{i}} = x_{1}p_{1} + x_{2}p_{2} + \cdot \cdot \cdot+ x_{n}p_{n}  $$


The final expression is the **weighted average** of the random variable **X**. The weights are the probabilities of each outcome, **x**. We say that it is weighted because the influence that each outcome has depends on its probability.

#### Ex. Weighted Average 

Let's say we have a fixed die that has a weight inserted into it so that the probability of rolling a 1 is 50% and the probability of rolling any other number is 10%. 

**Question:** What is the average value we expect to get from this fixed die?

This question can be rephrased as: if I roll the die an infinite number of times, what is the average value?

$$ E[X] = \sum_{i = 1}^{n}{x_{i}p_{i}} = 1 (0.5) + 2 (0.10) + 3 (0.10) + 4 (0.10) + 5 (0.10) + 6 (0.10) = 2.5$$
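The same sum can be computed directly, as a short sketch, before turning to simulation. The variable names are only illustrative.

```python
sample_space = [1, 2, 3, 4, 5, 6]
weights = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]   # P(1) = 50%, every other face 10%

# E[X] = sum over outcomes of x * P(X = x)
expected = sum(x * p for x, p in zip(sample_space, weights))

print(round(expected, 10))
```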


In [27]:
import numpy as np


def expected_value_die(num_rolls):
    '''Get the running average of rolls of a fixed (weighted) die'''

    sample_space = [1, 2, 3, 4, 5, 6]
    weights = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]
    expected_value = []

    total = 0
    for roll in range(1, num_rolls + 1):
        # draw one face according to the weights
        total += np.random.choice(sample_space, p=weights)
        # average of all rolls so far
        expected_value.append(total / roll)

    return expected_value

In [28]:
# num_rolls = 10
# expected_val_fixed = expected_value_die(num_rolls)
# expected_val_fixed[-1]

In [29]:
# plt.figure(figsize=(12,4))
# plt.title("Average Value of a Fixed Die vs. Number of Rolls")
# plt.xlabel("Number of Rolls")
# plt.ylabel("Average Die Value")
# plt.plot(expected_val_fixed);

### Ex. Simple Average

A simple average is the case in which all events are equally likely, such as flipping a fair coin or rolling a fair die. Since all probabilities are equal (all weights are the same), we can rewrite the formula for the expected value.

$$ E[X] = \sum_{i = 1}^{n}{x_{i}p_{i}} = x_{1}p_{1} + x_{2}p_{2} + \cdot \cdot \cdot+ x_{n}p_{n}  $$
 
 
$$ E[X] = \sum_{i = 1}^{n}{x_{i}p_{i}} =\frac{ x_{1}p_{1} + x_{2}p_{2} + \cdot \cdot \cdot+ x_{n}p_{n}}{1}  $$


**Recall:** $p_{1} + p_{2} + \cdot \cdot \cdot+ p_{n} = n\cdot p = 1$

$$ E[X] =\frac{ x_{1}p_{1} + x_{2}p_{2} + \cdot \cdot \cdot+ x_{n}p_{n}}{p_{1} + p_{2} + \cdot \cdot \cdot+ p_{n}}  $$

**Factor out all the p's**

$$ E[X] = \frac{p}{n\cdot p}( x_{1} + x_{2} + \cdot \cdot \cdot+ x_{n})   $$

$$ E[X] = \frac{1}{n}\sum_{i = 1}^{n}{x_{i}}  $$

**Note:** The expected value for events with equal probability is called the mean and denoted by $\mu$. (Yes, the same mean that we are all familiar with).


$$ E[X] = \mu = \frac{1}{n}\sum_{i = 1}^{n}{x_{i}}  $$


The average value of rolling a fair die an infinite number of times: 

$$ E[X] = \frac{1}{n}\sum_{i = 1}^{n}{x_{i}} = 1 \frac{1}{6} + 2 \frac{1}{6} + 3 \frac{1}{6} + 4 \frac{1}{6} + 5 \frac{1}{6} + 6 \frac{1}{6} = 3.5$$
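As a quick check of the derivation, this sketch computes the fair-die expected value both ways, as a probability-weighted sum and as a plain mean. `Fraction` is used so the two results can be compared exactly.

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]
p = Fraction(1, len(faces))   # every face is equally likely

weighted = sum(x * p for x in faces)           # sum of x_i * p_i
simple = Fraction(sum(faces), len(faces))      # (1/n) * sum of x_i

print(weighted, simple, float(simple))
```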

In [30]:
import random


def expected_value_die(num_rolls):
    '''Get the running average of rolls of a fair die'''

    # define sample space by including all possible outcomes
    sample_space = [1, 2, 3, 4, 5, 6]
    expected_value = []

    total = 0
    for roll in range(1, num_rolls + 1):
        # every face is equally likely
        total += random.choice(sample_space)
        # average of all rolls so far
        expected_value.append(total / roll)

    return expected_value

In [31]:
# num_rolls = 10000
# expected_val = expected_value_die(num_rolls)
# expected_val[-1]

In [32]:
# plt.figure(figsize=(12,4))
# plt.title("Average Value of Fair Die vs. Number of Rolls")
# plt.xlabel("Number of Rolls")
# plt.ylabel("Average Die Value")
# plt.plot(expected_val);

### Variance, Standard Deviation
---

In probability theory, variance is a measure of **how far a set of numbers is spread out.** It describes how much a random variable differs from the expected value. The standard deviation is the square root of variance. 

<img src=https://wikimedia.org/api/rest_v1/media/math/render/svg/67c38600b240e9bf9479466f5f362792e4fc4fb8>

<img src=https://image.slidesharecdn.com/randomvar-120913182756-phpapp01/95/random-variables-38-728.jpg?cb=1347561142>
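As a sketch of how these quantities are computed for a discrete random variable, the block below uses the standard definition $\text{Var}(X) = \sum_x P(X=x)\,(x - E[X])^2$ on the weighted die from the expected value section. The variable names are illustrative.

```python
from math import sqrt

sample_space = [1, 2, 3, 4, 5, 6]
weights = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]   # the fixed die from the weighted average example

# expected value E[X]
mu = sum(x * p for x, p in zip(sample_space, weights))

# variance: probability-weighted squared distance from the expected value
variance = sum(p * (x - mu) ** 2 for x, p in zip(sample_space, weights))

# standard deviation: square root of the variance
std = sqrt(variance)

print(round(mu, 4), round(variance, 4), round(std, 4))
```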

### Lab: Probability 

<a name="probability_lab"></a>


**Question 1:** In any 15-minute interval, there is a 20% probability that you will see at least one shooting star. What is the probability that you see at least one shooting star in the period of an hour?

In [20]:
##A:

**Question 2:** A certain couple tells you that they have two children, at least one of which is a girl. What is the probability that they have two girls?

In [21]:
##A:

**Question 3:** What is the probability that you are dealt two cards that are both  aces from a 52-card deck?

In [None]:
##A:

**Question 4:** What is the probability of getting two queens and three kings in a five-card hand?



In [None]:
##A:

**Question 5:** For 10 coin-flips, print out the probability of getting 0 heads through 10 heads. Also, print out the sum of the probabilities.



In [23]:
## A:

**Question 6:** You have two coins, one of which is fair and comes up heads with probability 1/2, and the other of which is biased and comes up heads with probability 3/4. You randomly pick a coin and flip it twice, and get heads both times. What is the probability that you picked the fair coin?

In [25]:
##A:

**Question 7:** You have a 0.1% chance of picking up a coin with both heads, and a 99.9% chance that you pick up a fair coin. You flip your coin and it comes up heads 10 times. What’s the chance that you picked up the fair coin, given the information that you observed?

In [26]:
##A:

**Question 8:** What’s the expected number of coin flips until you get two heads in a row? What’s the expected number of coin flips until you get two tails in a row?

In [33]:
##A:

**Question 9:** <a href=https://www.hackerrank.com/challenges/kevin-and-expected-value> Kevin and the Expected Value </a> 

**Question 10 [Challenge, Skip If Difficult]:** <a href=https://www.hackerrank.com/challenges/mathematical-expectation> Mathematical Expectation </a>

In [34]:
##A:

**Question 11:** Brilliant Questions
##### Classical probability and expectation values 

The exercises for classical probability and expectation values will be completed at Brilliant.org.

1. Create a free account at [Brilliant.org](https://brilliant.org/)
2. Navigate to the Math for Quantitative Finance section
3. Complete the Probability and Expected Value modules 


**Question 12:** Monty Hall Problem
- There are 3 doors, behind which are two goats and a car.
- You pick a door (call it door A). You’re hoping for the car of course.
- Monty Hall, the game show host, examines the other doors (B & C) and always opens one of them with a goat (Both doors might have goats; he’ll randomly pick one to open)

In [35]:
## Should you switch? 
## 12.1 Use Bayes Theorem to say if you should switch or not
## 12.2 Simulate the answer by writing some code 

**Question 13:** Birthday Problem

There are 30 people in a room ... what is the chance that any two of them celebrate their birthday on the same day? Assume 365 days in a year.


How many people do you need in a room to get the probability of two people sharing the birthday to be greater than 50%?