# Topic 11 - Intro to Combinatorics & Probability 


## Learning Objectives


- Briefly discuss Sets and set notation
- Learn to recognize Permutations vs Combinations
- Review what is probability
- Activity: Playlist Probabilities
- Dependent vs Independent Probability AKA when to use conditional probability

# Sets

Sets are collections of elements. The language and notation used to talk about sets show up in a lot of other places, so it's worth laying the groundwork here.

## Set notation
<img src="images/setnotation.jpg" alt="set notation, found from https://slideplayer.com/slide/10502152/" width=800>

[Image Source](https://slideplayer.com/slide/10502152/)

More on Sets: https://www.mathsisfun.com/sets/sets-introduction.html

In [1]:
# empty={1,2}
# set()
# type(empty)

### But Now, in Python:

 

| Method                    | Equivalent   | Result |
| ------                    | ------       | ------ |
| s.issubset(t)             | s <= t       | test whether every element in s is in t |
| s.issuperset(t)           | s >= t       | test whether every element in t is in s |
| s.union(t)                | s $\mid$ t   | new set with elements from both s and t |
| s.intersection(t)         | s & t        | new set with elements common to s and t |
| s.difference(t)           | s - t        | new set with elements in s but not in t |
| s.symmetric_difference(t) | s ^ t        | new set with elements in either s or t but not both |
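
To make the table concrete, here is a minimal sketch that runs each operator on two small sets. The sets `s` and `t` and their values are made up for illustration.

```python
# Two small example sets (hypothetical values, for illustration only)
s = {1, 2, 3}
t = {2, 3, 4, 5}

print(s <= t)   # is every element of s also in t?
print(s >= t)   # is every element of t also in s?
print(s | t)    # union: elements from both s and t
print(s & t)    # intersection: elements common to s and t
print(s - t)    # difference: elements in s but not in t
print(s ^ t)    # symmetric difference: in either set but not both

# Each operator gives the same result as its method form
same_as_methods = (
    (s | t) == s.union(t)
    and (s & t) == s.intersection(t)
    and (s - t) == s.difference(t)
    and (s ^ t) == s.symmetric_difference(t)
)

# Note: {} creates an empty dict, so use set() for an empty set
empty = set()
```

The operator forms require both operands to be sets, while the method forms accept any iterable (for example `s.union([4, 5])`).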

# Permutations vs. Combinations

Now let's talk about how you can take elements from different collections and group them.

## What's the difference between a permutation and a combination?

## Permutations

- In a **permutation**, order matters.
    - If you have a race, it matters who arrives in first, or second, or third place - there's a difference in the ordering of the group!

> ***Permutation: How many ways to arrange n elements?***
    $$\large P(n) = n!$$
___
> ***Permutations with replacement: How many ways to select $j$ elements out of a pool of $n$ objects?***
$$ \large {P}_{j}^{n} = n^j $$
- where $n$ = total number of elements
- $j$ = number of positions to fill   


___
> ***Permutations (without replacement): How many ways to select $k$ elements out of a pool of $n$ objects? (AKA permutations of a subset)***
$$ \large P_{k}^{n}= \dfrac{n!}{(n-k)!}$$ <br>this is known as the **$k$-permutation of $n$**
    
    


> ***Permutations with repetition: Permutation where there are some elements that may appear multiple times.***<br>
The general formula can be written as:
$$\dfrac{n!}{n_1!n_2!\ldots n_j!}$$
where $n_1, n_2, \ldots, n_j$ are the counts of identical objects of each type (type $1$ through type $j$), and $n$ is the total number of objects.

- E.G. looking at the word TENNESSEE by itself, you can swap the 3rd and the 4th letter and have the same word. So the total number is less than $9!$.
    - The solution is to divide $9!$ by the factorials for each letter that is repeated!
    - The answer here is then (9 letters, 4 x E, 2 x N, 2 x S)
    - $\dfrac{9!}{4!2!2!} = 3780$
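
The permutation formulas above can be checked with a few lines of Python. This is a sketch: the helper names `k_permutations` and `multiset_permutations` are made up for this example, and the 8-runner podium is a hypothetical scenario.

```python
from collections import Counter
from math import factorial, prod

def k_permutations(n, k):
    """Ordered selections of k items from n without replacement: n! / (n-k)!"""
    return factorial(n) // factorial(n - k)

def multiset_permutations(items):
    """Distinct arrangements when some items repeat: n! / (n_1! n_2! ... n_j!)"""
    counts = Counter(items)  # how many times each distinct item appears
    return factorial(len(items)) // prod(factorial(c) for c in counts.values())

# The TENNESSEE example: 9 letters with 4 x E, 2 x N, 2 x S, 1 x T
tennessee = multiset_permutations("TENNESSEE")
print(tennessee)

# Hypothetical race: ways to fill 1st, 2nd, and 3rd place among 8 runners
podium = k_permutations(8, 3)
print(podium)
```

Integer division (`//`) keeps the results as exact integers instead of floats, which matters once the factorials get large.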



## Combinations

- In a **combination**, you only care about which items are members of the set. 
    - For example, if you're creating groups of students to work on a project, the order in which you add their names to the group doesn't really matter. It's the group members themselves, not any ordering, that you care about.
    
> ***Combination: How many ways can we choose a subset of $k$ elements out of $n$ objects, when order is not important?*** 
$$\large C_{k}^{n} = \displaystyle\binom{n}{k} = \dfrac{P_{k}^{n}}{k!}=\dfrac{ \dfrac{n!}{(n-k)!}}{k!} = \dfrac{n!}{(n-k)!k!}$$


- Simplified combination equation
$$\large C_{k}^{n} = \dfrac{n!}{(n-k)!k!}$$
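
As a quick sanity check, the combination formula can be compared against `math.comb` (available in Python 3.8+) and against a brute-force count with `itertools.combinations`. The helper name `n_choose_k` and the values `n = 5`, `k = 2` are chosen only for illustration.

```python
from itertools import combinations
from math import comb, factorial

def n_choose_k(n, k):
    """Combination formula: n! / ((n-k)! k!)"""
    return factorial(n) // (factorial(n - k) * factorial(k))

n, k = 5, 2  # hypothetical: choose 2 items out of 5

by_formula = n_choose_k(n, k)
by_builtin = comb(n, k)
by_enumeration = len(list(combinations(range(n), k)))

print(by_formula, by_builtin, by_enumeration)
```

All three approaches should agree. Enumeration is only practical for small `n`, since the number of subsets grows quickly.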

## Practice Questions

In [2]:
import itertools
from math import factorial

### Q1: How many possible codes are there for a standard padlock?

> Hint: (there are 40 numbers on a padlock and use 3 numbers.)

This is an example of... 

 - Permutation or Combination ?

In [3]:
## A:
40**3

64000

#### A1:

For the first number: 40 choices. For the second number, still 40 choices: $40\cdot40=40^2$. Again, for 3rd number, still 40 choices: $40\cdot40\cdot40=40^3$
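
One way to convince yourself of the $40^3$ count is to enumerate every code. The sketch below uses `itertools.product`, which generates ordered selections with repeats allowed. The variable names are made up for this example.

```python
from itertools import product

n_numbers = 40    # numbers on the dial (from the hint)
n_positions = 3   # numbers in a code

# Every ordered code, with the same number allowed in more than one position
codes = list(product(range(n_numbers), repeat=n_positions))

print(len(codes))                  # count by enumeration
print(n_numbers ** n_positions)    # count by the formula n^j
```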

This is an example of... 

 - **Permutation** or Combination ?

In [4]:
# A:
40**3

64000

### Q2: How many unique 3 topping pizzas can you make from the following ingredients:

- Mushrooms
- Pepperoni
- Onion
- Peppers
- Ham
- Pineapple
- Sausage
- Olives
    
> Side note: which is the worst?

This is an example of... 

 - Permutation or Combination?

$$\large C_{k}^{n} = \dfrac{n!}{(n-k)!k!}$$

In [5]:
## A:
factorial(8)/(factorial(8-3)*factorial(3))

56.0

#### A: Q2

This is an example of... 

 - Permutation or **Combination**?

In [6]:
## Equation Answer:
factorial(8)/ (factorial(5)*factorial(3))

56.0

In [7]:
## Code answer
toppings = ["Mushrooms","Pepperoni","Onion",
            "Peppers","Ham","Pineapple","Sausage","Olives"]
three_topping_pizzas = list(itertools.combinations(toppings, 3))
three_topping_pizzas

[('Mushrooms', 'Pepperoni', 'Onion'),
 ('Mushrooms', 'Pepperoni', 'Peppers'),
 ('Mushrooms', 'Pepperoni', 'Ham'),
 ('Mushrooms', 'Pepperoni', 'Pineapple'),
 ('Mushrooms', 'Pepperoni', 'Sausage'),
 ('Mushrooms', 'Pepperoni', 'Olives'),
 ('Mushrooms', 'Onion', 'Peppers'),
 ('Mushrooms', 'Onion', 'Ham'),
 ('Mushrooms', 'Onion', 'Pineapple'),
 ('Mushrooms', 'Onion', 'Sausage'),
 ('Mushrooms', 'Onion', 'Olives'),
 ('Mushrooms', 'Peppers', 'Ham'),
 ('Mushrooms', 'Peppers', 'Pineapple'),
 ('Mushrooms', 'Peppers', 'Sausage'),
 ('Mushrooms', 'Peppers', 'Olives'),
 ('Mushrooms', 'Ham', 'Pineapple'),
 ('Mushrooms', 'Ham', 'Sausage'),
 ('Mushrooms', 'Ham', 'Olives'),
 ('Mushrooms', 'Pineapple', 'Sausage'),
 ('Mushrooms', 'Pineapple', 'Olives'),
 ('Mushrooms', 'Sausage', 'Olives'),
 ('Pepperoni', 'Onion', 'Peppers'),
 ('Pepperoni', 'Onion', 'Ham'),
 ('Pepperoni', 'Onion', 'Pineapple'),
 ('Pepperoni', 'Onion', 'Sausage'),
 ('Pepperoni', 'Onion', 'Olives'),
 ('Pepperoni', 'Peppers', 'Ham'),
 ('Pepper

## What is probability?

> **Probability is the likelihood of a specific outcome/event occurring out of all possible outcomes, expressed as a number between 0 and 1.**

Perhaps more importantly:

> **"Probabilities do not tell us what will happen for sure; they tell us what is _likely to happen_ and what is _less likely to happen_."**
>
> -- _Naked Statistics_, by Charles Wheelan, p. 72

<!---
Example Probability Qs:
- How likely is it to end up with heads when flipping a coin once? (the answer here is 50% - not very surprising)

- How likely is it to end up with exactly 2 x heads and 3 x tails when flipping a coin 5 times?

- How likely is it to throw tails first, then heads, then tails, then heads, then tails when flipping a coin 5 times?

- If you throw 5 dice, what is the probability of throwing a ["full house"](http://grail.sourceforge.net/demo/yahtzee/rules.html)?

- What is the probability of drawing 2 consecutive aces from a standard deck of cards?

> But how do we calculate it? ..._to be continued_...

--->

In general, you can think of dividing the outcome you're exploring by all possible outcomes:

$$ P(Event) = \frac{|Event|}{|Sample\ Space|} $$

##### Sample space:

$$S = \{ 1,2,3,4,5,6\}$$ 
being the possible outcomes when throwing a die.
- Sample space =  $\Omega$ 

##### Event space:

-   The **event space** is a subset of the sample space. It is the **desired outcome** of the experiment.
$$E \subseteq S$$
-   Example:
    -   Throwing an odd number would lead to an event space $E = \{ 1,3,5\}$.

### Probability of an Event

$$ P(E) = \frac{|E|}{|S|} $$
The probability of an event is the number of outcomes in the event divided by the number of outcomes in the sample space, assuming every outcome is equally likely.

### Probability axioms

1.  Positivity : 
    - A probability always satisfies $0 \leq P(E) \leq 1$


2.  Probability of a certain/guaranteed event:
    - $P(S)=1$


3.  Additivity: the probability of the union of two mutually exclusive events is the sum of their individual probabilities.
  
    - If $A\cap B = \emptyset $, then $P(A\cup B) = P(A) + P(B)$

#### Addition law of probability 

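-   The probability of the union of $A$ and $B$ is the sum of their individual probabilities minus the probability of their intersection (so the overlap is not counted twice):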

$$ \large P(A\cup B) = P(A) + P(B) - P(A \cap B)$$
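
The addition law can be verified by counting outcomes for a single die roll. In this sketch, `A` (odd roll) reuses the event from the example above, while `B` (roll greater than 3) and the helper `prob` are made up for illustration. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space: one roll of a fair die
A = {1, 3, 5}            # event: roll is odd
B = {4, 5, 6}            # event: roll is greater than 3 (hypothetical second event)

def prob(event, sample_space=S):
    """P(E) = |E| / |S| for equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

lhs = prob(A | B)                        # P(A or B), counted directly
rhs = prob(A) + prob(B) - prob(A & B)    # addition law

print(lhs, rhs)
```

Without the subtraction, the outcome `5` (which is in both events) would be counted twice.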

# 🕹 Activity: Dinner Party Playlist Permutations & Probabilities

- We are constructing a Spotify playlist for a dinner party that we are planning. 

- We asked our attendees to each provide a handful of songs they would like to be played at the dinner party.

- Each guest has provided their requests in a csv.

> - Load in and join the csvs into one df, keeping track of who requested what. 

In [8]:
import pandas as pd
import numpy as np
from math import factorial
import os,glob

## Our Guests requests
datafolder = "playlist_requests/"
request_files = glob.glob(datafolder+"*.csv")
request_files

['playlist_requests/joe_recs.csv',
 'playlist_requests/james_recs.csv',
 'playlist_requests/anne_recs.csv',
 'playlist_requests/carla_recs.csv',
 'playlist_requests/john_recs.csv',
 'playlist_requests/samantha_recs.csv']

In [9]:
## Make a playlist dictionary
playlists = {}

for file in request_files:
    ## Slice out the requester's name
    name = file.split('/')[-1].replace('_recs.csv','')
    
    ## load the csv file 
    temp_df = pd.read_csv(file)
    
    ## save the person's name in a new "Requested By" column
    temp_df['Requested By'] = name.title()
    
    ## save the csv as a df to the dict, using name as the key
    playlists[name] = temp_df
    del temp_df

    
playlists.keys()

dict_keys(['joe', 'james', 'anne', 'carla', 'john', 'samantha'])

In [10]:
## Loop through the dict and display the requester's name and their df
for name,playlist in playlists.items():
    
    ## Can use .style.set_caption(name) or just print name before display(df)
    display(playlist.style.set_caption(f"{name.title()}'s Requests:"))

Unnamed: 0,artist,track,Requested By
0,Green Day,Time of your Life,Joe
1,B-52s,Rock Lobster,Joe
2,Lady GaGa,Poker Face,Joe
3,John Lennon,Imagine,Joe


Unnamed: 0,artist,track,Requested By
0,Eve 6,Here's to the Night,James
1,Neutral Milk Hotel,Into the Aeroplane Over the Sea,James
2,Rilo Kiley,With Arms Outstretched,James
3,Red Hot Chili Peppers,Otherside,James


Unnamed: 0,artist,track,Requested By
0,Smashing Pumpkins,"Tonight, Tonight",Anne
1,Black Eyed Peas,Let's Get it Started,Anne
2,Green Day,Time of your Life,Anne


Unnamed: 0,artist,track,Requested By
0,Cartman (South Park),Poker Face,Carla
1,Nicki Minaj,Right By My Side,Carla
2,Kelly Clarkson,Since You've Been Gone,Carla
3,Nicki Minaj,Marilyn Monroe,Carla
4,Kelly Clarkson,Never Again,Carla
5,Green Day,Minority,Carla


Unnamed: 0,artist,track,Requested By
0,Black Eyed Peas,Let's Get it Started,John
1,Lady GaGa,Poker Face,John
2,Lady GaGa,Bad Romance,John
3,Lady GaGa,Just Dance,John


Unnamed: 0,artist,track,Requested By
0,Black Eyed Peas,Let's Get it Started,Samantha
1,Panic at the Disco,Hallelujah,Samantha
2,Adele,Set Fire to the Rain,Samantha


> - We want to make sure the soundscape at our party is representative of the group, so let's take everyone's recommendations (even if the same song has been recommended by someone else). 


In [11]:
## Create 1 df for all recs (and reset index)
df = pd.concat(playlists).reset_index(drop=True)
df

Unnamed: 0,artist,track,Requested By
0,Green Day,Time of your Life,Joe
1,B-52s,Rock Lobster,Joe
2,Lady GaGa,Poker Face,Joe
3,John Lennon,Imagine,Joe
4,Eve 6,Here's to the Night,James
5,Neutral Milk Hotel,Into the Aeroplane Over the Sea,James
6,Rilo Kiley,With Arms Outstretched,James
7,Red Hot Chili Peppers,Otherside,James
8,Smashing Pumpkins,"Tonight, Tonight",Anne
9,Black Eyed Peas,Let's Get it Started,Anne


## For the following questions:
> **Assume we just accept everyone's suggestions allowing duplicate songs and play on shuffle, with repeat set to None/False**

### Q1: What is the probability of the next song being by Lady GaGa?

$$ P(E) = \frac{|E|}{|S|} $$


In [12]:
## Use value_counts to make Sample Spaces for Tracks and artists
sample_space = df['artist'].value_counts()
sample_space

Lady GaGa                4
Green Day                3
Black Eyed Peas          3
Kelly Clarkson           2
Nicki Minaj              2
Adele                    1
Eve 6                    1
B-52s                    1
Cartman (South Park)     1
Smashing Pumpkins        1
John Lennon              1
Red Hot Chili Peppers    1
Rilo Kiley               1
Panic at the Disco       1
Neutral Milk Hotel       1
Name: artist, dtype: int64

In [15]:
## What would be the number of events that meet our criterion? |E|
E = sample_space.loc['Lady GaGa']
E

4

In [16]:
## P_lady_gaga 
P_lady_gaga = E/len(df)
P_lady_gaga

0.16666666666666666

### Q2: What is the probability of the next song being "Time of Your Life"?

In [17]:
## Making Sample Spaces for Tracks and artists
sample_space = df['track'].value_counts()
display(sample_space)

Poker Face                         3
Let's Get it Started               3
Time of your Life                  2
Bad Romance                        1
Here's to the Night                1
Hallelujah                         1
Tonight, Tonight                   1
Marilyn Monroe                     1
With Arms Outstretched             1
Rock Lobster                       1
Never Again                        1
Into the Aeroplane Over the Sea    1
Since You've Been Gone             1
Just Dance                         1
Right By My Side                   1
Otherside                          1
Imagine                            1
Minority                           1
Set Fire to the Rain               1
Name: track, dtype: int64

In [19]:
## What is the event space?
E = sample_space.loc['Time of your Life']
E

2

In [None]:
## What about the sample_space?
# S = None
# S

In [20]:
P_time_of_your_life = E/sample_space.sum()
P_time_of_your_life

0.08333333333333333

### Q3: what is the probability of hearing a song by Lady GaGa or Green Day?


In [21]:
## Get the sample space (song counts per artist)
sample_space = df['artist'].value_counts()
sample_space

Lady GaGa                4
Green Day                3
Black Eyed Peas          3
Kelly Clarkson           2
Nicki Minaj              2
Adele                    1
Eve 6                    1
B-52s                    1
Cartman (South Park)     1
Smashing Pumpkins        1
John Lennon              1
Red Hot Chili Peppers    1
Rilo Kiley               1
Panic at the Disco       1
Neutral Milk Hotel       1
Name: artist, dtype: int64

In [23]:
## Get the Event Space
E = sample_space.loc['Green Day']  + sample_space.loc['Lady GaGa']
E

7

In [24]:
## Get the sample space
S = sample_space.sum()
S

24

In [25]:
P_lady_gaga_or_greenday = E/S
P_lady_gaga_or_greenday

0.2916666666666667

### Q1: How many different ways could we build a playlist using everyone's recommendations (without shuffle, no looping)?

- Q: Combination or permutation? Which formula?
    - A:
   

In [30]:
tracks = df['track'].value_counts()
tracks[tracks>1]

Poker Face              3
Let's Get it Started    3
Time of your Life       2
Name: track, dtype: int64

In [38]:
from math import factorial
ans = factorial(len(df))

f"{ans:,}"
len(str(ans))

24

In [35]:
ans = factorial(len(df))/(factorial(3)*factorial(3)*factorial(2))
f"{ans:,}"

'8.617338912961659e+21'

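#### A:

What formula would we use?

- A: Permutation. If every request counts as its own entry (24 entries), then $$\large P(n) = n!$$
- If duplicate requests of the same track are treated as interchangeable, divide by the factorial of each repeat count (permutations with repetition), as in the second cell above: $\dfrac{24!}{3!\,3!\,2!}$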

### Q2: What if we limit the playlist to only 10 songs, without replacement? How many possible playlists?

- Q: Combination or permutation? Which formula?
    - A:

In [39]:
## k-permutations of n = 24 requests with k = 10: n! / (n-k)!
factorial(len(df))/factorial(len(df)-10)

7117005772800.0

#### A:

$$ \large P_{k}^{n}= \dfrac{n!}{(n-k)!}$$ 

In [41]:
df['track'].unique()

array(['Time of your Life', 'Rock Lobster', 'Poker Face', 'Imagine',
       "Here's to the Night", 'Into the Aeroplane Over the Sea',
       'With Arms Outstretched', 'Otherside', 'Tonight, Tonight',
       "Let's Get it Started", 'Right By My Side',
       "Since You've Been Gone", 'Marilyn Monroe', 'Never Again',
       'Minority', 'Bad Romance', 'Just Dance', 'Hallelujah',
       'Set Fire to the Rain'], dtype=object)

In [48]:
n = len(df)
k =  10

n,k

(24, 10)

In [49]:
"{:,}".format((factorial(n)/factorial(n-k))/(factorial(3)*factorial(3)*factorial(2)))

'98,847,302,400.0'

### Q3: what if we limit the playlist to 10 songs, WITH replacement?

- Q: Combination or permutation? Which formula?

#### A:

$$ \large {P}_{j}^{n} = n^j $$

### Q4: what if we select 10 songs out of the total number of suggestions and allow for repetition?

- Q: Combination or permutation? Which formula?

#### A:

$$\large C_{k}^{n} = \displaystyle\binom{n}{k} = \dfrac{P_{k}^{n}}{k!}=\dfrac{ \dfrac{n!}{(n-k)!}}{k!} = \dfrac{n!}{(n-k)!k!}$$

## So....  We realize we need to relax and not worry about the song-order. That's what Shuffle is for, right? 😅

### Q5: How many playlists can we produce for an 8-track playlist from the unique suggested songs (10)?

- Q: Combination or permutation? Which formula?
    - A:

#### A:

$$\large C_{k}^{n} = \displaystyle\binom{n}{k} = \dfrac{P_{k}^{n}}{k!}=\dfrac{ \dfrac{n!}{(n-k)!}}{k!} = \dfrac{n!}{(n-k)!k!}$$

In [50]:
n = 10
k = 8
factorial(n)/(factorial(n-k)*factorial(k))

45.0

# CONDITIONAL PROBABILITY

**Conditional probability emerges when the outcome of a trial may influence the results of upcoming trials (when we have dependent events).**


<img src="https://raw.githubusercontent.com/jirvingphd/dsc-conditional-probability-online-ds-ft-100719/master/images/Image_71_TreeDiag.png" width = 500>

$$ P (A \mid B) = \dfrac{P(A \cap B)}{P(B)}$$

$P(A|B)$ is the probability of $A$ **given** that $B$ has occurred. 
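
Here is a small sketch of the conditional probability formula using one roll of a fair die. The events `A` and `B` and the helper `prob` are made up for illustration. The same answer comes out whether you apply the formula or simply shrink the sample space down to `B` and count.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {2, 4, 6}            # event: roll is even
B = {4, 5, 6}            # event: roll is greater than 3

def prob(event):
    """P(E) = |E| / |S| for equally likely outcomes."""
    return Fraction(len(event), len(S))

# Formula: P(A | B) = P(A and B) / P(B)
p_a_given_b = prob(A & B) / prob(B)

# Same thing by counting: treat B as the new, smaller sample space
p_a_given_b_by_counting = Fraction(len(A & B), len(B))

# A and B are independent only if P(A and B) == P(A) * P(B)
are_independent = prob(A & B) == prob(A) * prob(B)

print(p_a_given_b, p_a_given_b_by_counting, are_independent)
```

Knowing that the roll is greater than 3 changes the chance that it is even, so these two events are dependent.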

## Types of Events

#### Independent Events

**Events $A$ and $B$ are independent when the occurrence of $A$ has no effect on whether $B$ will occur (or not).**
<!-- 
- A and B are independent if:
    - $P(A \cap B) = P(A)\cdot P(B)$

 <img src="https://raw.githubusercontent.com/jirvingphd/dsc-conditional-probability-online-ds-ft-100719/master/images/Image_67_independent.png" width=30%>

- Probability of A or B occurring:
    - $P (A \cup B) = P(A) + P(B) - P(A \cap B)$

 -->


#### Disjoint Events




**Events $A$ and $B$ are disjoint if $A$ occurring means that $B$ cannot occur.**

Disjoint events are **mutually exclusive**. $A \cap B$ is **empty**, so $P (A \cap B) = 0$.
<!-- 
<img src="https://raw.githubusercontent.com/jirvingphd/dsc-conditional-probability-online-ds-ft-100719/master/images/Image_68Disjoint.png" width=30%>

 -->

#### Dependent Events

**Events $A$ and $B$ are dependent when the occurrence of $A$ somehow has an effect on whether $B$ will occur (or not).**

<img src="https://raw.githubusercontent.com/learn-co-students/dsc-conditional-probability-onl01-dtsc-ft-070620/master/images/Image_69_Marb.png" width=50%>




# 🕹 Activity Part 2: Dinner Party Playlist - Conditional Probabilities

In [51]:
from math import factorial

### Q: What is the probability of hearing "Poker Face"?

In [52]:
## Get sample space
sample_space = df['track'].value_counts()
sample_space

Poker Face                         3
Let's Get it Started               3
Time of your Life                  2
Bad Romance                        1
Here's to the Night                1
Hallelujah                         1
Tonight, Tonight                   1
Marilyn Monroe                     1
With Arms Outstretched             1
Rock Lobster                       1
Never Again                        1
Into the Aeroplane Over the Sea    1
Since You've Been Gone             1
Just Dance                         1
Right By My Side                   1
Otherside                          1
Imagine                            1
Minority                           1
Set Fire to the Rain               1
Name: track, dtype: int64

In [55]:
## Get event space
E = sample_space.loc["Poker Face"]
E

3

In [56]:
## Calculate the probability: |E| / |S|
P_poker_face = E/sample_space.sum()
P_poker_face

0.125

### Q: what is the probability that the song playing is "Poker Face", given that the song is by Lady GaGa?

- **What would be the formula to calculate $P(\text{PokerFace}|\text{LadyGaga})$  ?**

A:


In [60]:
sample_space = df.groupby('artist').get_group('Lady GaGa')['track'].value_counts()
sample_space

Poker Face     2
Bad Romance    1
Just Dance     1
Name: track, dtype: int64

In [None]:
E =  None
E

In [None]:
S = None
S

In [None]:
# E/S

### Q: What is the  probability that the song playing is by Lady GaGa, given that it is titled "Poker Face"?


In [None]:
poker_face_space = None
sample_space = None
sample_space

In [None]:
E = None
S = None

In [None]:
# E/S

# The Law of Total Probability


<img src="https://raw.githubusercontent.com/jirvingphd/dsc-law-of-total-probability-online-ds-ft-100719/master/images/Image_55_TotProb.png" width=50%>

$$\large P(B)= \sum_i P(B \cap A_i)= \sum_i P(B \mid A_i)P(A_i)$$


- This law allows us to calculate $P(B)$ from the partial/conditional probabilities of $B$ within each subset $A_i$.
- Requires that the different $A_i$ that make up sample space $S$ be disjoint events.


$A_1, A_2, \dots, A_n$ partition sample space $S$ into disjoint regions whose union is $S$.
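
The formula translates almost directly into code. Below is a minimal sketch, not a definitive implementation: the function name `total_probability`, the region labels, and all of the probabilities are hypothetical values chosen only to show the mechanics.

```python
def total_probability(cond_probs, partition_probs):
    """P(B) = sum_i P(B | A_i) * P(A_i).

    cond_probs:      dict mapping each region A_i to P(B | A_i)
    partition_probs: dict mapping each region A_i to P(A_i)
    """
    # The A_i must partition the sample space, so their probabilities sum to 1
    if abs(sum(partition_probs.values()) - 1) > 1e-9:
        raise ValueError("partition probabilities must sum to 1")
    return sum(cond_probs[region] * p for region, p in partition_probs.items())

# Hypothetical partition of S into three disjoint regions
p_region = {"A1": 0.5, "A2": 0.3, "A3": 0.2}

# Hypothetical conditional probabilities P(B | A_i)
p_b_given_region = {"A1": 0.1, "A2": 0.4, "A3": 0.5}

p_b = total_probability(p_b_given_region, p_region)
print(round(p_b, 4))
```

Each term weights how likely $B$ is inside a region by how likely that region is in the first place.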




<img src="https://raw.githubusercontent.com/learn-co-students/dsc-law-of-total-probability-onl01-dtsc-ft-070620/master/images/Image_56_vent.png" width=50%>





## 🕹 Activity Pt 4: Law of Total House Party Playlists

- We've decided to be a little more adventurous and turn our dinner party into a larger house party.
    - The house party is spread across 4 rooms, and we assume guests split their time evenly across them:
        - living room
        - basement
        - back patio
        - kitchen
        
- We have separate playlists playing at each location that were constructed with our dinner party recommendations.


#### OUR HOUSE PARTY & LAW OF TOTAL PROB
- Our House Party = space $S$
- The 4 rooms in the house are A1,A2,A3,A4
- $B$ is the event of hearing a specific song or artist as you wander the house; $P(B)$ is the probability we want.

<img src="https://raw.githubusercontent.com/jirvingphd/dsc-law-of-total-probability-online-ds-ft-100719/master/images/Image_55_TotProb.png" width=50%>

$$P(B)= \sum_i P(B \cap A_i)= \sum_i P(B \mid A_i)P(A_i)$$




In [None]:
import os
folder = "playlist_requests/house_party/"
os.makedirs(folder,exist_ok=True)

house_party = dict(living_room = df.sample(12,random_state=12).reset_index(drop=True).copy(),
                   basement = df.sample(10,random_state=321).reset_index(drop=True).copy(), 
                   back_patio = df.sample(9,random_state=42).reset_index(drop=True).copy(),
                  kitchen=df.sample(8,random_state=3210).reset_index(drop=True).copy())

## Save for later

for room,room_df in house_party.items():
    room_df.to_csv(os.path.join(folder, f"{room}.csv"),index=False)
    
## Preview
for k,df_ in house_party.items():
    df_['Room'] = k 
    display(df_.style.set_caption(f"Playlist for {k}"))
    

### Q1: What is the probability of hearing a Green Day song at the house party at any given moment?

####  To Calculate $P(GD)$ for a Room

$$ P(\text{Green Day})=\sum_i P(\text{Green Day} \mid \text{Room}_i)P(\text{Room}_i)$$

- Q: **With our 4 rooms, what would our equation look like?**
    - What is the probability of being in each room?

##### A:

$$P(\text{Green Day})= P(GD|Room1)\times \frac{1}{4} + P(GD|Room2)\times \frac{1}{4} + P(GD|Room3)\times \frac{1}{4} + P(GD|Room4)\times \frac{1}{4} $$

In [None]:
## Make a dictionary of prob of being in each room
room_probs = {'living_room':0.25, 
              'basement':0.25,
              'back_patio':0.25,
              'kitchen':0.25}
room_probs

In [None]:
## Lets do 1 example room - living room


## Get room sample space

## Get P_gd_given_room

## Multiply cond prob by prob being in the room


#### Let's turn that process into a function

In [None]:
def prob_event_given_room():
    pass
    
    #### Pull out current room_df from dict


    ## Get room sample space


    
    ## Get P_gd_given_room
    
        
    ## Set prob=0 if event does not exist.
    #  P = P * room_probs[room] 
    ## Print the Prob event given room (if verbose==True)
   
    ## Return the Prob 
    


In [None]:
## Test function


In [None]:
## Let's try out the function for Green Day on the back patio



In [None]:
## Calculate Total Probability using a for loop

## Get rooms from house_party
    
    ## Get conditional prob for room

    
    ## Get the prob of being in that room

    
    ## Append p_gd_given_room * p_room



In [None]:
## Checking against actual values


### Q1B: But wait...what if we have unequal probabilities for being in each room?



- The true probability of people being in each room is determined by what is going on in that room.
    - The snacks are in the kitchen (prob=0.4)
    - The drinks/bar is on the back patio (prob=0.3).
    - The living room and basement have no special amenities. (prob=0.15 each)

In [None]:
## Update room_probs with the new (unequal) room probabilities
room_probs['kitchen'] = 0.4
room_probs['back_patio'] = 0.3
room_probs['living_room'] =0.15
room_probs['basement'] =0.15
room_probs

### Q2: What is the probability of hearing a Lady GaGa song at the house party at any given moment with the new room probabilities?

> The **easiest way** to do this would be to copy the for loop above and paste it below, then modify all of the variables to match the new question...

> So INSTEAD of that, **let's do it a better, more programmatic way**: take the code we produced above to calculate TOTAL_PROB and let's turn it into a function.

In [None]:
## Paste your loop from Green Day question below for reference


In [None]:
## Now how can we make that process flexible?
def law_of_total_probability():
    pass

In [None]:
## test function


### Q: what is the probability of hearing a song recommended by Anne?

$$ P(AnneRec)=\sum_i P(AnneRec \mid Room_i)P(Room_i)$$
- Since we made our function flexible, we can easily calculate this. 


In [None]:
## Compare that to getting val counts from whole df


# APPENDIX

### Factorials


The factorial of $n$ is calculated by multiplying together every positive integer from $1$ up to and including $n$.

$$n! = n  \times (n-1) \times (n-2)\times...\times1$$



**Factorial Rules/Operations:**
- Negative numbers do not have a factorial.
- $0! = 1$
- $n! = (n-1)! \cdot n$
- $(n+1)! = n! \cdot (n+1)$ 

- $ (n+k)! = n! \cdot (n+1) \cdot (n+2) \cdot... (n+k) $

- $ (n-k)! = \frac{n!}{(n-k+1)\cdot(n-k+2) \cdot ... (n-k+k) }$
- When $n>k$:
    - $\frac{n!}{k!} = (k+1)\cdot(k+2)\cdot...n $
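
The rules above can be spot-checked numerically with `math.factorial`. This sketch uses hypothetical values `n = 7` and `k = 3`. A numeric check for one pair of values is not a proof, but it is a quick way to catch a misremembered identity.

```python
from math import factorial, prod

n, k = 7, 3  # hypothetical values with n > k

checks = {
    "0! = 1": factorial(0) == 1,
    "n! = (n-1)! * n": factorial(n) == factorial(n - 1) * n,
    "(n+1)! = n! * (n+1)": factorial(n + 1) == factorial(n) * (n + 1),
    "(n+k)! = n! * (n+1)...(n+k)":
        factorial(n + k) == factorial(n) * prod(range(n + 1, n + k + 1)),
    "(n-k)! = n! / ((n-k+1)...n)":
        factorial(n - k) == factorial(n) // prod(range(n - k + 1, n + 1)),
    "n!/k! = (k+1)...n":
        factorial(n) // factorial(k) == prod(range(k + 1, n + 1)),
}

for rule, holds in checks.items():
    print(f"{rule}: {holds}")

# Negative numbers do not have a factorial: math.factorial raises ValueError
try:
    factorial(-1)
    negative_allowed = True
except ValueError:
    negative_allowed = False
print("factorial(-1) allowed:", negative_allowed)
```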

## Laws & Theorems Based on Conditional Probability


#### Theorem 1: Product Rule

The intersection of events $A$ and $B$ can be given by

\begin{align}
    \large P(A \cap B) = P(B) P(A \mid B) = P(A) P(B \mid A)
\end{align}




#### Theorem 2: Chain Rule AKA "General Product Rule"

- Allows calculation of any member of the joint distribution of a set of random variables using _only_ conditional probabilities.

- Builds upon the product rule: 
$$P(A \cap B) = P(A \mid B) P(B)$$

- When applied to 3 variables, becomes:


$$P(A\cap B \cap C) = P(A\cap( B \cap C))$$
$$ = P(A\mid B \cap C) P(B \cap C)$$
$$ = P(A \mid B \cap C) P(B \mid C) P(C)$$

And you can keep extending this to $n$ variables.
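
As an illustration of the chain rule, consider drawing three aces in a row from a standard 52-card deck without replacement. This scenario is made up for the example. Each factor is a conditional probability given the earlier draws, and the result can be cross-checked by counting combinations.

```python
from fractions import Fraction
from math import comb

# Chain rule: P(A1 and A2 and A3) = P(A1) * P(A2 | A1) * P(A3 | A1 and A2)
p_first_ace = Fraction(4, 52)    # 4 aces among 52 cards
p_second_ace = Fraction(3, 51)   # 3 aces left among 51 cards, given the first was an ace
p_third_ace = Fraction(2, 50)    # 2 aces left among 50 cards, given the first two were aces

p_three_aces = p_first_ace * p_second_ace * p_third_ace

# Cross-check by counting: 3-card hands made only of aces / all 3-card hands
p_by_counting = Fraction(comb(4, 3), comb(52, 3))

print(p_three_aces, p_by_counting)
```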



#### Theorem 3 - Bayes Theorem



**Bayes' theorem** is where this section has been heading. Below is the formula, which we will dig deeper into in upcoming lessons.

$$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $$

- It follows from the product rule: $P(A \cap B) = P(B) P(A \mid B) = P(A) P(B \mid A)$. 
- Note that, using Bayes theorem, you **can compute conditional probabilities without explicitly needing to know $P(A \cap B)$!** 
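
As a sketch, Bayes' theorem can be applied to counts that appear in the playlist activity: 24 requests in total, 4 of them by Lady GaGa, 3 of them titled "Poker Face", and 2 of Lady GaGa's 4 requests being "Poker Face". The variable names are made up for this example.

```python
from fractions import Fraction

# Counts taken from the playlist value_counts outputs in the activity
p_gaga = Fraction(4, 24)                    # P(Lady GaGa)
p_poker_face = Fraction(3, 24)              # P("Poker Face")
p_poker_face_given_gaga = Fraction(2, 4)    # P("Poker Face" | Lady GaGa)

# Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)
p_gaga_given_poker_face = p_poker_face_given_gaga * p_gaga / p_poker_face

print(p_gaga_given_poker_face)
```

This matches a direct count: of the 3 "Poker Face" requests, 2 are the Lady GaGa version.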


#### Additional note: the complement of an event
- Basic complements:
$$P(A) + P(A') = 1$$
with A' being the complement of A.

- Conditional Probability Complements

$$P(A|B) + P(A'|B) = 1$$