# Topic 11A - Intro to Combinatorics & Probability 


## Learning Objectives


- Briefly discuss sets and set notation
- Learn to recognize permutations vs. combinations
- Review: what is probability?
- Activity: Playlist Probabilities
- Dependent vs. independent events, AKA when to use conditional probability

# Sets

Sets are collections of elements. The language and notation used to talk about sets show up in a lot of other places, so it's nice to lay the groundwork here.

## Set notation
<img src="images/setnotation.jpg" alt="set notation, found from https://slideplayer.com/slide/10502152/" width=800>

[Image Source](https://slideplayer.com/slide/10502152/)

More on Sets: https://www.mathsisfun.com/sets/sets-introduction.html

### But Now, in Python:

 

| Method | Equivalent | Result |
| ------ | ------ | ------ |
| s.issubset(t) | s <= t | test whether every element in s is in t |
| s.issuperset(t) | s >= t | test whether every element in t is in s |
| s.union(t) | s $\mid$ t | new set with elements from both s and t |
| s.intersection(t) | s & t | new set with elements common to s and t |
| s.difference(t) | s - t | new set with elements in s but not in t |
| s.symmetric_difference(t) | s ^ t | new set with elements in either s or t but not both |
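
Here is a quick sketch of the operators from the table in action. The sets `s` and `t` are made-up values for illustration only; they are not part of the lesson data.

```python
# Two small example sets (illustrative values only)
s = {1, 2, 3}
t = {2, 3, 4, 5}

print(s <= t)   # is every element of s also in t?
print(s | t)    # union
print(s & t)    # intersection
print(s - t)    # difference: in s but not in t
print(s ^ t)    # symmetric difference: in exactly one of the two

# The method form and the operator form give the same result
same_result = s.union(t) == (s | t)
subset_check = {2, 3}.issubset(t)
```

The method versions (e.g. `s.union(t)`) also accept any iterable, such as a list, while the operator versions require both sides to be sets.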

# Permutations vs. Combinations

Now let's talk about how you can take elements from different collections and group them.

## What's the difference between a permutation and a combination?

## Permutations

- In a **permutation**, order matters.
    - If you have a race, it matters who arrives in first, or second, or third place - there's a difference in the ordering of the group!

> ***Permutation: How many ways to arrange n elements?***
    $$\large P(n) = n!$$
___
> ***Permutations with replacement: How many ways to select $j$ elements out of a pool of $n$ objects?***
$$ \large {P}_{j}^{n} = n^j $$
- where $n$ = total number of elements
- $j$ = number of positions to fill   


___
> ***Permutations (without replacement): How many ways to select $k$ elements out of a pool of $n$ objects? (AKA permutations of a subset)***
$$ \large P_{k}^{n}= \dfrac{n!}{(n-k)!}$$ <br>this is known as the **$k$-permutation of $n$**
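
To see the three permutation formulas side by side, here is a small sketch that counts the arrangements by brute force with `itertools` and compares each count to its formula. The pool `items` is a hypothetical example, and `math.perm` assumes Python 3.8 or newer.

```python
import itertools
from math import factorial, perm

items = ["A", "B", "C", "D"]   # hypothetical pool, n = 4
n, k = len(items), 2

# Arrange all n elements: n!
n_full = len(list(itertools.permutations(items)))

# Fill k positions WITH replacement: n**k
n_with_repl = len(list(itertools.product(items, repeat=k)))

# Fill k positions WITHOUT replacement: n! / (n-k)!
n_without_repl = len(list(itertools.permutations(items, k)))

print(n_full, factorial(n))
print(n_with_repl, n**k)
print(n_without_repl, perm(n, k))
```

Each printed pair should match, since the brute-force count and the formula describe the same thing.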
    
    


## Combinations

- In a **combination**, you only care about which items are members of the set. 
    - For example, if you're creating groups of students to work on a project, the order in which you add their names to the group doesn't really matter - it's the group members themselves, not any order, that you care about.
    
> ***Combination: How many ways can we create a subset $k$ out of $n$ objects, when order is not important?*** 
$$\large C_{k}^{n} = \displaystyle\binom{n}{k} = \dfrac{P_{k}^{n}}{k!}=\dfrac{ \dfrac{n!}{(n-k)!}}{k!} = \dfrac{n!}{(n-k)!k!}$$


- Simplified combination equation
$$\large C_{k}^{n} = \dfrac{n!}{(n-k)!k!}$$
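
As a rough check of the combination formula, the sketch below lists every group of 3 from a hypothetical roster of 5 students and compares the count to `math.comb` (Python 3.8+) and to the "permutations divided by $k!$" form. The names are invented for illustration.

```python
import itertools
from math import comb, factorial, perm

students = ["Ana", "Ben", "Cy", "Dee", "Eli"]   # hypothetical roster
n, k = len(students), 3

# Every unordered group of k students
groups = list(itertools.combinations(students, k))
n_groups = len(groups)

print(n_groups)                      # brute-force count
print(comb(n, k))                    # n! / ((n-k)! k!)
print(perm(n, k) // factorial(k))    # k-permutations divided by k!
```

Dividing by $k!$ removes the orderings of the same group, which is exactly the difference between a permutation and a combination.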

## Practice Questions

### Q1: How many possible codes are there for a standard padlock?

> Hint: there are 40 numbers on a padlock dial, and a code uses 3 numbers (repeats allowed).

For the first number: 40 choices. For the second number, still 40 choices: $40\cdot40=40^2$. Again, for 3rd number, still 40 choices: $40\cdot40\cdot40=40^3$

This is an example of... 

 - **Permutation** or Combination ?

In [1]:
# A:
40**3

64000

### Q2: How many unique 3 topping pizzas can you make from the following ingredients:

- Mushrooms
- Pepperoni
- Onion
- Peppers
- Ham
- Pineapple
- Sausage
- Olives
    
> Side note: which is the worst?

This is an example of... 

 - Permutation or **Combination**?

In [2]:
import itertools

In [3]:
toppings = ["Mushrooms","Pepperoni","Onion","Peppers","Ham","Pineapple","Sausage","Olives"]
three_topping_pizzas = list(itertools.combinations(toppings, 3))
len(three_topping_pizzas)

56

## What is probability?

> **Probability is the likelihood of a specific outcome/event occurring out of all possible outcomes, expressed as a number between 0 and 1.**

Perhaps more importantly:

> **"Probabilities do not tell us what will happen for sure; they tell us what is _likely to happen_ and what is _less likely to happen_."**
>
> -- _Naked Statistics_, by Charles Wheelan, p. 72

<!---
Example Probability Qs:
- How likely is it to end up with heads when flipping a coin once? (the answer here is 50% - not very surprising)

- How likely is it to end up with exactly 2 x heads and 3 x tails when flipping a coin 5 times?

- How likely is it to throw tails first, then heads, then tails, then heads, then tails when flipping a coin 5 times?

- If you throw 5 dice, what is the probability of throwing a ["full house"](http://grail.sourceforge.net/demo/yahtzee/rules.html)?

- What is the probability of drawing 2 consecutive aces from a standard deck of cards?

> But how do we calculate it? ..._to be continued_...

--->

In general, you can think of dividing the outcome you're exploring by all possible outcomes:

$$ P(Event) = \frac{|Event|}{|Sample\ Space|} $$

##### Sample space:

$$S = \{ 1,2,3,4,5,6\}$$ 
being the possible outcomes when throwing a die.
- Sample space =  $\Omega$ 

##### Event space:

-   The **event space** is a subset of the sample space. It is the **desired outcome** of the experiment.
$$E \subseteq S$$
-   Example:
    -   Throwing an odd number would lead to an event space $$E = \{ 1,3,5\}$$.

### Probability of an Event

$$ P(E) = \frac{|E|}{|S|} $$
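That is, the probability of an event is the number of outcomes in the event divided by the total number of outcomes in the sample space. This counting rule assumes every outcome is equally likely.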
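
The die example above can be sketched in a few lines. `Fraction` is used only so the result prints as an exact fraction.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}               # sample space: one throw of a die
E = {x for x in S if x % 2 == 1}     # event: throwing an odd number

P_E = Fraction(len(E), len(S))       # |E| / |S|
print(E, P_E)
```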

<div style=color:red;>

### Addition law of probability

-   The probability of the union of $A$ and $B$ is the sum of their individual probabilities minus the probability of their intersection:

$$ \large P(A\cup B) = P(A) + P(B) - P(A \cap B)$$

</div>
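
A small sketch of the addition law on a single die. Event `B` ("greater than 3") is an assumption picked so that the two events overlap.

```python
from fractions import Fraction

S = set(range(1, 7))   # one throw of a die
A = {1, 3, 5}          # odd number
B = {4, 5, 6}          # greater than 3 (illustrative choice)

def prob(event):
    """P(event) when every outcome in S is equally likely."""
    return Fraction(len(event), len(S))

lhs = prob(A | B)                         # P(A or B), counted directly
rhs = prob(A) + prob(B) - prob(A & B)     # addition law
print(lhs, rhs)
```

Without subtracting $P(A \cap B)$, the outcome 5 would be counted twice.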

# 🕹 Activity: Dinner Party Playlist Permutations & Probabilities

- We are constructing a Spotify playlist for a dinner party that we are planning. 

- We asked our attendees to each provide a handful of songs they would like to be played at the dinner party.

- Each guest has provided their requests in a csv.

> - Load in and join the csvs into one df, keeping track of who requested what. 

In [5]:
import pandas as pd
import numpy as np
from math import factorial
import os,glob

## Our Guests requests
datafolder = "playlist_requests/"
request_files = glob.glob(datafolder+"*.csv")
request_files

['playlist_requests/joe_recs.csv',
 'playlist_requests/james_recs.csv',
 'playlist_requests/anne_recs.csv',
 'playlist_requests/carla_recs.csv',
 'playlist_requests/john_recs.csv',
 'playlist_requests/samantha_recs.csv']

In [6]:
## Make a playlist dictionary
playlists = {}

for file in request_files:
    ## Slice out the requester's name
    name = file.split('/')[-1].replace('_recs.csv','')
    
    ## load the csv file 
    temp_df = pd.read_csv(file)
    
    ## save the person's name in a new "Requested By" column
    temp_df['Requested By'] = name.title()
    
    ## save the csv as a df to the dict, using name as the key
    playlists[name] = temp_df
    del temp_df

    
playlists.keys()

dict_keys(['joe', 'james', 'anne', 'carla', 'john', 'samantha'])

In [7]:
## Loop through the dict and display the requester's name and their df
for name,playlist in playlists.items():
    
    ## Can use .style.set_caption(name) or just print name before display(df)
    display(playlist.style.set_caption(f"{name.title()}'s Requests:"))

Unnamed: 0,artist,track,Requested By
0,Green Day,Time of your Life,Joe
1,B-52s,Rock Lobster,Joe
2,Lady GaGa,Poker Face,Joe
3,John Lennon,Imagine,Joe


Unnamed: 0,artist,track,Requested By
0,Eve 6,Here's to the Night,James
1,Neutral Milk Hotel,Into the Aeroplane Over the Sea,James
2,Rilo Kiley,With Arms Outstretched,James
3,Red Hot Chili Peppers,Otherside,James


Unnamed: 0,artist,track,Requested By
0,Smashing Pumpkins,"Tonight, Tonight",Anne
1,Black Eyed Peas,Let's Get it Started,Anne
2,Green Day,Time of your Life,Anne


Unnamed: 0,artist,track,Requested By
0,Cartman (South Park),Poker Face,Carla
1,Nicki Minaj,Right By My Side,Carla
2,Kelly Clarkson,Since You've Been Gone,Carla
3,Nicki Minaj,Marilyn Monroe,Carla
4,Kelly Clarkson,Never Again,Carla
5,Green Day,Minority,Carla


Unnamed: 0,artist,track,Requested By
0,Black Eyed Peas,Let's Get it Started,John
1,Lady GaGa,Poker Face,John
2,Lady GaGa,Bad Romance,John
3,Lady GaGa,Just Dance,John


Unnamed: 0,artist,track,Requested By
0,Black Eyed Peas,Let's Get it Started,Samantha
1,Panic at the Disco,Hallelujah,Samantha
2,Adele,Set Fire to the Rain,Samantha


> - We want to make sure the soundscape at our party is representative of the group, so let's take everyone's recommendations (even if the same song has been recommended by someone else). 


In [8]:
## Create 1 df for all recs (and reset index)
df = pd.concat(playlists).reset_index(drop=True)
df

Unnamed: 0,artist,track,Requested By
0,Green Day,Time of your Life,Joe
1,B-52s,Rock Lobster,Joe
2,Lady GaGa,Poker Face,Joe
3,John Lennon,Imagine,Joe
4,Eve 6,Here's to the Night,James
5,Neutral Milk Hotel,Into the Aeroplane Over the Sea,James
6,Rilo Kiley,With Arms Outstretched,James
7,Red Hot Chili Peppers,Otherside,James
8,Smashing Pumpkins,"Tonight, Tonight",Anne
9,Black Eyed Peas,Let's Get it Started,Anne


## For the following questions:
> **Assume we just accept everyone's suggestions allowing duplicate songs and play on shuffle, with repeat set to None/False**

### Q1: What is the probability of the next song being by Lady GaGa?

$$ P(E) = \frac{|E|}{|S|} $$


In [9]:
## Use value_counts to make the sample space for artists
sample_space = df['artist'].value_counts()
sample_space

Lady GaGa                4
Black Eyed Peas          3
Green Day                3
Nicki Minaj              2
Kelly Clarkson           2
B-52s                    1
Neutral Milk Hotel       1
Panic at the Disco       1
Red Hot Chili Peppers    1
Adele                    1
John Lennon              1
Cartman (South Park)     1
Smashing Pumpkins        1
Rilo Kiley               1
Eve 6                    1
Name: artist, dtype: int64

In [10]:
## What would be the number of events that meet our criterion? |E|
E = sample_space['Lady GaGa']
E

4

In [11]:
## What about the sample_space?
S = sample_space.sum()
S

24

In [12]:
## P_lady_gaga 
P_lady_gaga = E/S
P_lady_gaga

0.16666666666666666

### Q2: What is the probability of the next song being "Time of Your Life"?

In [13]:
## Make the sample space for tracks
sample_space = df['track'].value_counts()
display(sample_space)

Let's Get it Started               3
Poker Face                         3
Time of your Life                  2
Here's to the Night                1
Into the Aeroplane Over the Sea    1
Set Fire to the Rain               1
Otherside                          1
Hallelujah                         1
Marilyn Monroe                     1
With Arms Outstretched             1
Rock Lobster                       1
Minority                           1
Right By My Side                   1
Just Dance                         1
Bad Romance                        1
Imagine                            1
Tonight, Tonight                   1
Never Again                        1
Since You've Been Gone             1
Name: track, dtype: int64

In [14]:
## What is the event space?
E = sample_space.loc['Time of your Life']
E

2

In [15]:
## What about the sample_space?
S = sample_space.sum()
S

24

In [16]:
P_time_of_your_life = E/S
P_time_of_your_life

0.08333333333333333

### Q3: What is the probability of hearing a song by Lady GaGa or Green Day?


In [17]:
## Get the sample space (counts per artist)
sample_space = df['artist'].value_counts()
sample_space

Lady GaGa                4
Black Eyed Peas          3
Green Day                3
Nicki Minaj              2
Kelly Clarkson           2
B-52s                    1
Neutral Milk Hotel       1
Panic at the Disco       1
Red Hot Chili Peppers    1
Adele                    1
John Lennon              1
Cartman (South Park)     1
Smashing Pumpkins        1
Rilo Kiley               1
Eve 6                    1
Name: artist, dtype: int64

In [18]:
## Get the Event Space
E = sample_space.loc['Lady GaGa'] + sample_space.loc['Green Day']
E

7

In [19]:
## Get the sample space
S = sample_space.sum()
S

24

In [20]:
P_lady_gaga_or_greenday = E/S
P_lady_gaga_or_greenday

0.2916666666666667
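
Adding the two counts works here because each row of the playlist has exactly one artist, so "Lady GaGa" and "Green Day" are disjoint and the intersection term of the addition law is zero. A minimal sketch, with the counts copied by hand from the `value_counts` output above:

```python
from fractions import Fraction

# Counts copied from the artist value_counts output above
n_total = 24
n_gaga = 4
n_green_day = 3
n_both = 0   # no single row is by both artists

P_union = (Fraction(n_gaga, n_total)
           + Fraction(n_green_day, n_total)
           - Fraction(n_both, n_total))
print(P_union, float(P_union))
```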

### Q1: How many different ways could we build a playlist using everyone's recommendations (without shuffle, no looping, and no repeated songs)?

- Q: Combination or permutation?
    - A: Permutation (the order of the songs matters)
   

- Q: What formula would we use?
    - A:   $$\large P(n) = n!$$

In [21]:
from math import factorial
perm_n = factorial(len(df))
ans = f"{perm_n:,d}"
ans

'620,448,401,733,239,439,360,000'

### Q2: What if we limit the playlist to only 10 songs, without replacement? How many possible playlists?

- Q: Combination or permutation?
    - A: Permutation (without replacement)

- Q: What formula would we use?
    - A:$$ \large P_{k}^{n}= \dfrac{n!}{(n-k)!}$$ 



In [22]:
n = len(df)
k = 10
n,k

(24, 10)

In [23]:
## Integer division (//) keeps the result an exact int
"{:,}".format(factorial(n)//factorial(n-k))

'7,117,005,772,800'

### Q3: What if we limit the playlist to 10 songs, WITH replacement?

- Q: Combination or permutation?
    - A: Permutation (with replacement)

- Q: What formula would we use?
    - A: $$ \large {P}_{j}^{n} = n^j $$


In [24]:
"{:,}".format(18**10)

'3,570,467,226,624'

### Q4: What if we select 10 songs out of the total number of suggestions and allow for repetition?

- Q: Combination or permutation?
    - A: Combination

- Q: What formula would we use?
    - A:
    $$\large C_{k}^{n} = \displaystyle\binom{n}{k} = \dfrac{P_{k}^{n}}{k!}=\dfrac{ \dfrac{n!}{(n-k)!}}{k!} = \dfrac{n!}{(n-k)!k!}$$

In [25]:
## Integer division (//) keeps the result an exact int
ans = factorial(18) // (factorial(18-10)*factorial(10))
"{:,}".format(ans)

'43,758'

## So....  We realize we need to relax and not worry about the song-order. That's what Shuffle is for, right? 😅

### Q5: How many playlists can we produce for an 8-track playlist from the unique suggested songs (10)?

- Q: Combination or permutation?
    - A: Combination (order no longer matters)

- Q: What formula would we use?
    - A:
    $$\large C_{k}^{n} = \displaystyle\binom{n}{k} = \dfrac{P_{k}^{n}}{k!}=\dfrac{ \dfrac{n!}{(n-k)!}}{k!} = \dfrac{n!}{(n-k)!k!}$$

In [26]:
n = 10
k = 8

## Integer division (//) keeps the result an exact int
ans = factorial(n) // (factorial(n-k)*factorial(k))
ans

45

# CONDITIONAL PROBABILITY

**Conditional probability comes into play when the outcome of one trial may influence the results of upcoming trials (that is, when we have dependent events).**


<img src="https://raw.githubusercontent.com/jirvingphd/dsc-conditional-probability-online-ds-ft-100719/master/images/Image_71_TreeDiag.png" width = 500>

$$ P (A \mid B) = \dfrac{P(A \cap B)}{P(B)}$$

$P(A \mid B)$ is the probability of $A$ **given** that $B$ has occurred.
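
As a minimal sketch of the formula, take one throw of a die. The events `A` ("throw a 6") and `B` ("throw an even number") are assumptions chosen for illustration.

```python
from fractions import Fraction

S = set(range(1, 7))   # one throw of a die
A = {6}                # event A: throw a 6
B = {2, 4, 6}          # event B: throw an even number

P_B = Fraction(len(B), len(S))
P_A_and_B = Fraction(len(A & B), len(S))

# P(A | B) = P(A and B) / P(B)
P_A_given_B = P_A_and_B / P_B
print(P_A_given_B)
```

Knowing that the throw was even shrinks the sample space from 6 outcomes to 3, which is why the probability of a 6 goes up from 1/6.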

## Types of Events

#### Independent Events

**Events $A$ and $B$ are independent when the occurrence of $A$ has no effect on whether $B$ will occur (or not).**
<!-- 
- A and B are independent if:
    - $P(A \cap B) = P(A)\cdot P(B)$

 <img src="https://raw.githubusercontent.com/jirvingphd/dsc-conditional-probability-online-ds-ft-100719/master/images/Image_67_independent.png" width=30%>

- Probability of A or B occurring:
    - $P (A \cup B) = P(A) + P(B) - P(A \cap B)$

 -->


#### Disjoint Events




**Events $A$ and $B$ are disjoint if $A$ occurring means that $B$ cannot occur.**

Disjoint events are **mutually exclusive**: $A \cap B$ is **empty**, so $P(A \cap B) = 0$.
<!-- 
<img src="https://raw.githubusercontent.com/jirvingphd/dsc-conditional-probability-online-ds-ft-100719/master/images/Image_68Disjoint.png" width=30%>

 -->

#### Dependent Events

**Events $A$ and $B$ are dependent when the occurrence of $A$ somehow has an effect on whether $B$ will occur (or not).**
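
A classic case of dependent events is drawing marbles without replacement. The sketch below enumerates every ordered pair of draws from a hypothetical bag (3 red, 2 blue; these counts are an assumption, not taken from the lesson) and checks whether the independence condition $P(A \cap B) = P(A)P(B)$ holds.

```python
from fractions import Fraction
from itertools import permutations

bag = ["R", "R", "R", "B", "B"]   # hypothetical bag: 3 red, 2 blue

# Every ordered pair of two distinct marbles (drawn without replacement)
draws = list(permutations(range(len(bag)), 2))

def prob(pred):
    """Share of equally likely draws for which pred(draw) is True."""
    hits = [d for d in draws if pred(d)]
    return Fraction(len(hits), len(draws))

def first_red(d):
    return bag[d[0]] == "R"

def second_red(d):
    return bag[d[1]] == "R"

P_first_red = prob(first_red)
P_second_red = prob(second_red)
P_both = prob(lambda d: first_red(d) and second_red(d))

P_second_given_first = P_both / P_first_red
is_independent = P_both == P_first_red * P_second_red

print(P_second_red, P_second_given_first, is_independent)
```

Here $P(\text{second red})$ and $P(\text{second red} \mid \text{first red})$ differ, so the two draws are dependent.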

<img src="https://raw.githubusercontent.com/learn-co-students/dsc-conditional-probability-onl01-dtsc-ft-070620/master/images/Image_69_Marb.png" width=50%>



# 🕹 Activity Part 2: Dinner Party Playlist - Conditional Probabilities

In [27]:

from math import factorial

### Q: What is the probability of hearing "Let's Get it started"?

In [28]:
## Get sample space
sample_space = df['track'].value_counts()
sample_space

Let's Get it Started               3
Poker Face                         3
Time of your Life                  2
Here's to the Night                1
Into the Aeroplane Over the Sea    1
Set Fire to the Rain               1
Otherside                          1
Hallelujah                         1
Marilyn Monroe                     1
With Arms Outstretched             1
Rock Lobster                       1
Minority                           1
Right By My Side                   1
Just Dance                         1
Bad Romance                        1
Imagine                            1
Tonight, Tonight                   1
Never Again                        1
Since You've Been Gone             1
Name: track, dtype: int64

In [29]:
## Get event space
E = sample_space.loc["Let's Get it Started"]
E

3

In [30]:
S = sample_space.sum()
S

24

In [31]:
## Calculate the probability
P_lets_get_it = E/S
P_lets_get_it

0.125

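### Q: What is the probability that the song playing is "Poker Face", given that the song is by Lady GaGa?

- **What would be the formula to calculate $P(\text{PokerFace}|\text{LadyGaga})$  ?**

A: $$P(\text{PokerFace} \mid \text{LadyGaga}) = \dfrac{P(\text{PokerFace} \cap \text{LadyGaga})}{P(\text{LadyGaga})}$$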


In [32]:
## Get value counts of tracks for Lady GaGa only (the conditional sample space)
sample_space = df.groupby('artist').get_group("Lady GaGa")['track'].value_counts()
sample_space

Poker Face     2
Bad Romance    1
Just Dance     1
Name: track, dtype: int64

In [33]:
E = sample_space.loc['Poker Face']
E

2

In [34]:
S = sample_space.sum()
S

4

In [35]:
E/S

0.5

### Q: What is the probability that the song playing is by Lady GaGa, given that it is titled "Poker Face"?


In [36]:
poker_face_space = df.groupby('track').get_group('Poker Face')
sample_space = poker_face_space['artist'].value_counts()
sample_space

Lady GaGa               2
Cartman (South Park)    1
Name: artist, dtype: int64

In [37]:
E = sample_space.loc["Lady GaGa"]
S = sample_space.sum()
E/S

0.6666666666666666
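
Both answers above come from filtering the df down to the "given" rows first. The same numbers fall out of the definition $P(A \mid B) = P(A \cap B) / P(B)$ applied to the full playlist. A rough sketch, with the counts copied by hand from the outputs above:

```python
from fractions import Fraction

# Counts copied from the outputs above
n_total = 24         # rows in df
n_gaga = 4           # rows by Lady GaGa
n_poker_face = 3     # rows titled "Poker Face"
n_gaga_and_pf = 2    # rows that are both

P_gaga = Fraction(n_gaga, n_total)
P_pf = Fraction(n_poker_face, n_total)
P_both = Fraction(n_gaga_and_pf, n_total)

P_pf_given_gaga = P_both / P_gaga   # P(Poker Face | Lady GaGa)
P_gaga_given_pf = P_both / P_pf     # P(Lady GaGa | Poker Face)

print(P_pf_given_gaga, P_gaga_given_pf)
```

Note that the two conditional probabilities are not equal: swapping the event and the condition changes the denominator.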

# The Law of Total Probability


<img src="https://raw.githubusercontent.com/jirvingphd/dsc-law-of-total-probability-online-ds-ft-100719/master/images/Image_55_TotProb.png" width=50%>

$$\large P(B)= \sum_i P(B \cap A_i)= \sum_i P(B \mid A_i)P(A_i)$$


- This law allows us to calculate $P(B)$ from the partial/conditional probabilities over the subsets $A_i$.
- It requires that the different $A_i$ that make up the sample space $S$ be disjoint events.


$A_1, A_2, \dots, A_n$ partition the sample space $S$ into disjoint regions whose union is $S$.
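
A minimal sketch of the formula, using a made-up partition of three regions. All of the probabilities below are hypothetical values picked so the arithmetic is easy to follow.

```python
from fractions import Fraction

# Hypothetical partition of S: the P(A_i) must sum to 1
P_A = {"A1": Fraction(1, 2), "A2": Fraction(1, 3), "A3": Fraction(1, 6)}

# Hypothetical conditional probabilities P(B | A_i)
P_B_given_A = {"A1": Fraction(1, 10), "A2": Fraction(1, 5), "A3": Fraction(1, 2)}

# The regions must cover the whole sample space
assert sum(P_A.values()) == 1

# P(B) = sum over i of P(B | A_i) * P(A_i)
P_B = sum(P_B_given_A[a] * P_A[a] for a in P_A)
print(P_B)
```

Each term is the slice of $B$ that falls inside one region, so the terms add up to all of $B$.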




<img src="https://raw.githubusercontent.com/learn-co-students/dsc-law-of-total-probability-onl01-dtsc-ft-070620/master/images/Image_56_vent.png" width=50%>





## 🕹 Activity Pt 4: Law of Total House Party Playlists

- We've decided to be a little more adventurous and turn our dinner party into a larger house party.
    - The house party is spread across 4 rooms, and we assume guests split their time evenly among them:
        - living room
        - basement
        - back patio
        - kitchen
        
- Each room has its own playlist, built from our dinner party recommendations.


#### OUR HOUSE PARTY & LAW OF TOTAL PROB
- Our House Party = space $S$
- The 4 rooms in the house are $A_1, A_2, A_3, A_4$
- $B$ is the event of hearing a specific song or artist as you wander the house, so $P(B)$ is the probability of hearing it.

<img src="https://raw.githubusercontent.com/jirvingphd/dsc-law-of-total-probability-online-ds-ft-100719/master/images/Image_55_TotProb.png" width=50%>

$$P(B)= \sum_i P(B \cap A_i)= \sum_i P(B \mid A_i)P(A_i)$$




In [40]:
import os
folder = "playlist_requests/house_party/"
os.makedirs(folder,exist_ok=True)

house_party = dict(living_room = df.sample(12,random_state=12).reset_index(drop=True).copy(),
                   basement = df.sample(10,random_state=321).reset_index(drop=True).copy(), 
                   back_patio = df.sample(9,random_state=42).reset_index(drop=True).copy(),
                  kitchen=df.sample(8,random_state=3210).reset_index(drop=True).copy())

## Save for later

for room,room_df in house_party.items():
    room_df.to_csv(os.path.join(folder, f"{room}.csv"),index=False)
    
## Preview
for k,df_ in house_party.items():
    df_['Room'] = k 
    display(df_.style.set_caption(f"Playlist for {k}"))
    

Unnamed: 0,artist,track,Requested By,Room
0,Lady GaGa,Poker Face,John,living_room
1,Adele,Set Fire to the Rain,Samantha,living_room
2,Red Hot Chili Peppers,Otherside,James,living_room
3,Nicki Minaj,Marilyn Monroe,Carla,living_room
4,Green Day,Time of your Life,Anne,living_room
5,Kelly Clarkson,Since You've Been Gone,Carla,living_room
6,Smashing Pumpkins,"Tonight, Tonight",Anne,living_room
7,Black Eyed Peas,Let's Get it Started,Samantha,living_room
8,Green Day,Time of your Life,Joe,living_room
9,Kelly Clarkson,Never Again,Carla,living_room


Unnamed: 0,artist,track,Requested By,Room
0,Kelly Clarkson,Since You've Been Gone,Carla,basement
1,Green Day,Time of your Life,Joe,basement
2,Panic at the Disco,Hallelujah,Samantha,basement
3,Green Day,Minority,Carla,basement
4,John Lennon,Imagine,Joe,basement
5,Nicki Minaj,Right By My Side,Carla,basement
6,Lady GaGa,Poker Face,Joe,basement
7,Adele,Set Fire to the Rain,Samantha,basement
8,Red Hot Chili Peppers,Otherside,James,basement
9,Lady GaGa,Poker Face,John,basement


Unnamed: 0,artist,track,Requested By,Room
0,Smashing Pumpkins,"Tonight, Tonight",Anne,back_patio
1,Green Day,Minority,Carla,back_patio
2,Green Day,Time of your Life,Joe,back_patio
3,Lady GaGa,Poker Face,John,back_patio
4,Cartman (South Park),Poker Face,Carla,back_patio
5,Black Eyed Peas,Let's Get it Started,Anne,back_patio
6,Kelly Clarkson,Since You've Been Gone,Carla,back_patio
7,B-52s,Rock Lobster,Joe,back_patio
8,Black Eyed Peas,Let's Get it Started,Samantha,back_patio


Unnamed: 0,artist,track,Requested By,Room
0,Red Hot Chili Peppers,Otherside,James,kitchen
1,Lady GaGa,Poker Face,John,kitchen
2,Black Eyed Peas,Let's Get it Started,Samantha,kitchen
3,Lady GaGa,Bad Romance,John,kitchen
4,Nicki Minaj,Right By My Side,Carla,kitchen
5,Rilo Kiley,With Arms Outstretched,James,kitchen
6,Lady GaGa,Poker Face,Joe,kitchen
7,Kelly Clarkson,Since You've Been Gone,Carla,kitchen


### Q1: What is the probability of hearing a Green Day song at the house party at any given moment?

####  To Calculate $P(GD)$ for a Room

$$ P(\text{Green Day})=\sum_i P(\text{Green Day} \mid \text{Room}_i)P(\text{Room}_i)$$

- Q: **With our 4 rooms, what would our equation look like?**
    - What is the probability of being in each room?

- A:$$P(\text{Green Day})= P(GD|Room1)\times \frac{1}{4} + P(GD|Room2)\times \frac{1}{4} + P(GD|Room3)\times \frac{1}{4} + P(GD|Room4)\times \frac{1}{4} $$

In [41]:
## Make a dictionary of prob of being in each room
room_probs = {'living_room':0.25, 
              'basement':0.25,
              'back_patio':0.25,
              'kitchen':0.25}
room_probs

{'living_room': 0.25, 'basement': 0.25, 'back_patio': 0.25, 'kitchen': 0.25}

In [42]:
## Let's do 1 example room - living room
room_df = house_party['living_room']
room_df

## Get room sample space
room_space = room_df['artist'].value_counts()
room_space

## Get P_gd_given_room
p_gd_given_room = room_space.loc['Green Day']/room_space.sum()
p_gd_given_room

## Multiply cond prob by prob being in the room
p_gd_given_room * room_probs['living_room']

0.041666666666666664

In [43]:
#### Let's turn that process into a function

In [44]:
def prob_event_given_room(house_party,room_probs,sample_space='artist',
                          room='living_room',
                          event='Green Day',verbose=True
                         ):
    """Return P(event | room): the share of the room's playlist matching event.
    room_probs is accepted but not used here; the caller multiplies by P(room)."""
    
    ## Pull out current room_df from dict
    room_df = house_party[room]

    ## Get room sample space
    room_space = room_df[sample_space].value_counts()

    try:
        ## Get P(event | room)
        P = room_space.loc[event]/room_space.sum()
    except KeyError:
        ## Set prob=0 if the event does not exist in this room
        P = 0

    ## Print the Prob event given room (if verbose==True)
    if verbose:
        print(f"P({event}|{room}) = {round(P,3)}")
    ## Return the conditional prob (not yet weighted by P(room))
    return P


In [45]:
prob_event_given_room(house_party,room_probs)

P(Green Day|living_room) = 0.167


0.16666666666666666

In [46]:
## Let's try out the function for Green Day on the back patio

prob_event_given_room(house_party,room_probs,room='back_patio',
                      sample_space='artist',event='Green Day',
                      verbose=True);

P(Green Day|back_patio) = 0.222


In [47]:
room_probs

{'living_room': 0.25, 'basement': 0.25, 'back_patio': 0.25, 'kitchen': 0.25}

In [48]:
house_party['kitchen']

Unnamed: 0,artist,track,Requested By,Room
0,Red Hot Chili Peppers,Otherside,James,kitchen
1,Lady GaGa,Poker Face,John,kitchen
2,Black Eyed Peas,Let's Get it Started,Samantha,kitchen
3,Lady GaGa,Bad Romance,John,kitchen
4,Nicki Minaj,Right By My Side,Carla,kitchen
5,Rilo Kiley,With Arms Outstretched,James,kitchen
6,Lady GaGa,Poker Face,Joe,kitchen
7,Kelly Clarkson,Since You've Been Gone,Carla,kitchen


In [49]:
## Calculate Total Probability using a for loop
TOTAL_PROB = []

## Get rooms from house_party
for room in house_party.keys():
    
    ## Get conditional prob for room
    p_gd_given_room = prob_event_given_room(house_party,room_probs,
                                            sample_space='artist',room=room,
                                    event='Green Day',verbose=True)
    
    ## Get the prob of being in that room
    p_room = room_probs[room]
    
    ## Append p_gd_given_room * p_room
    TOTAL_PROB.append(p_gd_given_room * p_room)
print()
print(f"P(Green Day) = {sum(TOTAL_PROB)}")

P(Green Day|living_room) = 0.167
P(Green Day|basement) = 0.2
P(Green Day|back_patio) = 0.222
P(Green Day|kitchen) = 0

P(Green Day) = 0.14722222222222223


In [50]:
## Checking against actual values
counts = pd.concat(house_party)['artist'].value_counts(normalize=True)
counts

Lady GaGa                0.205128
Green Day                0.153846
Black Eyed Peas          0.128205
Kelly Clarkson           0.128205
Nicki Minaj              0.076923
Red Hot Chili Peppers    0.076923
Adele                    0.051282
Smashing Pumpkins        0.051282
Rilo Kiley               0.025641
B-52s                    0.025641
Cartman (South Park)     0.025641
John Lennon              0.025641
Panic at the Disco       0.025641
Name: artist, dtype: float64
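
The pooled Green Day share (0.1538) is close to, but not the same as, the 0.1472 from the loop. Pooling all four playlists weights each room by how many songs it has (12, 10, 9, and 8 out of 39) rather than by the 1/4 we assumed for being in each room. A rough sketch of both weightings, with the Green Day counts inferred by hand from the conditional probabilities printed above:

```python
from fractions import Fraction

# (Green Day songs, playlist length) per room, read off the outputs above
gd_counts = {
    "living_room": (2, 12),
    "basement": (2, 10),
    "back_patio": (2, 9),
    "kitchen": (0, 8),
}

total_songs = sum(n for _, n in gd_counts.values())

# Law of total probability with P(room) = 1/4 for every room
equal_weight = sum(Fraction(gd, n) * Fraction(1, 4)
                   for gd, n in gd_counts.values())

# Same law, but with P(room) = that room's share of all songs
size_weight = sum(Fraction(gd, n) * Fraction(n, total_songs)
                  for gd, n in gd_counts.values())

print(float(equal_weight), float(size_weight))
```

So the two numbers answer slightly different questions: "which room am I in?" versus "which of the 39 queued songs is this?".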

### Q1B: But wait...what if we have unequal probabilities for being in each room?



- The true probability of people being in each room depends on what is going on in that room.
    - The snacks are in the kitchen (prob=0.4)
    - The drinks/bar is on the back patio (prob=0.3).
    - The living room and basement have no special amenities. (prob=0.15 each)

In [51]:
## Update room_probs with the new probabilities
room_probs['kitchen'] = 0.4
room_probs['back_patio'] = 0.3
room_probs['living_room'] =0.15
room_probs['basement'] =0.15
room_probs

{'living_room': 0.15, 'basement': 0.15, 'back_patio': 0.3, 'kitchen': 0.4}

### Q2: What is the probability of hearing a Lady GaGa song at the house party at any given moment with the new room probabilities?

> The **quickest way** to do this would be to copy the for loop above, paste it below, and then modify all of the variables to match the new question...

> INSTEAD of that, **let's do it in a better, more programmatic way**: take the code we wrote above to calculate TOTAL_PROB and turn it into a function.

In [52]:
## Paste your loop from Green Day question below for reference
TOTAL_PROB = []

## Get rooms from house_party
for room in house_party.keys():
    
    ## Get conditional prob for room
    p_gd_given_room = prob_event_given_room(house_party,room_probs,
                                            sample_space='artist',room=room,
                                    event='Green Day',verbose=True)
    
    ## Get the prob of being in that room
    p_room = room_probs[room]
    
    ## Append p_gd_given_room * p_room
    TOTAL_PROB.append(p_gd_given_room * p_room)
print()
print(f"P(Green Day) = {sum(TOTAL_PROB)}")

P(Green Day|living_room) = 0.167
P(Green Day|basement) = 0.2
P(Green Day|back_patio) = 0.222
P(Green Day|kitchen) = 0

P(Green Day) = 0.12166666666666666


In [53]:
## Now how can we make that process flexible?
def law_of_total_probability(house_party, room_probs,
                             sample_space='artist',
                             event='Green Day',
                             verbose=True):
    ## Sum P(event | room) * P(room) over every room in house_party
    TOTAL_PROB = []

    ## Get rooms from house_party
    for room in house_party.keys():

        ## Get conditional prob for room
        P = prob_event_given_room(house_party,room_probs,room=room,
                                                sample_space=sample_space,
                                        event=event,verbose=verbose)

        ## Get the prob of being in that room
        p_room = room_probs[room]

        ## Append P(event | room) * P(room)
        TOTAL_PROB.append(P * p_room)
    print()
    print(f"P({event}) = {sum(TOTAL_PROB)}" )
    return sum(TOTAL_PROB)

In [54]:
law_of_total_probability(house_party,room_probs,'artist','Green Day')

P(Green Day|living_room) = 0.167
P(Green Day|basement) = 0.2
P(Green Day|back_patio) = 0.222
P(Green Day|kitchen) = 0

P(Green Day) = 0.12166666666666666


0.12166666666666666

### Q: What is the probability of hearing a song recommended by Anne?

$$ P(AnneRec)=\sum_i P(AnneRec \mid Room_i)P(Room_i)$$
- Since we made our function flexible, we can easily calculate this. 


In [55]:
law_of_total_probability(house_party,room_probs,
                         sample_space='Requested By', event='Anne')

P(Anne|living_room) = 0.25
P(Anne|basement) = 0
P(Anne|back_patio) = 0.222
P(Anne|kitchen) = 0

P(Anne) = 0.10416666666666666


0.10416666666666666

In [56]:
## Compare that to getting val counts from whole df
counts = pd.concat(house_party)['Requested By'].value_counts(normalize=True)
counts

Carla       0.282051
Joe         0.179487
John        0.153846
Samantha    0.153846
Anne        0.128205
James       0.102564
Name: Requested By, dtype: float64

# APPENDIX

### Factorials


The factorial of $n$ is calculated by multiplying together every positive integer from 1 up to and including $n$.

$$n! = n  \times (n-1) \times (n-2)\times...\times1$$



**Factorial Rules/Operations:**
- Negative numbers do not have a factorial.
- $0! = 1$
- $n! = (n-1)! \cdot n$
- $(n+1)! = n! \cdot (n+1)$ 

- $ (n+k)! = n! \cdot (n+1) \cdot (n+2) \cdot... (n+k) $

- $ (n-k)! = \frac{n!}{(n-k+1)\cdot(n-k+2) \cdot ... (n-k+k) }$
- When $n>k$:
    - $\frac{n!}{k!} = (k+1)\cdot(k+2)\cdot...n $
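
The rules above are easy to spot-check with `math.factorial`. The values of `n` and `k` below are arbitrary picks, and `math.prod` assumes Python 3.8 or newer.

```python
from math import factorial, prod

n, k = 7, 3   # arbitrary values with n > k

rule_zero = factorial(0) == 1
rule_recursive = factorial(n) == factorial(n - 1) * n

# n!/k! leaves only the factors (k+1) * (k+2) * ... * n
ratio = factorial(n) // factorial(k)
product = prod(range(k + 1, n + 1))

# Negative numbers do not have a factorial
try:
    factorial(-1)
    negative_error = None
except ValueError as err:
    negative_error = str(err)

print(rule_zero, rule_recursive, ratio, product)
```

The cancellation in `ratio` is the same trick that makes $\dfrac{n!}{(n-k)!}$ cheap to compute by hand.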

### Permutations with repetition

- A permutation in which some elements appear multiple times. 
    - e.g. in the word TENNESSEE, you can swap the 3rd and the 4th letters (both N) and still have the same word, so the number of distinct arrangements is less than $9!$.
    
    - The solution is to divide $9!$ by the factorial of the count of each repeated letter.
    - Here we have 9 letters with 4 x E, 2 x N, and 2 x S, so the answer is:

    - $\dfrac{9!}{4!2!2!} = 3780$

The general formula can be written as:

$$\dfrac{n!}{n_1!n_2!\ldots n_j!}$$

where $n_j$ is the number of identical objects of type $j$ (the count of each distinct letter in our TENNESSEE example). 
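
A small sketch of the general formula. The helper name `multiset_permutations` is made up for this example; it counts each letter with `collections.Counter` and divides $n!$ by the factorial of every count.

```python
from collections import Counter
from math import factorial

def multiset_permutations(word):
    """Number of distinct arrangements of the letters in word."""
    counts = Counter(word)            # how many times each letter appears
    total = factorial(len(word))      # n!
    for c in counts.values():
        total //= factorial(c)        # divide by n_j! for each letter
    return total

print(multiset_permutations("TENNESSEE"))   # 3780
```

Letters that appear once contribute $1! = 1$, so they do not change the result.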

## Laws & Theorems Based on Conditional Probability


#### Theorem 1: Product Rule

The intersection of events $A$ and $B$ can be given by

\begin{align}
    \large P(A \cap B) = P(B) P(A \mid B) = P(A) P(B \mid A)
\end{align}




#### Theorem 2: Chain Rule AKA "General Product Rule"

- Allows calculation of any member of the joint distribution of a set of random variables using _only_ conditional probabilities.

- Builds upon the product rule: 
$$P(A \cap B) = P(A \mid B) P(B)$$

- When applied to 3 variables, becomes:


$$P(A\cap B \cap C) = P(A\cap( B \cap C))$$
$$ = P(A\mid B \cap C) P(B \cap C)$$
$$ = P(A \mid B \cap C) P(B \mid C) P(C)$$

And you can keep extending this to $n$ variables.
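
As a rough illustration of the chain rule, consider drawing 3 aces in a row from a standard 52-card deck without replacement (an example chosen here, not taken from the lesson data). In the notation above, $C$ is "first card is an ace", $B$ is "second card is an ace", and $A$ is "third card is an ace".

```python
from fractions import Fraction
from math import comb

P_C = Fraction(4, 52)                 # first card is an ace
P_B_given_C = Fraction(3, 51)         # second is an ace, given the first was
P_A_given_B_and_C = Fraction(2, 50)   # third is an ace, given the first two were

# Chain rule: P(A and B and C) = P(A | B and C) * P(B | C) * P(C)
P_three_aces = P_A_given_B_and_C * P_B_given_C * P_C

# Cross-check by counting: ways to pick 3 of the 4 aces over all 3-card hands
check = Fraction(comb(4, 3), comb(52, 3))

print(P_three_aces, check)
```

Both routes give the same answer, which is a handy sanity check whenever a problem can be solved by counting as well as by conditioning.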



#### Theorem 3 - Bayes Theorem



**Bayes' theorem** is the culmination of this section. Below is the formula, which we will dig deeper into in upcoming lessons.

$$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $$

- It follows from the product rule: $P(A \cap B) = P(B) P(A \mid B) = P(A) P(B \mid A)$. 
- Note that, using Bayes theorem, you **can compute conditional probabilities without explicitly needing to know $P(A \cap B)$!** 
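
To make this concrete, the sketch below plugs in numbers from the playlist activity (24 rows, 4 by Lady GaGa, 3 titled "Poker Face", and half of Lady GaGa's rows being "Poker Face"), copied by hand from the earlier outputs. It recovers $P(\text{LadyGaga} \mid \text{PokerFace})$ without ever counting the intersection.

```python
from fractions import Fraction

# Values copied from the playlist activity outputs
P_pf_given_gaga = Fraction(2, 4)   # P(Poker Face | Lady GaGa)
P_gaga = Fraction(4, 24)           # P(Lady GaGa)
P_pf = Fraction(3, 24)             # P(Poker Face)

# Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)
P_gaga_given_pf = P_pf_given_gaga * P_gaga / P_pf
print(P_gaga_given_pf, float(P_gaga_given_pf))
```

The result matches the value found earlier by filtering the df to "Poker Face" rows.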


#### Additional note: the complement of an event
- Basic complements:
$$P(A) + P(A') = 1$$
with A' being the complement of A.

- Conditional Probability Complements

$$P(A|B) + P(A'|B) = 1$$