In [1]:
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

import pandas as pd
import seaborn as sns
sns.set()

import plotly.express as px

Conditional Probability:

The probability of something occurring given that something else occurred first. Example: what is the probability that you purchase an item given that you already purchased a different item?

* If I have two events that depend on each other, what's the probability that both will occur?

* P(A, B) is the joint probability that A and B both occur, without conditioning on either one

* P(B|A) is the probability of B given that A has occurred, which implies some dependency between A and B

* We know:

$$P(B|A)=\frac{P(A,B)}{P(A)}$$

Example:

* I give my students two tests. 60% of them passed both tests, but the first test was easier - 80% passed that one. What percentage of students who passed the first test also passed the second?

* A = passing the first test, B = passing the second test

* So we are asking for P(B|A) - the probability of B given A

* $P(B|A)=\frac{P(A,B)}{P(A)} = \frac{0.6}{0.8} = 0.75$

* 75% of students who passed the first test passed the second.
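The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch: the variable names `p_a_and_b`, `p_a` and `p_b_given_a` are made up for illustration, and the two input values are the percentages quoted in the example.

```python
# Worked example: two tests, numbers taken from the example above.
p_a_and_b = 0.6  # P(A, B): fraction of students who passed both tests
p_a = 0.8        # P(A): fraction of students who passed the first test

# Definition of conditional probability: P(B|A) = P(A, B) / P(A)
p_b_given_a = p_a_and_b / p_a
print(f"P(B|A) = {p_b_given_a:.2f}")
```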

Activity:

Below is some code to create some fake data on how much stuff people purchase given their age range.

It generates 100,000 random `people` and randomly assigns them as being in their 20's, 30's, 40's, 50's, 60's or 70's.

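It then gives younger people a lower probability of buying stuff: the purchase probability is the age decade divided by 100, so 0.2 for people in their 20's up to 0.7 for people in their 70's.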

In the end, we have two Python dictionaries:

`totals` contains the total number of people in each age group. `purchases` contains the total number of things purchased by people in each age group. The grand total of purchases is in `totalPurchases`, and we know the total number of people is 100,000.

In [3]:
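from numpy import random
random.seed(0)  # fix the seed so the results are reproducible

# Number of people and number of purchases per age decade
totals = {20:0, 30:0, 40:0, 50:0, 60:0, 70:0}
purchases = {20:0, 30:0, 40:0, 50:0, 60:0, 70:0}
totalPurchases = 0
for _ in range(100000):
    # Assign each person to a random age decade
    ageDecade = random.choice([20, 30, 40, 50, 60, 70])
    # Purchase probability grows with age: 0.2 for 20's up to 0.7 for 70's
    purchasesProbability = float(ageDecade) / 100.0
    totals[ageDecade] += 1
    if (random.random() < purchasesProbability):
        totalPurchases += 1
        purchases[ageDecade] += 1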

In [4]:
totals

{20: 16576, 30: 16619, 40: 16632, 50: 16805, 60: 16664, 70: 16704}

In [5]:
purchases

{20: 3392, 30: 4974, 40: 6670, 50: 8319, 60: 9944, 70: 11713}

In [6]:
totalPurchases

45012

* Let's compute P(E|F), where E is "purchase" and F is "you are in your 30's". The probability of someone in their 30's buying something is just the fraction of people in their 30's who bought something:

In [7]:
PEF = float(purchases[30]) / float(totals[30])
print('P(purchase | 30s): ' + str(PEF))

P(purchase | 30s): 0.29929598652145134


P(F) is just the probability of being in your 30's in this data set:

In [8]:
PF = float(totals[30]) / 100000.0
print('P(30s): ' + str(PF))

P(30s): 0.16619


P(E) is the overall probability of buying something, regardless of your age:

In [9]:
PE = float(totalPurchases) / 100000.0
print('P(Purchase): ' + str(PE))

P(Purchase): 0.45012


If E and F were independent, then we would expect P(E|F) to be about the same as P(E). But they're not: P(E) is 0.45, and P(E|F) is 0.3. That tells us that E and F are dependent.
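The same comparison can be repeated for every age group, not just the 30's. The sketch below hardcodes the `totals` and `purchases` counts printed earlier in this notebook so that it runs on its own; `n_people`, `p_purchase` and `cond` are illustrative names that do not appear in the original cells.

```python
# Counts copied from the notebook outputs above (seed 0, 100,000 people).
totals = {20: 16576, 30: 16619, 40: 16632, 50: 16805, 60: 16664, 70: 16704}
purchases = {20: 3392, 30: 4974, 40: 6670, 50: 8319, 60: 9944, 70: 11713}

n_people = sum(totals.values())
p_purchase = sum(purchases.values()) / n_people  # P(E), regardless of age

# P(E|F) for each age decade F: buyers in the group / people in the group
cond = {decade: purchases[decade] / totals[decade] for decade in totals}

for decade, p in cond.items():
    print(f"P(purchase | {decade}s) = {p:.3f}   vs   P(purchase) = {p_purchase:.3f}")
```

If purchasing were independent of age, every row would be close to P(purchase). Instead the conditional probability rises with each decade, which is what the simulation was built to do.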

P(E, F) is different from P(E|F). P(E, F) is the probability of both being in your 30's and buying something, out of the total population - not just the population of people in their 30's:

In [10]:
print("P(30's, Purchase): " + str(float(purchases[30]) / 100000.0))

P(30's, Purchase): 0.04974


Let's also compute the product of P(E) and P(F), P(E)P(F):

In [15]:
print("P(30's)P(Purchase): " + str(PE * PF))

P(30's)P(Purchase): 0.07480544280000001


Something you may learn in stats is that P(E, F) = P(E)P(F), but this assumes E and F are independent. We've found here that P(E, F) is about 0.05, while P(E)P(F) is about 0.075. So when E and F are dependent - and we have a conditional probability going on - we can't just say that P(E,F) = P(E)P(F).
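To see that this gap is not specific to the 30's, here is a small sketch that compares P(E, F) with P(E)P(F) for each age decade. It reuses the counts printed earlier in the notebook as literals; the names `n_people`, `p_e`, `joint` and `product` are assumptions introduced for this illustration.

```python
# Counts copied from the notebook outputs above (seed 0, 100,000 people).
totals = {20: 16576, 30: 16619, 40: 16632, 50: 16805, 60: 16664, 70: 16704}
purchases = {20: 3392, 30: 4974, 40: 6670, 50: 8319, 60: 9944, 70: 11713}

n_people = sum(totals.values())
p_e = sum(purchases.values()) / n_people  # P(E): overall purchase probability

joint = {}    # P(E, F): in decade F and purchased, out of everyone
product = {}  # P(E)P(F): what the joint would be under independence
for decade in totals:
    p_f = totals[decade] / n_people
    joint[decade] = purchases[decade] / n_people
    product[decade] = p_e * p_f
    print(f"{decade}s: P(E,F) = {joint[decade]:.4f}, P(E)P(F) = {product[decade]:.4f}")
```

The product stays near 0.075 for every group because the groups are roughly equal in size, while the joint probability climbs with age.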

We can also check that P(E|F) = P(E,F)/P(F), which is the relationship we showed in the slides - and sure enough, it is:

In [18]:
print((purchases[30] / 100000.0) / PF )

0.29929598652145134
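As a further check, the identity P(E|F) = P(E,F)/P(F) can be tested for all six age groups at once. This is a sketch under the same assumption as before: the counts are copied from the outputs above, and `checks` is a hypothetical helper dictionary. `math.isclose` is used instead of `==` because the two sides are computed with different floating-point operations.

```python
import math

# Counts copied from the notebook outputs above (seed 0, 100,000 people).
totals = {20: 16576, 30: 16619, 40: 16632, 50: 16805, 60: 16664, 70: 16704}
purchases = {20: 3392, 30: 4974, 40: 6670, 50: 8319, 60: 9944, 70: 11713}

n_people = sum(totals.values())

checks = {}
for decade in totals:
    p_f = totals[decade] / n_people                   # P(F)
    p_ef = purchases[decade] / n_people               # P(E, F)
    p_e_given_f = purchases[decade] / totals[decade]  # P(E|F), computed directly
    # Compare the direct value with the one from the definition P(E,F)/P(F)
    checks[decade] = math.isclose(p_e_given_f, p_ef / p_f)
    print(f"{decade}s: P(E|F) = {p_e_given_f:.4f}, P(E,F)/P(F) = {p_ef / p_f:.4f}, match = {checks[decade]}")
```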
