In [None]:
import comp_prob_inference
import matplotlib.pyplot as plt

# Week 2 Relating Two Random Variables 

- joint probability table: represents how two random variables relate by putting each variable on a different axis of the table, denoted $p_{T,W}$ (where T and W are the random variables)
    - T and W are said to be "jointly distributed"

- finding the distribution of one variable given that the other variable is fixed to a value can be tricky

![title](images/week2_2vargraph.png)

- ex: finding the distribution of temperature given that W = rainy
    - can't just use the rainy row as is (1/30 for hot and 2/15 for cold), since those values don't add up to 1
    - rather, we normalize by dividing each value by the row's total
    - the row adds up to 1/6, so divide each value by 1/6: hot becomes 1/5 and cold becomes 4/5
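
The normalization step above can be sketched in a few lines. This is a minimal illustration using the same weather/temperature numbers; the variable names (`joint`, `rainy_row`, `temp_given_rainy`) are made up for this sketch, and `Fraction` is used only so the results print as exact fractions.

```python
from fractions import Fraction

# Joint table from the notes, keyed by (weather, temperature)
joint = {
    ('sunny', 'hot'): Fraction(3, 10),
    ('sunny', 'cold'): Fraction(1, 5),
    ('rainy', 'hot'): Fraction(1, 30),
    ('rainy', 'cold'): Fraction(2, 15),
    ('snowy', 'hot'): Fraction(0),
    ('snowy', 'cold'): Fraction(1, 3),
}

# Keep only the entries where W = rainy
rainy_row = {t: p for (w, t), p in joint.items() if w == 'rainy'}

# The row sums to P(W = rainy), which is 1/6 rather than 1
prob_rainy = sum(rainy_row.values())

# Dividing each entry by that sum gives a valid distribution over T
temp_given_rainy = {t: p / prob_rainy for t, p in rainy_row.items()}

print(prob_rainy)
print(temp_given_rainy)
```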



### Representing a Joint Probability Table in Code

In [None]:
#METHOD 1: not an actual table, but stores the info 
prob_table = {('sunny', 'hot'): 3/10,
    ('sunny', 'cold'): 1/5,
    ('rainy', 'hot'): 1/30,
    ('rainy', 'cold'): 2/15,
    ('snowy', 'hot'): 0,
    ('snowy', 'cold'): 1/3}

# accessing W = rainy and T = cold
prob_table[('rainy', 'cold')]

# METHOD 2: dictionaries within dictionaries, doesn't have any specific ordered rows/columns like the table 

prob_W_T_dict = {}
for w in {'sunny', 'rainy', 'snowy'}:
    prob_W_T_dict[w] = {}

prob_W_T_dict['sunny']['hot'] = 3/10
prob_W_T_dict['sunny']['cold'] = 1/5
prob_W_T_dict['rainy']['hot'] = 1/30
prob_W_T_dict['rainy']['cold'] = 2/15
prob_W_T_dict['snowy']['hot'] = 0
prob_W_T_dict['snowy']['cold'] = 1/3

comp_prob_inference.print_joint_prob_table_dict(prob_W_T_dict)

# accessing W = rainy and T = cold
prob_W_T_dict['rainy']['cold']

# METHOD 3: 2D Array, separate lists that are ordered to correspond to each other
import numpy as np
prob_W_T_rows = ['sunny', 'rainy', 'snowy']
prob_W_T_cols = ['hot', 'cold']
prob_W_T_array = np.array([[3/10, 1/5], [1/30, 2/15], [0, 1/3]])
comp_prob_inference.print_joint_prob_table_array(prob_W_T_array, prob_W_T_rows, prob_W_T_cols)

# accessing W = rainy and T = cold
prob_W_T_array[prob_W_T_rows.index('rainy'), prob_W_T_cols.index('cold')]



### Marginalization

- summarizing randomness
- given two random variables X and Y with joint probability table $p_{X,Y}$, for any value x that X can take on, the marginal probability that X = x is given by this summation
$$p_X(x) = \sum_{y} p_{X,Y}(x,y)$$
- recall the above example of the weather being sunny, rainy, or snowy while the temperature is hot or cold 
- since each cell has its own probability, we can sum the "hot" and "cold" probabilities in each row to get the total probability of that row's weather 
- ex: p(sunny,hot) = 3/10 and p(sunny,cold) = 1/5, so p(sunny) = 3/10 + 1/5 = 1/2
- $p_W$ is called the marginal distribution of W 
- typical way to calculate this: 
$$p_W(w)=\sum_{t\in \tau} p_{W,T}(w,t)$$

where $\tau$ is the set of values that random variable T can take on. The above equation is more often written as:
$$p_W(w) = \sum_{t} p_{W,T}(w,t)$$


In [None]:
# Exercise: Marginalization

prob_table_WI = {('sunny','1'):1/2,
              ('sunny','0'):0,
              ('rainy','1'):0,
              ('rainy','0'):1/6,
              ('snowy','1'):0,
              ('snowy','0'):1/3}

prob_table_XY = {('sunny','1'):1/4,
                ('sunny','0'):1/4,
                ('rainy','1'):1/12,
                ('rainy','0'):1/12,
                ('snowy','1'):1/6,
                ('snowy','0'):1/6}

# marginal distributions of X and Y
prob_table_X = {'sunny': 1/2,
                'rainy': 1/6,
                'snowy': 1/3}

prob_table_Y = {'1': 1/2, '0': 1/2}
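
The marginals in this exercise can be checked mechanically. Below is a small sketch of marginalizing a dictionary-based joint table; the helper `marginalize` and the names `joint_XY`, `marginal_X`, `marginal_Y` are made up for this illustration and are not part of `comp_prob_inference`.

```python
from collections import defaultdict
from fractions import Fraction


def marginalize(joint_table, position):
    """Sum a dict-based joint table, keeping only the variable at `position` in each key."""
    marginal = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for key, prob in joint_table.items():
        marginal[key[position]] += prob
    return dict(marginal)


# Same values as prob_table_XY, written as exact fractions
joint_XY = {
    ('sunny', '1'): Fraction(1, 4),
    ('sunny', '0'): Fraction(1, 4),
    ('rainy', '1'): Fraction(1, 12),
    ('rainy', '0'): Fraction(1, 12),
    ('snowy', '1'): Fraction(1, 6),
    ('snowy', '0'): Fraction(1, 6),
}

marginal_X = marginalize(joint_XY, 0)  # sum over Y
marginal_Y = marginalize(joint_XY, 1)  # sum over X

print(marginal_X)
print(marginal_Y)
```

Summing over the second key position reproduces `prob_table_X`, and summing over the first reproduces `prob_table_Y`.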

### Marginalization for Many Random Variables

- when having multiple random variables, our joint probability table becomes multi-dimensional
- ex: with a 3 var table, if we were to marginalize one variable, the resulting probability distribution would be a 2d table still 
- if we wanted to marginalize out two of the 3 variables, we sum over both of them; ex: the resulting equation for X is as follows:
$$p_X(x) = \sum_{y} p_{X,Y}(x,y) = \sum_{y}\left(\sum_{z} p_{X,Y,Z}(x,y,z)\right)$$
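
With a NumPy array, marginalizing several variables amounts to summing over several axes. The sketch below uses a made-up 3-variable joint table (random values normalized to sum to 1, purely for illustration) and checks that summing out Z and then Y matches summing out both at once.

```python
import numpy as np

# Hypothetical joint table for X (2 values), Y (3 values), Z (4 values);
# the entries are arbitrary and only normalized so they sum to 1
rng = np.random.default_rng(0)
joint_xyz = rng.random((2, 3, 4))
joint_xyz /= joint_xyz.sum()

# Marginalize out Z: the result is still a 2D table over (X, Y)
joint_xy = joint_xyz.sum(axis=2)

# Marginalize out Y as well: the result is a 1D table over X
marginal_x = joint_xy.sum(axis=1)

# Summing over both axes in one call gives the same answer
marginal_x_direct = joint_xyz.sum(axis=(1, 2))

print(joint_xy.shape, marginal_x.shape)
print(np.allclose(marginal_x, marginal_x_direct))
```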

### Conditioning for Random Variables
- conditioning: the randomness of one variable given that another variable takes on a specific value 
- $p_{T|W}(\text{cold} \mid \text{rainy})$ = probability of cold given that the weather is rainy
- when doing so, make sure to normalize so that the conditional probabilities add up to 1 

QUESTION what is conditioning helpful for? what makes it different from marginalization?
- the conditional probability of X = x given Y = y is defined by this equation
$$p_{X \mid Y}(x \mid y) \triangleq \frac{p_{X,Y}(x,y)}{p_Y(y)}$$

- essentially dividing the joint probability of the two values by the marginal probability of the value we condition on
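
As a rough sketch of this definition on the 2D array representation, every row of the weather/temperature table can be conditioned at once by dividing it by its row sum. The names `joint`, `p_W`, and `T_given_W` are made up for this example.

```python
import numpy as np

rows = ['sunny', 'rainy', 'snowy']  # values of W
cols = ['hot', 'cold']              # values of T
joint = np.array([[3/10, 1/5], [1/30, 2/15], [0, 1/3]])

# Marginal of W, kept as a column of shape (3, 1) so it broadcasts across each row
p_W = joint.sum(axis=1, keepdims=True)

# Row w now holds the conditional distribution of T given W = w
T_given_W = joint / p_W

print(T_given_W[rows.index('rainy'), cols.index('cold')])
print(T_given_W.sum(axis=1))
```

Each row of `T_given_W` sums to 1, which is the normalization the definition builds in.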

In [1]:
from simpsons_paradox_data import *

# ex: to access female C admitted probability
joint_prob_table[gender_mapping['female'], department_mapping['C'], admission_mapping['admitted']]

# to marginalize with numpy, we sum across axis 1 (in this case, department axis)
joint_prob_gender_admission = joint_prob_table.sum(axis=1)


0.12298276623950503

### Simpson's Paradox

A real-life application in which a school was accused of gender bias in its admissions

Let's say we want to find the probability that a woman is admitted. Since our data has three axes (gender, department, and admission status), we first need to abstract department away by marginalizing it out. Then we compute a conditional probability: the probability that a person is admitted given that they are a woman 

$$p_{A \mid G}(\text{admitted} \mid \text{female}) = \frac{p_{A,G}(\text{admitted}, \text{female})}{p_G(\text{female})}$$


In [6]:
# probability of female and admitted 
joint_prob_gender_admission[gender_mapping['female'], admission_mapping['admitted']] 

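# row of the table for gender = female (joint probabilities of female with each admission outcome)
female_only = joint_prob_gender_admission[gender_mapping['female']]
# normalizing by the row sum, which is P(female), gives P(admission | female)
prob_admission_given_female = female_only / np.sum(female_only)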

# turning this new conditional table into a dict format
prob_admission_given_female_dict = dict(zip(admission_labels, prob_admission_given_female))
print(prob_admission_given_female_dict)

prob_admission_given_female_dict['admitted']


{'admitted': 0.3033351498637601, 'rejected': 0.6966648501362399}


0.3033351498637601

In [8]:
# same conditioning for gender = male
male_only = joint_prob_gender_admission[gender_mapping['male']]

# probability of each admission outcome given male
prob_male = male_only/np.sum(male_only)

prob_male_dict = dict(zip(admission_labels, prob_male))
prob_male_dict['admitted']

0.44519509476031227

SYNTAX NOTE: when conditioning something that is NOT on the 0th axis

In [12]:
# the : is to indicate that we want to keep everything in the 0th axis
admitted_only = joint_prob_gender_admission[:, admission_mapping['admitted']]

# probability of gender given admitted 
prob_gender_given_admitted = admitted_only / np.sum(admitted_only)
prob_gender_given_admitted_dict = dict(zip(gender_labels, prob_gender_given_admitted))
print(prob_gender_given_admitted_dict)

# conditioning on both gender and department:
# slice of the joint table where gender is female and department is A
prob_dg = joint_prob_table[gender_mapping['female'], department_mapping['A']]

# probabilities of admission given gender is female and department is A
restricted = prob_dg / np.sum(prob_dg)
restricted = dict(zip(admission_labels, restricted / np.sum(restricted)))
restricted['admitted']

{'female': 0.3172274654630008, 'male': 0.6827725345369992}


0.8200000000000004

## Conditioning on Events

- somewhat of a zooming in 
- given a sample space $\Omega$, there may be two events inside it called A and B, with an intersection between them
- if we observe that event A happened, for ex, we go from the probability space $(\Omega, P)$ --> $(A, P(\cdot \mid A))$, where probabilities are now conditioned on A
    - to get a probability back in terms of $\Omega$, combine the probability of A with the probability of the event given A 
    - we have to normalize by P(A) each time, since probabilities of outcomes in A were originally measured in terms of $\Omega$

product rule: rearranging the conditional probability formula gives the probability of the event A intersect B
--> $P(A \cap B) = P(A)P(B \mid A)$
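
The zooming-in idea and the product rule can be sketched on a small finite probability space. The fair six-sided die, the events `A` and `B`, and the helpers `prob` and `condition` are all made up for this illustration.

```python
from fractions import Fraction

# Hypothetical probability space: one roll of a fair six-sided die
omega = {outcome: Fraction(1, 6) for outcome in range(1, 7)}


def prob(space, event):
    """Probability of an event (a set of outcomes) in the given space."""
    return sum((p for o, p in space.items() if o in event), Fraction(0))


def condition(space, event):
    """Zoom in on `event`: keep only its outcomes and renormalize by its probability."""
    p_event = prob(space, event)
    return {o: p / p_event for o, p in space.items() if o in event}


A = {2, 4, 6}  # roll is even
B = {4, 5, 6}  # roll is at least 4

space_given_A = condition(omega, A)
p_B_given_A = prob(space_given_A, B)

# Product rule: P(A and B) = P(A) * P(B | A)
print(prob(omega, A & B), prob(omega, A) * p_B_given_A)
```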

### Bayes' Theorem for Events 

- foundational for inference
- recall the formulas we found for conditional probability, $P(A \mid B) = P(A \cap B)/P(B)$, and for the intersection, $P(A \cap B) = P(A)P(B \mid A)$
- if we plug the formula for the intersection into the conditional probability formula, we get Bayes' theorem 

$$ P(A|B) = \frac{P(A)P(B|A)}{P(B)} $$

This is useful because the terms on the right-hand side are often easier to find directly than the conditional probability we are after 
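
As a quick numerical check of Bayes' theorem, the sketch below reuses the weather/temperature table with A = (W is rainy) and B = (T is cold). The variable names are made up for this example, and the result is compared against the direct definition of conditional probability.

```python
from fractions import Fraction

joint = {
    ('sunny', 'hot'): Fraction(3, 10),
    ('sunny', 'cold'): Fraction(1, 5),
    ('rainy', 'hot'): Fraction(1, 30),
    ('rainy', 'cold'): Fraction(2, 15),
    ('snowy', 'hot'): Fraction(0),
    ('snowy', 'cold'): Fraction(1, 3),
}

# P(A) and P(B) by marginalization
p_rainy = sum(p for (w, t), p in joint.items() if w == 'rainy')
p_cold = sum(p for (w, t), p in joint.items() if t == 'cold')

# P(B | A): cold given rainy
p_cold_given_rainy = joint[('rainy', 'cold')] / p_rainy

# Bayes' theorem: P(A | B) = P(A) P(B | A) / P(B)
p_rainy_given_cold = p_rainy * p_cold_given_rainy / p_cold

# Direct definition for comparison: P(A and B) / P(B)
p_direct = joint[('rainy', 'cold')] / p_cold

print(p_rainy_given_cold, p_direct)
```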


### Law of Total Probability

 - we can break a probability space into pieces to find the probability of an event in that space 

 $$P(A) = \sum_{i=1}^{n} P(A \cap B_i)$$

 where $B_1, \dots, B_n$ are disjoint events that together make up $\Omega$ (a partition of $\Omega$)

 and A is an event that is spread across these pieces 

 finding P(A) is the same as summing the probabilities of the intersections of A with each of $B_1, \dots, B_n$; by the product rule, each term can also be written as $P(A \mid B_i)P(B_i)$ when $P(B_i) > 0$
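
A small sketch of the law of total probability on the weather/temperature table, with the three weather values as the partition and A = (T is cold). The names below are made up for this example; both the intersection form and the conditional form of the sum are computed.

```python
from fractions import Fraction

joint = {
    ('sunny', 'hot'): Fraction(3, 10),
    ('sunny', 'cold'): Fraction(1, 5),
    ('rainy', 'hot'): Fraction(1, 30),
    ('rainy', 'cold'): Fraction(2, 15),
    ('snowy', 'hot'): Fraction(0),
    ('snowy', 'cold'): Fraction(1, 3),
}
weathers = ['sunny', 'rainy', 'snowy']  # the partition B_1, ..., B_n

# Form 1: sum of the intersections P(A and B_i)
p_cold_from_intersections = sum(joint[(w, 'cold')] for w in weathers)

# Form 2: sum of P(A | B_i) * P(B_i)
p_cold_from_conditionals = Fraction(0)
for w in weathers:
    p_w = joint[(w, 'hot')] + joint[(w, 'cold')]  # P(B_i)
    p_cold_given_w = joint[(w, 'cold')] / p_w     # P(A | B_i)
    p_cold_from_conditionals += p_cold_given_w * p_w

print(p_cold_from_intersections, p_cold_from_conditionals)
```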


conditional distribution function ? !!

### Random Variables Conditioned on Events

