### Problem statement

We want to calculate the following probabilities:
$$
(1): P(A=0 | R=1)
$$
$$
(2): P(A=1 | R=2, GP=0, GR=1)
$$

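The second expression can be estimated directly from the data, because it conditions A on R, GP and GR together. The first one conditions only on R, so it must be rewritten by marginalizing over GP and GR: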

$$
(1)= \frac{P(A=0, R=1)}{P(R=1)} = \frac{\sum_{gp\in\{0,1\}} \sum_{gr\in\{0,1\}} P(A=0, R=1, GP=gp, GR=gr)} {\sum_{gp\in\{0,1\}} \sum_{gr\in\{0,1\}} \sum_{a\in\{0,1\}} P(A=a, R=1, GP=gp, GR=gr)}
$$

Factorizing the joint distribution according to the network structure, in which GP and GR each depend only on R and A depends on R, GP and GR, we get
$$
(1) = \frac{\sum_{gp\in\{0,1\}} \sum_{gr\in\{0,1\}} P(A=0 \mid R=1, GP=gp, GR=gr) \cdot P(GP=gp \mid R=1) \cdot P(GR=gr \mid R=1) \cdot P(R=1)} {\sum_{gp\in\{0,1\}} \sum_{gr\in\{0,1\}} \sum_{a\in\{0,1\}} P(A=a \mid R=1, GP=gp, GR=gr) \cdot P(GP=gp \mid R=1) \cdot P(GR=gr \mid R=1) \cdot P(R=1)}
$$
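This factorization allows two simplifications (a short derivation that is not part of the original notebook): $P(R=1)$ is a common factor of every term, and each conditional distribution sums to one over its variable's values, so the whole denominator collapses to $P(R=1)$:

$$
\sum_{gp\in\{0,1\}} \sum_{gr\in\{0,1\}} \sum_{a\in\{0,1\}} P(A=a \mid R=1, GP=gp, GR=gr) \cdot P(GP=gp \mid R=1) \cdot P(GR=gr \mid R=1) \cdot P(R=1) = P(R=1)
$$
$$
(1) = \sum_{gp\in\{0,1\}} \sum_{gr\in\{0,1\}} P(A=0 \mid R=1, GP=gp, GR=gr) \cdot P(GP=gp \mid R=1) \cdot P(GR=gr \mid R=1)
$$

Laplace-corrected estimates of the form $(N+1)/(M+k)$ still sum to one over the $k$ values of the variable, so this identity also holds for the smoothed estimates used in the implementation. Computing the full ratio therefore gives the same value as the simplified sum.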

### Implementation

Load the data and binarize the two continuous predictors: a GPA below 3 maps to 0 and a GPA of 3 or more maps to 1; a GRE score below 500 maps to 0 and a score of 500 or more maps to 1.

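```python
import pandas

df = pandas.read_csv('../data/binary.csv')

# Binarize GPA at 3.0. The "< 3" rows become 0 first, so the ">= 3" test
# that follows cannot touch them.
df.loc[df['gpa'] < 3, 'gpa'] = 0
df.loc[df['gpa'] >= 3, 'gpa'] = 1

# Binarize GRE at 500 in the same way.
df.loc[df['gre'] < 500, 'gre'] = 0
df.loc[df['gre'] >= 500, 'gre'] = 1
```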
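The two masked assignments per column can also be written as a single boolean comparison cast to `int`. The sketch below is illustrative only: it runs both versions on a few hypothetical rows (not taken from `binary.csv`) that sit on and around the thresholds, to check that they agree and that values exactly equal to 3.0 or 500 fall into category 1.

```python
import pandas as pd

# Hypothetical toy rows (NOT from binary.csv), chosen to sit on and around the thresholds.
raw = pd.DataFrame({'gpa': [2.99, 3.0, 3.61], 'gre': [499, 500, 800]})

# Approach used in the notebook: two masked assignments per column.
loc_version = raw.copy()
loc_version.loc[loc_version['gpa'] < 3, 'gpa'] = 0
loc_version.loc[loc_version['gpa'] >= 3, 'gpa'] = 1
loc_version.loc[loc_version['gre'] < 500, 'gre'] = 0
loc_version.loc[loc_version['gre'] >= 500, 'gre'] = 1

# Vectorized alternative: one comparison per column, cast from bool to int.
vec_version = pd.DataFrame({
    'gpa': (raw['gpa'] >= 3).astype(int),
    'gre': (raw['gre'] >= 500).astype(int),
})

print(vec_version['gpa'].tolist(), vec_version['gre'].tolist())  # [0, 1, 1] [0, 1, 1]
print((loc_version == vec_version).all().all())                 # True
```

The vectorized form avoids depending on the order of the two assignments. The `loc` version is only correct because the category value 0 is itself below the threshold.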

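Define a helper that estimates a conditional probability with the Laplace (add-one) correction, $P(X \mid C) \approx \frac{N(X, C) + 1}{N(C) + k}$, where $N(\cdot)$ counts the matching rows and $k$ is the number of values $X$ can take (`class_count`).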

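```python
def p(var_pred, cond_pred, class_count):
    """Laplace-corrected estimate of P(var | cond) computed on the global `df`.

    var_pred, cond_pred: boolean Series over the rows of df.
    class_count: number of values the predicted variable can take.
    """
    return (len(df[var_pred & cond_pred]) + 1) / (len(df[cond_pred]) + class_count)
```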
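To show what the correction does, here is a minimal sketch that applies the same formula to a tiny hand-made DataFrame. `p_laplace` is a hypothetical variant of `p` that takes the DataFrame as an explicit argument instead of reading the global `df`, and the toy rows are invented for illustration.

```python
import pandas as pd

def p_laplace(data, var_mask, cond_mask, class_count):
    """Hypothetical helper: add-one (Laplace) estimate of P(var | cond) on `data`."""
    joint = len(data[var_mask & cond_mask])  # rows where variable and condition both hold
    cond = len(data[cond_mask])              # rows where the condition holds
    return (joint + 1) / (cond + class_count)

# Hypothetical toy data: 4 rows with rank 1, of which 1 is admitted.
toy = pd.DataFrame({'rank': [1, 1, 1, 1, 2], 'admit': [1, 0, 0, 0, 1]})
r1 = toy['rank'] == 1

p0 = p_laplace(toy, toy['admit'] == 0, r1, 2)  # (3 + 1) / (4 + 2)
p1 = p_laplace(toy, toy['admit'] == 1, r1, 2)  # (1 + 1) / (4 + 2)
print(round(p0, 3), round(p1, 3))              # 0.667 0.333

# With no matching rows the estimate falls back to the uniform 1 / class_count.
print(p_laplace(toy, toy['admit'] == 1, toy['rank'] == 3, 2))  # 0.5
```

Because the numerators add one per value and the denominator adds `class_count`, the smoothed estimates still sum to one, and a parent configuration that never occurs in the data gets a uniform estimate instead of a division by zero.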

Calculate the numerator and denominator of the factorized expression, then print $P(A=0 \mid R=1)$. The prior $P(R=1)$ is Laplace-corrected as well, using the 4 possible values of the rank.

```python
# Laplace-corrected prior P(R=1); rank takes 4 values.
p_r1 = (len(df[df['rank'] == 1]) + 1) / (len(df) + 4)
rank1 = df['rank'] == 1

# Numerator: sum over GP and GR of
# P(A=0 | R=1, GP, GR) * P(GP | R=1) * P(GR | R=1) * P(R=1)
num = 0
for gpa in [0, 1]:
    for gre in [0, 1]:
        parents = rank1 & (df['gpa'] == gpa) & (df['gre'] == gre)
        num += p_r1 \
            * p(df['admit'] == 0, parents, 2) \
            * p(df['gpa'] == gpa, rank1, 2) \
            * p(df['gre'] == gre, rank1, 2)

# Denominator: the same terms, additionally summed over both values of A.
den = 0
for admit in [0, 1]:
    for gpa in [0, 1]:
        for gre in [0, 1]:
            parents = rank1 & (df['gpa'] == gpa) & (df['gre'] == gre)
            den += p_r1 \
                * p(df['admit'] == admit, parents, 2) \
                * p(df['gpa'] == gpa, rank1, 2) \
                * p(df['gre'] == gre, rank1, 2)

print(f'P(A=0 | R=1) = {round(num / den, 2)}')
```
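As a sanity check on the factorized computation, the sketch below runs the same numerator and denominator on synthetic data and confirms that the denominator equals the smoothed $P(R=1)$, so the full ratio matches the simplified weighted sum. The data are randomly generated stand-ins with the notebook's column names (not `binary.csv`), and `p_laplace` and `term` are hypothetical helpers that mirror the notebook's `p` and loop bodies.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data with the notebook's binarized columns; NOT binary.csv.
rng = np.random.default_rng(0)
n = 400
data = pd.DataFrame({
    'rank': rng.integers(1, 5, n),   # ranks 1..4
    'gpa': rng.integers(0, 2, n),
    'gre': rng.integers(0, 2, n),
    'admit': rng.integers(0, 2, n),
})

def p_laplace(data, var_mask, cond_mask, class_count):
    # Same Laplace-corrected estimate as the notebook's p(), with the data passed in.
    return (len(data[var_mask & cond_mask]) + 1) / (len(data[cond_mask]) + class_count)

rank1 = data['rank'] == 1
p_r1 = (len(data[rank1]) + 1) / (len(data) + 4)  # smoothed P(R=1), 4 rank values

def term(a, gpa, gre):
    # One summand: P(A=a | R=1, GP, GR) * P(GP | R=1) * P(GR | R=1) * P(R=1)
    parents = rank1 & (data['gpa'] == gpa) & (data['gre'] == gre)
    return (p_laplace(data, data['admit'] == a, parents, 2)
            * p_laplace(data, data['gpa'] == gpa, rank1, 2)
            * p_laplace(data, data['gre'] == gre, rank1, 2)
            * p_r1)

configs = [(gpa, gre) for gpa in (0, 1) for gre in (0, 1)]
num = sum(term(0, gpa, gre) for gpa, gre in configs)
den = sum(term(a, gpa, gre) for a in (0, 1) for gpa, gre in configs)

full_ratio = num / den   # the quantity the notebook computes
simplified = num / p_r1  # P(R=1) cancels because den equals P(R=1)

print(np.isclose(den, p_r1))               # True
print(np.isclose(full_ratio, simplified))  # True
```

The equality is exact up to floating-point rounding for any data set, because every Laplace-corrected conditional in the sum adds up to one over its variable's values.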


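For $P(A=1 \mid R=2, GP=0, GR=1)$ no marginalization is needed: all of the variables A is conditioned on are observed, so the value is a single Laplace-corrected estimate.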

```python
# R, GP and GR are all observed, so one smoothed estimate is the answer.
prob = p(df['admit'] == 1, (df['rank'] == 2) & (df['gpa'] == 0) & (df['gre'] == 1), 2)

print(f'P(A=1 | R=2, GP=0, GR=1) = {round(prob, 2)}')
```

```
P(A=1 | R=2, GP=0, GR=1) = 0.19
```
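When several queries of this kind are needed, one option is to build the whole Laplace-corrected table $P(A=1 \mid R, GP, GR)$ at once with `groupby`, so each query becomes a lookup. The sketch below assumes synthetic stand-in data (not `binary.csv`) with ranks 1 to 4 and binarized `gpa`/`gre`. The names `counts` and `cpt_admit1` are hypothetical, and the last lines check one entry against the boolean-mask estimate the notebook uses.

```python
import itertools

import numpy as np
import pandas as pd

# Hypothetical synthetic data (NOT binary.csv) with binarized gpa/gre and ranks 1..4.
rng = np.random.default_rng(1)
n = 400
data = pd.DataFrame({
    'rank': rng.integers(1, 5, n),
    'gpa': rng.integers(0, 2, n),
    'gre': rng.integers(0, 2, n),
    'admit': rng.integers(0, 2, n),
})

# Admitted rows ('sum') and total rows ('count') per observed (rank, gpa, gre).
counts = data.groupby(['rank', 'gpa', 'gre'])['admit'].agg(['sum', 'count'])

# Reindex over all 4 * 2 * 2 configurations so unseen ones get zero counts.
full_index = pd.MultiIndex.from_tuples(
    list(itertools.product([1, 2, 3, 4], [0, 1], [0, 1])),
    names=['rank', 'gpa', 'gre'])
counts = counts.reindex(full_index, fill_value=0)

# Laplace-corrected table: P(A=1 | R, GP, GR) = (admitted + 1) / (total + 2).
cpt_admit1 = (counts['sum'] + 1) / (counts['count'] + 2)

# A query like (2) is now a lookup; compare with the direct boolean-mask estimate.
lookup = cpt_admit1.loc[(2, 0, 1)]
mask = (data['rank'] == 2) & (data['gpa'] == 0) & (data['gre'] == 1)
direct = (len(data[(data['admit'] == 1) & mask]) + 1) / (len(data[mask]) + 2)
print(np.isclose(lookup, direct))  # True
```

Reindexing over every configuration ensures that unseen parent combinations still get the uniform estimate 0.5, which matches what the notebook's `p` helper returns in that case.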
