# Bayes' Theorem

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns

In [2]:
df = sns.load_dataset('titanic')
df.head()

,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.25,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.925,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.05,S,Third,man,True,,Southampton,no,True


In [3]:
print(df.sex.value_counts())

# Making sure there are no N/A values:
assert df.sex.notna().all()

df['passenger_male'] = (df.sex == 'male').astype(int)

male      577
female    314
Name: sex, dtype: int64


In [4]:
assert df.alone.notna().all()

df['alone'] = df.alone.astype(int)

In [5]:
print(df.who.value_counts())

df['child'] = (df.who == 'child').astype(int)

man      537
woman    271
child     83
Name: who, dtype: int64


In [6]:
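# drop_first=True drops Cherbourg; dtype=int keeps the indicator columns as 0/1 rather than booleans.
df[['Queenstown_departure', 'Southampton_departure']] = pd.get_dummies(df.embark_town, drop_first=True, dtype=int)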

In [7]:
df = df[['survived', 'child', 'Queenstown_departure', 'Southampton_departure', 'passenger_male']]
df

,survived,child,Queenstown_departure,Southampton_departure,passenger_male
0,0,0,0,1,1
1,1,0,0,0,0
2,1,0,0,1,0
3,1,0,0,1,0
4,0,0,0,1,1
...,...,...,...,...,...
886,0,0,0,1,1
887,1,0,0,1,0
888,0,0,0,1,0
889,1,0,0,0,1


In [8]:
X = df.drop(columns='survived').to_numpy()
y = df.survived.to_numpy()

### Bayes' Theorem

Here is Bayes' theorem:
$$
P(H|E)=\frac{P(E|H)P(H)}{P(E)}
$$

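$P(H|E)$ is the posterior: the probability of the outcome $H$ given the evidence $E$. The evidence $E$ typically represents a row of data, and $H$ is typically the class label, such as 0 or 1. $P(H)$ is the prior probability of the class, and $P(E)$ is the overall probability of seeing that evidence.

$P(E|H)$ is the likelihood: the probability of observing the evidence $E$ given the outcome $H$. Because the evidence is typically a whole row of data, in practice it is estimated one feature at a time, as the fraction of rows in class $H$ where each feature is present. For the survivors ($H=1$), that looks like: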

In [9]:
np.sum(X[y==1,:], axis=0) / X[y==1,:].shape[0]

array([0.14327485, 0.0877193 , 0.63450292, 0.31871345])
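As a sanity check on the formula, here is a minimal sketch that applies Bayes' theorem to a single binary feature and compares the result with the conditional frequency counted directly. The `feature` and `label` arrays are made up for illustration and are not taken from the Titanic data.

```python
import numpy as np

# Toy data, invented for illustration: one binary feature and a binary label.
feature = np.array([1, 1, 0, 1, 0, 0, 1, 0])
label = np.array([1, 1, 1, 0, 0, 0, 0, 0])

p_h = np.mean(label == 1)                        # P(H): prior of class 1
p_e = np.mean(feature == 1)                      # P(E): how often the feature is present
p_e_given_h = np.mean(feature[label == 1] == 1)  # P(E|H): feature rate within class 1

posterior = p_e_given_h * p_h / p_e              # Bayes' theorem
direct = np.mean(label[feature == 1] == 1)       # P(H|E) counted directly

print(posterior, direct)
```

Both numbers should agree (up to floating point), since for a single feature Bayes' theorem is just a rearrangement of the same counts.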

In [10]:
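def standard_naive_bayes(X, y, sample):
    # Mark where each row agrees with the sample; broadcasting stretches the sample across all rows.
    sample = np.array(sample).reshape(1,-1)
    X_compare = (X == sample).astype(int)
    
    # Per class: product of the per-feature match rates, times the class prior.
    # np.prod replaces np.product, which was removed in NumPy 2.0.
    numerator_1 = np.prod(np.sum(X_compare[y == 1,:], axis=0) / X_compare[y == 1,:].shape[0]) * (np.sum(y == 1) / y.shape[0])
    numerator_0 = np.prod(np.sum(X_compare[y == 0,:], axis=0) / X_compare[y == 0,:].shape[0]) * (np.sum(y == 0) / y.shape[0])

    # Normalize so the two probabilities sum to 1; returns (P(0|E), P(1|E)).
    return numerator_0 / (numerator_1 + numerator_0), numerator_1 / (numerator_1 + numerator_0)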

In [11]:
standard_naive_bayes(X, y, [1, 0, 1, 0])

(0.15585100033179958, 0.8441489996682005)

Naive Bayes breaks down when a feature value only ever appears with one class: the likelihood of that value under the other class is 0, so the other class's numerator becomes 0 whenever the value appears, no matter what the remaining features say. This is not great, so to combat this behavior, a small weight called a Laplace estimator can be added to each likelihood.
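The zero problem is easy to see on a tiny example. In the sketch below, `X_toy` and `y_toy` are made-up arrays (not the Titanic data) in which the second column is 1 only for rows of class 1, so a single zero likelihood wipes out the whole product for class 0.

```python
import numpy as np

# Toy data, invented for illustration. Column 1 is 1 only in rows where y_toy == 1.
X_toy = np.array([[1, 0],
                  [0, 0],
                  [1, 1],
                  [0, 1],
                  [1, 0],
                  [0, 0]])
y_toy = np.array([0, 0, 1, 1, 1, 0])

sample = np.array([1, 1])
matches = (X_toy == sample)  # True where a row agrees with the sample

# Per-feature likelihoods P(E_i | H) for each class.
lik_0 = matches[y_toy == 0].mean(axis=0)
lik_1 = matches[y_toy == 1].mean(axis=0)

print(lik_0, lik_0.prod())  # second factor is 0, so the product is 0
print(lik_1, lik_1.prod())
```

Even though the first feature matches one of the class-0 rows, the product for class 0 is exactly 0, so the posterior for class 0 is 0 regardless of the prior.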

Here is what the Laplace estimator looks like when combined with the Naive Bayes formula:
$$
P(H|E)=\frac{(\frac{c}{f} + P(E|H))P(H)}{P(E)(c)}
$$

Where $c$ is some constant, typically 1, and $f$ is the number of features in the dataset. I am using these letters to represent these variables because I think that they make more sense than using $\mu$ to represent the constant. 

The formula above makes sense if you can suspend your sense of reality for a sec and pretend that $P(E|H)$ is a vector of per-feature likelihoods $[P(E_1|H), P(E_2|H), \ldots, P(E_N|H)]$, where the scalar result of $\frac{c}{f}$ is added to each element of that array before the elements are multiplied together.
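For comparison, the more common textbook form of Laplace (additive) smoothing adds the constant to the counts rather than to the probabilities: each likelihood becomes $(\text{count} + \alpha) / (n + \alpha k)$, where $n$ is the number of rows in the class and $k$ is the number of values a feature can take (2 for binary columns). The sketch below is only an illustration on made-up data; the function name `additive_smoothing_likelihoods` and the parameters `alpha` and `k` are labels chosen for this sketch, and it is not the variant used in the rest of this notebook.

```python
import numpy as np

def additive_smoothing_likelihoods(X, y, sample, cls, alpha=1, k=2):
    """Smoothed P(E_i | H=cls) per feature: (count + alpha) / (n + alpha * k)."""
    rows = X[y == cls]
    # Number of rows in this class whose feature value matches the sample.
    counts = (rows == np.asarray(sample)).sum(axis=0)
    return (counts + alpha) / (rows.shape[0] + alpha * k)

# Toy data, invented for illustration. Column 1 is 1 only in rows where y_toy == 1.
X_toy = np.array([[1, 0],
                  [0, 0],
                  [1, 1],
                  [0, 1],
                  [1, 0],
                  [0, 0]])
y_toy = np.array([0, 0, 1, 1, 1, 0])

lik_0 = additive_smoothing_likelihoods(X_toy, y_toy, [1, 1], cls=0)
lik_1 = additive_smoothing_likelihoods(X_toy, y_toy, [1, 1], cls=1)
print(lik_0, lik_1)
```

With this form every likelihood stays strictly between 0 and 1, so no single unseen feature value can zero out a class.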

In [32]:
np.random.seed(5)

only_y_feature = (y * (np.random.rand(X.shape[0]) < 0.4)).astype(int)
X_with_made_up_col = np.concatenate([X, only_y_feature.reshape(-1,1)], axis=1)

In [33]:
standard_naive_bayes(X, y, [0, 0, 1, 1])

(0.852423298711321, 0.1475767012886789)

In [34]:
standard_naive_bayes(X_with_made_up_col, y, [0, 0, 1, 1, 1])

(0.0, 1.0)

In [35]:
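def laplace_naive_bayes(X, y, sample, c=1):
    # Mark where each row agrees with the sample; broadcasting stretches the sample across all rows.
    sample = np.array(sample).reshape(1,-1)
    X_compare = (X == sample).astype(int)
    f = sample.shape[1]
    
    # Add c/f to every per-feature likelihood so that no single factor can be 0.
    # np.prod replaces np.product, which was removed in NumPy 2.0.
    numerator_1 = (np.prod(c/f + (np.sum(X_compare[y == 1,:], axis=0) / X_compare[y == 1,:].shape[0]))) * (np.sum(y == 1) / y.shape[0])
    numerator_0 = (np.prod(c/f + (np.sum(X_compare[y == 0,:], axis=0) / X_compare[y == 0,:].shape[0]))) * (np.sum(y == 0) / y.shape[0])

    # A constant factor such as c in the denominator cancels when the two numerators
    # are normalized, so it is left out; the outputs then sum to 1 for any c.
    return numerator_0 / (numerator_1 + numerator_0), numerator_1 / (numerator_1 + numerator_0)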

In [36]:
laplace_naive_bayes(X, y, [1, 0, 1, 0], 1)

(0.38751561647902244, 0.6124843835209774)

In [37]:
X_with_made_up_col

array([[0, 0, 1, 1, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 1, 0, 1],
       ...,
       [0, 0, 1, 0, 0],
       [0, 0, 0, 1, 0],
       [0, 1, 0, 1, 0]])

In [38]:
laplace_naive_bayes(X_with_made_up_col, y, [0, 1, 0, 0, 1])

(0.13450624776624698, 0.8654937522337529)

In [39]:
standard_naive_bayes(X_with_made_up_col, y, [0, 1, 0, 0, 1])

(0.0, 1.0)