# Probabilistic Binary Classifier (Part 1)

## Towards Naive Bayes
So far, we have:
- Developed our first simple variational hybrid quantum-classical binary classification algorithm, using a parameterized quantum circuit (PQC) whose quantum state we measured.
- Not yet made use of the probabilistic characteristics of quantum systems, because we were able to construct a classical program that determined the resulting probability of measuring a `0` or `1`.

🥅 Goal: Starting from an initial prior probability of survival, we update that probability inside the PQC based on the evidence given by the passenger data.

### Load the Raw Data

In [52]:
import pandas as pd 
train = pd.read_csv('../titanic_data/train.csv')
train.head()

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | | S |


### Calculate the Probability of Survival

📝 Notes:
- The `.eq(1)` method returns a boolean mask that is `True` wherever the `Survived` column equals `1`. Indexing `train` with that mask keeps only the survivors.
- Using `P(Survival)` on its own is already a probabilistic classifier.
  - Specifically, it is another `predict_death` classifier, because the probability of survival is below 0.5 for every passenger.
  - It is also an example of a **hypocrite classifier** because it does not consider the individual passenger when predicting survival.
  - Despite being a hypocrite classifier, it yields a higher precision than a purely random classifier does.

In [53]:
# List of all survivors
survivors = train[train.Survived.eq(1)]

# Calculate the probability
prob_survival = len(survivors)/len(train)
print('P(Survival) is {:.2f}'.format(prob_survival))

P(Survival) is 0.38
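
The note above claims that the hypocrite classifier still beats a purely random one. A rough way to check this is sketched below. It assumes accuracy as the yardstick (the note speaks of precision; accuracy is used here only because it is defined for all three classifiers), and `expected_accuracy` is a hypothetical helper, not part of the original notebook. The counts (342 survivors among 891 passengers) follow from the passenger counts printed later in this notebook.

```python
# Counts derived from the passenger counts printed in this notebook:
# 342 survivors among 891 passengers, i.e. P(Survival) is about 0.38.
N_TOTAL = 891
N_SURVIVED = 342

p_survival = N_SURVIVED / N_TOTAL


def expected_accuracy(p_predict_survival, p_true_survival):
    """Expected accuracy of a classifier that ignores the passenger and
    predicts survival with a fixed probability (hypothetical helper)."""
    hit_survivor = p_predict_survival * p_true_survival
    hit_victim = (1 - p_predict_survival) * (1 - p_true_survival)
    return hit_survivor + hit_victim


coin_flip = expected_accuracy(0.5, p_survival)             # purely random guess
prior_sampler = expected_accuracy(p_survival, p_survival)  # guess with P(Survival)
always_death = expected_accuracy(0.0, p_survival)          # predict_death style

print('Coin flip:            {:.2f}'.format(coin_flip))
print('Sample from prior:    {:.2f}'.format(prior_sampler))
print('Always predict death: {:.2f}'.format(always_death))
```

Under these assumptions, both passenger-blind strategies that use the prior score above the coin flip, which is consistent with the note.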


### Calculate the Conditional Probability of Survival

In [54]:
# List all the passengers with a first class ticket
firstclass = train[train.Pclass.eq(1)]
# Find the probability that a randomly sampled first class passenger survives
prob_survival_firstclass = len(firstclass[firstclass.Survived.eq(1)])/len(firstclass)
print('P(Survived|FirstClass) is {:.2f}'.format(prob_survival_firstclass))

# List all the passengers with a second class ticket
secondclass = train[train.Pclass.eq(2)]
# Find the probability that a randomly sampled second class passenger survives
prob_survival_secondclass = len(secondclass[secondclass.Survived.eq(1)])/len(secondclass)
print('P(Survived|SecondClass) is {:.2f}'.format(prob_survival_secondclass))

# List all the passengers with a third class ticket
thirdclass = train[train.Pclass.eq(3)]
# Find the probability that a randomly sampled third class passenger survives
prob_survival_thirdclass = len(thirdclass[thirdclass.Survived.eq(1)])/len(thirdclass)
print('P(Survived|ThirdClass) is {:.2f}'.format(prob_survival_thirdclass))

P(Survived|FirstClass) is 0.63
P(Survived|SecondClass) is 0.47
P(Survived|ThirdClass) is 0.24


In [55]:
# List all the female passengers
female = train[train.Sex.eq("female")]
# Find the probability that a randomly selected female passenger survives
prob_survival_female = len(female[female.Survived.eq(1)])/len(female)
print('P(Survived|Female) is {:.2f}'.format(prob_survival_female))

# List all the male passengers
male = train[train.Sex.eq("male")]
# Find the probability that a randomly selected male passenger survives
prob_survival_male = len(male[male.Survived.eq(1)])/len(male)
print('P(Survived|Male) is {:.2f}'.format(prob_survival_male))

P(Survived|Female) is 0.74
P(Survived|Male) is 0.19


In [56]:
firstclass_female = firstclass[firstclass.Sex.eq("female")]
secondclass_female = secondclass[secondclass.Sex.eq("female")]
thirdclass_female = thirdclass[thirdclass.Sex.eq("female")]

prob_survival_firstclass_female = len(firstclass_female[firstclass_female.Survived.eq(1)])/len(firstclass_female)
prob_survival_secondclass_female = len(secondclass_female[secondclass_female.Survived.eq(1)])/len(secondclass_female)
prob_survival_thirdclass_female = len(thirdclass_female[thirdclass_female.Survived.eq(1)])/len(thirdclass_female)

print('P(Survival|First Class, Female) is {:.2f}'.format(prob_survival_firstclass_female))
print('P(Survival|Second Class, Female) is {:.2f}'.format(prob_survival_secondclass_female))
print('P(Survival|Third Class, Female) is {:.2f}'.format(prob_survival_thirdclass_female))

P(Survival|First Class, Female) is 0.97
P(Survival|Second Class, Female) is 0.92
P(Survival|Third Class, Female) is 0.50


In [57]:
firstclass_male = firstclass[firstclass.Sex.eq("male")]
secondclass_male = secondclass[secondclass.Sex.eq("male")]
thirdclass_male = thirdclass[thirdclass.Sex.eq("male")]

prob_survival_firstclass_male = len(firstclass_male[firstclass_male.Survived.eq(1)])/len(firstclass_male)
prob_survival_secondclass_male = len(secondclass_male[secondclass_male.Survived.eq(1)])/len(secondclass_male)
prob_survival_thirdclass_male = len(thirdclass_male[thirdclass_male.Survived.eq(1)])/len(thirdclass_male)

print('P(Survival|First Class, Male) is {:.2f}'.format(prob_survival_firstclass_male))
print('P(Survival|Second Class, Male) is {:.2f}'.format(prob_survival_secondclass_male))
print('P(Survival|Third Class, Male) is {:.2f}'.format(prob_survival_thirdclass_male))

P(Survival|First Class, Male) is 0.37
P(Survival|Second Class, Male) is 0.16
P(Survival|Third Class, Male) is 0.14
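
The cells above repeat the same filter-and-divide pattern for every group. One way to factor that pattern out is sketched here in plain Python. `prob_survival_given` is a hypothetical helper, and `sample` holds only the five rows shown by `train.head()`, so the resulting numbers illustrate the mechanics and say nothing about the full dataset.

```python
# The five rows shown by train.head(), reduced to the columns used here.
sample = [
    {'Survived': 0, 'Pclass': 3, 'Sex': 'male'},
    {'Survived': 1, 'Pclass': 1, 'Sex': 'female'},
    {'Survived': 1, 'Pclass': 3, 'Sex': 'female'},
    {'Survived': 1, 'Pclass': 1, 'Sex': 'female'},
    {'Survived': 0, 'Pclass': 3, 'Sex': 'male'},
]


def prob_survival_given(rows, **conditions):
    """Share of survivors among the rows that match every condition."""
    matching = [
        row for row in rows
        if all(row[key] == value for key, value in conditions.items())
    ]
    if not matching:
        return None  # no passenger matches, so the probability is undefined
    return sum(row['Survived'] for row in matching) / len(matching)


p_female = prob_survival_given(sample, Sex='female')
p_third = prob_survival_given(sample, Pclass=3)
p_third_male = prob_survival_given(sample, Pclass=3, Sex='male')
p_second = prob_survival_given(sample, Pclass=2)

print('P(Survived|Female) in the sample:', p_female)
print('P(Survived|ThirdClass) in the sample:', p_third)
print('P(Survived|ThirdClass, Male) in the sample:', p_third_male)
print('P(Survived|SecondClass) in the sample:', p_second)
```

Returning `None` for an empty group is a design choice of this sketch. The notebook cells would instead raise a `ZeroDivisionError` if a group had no passengers.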


### Counting Passengers

In [58]:
print('There are {} female passengers in the dataset'.format(len(female)))
print('There are {} male passengers in the dataset'.format(len(male)))
print()
print('There are {} passengers with a first class ticket in the dataset'.format(len(firstclass)))
print('There are {} passengers with a second class ticket in the dataset'.format(len(secondclass)))
print('There are {} passengers with a third class ticket in the dataset'.format(len(thirdclass)))


There are 314 female passengers in the dataset
There are 577 male passengers in the dataset

There are 216 passengers with a first class ticket in the dataset
There are 184 passengers with a second class ticket in the dataset
There are 491 passengers with a third class ticket in the dataset


In [59]:
print('There are {} female passengers with a first class ticket in the dataset'.format(len(firstclass_female)))
print('There are {} female passengers with a second class ticket in the dataset'.format(len(secondclass_female)))
print('There are {} female passengers with a third class ticket in the dataset'.format(len(thirdclass_female)))
print()
print('There are {} surviving first class female passengers in the dataset'.format(len(firstclass_female[firstclass_female.Survived.eq(1)])))
print('There are {} surviving second class female passengers in the dataset'.format(len(secondclass_female[secondclass_female.Survived.eq(1)])))
print('There are {} surviving third class female passengers in the dataset'.format(len(thirdclass_female[thirdclass_female.Survived.eq(1)])))

There are 94 female passengers with a first class ticket in the dataset
There are 76 female passengers with a second class ticket in the dataset
There are 144 female passengers with a third class ticket in the dataset

There are 91 surviving first class female passengers in the dataset
There are 70 surviving second class female passengers in the dataset
There are 72 surviving third class female passengers in the dataset


In [60]:
print('There are {} male passengers with a first class ticket in the dataset'.format(len(firstclass_male)))
print('There are {} male passengers with a second class ticket in the dataset'.format(len(secondclass_male)))
print('There are {} male passengers with a third class ticket in the dataset'.format(len(thirdclass_male)))
print()
print('There are {} surviving first class male passengers in the dataset'.format(len(firstclass_male[firstclass_male.Survived.eq(1)])))
print('There are {} surviving second class male passengers in the dataset'.format(len(secondclass_male[secondclass_male.Survived.eq(1)])))
print('There are {} surviving third class male passengers in the dataset'.format(len(thirdclass_male[thirdclass_male.Survived.eq(1)])))

There are 122 male passengers with a first class ticket in the dataset
There are 108 male passengers with a second class ticket in the dataset
There are 347 male passengers with a third class ticket in the dataset

There are 45 surviving first class male passengers in the dataset
There are 17 surviving second class male passengers in the dataset
There are 47 surviving third class male passengers in the dataset
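
As a quick cross-check, the conditional probabilities from the previous section can be recomputed from these counts alone. The sketch below hard-codes the printed counts in a dictionary (the name `counts` and its layout are choices of this sketch) and derives the totals and rates from them.

```python
# (passengers, survivors) per group, copied from the printed counts above
counts = {
    ('first', 'female'): (94, 91),
    ('second', 'female'): (76, 70),
    ('third', 'female'): (144, 72),
    ('first', 'male'): (122, 45),
    ('second', 'male'): (108, 17),
    ('third', 'male'): (347, 47),
}

# Totals over all six groups
n_total = sum(n for n, _ in counts.values())
n_survived = sum(s for _, s in counts.values())

# Survival rate per group
rates = {group: s / n for group, (n, s) in counts.items()}

for (pclass, sex), rate in rates.items():
    print('P(Survival|{} class, {}) is {:.2f}'.format(pclass, sex, rate))
print('P(Survival) is {:.2f}'.format(n_survived / n_total))
```

The six groups add up to 891 passengers and 342 survivors, which matches the overall `P(Survival)` of 0.38.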


## Bayes' Theorem

$$ \mathbb{P}(A|B)\mathbb{P}(B) = \mathbb{P}(B|A)\mathbb{P}(A) $$
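
To see the theorem at work on numbers from this notebook, take A = Survived and B = Female. The sketch below uses the printed counts (233 female survivors is the sum 91 + 70 + 72 of the surviving female passengers counted above) and compares the Bayes route with the direct ratio. The variable names are choices of this sketch.

```python
# Counts taken from the outputs printed in this notebook
n_total = 891
n_survived = 342
n_female = 314
n_female_survived = 233  # 91 + 70 + 72

p_survived = n_survived / n_total                        # P(A)
p_female = n_female / n_total                            # P(B)
p_female_given_survived = n_female_survived / n_survived  # P(B|A)

# Bayes' theorem solved for P(A|B)
posterior = p_female_given_survived * p_survived / p_female

# Direct computation of P(Survived|Female) from the counts
direct = n_female_survived / n_female

print('Via Bayes: {:.2f}'.format(posterior))
print('Direct:    {:.2f}'.format(direct))
```

Both routes give the 0.74 printed earlier for `P(Survived|Female)`.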



Calculating the posterior probability of survival for a female passenger with a second class ticket, under the naive assumption that ticket class and sex are independent pieces of evidence:

$$ \mathbb{P}(\text{Survived}|\text{Second Class, Female}) = \frac{\mathbb{P}(\text{Second Class}|\text{Survived})}{\mathbb{P}(\text{Second Class})} \cdot \frac{\mathbb{P}(\text{Female}|\text{Survived})}{\mathbb{P}(\text{Female})} \cdot \mathbb{P}(\text{Survived}) $$
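
As a worked example, the counts printed in this notebook can be plugged into this formula. There are 87 second class survivors (70 female plus 17 male), 233 female survivors, 342 survivors overall, 184 second class passengers, 314 female passengers and 891 passengers in total:

$$ \frac{87/342}{184/891} \cdot \frac{233/342}{314/891} \cdot \frac{342}{891} \approx 1.232 \cdot 1.933 \cdot 0.384 \approx 0.91 $$

This is close to the directly counted value of $70/76 \approx 0.92$. The small gap is expected, since the factorization treats ticket class and sex as independent evidence, which holds only approximately in the data.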


In [66]:
# Calculate the backward probabilities P(evidence|Survived)
p_surv_female = len(survivors[survivors.Sex.eq("female")])/len(survivors)
p_surv_male = len(survivors[survivors.Sex.eq("male")])/len(survivors) 

p_surv_firstclass = len(survivors[survivors.Pclass.eq(1)])/len(survivors)
p_surv_secondclass = len(survivors[survivors.Pclass.eq(2)])/len(survivors)
p_surv_thirdclass = len(survivors[survivors.Pclass.eq(3)])/len(survivors)

print('P(Female|Survived) is {:.2f}'.format(p_surv_female))
print('P(Male|Survived) is {:.2f}'.format(p_surv_male))
print()
print('P(First Class|Survived) is {:.2f}'.format(p_surv_firstclass))
print('P(Second Class|Survived) is {:.2f}'.format(p_surv_secondclass))
print('P(Third Class|Survived) is {:.2f}'.format(p_surv_thirdclass))

P(Female|Survived) is 0.68
P(Male|Survived) is 0.32

P(First Class|Survived) is 0.40
P(Second Class|Survived) is 0.25
P(Third Class|Survived) is 0.35


### Modifier and Informativeness of a Survivor Being Female

$$mod_{\text{female}}=\frac{N_{\text{survivor,female}}/N_{\text{survivor}}}{N_{\text{female}}/N_{\text{tot}}} = \frac{\mathbb{P}(\text{Female}|\text{Survived})}{\mathbb{P}(\text{Female})} \quad\quad \text{(modifier score)}$$

$$info_{\text{female}} = |mod_{\text{female}}-1| \quad\quad \text{(informativeness)}$$


In [71]:
# calculate the modifier and the informativeness of a survivor being female
mod_female = p_surv_female / (len(female)/len(train))
info_female = abs(mod_female-1)

# calculate the modifier and the informativeness of a survivor being male
mod_male = p_surv_male / (len(male)/len(train))
info_male = abs(mod_male-1)

print('The modifier of being female is {:.2f}. \nThe informativeness is {:.2f}.'.format(mod_female,info_female))
print()
print('The modifier of being male is {:.2f}. \nThe informativeness is {:.2f}.'.format(mod_male,info_male))

The modifier of being female is 1.93. 
The informativeness is 0.93.

The modifier of being male is 0.49. 
The informativeness is 0.51.


In [74]:
# calculate the modifier and the informativeness of a survivor being first class
mod_firstclass = p_surv_firstclass / (len(firstclass)/len(train))
info_firstclass = abs(mod_firstclass-1)

# calculate the modifier and the informativeness of a survivor being second class
mod_secondclass = p_surv_secondclass / (len(secondclass)/len(train))
info_secondclass = abs(mod_secondclass-1)

# calculate the modifier and the informativeness of a survivor being third class
mod_thirdclass = p_surv_thirdclass / (len(thirdclass)/len(train))
info_thirdclass = abs(mod_thirdclass-1)

print('The modifier of being first class is {:.2f}. \nThe informativeness of being first class is {:.2f}.'.format(mod_firstclass,info_firstclass))
print()
print('The modifier of being second class is {:.2f}. \nThe informativeness of being second class is {:.2f}.'.format(mod_secondclass,info_secondclass))
print()
print('The modifier of being third class is {:.2f}. \nThe informativeness of being third class is {:.2f}.'.format(mod_thirdclass,info_thirdclass))

The modifier of being first class is 1.64. 
The informativeness of being first class is 0.64.

The modifier of being second class is 1.23. 
The informativeness of being second class is 0.23.

The modifier of being third class is 0.63. 
The informativeness of being third class is 0.37.
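
Putting the pieces together, the modifiers can be multiplied with the prior to estimate the survival probability of every class and sex combination, and the estimates can be compared with the directly counted rates. The sketch below works from the printed counts only. The names `modifier`, `sex_counts`, `class_counts` and `observed_counts` are choices of this sketch, and the survivor totals per feature are sums of the per-group counts printed earlier.

```python
N_TOTAL = 891
N_SURVIVED = 342

# feature value -> (passengers, survivors), from the printed counts
sex_counts = {'female': (314, 233), 'male': (577, 109)}
class_counts = {'first': (216, 136), 'second': (184, 87), 'third': (491, 119)}

# (class, sex) -> (passengers, survivors), used only for the comparison
observed_counts = {
    ('first', 'female'): (94, 91),
    ('second', 'female'): (76, 70),
    ('third', 'female'): (144, 72),
    ('first', 'male'): (122, 45),
    ('second', 'male'): (108, 17),
    ('third', 'male'): (347, 47),
}


def modifier(n_group, n_group_survived):
    """P(group|Survived) / P(group), as in the modifier formula."""
    return (n_group_survived / N_SURVIVED) / (n_group / N_TOTAL)


prior = N_SURVIVED / N_TOTAL
estimates = {}
for (pclass, sex), (n, s) in observed_counts.items():
    # Naive Bayes style estimate: class modifier * sex modifier * prior
    estimate = modifier(*class_counts[pclass]) * modifier(*sex_counts[sex]) * prior
    estimates[(pclass, sex)] = estimate
    print('{:>6} class, {:<6}: estimate {:.2f}, counted {:.2f}'.format(
        pclass, sex, estimate, s / n))
```

The estimates track the counted rates reasonably well for most groups. For first class female passengers the product comes out above 1, which suggests that the unnormalized product is best read as a score under the independence assumption and not as a guaranteed probability.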
