# Bayes' Theorem

It is said that the Reverend Thomas Bayes developed his rule on inverse probability around 1740, while he was trying to prove the existence of God. He came up with a method for calculating the probability of an event given that another event has occurred. Starting with the prior probability (or belief) $P(A)$, and given a likelihood $P(B\ |\ A)$ and the evidence $P(B)$, we arrive at the posterior probability $P(A\ |\ B)$. Bayes' rule is a powerful tool and is widely used in diverse areas such as economics, artificial intelligence, medicine, journalism, and the military, to name a few. Most spam filters use Bayes' rule in one way or another. Bayes' theorem states that the posterior equals the likelihood times the prior, divided by the evidence:

$$
P(A\ |\ B)=\frac{P(B\ |\ A)\cdot P(A)}{P(B)}
$$
The practical power of Bayes' rule is that we often cannot measure the posterior directly, yet we do know the likelihood $P(B\ |\ A)$, the prior $P(A)$, and the evidence $P(B)$.
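The formula above can be sketched as a small helper function. This is a minimal illustration, not part of the original notebook: the function name `bayes_posterior`, its input checks, and the example numbers (a 1% prior, a 90% likelihood, 5% evidence) are all assumptions chosen for demonstration.

```python
def bayes_posterior(prior: float, likelihood: float, evidence: float) -> float:
    """Return P(A | B) = P(B | A) * P(A) / P(B).

    Hypothetical helper: the name and the validation rules are illustrative.
    """
    for name, value in (("prior", prior), ("likelihood", likelihood), ("evidence", evidence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1], got {value}")
    if evidence == 0.0:
        raise ValueError("evidence P(B) must be non-zero")
    joint = likelihood * prior  # P(A ∩ B) = P(B | A) * P(A)
    if joint > evidence:
        # P(A ∩ B) can never be larger than P(B)
        raise ValueError("inconsistent inputs: P(B | A) * P(A) cannot exceed P(B)")
    return joint / evidence


# Illustrative numbers only (not taken from the notebook's data)
posterior = bayes_posterior(prior=0.01, likelihood=0.9, evidence=0.05)
print(f"P(A | B) = {posterior:.2f}")
```

The consistency check is there because three independently estimated probabilities can easily contradict each other; catching that early avoids reporting a "posterior" greater than 1.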

In [0]:
print("-"*100)

----------------------------------------------------------------------------------------------------






Q1- What is the chance of someone having COPD (a life-threatening lung disease) given that he or she is a smoker, $P(A|B)$? This statistic is hard to measure directly, but medical studies give us the probability that someone is a smoker given that he or she has COPD, $P(B|A)$. We also know $P(B)$, the probability that a person is a smoker, and $P(A)$, the probability that someone has COPD. The figures below are rough estimates:

$$
P(A)=0.07\ \small{having\ COPD}\\
P(B)=0.18\ \small{smokers}\\
P(B\ |\ A)=0.85\ \small{is\ or\ was\ a\ smoker,\ given\ a\ COPD\ diagnosis}
$$


In [0]:
#Q1- Now answer: what is the probability of someone having COPD given that the person is or was a smoker?

In [0]:
#import packages
import numpy as np
import pandas as pd

In [6]:
# load dataset
df = pd.read_csv('cancer_test_data.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2914 entries, 0 to 2913
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   patient_id   2914 non-null   int64 
 1   test_result  2914 non-null   object
 2   has_cancer   2914 non-null   bool  
dtypes: bool(1), int64(1), object(1)
memory usage: 48.5+ KB


### Given,
### P(A) = Probability that a person has COPD
### P(B) = Probability that a person is a smoker
$$
P(A)=0.07\ \small{having\ COPD}\\
P(B)=0.18\ \small{smokers}\\
P(B\ |\ A)=0.85\ \small{is\ or\ was\ a\ smoker,\ given\ a\ COPD\ diagnosis}
$$

### We need to find P(A | B), the probability of someone having COPD given that the person is a smoker.
### Therefore,
### P(A | B) = P(B | A) * P(A) / P(B)
##### (Bayes' theorem)

In [8]:
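# Given values
Pa = 0.07        # P(A): prior probability of having COPD
Pb = 0.18        # P(B): probability of being a smoker
PBgivenA = 0.85  # P(B | A): probability of being a smoker given COPD
# Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)
print('Probability of someone having COPD given a person is a smoker={0:0.3}'.format(PBgivenA*Pa/Pb))

Probability of someone having COPD given a person is a smoker=0.331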
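
As a check on the printed value, the arithmetic can be written out by substituting the given estimates into Bayes' theorem (rounded to three significant figures, as in the code):

$$
P(A\ |\ B)=\frac{P(B\ |\ A)\cdot P(A)}{P(B)}=\frac{0.85\cdot 0.07}{0.18}=\frac{0.0595}{0.18}\approx 0.331
$$

Under these rough estimates, about one in three smokers would have COPD, compared with 7% of the population as a whole.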


<a href='https://drive.google.com/open?id=1ygFo91YMrHECMX9g0XKq3flK9XepdHex'> Data Set Link 1</a>   Drive



In [0]:
#Q2- What proportion of patients who tested positive has cancer?

In [9]:
df.head()

   patient_id test_result  has_cancer
0       79452    Negative       False
1       81667    Positive        True
2       76297    Negative       False
3       36593    Negative       False
4       53717    Negative       False


In [21]:
# Let A = Patients who have cancer, B = Patients who tested Positive (given)
# finding all patients who have tested positive (this is B)
total_positive = df[df['test_result'] == 'Positive']
# finding all patients who tested positive and have cancer (this is A ∩ B);
# has_cancer is a boolean column, so it can be used directly as a mask
positive_cancer = total_positive[total_positive['has_cancer']]

# calculating probabilities
PAintersectionB = len(positive_cancer)/len(df)
Pb = len(total_positive)/len(df)

# Using the conditional probability formula P(A | B) = P(A ∩ B)/P(B)
print('Proportion of patients who tested positive that have cancer={0:0.3}'.format(PAintersectionB/Pb))

Proportion of patients who tested positive that have cancer=0.343
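
The cell above uses the definition of conditional probability directly. The same proportion can also be obtained through Bayes' theorem, by estimating the prior, likelihood, and evidence separately from the data. The sketch below shows that both routes agree. It runs on a small made-up DataFrame named `toy` (an assumption for illustration, not the notebook's `cancer_test_data.csv`), with the same `test_result` and `has_cancer` columns.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the notebook's dataset
toy = pd.DataFrame({
    "test_result": ["Positive"] * 4 + ["Negative"] * 5,
    "has_cancer": [True, False, False, False, False, False, False, False, True],
})

positive = toy["test_result"] == "Positive"  # boolean mask for event B
cancer = toy["has_cancer"]                   # boolean mask for event A

p_cancer = cancer.mean()                            # prior P(A)
p_positive = positive.mean()                        # evidence P(B)
p_positive_given_cancer = positive[cancer].mean()   # likelihood P(B | A)

# Route 1: Bayes' theorem
posterior_bayes = p_positive_given_cancer * p_cancer / p_positive
# Route 2: share of cancer cases among the positive tests
posterior_direct = cancer[positive].mean()

print(f"Bayes: {posterior_bayes:.3f}, direct: {posterior_direct:.3f}")
```

Taking the mean of a boolean mask gives the fraction of `True` values, which is why `.mean()` serves as a probability estimate here.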


In [0]:
#Q3- What proportion of patients who tested positive doesn't have cancer?

### Reusing previous values:

In [24]:
# Reusing PAintersectionB and Pb from Q2:
# among patients who tested positive, "has cancer" and "does not have cancer" are
# complementary events, so the required probability = 1 - P(A ∩ B)/P(B)
print('Proportion of patients who tested positive that do not have cancer={0:0.3}'.format(1-PAintersectionB/Pb))

Proportion of patients who tested positive that do not have cancer=0.657


### Alternatively, calculating from scratch:

In [25]:
# Let A = Patients who do not have cancer, B = Patients who tested Positive (given)
# finding all patients who have tested positive (this is B)
total_positive = df[df['test_result'] == 'Positive']
# finding all patients who tested positive and do not have cancer (this is A ∩ B)
positive_nocancer = total_positive[~total_positive['has_cancer']]

# calculating probabilities
PAintersectionB = len(positive_nocancer)/len(df)
Pb = len(total_positive)/len(df)

# Using the conditional probability formula P(A | B) = P(A ∩ B)/P(B)
print('Proportion of patients who tested positive that do not have cancer={0:0.3}'.format(PAintersectionB/Pb))

Proportion of patients who tested positive that do not have cancer=0.657


In [0]:
#Q4- What proportion of patients who tested negative has cancer?

In [27]:
# Let A = Patients who have cancer, B = Patients who tested Negative (given)
# finding all patients who have tested negative (this is B)
total_negative = df[df['test_result'] == 'Negative']
# finding all patients who tested negative and have cancer (this is A ∩ B)
negative_cancer = total_negative[total_negative['has_cancer']]

# calculating probabilities
PAintersectionB = len(negative_cancer)/len(df)
Pb = len(total_negative)/len(df)

# Using the conditional probability formula P(A | B) = P(A ∩ B)/P(B)
print('Proportion of patients who tested negative that have cancer={0:0.3}'.format(PAintersectionB/Pb))

Proportion of patients who tested negative that have cancer=0.0138


In [0]:
#Q5- What proportion of patients who tested negative doesn't have cancer?

### Reusing previous values:

In [28]:
# Reusing PAintersectionB and Pb from Q4:
# among patients who tested negative, "has cancer" and "does not have cancer" are
# complementary events, so the required probability = 1 - P(A ∩ B)/P(B)
print('Proportion of patients who tested negative that do not have cancer={0:0.3}'.format(1-PAintersectionB/Pb))

Proportion of patients who tested negative that do not have cancer=0.986


### Alternatively, calculating from scratch:

In [30]:
# Let A = Patients who do not have cancer, B = Patients who tested Negative (given)
# finding all patients who have tested negative (this is B)
total_negative = df[df['test_result'] == 'Negative']
# finding all patients who tested negative and do not have cancer (this is A ∩ B)
negative_nocancer = total_negative[~total_negative['has_cancer']]

# calculating probabilities
PAintersectionB = len(negative_nocancer)/len(df)
Pb = len(total_negative)/len(df)

# Using the conditional probability formula P(A | B) = P(A ∩ B)/P(B)
print('Proportion of patients who tested negative that do not have cancer={0:0.3}'.format(PAintersectionB/Pb))

Proportion of patients who tested negative that do not have cancer=0.986
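
The four proportions in Q2 to Q5 are all conditional probabilities of cancer status given the test result, so they can also be computed in a single pass by grouping on `test_result`. The sketch below is one possible way to do this. It uses a small made-up DataFrame named `toy` (an assumption for illustration, not the notebook's dataset), and the `summary` table with its column names is likewise hypothetical.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the notebook's dataset
toy = pd.DataFrame({
    "test_result": ["Positive"] * 4 + ["Negative"] * 5,
    "has_cancer": [True, False, False, False, False, False, False, False, True],
})

# The mean of a boolean column within each group is P(has cancer | test result)
p_cancer_given_result = toy.groupby("test_result")["has_cancer"].mean()

# The complement gives P(no cancer | test result) for the same groups
summary = pd.DataFrame({
    "has_cancer": p_cancer_given_result,
    "no_cancer": 1 - p_cancer_given_result,
})
print(summary)
```

Each row of `summary` sums to 1, which is the complement rule used in the "reusing previous values" cells. On the real data, replacing `toy` with `df` should give the Q2 and Q3 values in the `Positive` row and the Q4 and Q5 values in the `Negative` row.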
