# Conditional Probability & Bayes Rule Quiz

```python
# load dataset
import pandas as pd

df = pd.read_csv('./data/cancer_test_data.csv')
df.head()
```

| Unnamed: 0 | patient_id | test_result | has_cancer |
|---|---|---|---|
| 0 | 79452 | Negative | False |
| 1 | 81667 | Positive | True |
| 2 | 76297 | Negative | False |
| 3 | 36593 | Negative | False |
| 4 | 53717 | Negative | False |


### Notes on Bayes Rule

In this dataset, the observed proportions are:

* Patients with cancer: 0.105
* Patients without cancer: 0.895
* Patients with cancer who tested positive: 0.905
* Patients with cancer who tested negative: 0.095
* Patients without cancer who tested positive: 0.204
* Patients without cancer who tested negative: 0.796
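
The last four proportions are shares *within* each `has_cancer` group. A minimal sketch of how such a table could be produced with `pd.crosstab(..., normalize='index')`, run on a small made-up DataFrame `toy` (hypothetical rows, not the real dataset) that reuses the `test_result` and `has_cancer` column names:

```python
import pandas as pd

# Hypothetical toy data (NOT the real CSV), using the dataset's column names
toy = pd.DataFrame({
    'test_result': ['Positive', 'Negative', 'Positive', 'Negative',
                    'Negative', 'Positive', 'Negative', 'Negative'],
    'has_cancer':  [True, True, False, False, False, True, False, False],
})

# P(cancer) and P(~cancer): share of rows where has_cancer is True / False
prior = toy['has_cancer'].value_counts(normalize=True)

# Row-normalised crosstab: each has_cancer row sums to 1,
# so each cell is P(test_result | has_cancer)
likelihood = pd.crosstab(toy['has_cancer'], toy['test_result'], normalize='index')

print(prior)
print(likelihood)
```

On the real data, `df` would take the place of `toy`, and the rows of `likelihood` would correspond to the conditional proportions listed above.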

Based on the above proportions observed in the data, we can assume the following probabilities.

* `P(cancer)` = 0.105: probability a patient has cancer
* `P(~cancer)` = 0.895: probability a patient does not have cancer
* `P(positive|cancer)` = 0.905: probability a patient with cancer tests positive
* `P(negative|cancer)` = 0.095: probability a patient with cancer tests negative
* `P(positive|~cancer)` = 0.204: probability a patient without cancer tests positive
* `P(negative|~cancer)` = 0.796: probability a patient without cancer tests negative

For example, to calculate the probability that a patient who tested positive has cancer, `P(cancer|positive)`, Bayes' rule gives:

`P(cancer|positive)` = `P(cancer) * P(positive|cancer) / P(positive)`

We already have the values for `P(cancer)` and `P(positive|cancer)`, so we can plug them straight into the formula:

`P(cancer|positive)` = 0.105 * 0.905 / `P(positive)`

So how do we calculate `P(positive)`?

By the law of total probability, `P(positive)` is the sum of the probabilities of testing positive under each condition (`P(positive|cancer)` and `P(positive|~cancer)`), each multiplied by the probability of that condition (`P(cancer)` and `P(~cancer)`):

```
P(positive) = (P(cancer) * P(positive|cancer)) + (P(~cancer) * P(positive|~cancer))
P(positive) = (0.105 * 0.905) + (0.895 * 0.204)
P(positive) = 0.095025 + 0.18258
P(positive) = 0.277605
```
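
The same arithmetic as a short Python sketch; the variable names (`p_cancer`, `p_pos_given_cancer`, etc.) are chosen here for illustration, and the values are the rounded proportions from the notes:

```python
# Values copied from the notes above (rounded proportions from the dataset)
p_cancer = 0.105               # P(cancer)
p_no_cancer = 0.895            # P(~cancer)
p_pos_given_cancer = 0.905     # P(positive|cancer)
p_pos_given_no_cancer = 0.204  # P(positive|~cancer)

# Law of total probability: each way of testing positive,
# weighted by the probability of its condition
p_positive = (p_cancer * p_pos_given_cancer
              + p_no_cancer * p_pos_given_no_cancer)

print(round(p_positive, 6))  # 0.277605
```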

Plugging that into the formula gives the final result for `P(cancer|positive)`:

```
P(cancer|positive) = 0.105 * 0.905 / 0.277605
P(cancer|positive) = 0.342
```
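
The full calculation can be wrapped in a small helper. `bayes_posterior` is a hypothetical name used only for this sketch; it assumes a binary condition (cancer or not), so `P(~cancer)` is taken as `1 - P(cancer)`:

```python
def bayes_posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return P(H|E) for a binary hypothesis H, given P(H), P(E|H) and P(E|~H)."""
    # Denominator P(E) via the law of total probability
    p_evidence = prior * p_evidence_given_h + (1 - prior) * p_evidence_given_not_h
    # Bayes' rule: P(H|E) = P(H) * P(E|H) / P(E)
    return prior * p_evidence_given_h / p_evidence


# P(cancer|positive) from P(cancer), P(positive|cancer), P(positive|~cancer)
p_cancer_given_pos = bayes_posterior(0.105, 0.905, 0.204)
print(round(p_cancer_given_pos, 3))  # 0.342
```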

```python
# What proportion of patients who tested positive has cancer?
# This should confirm our calculation above for P(cancer|positive) using Bayes rule:
df.query('test_result == "Positive"')['has_cancer'].mean()
```

```
0.34282178217821785
```

```python
# What proportion of patients who tested positive doesn't have cancer?
1 - df.query('test_result == "Positive"')['has_cancer'].mean()
```

```
0.6571782178217822
```
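
Analytically, `P(~cancer|positive)` follows from Bayes' rule with `~cancer` in the numerator, and since a patient who tested positive either has cancer or does not, it must equal `1 - P(cancer|positive)`. A sketch using the rounded values from the notes (variable names are illustrative):

```python
p_cancer, p_no_cancer = 0.105, 0.895                      # P(cancer), P(~cancer)
p_pos_given_cancer, p_pos_given_no_cancer = 0.905, 0.204  # P(positive|cancer), P(positive|~cancer)

# P(positive) by the law of total probability
p_positive = p_cancer * p_pos_given_cancer + p_no_cancer * p_pos_given_no_cancer

# Bayes' rule for each condition, sharing the same denominator
p_cancer_given_pos = p_cancer * p_pos_given_cancer / p_positive
p_no_cancer_given_pos = p_no_cancer * p_pos_given_no_cancer / p_positive

print(round(p_no_cancer_given_pos, 3))                        # 0.658
print(round(p_cancer_given_pos + p_no_cancer_given_pos, 9))   # 1.0
```

The analytic value is close to the empirical proportion above; the small gap is presumably due to the notes' proportions being rounded to three decimals.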

```python
# What proportion of patients who tested negative has cancer?
df.query('test_result == "Negative"')['has_cancer'].mean()
```

```
0.013770180436847104
```
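
The negative-test case can be checked with Bayes' rule the same way, swapping in `P(negative|cancer)` and `P(negative|~cancer)` from the notes. A sketch with illustrative variable names:

```python
p_cancer, p_no_cancer = 0.105, 0.895
p_neg_given_cancer = 0.095     # P(negative|cancer)
p_neg_given_no_cancer = 0.796  # P(negative|~cancer)

# P(negative) by the law of total probability
p_negative = p_cancer * p_neg_given_cancer + p_no_cancer * p_neg_given_no_cancer

# Bayes' rule: P(cancer|negative) = P(cancer) * P(negative|cancer) / P(negative)
p_cancer_given_neg = p_cancer * p_neg_given_cancer / p_negative

print(round(p_negative, 6))          # 0.722395
print(round(p_cancer_given_neg, 4))  # 0.0138
```

Rounded to four decimals, this agrees with the empirical proportion above.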

```python
# What proportion of patients who tested negative doesn't have cancer?
1 - df.query('test_result == "Negative"')['has_cancer'].mean()
```

```
0.9862298195631529
```
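
The four `query` cells can also be answered in one step with `groupby`, since the mean of a boolean column is the share of `True` values. A sketch on a small hypothetical DataFrame `toy` (not the real CSV); on the real data, `df` would take its place:

```python
import pandas as pd

# Hypothetical toy data (NOT the real CSV), using the dataset's column names
toy = pd.DataFrame({
    'test_result': ['Positive', 'Positive', 'Positive', 'Negative', 'Negative'],
    'has_cancer':  [True, False, False, False, True],
})

# P(cancer | test_result): share of has_cancer == True within each test result
p_cancer_by_result = toy.groupby('test_result')['has_cancer'].mean()

# P(~cancer | test_result) is the complement
p_no_cancer_by_result = 1 - p_cancer_by_result

print(p_cancer_by_result)
print(p_no_cancer_by_result)
```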