# Application of Bayes' theorem to breast cancer screening

## Induction and deduction

**Induction**: Evidence to hypothesis; generalising from observation.  Example of an inductive argument: "All ravens I have seen are black. Therefore ravens are black."

**Deduction**: Hypothesis to conclusion; applying what is known. Example of a deductive argument: "All noble gases are stable. Neon is a noble gas. Therefore neon is stable." Multiple hypotheses may be generated (by induction), and their conclusions checked against the facts: "When you have eliminated the impossible, whatever remains, however improbable, must be the truth" (Sherlock Holmes).

## Probability and likelihood

See: https://youtu.be/pYxNSUDSFH4

Probability and likelihood are often used interchangeably, but strictly:
 
* *Probability*: Given a fixed distribution, what is the probability of a given observation? The distribution stays fixed and the data vary; probabilities are areas under the fixed distribution: $pr(data|distribution)$.

* *Likelihood*: Given a fixed observation, how likely is that observation under a given distribution? The data stay fixed and the distribution varies; likelihoods are heights of the distribution curve at the observed value: $L(distribution|data)$.
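
The distinction can be sketched in Python using `statistics.NormalDist` from the standard library. The normal distributions, the candidate means and the observed value of 34 are illustrative assumptions, not values from any real data set.

```python
from statistics import NormalDist

# Probability: the distribution is fixed, the data vary.
# Hypothetical fixed distribution: mean 32, standard deviation 2.5.
fixed = NormalDist(mu=32, sigma=2.5)
# Area under the fixed curve between 32 and 34.
prob_32_to_34 = fixed.cdf(34) - fixed.cdf(32)

# Likelihood: the observation is fixed, the distribution varies.
observed = 34
candidate_means = [30, 32, 34, 36]  # hypothetical candidate distributions
likelihoods = {
    mu: NormalDist(mu=mu, sigma=2.5).pdf(observed) for mu in candidate_means
}
# The candidate under which the fixed observation is most likely.
best_mean = max(likelihoods, key=likelihoods.get)

print(f'pr(32 <= x <= 34 | mean=32): {prob_32_to_34:.3f}')
for mu, lik in likelihoods.items():
    print(f'L(mean={mu} | x={observed}): {lik:.4f}')
print(f'Most likely mean among the candidates: {best_mean}')
```

The first calculation moves the data range over one curve; the second moves the curve under one data point.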

 
## Bayes' theorem

$P(H|E) = \frac{P(E|H)\times P(H)}{P(E)}$

* $P(H|E)$: Probability that hypothesis H is true, given evidence E (induction).
* $P(E|H)$: Probability of seeing evidence E if hypothesis H is true (deduction).
* $P(H)$: *Prior* probability that hypothesis H is true, before observing evidence E.
* $P(E)$: Probability of seeing evidence E, whether hypothesis H is true or not.
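
Because H is either true or false, $P(E)$ can be written as a sum over both cases, each weighted by how probable that case is (the law of total probability):

$P(E) = P(E|H)\times P(H) + P(E|\neg H)\times P(\neg H)$

where $P(\neg H) = 1 - P(H)$.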

or:

$P(H|E) = \frac{P(E|H)}{P(E)} P(H)$

* $\frac{P(E|H)}{P(E)}$: *Likelihood ratio* - How many times more likely are we to see E if H is true than we are to see E overall? $\frac{P(E|H)}{P(E|H)P(H)+P(E|\neg H)P(\neg H)}$
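
As a minimal sketch, the theorem can be wrapped in a small Python function that weights each case by how probable it is. The function name `posterior`, its parameter names and the example numbers are hypothetical, chosen only for illustration.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Return P(H|E) given P(H), P(E|H) and P(E|not H)."""
    # P(E): evidence seen when H is true or when H is false,
    # weighted by P(H) and P(not H) = 1 - P(H) respectively.
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e


# Hypothetical numbers, for illustration only.
p = posterior(prior=0.5, p_e_given_h=0.8, p_e_given_not_h=0.2)
print(f'P(H|E) = {p:.2f}')
```

With an even prior, the posterior depends only on how much more often the evidence appears when H is true than when it is false.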


## Breast cancer screening

Let us consider the application of breast cancer screening to a 40-year-old woman.

Data we have available to us:

* 1 in 700 40-year-old women will develop cancer in the next year.
* Mammogram screening has 73% sensitivity and 88% specificity:
    * 73% of cancers are detected
    * Of those without cancer, 12% will return a false positive



Our hypothesis, H, is that the woman has cancer.

* P(E|H) = probability of a positive test if H is true = 0.73
* P(H) = prior probability of having cancer without evidence (test) = 1/700 = 0.001429
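* P(E) = overall probability of a positive test, whether or not H is true:
    * = probability of a positive test if H is true (0.73), weighted by the proportion of women with cancer (1/700), plus probability of a positive test if H is false (0.12), weighted by the proportion without cancer (699/700)
    * = (0.73 * 1/700) + (0.12 * 699/700) = 0.12087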

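```python
P_E_given_H = 0.73      # sensitivity: P(positive test | cancer)
P_E_given_not_H = 0.12  # false positive rate: P(positive test | no cancer)
P_H = 1/700             # prior: P(cancer)
# P(E): overall probability of a positive test
P_E = (P_E_given_H * P_H) + (P_E_given_not_H * (1 - P_H))

P_H_given_E = (P_E_given_H * P_H) / P_E  # Bayes' theorem

print(f'Probability of having cancer: {P_H_given_E:0.4f}')
```

```
Probability of having cancer: 0.0086
```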


Or, if the likelihood ratio is calculated (or given):

```python
P_E_given_H = 0.73
P_E_given_not_H = 0.12
P_H = 1/700  # prior: P(cancer)

# Likelihood ratio: P(E|H) / P(E)
likelihood_ratio = P_E_given_H / ((P_E_given_H * P_H) + (P_E_given_not_H * (1 - P_H)))
print(f'Likelihood ratio: {likelihood_ratio:.2f}')

P_H_given_E = likelihood_ratio * P_H
print(f'Probability of having cancer: {P_H_given_E:0.4f}')
```

```
Likelihood ratio: 6.04
Probability of having cancer: 0.0086
```
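
The same result can be cross-checked by counting people instead of multiplying probabilities. The sketch below assumes a hypothetical cohort of 70,000 women, a size chosen only so that the counts come out as whole numbers. The rates are those listed in the screening data.

```python
cohort = 70_000  # hypothetical cohort size, chosen to give whole-number counts

with_cancer = cohort / 700               # 1 in 700 have cancer
without_cancer = cohort - with_cancer

true_positives = 0.73 * with_cancer      # cancers detected (sensitivity)
false_positives = 0.12 * without_cancer  # women without cancer who test positive

# Of all women with a positive test, what fraction have cancer?
p_cancer_given_positive = true_positives / (true_positives + false_positives)

print(f'Women with cancer: {with_cancer:.0f}')
print(f'True positives: {true_positives:.0f}')
print(f'False positives: {false_positives:.0f}')
print(f'Probability of having cancer: {p_cancer_given_positive:0.4f}')
```

In this cohort, 73 of the roughly 8,461 positive results are true cancers: the false positives from the large group without cancer swamp the true positives from the small group with it.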


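Even after a positive test, the probability of cancer is still under 1% (up from a prior of about 0.14%). This is why mammograms are not routinely used to screen 40-year-old women.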