- Assume there is a diagnostic test for detecting a certain cancer
- It is 99% sensitive and 99% specific
- 0.5% of the population has this cancer
- *What is the probability that a randomly selected person from the population has this cancer given that their test result comes back positive?*

____

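- Since the test is 99% sensitive, if we tested 100 people who do in fact have the cancer, we would expect the test to come back positive for 99 of them: $P(+ | \text{Sick}) = 0.99$
- Since the test is 99% specific, if we tested 100 people who do *not* have the cancer, we would expect the test to come back negative for 99 of them and (falsely) positive for 1: $P(- | \text{NOT Sick}) = 0.99$, so $P(+ | \text{NOT Sick}) = 0.01$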
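The two rates above each condition on a different group: sensitivity applies to sick people and specificity to healthy people. The minimal sketch below spells out the four conditional outcomes they imply. The variable names are illustrative and do not come from the original notebook.

```python
# Rates from the problem statement (variable names are illustrative)
SENSITIVITY = 0.99  # P(+ | Sick)
SPECIFICITY = 0.99  # P(- | NOT Sick)

# The four conditional outcomes implied by the two rates
p_pos_given_sick = SENSITIVITY          # true positive rate
p_neg_given_sick = 1 - SENSITIVITY      # false negative rate
p_neg_given_healthy = SPECIFICITY       # true negative rate
p_pos_given_healthy = 1 - SPECIFICITY   # false positive rate

# Each pair conditions on the same group, so it must sum to 1
assert abs(p_pos_given_sick + p_neg_given_sick - 1) < 1e-12
assert abs(p_pos_given_healthy + p_neg_given_healthy - 1) < 1e-12

print(f"false positive rate P(+ | NOT Sick) = {p_pos_given_healthy:.2f}")
```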

- So, we want to solve:

$$
P(\text{Sick} | +) = \frac{P(\text{Sick} \cap +)}{P(+)}
$$

- Let's compute the numerator and the denominator separately

- First, $\text{Sick} \cap +$ means that someone is actually sick, and their test result came back positive
    - The probability of being sick is 0.005, and the probability that a sick person tests positive (the sensitivity) is 0.99, therefore:

$$
P(\text{Sick} \cap +) = 0.005 \cdot 0.99 = 0.00495
$$

- $+$ simply means someone's test results came back positive
    - There are two ways this can happen
        1. Person actually is sick and gets a positive test result
            - This is $P(\text{Sick} \cap +) = P(\text{Sick}) \cdot P(+ | \text{Sick}) = 0.005\cdot0.99$
        2. Person isn't really sick, but they got a false positive test result
            - Since 99.5% of the population isn't sick, and a healthy person has a 1% chance of a false positive (because the specificity is 99%), $P(\text{NOT Sick} \cap +) = 0.995\cdot0.01$

- Combining these, we get $P(+) = 0.005\cdot0.99 + 0.995\cdot0.01 = 0.0149$
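Written symbolically, the two cases are the law of total probability. Each term is a prior probability times a conditional test rate:

$$
\begin{aligned}
P(+) &= P(+ | \text{Sick})\,P(\text{Sick}) + P(+ | \text{NOT Sick})\,P(\text{NOT Sick}) \\
&= 0.99 \cdot 0.005 + 0.01 \cdot 0.995 = 0.00495 + 0.00995 = 0.0149
\end{aligned}
$$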

- Therefore

$$
P(\text{Sick}|+) = \frac{0.00495}{0.0149} \approx 0.332 \approx 1/3
$$

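- So, if we get a positive test result, the probability that we're actually sick is only about 1/3. The reason is that false positives from the large healthy majority ($0.00995$) outnumber true positives ($0.00495$) roughly two to one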
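The steps above can be wrapped into a small Bayes' theorem helper. This is a sketch only. The function name `posterior_given_positive` is hypothetical, and `fractions.Fraction` is used solely to keep the arithmetic exact. Plain floats give the same answer up to rounding.

```python
from fractions import Fraction


def posterior_given_positive(prevalence, sensitivity, specificity):
    """Return P(Sick | +) via Bayes' theorem.

    prevalence  = P(Sick)
    sensitivity = P(+ | Sick)
    specificity = P(- | NOT Sick), so P(+ | NOT Sick) = 1 - specificity
    """
    p_sick_and_pos = prevalence * sensitivity                 # P(Sick ∩ +)
    p_healthy_and_pos = (1 - prevalence) * (1 - specificity)  # P(NOT Sick ∩ +)
    p_pos = p_sick_and_pos + p_healthy_and_pos                # P(+), total probability
    return p_sick_and_pos / p_pos


result = posterior_given_positive(Fraction(5, 1000), Fraction(99, 100), Fraction(99, 100))
print(result, f"{float(result):.4f}")  # 99/298 0.3322
```

The exact answer 99/298 equals $0.00495 / 0.0149$ after scaling the numerator and denominator by $20000$.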

- Let's simulate some tests to confirm our results

```python
import numpy as np
```

```python
N_PATIENTS = 10000000
```

- With 10 million patients, we'd expect about $10^7 \cdot 0.0149 = 149{,}000$ to get positive test results and only about $10^7 \cdot 0.00495 = 49{,}500$ of that subset to actually be sick
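These expected counts are just the population size times $P(+)$ and $P(\text{Sick} \cap +)$. The sketch below reproduces them with exact fractions. Only `N_PATIENTS` comes from the notebook; the other names are illustrative.

```python
from fractions import Fraction

N_PATIENTS = 10_000_000

# P(Sick ∩ +) = P(Sick) * P(+ | Sick)
p_sick_and_pos = Fraction(5, 1000) * Fraction(99, 100)
# P(+) = P(Sick ∩ +) + P(NOT Sick ∩ +)
p_pos = p_sick_and_pos + Fraction(995, 1000) * Fraction(1, 100)

expected_positive = N_PATIENTS * p_pos
expected_sick_and_positive = N_PATIENTS * p_sick_and_pos
print(expected_positive, expected_sick_and_positive)  # 149000 49500
```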

```python
p = 0.005  # prevalence, P(Sick)
# 1 if the patient has the cancer, 0 otherwise
array_cancer = np.random.binomial(1, p, size=N_PATIENTS)
```

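```python
# 1 if the test gives the correct diagnosis, 0 otherwise.
# A single 0.99 rate works here only because sensitivity and specificity are both 99%.
array_correct_test_results = np.random.binomial(1, 0.99, size=N_PATIENTS)
array_incorrect_test_results = 1 - array_correct_test_results
```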

```python
# Positive result: sick and correctly diagnosed, or healthy and misdiagnosed
array_results = array_cancer * array_correct_test_results + (1 - array_cancer) * array_incorrect_test_results
```
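The cells here draw one "correct diagnosis" flag with probability 0.99 for everyone. That shortcut only works because sensitivity and specificity are equal. The sketch below handles the general case by giving sick and healthy patients different positive-test probabilities. It swaps the legacy `np.random.binomial` calls for NumPy's `Generator` API (`np.random.default_rng`). The function name and its parameters are hypothetical.

```python
import numpy as np


def simulate_positive_tests(n_patients, prevalence, sensitivity, specificity, seed=None):
    """Simulate one test per patient; return (n_positive, n_sick_and_positive)."""
    rng = np.random.default_rng(seed)
    # True disease status: sick with probability `prevalence`
    sick = rng.random(n_patients) < prevalence
    # One uniform draw per patient; the positive threshold depends on true status:
    # sick patients test positive with prob. sensitivity,
    # healthy patients test positive with prob. 1 - specificity (false positive)
    u = rng.random(n_patients)
    positive = np.where(sick, u < sensitivity, u < 1 - specificity)
    return int(positive.sum()), int((positive & sick).sum())


# Smaller population than the notebook's 10 million, to keep memory use modest
n_pos, n_sick_pos = simulate_positive_tests(1_000_000, 0.005, 0.99, 0.99, seed=0)
print(n_pos, n_sick_pos, n_sick_pos / n_pos)
```

With `sensitivity` and `specificity` as separate arguments, you can check asymmetric tests. For example, `specificity=1.0` eliminates false positives entirely.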

```python
np.sum(array_results)
```

```
148846
```

- So, we have 148.8k people with positive test results

```python
np.sum(array_results * array_cancer)
```

```
49467
```

- And about 49.5k of them actually have cancer
    - Both counts are close to the expected 149k and 49.5k

```python
49467/148846
```

```
0.3323367776090725
```
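One rough way to quantify how close 0.3323 is to the analytic value is to compare the gap with the binomial standard error of a proportion. The sketch assumes the 148,846 positive patients behave like independent draws, which is only approximately true. It reuses the counts printed above.

```python
import math

n_positive = 148846          # positive tests in the simulation above
n_sick_and_positive = 49467  # of those, actually sick
exact = 0.00495 / 0.0149     # analytic P(Sick | +)

estimate = n_sick_and_positive / n_positive
# Approximate binomial standard error of a proportion estimated from n_positive people
std_err = math.sqrt(estimate * (1 - estimate) / n_positive)
z = (estimate - exact) / std_err
print(f"estimate={estimate:.4f}, exact={exact:.4f}, std_err={std_err:.4f}, z={z:.2f}")
```

A `z` well inside $\pm 2$ suggests the gap is consistent with ordinary sampling noise.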