# Bayes' Theorem

- Bayes' theorem is a formula that describes how to update the probabilities of hypotheses when given evidence. It follows from the definition of conditional probability: 
  $$P(A\vert B)={\frac {P(A)P(B\vert A)}{P(B)}}$$
- In plain English, this means that the probability of A happening, given that B has happened, is equal to the probability of A happening times the probability of B happening given that A has happened, divided by the probability of B happening.
- The key insight is that the probability of A given B depends not only on how strongly B is associated with A, but also on the base rates P(A) and P(B). People ignore base rates all the time.
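
The step from the definition of conditional probability to the formula can be written out as a short derivation. This is a sketch that assumes P(A) > 0 and P(B) > 0, so that both conditional probabilities are defined:

$$P(A\vert B)={\frac {P(A\cap B)}{P(B)}},\qquad P(B\vert A)={\frac {P(A\cap B)}{P(A)}}$$

$$P(A\cap B)=P(A)P(B\vert A)\quad \Longrightarrow \quad P(A\vert B)={\frac {P(A)P(B\vert A)}{P(B)}}$$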


# Bayes’ Theorem to the rescue

Drug testing is a common example where we can apply Bayes' theorem to calculate conditional probabilities. Even a “highly accurate” drug test can produce more false positives than true positives.

Let’s say we have a drug test that correctly identifies users of a drug 99% of the time and correctly returns a negative result for 99% of non-users. However, only 0.3% of the overall population actually uses this drug.

We want to find the probability that someone is an actual user of the drug given that they tested positive for it.

We can use the following notation:

- Event A = Is a user of the drug
- Event B = Tested positive for the drug
- P(A) = The probability of being a user of the drug
- P(B) = The probability of testing positive for the drug
- P(A|B) = The probability of being a user of the drug given that they tested positive for the drug
- P(B|A) = The probability of testing positive for the drug given that they are a user of the drug

According to Bayes' theorem, we can calculate P(A|B) as follows:

$$
P(A\vert B) = \frac{P(A)P(B\vert A)}{P(B)}
$$

We know that:

- P(A) = 0.003 (the proportion of users in the population)
- P(B|A) = 0.99 (the true positive rate of the test, i.e. how often users test positive)
- P(B) = 0.99 * 0.003 + 0.01 * 0.997 = 0.01294 (the probability of being a user and testing positive, plus the probability of being a non-user and testing positive)

Plugging these values into the formula, we get:

$$
P(A\vert B) = \frac{0.003 \times 0.99}{0.99 \times 0.003 + 0.01 \times 0.997} = \frac{0.00297}{0.01294} \approx 0.2295
$$

So the probability that someone is an actual user of the drug given that they tested positive is only about **23.0%**!

This shows that a high P(B|A) (99%) does not imply a high P(A|B). In other words, a positive test result does not strongly imply that the person is a user of the drug.
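
The same calculation can be sketched in a few lines of Python. The function name `posterior_given_positive` and its parameter names are illustrative assumptions, not part of any library; `specificity` stands for the 99% rate at which non-users correctly test negative.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """Return P(user | positive test) via Bayes' theorem.

    Hypothetical helper for illustration; not a library function.
    """
    true_positive = prevalence * sensitivity               # P(A) * P(B|A)
    false_positive = (1 - prevalence) * (1 - specificity)  # P(not A) * P(B|not A)
    evidence = true_positive + false_positive              # P(B)
    return true_positive / evidence


p = posterior_given_positive(prevalence=0.003, sensitivity=0.99, specificity=0.99)
print(f"P(user | positive) = {p:.4f}")
```

With the numbers from this example, the printed value is roughly 0.2295.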

To illustrate this, we can use a table to show the possible outcomes of the test and the number of people in each category, assuming a population of 100,000 people.

| Test result \ Drug use | User | Non-user | Total |
| --- | --- | --- | --- |
| Positive | 297 | 997 | 1294 |
| Negative | 3 | 98703 | 98706 |
| Total | 300 | 99700 | 100000 |

We can see that out of the 1,294 people who tested positive, only 297 are actual users of the drug, while 997 are false positives. Because non-users vastly outnumber users, even a 1% false positive rate produces more false positives than true positives. This is how a low prevalence of the drug in the population undermines the reliability of a positive result.
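
The cell counts of such a table follow directly from the population size and the three rates. Here is a minimal sketch, assuming a hypothetical helper `confusion_counts` and rounding expected counts to whole people:

```python
def confusion_counts(population, prevalence, sensitivity, specificity):
    """Expected head counts for each cell of the test-result table.

    Hypothetical helper for illustration; counts are rounded to whole people.
    """
    users = round(population * prevalence)
    non_users = population - users
    true_pos = round(users * sensitivity)      # users who test positive
    false_neg = users - true_pos               # users who test negative
    true_neg = round(non_users * specificity)  # non-users who test negative
    false_pos = non_users - true_neg           # non-users who test positive
    return {
        "true_pos": true_pos,
        "false_pos": false_pos,
        "false_neg": false_neg,
        "true_neg": true_neg,
    }


counts = confusion_counts(100_000, prevalence=0.003, sensitivity=0.99, specificity=0.99)
positives = counts["true_pos"] + counts["false_pos"]
print(counts)
print(f"Share of positives who are users: {counts['true_pos'] / positives:.4f}")
```

Dividing the true positives by all positives gives the same answer as the formula, which is a useful sanity check on the table.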

Here is an example scenario to help you understand this better:

Suppose you are a doctor and you have a patient who tested positive for the drug. Based on the information above, what would you do?

- You would not immediately conclude that the patient is a user of the drug, because there is only about a **23.0%** chance that they are.
- You would consider other factors, such as the patient's medical history, symptoms, risk factors, etc., to assess their likelihood of being a user of the drug.
- You would perform a confirmatory test or use a different method to verify the result, such as a blood test or a hair analysis.
- You would explain to the patient that the test result is not conclusive and that they need further testing to confirm or rule out the possibility of being a user of the drug.
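
To see why a confirmatory test helps, the result of the first test can be used as the prior for the second. The sketch below rests on two assumptions that are not in the example above: the second test has the same 99% rates as the first, and its errors are independent of the first test's errors given the person's true status. Real confirmatory methods have their own error rates, and errors may be correlated. The helper name `update_on_positive` is hypothetical.

```python
def update_on_positive(prior, sensitivity, specificity):
    """One Bayesian update after a positive result (illustrative helper)."""
    true_positive = prior * sensitivity
    false_positive = (1 - prior) * (1 - specificity)
    return true_positive / (true_positive + false_positive)


# Start from the population prevalence, then update once per positive result.
after_first = update_on_positive(0.003, sensitivity=0.99, specificity=0.99)
# Assumption: the second test is independent of the first given user status.
after_second = update_on_positive(after_first, sensitivity=0.99, specificity=0.99)

print(f"After one positive test:  {after_first:.3f}")
print(f"After two positive tests: {after_second:.3f}")
```

Under these assumptions, a second positive result moves the probability from roughly 23% to well above 90%, which is why a single positive result is treated as a reason to retest rather than as a conclusion.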

## Why P(B) = 0.99 * 0.003 + 0.01 * 0.997?

P(B) is the probability of testing positive for the drug, regardless of whether the person is a user or a non-user. There are two ways to test positive for the drug:

- The person is a user of the drug and the test correctly identifies them as such. This happens with a probability of P(A) * P(B|A), which is **0.003 * 0.99** = **0.00297**.
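- The person is not a user of the drug and the test falsely identifies them as one. This happens with a probability of P(not A) * P(B|not A), which is **0.997 * 0.01** = **0.00997**. Here P(not A) = 1 - P(A) = 0.997, and the false positive rate P(B|not A) is 1 - 0.99 = 0.01 because the test correctly clears 99% of non-users. It equals 1 - P(B|A) only because both accuracy figures happen to be 99%.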

Therefore, the total probability of testing positive for the drug is the sum of these two probabilities, which is **0.00297 + 0.00997** = **0.01294**, or approximately **0.013**. That's why P(B) = 0.99 * 0.003 + 0.01 * 0.997.
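
In general form, this is the law of total probability applied to the two mutually exclusive cases "user" and "non-user". The symbol $\neg A$ for "not a user" is notation introduced here for illustration:

$$P(B)=P(B\vert A)P(A)+P(B\vert \neg A)P(\neg A)=0.99\times 0.003+0.01\times 0.997=0.01294$$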