# Introduction to Bayes' Theorem

In this lesson, we'll learn about Bayes' Theorem. Bayes' Theorem is the basis of a branch of statistics called Bayesian statistics, in which we take prior knowledge into account when calculating new probabilities.

Before we learn this theorem, we'll need to review independence and conditional probability.

By the end of this lesson, you'll be able to solve simple problems involving prior knowledge.

## Independent Events

The ability to determine whether two events are independent is an important skill for statistics.

If two events are independent, then the occurrence of one event does not affect the probability of the other event. Here are some examples of independent events:

* I wear a blue shirt; my co-worker wears a blue shirt
* I take the subway to work; I eat a tuna sandwich for lunch
* The NY Giants win their football game; the NY Rangers win their hockey game

If two events are dependent, then when one event occurs, the probability of the other event occurring changes in a predictable way.

Here are some examples of dependent events:

* It rains on Tuesday; I carry an umbrella on Tuesday
* I eat spaghetti; I have a red stain on my shirt
* I wear sunglasses; I go to the beach

## Conditional Probability

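Conditional probability is the probability that one event occurs, given that another event has already occurred. Before we can work with it, we need to be able to calculate the probability that two events both happen. This is easiest when the two events are independent.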

Note: For the rest of this lesson, we'll be using the statistical convention that the probability of an event is written as P(event).

If the probability of Event A is P(A) and the probability of Event B is P(B) and the two events are independent, then the probability of both events occurring is the product of the probabilities:

`P(A ∩ B) = P(A) × P(B)`

(The symbol ∩ just means "and", so P(A ∩ B) means the probability that both A and B happen.)

For instance, suppose we are rolling a pair of dice and want to know the probability of rolling two sixes. Each die has six sides, so the probability of rolling a six on one die is 1/6. The two dice are independent (i.e., rolling a six on one die does not increase or decrease our chance of rolling a six on the other), so

`P(6 ∩ 6) = P(6) × P(6) = 1/6 × 1/6 = 1/36`
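As a quick check, we can count the outcomes directly instead of relying on the product rule. The sketch below assumes two fair six-sided dice; the variable names are our own and are only for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes for two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Count the outcomes where both dice show a six.
double_six = [roll for roll in outcomes if roll == (6, 6)]
p_by_counting = Fraction(len(double_six), len(outcomes))

# Product rule for independent events.
p_by_product = Fraction(1, 6) * Fraction(1, 6)

print(p_by_counting, p_by_product)  # 1/36 1/36
```

Both approaches agree, which is what we expect when the events are independent.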

## Testing for a Rare Disease

Suppose you are a doctor and you need to test whether a patient has a certain rare disease. The test is very accurate: it's correct 99% of the time. The disease is very rare: only 1 in 100,000 patients has it.

You administer the test and it comes back positive, so your patient must have the disease, right?

Not necessarily. If we just consider the test, there is only a 1% chance that it is wrong, but we actually have more information: we know how rare the disease is.

Given that the test came back positive, there are two possibilities:

* The patient had the disease, and the test correctly diagnosed the disease
* The patient didn't have the disease and the test incorrectly diagnosed that they had the disease.

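What is the probability that the patient had the disease and the test correctly diagnosed the disease? Save your answer to the variable p_disease_and_correct.

What is the probability that the patient didn't have the disease and the test incorrectly diagnosed that they had the disease? Save your answer to the variable p_no_disease_and_incorrect.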

In [1]:
p_disease_and_correct = 0.99 * 1/100000
p_no_disease_and_incorrect = 0.01 * 99999/100000
print(p_disease_and_correct)
print(p_no_disease_and_incorrect)

9.9e-06
0.0099999


## Bayes' Theorem

In the previous exercise, we determined two probabilities:

* The patient had the disease, and the test correctly diagnosed the disease ≈ 0.00001
* The patient didn't have the disease and the test incorrectly diagnosed that they had the disease ≈ 0.01

Both events are rare, but we can see that it was about 1,000 times more likely that the test was incorrect than that the patient had this rare disease.
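One way to see where that ratio comes from is to count patients instead of multiplying probabilities. The sketch below uses a hypothetical group of 10,000,000 patients, a size chosen only so that every count comes out as a whole number. It also assumes that the 99% accuracy applies equally to patients with and without the disease.

```python
# Hypothetical cohort, sized so that every count is a whole number.
patients = 10_000_000

with_disease = patients // 100_000         # 1 in 100,000 -> 100 patients
without_disease = patients - with_disease  # 9,999,900 patients

# The test is correct 99% of the time for patients who have the disease.
true_positives = with_disease * 99 // 100  # 99 patients
# The test is wrong 1% of the time for patients who do not have it.
false_positives = without_disease // 100   # 99,999 patients

ratio = false_positives / true_positives
print(true_positives, false_positives, round(ratio))  # 99 99999 1010
```

For every patient who correctly tests positive, roughly a thousand healthy patients test positive by mistake.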

We're able to come to this conclusion because we had more information than just the accuracy of the test; we also knew the prevalence of this disease. That extra information about how we expect the world to work is called a prior.

When we only use the first piece of information (the result of the test), it's called a Frequentist Approach to statistics. When we incorporate our prior, it's called a Bayesian Approach.

In statistics, if we have two events (A and B), we write the probability that event A will happen, given that event B already happened, as P(A|B). In our example, we want to find P(rare disease | positive result).
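To make the P(A|B) notation concrete, here is a small sketch using two fair dice. The events are our own choice for illustration: A is "the two dice sum to 10 or more" and B is "the first die shows a six". To find P(A|B), we look only at the outcomes where B happened and ask how often A also happened.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes for two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Event A: the dice sum to 10 or more. Event B: the first die shows a six.
a = [roll for roll in outcomes if sum(roll) >= 10]
b = [roll for roll in outcomes if roll[0] == 6]
a_and_b = [roll for roll in b if sum(roll) >= 10]

p_a = Fraction(len(a), len(outcomes))         # P(A), with no extra information
p_a_given_b = Fraction(len(a_and_b), len(b))  # P(A|B), restricted to outcomes in B

print(p_a, p_a_given_b)  # 1/6 1/2
```

Knowing that the first die is a six raises the probability of a high sum from 1/6 to 1/2, so these two events are dependent.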

We can calculate P(A|B) using Bayes' Theorem, which states:

![Bayes' Theorem: P(A|B) = P(B|A) × P(A) / P(B)](formula_1_black.svg)

So in this case, we'd say: 

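![P(rare disease | positive result) = P(positive result | rare disease) × P(rare disease) / P(positive result)](formula_2_black.svg)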
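Bayes' Theorem translates into a one-line function. The helper below is a minimal sketch; the name `bayes` and its parameter names are hypothetical and not part of any library. To keep the numbers easy to verify, the usage example uses two fair dice rather than the disease test: A is "the first die shows a six" and B is "the two dice sum to 10 or more".

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must be greater than zero")
    return p_b_given_a * p_a / p_b


p_six_given_high_sum = bayes(
    p_b_given_a=Fraction(1, 2),  # P(sum >= 10 | first die is six): 3 of 6 outcomes
    p_a=Fraction(1, 6),          # P(first die is six)
    p_b=Fraction(1, 6),          # P(sum >= 10): 6 of 36 outcomes
)
print(p_six_given_high_sum)  # 1/2
```

We can confirm the result by counting: of the 6 rolls that sum to 10 or more, 3 start with a six.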

Calculate P(positive result | rare disease), or the probability of a positive test result, given that a patient really has this rare disease. Save your answer (as a decimal) to p_positive_given_disease.

In [3]:
# The test is correct 99% of the time, so a patient who has the disease
# receives a positive result 99% of the time.
p_positive_given_disease = 0.99

print(p_positive_given_disease)

0.99


What is P(rare disease), the probability that a randomly selected patient has the rare disease? Save your answer to p_disease.

In [7]:
p_disease = 1/100000
p_disease

1e-05

As we discussed previously, there are two ways to get a positive result:

* The patient had the disease, and the test correctly diagnosed the disease
* The patient didn't have the disease and the test incorrectly diagnosed that they had the disease.

Using these two probabilities, calculate the total probability that a randomly selected patient receives a positive test result, P(positive result). Save your answer to the variable p_positive.

In [6]:
# P(disease and correct test) + P(no disease and incorrect test)
p_positive = 0.99 * 1/100000 + 0.01 * 99999/100000
round(p_positive, 7)

0.0100098
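With all three pieces in hand, we can finish the calculation and find P(rare disease | positive result). The sketch below restates the values so that it runs on its own. It assumes the 99% accuracy applies both to patients with the disease and to patients without it. The name `p_positive_given_no_disease` is introduced here for illustration.

```python
p_positive_given_disease = 0.99     # the test is correct for a patient with the disease
p_positive_given_no_disease = 0.01  # the test is wrong for a patient without it
p_disease = 1 / 100000

# Total probability of a positive result: true positives plus false positives.
p_positive = (p_positive_given_disease * p_disease
              + p_positive_given_no_disease * (1 - p_disease))

# Bayes' Theorem: P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
p_disease_given_positive = p_positive_given_disease * p_disease / p_positive

print(round(p_disease_given_positive, 6))  # 0.000989
```

Even after a positive result, the patient has only about a 0.1% chance of having the disease, because false positives among the many healthy patients far outnumber true positives.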

## Spam Filters

Let's explore a different example. Email spam filters use Bayes' Theorem to determine if certain words indicate that an email is spam.

Let's take a word that often appears in spam: "enhancement". With just three facts, we can take some preliminary steps towards a good spam filter:

* "enhancement" appears in just 0.1% of non-spam emails
* "enhancement" appears in 5% of spam emails
* Spam emails make up about 20% of total emails

Given that an email contains "enhancement", what is the probability that the email is spam?

In this example, we are dealing with two probabilities:

* P(enhancement) — the probability that the word "enhancement" appears in an email
* P(spam) — the probability that an email is spam.

Using Bayes' Theorem to answer our question means that we want to calculate P(A|B). But what are A and B referring to in this case?

In [8]:
p_spam = 0.2
p_enhancement_given_spam = 0.05
p_enhancement = 0.05 * 0.2 + 0.001 * (1 - 0.2)
p_spam_enhancement = p_enhancement_given_spam * p_spam / p_enhancement

print(p_spam_enhancement)

0.9259259259259259
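We can sanity-check this result by counting emails instead of multiplying probabilities. The sketch below assumes a hypothetical inbox of 100,000 emails, a size chosen only so that every count is a whole number.

```python
# Hypothetical inbox, sized so that every count is a whole number.
total_emails = 100_000

spam = total_emails * 20 // 100        # 20% of emails are spam -> 20,000
non_spam = total_emails - spam         # 80,000

spam_with_word = spam * 5 // 100       # 5% of spam contains the word -> 1,000
non_spam_with_word = non_spam // 1000  # 0.1% of non-spam contains it -> 80

# Of all emails containing the word, what fraction is spam?
p_spam_given_word = spam_with_word / (spam_with_word + non_spam_with_word)

print(round(p_spam_given_word, 4))  # 0.9259
```

Of the 1,080 emails that contain "enhancement", 1,000 are spam, which matches the answer from Bayes' Theorem.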


## Review

In this lesson, we learned several new definitions:

* Two events are independent if the occurrence of one event does not affect the probability of the second event
* If two events are independent, then P(A ∩ B) = P(A) × P(B)
* A prior is an additional piece of information that tells us how likely an event is
* A frequentist approach to statistics does not incorporate a prior
* A Bayesian approach to statistics incorporates prior knowledge
* Bayes' Theorem is:

![Bayes' Theorem: P(A|B) = P(B|A) × P(A) / P(B)](bayes.png)