<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Bayes' rule

---

## Learning Objectives

- Review conditional probability, joint probability and Bayes' rule
- Illustrate the rule with a few examples

<h1>Lesson Guide<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Learning-Objectives" data-toc-modified-id="Learning-Objectives-1">Learning Objectives</a></span></li><li><span><a href="#Joint-probability" data-toc-modified-id="Joint-probability-2">Joint probability</a></span></li><li><span><a href="#Conditional-probability" data-toc-modified-id="Conditional-probability-3">Conditional probability</a></span></li><li><span><a href="#Bayes'-Theorem" data-toc-modified-id="Bayes'-Theorem-4">Bayes' Theorem</a></span></li><li><span><a href="#Example" data-toc-modified-id="Example-5">Example</a></span></li><li><span><a href="#Practice" data-toc-modified-id="Practice-6">Practice</a></span></li></ul></div>

## Joint probability

The joint probability is the probability of two (or more) events occurring together. It is commonly written in one of the following forms:

$$ P(A\ \text{and}\ B) =  P(A, B) = P(A \cap B)$$

Consider, for example, throwing two dice, $A$ and $B$. We can ask for the probability that die $A$ shows 1 and die $B$ shows 6:

$$P(A=1, B=6) = \frac{1}{6}\cdot\frac{1}{6} = \frac{1}{36}$$

Note that we assumed the two events are independent, which lets us simply multiply the individual probabilities.

We could also ask how likely it is to obtain a 1 and a 6 regardless of which die shows which. The two orderings are mutually exclusive, so their probabilities add:

$$P(A=1, B=6)+P(A=6, B=1) = \frac{1}{36}+\frac{1}{36} = \frac{1}{18}$$
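
Both dice results can be checked by enumerating the 36 equally likely outcomes. The following is a minimal sketch using only the standard library; the names `outcomes` and `prob` are illustrative and not part of the lesson's notebook.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (A, B) outcomes of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event, given as a predicate on an (a, b) outcome."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))


p_ordered = prob(lambda o: o == (1, 6))         # A=1 and B=6
p_unordered = prob(lambda o: set(o) == {1, 6})  # one die shows 1, the other 6

print(p_ordered, p_unordered)  # 1/36 1/18
```

Using `Fraction` keeps the results exact, so they can be compared directly with the formulas above.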

Let's look at a dataset and estimate joint probabilities from the data samples.

In [1]:
import seaborn as sns

In [2]:
data = sns.load_dataset('tips')

In [3]:
data.head()

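| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |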


In [4]:
data.shape

(244, 7)

For later use, let's estimate some of the group probabilities by looking at the occurrences in our sample.

In [5]:
P_male = len(data[(data.sex == 'Male')])/len(data)
P_male

0.6434426229508197

In [6]:
P_nonsmoker = len(data[(data.smoker == 'No')])/len(data)
P_nonsmoker

0.6188524590163934

In [7]:
P_tiplow = len(data[(data.tip < 3.00)])/len(data)
P_tiplow

0.5040983606557377

The joint probability of two discrete variables can be estimated by selecting the rows that satisfy both conditions and dividing their count by the total number of observations.

In [8]:
P_male_nonsmoker = len(
    data[(data.sex == 'Male') & (data.smoker == 'No')])/len(data)
P_male_nonsmoker

0.3975409836065574

Multiplying the individual fractions gives a similar but not identical value. The two would agree exactly only if sex and smoking status were independent in the sample.

In [9]:
P_male * P_nonsmoker

0.39819604944907283

The procedure works in the same way with continuous variables by asking for ranges of values.

In [10]:
P_male_tiplow = len(data[(data.sex == 'Male') & (data.tip < 3.00)])/len(data)
P_male_tiplow

0.3073770491803279

In [11]:
P_male * P_tiplow

0.3243583714055361
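
The gap between a joint probability and the product of its marginals, as in the two comparisons above, is a simple indication of dependence in a sample. The sketch below computes that gap for every combination of two categorical variables. It runs on a small hypothetical list of records (not the tips data), and the names `records` and `gaps` are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy sample of (sex, smoker) pairs, NOT taken from the tips data.
records = [
    ("Male", "No"), ("Male", "No"), ("Male", "Yes"), ("Male", "Yes"),
    ("Female", "No"), ("Female", "No"), ("Female", "No"), ("Female", "Yes"),
]

n = len(records)
joint = Counter(records)                            # counts per (sex, smoker)
sex_counts = Counter(sex for sex, _ in records)     # marginal counts for sex
smoker_counts = Counter(s for _, s in records)      # marginal counts for smoker

# Compare each joint probability with the product of its marginals.
gaps = {}
for (sex, smoker), count in joint.items():
    p_joint = count / n
    p_product = (sex_counts[sex] / n) * (smoker_counts[smoker] / n)
    gaps[(sex, smoker)] = p_joint - p_product
    print(f"{sex:6} {smoker:3} joint={p_joint:.4f} product={p_product:.4f}")
```

A gap of zero in every cell would mean the two variables are independent in the sample; nonzero gaps, as here, indicate some dependence.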

## Conditional probability

The conditional probability is the probability of an event given that another event has occurred. For two events $A$ and $B$, it is denoted $P(A|B)$, read as _the probability of event $A$ given event $B$_.

The conditional probability can be related to the joint probability:

$$ P(A | B) = \frac{P(A, B)}{P(B)} $$


Consider again the two dice with $A=1$ and $B=6$. Then

$$P(A=1| B=6) = \frac{P(A=1, B=6)}{P(B=6)} = \frac{\frac{1}{36}}{\frac{1}{6}} = \frac{6}{36} = \frac{1}{6}$$

The events are independent of each other, so $P(A|B)=P(A)$.
In general, this won't be the case as the data example shows.

In [12]:
# P(nonsmoker | male) = P(male, nonsmoker) / P(male)
cond_male_smoker = P_male_nonsmoker / P_male
cond_male_smoker

0.6178343949044586

In [13]:
# P(male | nonsmoker) = P(male, nonsmoker) / P(nonsmoker)
cond_smoker_male = P_male_nonsmoker / P_nonsmoker
cond_smoker_male

0.642384105960265
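
For a case where conditioning changes the probability substantially, two dice are again enough: the chance that the sum is at least 10 depends strongly on what the first die shows. The sketch below applies $P(A|B) = P(A, B)/P(B)$ by enumeration; the helper names `prob` and `cond_prob` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event, given as a predicate on an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def cond_prob(event_a, event_b):
    """P(A | B) = P(A, B) / P(B)."""
    return prob(lambda o: event_a(o) and event_b(o)) / prob(event_b)


def sum_at_least_10(o):
    return o[0] + o[1] >= 10


def first_is_6(o):
    return o[0] == 6


print(prob(sum_at_least_10))                   # 1/6
print(cond_prob(sum_at_least_10, first_is_6))  # 1/2
```

Knowing that the first die shows 6 raises the probability from 1/6 to 1/2, so these two events are not independent.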

The probability that at least one of two events $A$ and $B$ occurs is the sum of their individual probabilities minus the joint probability, which would otherwise be counted twice:

$$
P(A\ \text{or}\ B) = P(A)+P(B)-P(A,B)
$$

In [14]:
P_male_nonsmoker

0.3975409836065574

In [15]:
P_male_or_nonsmoker = len(
    data[(data.sex == 'Male') | (data.smoker == 'No')])/len(data)
P_male_or_nonsmoker

0.8647540983606558

In [16]:
P_nonsmoker + P_male - P_male_nonsmoker

0.8647540983606556

## Bayes' Theorem

In the relation between joint and conditional probability, we could equally have written 

$$ P(A, B) = P(A|B) \; P(B) $$

or

$$ P(A, B) = P(B|A) \; P(A) $$

Equating the two we arrive at

$$  P(A|B) \; P(B) = P(B|A) \; P(A) $$

Rearranging, we obtain Bayes' theorem

$$ P(A|B) = \frac{P(B|A)\;P(A)}{P(B)} $$

Bayes' theorem relates the probability of $A$ given $B$ to the probability of $B$ given $A$. This rule is critical for performing statistical inference, as we'll see shortly.

Let's verify the theorem with the data:

In [17]:
cond_male_smoker

0.6178343949044586

In [18]:
# Bayes' theorem: P(nonsmoker | male) = P(male | nonsmoker) * P(nonsmoker) / P(male)
cond_smoker_male * P_nonsmoker / P_male

0.6178343949044586
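
Bayes' theorem is often used when $P(B|A)$ is known but $P(A|B)$ is the quantity of interest. The sketch below works through a diagnostic-test style example. All numbers are hypothetical and chosen only for illustration, and the function name `bayes_posterior` is an assumption, not part of the lesson.

```python
from fractions import Fraction


def bayes_posterior(prior, likelihood, likelihood_given_not):
    """Return P(A | B) from P(A), P(B | A) and P(B | not A).

    The evidence P(B) is expanded with the law of total probability:
    P(B) = P(B | A) P(A) + P(B | not A) P(not A).
    """
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers, for illustration only.
prior = Fraction(1, 100)           # P(condition)
sensitivity = Fraction(95, 100)    # P(positive | condition)
false_positive = Fraction(5, 100)  # P(positive | no condition)

posterior = bayes_posterior(prior, sensitivity, false_positive)
print(posterior, round(float(posterior), 3))  # 19/118 0.161
```

Under these assumed numbers, a positive result raises the probability of the condition from 1% to about 16%, because the low prior outweighs the high sensitivity.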

## Practice

Determine $P({\rm nonsmoker}| {\rm male,\ tip}<3.00)$ in two ways:

- directly, by slicing the data
- using Bayes' rule before slicing the data

In [19]:
len(data[(data.smoker == 'No') & (data.sex == 'Male')
         & (data.tip < 3.00)])/len(data)/P_male_tiplow

0.6533333333333333

In [20]:
male_lowtip = data[(data.sex == 'Male')
         & (data.tip < 3.00)]

len(male_lowtip[male_lowtip.smoker=='No']) / len(male_lowtip)

0.6533333333333333

In [21]:
non_smokers = data[data.smoker == 'No']
smokers = data[data.smoker == 'Yes']

nonsmokers_male_tiplow = len(non_smokers[(non_smokers.sex == 'Male') & (
    non_smokers.tip < 3.00)])/len(non_smokers)
smokers_male_tiplow = len(
    smokers[(smokers.sex == 'Male') & (smokers.tip < 3.00)])/len(smokers)

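# Bayes' rule numerator terms: P(male, low tip | group) * P(group)
cond_on_nonsmokers = nonsmokers_male_tiplow*P_nonsmoker
cond_on_smokers = smokers_male_tiplow*(1-P_nonsmoker)

# Normalize by P(male, low tip), the sum over both groups
cond_on_nonsmokers/(cond_on_nonsmokers+cond_on_smokers)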

0.6533333333333333
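
The last cell appears to combine Bayes' rule with the law of total probability. Writing $N$ for nonsmoker, $S$ for smoker and $E$ for the event "male and tip $< 3.00$" (shorthand introduced here, not used in the code), the computation can be read as:

$$ P(N|E) = \frac{P(E|N)\;P(N)}{P(E)} = \frac{P(E|N)\;P(N)}{P(E|N)\;P(N) + P(E|S)\;P(S)} $$

with $P(S) = 1 - P(N)$. In the code, `nonsmokers_male_tiplow` estimates $P(E|N)$, `smokers_male_tiplow` estimates $P(E|S)$, and `P_nonsmoker` estimates $P(N)$, which is why the result agrees with the direct slicing approach.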