# Slides for Probability and Statistics module, 2016-2017
# Matt Watkins, University of Lincoln

## Summary of conditional probabilities

- Definition of conditional probabilities $$
P(E \mid F) = \frac{P(E \cap F)}{P(F)}
$$
- Multiplicative rule $$
P(E_1 \cap E_2 \cap \cdots \cap E_n) = P(E_1) \frac{P(E_1 \cap E_2)}{P(E_1)} \frac{P(E_1 \cap E_2 \cap E_3)}{P(E_1 \cap E_2)} \cdots \frac{P(E_1 \cap E_2 \cap \cdots \cap E_n)}{P(E_1 \cap E_2 \cap \cdots \cap E_{n-1})}
$$
- Law of total probability $$
P(A) = \sum_{i=1}^n P(A \cap E_i) = \sum_{i=1}^n P(A \mid E_i) P(E_i)
$$

- Bayes' Formula $$
P(E_i \mid A) = \frac{P(A \cap E_i)}{P(A)} = \frac{P(A \mid E_i) P(E_i)}{\sum_{j=1}^n P(A \mid E_j) P(E_j)}
$$


# Independent events


<div style="background-color:Gold; margin-left: 20px; margin-right: 20px; padding-bottom: 8px; padding-left: 
8px; padding-right: 8px; padding-top: 8px; border-radius: 25px;">
$\textbf{Definition}$ Two experiments are *independent* if the result of one cannot in any way affect the possible results of the other.<br><br>

$\textbf{Definition}$ Two events ($E, F$) are *independent* if the probability that one of them occurs is in no way influenced by whether or not the other has occurred. 
<br>
So  

\begin{align}
P(E) = P(E \mid F) = P(E \mid \bar{F}), \\
P(F) = P(F \mid E) = P(F \mid \bar{E}).
\end{align}

Put a different way, this means that

$$
P(E \cap F) = P(E)P(F)
$$

That is, the probability of $E$ and $F$ both occurring is just the product of the probability of $E$ occurring and the probability of $F$ occurring.
</div>

![](../Images/probmean3.jpg)


<div style="background-color:Gold; margin-left: 20px; margin-right: 20px; padding-bottom: 8px; padding-left: 
8px; padding-right: 8px; padding-top: 8px; border-radius: 25px;">
**Theorem** Two events ($E, F$) are *independent* if and only if

$$
P(E \cap F) = P(E)P(F)
$$
</div><br>
For more than two events things become a bit more restrictive.
<br><br>
<div style="background-color:Gold; margin-left: 20px; margin-right: 20px; padding-bottom: 8px; padding-left: 
8px; padding-right: 8px; padding-top: 8px; border-radius: 25px;">

**Definition** The events $E_1, E_2, E_3, \dots, E_n$ are said to be *mutually independent* if, for every subset $E_1', E_2', E_3', \dots, E_r'$ of the events with $2 \leq r \leq n$,

$$
P(E_1'\cap E_2' \cap E_3' \cap \cdots \cap E_r') = P(E_1') \cdot P(E_2') \cdot P(E_3') \cdots P( E_r')
$$
</div>

The idea of independent processes will be extremely important as we move forward. 

We will use the idea that repetitions of experiments constitute independent processes very often. Note that this is more or less an assumption of the frequentist definition of probability.

### Example

Consider the compound experiment of throwing two fair coins.

The sample space is $S = \{(H,H),(H,T),(T,H),(T,T)\}$.

Define two events 

$A = \textrm{'the first coin is a head'} = \{(H,H),(H,T)\}$

$B = \textrm{'the second coin is a tail'} = \{(H,T),(T,T)\}$

And $P(A) = |A|/|S| = 2 /4  = 1/2$ and $P(B) = |B|/|S| = 2/4 = 1/2$.

Now $P(A \cap B) = P(\{(H,T)\}) = |A \cap B|/|S| = 1/4 = P(A) \cdot P(B)$, so $A$ and $B$ are independent.



But consider

$C = \textrm{'both coins are heads'} = \{(H,H)\}$

$P(C) = |C| / |S| = 1/4$

Now $P(B \cap C) = P(\emptyset) = |B \cap C|/|S| = 0/4 = 0 \neq P(B) \cdot P(C) = 1/8$, so $B$ and $C$ are not independent.

$A$ is also not independent of $C$, since $P(A \cap C) = P(C) = 1/4 \neq P(A) \cdot P(C) = 1/8$.
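
The coin example can be checked by brute-force enumeration. The sketch below assumes equally likely outcomes, as in the example; the helper names `prob` and `independent` are illustrative and not part of the original slides.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coins: every outcome is equally likely.
S = set(product("HT", repeat=2))

def prob(event):
    """Probability of an event (a subset of S) under equally likely outcomes."""
    return Fraction(len(event & S), len(S))

def independent(E, F):
    """True when P(E and F) equals P(E) * P(F)."""
    return prob(E & F) == prob(E) * prob(F)

A = {s for s in S if s[0] == "H"}   # the first coin is a head
B = {s for s in S if s[1] == "T"}   # the second coin is a tail
C = {("H", "H")}                    # both coins are heads

print(independent(A, B))  # True
print(independent(B, C))  # False
print(independent(A, C))  # False
```

Using `Fraction` keeps the comparison exact, so the product rule is tested without floating-point rounding.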

In general, it is possible for all pairs of events to be independent, but the complete set of events not to be.
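
A standard illustration of this, which is not taken from the slides, uses the same two-coin experiment with three events: 'first coin is a head', 'second coin is a head', and 'both coins show the same face'. The sketch below assumes equally likely outcomes; the event names and helper functions are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

S = set(product("HT", repeat=2))

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(S))

def product_rule_holds(events):
    """Check P(intersection of events) == product of the individual probabilities."""
    joint = set(S)
    expected = Fraction(1)
    for e in events:
        joint &= e
        expected *= prob(e)
    return prob(joint) == expected

def mutually_independent(events):
    """The product rule must hold for every subset of two or more events."""
    return all(
        product_rule_holds(subset)
        for r in range(2, len(events) + 1)
        for subset in combinations(events, r)
    )

first_head = {s for s in S if s[0] == "H"}
second_head = {s for s in S if s[1] == "H"}
same_face = {s for s in S if s[0] == s[1]}
events = [first_head, second_head, same_face]

pairwise = all(product_rule_holds(pair) for pair in combinations(events, 2))
print(pairwise)                      # True
print(mutually_independent(events))  # False
```

Every pair intersects in $\{(H,H)\}$ with probability $1/4 = 1/2 \cdot 1/2$, but all three events also intersect in $\{(H,H)\}$, whose probability $1/4$ is not $1/8$.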

# Tabular presentation of conditional probabilities

It can sometimes be handy to view conditional probabilities using a tabular representation of relative frequencies. This is often also how real data reaches us.

**Example** If we go back to the bus problem we did before the break, but instead we consider what we'd expect if 1000 buses in total ran through the town, we'd end up with something like

| |A |B |C |total |
|-|-|-|-|-|
|Late |250|50|25|325|
|Not late |250|200|225|675|
|Total |500|250|250|1000|

Let $L = \textrm{'a bus is late'}$

The conditional probabilities can easily be read off, for instance $P(B \mid L) = P(B \cap L)/ P(L) = \frac{50}{325} = \frac{2}{13}$. 

This is of course exactly equivalent to our pen and paper solution.
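
Reading conditional probabilities off the table amounts to dividing a cell by its row or column total. A minimal sketch, with the counts copied from the table above and hypothetical helper names:

```python
from fractions import Fraction

# Counts from the table above: counts[row][column]
counts = {
    "late":     {"A": 250, "B": 50,  "C": 25},
    "not late": {"A": 250, "B": 200, "C": 225},
}

def p_col_given_row(col, row):
    """P(column event | row event): cell count divided by the row total."""
    return Fraction(counts[row][col], sum(counts[row].values()))

def p_row_given_col(row, col):
    """P(row event | column event): cell count divided by the column total."""
    col_total = sum(counts[r][col] for r in counts)
    return Fraction(counts[row][col], col_total)

print(p_col_given_row("B", "late"))   # 2/13, i.e. P(B | L)
print(p_row_given_col("late", "B"))   # 1/5,  i.e. P(L | B)
```

The two directions of conditioning use the same cell but different totals, which is why $P(B \mid L)$ and $P(L \mid B)$ differ.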

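The values along the margins, divided by the grand total of 1000, are called the marginal probabilities. They give the unconditional probabilities of the row and column events, for example $P(L) = 325/1000$.

The margins are sums over a row or column because of the law of total probability.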

$$
P(A) = \sum_{i=1}^n P(A \cap E_i) = \sum_{i=1}^n P(A \mid E_i) P(E_i)
$$

In this case 

$$P(L) = P(L \cap A) + P(L \cap B) + P(L \cap C) = \frac{250 + 50 + 25}{1000} = \frac{325}{1000} = 0.325$$

or vertically we have

$$P(A) = P(A \cap L) + P(A \cap \bar{L})$$

Note that this last relationship is a useful and general one. It is a special case of the law of total probability, with the partition $\{L, \bar{L}\}$.
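
As a quick numerical check using the counts in the bus table, the vertical sum for column $A$ is

$$
P(A) = P(A \cap L) + P(A \cap \bar{L}) = \frac{250}{1000} + \frac{250}{1000} = \frac{500}{1000} = \frac{1}{2},
$$

which agrees with the column total of 500 out of 1000 buses.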


### Tabular Bayes' Theorem example

A doctor is trying to decide if a patient has one of three diseases d1, d2, or d3. 

Two tests are to be carried out, each of which results in a positive (+) or a negative (−) outcome. 

There are four possible test patterns ++, +−, −+, and −−. 

National records have indicated that, for 10,000 people having one of these three diseases, the distribution of diseases and test results is as in the table below.

|Disease|Number| + +| + –| – +| – –|
|-|-|-|-|-|-|
|d1 |3215| 2110| 301| 704| 100|
|d2 |2125| 396| 132| 1187| 410|
|d3 |4660| 510| 3568 |73| 509|
|Total |10000| 3016| 4001| 1964| 1019|

We can use this data to estimate the *prior probabilities* $P(d_1), P(d_2), P(d_3)$, and conditional probabilities such as $P(+- \mid d_1) = \frac{301}{3215} \approx 0.094$.
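
These estimates are simple ratios of counts. The sketch below copies the counts from the table and uses hypothetical names (`table`, `prior`, `likelihood`); the patterns are written with ASCII `+` and `-`.

```python
# Counts from the table above: disease -> number of patients per test pattern
table = {
    "d1": {"++": 2110, "+-": 301,  "-+": 704,  "--": 100},
    "d2": {"++": 396,  "+-": 132,  "-+": 1187, "--": 410},
    "d3": {"++": 510,  "+-": 3568, "-+": 73,   "--": 509},
}

total = sum(sum(row.values()) for row in table.values())  # 10000 patients

# Prior P(d_i): fraction of all patients who have disease d_i
prior = {d: sum(row.values()) / total for d, row in table.items()}

# Conditional P(pattern | d_i): fraction of the d_i patients showing that pattern
likelihood = {
    d: {pattern: n / sum(row.values()) for pattern, n in row.items()}
    for d, row in table.items()
}

print(prior["d1"])                        # 0.3215
print(round(likelihood["d1"]["+-"], 3))   # 0.094
```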



What the doctor wants, though, is the probability that a patient has disease $d_i$ given the results of the tests. These are called the Bayes, inverse, or posterior probabilities.

We can compute them using Bayes' formula and we'll get results like

||$d_1$|$d_2$|$d_3$|
|-|-|-|-|
|+ +| .700| .131| .169|
|+ –| .075| .033| .892|
|– +| .358| .604| .038|
|– –| .098| .403| .499|

These are $P(d_i \mid ++)$, etc. Judicious use of these posterior probabilities can inform decision making:

In this case the prior probability of a patient having disease $d_1$ was $\frac{3215}{10000} = 0.3215$. 

If the test result came back ++ then the posterior probability $P(d_1 \mid ++) = 0.700$ and we'd suspect that $d_1$ was the culprit.
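
The posterior table can be reproduced by applying Bayes' formula to the counts. This is a minimal sketch with hypothetical names; it recomputes the priors and conditional probabilities from the raw table so that it runs on its own.

```python
# Counts from the disease table: disease -> number of patients per test pattern
table = {
    "d1": {"++": 2110, "+-": 301,  "-+": 704,  "--": 100},
    "d2": {"++": 396,  "+-": 132,  "-+": 1187, "--": 410},
    "d3": {"++": 510,  "+-": 3568, "-+": 73,   "--": 509},
}
patterns = ["++", "+-", "-+", "--"]

total = sum(sum(row.values()) for row in table.values())
prior = {d: sum(row.values()) / total for d, row in table.items()}
likelihood = {
    d: {p: row[p] / sum(row.values()) for p in patterns}
    for d, row in table.items()
}

def posterior(pattern):
    """Bayes' formula: P(d_i | pattern) for every disease d_i."""
    joint = {d: likelihood[d][pattern] * prior[d] for d in table}
    evidence = sum(joint.values())  # P(pattern), by the law of total probability
    return {d: joint[d] / evidence for d in table}

for pattern in patterns:
    row = posterior(pattern)
    print(pattern, {d: round(p, 3) for d, p in row.items()})
```

The printed values should match the table above up to rounding in the third decimal place. Because the priors and conditional probabilities both come from the same counts, each posterior reduces to a cell count divided by its column total, for example $2110/3016$ for $P(d_1 \mid ++)$.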

### Updating information sequentially

In the last example we updated the likelihood of a patient having a particular disease based on extra information in the form of the test results.

The medical example gives a hint of how we can update our ideas about a system as new information comes in. This is why the original probabilities are called priors, and the reversed conditional probabilities are called posterior probabilities.

We'll use this type of method in the computing lab later as a simple form of machine learning.
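
One way to see sequential updating with the disease data is to feed in the two test results one at a time, using the posterior after the first test as the prior for the second. The sketch below assumes that the first symbol of each pattern is the result of the first test; the names `belief`, `like_first` and `like_second` are hypothetical, and this is not necessarily the method used in the lab.

```python
# Counts from the disease table: disease -> number of patients per test pattern
table = {
    "d1": {"++": 2110, "+-": 301,  "-+": 704,  "--": 100},
    "d2": {"++": 396,  "+-": 132,  "-+": 1187, "--": 410},
    "d3": {"++": 510,  "+-": 3568, "-+": 73,   "--": 509},
}

def normalise(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Prior belief before any test result is known
belief = normalise({d: sum(row.values()) for d, row in table.items()})

# Update 1: the first test is positive, using P(first = + | d)
like_first = {
    d: (row["++"] + row["+-"]) / sum(row.values()) for d, row in table.items()
}
belief = normalise({d: like_first[d] * belief[d] for d in table})

# Update 2: the second test is also positive, using P(second = + | first = +, d)
like_second = {
    d: row["++"] / (row["++"] + row["+-"]) for d, row in table.items()
}
belief = normalise({d: like_second[d] * belief[d] for d in table})

print({d: round(p, 3) for d, p in belief.items()})
```

After both updates the belief agrees with the direct posterior $P(d_i \mid ++)$, so updating in two steps gives the same answer as conditioning on both results at once.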

# Continuous Conditional Probability

So far we've stayed focussed upon discrete probability distributions.

However, similar observations also apply to the case of continuous probabilities.

### Continuous probability spaces

Consider a spinner - schematically a circle _of unit circumference_ and a pointer

![](../Images/spinner.jpg)

this could end up being a model for a [Roulette wheel](https://en.wikipedia.org/wiki/Roulette), for instance. 

If we give the spinner a whirl, the pointer will be pointing somewhere a distance $x$ along the circumference. 

It seems reasonable that every value $0 \leq x \lt 1$ of the distance between the pointer and the mark on the spinner is equally likely to occur. This means that the sample space is the interval  $S = [0,1)$. 



We want a probability model in which every value of the sample space is equally likely. We'll call the result of a spin $X$ for now; later we'll see that this is a _continuous random variable_.

In a similar way to the discrete case we must have

$$
P\left( 0\leq X \lt 1 \right) = 1.
$$

We also expect a reading in the top half of the spinner to be as likely as one in the lower half,

$$
P\left( 0\leq X \lt \frac{1}{2} \right) = P\left( \frac{1}{2} \leq X \lt 1 \right) = \frac{1}{2}.
$$

More generally, if we consider an event $E = [a,b)$ with $0 \leq a \leq b \leq 1$, we'd like

$$
P\left( a\leq X \lt b \right) = b - a
$$

for every such $a$ and $b$.

We can satisfy this requirement with a formula of the form

$$
P(E) = \int_{E} f(x) \mathrm{d}x,
$$

where $f(x)$ is the constant function with value 1.

We call $f(x)$ the _density function_ of $X$.

This is the generalisation of the discrete case we saw earlier:

$$
P(E) = \sum_{i \in E} P(i).
$$
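
The requirement $P(a \leq X \lt b) = b - a$ can be checked by simulating the spinner. This is a rough sketch that assumes Python's `random.random()` is an adequate model of a fair spin on $[0, 1)$; the function name, the number of spins and the seed are arbitrary choices.

```python
import random

def estimate_interval_probability(a, b, n_spins=100_000, seed=0):
    """Estimate P(a <= X < b) for a uniform spinner by repeated simulated spins."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = sum(a <= rng.random() < b for _ in range(n_spins))
    return hits / n_spins

print(estimate_interval_probability(0.0, 0.5))    # close to 0.5
print(estimate_interval_probability(0.25, 0.4))   # close to 0.15
```

The estimates are relative frequencies, so they fluctuate around $b - a$ and tighten as `n_spins` grows.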

## Conditional continuous probabilities

Consider a process that has a density function $f(x)$, and let $E$ be an event with $P(E) > 0$. We define a conditional density function by

$$
f(x \mid E) = \Bigg \{ \begin{array}{ll}
f(x)/P(E) & \mbox{if  $x \in E$},\\
0 & \mbox{if  $x \notin E$}
\end{array} 
$$

Then for any event $F$, we have

$$
P(F \mid E) = \int_F f(x \mid E)\ \mathrm{d}x.
$$

We call this the conditional probability of $F$ given $E$. A little manipulation makes the connection to the discrete case:

$$
P(F \mid E) = \int_F f(x \mid E)\ \mathrm{d}x = \int_{E \cap F} \frac{f(x)}{P(E)}\mathrm{d}x = \frac{P(E \cap F)}{P(E)}$$

This has the same form as the discrete definition of conditional probabilities, with the roles of $E$ and $F$ exchanged: $$
P(E \mid F) = \frac{P(E \cap F)}{P(F)}
$$

### Example of conditional continuous probability distribution

In the spinner experiment, suppose we know that the spinner has stopped with the pointer in the upper half of the circle, $0 \leq x \leq 1/2$. What is the probability that $1/6 \leq x \leq 1/3$?

Here $E = [0, 1/2]$ and $F = [1/6, 1/3]$.

Also we note that $F \cap E = F$. 

Hence
$$
P(F \mid E) = \frac{P(F \cap E)}{P(E)}=\frac{\frac{1}{6}}{\frac{1}{2}}=\frac{1}{3},
$$
which is reasonable, since $F$ is $1/3$ the size of $E$.

The conditional density function here is given by 
$$f(x \mid E) = \Bigg \{ \begin{array}{ll}
2, & \mbox{if  $0 \leq x \leq 1/2$},\\
0, & \mbox{if $1/2 \lt x \lt 1$}.
\end{array}
$$

Thus the conditional density function is nonzero only on $[0, 1/2]$, and is uniform there. 
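
The same answer can be recovered numerically by integrating the conditional density over $F$. The sketch below uses a simple midpoint rule; the function names and the number of subintervals are illustrative assumptions.

```python
def f(x):
    """Density of the uniform spinner on [0, 1)."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0

def integrate(g, lo, hi, n=10_000):
    """Midpoint-rule approximation to the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

E = (0.0, 0.5)        # conditioning event: pointer in the upper half
F = (1 / 6, 1 / 3)    # event of interest, which lies inside E

p_E = integrate(f, *E)  # P(E), approximately 1/2

def f_given_E(x):
    """Conditional density f(x | E): f(x) / P(E) inside E, zero outside."""
    return f(x) / p_E if E[0] <= x <= E[1] else 0.0

p_F_given_E = integrate(f_given_E, *F)
print(p_F_given_E)   # approximately 1/3
```

Since the densities here are piecewise constant, the midpoint rule is essentially exact; for a general $f(x)$ it would only be an approximation.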

## Summary

- Definition of conditional probabilities $$
P(E \mid F) = \frac{P(E \cap F)}{P(F)}
$$
- Law of total probability $$
P(A) = \sum_{i=1}^n P(A \cap E_i) = \sum_{i=1}^n P(A \mid E_i) P(E_i)
$$

- Bayes' Formula $$
P(E_i \mid A) = \frac{P(A \cap E_i)}{P(A)} = \frac{P(A \mid E_i) P(E_i)}{\sum_{j=1}^n P(A \mid E_j) P(E_j)}
$$

- Two events ($E, F$) are *independent* if and only if

$$
P(E \cap F) = P(E)P(F)
$$

- The events $E_1, E_2, E_3, \dots, E_n$ are said to be mutually independent if, for every subset $E_1', E_2', E_3', \dots, E_r'$ of the events with $2 \leq r \leq n$,

$$
P(E_1'\cap E_2' \cap E_3' \cap \cdots \cap E_r') = P(E_1') \cdot P(E_2') \cdot P(E_3') \cdots P( E_r')
$$
