# Solutions

## Bayes' Theorem

**Question 1.** We let $CF$ denote the event that an individual has cystic fibrosis, and $+ve$ denote the event that an individual receives a positive test result.

We are told that $P(CF)=0.0325$. 

The sensitivity of the sweat chloride test is 91.7\%, i.e. $P(+ve | CF) = 0.917$, and the specificity is 99.9\%, i.e. $P(-ve|\bar{CF})=0.999$. It follows that $P(+ve|\bar{CF}) = 1 - 0.999 = 0.001$.

**1.1** The positive predictive value (PPV) is the probability that an individual has the disease (cystic fibrosis) given that the individual has received a positive test, $P(CF | +ve)$. It is calculated, using Bayes' theorem, as

$$
P(CF | +ve) = \frac{P(+ve | CF)P(CF)}{P(+ve | CF)P(CF) + P(+ve | \bar{CF})P(\bar{CF})}
$$

Putting in the values above gives

$$
P(CF | +ve) = \frac{0.917 \times 0.0325}{0.917 \times 0.0325 + (1-0.999) \times (1 - 0.0325)} \approx 0.969
$$

**1.2** The PPV is 96.9\%, meaning that a patient whose sweat test comes back positive has a 96.9\%
chance of having cystic fibrosis.

**1.3** We are now told that in the population $P(CF) = 0.0004$. Then the PPV is 

$$
P(CF | +ve) = \frac{0.917 \times 0.0004}{0.917 \times 0.0004 + (1-0.999) \times (1 - 0.0004)} = 0.268
$$

If we screened everyone in the general population using this test, only 26.8\% of those who had a positive
test would actually have cystic fibrosis; almost three quarters of positive results would be false positives.
The test is therefore probably not very useful in this setting. (It would also be very expensive and logistically difficult!)
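
The two PPV calculations in Question 1 differ only in the prevalence, so they can be checked with one small R function. This is a minimal sketch: the function name `ppv` and its argument names are illustrative choices, not part of the original exercise.

```r
# Positive predictive value via Bayes' theorem.
# `ppv` and its argument names are hypothetical helpers for illustration.
ppv <- function(sensitivity, specificity, prevalence) {
  true_pos  <- sensitivity * prevalence              # P(+ve | CF) P(CF)
  false_pos <- (1 - specificity) * (1 - prevalence)  # P(+ve | not CF) P(not CF)
  true_pos / (true_pos + false_pos)
}

ppv_q1_1 <- ppv(0.917, 0.999, prevalence = 0.0325)  # prevalence used in 1.1
ppv_q1_3 <- ppv(0.917, 0.999, prevalence = 0.0004)  # prevalence used in 1.3

round(c(q1_1 = ppv_q1_1, q1_3 = ppv_q1_3), 3)
```

Keeping sensitivity and specificity fixed while varying `prevalence` shows directly how strongly the PPV depends on how common the disease is in the tested population.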

**Question 2** [Optional]

**2.1**

We will use $S$ to denote the event that a patient has sepsis and $\bar{S}$ the event that the patient does not. We will denote a positive and a negative classification by $C_{+ve}$ and $C_{-ve}$ respectively.

We are told that 
- $P(S) = 0.01$
- $P(C_{+ve} | S) = 0.7$
- $P(C_{+ve} \cap S) + P(C_{-ve} \cap \bar{S}) = 0.8$ (the total accuracy)

We want to find:
- $P(C_{+ve} \cap S)$
- $P(C_{-ve} \cap S)$
- $P(C_{+ve} \cap \bar{S})$
- $P(C_{-ve} \cap \bar{S})$

First, we note that

$$
P(C_{+ve} \cap S) = P(C_{+ve} | S) P(S)
$$

by the usual multiplicative rule. Thus,

$$
P(C_{+ve} \cap S) = P(C_{+ve} | S) P(S) = 0.7 \times 0.01 = 0.007
$$

Then by the theorem of total probability, noting that $\{C_{+ve}, C_{-ve} \}$ partitions the sample space,

$$
P(S) = P(S | C_{+ve}) P(C_{+ve}) + P(S | C_{-ve}) P(C_{-ve}) = P(S \cap C_{+ve}) + P(S \cap  C_{-ve})
$$

Rearranging for $P(S \cap  C_{-ve})$ gives

$$
P(S \cap  C_{-ve}) = P(S) - P(S \cap C_{+ve})  = 0.01 - 0.007 = 0.003
$$

Then we can use the total accuracy to obtain $P(C_{-ve} \cap \bar{S})$, since

$$
P(C_{+ve} \cap S) + P(C_{-ve} \cap \bar{S}) = 0.8
$$

which gives

$$
P(C_{-ve} \cap \bar{S}) = 0.8 - 0.007 = 0.793
$$

And finally, 

$$
P(\bar{S}) = P(\bar{S} \cap C_{+ve}) + P(\bar{S} \cap C_{-ve})
$$

giving

$$
P(\bar{S} \cap C_{+ve}) = P(\bar{S}) - P(\bar{S} \cap C_{-ve}) = (1-0.01) -  0.793 = 0.197
$$


So we can fill the confusion matrix in with joint probabilities, as follows:


| Prediction | Truth: Positive | Truth: Negative |
|:---- | :-: | :-: |
| Positive | 0.007 | 0.197 |
| Negative | 0.003 | 0.793 |
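
The chain of steps in 2.1 can be reproduced in R as a check on the arithmetic. This is a sketch only; the variable names (`p_s`, `sens`, `accuracy`, `tp`, `fn`, `tn`, `fp`) are illustrative and do not come from the question.

```r
# Quantities given in the question
p_s      <- 0.01  # P(S): probability that a patient has sepsis
sens     <- 0.7   # P(C+ | S)
accuracy <- 0.8   # P(C+ and S) + P(C- and not S)

# Joint probabilities, derived in the same order as in the text
tp <- sens * p_s       # P(C+ and S), multiplicative rule
fn <- p_s - tp         # P(C- and S), law of total probability
tn <- accuracy - tp    # P(C- and not S), from the total accuracy
fp <- (1 - p_s) - tn   # P(C+ and not S), law of total probability

joint <- matrix(c(tp, fp,
                  fn, tn),
                nrow = 2, byrow = TRUE,
                dimnames = list(Prediction = c("Positive", "Negative"),
                                Truth      = c("Positive", "Negative")))
joint
sum(joint)  # the four joint probabilities cover the whole sample space
```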


**2.2** 

Among 1000 patients, we would expect the following numbers to occur in each cell (where we get each number by multiplying the joint probability in the table above by 1000):


| Prediction | Truth: Positive | Truth: Negative |
|:---- | :-: | :-: |
| Positive | 7 | 197 |
| Negative | 3 | 793 |

So we would expect to miss 3 cases of sepsis and incorrectly flag 197 patients as having sepsis when they do not. This seems a reasonable balance, since there are substantial consequences of missing cases of sepsis (likely death), but much less severe consequences of assessing and potentially treating extra patients.
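
The scaling from joint probabilities to expected counts can be sketched in R as follows. The joint probabilities are copied from the confusion matrix in 2.1; the names `n_patients`, `missed` and `false_alarms` are illustrative.

```r
n_patients <- 1000

# Joint probabilities from the confusion matrix in 2.1
joint <- matrix(c(0.007, 0.197,
                  0.003, 0.793),
                nrow = 2, byrow = TRUE,
                dimnames = list(Prediction = c("Positive", "Negative"),
                                Truth      = c("Positive", "Negative")))

expected <- n_patients * joint
expected

missed       <- expected["Negative", "Positive"]  # sepsis cases classified negative
false_alarms <- expected["Positive", "Negative"]  # non-sepsis patients classified positive
c(missed = missed, false_alarms = false_alarms)
```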


## The binomial distribution

**Question 3.** A study of miscarriage among 70 women each of whom had 3 pregnancies.

**3.1** We are told that the study included 210 pregnancies, from which 60 resulted in miscarriage. Thus $60/210 = 0.2857$, i.e. 28.6\% of pregnancies, resulted in miscarriage. 

**3.2** Let $X$ be the number of miscarriages in three pregnancies for a single woman. We can model $X$ as a binomial variable if we are prepared to assume that:
- the outcome of each pregnancy is independent of the outcomes of the other pregnancies (from the same woman)
- the probability of miscarriage is constant (across pregnancies and, especially, between women)

Then the study consists of $N=70$ observations from a binomial variable $X \sim binomial(3, 0.286)$. 
 

**3.3** Using the standard binomial probability distribution function, with $n=3$ and $\pi=0.286$, we have

$$
P(X=x) = \begin{pmatrix}3 \\ x \end{pmatrix} 0.286^{x} (1-0.286)^{3-x}, \mbox{ for } x=0,1,2, 3.
$$

Plugging in individual values of x gives:

| $x$  |  P(X=$x$) |
|---- | ---- |
| 0 | 0.364 |
| 1 | 0.437 |
| 2 | 0.175 |
| 3 | 0.023 |

We can obtain the same information from R using the code below:

```r
# Binomial probabilities P(X = x) for x = 0, 1, 2, 3
x <- 0:3
prob <- 0.286  # named `prob` to avoid masking R's built-in constant `pi`
px <- dbinom(x, size = 3, prob = prob)
px
```

**3.4** Among 70 women, we would expect the following numbers with each possible count of miscarriages:


| $x$  |  P(X=$x$) | Expected number with $x$ miscarriages | Observed number |
| :-: | :-: | :-: | :-: |
| 0 | 0.364 | 70 $\times$ 0.364 =  25.5 | 27 |
| 1 | 0.437 | 70 $\times$ 0.437 =  30.6 | 28 |
| 2 | 0.175 | 70 $\times$ 0.175 =  12.3 | 7 |
| 3 | 0.023 | 70 $\times$ 0.023 = 1.6 | 8 |
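
The expected numbers in this table can be reproduced in R by multiplying the binomial probabilities by the number of women. This is a minimal sketch; the observed counts are copied from the table, and the variable names are illustrative.

```r
n_women  <- 70
x        <- 0:3
observed <- c(27, 28, 7, 8)  # observed numbers of women, from the table

# Expected numbers under X ~ binomial(3, 0.286)
expected <- n_women * dbinom(x, size = 3, prob = 0.286)

data.frame(x, expected = round(expected, 1), observed)
```

Because the code uses unrounded probabilities, the expected numbers may differ very slightly from products of the rounded probabilities, but they agree with the table to one decimal place.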

**3.5** There is poor agreement between observed and expected frequencies under the assumption of a
binomial model. In particular, far more women than expected had 3 miscarriages (8 observed against 1.6 expected), while fewer than expected had 2 (7 against 12.3). This suggests that the assumptions of constant probability of miscarriage and independence between pregnancies are probably invalid. It is possible that some women are more prone to miscarriage than others.


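**Question 4.** Let $X \sim binomial(n, \pi)$. Then $X$ is the sum of $n$ independent Bernoulli trials, each with success probability $\pi$. Let $X_j$ be the outcome of the $j^{th}$ trial (1 for a success, 0 otherwise), so that $E(X_j) = \pi$ and $X = \sum_{j=1}^n X_j$. Then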

$$
E(X) = E\left(\sum_{j=1}^n X_j \right) = \sum_{j=1}^n E(X_j) = \sum_{j=1}^n\pi = n \pi
$$

$$
Var(X) = Var\left(\sum_{j=1}^n X_j \right) = \sum_{j=1}^n   Var( X_j )
$$

where we can only do that last step because the Bernoulli trials are assumed *independent*. Each Bernoulli trial has variance $Var(X_j) = \pi(1-\pi)$, so this gives

$$
Var(X) = n \pi (1-\pi)
$$
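
As a numerical sanity check on these two results, the mean and variance can be computed directly from the binomial probabilities and compared with $n\pi$ and $n\pi(1-\pi)$. The values $n=3$ and $\pi=0.286$ are borrowed from Question 3 purely for illustration; any valid values would do.

```r
n    <- 3      # illustrative values, taken from Question 3
prob <- 0.286
x    <- 0:n
px   <- dbinom(x, size = n, prob = prob)

# Mean and variance computed directly from the definition
mean_x <- sum(x * px)
var_x  <- sum((x - mean_x)^2 * px)

c(mean = mean_x, n_pi = n * prob)
c(variance = var_x, n_pi_one_minus_pi = n * prob * (1 - prob))
```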


## The Poisson distribution

**Question 5** Accidental drownings. 

**5.1** We are told the average number of accidental drownings per year in the US is 3 per 100,000 population. A city of 200,000 people has twice that population, so $\mu =$ expected number of drownings in a year = 2 $\times$ 3 = 6.

**5.2** The random variable of interest here is the number of accidental drownings (in this city) in a year.
This is a count of the total number of events occurring at random in a fixed period of time, hence we model this count using the Poisson distribution.

If the random variable $X$ denotes the number of drownings in one year in this city, then

$$
X \sim Poisson(\mu = 6)
$$

Using the standard probability distribution function for a Poisson variable, with $\mu = 6$, we have:

$$
P(X=x) = \frac{e^{-\mu} \mu^x}{x!} = \frac{e^{-6} 6^x}{x!}  \mbox{  for  }  x=0, 1, ...
$$

**5.2.A** Therefore, 

$$
P(X=0) = \frac{e^{-6} 6^0}{0!} = e^{-6} \approx 0.00248.
$$

**5.2.B** Between 1 and 2 (inclusive) means that $X$ must take values 1 or 2, so the probability we want is $P(X=1) + P(X=2)$:

$$
P(1 \leq X \leq 2) = P(X=1) + P(X=2) =  \frac{e^{-6} 6^1}{1!} +  \frac{e^{-6} 6^2}{2!} \approx 0.0595.
$$


**5.2.C** More than 2 drownings means that $X$ must take values 3, 4, 5, ... 

$$
P(X > 2) = 1 - P(X \leq 2) = 1 - \left\{ P(X=0) + P(X=1) + P(X=2) \right\} 
$$

Using the answers above, this is

$$
P(X > 2) = 1 - 0.00248 - 0.0595 \approx 0.938
$$

We can also obtain these probabilities from R, using the code below.

```r
# P(X = 0)
dpois(0, lambda = 6)

# P(1 <= X <= 2)
dpois(1, lambda = 6) + dpois(2, lambda = 6)

# P(X > 2)
1 - (dpois(0, lambda = 6) + dpois(1, lambda = 6) + dpois(2, lambda = 6))
```
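
An alternative sketch of the same calculation uses the cumulative distribution function `ppois` rather than summing individual `dpois` terms, and computes $\mu$ from the stated rate. The variable names are illustrative.

```r
rate_per_100k <- 3       # drownings per 100,000 people per year
population    <- 200000
mu <- rate_per_100k * population / 100000  # expected drownings per year in the city

p_none   <- ppois(0, lambda = mu)                         # P(X = 0) = P(X <= 0)
p_1_to_2 <- ppois(2, lambda = mu) - ppois(0, lambda = mu) # P(1 <= X <= 2)
p_over_2 <- ppois(2, lambda = mu, lower.tail = FALSE)     # P(X > 2)

c(p_none = p_none, p_1_to_2 = p_1_to_2, p_over_2 = p_over_2)
```

Setting `lower.tail = FALSE` returns the upper-tail probability $P(X > 2)$ directly, which avoids subtracting from one by hand.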


**Question 6.**

*Expectation*

To derive the expectation of a Poisson random variable $X$ with parameter $\mu$, we substitute the Poisson probability distribution function into the standard expression for a mean:

$$
E(X) = \sum_{j=0}^\infty j P(X=j) =  \sum_{j=0}^\infty j \frac{e^{-\mu} \mu^j}{j!}
$$

The first term (when $j=0$) is zero, so the sum can start at $j=1$. For $j \geq 1$ we can cancel $j$ against $j!$, since $j/j! = 1/(j-1)!$, giving

$$
E(X) = \sum_{j=1}^\infty  \frac{e^{-\mu} \mu^j}{(j-1)!} 
$$

Now let $i=j-1$, so that $j=i+1$, and $i=0$ when $j=1$. Then the sum above becomes

$$
E(X) = \sum_{i=0}^\infty \frac{e^{-\mu} \mu^{i+1}}{i!}  = \mu \sum_{i=0}^\infty  \frac{e^{-\mu} \mu^{i}}{i!} 
$$

We recognise the summation at the end as the sum of the probability distribution function over the whole sample space, which must equal one. Thus

$$
E(X) = \mu
$$

*Variance*

We start by calculating $E(X^2)$. As above, the $j=0$ term is zero, so the sum can start at $j=1$:

$$
E(X^2) = \sum_{j=0}^\infty j^2 P(X=j) =  \sum_{j=1}^\infty j^2 \frac{e^{-\mu} \mu^j}{j!} 
$$

Letting $i=j-1$, and cancelling one factor of $(i+1)$ against $(i+1)!$, we have

$$
E(X^2) = \sum_{i=0}^\infty (i+1)^2 \frac{e^{-\mu} \mu^{i+1}}{(i+1)!}  =  \sum_{i=0}^\infty (i+1) \frac{e^{-\mu} \mu^{i+1}}{i!}
$$

Splitting this into two terms, and taking a single $\mu$ out of each term, we have

$$
E(X^2) = \mu \sum_{i=0}^\infty i \frac{e^{-\mu} \mu^i}{i!} + \mu \sum_{i=0}^\infty \frac{e^{-\mu} \mu^i}{i!}
$$

The second summation is simply the probability distribution, summed over the sample space, so sums to one. The first summation is the expression for the mean, $E(X)$, which we have shown is equal to $\mu$. Thus

$$
E(X^2) = \mu^2 + \mu 
$$

Then the standard formula $Var(X) = E(X^2) - \{E(X)\}^2$ gives

$$
Var(X) = \mu^2 + \mu - \mu^2 = \mu
$$
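
These results can be checked numerically in R by truncating the infinite sums at a point where the remaining tail is negligible. This is a sketch under stated assumptions: $\mu = 6$ is borrowed from Question 5 for illustration, and the truncation point of 200 is an arbitrary choice that is far more than sufficient for this value of $\mu$.

```r
mu <- 6       # illustrative value, taken from Question 5
x  <- 0:200   # truncated support; the tail beyond 200 is negligible for mu = 6
px <- dpois(x, lambda = mu)

mean_x <- sum(x * px)       # approximates E(X)
ex2    <- sum(x^2 * px)     # approximates E(X^2)
var_x  <- ex2 - mean_x^2    # Var(X) = E(X^2) - E(X)^2

c(mean = mean_x, E_X2 = ex2, mu_sq_plus_mu = mu^2 + mu, variance = var_x)
```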