## Conditional Probability

#### *25 September 2019*
#### *DATA 1010*

We continue our review of probability theory. Today's topics include: independence, variance, Bayes' theorem and conditional expectation.

![](images/0.PNG)

![](images/0a.PNG)

Note: Suppose $X$ and $Y$ each take values in $\{0, 1, 2\}$, so the pair $(X, Y)$ has 9 possible outcomes. If $X$ and $Y$ are independent, the joint probability of each outcome is the product of the two marginal probabilities. Here $P(X = 0) = P(Y = 0) = 1/4$, so $P(X = 0, Y = 0) = \frac{1}{4} \cdot \frac{1}{4} = \frac{1}{16}$.

![](images/0b.PNG)

## Independence and variance  

### Problem 1   
 
Suppose that $X_1, \ldots, X_n$ are independent random variables with the same distribution. Find the mean and variance of 
$$\frac{X_1 + \cdots + X_n}{n}$$

In [1]:
using Distributions
rand(Exponential(1),100)

100-element Array{Float64,1}:
 1.4655097966088284  
 0.013250842488430286
 1.6094203395158782  
 1.5643613690457656  
 0.9491064452452656  
 2.202982566026309   
 1.494785213157604   
 0.2655132194281573  
 1.6773171750997042  
 1.628021148043729   
 0.3421406074463508  
 1.3056563395980108  
 2.4367358226314817  
 ⋮                   
 0.6310031965310358  
 1.7180027365842552  
 1.272117852028623   
 1.4760981203544887  
 0.28208974632566247 
 1.171017830828369   
 0.4909047857584042  
 0.20493172613814067 
 0.15414451871904328 
 2.167040731558729   
 2.0160574416727406  
 1.5103577259830594  

In [2]:
mean(rand(Exponential(1),1000))

1.01790436074049

![](images/1.PNG)

*Solution: (linearity of expectation does not require independence; the variance step does)*  
$$
E\left[\frac{X_1 + \cdots + X_n}{n}\right] = \frac{E[X_1] + \cdots + E[X_n]}{n} = \frac{nE[X_1]}{n} = E[X_1]
$$  
$$
Var\left[\frac{X_1 + \cdots + X_n}{n}\right] = \frac{Var[X_1] + \cdots + Var[X_n]}{n^2} = \frac{nVar[X_1]}{n^2} = \frac{Var[X_1]}{n}
$$
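
A quick simulation can sanity-check both identities. The sketch below is illustrative only: it uses `randexp` from Julia's `Random` standard library as a stand-in for `rand(Exponential(1), ...)`, and the helper name `sample_means`, the seed, and the values of `n` and `trials` are arbitrary choices made here. An Exponential(1) variable has mean 1 and variance 1, so the sample means should have mean near 1 and variance near $1/n$.

```julia
using Random, Statistics

# Hypothetical helper: draw `trials` independent samples of size n from
# Exponential(1) and return the mean of each sample.
function sample_means(rng, n, trials)
    draws = randexp(rng, n, trials)   # each column is one sample of size n
    return vec(mean(draws, dims=1))   # one sample mean per column
end

rng = MersenneTwister(1010)           # fixed seed so the run is repeatable
n, trials = 25, 100_000
means = sample_means(rng, n, trials)

println("mean of sample means:     ", mean(means))  # compare with E[X_1] = 1
println("variance of sample means: ", var(means))   # compare with Var[X_1]/n = 1/25
```

Increasing `n` should shrink the second number in proportion to $1/n$ while leaving the first unchanged.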

---

## Bayes' Theorem

![](images/1a.PNG)

### Problem 2

Assume that the prevalence of a disease in a population is 3%. The *true positive rate* for a pathological test, i.e. the probability of a positive test result for an individual carrying the disease, is 98%. The *false positive rate*, i.e. the probability of a positive test result for an individual without the disease, is 4%. What is the probability that an individual has the disease, given that they tested positive twice?

*Hint:* Assume the two test results are conditionally independent given the individual's disease status.

![](images/2.PNG)

*Solution:*  
$$
P[D \mid \text{positive twice}] = \frac{P[\text{positive twice} \mid D] \, P[D]}{P[\text{positive twice}]} = \frac{0.03 \times 0.98 \times 0.98}{0.03 \times 0.98 \times 0.98 + 0.97 \times 0.04 \times 0.04} \approx 0.949
$$
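
The same arithmetic can be written as a small function. This is a minimal sketch, assuming the test results are conditionally independent given disease status; `posterior_after_positives` is a hypothetical helper name, not part of any library.

```julia
# Posterior probability of disease after k positive results, assuming the
# results are conditionally independent given disease status.
# `posterior_after_positives` is a hypothetical helper name.
function posterior_after_positives(prior, tpr, fpr, k)
    with_disease = prior * tpr^k            # P[D] * P[k positives | D]
    without_disease = (1 - prior) * fpr^k   # P[not D] * P[k positives | not D]
    return with_disease / (with_disease + without_disease)
end

p_one = posterior_after_positives(0.03, 0.98, 0.04, 1)
p_two = posterior_after_positives(0.03, 0.98, 0.04, 2)

# Sequential updating: use the one-test posterior as the prior for the second test.
p_seq = posterior_after_positives(p_one, 0.98, 0.04, 1)

println(round(p_one, digits=3))
println(round(p_two, digits=3))
println(round(p_seq, digits=3))
```

A single positive result gives a posterior of about 0.431, and the second positive raises it to about 0.949. Updating in two steps gives the same value as conditioning on both results at once.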

---

## Conditional expectation 

<img src="925_2_1.png" alt="p" width="200" align=right>

### Problem 3  
  
Consider a pair of random variables $X$ and $Y$ with joint distribution $m$, where $m$ is the probability mass function shown. Find the conditional distribution of $Y$ given $X = x$ for each value of $x$.  

![](images/3.PNG)

*Solution:*  
$$
Y \mid X = x \sim \text{Unif}\{0, \ldots, x\}
$$  
$$
E[Y \mid X = x] = \frac{x}{2}
$$
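
As a worked step, taking the uniform conditional distribution on $\{0, \ldots, x\}$ from the solution as given, each of the $x + 1$ values has conditional probability $\frac{1}{x+1}$, and the conditional expectation follows from the sum of the first $x$ integers:

$$
E[Y \mid X = x] = \sum_{y=0}^{x} y \cdot \frac{1}{x+1} = \frac{1}{x+1} \cdot \frac{x(x+1)}{2} = \frac{x}{2}
$$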

---

<img src="925_4_1.png" alt="p" width="500" align=right> 

### Problem 4

Given that $X$ and $Y$ have joint PDF $f(x,y) = \frac{3}{2}(x^2 +y^2)$ on $[0,1]^2$, find the conditional expectation of $Y$ given $X$.
    
*Hint:* Begin by sketching an estimate of the conditional expectation on the graph shown.

![](images/4.PNG)

In [5]:
using SymPy
@vars x y
f_XY = 3(x^2+y^2)/2
f_X = integrate(f_XY, (y,0,1)) # marginal of x
simplify(integrate(y * f_XY / f_X, (y,0,1))) # conditional expectation: integrate y times the conditional density of Y given X = x

  ⎛   2    ⎞
3⋅⎝2⋅x  + 1⎠
────────────
  ⎛   2    ⎞
4⋅⎝3⋅x  + 1⎠

*Solution:*  
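$$
E[Y \mid X = x] = \frac{6x^2 + 3}{12x^2 + 4}
$$

Read from left to right, the conditional expectation decreases as $x$ increases, from $3/4$ at $x = 0$ to $9/16$ at $x = 1$.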
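
The symbolic result can also be reproduced by hand. The steps below mirror the computation in the cell: first the marginal density of $X$, then the integral of $y$ against the joint density, then their ratio.

$$
\begin{aligned}
f_X(x) &= \int_0^1 \frac{3}{2}(x^2 + y^2) \, dy = \frac{3}{2}\left(x^2 + \frac{1}{3}\right) = \frac{3x^2 + 1}{2} \\
\int_0^1 y \, f(x,y) \, dy &= \frac{3}{2}\left(\frac{x^2}{2} + \frac{1}{4}\right) = \frac{6x^2 + 3}{8} \\
E[Y \mid X = x] &= \frac{(6x^2 + 3)/8}{(3x^2 + 1)/2} = \frac{6x^2 + 3}{12x^2 + 4}
\end{aligned}
$$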


![](images/4a.PNG)

---

<img src="925_5_1.png" alt="p" width="500" align=right>

### Problem 5   

Given that $X$ and $Y$ have the joint PDF shown in the figure, sketch an estimate of the conditional expectation of $Y$ given $X = x$.

![](images/5.PNG)

---

<img src="925_6_1.png" alt="p" width="200" align=right>   

### Problem 6   

Given that $X$ and $Y$ have joint PDF $f(x,y) = \frac{9}{5}(1-\sqrt{xy})$ on $[0,1]^2$, find the conditional expectation of $Y$ given $X$.

*Hint:* Begin by sketching an estimate of the conditional expectation on the graph shown.

In [6]:
f_XY = 9(1-sqrt(x*y))/5
f_X = integrate(f_XY, (y,0,1))
simplify(integrate(y * f_XY / f_X, (y,0,1)))

 3⋅(4⋅√x - 5)
─────────────
10⋅(2⋅√x - 3)

![](images/6.PNG)
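
The symbolic result above can be cross-checked numerically without SymPy. The sketch below is one possible check, not the method used in class: `cond_expectation` is a hypothetical helper that approximates both integrals over $y \in [0,1]$ with a midpoint rule, and the grid size `N` is an arbitrary choice.

```julia
# Hypothetical helper: approximate E[Y | X = x] for a joint density f on [0,1]^2
# using a midpoint rule with N subintervals in y.
function cond_expectation(f, x; N=10_000)
    h = 1 / N
    ys = [(k - 0.5) * h for k in 1:N]         # midpoints of the subintervals
    num = sum(y * f(x, y) for y in ys) * h    # approximates the integral of y * f(x, y) over y
    den = sum(f(x, y) for y in ys) * h        # approximates the marginal density f_X(x)
    return num / den
end

f(x, y) = 9 * (1 - sqrt(x * y)) / 5                               # joint PDF from Problem 6
closed_form(x) = 3 * (4 * sqrt(x) - 5) / (10 * (2 * sqrt(x) - 3)) # symbolic answer

for x in (0.0, 0.25, 1.0)
    println("x = ", x, ": numeric ", cond_expectation(f, x), ", closed form ", closed_form(x))
end
```

The two columns should agree to several decimal places at each $x$. Swapping in a different `f` gives the same check for other joint densities on the unit square.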

---

### Challenge Problem

Imagine yourself working as a recruiter for a major tech company. You are tasked with finding the best candidate to fill a vacancy in a data scientist role. You have $n$ applicants to interview, one at a time, in a random order. However, you must make an acceptance/rejection decision immediately after each interview. Out of the $n$ candidates, how many should you interview before making an offer? You may assume that all candidates for the position can be ranked unambiguously from best to worst.

- *Hint 1:* Consider that if you interview all candidates, then you must hire the last candidate regardless of their rank. Assume that you reject the first $r-1$ candidates, and then select the next candidate who performs better than all of these $r-1$ candidates. Write down the probability $\mathbb{P}(r)$ that this interview policy selects the best candidate.
- *Hint 2:* Since candidates are randomly ordered, each candidate is equally likely to be the best hire for the role.
- *Hint 3:* $\mathbb{P}(r) = \sum_{i=1}^n \mathbb{P}(\text{candidate } i \text{ is selected} \mid \text{candidate } i \text{ is the best})\times\mathbb{P}(\text{candidate } i \text{ is the best})$
- *Hint 4:* You must approximate the sum as an integral to obtain a solution in terms of $r$ and $n$.
- *Hint 5:* For various values of $n$, find the optimal value of $r$ using Julia.
- *Hint 6:* Plot $\mathbb{P}(r)$ vs. $n$ for optimal values of $r$. What do these values converge to?
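
One possible starting point for Hint 5 is sketched below. It rests on a reading of Hint 3 that is an assumption here: if the best candidate sits at position $i \geq r$, the policy selects them exactly when the best of the first $i-1$ candidates falls among the first $r-1$, which happens with probability $\frac{r-1}{i-1}$, and for $i < r$ the best candidate is always rejected. The function names `success_probability` and `best_cutoff` are hypothetical.

```julia
# Probability that the policy "reject the first r-1, then take the next candidate
# better than all of them" selects the best of n candidates.
# Assumes P(selected | best at position i) = (r-1)/(i-1) for i >= r and 0 for i < r.
function success_probability(r, n)
    r == 1 && return 1 / n          # with no rejections, the first candidate is hired
    return (r - 1) / n * sum(1 / (i - 1) for i in r:n)
end

# Search all cutoffs r = 1, ..., n and return the best one with its probability.
function best_cutoff(n)
    probs = [success_probability(r, n) for r in 1:n]
    p, r = findmax(probs)           # findmax returns (maximum value, its index)
    return r, p
end

for n in (10, 100, 1000)
    r, p = best_cutoff(n)
    println("n = ", n, ": optimal r = ", r, ", P(r) = ", p, ", (r - 1)/n = ", (r - 1) / n)
end
```

The exhaustive search is only practical for moderate $n$, which is why Hint 4 suggests an integral approximation for the general answer. The printed values of `P(r)` and `(r - 1)/n` are the quantities to plot for Hint 6.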