# Chapter 3: Random Variables

In [None]:
# The source of the content is freely available online
# https://drive.google.com/file/d/1VmkAAGOYCTORq1wxSQqy255qLJjTNvBI/view
# https://projects.iq.harvard.edu/stat110/

<h4>Definition 3.1.1 (Random Variable)</h4>

Given an experiment with sample space $S$, a random variable $X$ is a function from the sample space $S$ to the real numbers $\mathbb{R}$. Thus, a random variable assigns a numerical value $X(s)$ to each possible outcome $s$ of the experiment.

$P(B|E) = \sum_{i=1}^n P(B|A_i, E) P(A_i | E)$

<h4>Story 3.3.3 (Bernoulli Trial)</h4>

An experiment that can result in either a success or failure, and not both, is a Bernoulli trial.

<h4>Story 3.3.4 (Binomial Distribution)</h4>

Suppose that $n$ independent Bernoulli trials are performed, each with the same success probability $p$. Let $X$ be the number of successes. The distribution of $X$ is called the Binomial distribution with parameters $n$ and $p$.

$P(X=k) = \binom{n}{k} p^k(1-p)^{n-k}$
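As a quick numerical illustration of the Binomial PMF, here is a minimal sketch using only the Python standard library. The function name `binomial_pmf` and the values `n = 10`, `p = 0.3` are illustrative choices, not part of the source text.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bin(n, p); zero outside the support 0..n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Illustrative parameters (assumed for this sketch).
n, p = 10, 0.3

# A valid PMF sums to 1 over its support.
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
print(total)
```

The sum equals 1 up to floating-point error, which is the Binomial theorem applied to $(p + (1-p))^n$.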

<h4>Story 3.4.1 (Hypergeometric Distribution)</h4>

Consider an urn with $w$ white balls and $b$ black balls. We draw $n$ balls out of the urn at random without replacement, such that all $\binom{w+b}{n}$ samples are equally likely. Let $X$ be the number of white balls in the sample. Then $X$ is said to have the Hypergeometric distribution with parameters $w$, $b$, and $n$.

<h4>Theorem 3.4.1 (Hypergeometric PMF)</h4>

If $X \text{~} HGeom(w,b,n)$, then the PMF of $X$ is:

$P(X=k) = \frac{ \binom{w}{k} \binom{b}{n-k} }{ \binom{w+b}{n} } $

for integers $k$ satisfying $0 \le k \le w$ and $0 \le n-k \le b$, and $P(X=k)=0$ otherwise.

This PMF is valid because the numerator, summed over all $k$, equals $\binom{w+b}{n}$ by Vandermonde's identity.
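The validity argument can be checked numerically. The sketch below uses exact rational arithmetic; the helper name `hgeom_pmf` and the urn sizes `w = 6`, `b = 4`, `n = 5` are assumptions made for illustration, and it presumes $n \le w + b$.

```python
from fractions import Fraction
from math import comb

def hgeom_pmf(k: int, w: int, b: int, n: int) -> Fraction:
    """P(X = k) for X ~ HGeom(w, b, n), as an exact fraction."""
    if k < 0 or k > w or n - k < 0 or n - k > b:
        return Fraction(0)
    return Fraction(comb(w, k) * comb(b, n - k), comb(w + b, n))

# Illustrative urn (assumed values): 6 white, 4 black, draw 5.
w, b, n = 6, 4, 5

# Vandermonde's identity: the numerators sum to C(w + b, n).
numerator_sum = sum(comb(w, k) * comb(b, n - k) for k in range(n + 1))
pmf_total = sum(hgeom_pmf(k, w, b, n) for k in range(n + 1))
print(numerator_sum, comb(w + b, n), pmf_total)
```

Using `Fraction` avoids rounding, so the total probability comes out as exactly 1 rather than approximately 1.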

The essential structure of the Hypergeometric story is that items in a population are classified using two sets of tags. Furthermore, at least one of these sets of tags is assigned completely at random (like the balls sampled randomly from the urn). Then, $X \text{~} HGeom(w,b,n)$ represents the number of twice-tagged items (e.g., balls that are both white and sampled).

<h4>Example 3.4.3 (Elk Capture-Recapture)</h4>

A forest has $N$ elk. Today, $m$ of the elk are captured, tagged, and released into the wild. At a later date, $n$ elk are recaptured at random. Assume that the recaptured elk are equally likely to be any set of $n$ of the elk.

By the story of the Hypergeometric, the number of tagged elk in the recaptured sample has the $HGeom(m, N-m, n)$ distribution. The $m$ tagged elk correspond to the white balls and the $N-m$ untagged elk correspond to the black balls. Instead of sampling $n$ balls from the urn, we recapture $n$ elk from the forest.

<h4>Example 3.4.4 (Aces in a Poker Hand)</h4>

In a 5-card hand drawn at random, the number of aces in the hand has the $HGeom(4,48,5)$ distribution, which can be seen by thinking of the aces as white balls and the non-aces as black balls. For example, the probability of exactly three aces is:

$\frac{ \binom{4}{3} \binom{48}{2} }{ \binom{52}{5} } \approx 0.0017$
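The arithmetic for the poker example can be reproduced directly. This is a minimal sketch; `hgeom_pmf` and `aces_pmf` are hypothetical names introduced here.

```python
from fractions import Fraction
from math import comb

def hgeom_pmf(k: int, w: int, b: int, n: int) -> Fraction:
    """P(X = k) for X ~ HGeom(w, b, n), as an exact fraction."""
    if k < 0 or k > w or n - k < 0 or n - k > b:
        return Fraction(0)
    return Fraction(comb(w, k) * comb(b, n - k), comb(w + b, n))

# Aces are "white" (4), non-aces are "black" (48), and the hand has 5 cards.
aces_pmf = {k: hgeom_pmf(k, 4, 48, 5) for k in range(6)}
p_three_aces = aces_pmf[3]

for k, prob in aces_pmf.items():
    print(k, float(prob))
```

The entry for `k = 3` is the probability computed above, and the entry for `k = 5` is zero because a deck has only four aces.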

<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th>Story</th><th>First Set of Tags</th><th>Second Set of Tags</th></tr></thead><tbody>
 <tr><td>Urn</td><td>White, Black</td><td>Sampled, Not Sampled</td></tr>
 <tr><td>Elk</td><td>Tagged, Untagged</td><td>Recaptured, Not Recaptured</td></tr>
 <tr><td>Cards</td><td>Ace, Not Ace</td><td>In Hand, Not in Hand</td></tr>
</tbody></table>

<h4>Theorem 3.4.5</h4>

The $HGeom(w,b,n)$ and $HGeom(n, w+b-n, w)$ distributions are identical.
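The symmetry between $HGeom(w,b,n)$ and $HGeom(n, w+b-n, w)$ reflects the fact that the two sets of tags (white or black, sampled or not sampled) can swap roles. A small exact check, with the helper `hgeom_pmf` and the values `w = 7`, `b = 5`, `n = 4` assumed purely for illustration:

```python
from fractions import Fraction
from math import comb

def hgeom_pmf(k: int, w: int, b: int, n: int) -> Fraction:
    """P(X = k) for X ~ HGeom(w, b, n), as an exact fraction."""
    if k < 0 or k > w or n - k < 0 or n - k > b:
        return Fraction(0)
    return Fraction(comb(w, k) * comb(b, n - k), comb(w + b, n))

# Illustrative parameters (assumed values).
w, b, n = 7, 5, 4

# Compare the two PMFs at every k that could have positive probability.
original = [hgeom_pmf(k, w, b, n) for k in range(min(w, n) + 1)]
swapped = [hgeom_pmf(k, n, w + b - n, w) for k in range(min(w, n) + 1)]
print(original == swapped)
```

This checks one parameter choice only; it illustrates the theorem rather than proving it.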

<h5>Binomial vs. Hypergeometric</h5>

The Binomial and Hypergeometric distributions are often confused. Both can be interpreted as the number of successes in $n$ Bernoulli trials. For the Hypergeometric, each tagged elk in the recaptured sample can be considered a success and each untagged elk a failure. However, in the Binomial story the Bernoulli trials are independent, whereas in the Hypergeometric story they are dependent, since the sampling is done without replacement. Knowing that one elk in our sample is tagged decreases the probability that the second elk will also be tagged.

<h4>3.8 Independence of Random Variables</h4>

Random variables $X$ and $Y$ are said to be independent if $P(X \le x, Y \le y) = P(X \le x) P(Y \le y)$ for all $x, y \in \mathbb{R}$.

In the discrete case, this is equivalent to the condition:

$P(X=x, Y=y) = P(X=x)P(Y=y)$

<h4>Definition 3.8.2 (Independence of Many Random Variables)</h4>

Random variables $X_1, \ldots, X_n$ are independent if 

$P(X_1 \le x_1, \ldots, X_n \le x_n) = P(X_1 \le x_1) \cdots P(X_n \le x_n)$

for all $x_1, \ldots, x_n \in \mathbb{R}$.

<h4>Theorem 3.8.5 (Functions of Random Variables)</h4>

If $X$ and $Y$ are independent random variables, then any function of $X$ is independent of any function of $Y$.

<h4>Definition 3.8.6 (IID)</h4>

Random variables that are independent and have the same distribution are called independent and identically distributed, or IID.

<h4>Definition 3.8.11</h4>

For any discrete random variables $X$ and $Z$, the function $P(X=x|Z=z)$, when considered as a function of $x$ for fixed $z$, is called the conditional PMF of $X$ given $Z=z$.

Independence of random variables does not imply conditional independence, nor vice versa.

<h4>Example 3.8.12 (Matching Pennies)</h4>

Each of two players, $A$ and $B$, has a fair penny. They flip their pennies independently. If the pennies match, $A$ wins; otherwise, $B$ wins. Let $X$ be $1$ if $A$'s penny lands heads and $-1$ otherwise, and define $Y$ similarly for $B$.

Let $Z=XY$, which is $1$ if $A$ wins and $-1$ if $B$ wins. Then $X$ and $Y$ are unconditionally independent, but given $Z=1$, we know that $X=Y$, so $X$ and $Y$ are conditionally dependent given $Z$.
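The matching pennies example is small enough to enumerate. The sketch below lists the four equally likely outcomes and compares the unconditional and conditional joint probabilities; the helper `prob_of` and the variable names are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import product

# Each outcome is (x, y, z) with z = x * y; the four outcomes are equally likely.
outcomes = [(x, y, x * y) for x, y in product((1, -1), repeat=2)]
weight = Fraction(1, len(outcomes))

def prob_of(event) -> Fraction:
    """Probability of the set of outcomes for which event(x, y, z) is true."""
    return sum((weight for o in outcomes if event(*o)), Fraction(0))

# Unconditional: the joint probability factors into the marginals.
p_x1 = prob_of(lambda x, y, z: x == 1)
p_y1 = prob_of(lambda x, y, z: y == 1)
p_x1_y1 = prob_of(lambda x, y, z: x == 1 and y == 1)

# Conditional on Z = 1: the joint probability no longer factors.
p_z1 = prob_of(lambda x, y, z: z == 1)
p_x1_given_z1 = prob_of(lambda x, y, z: x == 1 and z == 1) / p_z1
p_y1_given_z1 = prob_of(lambda x, y, z: y == 1 and z == 1) / p_z1
p_x1_y1_given_z1 = prob_of(lambda x, y, z: x == 1 and y == 1 and z == 1) / p_z1

print(p_x1_y1, p_x1 * p_y1)
print(p_x1_y1_given_z1, p_x1_given_z1 * p_y1_given_z1)
```

Unconditionally both printed values agree, while given $Z=1$ the joint probability is $1/2$ but the product of the conditional marginals is $1/4$.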

<h4>Example 3.8.13 (Two Friends)</h4>

Consider the "I only have two friends who call me" scenario from Example 2.5.11. Let $X$ be the indicator of Alice calling me next Friday, $Y$ be the indicator of Bob calling me next Friday, and $Z$ be the indicator of exactly one of them calling next Friday. Then $X$ and $Y$ are independent (by assumption). But given $Z=1$, $X$ and $Y$ are completely dependent: given that $Z=1$, we have $Y=1-X$.

Next we'll see why conditional independence does not imply independence.

<h4>Example 3.8.14 (Mystery Opponent)</h4>

Suppose you are going to play two games of tennis against one of two identical twins. Against one, you are evenly matched, and against the other, you have a $3/4$ chance of winning.

Suppose that you can't tell which twin you are playing against until after the two games. Let $Z$ be the indicator of playing against the twin with whom you're evenly matched, and let $X$ and $Y$ be the indicators of victory in the first and second games respectively.

Conditional on $Z=1$, $X$ and $Y$ are IID $Bern(1/2)$, and conditional on $Z=0$, $X$ and $Y$ are IID $Bern(3/4)$. So $X$ and $Y$ are conditionally independent given $Z$. Unconditionally, $X$ and $Y$ are dependent because observing $X=1$ makes it more likely that we are playing the twin who is worse, which in turn makes a win in the second game more likely. That is, $P(Y=1 | X=1) \gt P(Y=1)$.
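To put numbers on the dependence, the sketch below computes $P(Y=1)$ and $P(Y=1 | X=1)$ exactly. The example does not state the prior probability of facing each twin; this sketch assumes each is equally likely, so $P(Z=1) = 1/2$ is an assumption, as are the variable names.

```python
from fractions import Fraction

# ASSUMPTION (not stated in the example): each twin is equally likely.
prior = {1: Fraction(1, 2), 0: Fraction(1, 2)}

# Probability of winning a single game, given Z.
win_prob = {1: Fraction(1, 2), 0: Fraction(3, 4)}

# Law of total probability over Z; X and Y have the same marginal distribution.
p_x1 = sum(prior[z] * win_prob[z] for z in prior)
p_y1 = p_x1

# Given Z, the two games are independent, so the joint is the squared win probability.
p_x1_y1 = sum(prior[z] * win_prob[z] ** 2 for z in prior)

p_y1_given_x1 = p_x1_y1 / p_x1
print(p_y1, p_y1_given_x1)
```

Under the equal-prior assumption, $P(Y=1) = 5/8$ while $P(Y=1 | X=1) = 13/20$, so winning the first game raises the probability of winning the second.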

<h3>Connections Between Binomial and Hypergeometric</h3>

<h4>Theorem 3.9.2</h4>

If $X \text{~} Bin(n,p)$, $Y \text{~} Bin(m,p)$, and $X$ is independent of $Y$, then the conditional distribution of $X$ given $X+Y=r$ is $HGeom(n,m,r)$.
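The theorem can be spot-checked by computing the conditional PMF from the definition, $P(X=k | X+Y=r) = P(X=k)P(Y=r-k)/P(X+Y=r)$, using the fact that $X+Y \text{~} Bin(n+m,p)$. The parameter values and helper names below are assumptions for illustration only.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k: int, n: int, p: Fraction) -> Fraction:
    """P(X = k) for X ~ Bin(n, p), exact when p is a Fraction."""
    if k < 0 or k > n:
        return Fraction(0)
    return comb(n, k) * p**k * (1 - p)**(n - k)

def hgeom_pmf(k: int, w: int, b: int, n: int) -> Fraction:
    """P(X = k) for X ~ HGeom(w, b, n), as an exact fraction."""
    if k < 0 or k > w or n - k < 0 or n - k > b:
        return Fraction(0)
    return Fraction(comb(w, k) * comb(b, n - k), comb(w + b, n))

# Illustrative parameters (assumed values).
n, m, r, p = 5, 4, 3, Fraction(1, 3)

# Conditional PMF of X given X + Y = r, computed from the definition.
conditional = [
    binomial_pmf(k, n, p) * binomial_pmf(r - k, m, p) / binomial_pmf(r, n + m, p)
    for k in range(r + 1)
]
hypergeometric = [hgeom_pmf(k, n, m, r) for k in range(r + 1)]
print(conditional == hypergeometric)
```

Note that $p$ cancels in the ratio, which is why the conditional distribution does not depend on it.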

<h4>Linearity of Expectation</h4>

The expected value of a sum of random variables is the sum of the individual expected values.

<h4>Theorem 4.2.1 (Linearity of Expectation)</h4>

For any random variables $X$, $Y$, and any constant $c$, 

$E(X+Y) = E(X) + E(Y)$

$E(cX) = cE(X)$
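Linearity holds even when $X$ and $Y$ are dependent. The following sketch illustrates this on a two-dice experiment chosen for this note (it is not from the text): $X$ is the first die and $Y$ is the total of both dice, so $Y$ clearly depends on $X$.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered rolls of two fair dice, all 36 outcomes equally likely.
outcomes = list(product(range(1, 7), repeat=2))
weight = Fraction(1, len(outcomes))

def expectation(g) -> Fraction:
    """E[g] for a random variable g(d1, d2) on the two-dice sample space."""
    return sum(weight * g(d1, d2) for d1, d2 in outcomes)

def X(d1, d2):
    return d1            # value of the first die

def Y(d1, d2):
    return d1 + d2       # total of both dice, dependent on X

c = 3                    # arbitrary constant (assumed for illustration)

e_x = expectation(X)
e_y = expectation(Y)
e_sum = expectation(lambda d1, d2: X(d1, d2) + Y(d1, d2))
e_scaled = expectation(lambda d1, d2: c * X(d1, d2))
print(e_sum, e_x + e_y, e_scaled, c * e_x)
```

Both identities hold exactly here, with no independence assumption needed.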

# Exercises

<h4>Exercise 18</h4>

In the World Series, two baseball teams, $A$ and $B$, play a sequence of games against each other, and the first to win $4$ games wins the series. Let $p$ be the probability that team $A$ wins an individual game, and assume the games are independent. What is the probability that team $A$ wins the series?

<i>Answer:</i>

Let $q=1-p$. First, let us do a direct calculation.

$P(\text{A Wins}) = $

$P(\text{A wins in 4 games}) + $

$P(\text{A wins in 5 games}) + $

$P(\text{A wins in 6 games}) + $

$P(\text{A wins in 7 games})$

<br>

$= p^4 + \binom{4}{3} p^4q + \binom{5}{3} p^4 q^2 + \binom{6}{3} p^4 q^3$

<br>

For intuition, note for example that:

$P(\text{A wins in 5}) = P(\text{A wins 3 of first 4}) \cdot P(\text{A wins } 5^{th} | \text{A wins 3 of first 4})$

$P(\text{A wins in 5}) = \binom{4}{3} p^3 q \cdot p = \binom{4}{3} p^4 q$

A neater solution is to use the fact that we can assume that the teams play all $7$ games no matter what. Let $X$ be the number of wins for team $A$, so that $X \text{~} Bin(7,p)$. Then:

$P(X \ge 4) = P(X=4) + P(X=5) + \ldots + P(X=7)$

$P(X \ge 4) = \binom{7}{4} p^4 q^3 + \binom{7}{5} p^5 q^2 + \binom{7}{6} p^6q + p^7$
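The two solutions should give the same number for any $p$. A minimal sketch that compares them with exact arithmetic; the function names and the test value $p = 3/5$ are assumptions of this sketch.

```python
from fractions import Fraction
from math import comb

def series_win_direct(p: Fraction) -> Fraction:
    """Sum over series lengths 4..7: A wins 3 of the first 3 + j games, then the last one."""
    q = 1 - p
    return sum(comb(3 + j, 3) * p**4 * q**j for j in range(4))

def series_win_binomial(p: Fraction) -> Fraction:
    """P(X >= 4) for X ~ Bin(7, p), pretending all 7 games are played."""
    q = 1 - p
    return sum(comb(7, k) * p**k * q**(7 - k) for k in range(4, 8))

p = Fraction(3, 5)  # illustrative value (assumed)
print(series_win_direct(p), series_win_binomial(p))
print(series_win_direct(Fraction(1, 2)))
```

For evenly matched teams ($p = 1/2$) both formulas return exactly $1/2$, as symmetry requires.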

<h4>Exercise 28</h4>

There are $n$ eggs, each of which hatches a chick with probability $p$, independently. Each of these chicks survives with probability $r$, independently. What is the distribution of the number of chicks that hatch? What is the distribution of the number of chicks that survive?

<i>Answer:</i>

Let $H$ be the number of eggs that hatch and $X$ be the number of hatchlings that survive. Each egg is a Bernoulli trial: for $H$, a success means that the egg hatches, and for $X$, a success means that the egg hatches a chick that survives. By the story of the Binomial, $H \text{~} Bin(n,p)$ with PMF:

$P(H=k) = \binom{n}{k} p^k(1-p)^{n-k}$

Each egg independently has probability $pr$ of hatching a chick that survives. By the story of the Binomial, we have $X \text{~} Bin(n,pr)$ with PMF:

$P(X=k) = \binom{n}{k} (pr)^k (1-pr)^{n-k}$
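The result for $X$ can be cross-checked by conditioning on the number of hatched eggs: given $H=h$, the number of survivors is $Bin(h,r)$, where $r$ is the survival probability used in the answer. The sketch below sums over $h$ and compares with $Bin(n,pr)$; the parameter values and helper name are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k: int, n: int, p: Fraction) -> Fraction:
    """P(X = k) for X ~ Bin(n, p), exact when p is a Fraction."""
    if k < 0 or k > n:
        return Fraction(0)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Illustrative parameters (assumed values).
n, p, r = 6, Fraction(1, 2), Fraction(2, 3)

# P(X = k) = sum over h of P(H = h) * P(X = k | H = h).
two_stage = [
    sum(binomial_pmf(h, n, p) * binomial_pmf(k, h, r) for h in range(n + 1))
    for k in range(n + 1)
]
one_stage = [binomial_pmf(k, n, p * r) for k in range(n + 1)]
print(two_stage == one_stage)
```

The agreement of the two lists confirms, for this parameter choice, that the two-stage description and the single $Bin(n,pr)$ story describe the same distribution.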

<h4>Exercise 37</h4>

A message is sent over a noisy channel. The message is a sequence of $n$ bits ($x_i \in \{0,1\}$). Assume that the error events are independent. Let $p$ be the probability that an individual bit has an error ($0 \lt p \lt \frac{1}{2}$). Let $y_1, y_2, \ldots, y_n$ be the received message (so $y_i = x_i$ if there is no error in that bit).

To help detect errors, the $n^{th}$ bit is reserved for a parity check; $x_n$ is defined to be 0 if $x_1 + x_2 + \ldots + x_{n-1}$ is even, and 1 if $x_1 + x_2 + \ldots + x_{n-1}$ is odd. When the message is received, the recipient checks whether $y_n$ has the same parity as $y_1 + y_2 + \ldots + y_{n-1}$. If the parity is wrong, the recipient knows that at least one error occurred; otherwise, the recipient assumes that there were no errors.

<b>Part A:</b>

For $n=5$, $p=0.1$, what is the probability that the received message has errors which go undetected?

<i>Answer:</i>

Note that $\sum_{i=1}^n x_i$ is even. If the number of errors is even and nonzero, the errors will go undetected; otherwise, $\sum_{i=1}^n y_i$ will be odd, so the errors will be detected. The number of errors is $Bin(n,p)$, so the probability of undetected errors when $n=5$, $p=0.1$ is:

$\binom{5}{2} p^2 (1-p)^3 + \binom{5}{4} p^4 (1-p) \approx 0.073$

<b>Part B:</b>

For general $n$ and $p$, write down an expression (as a sum) for the probability that the received message has errors which go undetected.

<i>Answer:</i>

By the same reasoning as in part A, the probability of undetected errors is:

$\sum_{k \text{ even}, \, k \ge 2} \binom{n}{k} p^k (1-p)^{n-k}$
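The sum can be evaluated directly, and for small $n$ it can be compared with a brute-force enumeration of all $2^n$ error patterns. This is a sketch under the stated model (independent bit errors with probability $p$); the function names are hypothetical.

```python
from itertools import product
from math import comb

def undetected_error_prob(n: int, p: float) -> float:
    """Sum of the Bin(n, p) PMF over even k >= 2 (errors that preserve parity)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(2, n + 1, 2))

def undetected_error_prob_bruteforce(n: int, p: float) -> float:
    """Enumerate every error pattern; keep those with an even, nonzero error count."""
    total = 0.0
    for pattern in product((0, 1), repeat=n):
        errors = sum(pattern)
        if errors > 0 and errors % 2 == 0:
            total += p**errors * (1 - p)**(n - errors)
    return total

# Part A values from the exercise.
part_a = undetected_error_prob(5, 0.1)
print(part_a, undetected_error_prob_bruteforce(5, 0.1))
```

For $n=5$, $p=0.1$ both functions give approximately $0.073$, matching Part A.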