# 3. Random variables and their distributions

## Brief summary

### Random variable

Given an experiment with sample space $S$, a *random variable* (r.v.) is a function from the sample space $S$ to the real numbers $\mathbb{R}$. It is common, but not required, to denote random variables by capital letters.

Thus, a random variable $X$ assigns a numerical value $X(s)$ to each possible outcome $s$ of the experiment. The randomness comes from the fact that we have a random experiment (with probabilities described by the probability function P); the mapping itself is deterministic.
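To make the "function on the sample space" view concrete, here is a minimal sketch. The experiment (two fair coin tosses) and the names `S`, `P`, `X`, and `pmf` are assumptions chosen for illustration, not part of the definition above.

```python
from fractions import Fraction
from itertools import product

# Sample space for two coin tosses; each outcome s is a tuple such as ('H', 'T')
S = list(product("HT", repeat=2))
# Assumption for this example: all four outcomes are equally likely
P = {s: Fraction(1, 4) for s in S}

def X(s):
    """Random variable: the number of heads in outcome s (a deterministic mapping)."""
    return s.count("H")

# Distribution of X induced by P: P(X = x) is the total probability of outcomes s with X(s) = x
pmf = {}
for s in S:
    pmf[X(s)] = pmf.get(X(s), 0) + P[s]

print(pmf)
```

The function `X` contains no randomness; all of it sits in `P`, the probabilities attached to the outcomes.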

### Discrete random variable

A random variable $X$ is said to be *discrete* if there is a finite list of values $a_1, a_2, ..., a_n$ or an infinite list of values $a_1, a_2, ...$ such that $P(X = a_j \text{ for some } j) = 1$. If $X$ is a discrete r.v., then the finite or countably infinite set of values $x$ such that $P(X=x) > 0$ is called the *support* of $X$.

### Probability mass function

The *probability mass function* (PMF) of a discrete r.v. $X$ is the function $p_X$ given by $p_X(x) = P(X=x)$. Note that this is positive if $x$ is in the support of $X$, and 0 otherwise.

#### Valid PMFs

Let $X$ be a discrete r.v. with support $x_1, x_2, ...$ (assume these values are distinct and, for notational simplicity, that the support is countably infinite; the analogous results hold if the support is finite). The PMF $p_X$ of $X$ must satisfy the following two criteria:

- Nonnegative: $p_X(x) > 0$ if $x=x_j$ for some $j$, and $p_X(x) = 0$ otherwise;
- Sums to 1: $\sum_{j=1}^\infty p_X(x_j) = 1$
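The two criteria can be checked mechanically for a finite support. The sketch below assumes the PMF is stored as a dictionary mapping support values to probabilities; the helper name `is_valid_pmf` and the tolerance are hypothetical choices made for this example.

```python
import math

def is_valid_pmf(pmf, tol=1e-9):
    """Check the two PMF criteria for a finite support given as {value: probability}."""
    # Every value listed in the support must have strictly positive probability
    positive = all(prob > 0 for prob in pmf.values())
    # The probabilities must sum to 1 (up to floating-point error)
    sums_to_one = math.isclose(sum(pmf.values()), 1.0, abs_tol=tol)
    return positive and sums_to_one

print(is_valid_pmf({0: 0.25, 1: 0.5, 5: 0.1, 10: 0.15}))   # satisfies both criteria
print(is_valid_pmf({0: 0.5, 1: 0.6}))                      # fails: sums to 1.1
```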

### Bernoulli distribution

An r.v. $X$ is said to have the *Bernoulli distribution* with parameter $p$ if $P(X=1) = p$ and $P(X=0) = 1 - p$, where $0 < p < 1$. We write this as $X \sim Bern(p)$. The symbol $\sim$ is read "is distributed as".

#### Indicator random variable

The *indicator random variable* of an event $A$ is the r.v. which equals $1$ if $A$ occurs and $0$ otherwise. We will denote the indicator r.v. of $A$ by $I_A$ or $I(A)$. Note that $I_A \sim Bern(p)$ with $p = P(A)$.
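As a rough illustration, the sketch below simulates an indicator r.v. The event $A$ ("a fair die roll is even"), the seed, and the sample size are assumptions picked for this example. The sample mean of the indicator should land near $P(A) = 1/2$.

```python
import numpy as np

rng = np.random.default_rng(seed=0)      # arbitrary seed, for reproducibility
rolls = rng.integers(1, 7, size=10**5)   # fair six-sided die (assumed for illustration)

# Indicator of the event A = "the roll is even": 1 if A occurs, 0 otherwise
I_A = (rolls % 2 == 0).astype(int)

print(I_A.mean())   # should be close to P(A) = 1/2
```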

### Binomial PMF

If $X \sim Bin(n,p)$, then the PMF of $X$ is

\begin{equation}
P(X=k) = {{n}\choose{k}}p^k(1-p)^{n-k}
\end{equation}

for $k = 0, 1, ..., n$ (and $P(X=k) = 0$ otherwise).
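The formula can be evaluated directly with the standard library. This is a minimal sketch; the function name `binom_pmf` is hypothetical, and the parameter values match the ones used in the Python examples later in this chapter.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p), computed directly from the formula."""
    if not 0 <= k <= n:
        return 0.0   # outside the support
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binom_pmf(3, 5, 0.2))                          # approximately 0.0512
print(sum(binom_pmf(k, 5, 0.2) for k in range(6)))   # approximately 1
```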

### Hypergeometric PMF

If $X \sim HGeom(w,b,n)$, then the PMF of $X$ is 

\begin{equation}
P(X=k) = \frac{ {{w}\choose{k}} {{b}\choose{n-k}} } { {{w+b}\choose{n}} }
\end{equation}

for integers $k$ satisfying $0 \leq k \leq w$ and $0 \leq n-k \leq b$, and $P(X=k)=0$ otherwise.
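A direct translation of this PMF, including the support condition, might look as follows. The function name `hgeom_pmf` is hypothetical; the arguments follow the $(w, b, n)$ order used above.

```python
from math import comb

def hgeom_pmf(k, w, b, n):
    """P(X = k) for X ~ HGeom(w, b, n), computed directly from the formula."""
    # Support: 0 <= k <= w and 0 <= n - k <= b
    if k < 0 or k > w or n - k < 0 or n - k > b:
        return 0.0
    return comb(w, k) * comb(b, n - k) / comb(w + b, n)

print(hgeom_pmf(5, 7, 13, 12))                           # approximately 0.2861
print(sum(hgeom_pmf(k, 7, 13, 12) for k in range(13)))   # approximately 1
```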

### Discrete Uniform distribution

Let $C$ be a finite, nonempty set of numbers. Choose one of these numbers uniformly at random (i.e., all values in $C$ are equally likely). Call the chosen number $X$. Then $X$ is said to have the *Discrete Uniform distribution* with parameter $C$; we denote this by $X \sim DUnif(C)$.

The PMF of $X \sim DUnif(C)$ is

\begin{equation}
P(X=x) = \frac{1} {|C|}
\end{equation}

for $x \in C$ (and $0$ otherwise). For $X \sim DUnif(C)$ and any $A \subseteq C$, we have

\begin{equation}
P(X \in A) = \frac{|A|} {|C|}.
\end{equation}
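For a Discrete Uniform r.v., computing probabilities reduces to counting. In the sketch below the sets `C` and `A` are assumptions chosen for illustration ($C = \{1, ..., 100\}$ and $A$ the multiples of 7 in $C$).

```python
from fractions import Fraction

C = set(range(1, 101))             # X ~ DUnif(C), with C = {1, ..., 100}
A = {x for x in C if x % 7 == 0}   # event: X is a multiple of 7

prob = Fraction(len(A), len(C))    # P(X in A) = |A| / |C|
print(prob)                        # 7/50
```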


### Cumulative distribution function

The *cumulative distribution function* (CDF) of an r.v. $X$ is the function $F_X$ given by $F_X(x) = P(X \leq x)$. When there is no risk of ambiguity, we sometimes drop the subscript and just write $F$ (or some other letter) for a CDF.

#### Valid CDFs

Any CDF $F$ has the following properties.

- Increasing: If $x_1 \leq x_2$, then $F(x_1) \leq F(x_2)$.
- Right-continuous: The CDF is continuous except possibly for having some jumps. Wherever there is a jump, the CDF is continuous from the right. That is, for any $a$, we have

\begin{equation}
F(a) = \lim_{x \to a^+} F(x).
\end{equation}

- Convergence to $0$ and $1$ in the limits:

\begin{equation}
\lim_{x \to -\infty} F(x) = 0 \quad \text{and} \quad \lim_{x \to \infty} F(x) = 1.
\end{equation}
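These properties can be observed on a small discrete example. The sketch below assumes a PMF on $\{0, 1, 2\}$ (the number of heads in two fair coin tosses) and builds its CDF by summing the PMF; the names `pmf` and `F` are illustrative.

```python
from fractions import Fraction

# Assumed example PMF: number of heads in two fair coin tosses
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def F(x):
    """CDF: F(x) = P(X <= x), obtained by summing the PMF over support values <= x."""
    return sum(prob for value, prob in pmf.items() if value <= x)

# There is a jump at x = 1: just to the left the CDF is 1/4, at 1 and just to the right it is 3/4
print(F(0.999), F(1), F(1.001))
# Far to the left of the support the CDF is 0, and far to the right it is 1
print(F(-5), F(10))
```

The value at the jump agrees with the values immediately to its right, which is what right-continuity means for a step function.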

### Function of an r.v.

For an experiment with sample space $S$, an r.v. $X$, and a function $g: \mathbb{R} \to \mathbb{R}$, $g(X)$ is the r.v. that maps $s$ to $g(X(s))$ for all $s \in S$.

#### PMF of g(X)

Let $X$ be a discrete r.v. and $g:\mathbb{R} \to \mathbb{R}$. Then the support of $g(X)$ is the set of all $y$ such that $g(x) = y$ for at least one $x$ in the support of $X$, and the PMF of $g(X)$ is

\begin{equation}
P(g(X)=y) = \sum_{x:g(x)=y} P(X=x)
\end{equation}

for all $y$ in the support of $g(X)$.
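The sum over $\{x : g(x) = y\}$ amounts to grouping the probabilities of $X$ by the value of $g$. Here is a minimal sketch, assuming $X$ is uniform on $\{-2, -1, 0, 1, 2\}$ and $g(x) = x^2$ (both chosen only for illustration).

```python
from collections import defaultdict
from fractions import Fraction

# Assumed example: X is uniform on {-2, -1, 0, 1, 2}
pmf_X = {x: Fraction(1, 5) for x in range(-2, 3)}

def g(x):
    return x**2

# P(g(X) = y) is the sum of P(X = x) over all x in the support with g(x) = y
pmf_gX = defaultdict(Fraction)
for x, prob in pmf_X.items():
    pmf_gX[g(x)] += prob

print(dict(pmf_gX))
```

Because $g$ is not one-to-one here, the values $x = -1$ and $x = 1$ both contribute to $P(g(X) = 1) = 2/5$.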

#### Function of two r.v.s

Given an experiment with sample space $S$, if $X$ and $Y$ are r.v.s that map $s \in S$ to $X(s)$ and $Y(s)$ respectively, then $g(X,Y)$ is the r.v. that maps $s$ to $g(X(s),Y(s))$.

### Independence of two r.v.s

Random variables $X$ and $Y$ are said to be *independent* if

\begin{equation}
P(X \leq x, Y \leq y) = P(X \leq x)P(Y \leq y),
\end{equation}

for all $x, y \in \mathbb{R}$. In the discrete case, this is equivalent to the condition

\begin{equation}
P(X=x, Y=y) = P(X=x)P(Y=y)
\end{equation}

for all $x, y$ with $x$ in the support of $X$ and $y$ in the support of $Y$.
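For r.v.s defined on a small finite sample space, the discrete condition can be checked by brute force. The sketch below assumes two fair dice with equally likely outcomes; the helper names `pmf_of` and `independent` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of rolls of two fair dice, all equally likely (assumed)
S = list(product(range(1, 7), repeat=2))
prob = Fraction(1, len(S))

def pmf_of(f):
    """PMF of the r.v. f, where f maps an outcome s to a value."""
    out = {}
    for s in S:
        out[f(s)] = out.get(f(s), 0) + prob
    return out

def independent(f, g):
    """Check P(X=x, Y=y) = P(X=x) P(Y=y) for all x, y in the supports."""
    joint = pmf_of(lambda s: (f(s), g(s)))
    px, py = pmf_of(f), pmf_of(g)
    return all(joint.get((x, y), 0) == px[x] * py[y] for x in px for y in py)

first = lambda s: s[0]          # value of the first die
second = lambda s: s[1]         # value of the second die
total = lambda s: s[0] + s[1]   # sum of the two dice

print(independent(first, second))   # True
print(independent(first, total))    # False
```

The first die and the total are dependent: for instance, a total of 12 is impossible when the first die shows 1, so the joint probability is 0 while the product of the marginals is not.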

#### Independence of many r.v.s

Random variables $X_1, ..., X_n$ are *independent* if

\begin{equation}
P(X_1 \leq x_1, ..., X_n \leq x_n) = P(X_1 \leq x_1)...P(X_n \leq x_n),
\end{equation}

for all $x_1, ..., x_n \in \mathbb{R}$. For infinitely many r.v.s, we say that they are independent if every finite subset of the r.v.s is independent.

#### i.i.d.

Random variables that are independent and have the same distribution are called *independent and identically distributed*, or *i.i.d.* for short.

### Conditional independence of r.v.s

Random variables $X$ and $Y$ are *conditionally independent* given an r.v. $Z$ if for all $x,y \in \mathbb{R}$ and all $z$ in the support of $Z$,

\begin{equation}
P(X \leq x, Y \leq y|Z = z) = P(X \leq x|Z = z)P(Y \leq y|Z = z)
\end{equation}

For discrete r.v.s, an equivalent definition is to require

\begin{equation}
P(X = x, Y = y|Z = z) = P(X = x|Z = z)P(Y = y|Z = z)
\end{equation}

#### Conditional PMF

For any discrete r.v.s $X$ and $Z$, the function $P(X=x|Z=z)$, when considered as a function of $x$ for fixed $z$, is called the *conditional PMF of $X$ given $Z = z$*.
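A conditional PMF can be computed by restricting attention to the outcomes where $Z = z$ and renormalizing. The sketch below assumes two fair dice, with $X$ the first die and $Z$ the total; the function name is hypothetical, and $z$ is assumed to lie in the support of $Z$.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of rolls of two fair dice, all equally likely (assumed)
S = list(product(range(1, 7), repeat=2))

def conditional_pmf_first_given_total(z):
    """Conditional PMF of X (first die) given Z = z (total of both dice)."""
    # Keep only the outcomes consistent with Z = z; they remain equally likely
    matching = [s for s in S if s[0] + s[1] == z]
    pmf = {}
    for s in matching:
        pmf[s[0]] = pmf.get(s[0], 0) + Fraction(1, len(matching))
    return pmf

print(conditional_pmf_first_given_total(4))   # X is uniform on {1, 2, 3} given Z = 4
```

For each fixed $z$ the result is itself a valid PMF in $x$: its values are positive on its support and sum to 1.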



## Python examples

In [8]:
import numpy as np
from scipy.stats import binom, hypergeom
from numpy.random import choice
from numpy.random import permutation

%matplotlib inline
%load_ext autoreload
%autoreload 2



### Binomial distribution

\begin{equation}
P(X=k) = {{n}\choose{k}}p^k(1-p)^{n-k}
\end{equation}

In [7]:
x, n, p = 3, 5, 0.2
print(binom.pmf(x, n, p))   # PMF
print(binom.pmf(np.arange(5), n, p))    # PMF for multiple values
print(binom.cdf(x, n, p))     # CDF
print(binom.rvs(n, p, size=7))    # Generating Binomial r.v.s

0.0512
[ 0.32768  0.4096   0.2048   0.0512   0.0064 ]
0.99328
[2 1 2 2 0 1 1]


### Hypergeometric distribution

\begin{equation}
P(X=k) = \frac{ {{w}\choose{k}} {{b}\choose{n-k}} } { {{w+b}\choose{n}} }
\end{equation}

In [19]:
x, n, w, b = 5, 12, 7, 13
print(hypergeom.pmf(x, w+b, w, n))
print(hypergeom.pmf(np.arange(5), w+b, w, n))
print(hypergeom.cdf(x, w+b, w, n))
print(hypergeom.rvs(w+b, w, n, size=10))

0.286068111455
[  1.03199174e-04   4.33436533e-03   4.76780186e-02   1.98658411e-01
   3.57585139e-01]
0.894427244582
[3 5 5 4 3 5 5 5 5 5]


### Discrete distributions with finite support

Generating realizations of i.i.d. r.v.s $X_1, ..., X_{100}$ whose PMF is

\begin{equation}
P(X_j=0) = 0.25, \\
P(X_j=1) = 0.5, \\
P(X_j=5) = 0.1, \\
P(X_j=10) = 0.15,
\end{equation}

and $P(X_j=x) = 0$ for all other values of $x$.

In [None]:
x = [0, 1, 5, 10]
p = [0.25, 0.5, 0.1, 0.15]

In [None]:
np.random.choice?
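One way to draw the 100 realizations is to sample from the support with the given probabilities. This is a sketch rather than the original notebook's cell: it uses NumPy's `Generator.choice` with an arbitrary seed, and the legacy call `np.random.choice(x, size=100, replace=True, p=p)` should behave analogously.

```python
import numpy as np

x = [0, 1, 5, 10]             # support
p = [0.25, 0.5, 0.1, 0.15]    # corresponding probabilities

rng = np.random.default_rng(seed=0)   # arbitrary seed, for reproducibility
# Sampling with replacement makes the draws independent and identically distributed
sample = rng.choice(x, size=100, replace=True, p=p)

# Empirical frequencies, which should be roughly comparable to p
values, counts = np.unique(sample, return_counts=True)
print(dict(zip(values.tolist(), (counts / sample.size).tolist())))
```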

In [15]:
C = 2
n = 10**5
population = np.arange(C)    # [0, 1] == ['girl', 'boy']
child1 = choice(population, size=n, replace=True)    # the gender of the elder child in each of n families
child2 = choice(population, size=n, replace=True)    # the gender of the younger child in each of n families

n_b = np.sum(child1 == 0)    # N(B): the number of families where the elder child is a girl
n_ab = np.sum(np.all([child1 == 0, child2 == 0], axis=0))    # N(A \cap B): the number of families where both children are girls (so the elder is a girl)
print(n_ab / float(n_b))    # Estimate of P(both girls | elder is a girl)

0.502147128467


- $A$: The event that both children are girls
- $B$: The event that at least one of the children is a girl

\begin{equation}
P(A|B) = \frac{P(A \cap B)} {P(B)} = \frac{1/4} {3/4} = 1/3
\end{equation}

In [16]:
n_b = np.sum(np.any([child1 == 0, child2 == 0], axis=0))    # N(B): the number of families where at least one of the children is a girl
n_ab = np.sum(np.all([child1 == 0, child2 == 0], axis=0))    # N(A \cap B): the number of families where both children are girls (so at least one is a girl)
print(n_ab / float(n_b))    # Estimate of P(A|B)

0.334608081727


### Monty Hall simulation

#### Example: Monty Hall

On the game show Let’s Make a Deal, hosted by Monty Hall, a contestant chooses one of three closed doors, two of which have a goat behind them and one of which has a car. Monty, who knows where the car is, then opens one of the two remaining doors. The door he opens always has a goat behind it (he never reveals the car!). If he has a choice, then he picks a door at random with equal probabilities. Monty then offers the contestant the option of switching to the other unopened door. If the contestant’s goal is to get the car, should she switch doors?

(answer)
Let’s label the doors 1 through 3. Without loss of generality, we can assume the contestant picked door 1 (if she didn’t pick door 1, we could simply relabel the doors, or rewrite this solution with the door numbers permuted). Monty opens a door, revealing a goat. As the contestant decides whether or not to switch to the remaining unopened door, what does she really wish she knew? Naturally, her decision would be a lot easier if she knew where the car was! This suggests that we should condition on the location of the car. Let $C_i$ be the event that the car is behind door $i$, for $i = 1, 2, 3$. By the law of total probability,

\begin{equation}
P(get\ car) = P(get\ car|C_1)\cdot\frac{1}{3} + P(get\ car|C_2)\cdot\frac{1}{3} + P(get\ car|C_3)\cdot\frac{1}{3}
\end{equation}

Suppose the contestant employs the switching strategy. If the car is behind door 1, then switching will fail, so $P(get\ car|C_1) = 0$. If the car is behind door 2 or 3, then because Monty always reveals a goat, the remaining unopened door must contain the car, so switching will succeed. Thus,

\begin{equation}
P(get\ car) = 0\cdot\frac{1}{3} + 1\cdot\frac{1}{3} + 1\cdot\frac{1}{3} = \frac{2}{3}
\end{equation}

so the switching strategy succeeds 2/3 of the time. The contestant should switch doors.

In [20]:
# Assume the contestant always chooses door 0
C = 3
n = 10**5   # Number of trials
population = np.arange(C)   # [0, 1, 2]
cardoor = choice(population, n, replace=True)
print(np.sum(cardoor == 0) / float(n))   # The fraction of times when the never-switch strategy succeeds

0.33497


In [37]:
def monty(simulate=True):
    doors = np.arange(3)   # [0, 1, 2]
    # Randomly pick where the car is
    cardoor = choice(doors)

    if not simulate:
        # Prompt the player and receive the choice of door (should be 0, 1, or 2)
        chosen = int(input("Monty Hall says 'Pick a door, any door!' "))
    else:
        # In simulation mode the contestant always picks door 0
        chosen = 0

    # Pick Monty's door (can't be the player's door or the car door).
    # If the player picked the car, two doors qualify and Monty picks one at random;
    # otherwise exactly one door qualifies.
    montydoor = choice(doors[(doors != chosen) & (doors != cardoor)])

    # The door the player ends up with after switching
    otherdoor = doors[(doors != chosen) & (doors != montydoor)][0]

    if not simulate:
        # Find out whether the player wants to switch doors
        print('Monty opens door {}!'.format(montydoor))
        reply = input('Would you like to switch (y/n)? ')

        # Interpret what the player wrote as 'yes' if it starts with 'y'
        if reply.lower().startswith('y'):
            chosen = otherdoor
    else:
        # In simulation mode the contestant always switches
        chosen = otherdoor

    # Announce the result of the game!
    won = bool(chosen == cardoor)
    if not simulate:
        print('You won!' if won else 'You lost!')
    return won

In [38]:
n = 10**5   # Number of trials
results = [monty(simulate=True) for _ in range(n)]
print(np.mean(results))   # The fraction of times when the always-switch strategy succeeds
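As a cross-check that does not call `monty`, the always-switch strategy can also be simulated in vectorized form. The sketch relies on the observation from the solution above that switching wins exactly when the initial pick is wrong; the seed and variable names are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)    # arbitrary seed, for reproducibility
n = 10**5                              # number of trials
cardoor = rng.integers(0, 3, size=n)   # door hiding the car in each trial
chosen = np.zeros(n, dtype=int)        # the contestant always picks door 0

# Monty always reveals a goat, so switching wins exactly when the first pick was wrong
switch_wins = cardoor != chosen
stay_wins = cardoor == chosen

print(switch_wins.mean())   # should be close to 2/3
print(stay_wins.mean())     # should be close to 1/3
```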