# Chapter 5: Continuous RVs

In [None]:
# The source of the content is freely available online
# https://drive.google.com/file/d/1VmkAAGOYCTORq1wxSQqy255qLJjTNvBI/view
# https://projects.iq.harvard.edu/stat110/

<h4>Definition 5.1.1 (Continuous Random Variables)</h4>

A random variable has a continuous distribution if its cumulative distribution function (CDF) is differentiable. For continuous random variables, the derivative of the CDF is the PDF.

<h4>Definition 5.1.2 (Probability Density Function)</h4>

For a continuous random variable $X$ with CDF $F$, the probability density function (PDF) of $X$ is the derivative of the CDF, given by $f(x)=F'(x)$. The support of $X$, and of its distribution, is the set of all $x$ where $f(x) \gt 0$. To get a desired probability, integrate the PDF over the appropriate range.

<h4>Theorem 5.1.5 (Valid PDFs)</h4>

The PDF of a continuous random variable must satisfy the following two criteria:
- Nonnegative, $f(x) \ge 0$
- Integrates to $1$, $\int_{-\infty}^{\infty} f(x) ~dx = 1$

<h4>Definition 5.1.10 (Expectation of a Continuous Random Variable)</h4>

The expected value of a continuous random variable $X$ with PDF $f$ is:

$E(X) = \int_{-\infty}^{\infty} x ~f(x) ~dx$

<h4>Theorem 5.1.11 (Continuous LOTUS)</h4>

If $X$ is a continuous random variable with PDF $f$ and $g$ is a function from $\mathbb{R}$ to $\mathbb{R}$, then:

$E(g(X)) = \int_{-\infty}^{\infty} g(x) ~f(x) ~dx$

<h4>Definition 5.2.1 (Uniform)</h4>

A continuous random variable $U$ is said to have the Uniform distribution on the interval $(a,b)$ if its PDF is:

$f(x) = \frac{1}{b-a}$ if $a \lt x \lt b$, and $f(x) = 0$ otherwise.
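The validity conditions, the expectation formula, and LOTUS can all be checked numerically for the Uniform PDF. The sketch below is only an illustration: the endpoints `a = 2` and `b = 5`, the helper names, and the midpoint-rule integrator are assumptions made for this example, not part of the source material.

```python
def uniform_pdf(x, a, b):
    """PDF of Unif(a, b): 1/(b-a) inside the interval, 0 outside."""
    return 1.0 / (b - a) if a < x < b else 0.0


def integrate(h, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of h over [lo, hi]."""
    width = (hi - lo) / n
    return sum(h(lo + (i + 0.5) * width) for i in range(n)) * width


a, b = 2.0, 5.0  # illustrative endpoints (an assumption for this example)

# Valid PDF: the density should integrate to 1.
total = integrate(lambda x: uniform_pdf(x, a, b), a, b)

# Expectation: integral of x * f(x); for Unif(a, b) this is (a + b) / 2.
mean = integrate(lambda x: x * uniform_pdf(x, a, b), a, b)

# LOTUS with g(x) = x^2: integral of x^2 * f(x), which is (a^2 + ab + b^2) / 3.
second_moment = integrate(lambda x: x**2 * uniform_pdf(x, a, b), a, b)

print(total, mean, second_moment)
```

Integrating only over $(a,b)$ is enough here because the PDF is $0$ everywhere else.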

<h4>5.3 Universality of the Uniform</h4>

Given a $\text{Unif}(0,1)$ random variable, we can construct a random variable with any continuous distribution. Conversely, given a random variable with any continuous distribution, we can construct a $\text{Unif}(0,1)$ random variable. We call this the universality of the Uniform.

<h4>Theorem 5.3.1 (Universality of the Uniform)</h4>

Let $F$ be a CDF which is a continuous function and strictly increasing on the support of the distribution. This ensures that the inverse function $F^{-1}$ exists, as a function from $(0,1)$ to $\mathbb{R}$.

1. Let $U \sim \text{Unif}(0,1)$ and $X=F^{-1}(U)$. Then $X$ is a random variable with CDF $F$.
2. Let $X$ be a random variable with CDF $F$. Then $F(X) \sim \text{Unif}(0,1)$.

<h4>5.5 Exponential Distribution</h4>

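The Exponential distribution is the continuous counterpart to the Geometric distribution: instead of counting failures before the first success in discrete trials, we wait for a success in continuous time, where successes arrive at a rate of $\lambda$ per unit of time. The average number of successes in a time interval of length $t$ is $\lambda t$, though the actual number of successes varies randomly. An Exponential random variable represents the waiting time until the first arrival of a success.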

<h4>Definition 5.5.1</h4>

A continuous random variable $X$ is said to have the Exponential distribution with parameter $\lambda$, where $\lambda \gt 0$, if its PDF is $f(x) = \lambda e^{-\lambda x}, x \gt 0$.

The corresponding CDF is:

$F(x) = 1 - e^{-\lambda x}, x \gt 0$
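The Exponential CDF $F(x) = 1 - e^{-\lambda x}$ is continuous and strictly increasing on $x \gt 0$, so both parts of the universality of the Uniform apply to it. Solving $u = 1 - e^{-\lambda x}$ for $x$ gives $F^{-1}(u) = -\frac{\ln(1-u)}{\lambda}$. The following is a minimal sketch of that idea; the function names, the rate `lam = 2.0`, the seed, and the sample size are all illustrative assumptions.

```python
import math
import random


def expo_cdf(x, lam):
    """CDF of Expo(lam): 1 - exp(-lam * x) for x > 0, and 0 otherwise."""
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0


def expo_inverse_cdf(u, lam):
    """Inverse of the Expo(lam) CDF, valid for 0 <= u < 1."""
    return -math.log(1.0 - u) / lam


def sample_expo(lam, n, rng):
    """Part 1 of the theorem: plug Unif(0,1) draws into the inverse CDF."""
    return [expo_inverse_cdf(rng.random(), lam) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is repeatable
lam = 2.0               # illustrative rate (an assumption for this example)
xs = sample_expo(lam, 100_000, rng)

# The sample mean should be close to 1 / lam, the mean of Expo(lam).
sample_mean = sum(xs) / len(xs)

# Part 2 of the theorem: F(X) should look Unif(0,1), so roughly a quarter
# of the transformed values should fall below 0.25.
us = [expo_cdf(x, lam) for x in xs]
frac_below_quarter = sum(u < 0.25 for u in us) / len(us)

print(sample_mean, frac_below_quarter)
```

Python's standard library also offers `random.expovariate` for drawing Exponential values directly; the inverse CDF is written out here only to mirror the theorem.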

<h4>Definition 5.5.2 (Memoryless Property)</h4>

A continuous distribution is said to have the memoryless property if a random variable $X$ from that distribution satisfies:

$P(X \ge s+t | X \ge s) = P(X \ge t)$

Here $s$ represents the time you've already spent waiting. The definition says that after you've waited $s$ minutes, the probability that you'll have to wait at least another $t$ minutes is the same as the probability of having to wait at least $t$ minutes with no previous waiting time accumulated.
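As a quick check that the Exponential distribution satisfies this definition, here is a short derivation. It assumes $X \sim Expo(\lambda)$, so that $P(X \ge x) = e^{-\lambda x}$ for $x \gt 0$, which follows from the CDF $F(x) = 1 - e^{-\lambda x}$:

$P(X \ge s+t | X \ge s) = \frac{P(X \ge s+t)}{P(X \ge s)} = \frac{e^{-\lambda (s+t)}}{e^{-\lambda s}} = e^{-\lambda t} = P(X \ge t)$

The first equality uses the fact that the event $X \ge s+t$ is contained in the event $X \ge s$, so their intersection is just $X \ge s+t$.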

<h4>Theorem 5.5.3</h4>

If $X$ is a positive continuous random variable with the memoryless property, then $X$ has an Exponential distribution.

<h4>5.6 Poisson Processes</h4>

The Exponential and Poisson are linked by a common story, which is the story of the Poisson process. A Poisson process is a sequence of arrivals occurring at different points on a timeline, such that the number of arrivals in a particular interval of time has a Poisson distribution.

<h4>Definition 5.6.1 (Poisson Process)</h4>

A process of arrivals in continuous time is called a Poisson process with rate $\lambda$ if the following two conditions hold:

1. The number of arrivals that occur in an interval of length $t$ is a $Pois(\lambda t)$ random variable.
2. The numbers of arrivals that occur in disjoint intervals are independent of each other.

<h4>Example 5.6.5 (Maximum of 3 Independent Exponentials)</h4>

Three students are working independently on their homework. All 3 start at 1pm on a certain day, and each takes an Exponential time with mean 6 hours to complete. What is the earliest time at which all 3 students will have completed their homework, on average?

<i>Answer:</i>

Label the students as $1,2,3$ and let $X_j$ be how long it takes student $j$ to finish the homework. Let $\lambda = \frac{1}{6}$, and let $T$ be the time when all $3$ students will have completed the homework, so $T = \max(X_1, X_2, X_3)$ with $X_j \sim Expo(\lambda)$. By independence, the CDF of $T$ is:

$P(T \le t) = P(X_1 \le t, X_2 \le t, X_3 \le t) = P(X_1 \le t) P(X_2 \le t) P(X_3 \le t) = (1-e^{-\lambda t})^3$

So the PDF of $T$ is:

$f_T(t) = 3 \lambda e^{-\lambda t} (1-e^{-\lambda t})^2$

$T$ is not Exponential.

An approach to finding $E(T)$ is to use the memoryless property and the fact that the minimum of Exponentials is Exponential.

$T = T_1 + T_2 + T_3$

where $T_1 = \min(X_1, X_2, X_3)$ is how long it takes for one student to complete the homework, $T_2$ is the additional time it takes for a second student to complete the homework, and $T_3$ is the additional time it takes for all $3$ students to have completed the homework. Then $T_1 \sim Expo(3 \lambda)$, by the result of Example 5.6.3.

By the memoryless property, at the first time when a student completes the homework, the other two students are starting afresh, so $T_2 \sim Expo(2 \lambda)$. Again by the memoryless property, $T_3 \sim Expo(\lambda)$. The memoryless property also implies that $T_1, T_2, T_3$ are independent. Since an Exponential random variable with rate $\lambda$ has mean $\frac{1}{\lambda}$, linearity of expectation gives:

$E(T) = \frac{1}{3 \lambda} + \frac{1}{2 \lambda} + \frac{1}{\lambda} = 2 + 3 + 6 = 11$

On average, all $3$ students will have completed their homework by midnight, $11$ hours after starting.
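The value $E(T) = 11$ can be sanity-checked by simulation. The sketch below draws three independent Exponential times with mean $6$ and averages their maximum over many trials. The function names, the seed, and the trial count are illustrative choices; `random.expovariate` takes the rate $\lambda$, not the mean.

```python
import random


def mean_finish_time(lam, n_students, n_trials, rng):
    """Monte Carlo estimate of E(max of n_students IID Expo(lam) times)."""
    total = 0.0
    for _ in range(n_trials):
        total += max(rng.expovariate(lam) for _ in range(n_students))
    return total / n_trials


def exact_mean(lam, n_students):
    """Sum of 1/(k * lam) for k = 1..n_students, as in T = T_1 + T_2 + T_3."""
    return sum(1.0 / (k * lam) for k in range(1, n_students + 1))


rng = random.Random(110)  # fixed seed so the run is repeatable
lam = 1 / 6               # rate per hour, so each student's mean time is 6 hours

simulated = mean_finish_time(lam, 3, 200_000, rng)
exact = exact_mean(lam, 3)
print(f"simulated E(T) = {simulated:.2f}, exact E(T) = {exact:.2f}")
```

The simulated average should land close to the exact value, with small run-to-run differences that depend on the seed and the number of trials.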

<h4>Example 5.6.6 (Machine Repair)</h4>

A certain machine often breaks down and needs to be fixed. At time $0$, the machine is working. It works for an $Expo(\lambda)$ period of time (in days) and then breaks down. It then takes an $Expo(\lambda)$ amount of time to get fixed, after which it works for an $Expo(\lambda)$ time before breaking down again, etc. The $Expo(\lambda)$ random variables are IID.

<b>Part A:</b>

A transition occurs when the machine switches from working to being broken, or switches from being broken to working. Find the distribution of the number of transitions that occur in the time interval $(0,t)$.

<i>Answer:</i>

The times between transitions are IID $Expo(\lambda)$, so the times at which transitions occur follow a Poisson process of rate $\lambda$. So the desired distribution is $Pois(\lambda t)$.
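To see the link between Exponential gaps and Poisson counts in action, one can simulate the transition times and count how many fall in $(0,t)$. If the count is $Pois(\lambda t)$, its average should be about $\lambda t$ and the fraction of runs with no transitions should be about $e^{-\lambda t}$. This is a rough sketch; the values `lam = 0.5` and `t = 4.0`, the seed, and the function name are assumptions made for the example.

```python
import math
import random


def count_transitions(lam, t, rng):
    """Count transitions in (0, t) when gaps between them are IID Expo(lam)."""
    clock = rng.expovariate(lam)  # time of the first transition
    count = 0
    while clock < t:
        count += 1
        clock += rng.expovariate(lam)  # wait for the next transition
    return count


rng = random.Random(56)  # fixed seed so the run is repeatable
lam, t = 0.5, 4.0        # illustrative rate (per day) and window length (days)
n_trials = 100_000

counts = [count_transitions(lam, t, rng) for _ in range(n_trials)]
mean_count = sum(counts) / n_trials
frac_zero = sum(c == 0 for c in counts) / n_trials

print(f"mean count = {mean_count:.3f}, lam * t = {lam * t:.3f}")
print(f"P(no transitions) = {frac_zero:.3f}, exp(-lam * t) = {math.exp(-lam * t):.3f}")
```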


<b>Part B:</b>

The machine is redesigned so that it can continue to function even if one component has failed. The redesigned machine has $5$ components, each of which works for an $Expo(\lambda)$ amount of time and then fails, independently. The machine works properly only if at most one component has failed. Find the expected time until the machine breaks down.

<i>Answer:</i>

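The time until the first component fails is $Expo(5 \lambda)$, since it is the minimum of $5$ independent $Expo(\lambda)$ random variables. Then, by the memoryless property, the additional time until one of the remaining $4$ components fails is $Expo(4 \lambda)$. The machine breaks down at the second failure, so the expected time until the machine breaks down is: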

$\frac{1}{5 \lambda} + \frac{1}{4 \lambda} = \frac{9}{20 \lambda}$
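A simulation offers a quick check of $\frac{9}{20 \lambda}$: the machine breaks down at the second component failure, which is the second-smallest of $5$ independent $Expo(\lambda)$ lifetimes. The sketch below uses `lam = 1.0` purely for illustration; the function name, seed, and trial count are likewise assumptions.

```python
import random


def mean_breakdown_time(lam, n_components, n_trials, rng):
    """Monte Carlo estimate of the expected time of the second failure."""
    total = 0.0
    for _ in range(n_trials):
        lifetimes = sorted(rng.expovariate(lam) for _ in range(n_components))
        total += lifetimes[1]  # index 1 is the second component to fail
    return total / n_trials


rng = random.Random(566)  # fixed seed so the run is repeatable
lam = 1.0                 # illustrative failure rate (an assumption)

simulated = mean_breakdown_time(lam, 5, 100_000, rng)
exact = 9 / (20 * lam)
print(f"simulated = {simulated:.3f}, 9 / (20 * lam) = {exact:.3f}")
```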

# Exercises

<h3>Exponential Exercises</h3>

<h4>Exercise 38</h4>

A post office has $2$ clerks. Alice enters the post office while $2$ other customers, Bob and Claire, are being served by the $2$ clerks. She is next in line. Assume the time a clerk spends serving a customer has the $Expo(\lambda)$ distribution.

<b>Part A:</b>

What is the probability that Alice is the last of the $3$ customers to be done being served?

<i>Answer:</i>

Alice begins to be served when either Bob or Claire leaves. By the memoryless property, the additional time needed to serve whichever of Bob or Claire is still there is $Expo(\lambda)$. The time it takes to serve Alice is also $Expo(\lambda)$, so by symmetry, the probability is $\frac{1}{2}$ that Alice is the last to be done being served.

<b>Part B:</b>

What is the expected total time that Alice needs to spend at the post office?

<i>Answer:</i>

The expected time spent waiting in line is $\frac{1}{2 \lambda}$, since the minimum of two independent $Expo(\lambda)$ random variables is $Expo(2 \lambda)$. The expected time spent being served is $\frac{1}{\lambda}$. So the expected total time is:

$\frac{1}{2 \lambda} + \frac{1}{\lambda} = \frac{3}{2 \lambda}$
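Both parts of this exercise can be checked with one simulation. The sketch below assumes the three service times are independent and, using the memoryless property, treats Bob's and Claire's remaining service times as fresh $Expo(\lambda)$ draws from the moment Alice arrives. The rate `lam = 1.0`, the seed, and the function names are illustrative assumptions.

```python
import random


def simulate_visit(lam, rng):
    """Return (alice_is_last, alice_total_time) for one simulated visit."""
    bob = rng.expovariate(lam)            # Bob's remaining service time
    claire = rng.expovariate(lam)         # Claire's remaining service time
    alice_service = rng.expovariate(lam)  # Alice's own service time
    alice_start = min(bob, claire)        # Alice waits for the first clerk to free up
    alice_done = alice_start + alice_service
    return alice_done > max(bob, claire), alice_done


def estimate(lam, n_trials, rng):
    """Estimate P(Alice is last) and E(total time Alice spends)."""
    last_count = 0
    total_time = 0.0
    for _ in range(n_trials):
        is_last, time_spent = simulate_visit(lam, rng)
        last_count += is_last
        total_time += time_spent
    return last_count / n_trials, total_time / n_trials


rng = random.Random(38)  # fixed seed so the run is repeatable
lam = 1.0                # illustrative service rate (an assumption)
p_last, mean_time = estimate(lam, 100_000, rng)
print(f"P(Alice last) = {p_last:.3f} (theory: 0.5)")
print(f"E(total time) = {mean_time:.3f} (theory: {3 / (2 * lam):.3f})")
```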

<h4>Exercise 41</h4>

Fred wants to sell his car to the first person to offer at least $\$15,000$. Assume the offers are independent Exponential random variables with mean $\$10,000$. Find the expected number of offers Fred will have.

<i>Answer:</i>

The offers on the car are IID $X_i \sim Expo \left( \frac{1}{10^4} \right)$. So the number of offers that are too low is $Geom(p)$ with $p = P(X_i \ge 15{,}000) = e^{-1.5}$. Including the successful offer, the expected number of offers is thus $\frac{1-p}{p} + 1 = \frac{1}{p} = e^{1.5} \approx 4.48$.
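The answer $e^{1.5}$ can be checked by simulating the stream of offers directly: keep drawing Exponential offers with mean $10{,}000$ until one reaches $15{,}000$, and count how many were needed. This is a minimal sketch; the function name, seed, and trial count are assumptions chosen for illustration.

```python
import math
import random


def offers_until_sale(mean_offer, threshold, rng):
    """Number of offers Fred sees, including the first one >= threshold."""
    count = 1
    # expovariate takes the rate, which is 1 / mean.
    while rng.expovariate(1.0 / mean_offer) < threshold:
        count += 1  # offer was too low, so Fred waits for another
    return count


rng = random.Random(41)  # fixed seed so the run is repeatable
n_trials = 100_000
total = sum(offers_until_sale(10_000, 15_000, rng) for _ in range(n_trials))

print(f"simulated mean = {total / n_trials:.3f}, exp(1.5) = {math.exp(1.5):.3f}")
```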

<h3>Mixed Practice</h3>

<h4>Exercise 55</h4>

Consider an experiment where we observe the value of a random variable $X$, and estimate the value of an unknown constant $\theta$ using some variable $T = g(X)$ that is a function of $X$. The random variable $T$ is called an estimator. Think of $X$ as the data observed in the experiment, and $\theta$ as an unknown parameter related to the distribution of $X$.

The bias of an estimator $T$ for $\theta$ is defined as $b(T) = E(T) - \theta$. The mean squared error (MSE) is the average squared error when using $T$ to estimate $\theta$.

$MSE(T) = E(T-\theta)^2$

Show that $MSE(T) = Var(T) + (b(T))^2$

This implies that for fixed MSE, lower bias can only be attained at the cost of higher variance and vice versa. This is a form of the bias-variance trade-off.

<i>Answer:</i>

Using the fact that adding a constant does not affect variance, and that $E(T - \theta) = E(T) - \theta = b(T)$, we have:

$Var(T) = Var(T - \theta) = E(T - \theta)^2 - (E(T - \theta))^2 = MSE(T) - (b(T))^2$

Rearranging gives $MSE(T) = Var(T) + (b(T))^2$.
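The decomposition can also be seen numerically. The same algebra holds for sample averages, so for any list of estimates the empirical MSE equals the empirical variance (dividing by $n$, not $n-1$) plus the squared empirical bias, up to floating-point rounding. The setup below is purely hypothetical: it takes $X \sim \text{Unif}(0, \theta)$ with $\theta = 3$ and uses the deliberately biased estimator $T = X$.

```python
import random


def mse_decomposition(estimates, theta):
    """Return empirical (mse, variance, bias) of estimates of theta."""
    n = len(estimates)
    mean_t = sum(estimates) / n
    variance = sum((t - mean_t) ** 2 for t in estimates) / n  # divide by n
    bias = mean_t - theta
    mse = sum((t - theta) ** 2 for t in estimates) / n
    return mse, variance, bias


rng = random.Random(55)  # fixed seed so the run is repeatable
theta = 3.0              # hypothetical true parameter

# Hypothetical estimator T = X with X ~ Unif(0, theta); its bias is -theta / 2.
estimates = [rng.uniform(0.0, theta) for _ in range(50_000)]

mse, variance, bias = mse_decomposition(estimates, theta)
print(f"MSE = {mse:.4f}")
print(f"Var + bias^2 = {variance + bias**2:.4f}")
```

The two printed values should agree to within rounding error, whatever the seed.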