$
\newcommand{\real}[1]{\mathbb{#1}}
\newcommand{\expect}{\mathrm{E}}
\newcommand{\prob}{\mathrm{P}}
\newcommand{\v}{\mathrm{var}}
\newcommand{\cov}{\mathrm{cov}}
\newcommand{\Comb}[2]{{}^{#1}C_{#2}}
\newcommand{\notimplies}{\;\not\!\!\!\implies}
$

# Further Topics on Random Variables

## Derived Distributions 
*Lecture 11*

Here we learn how to derive the distribution of a function of a random variable, $Y = g(X)$, given the distribution of $X$ itself. We do this for discrete and continuous variables separately and then devise a common approach for both using CDFs. Note that the expected value of a function of a random variable can be found directly, without deriving its distribution, using the formula $$\expect[g(X)] = \int_{-\infty}^{+\infty}g(x)f_X(x)\,dx$$ The derivations below are needed only when we want the full distribution of $Y$. Finally, we learn how to find the distribution of $Z = g(X,Y)$ given the joint distribution of $X$ and $Y$.   

**Discrete Random Variable**  
Suppose $Y = 2X + 3$.
$$\begin{align}
p_Y(y) &= \prob(Y=y) \\
&= \prob(2X+3 = y) \\
&= \prob\left(X = \frac{y-3}{2}\right) \\
&= p_X\left(\frac{y-3}{2}\right)
\end{align}$$  

Thus, in general, for $a \neq 0$,
$$Y = aX+b : \; \; \; p_Y(y) = p_X\left(\frac{y-b}{a}\right)$$  
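
The relabelling above can be sketched in a few lines of Python. This is only an illustration: the helper name `linear_pmf` and the fair four-sided die used for $X$ are assumptions made for the example, not part of the lecture.

```python
from fractions import Fraction

def linear_pmf(p_x, a, b):
    """Return the PMF of Y = a*X + b, given the PMF of X as a dict {x: probability}."""
    if a == 0:
        raise ValueError("a must be non-zero so that x -> a*x + b is one-to-one")
    # Each value x is relabelled as y = a*x + b and keeps its probability,
    # which is the same as saying p_Y(y) = p_X((y - b) / a).
    return {a * x + b: p for x, p in p_x.items()}

# Hypothetical example: X is a fair four-sided die.
p_x = {x: Fraction(1, 4) for x in (1, 2, 3, 4)}
p_y = linear_pmf(p_x, a=2, b=3)

for y in sorted(p_y):
    print(y, p_y[y])
```

Exact fractions are used so that the probabilities can be compared without floating-point noise.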

**Continuous Random Variable**  
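Here $F$ denotes the CDF. Suppose $Y = aX + b$ with $a > 0$.
$$\begin{align}
F_Y(y) &= \prob(Y \leq y) \\
&= \prob(aX+b \leq y) \\
&= \prob\left(X \leq \frac{y-b}{a}\right) \\
&= F_X\left(\frac{y-b}{a}\right)
\end{align}$$   

For $a < 0$ the inequality flips when dividing by $a$, giving $F_Y(y) = 1 - F_X\left(\frac{y-b}{a}\right)$. Differentiating either expression with respect to $y$ gives, for any $a \neq 0$, $$f_Y(y) = \frac{1}{|a|}f_X\left(\frac{y-b}{a}\right)$$  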

This makes intuitive sense. The PDF graph of $Y=aX+b$ is the same as that of $X$, but stretched horizontally by a factor of $|a|$ (and reflected if $a < 0$) and translated horizontally by $b$. Stretching a graph horizontally by $|a|$ also scales the area under it by $|a|$. Therefore, in order to keep the area under the graph equal to $1$, we need to divide by $|a|$.  

We can use the above described technique (using CDF) to derive the distribution of any general function $g(X)$ (not necessarily linear). The two step procedure is as follows:
* Find the CDF of $Y$: $$F_Y(y) = \prob(Y \leq y) = \prob(g(X) \leq y)$$  
* Differentiate: $$f_Y(y) = \frac{dF_Y}{dy}(y)$$  
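
As a concrete instance of the two steps above, the sketch below assumes $X \sim \mathrm{Uniform}(0,1)$ and $g(x) = x^2$ (both chosen purely for illustration). Step 1 gives $F_Y(y) = \sqrt{y}$ on $[0,1]$; step 2 is done numerically with a central difference and compared against the closed form $\frac{1}{2\sqrt{y}}$.

```python
import math

def cdf_x(x):
    """CDF of X ~ Uniform(0, 1) (assumed distribution for this example)."""
    return min(max(x, 0.0), 1.0)

def cdf_y(y):
    """Step 1: F_Y(y) = P(X^2 <= y) = P(X <= sqrt(y)), since X is non-negative."""
    if y <= 0:
        return 0.0
    return cdf_x(math.sqrt(y))

def pdf_y(y, h=1e-6):
    """Step 2: differentiate the CDF, here approximated by a central difference."""
    return (cdf_y(y + h) - cdf_y(y - h)) / (2 * h)

for y in (0.1, 0.25, 0.5, 0.9):
    exact = 1 / (2 * math.sqrt(y))  # derivative of sqrt(y)
    print(f"y={y}: numeric={pdf_y(y):.4f}, exact={exact:.4f}")
```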

Now we derive a general formula for the PDF of $Y = g(X)$ when $g$ is strictly monotonic. Assume $g$ is strictly increasing and differentiable. Let $h$ be the inverse function of $g$. Then,  
$$
\begin{align}
F_Y(y) &= \prob(Y \leq y)\\
&= \prob(X \leq h(y)) \; \; \; \textrm{, because of monotonicity} \\
&= F_X(h(y)) \\
\therefore f_Y(y) &= f_X(h(y)) |\frac{dh}{dy}(y)|
\end{align}
$$  
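
The formula can be sanity-checked numerically. The sketch below assumes $X \sim \mathrm{Exponential}(1)$ and $g(x) = e^x$ (illustrative choices, not from the lecture), so that $h(y) = \ln y$ and $f_Y(y) = 1/y^2$ for $y \geq 1$. Integrating $f_Y$ from $1$ to $c$ should reproduce $F_X(h(c))$.

```python
import math

def pdf_x(x):
    """PDF of X ~ Exponential(1) (assumed distribution for this example)."""
    return math.exp(-x) if x >= 0 else 0.0

def cdf_x(x):
    """CDF of X ~ Exponential(1)."""
    return 1 - math.exp(-x) if x >= 0 else 0.0

def h(y):
    """Inverse of the strictly increasing function g(x) = exp(x)."""
    return math.log(y)

def dh_dy(y):
    """Derivative of the inverse function."""
    return 1 / y

def pdf_y(y):
    """f_Y(y) = f_X(h(y)) * |dh/dy(y)|; zero outside the range of g."""
    return pdf_x(h(y)) * abs(dh_dy(y)) if y > 0 else 0.0

def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

c = 5.0
area = integrate(pdf_y, 1.0, c)  # P(Y <= c), because Y = exp(X) >= 1
print(area, cdf_x(h(c)))         # the two values should agree closely
```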

The slides also give an intuitive explanation for the above formula. Refer to Lec11 Slide12 for more details.  

So far we have considered the case where $Y$ is a function of $X$ (linear, monotonic, or non-monotonic). A different case arises when we have two random variables $X$ and $Y$ that are not functionally related; an extreme example is when $X$ and $Y$ are independent. How do we calculate the PDF of $Z = g(X,Y)$ in that case? The method remains the same: we use the joint distribution of $X$ and $Y$ to find the CDF of $Z$, and then differentiate. See Lec 11.9 for an example.  

## Sum of Independent R.V.s; Covariance and Correlation 
*Lecture 12*

Here we see how to calculate the PMF/PDF of $X + Y$ when $X$ and $Y$ are independent: in the discrete case, in the continuous case, and when $X$ and $Y$ are independent normals (in which case $X+Y$ is also normal). These are all special cases of the $Z=g(X,Y)$ scenario studied in Derived Distributions. Finally, we study covariance and correlation, which play an important role in predicting the value of one random variable given another (as in machine learning).  

**The Discrete Case**  
The PMF of $Z = X + Y$, when $X$ and $Y$ are independent and discrete, is given by:
$$p_Z(z) = \sum_x\prob(X=x,Y=z-x) = \sum_x p_X(x)\,p_Y(z-x)$$

The above operation, which yields the distribution of the sum of two independent random variables from their individual distributions, is known as convolution.  
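
A minimal sketch of the discrete convolution, using the sum of two fair six-sided dice as an assumed example. The function name `convolve_pmfs` is introduced here for illustration only.

```python
from collections import defaultdict
from fractions import Fraction

def convolve_pmfs(p_x, p_y):
    """PMF of Z = X + Y for independent X and Y, given as dicts {value: probability}."""
    p_z = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for x, px in p_x.items():
        for y, py in p_y.items():
            # Independence: P(X = x, Y = y) = p_X(x) * p_Y(y)
            p_z[x + y] += px * py
    return dict(p_z)

# Hypothetical example: two fair six-sided dice.
die = {k: Fraction(1, 6) for k in range(1, 7)}
p_z = convolve_pmfs(die, die)

for z in sorted(p_z):
    print(z, p_z[z])
```

Looping over all $(x, y)$ pairs and accumulating on $x + y$ is equivalent to summing $p_X(x)\,p_Y(z-x)$ over $x$ for each fixed $z$.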

**The Continuous Case**  
For continuous independent random variables $X$ and $Y$, the PDF of $Z = X + Y$ is given by the convolution of their PDFs:  
$$ f_Z(z) = \int_{-\infty}^{+\infty}f_X(x)f_Y(z-x)\,dx $$  
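
The integral can be approximated numerically. The sketch below assumes $X$ and $Y$ are both $\mathrm{Uniform}(0,1)$ (an illustrative choice), for which the exact answer is the triangular density $f_Z(z) = z$ on $[0,1]$ and $2 - z$ on $[1,2]$. The integration range and step count are arbitrary assumptions.

```python
def pdf_uniform(x):
    """PDF of a Uniform(0, 1) random variable."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def convolve_pdfs(f_x, f_y, z, lo=-1.0, hi=3.0, steps=40_000):
    """Approximate f_Z(z) = integral of f_X(x) * f_Y(z - x) dx with the midpoint rule.

    The range [lo, hi] is assumed to cover the support of f_X.
    """
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        total += f_x(x) * f_y(z - x)
    return total * width

for z in (0.5, 1.0, 1.5):
    exact = z if z <= 1.0 else 2.0 - z  # triangular density
    approx = convolve_pdfs(pdf_uniform, pdf_uniform, z)
    print(f"z={z}: numeric={approx:.4f}, exact={exact:.4f}")
```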


Using the Convolution Operation, we can see that if $X$ and $Y$ are independent normal variables, then $X+Y$ is also a normal variable with mean and variance that can be derived using the Linearity Rule of Expected Values and Variance (for independent RVs). In general, the sum of finitely many independent normals is normal.
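
Written out explicitly, with $\mu$ and $\sigma^2$ denoting mean and variance (notation assumed here for illustration), the result for two independent normals reads:
$$X \sim N(\mu_X, \sigma_X^2),\; Y \sim N(\mu_Y, \sigma_Y^2) \implies X + Y \sim N(\mu_X + \mu_Y,\; \sigma_X^2 + \sigma_Y^2)$$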

### Covariance

Covariance basically tells us if two variables $X$ and $Y$ move in the same or different directions from their respective means. Positive Covariance indicates that $X$ and $Y$ go above/below their respective means simultaneously. Negative Covariance indicates that when $X$ goes above its mean, $Y$ goes below, and vice-versa.

$$\cov(X,Y) = \expect[(X - \expect[X])(Y - \expect[Y])]$$  

If $X$ and $Y$ are zero-mean random variables, then the covariance is given by $\expect[XY]$. If $X$ and $Y$ increase and decrease in tandem, then the points of the $X$ vs $Y$ graph will lie mostly in the first and third quadrants and the covariance will be positive. If $Y$ decreases when $X$ increases and vice-versa, then the points will lie mostly in the second and fourth quadrants and the covariance will be negative. If the two random variables are independent, then the covariance becomes $$\expect[XY] = \expect[X]\expect[Y] = 0\cdot 0 = 0$$ 

The covariance of independent variables is always zero, but variables whose covariance is zero are not always independent. In the figure below, it is evident that the covariance is zero, as $\expect[X]$, $\expect[Y]$ and $\expect[XY]$ are all zero. However, the two variables are not independent. Knowing that $X=1$ tells us that $Y=0$.
<img src="images/cov_independence.png">  

**Properties**
* $\cov(X,X) = \v(X)$
* $\cov(X,Y) = \expect[XY] - \expect[X]\expect[Y]$
* $\cov(aX+b, Y) = a\,\cov(X,Y)$
* For any random variables $X_1$ and $X_2$ (not necessarily independent), $\v(X_1 + X_2) = \v(X_1) + \v(X_2) + 2\cov(X_1, X_2)$  
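
The sketch below checks two of these facts on a small joint PMF: zero covariance without independence, and the variance-of-a-sum formula. The four equally likely points $(1,0), (0,1), (-1,0), (0,-1)$ are an assumption made for this example (they are consistent with "$X=1$ tells us that $Y=0$", but may not match the figure exactly).

```python
from fractions import Fraction

# Assumed joint PMF: four equally likely points on the axes.
quarter = Fraction(1, 4)
joint = {(1, 0): quarter, (0, 1): quarter, (-1, 0): quarter, (0, -1): quarter}

def expect(g):
    """E[g(X, Y)] under the joint PMF."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

def variance(g):
    """var(g(X, Y)) under the joint PMF."""
    mean = expect(g)
    return expect(lambda x, y: (g(x, y) - mean) ** 2)

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
cov_xy = expect(lambda x, y: (x - mean_x) * (y - mean_y))

# Independence would require P(X=1, Y=0) = P(X=1) * P(Y=0).
p_x1 = sum(p for (x, y), p in joint.items() if x == 1)
p_y0 = sum(p for (x, y), p in joint.items() if y == 0)
p_x1_y0 = joint[(1, 0)]

var_sum = variance(lambda x, y: x + y)
var_formula = variance(lambda x, y: x) + variance(lambda x, y: y) + 2 * cov_xy

print("cov(X, Y) =", cov_xy)
print("P(X=1, Y=0) =", p_x1_y0, "vs P(X=1)P(Y=0) =", p_x1 * p_y0)
print("var(X + Y) =", var_sum, "formula gives", var_formula)
```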

### Correlation  

Correlation is the dimensionless version of covariance. The sign of the covariance shows whether two variables move away from their respective means in the same or in opposite directions, but its magnitude is hard to interpret because it depends on the units of $X$ and $Y$. Correlation is defined as an alternative:  
$$
\rho(X,Y) = \expect[\frac{X-\expect[X]}{\sigma_X}\cdot\frac{Y-\expect[Y]}{\sigma_Y}] = \frac{\cov(X,Y)}{\sigma_X\sigma_Y}
$$  

**Properties**
* $\rho(X,X) = \frac{\v(X)}{\sigma_X^2} = 1$   
* Independent RVs $\implies \rho=0$. But, $\rho=0 \notimplies $ Independence.
* $|\rho| = 1 \Leftrightarrow (X-\expect[X]) = c(Y-\expect[Y])$ for some constant $c \neq 0$  
* For $a \neq 0$: $\cov(aX+b, Y) = a\cdot\cov(X,Y) \implies \rho(aX+b, Y) = \frac{a\,\cov(X, Y)}{|a| \sigma_X \sigma_Y} = \mathrm{sign}(a)\,\rho(X,Y)$   

Thus, unlike covariance, the magnitude of the correlation is not affected by linear stretching or contraction of the variables; only its sign flips when $a < 0$. This is an important property that makes correlation meaningful. Notice that the correlation coefficient is the expected value of the product of the z-scores of $X$ and $Y$.  
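
The effect of a linear transformation $aX + b$ on covariance and correlation can be seen on a toy data set. The four equally likely $(x, y)$ pairs below are hypothetical values chosen for illustration.

```python
import math

# Hypothetical data: four equally likely (x, y) pairs.
pairs = [(1, 2), (2, 3), (3, 5), (4, 4)]
xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]

def mean(values):
    return sum(values) / len(values)

def covariance(us, vs):
    """cov(U, V) = E[(U - E[U])(V - E[V])] for equally likely pairs."""
    mu, mv = mean(us), mean(vs)
    return mean([(u - mu) * (v - mv) for u, v in zip(us, vs)])

def correlation(us, vs):
    """rho(U, V) = cov(U, V) / (sigma_U * sigma_V)."""
    return covariance(us, vs) / math.sqrt(covariance(us, us) * covariance(vs, vs))

rho = correlation(xs, ys)
rho_stretched = correlation([3 * x + 7 for x in xs], ys)   # a = 3 > 0
rho_flipped = correlation([-2 * x + 1 for x in xs], ys)    # a = -2 < 0
cov_stretched = covariance([3 * x + 7 for x in xs], ys)

print("rho(X, Y)       =", rho)
print("rho(3X + 7, Y)  =", rho_stretched)
print("rho(-2X + 1, Y) =", rho_flipped)
print("cov(X, Y) =", covariance(xs, ys), " cov(3X + 7, Y) =", cov_stretched)
```

The covariance scales with $a$, while the correlation only picks up the sign of $a$.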

Lec 12.9 proves that this quantity always lies between $-1$ and $+1$. The gist is as follows:  
If $X$ and $Y$ have zero means and unit variances, $\rho(X,Y) = \expect[XY]$. Then  
$$
\begin{align}
0 \leq \expect[(X-\rho Y)^2] &= \expect[X^2] - 2 \rho \expect[XY] + \rho^2\expect[Y^2]\\
&= 1 - 2 \rho^2 + \rho^2\\
&= 1 -\rho^2
\end{align}
$$  
Thus, $\rho^2 \leq1$.  

Also, notice that if $\rho = \pm1$, then $\expect[(X-\rho Y)^2] = 0$, so $X = \rho Y$ (with probability 1), which means $X$ is a linear function of $Y$. All this can also be proved when $X$ and $Y$ don't have unit variance and zero mean, with slightly more involved calculations.  

Refer to Lec 12.10 and 12.11 for intuition behind correlation and its practical use.  



## Conditional Expectation as a RV; Sum of a Random Number of Independent R.V.s 
*Lecture 13*

We will study how conditional expectations and conditional variances can be treated as random variables. $\expect[X \mid Y]$ is a function of $Y$ and thus is itself a random variable. We then use that knowledge to calculate the expected value and variance of the sum of a random number of random variables.  

**Conditional Expectation**  

Let $g(Y)$ be a random variable that takes the value $\expect[X \mid Y=y]$, if $Y$ happens to take the value $y$.
$$
\begin{align}
g(y) &= \expect[X \mid Y=y] \\
g(Y) &= \expect[X \mid Y] \\
\expect[g(Y)] &= \expect[\expect[X \mid Y]] \\
\end{align}
$$  

The mean of $\expect[X \mid Y]$ is given by the Law of Iterated Expectations, which says:
$$ \expect[\expect[X \mid Y]] = \expect[X]$$  

Lec 13.4 uses Law of Iterated Expectations to solve the stick breaking problem.  

**Conditional Variance**  

$\v(X \mid Y)$ is the random variable that takes the value $\v(X \mid Y=y)$, when $Y= y$.  

Law of Total Variance says:  
$$\v(X) = \expect[\v(X \mid Y)] + \v(\expect[X \mid Y])$$  

In order to get an intuitive understanding of the Law of Iterated Expectations and the Law of Total Variance, we consider the example problem of **Section Means and Variances**. We divide the students (sample space) into multiple sections. The experiment is to pick a student at random. Random variable $X$ denotes the marks of that student and random variable $Y$ denotes the section of that student. To find $\expect[X]$ and $\v(X)$, we use the Law of Iterated Expectations and the Law of Total Variance. We also find an alternate way of looking at the Law of Total Variance:  

<center>
    $\v(X) =$ (average variability <b>within</b> each section) $+$ (variability <b>between</b> sections)
</center>  
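
The section example can be reproduced with a few lines of Python. The section names and marks below are hypothetical numbers chosen for illustration; the sketch computes $\expect[X]$ and $\v(X)$ both directly and through the two laws.

```python
from fractions import Fraction

# Hypothetical marks, grouped by section (Y = section, X = marks).
sections = {"A": [60, 70, 80], "B": [90, 100]}

def mean(values):
    return Fraction(sum(values), len(values))

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

total_students = sum(len(marks) for marks in sections.values())
# P(Y = s): a uniformly chosen student lands in section s.
p_section = {s: Fraction(len(marks), total_students) for s, marks in sections.items()}

# Law of Iterated Expectations: E[X] = E[E[X | Y]]
mean_x = sum(p_section[s] * mean(marks) for s, marks in sections.items())

# Law of Total Variance: var(X) = E[var(X | Y)] + var(E[X | Y])
within = sum(p_section[s] * variance(marks) for s, marks in sections.items())
between = sum(p_section[s] * (mean(marks) - mean_x) ** 2 for s, marks in sections.items())

# Direct computation over all students, for comparison.
all_marks = [m for marks in sections.values() for m in marks]

print("E[X]:", mean_x, "direct:", mean(all_marks))
print("var(X):", within + between, "direct:", variance(all_marks))
print("within:", within, "between:", between)
```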

### Sum of random number of Independent Random Variables  

Here we learn how to find the mean and variance of the sum of a random number of independent random variables. Let $N$ be a random variable denoting the number of stores visited, and let $X_1, X_2, \dots, X_N$ be the money spent at each store. The $X_i$ are independent, identically distributed random variables, and they are also independent of $N$. We want to characterize the random variable $Y$ such that $$Y = \sum_{i=1}^{N}X_i$$

$$
\begin{align}
\expect[Y \mid N=n] &= \expect[X_1 + \dots + X_N \mid N = n] \\
&= \expect[X_1 + \dots + X_n \mid N = n] \\
&= \expect[X_1 + \dots + X_n] \\
&= n\expect[X]
\end{align}
$$  

Now, the Total Expectation Theorem says that:  
$$ \expect[Y] = \sum_n p_N(n)\expect[Y \mid N=n] =  \sum_n p_N(n) n \expect[X] = \expect[N]\expect[X]$$  

This can also be derived using Law of Iterated Expectations:  
$$\expect[Y] = \expect[\expect[Y \mid N]] = \expect[N\expect[X]] = \expect[N]\expect[X]$$  
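
The identity $\expect[Y] = \expect[N]\expect[X]$ can be verified exactly on a small case by building the full PMF of $Y$. The PMFs of $N$ and $X$ below are hypothetical, and `pmf_of_random_sum` is a helper name introduced for this sketch.

```python
from fractions import Fraction
from itertools import product

# Hypothetical PMFs: number of stores visited and money spent per store.
p_n = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
p_x = {10: Fraction(1, 2), 20: Fraction(1, 2)}

def mean(pmf):
    return sum(value * p for value, p in pmf.items())

def pmf_of_random_sum(p_n, p_x):
    """PMF of Y = X_1 + ... + X_N, with the X_i i.i.d. and independent of N."""
    p_y = {}
    for n, pn in p_n.items():
        # Enumerate every outcome (x_1, ..., x_n); n = 0 yields the empty sum.
        for xs in product(p_x, repeat=n):
            prob = pn
            for x in xs:
                prob *= p_x[x]
            y = sum(xs)
            p_y[y] = p_y.get(y, 0) + prob
    return p_y

p_y = pmf_of_random_sum(p_n, p_x)

print("E[Y]       =", mean(p_y))
print("E[N] E[X]  =", mean(p_n) * mean(p_x))
```

Brute-force enumeration is only practical for tiny supports, but it makes the conditioning on $N = n$ explicit.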

The variance of $Y$ can be derived using the Law of Total Variance. The derivation is simple and mundane.
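
For reference, a sketch of that derivation, writing $\v(X)$ for the common variance of the $X_i$. Conditioned on $N = n$, $Y$ is a sum of $n$ independent terms, so:
$$
\begin{align}
\v(Y \mid N=n) &= n\,\v(X) \implies \expect[\v(Y \mid N)] = \expect[N]\,\v(X) \\
\v(\expect[Y \mid N]) &= \v(N\expect[X]) = (\expect[X])^2\,\v(N) \\
\therefore \v(Y) &= \expect[N]\,\v(X) + (\expect[X])^2\,\v(N)
\end{align}
$$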