In [1]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
import seaborn as sns; sns.set() 
from sklearn.model_selection import train_test_split

## Joint Distributions

Goal: How do we build the probability mass function (PMF, discrete case) or the probability density function (PDF, continuous case) for several random variables that share a joint distribution?

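Joint PMF of two variables: $P_{x,y}(x=a, y=b)$, with marginal $P_x(x) = \sum_{y} P_{x,y}(x,y)$  
Joint PMF of three variables: $P_{x,y,z}(x=a,y=b,z=c)$, with marginal $P_x(x) = \sum_{y} \sum_{z} P_{x,y,z}(x,y,z)$  

Joint PDF: $f_{x,y,z}(x=a, y=b, z=c)$, with marginal $f_x(x) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{x,y,z}(x,y,z) \, dz \, dy$

Here $\sum_{y}$ denotes a sum over every value $y$ can take, and the integrals run over the full range of $y$ and $z$.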
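
As a minimal sketch of marginalization in the discrete case, the cell below stores a joint PMF as a 2-D array and sums out one variable at a time. The table values are made up for illustration and are not taken from any dataset.

```python
import numpy as np

# Hypothetical joint PMF P(x, y): rows index x in {0, 1}, columns index y in {0, 1, 2}.
joint = np.array([
    [0.10, 0.20, 0.10],
    [0.30, 0.10, 0.20],
])
assert np.isclose(joint.sum(), 1.0)  # a valid PMF sums to 1

# Marginalize by summing out the variable we are not interested in.
p_x = joint.sum(axis=1)  # P_x(x): sum over y (columns)
p_y = joint.sum(axis=0)  # P_y(y): sum over x (rows)

print("P_x:", p_x)
print("P_y:", p_y)
```

Each marginal is itself a valid PMF, so `p_x.sum()` and `p_y.sum()` should both be 1.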

### Exercise:  
Given two random variables, $X$ and $Y$, derive the expectation formula for their sum, i.e. $z = x + y$.

1. $E[z] = \sum_{x} \sum_{y} (x+y) * P_{x,y}(x,y)$  

2. $E[z] = \sum_{x} \sum_{y} (x*P_{x,y}(x,y) + y*P_{x,y}(x,y))$  

3. $E[z] = \sum_{x} \sum_{y} x*P_{x,y}(x,y) + \sum_{x} \sum_{y} y*P_{x,y}(x,y)$  

4. $E[z] = \sum_{x} x \sum_{y} P_{x,y}(x,y) + \sum_{y} y \sum_{x} P_{x,y}(x,y)$  

5. $E[z] = \sum_{x} x*P_x(x) + \sum_{y} y*P_y(y)$   

Note: step 5 follows from step 4 because the inner sums are the marginals:  

$P_x(x) = \sum_{y} P_{x,y}(x,y)$ and $P_y(y) = \sum_{x} P_{x,y}(x,y)$  

By step 5, $E[x+y] = E[x] + E[y]$. Independence is not required for this result.  
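
The derivation can be checked numerically. The sketch below uses a made-up joint PMF in which $x$ and $y$ are not independent, computes $E[x+y]$ as the double sum from step 1, and compares it with $E[x] + E[y]$ from step 5. The values and variable names are assumptions for illustration.

```python
import numpy as np

# Hypothetical support and joint PMF (rows: x, columns: y); x and y are dependent here.
x_vals = np.array([0, 1])
y_vals = np.array([0, 1, 2])
joint = np.array([
    [0.10, 0.20, 0.10],
    [0.30, 0.10, 0.20],
])

# Step 1: E[x + y] as a double sum over the joint PMF.
X, Y = np.meshgrid(x_vals, y_vals, indexing="ij")
e_sum = ((X + Y) * joint).sum()

# Step 5: E[x] and E[y] computed from the marginals.
e_x = (x_vals * joint.sum(axis=1)).sum()
e_y = (y_vals * joint.sum(axis=0)).sum()

print(e_sum, e_x + e_y)
```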

### Exercise:  

Derive the expectation (mean) formula for the product of two discrete independent random variables.  

Given variables $x$ and $y$ with joint PMF $P_{x,y}(x,y)$:  

1.  $E[x*y] = \sum_{x} \sum_{y} x*y*P_{x,y}(x,y)$  
2.  $E[x*y] = \sum_{x} \sum_{y} x*y * P_x(x) * P_y(y)$ (valid only because $x$ and $y$ are independent)  
3.  $E[x*y] = \left(\sum_{x} x*P_x(x)\right) * \left(\sum_{y} y*P_y(y)\right)$  
4.  $E[x*y] = E[x] * E[y]$  
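
A quick numerical check, as a sketch with made-up PMFs: when the joint PMF is built as the product of the marginals, $E[x*y]$ matches $E[x]*E[y]$. The second half uses a dependent joint table, where the equality generally fails, to show that step 2 really does need independence.

```python
import numpy as np

# Independent case: the joint PMF is the outer product of the marginals.
x_vals = np.array([1, 2, 3])
p_x = np.array([0.2, 0.5, 0.3])
y_vals = np.array([0, 10])
p_y = np.array([0.6, 0.4])

joint = np.outer(p_x, p_y)                       # P(x, y) = P_x(x) * P_y(y)
e_xy = (np.outer(x_vals, y_vals) * joint).sum()  # double sum of x * y * P(x, y)
e_x = (x_vals * p_x).sum()
e_y = (y_vals * p_y).sum()
print("independent:", e_xy, e_x * e_y)

# Dependent case (hypothetical table): the product rule no longer holds.
dep_x_vals = np.array([0, 1])
dep_y_vals = np.array([0, 1, 2])
dep_joint = np.array([
    [0.10, 0.20, 0.10],
    [0.30, 0.10, 0.20],
])
dep_e_xy = (np.outer(dep_x_vals, dep_y_vals) * dep_joint).sum()
dep_e_x = (dep_x_vals * dep_joint.sum(axis=1)).sum()
dep_e_y = (dep_y_vals * dep_joint.sum(axis=0)).sum()
print("dependent:  ", dep_e_xy, dep_e_x * dep_e_y)
```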

## Multivariate Gaussian Joint Distribution  

Several random variables $x_1, x_2, x_3, ...., x_d$  
are commonly collected into a vector $\overrightarrow{x} = [x_1, x_2, x_3, ....., x_d]$,  
and the joint PMF or PDF is then written as:  

$f_{\overrightarrow{x}}(\overrightarrow{x})$  

For the multivariate Gaussian joint distribution, the PDF is:  

$f_{\overrightarrow{x}}(\overrightarrow{x}) = \frac{1}{\sqrt{(2\pi)^d \, |C|}} \exp\left(-\frac{1}{2} (\overrightarrow{x} - \overrightarrow{\mu})^T C^{-1} (\overrightarrow{x} - \overrightarrow{\mu})\right)$    


where:  
1. $\overrightarrow{x} = [x_1, x_2, x_3, ...., x_d]$  
2. $\overrightarrow{\mu} = [\mu_1, \mu_2, \mu_3, ...., \mu_d]$  
3. $C$ is a $d \times d$ positive definite covariance matrix   
4. $|C|$ is the determinant of matrix $C$  
5. $C^{-1}$ is the inverse of matrix $C$
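
The PDF above can be evaluated directly with NumPy. This is a minimal sketch with no input validation; the function name `mvn_pdf` and the example mean and covariance are assumptions for illustration.

```python
import numpy as np

def mvn_pdf(x, mu, C):
    """Multivariate Gaussian density at a single point x (sketch, no validation)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    C = np.asarray(C, dtype=float)
    d = mu.shape[0]
    diff = x - mu
    # Quadratic form (x - mu)^T C^{-1} (x - mu); solve C z = diff rather than inverting C.
    quad = diff @ np.linalg.solve(C, diff)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(C))
    return np.exp(-0.5 * quad) / norm

# Hypothetical 2-D example.
mu = np.array([0.0, 0.0])
C = np.array([
    [1.0, 0.5],
    [0.5, 2.0],
])
print(mvn_pdf([0.0, 0.0], mu, C))  # density at the mean, the peak of the PDF
print(mvn_pdf([1.0, -1.0], mu, C))
```

With $d = 1$ and $C = [[\sigma^2]]$ the function reduces to the familiar univariate normal density.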

## Conditioning Random Variables 

Recall:  
$P(A|B) = \frac{P(A\cap B)}{P(B)}$  

Thus:  
$P_{x|y}(x|y) = \frac{P_{x,y}(x,y)}{P_y(y)}$  
and  
$P_{y|\overrightarrow{x}}(y|\overrightarrow{x}) = \frac{P_{\overrightarrow{x},y}(\overrightarrow{x},y)}{P_{\overrightarrow{x}}(\overrightarrow{x})}$  
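
For a discrete joint PMF stored as a table, conditioning on $y$ amounts to dividing each column by its marginal. The sketch below uses a made-up table; the values are assumptions for illustration.

```python
import numpy as np

# Hypothetical joint PMF P(x, y): rows index x, columns index y.
joint = np.array([
    [0.10, 0.20, 0.10],
    [0.30, 0.10, 0.20],
])

p_y = joint.sum(axis=0)  # marginal P_y(y)

# P(x | y) = P(x, y) / P_y(y); broadcasting divides each column by its own P_y(y).
p_x_given_y = joint / p_y

print(p_x_given_y)
print(p_x_given_y.sum(axis=0))  # each column is a PMF over x, so it sums to 1
```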

### Independence of multiple random variables  
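If events (or random variables) are independent, their joint probability factors into the product of the individual probabilities:  

$P(A \cap B) = P(A) * P(B)$  
$P(x,y) = P(x) * P(y)$  

$P(A \cap B \cap C) = P(A) * P(B) * P(C)$  
$P(x,y,z) = P(x)*P(y)*P(z)$  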

### Conditional Independence of multiple random variables  
Even when $P(x_1, x_2) \neq P(x_1)*P(x_2)$, i.e. $x_1$ and $x_2$ are not independent,  

they can still be conditionally independent given a third variable $y$:  

$P(x_1, x_2|y) = P(x_1|y) * P(x_2|y)$  
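
A small constructed example may help. In the sketch below, $x_1$ and $x_2$ are each driven by a shared binary $y$, so they are dependent when $y$ is unknown but independent once $y$ is fixed. All probabilities are made up for illustration.

```python
import numpy as np

# Hypothetical model: binary y, and binary x1, x2 that each depend only on y.
p_y = np.array([0.5, 0.5])
p_x1_given_y = np.array([[0.9, 0.1],   # rows: x1 = 0, 1; columns: y = 0, 1
                         [0.1, 0.9]])
p_x2_given_y = np.array([[0.8, 0.2],   # rows: x2 = 0, 1; columns: y = 0, 1
                         [0.2, 0.8]])

# joint[x1, x2, y] = P(x1 | y) * P(x2 | y) * P(y)
joint = np.einsum("ay,by,y->aby", p_x1_given_y, p_x2_given_y, p_y)

# Without conditioning: compare P(x1, x2) with P(x1) * P(x2).
p_x1x2 = joint.sum(axis=2)
p_x1 = p_x1x2.sum(axis=1)
p_x2 = p_x1x2.sum(axis=0)
marginally_independent = np.allclose(p_x1x2, np.outer(p_x1, p_x2))

# Conditioning on y: compare P(x1, x2 | y) with P(x1 | y) * P(x2 | y).
cond = joint / p_y             # P(x1, x2 | y), dividing along the last axis
cond_x1 = cond.sum(axis=1)     # P(x1 | y)
cond_x2 = cond.sum(axis=0)     # P(x2 | y)
conditionally_independent = np.allclose(
    cond, np.einsum("ay,by->aby", cond_x1, cond_x2)
)

print("independent:", marginally_independent)
print("conditionally independent given y:", conditionally_independent)
```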

## Multi-variate Bayes Classification  
Given:  
$\overrightarrow{x} = [x_1, x_2, x_3, ..., x_d]$  

$y$ is a discrete random variable with finitely many values (the class label)  

Then:
$P_{y|\overrightarrow{x}}(y|\overrightarrow{x}) = \frac{f_{\overrightarrow{x}|y}(\overrightarrow{x}|y) * P_y(y)}{f_{\overrightarrow{x}}(\overrightarrow{x})}$  

The goal is to compute the posterior $P_{y|\overrightarrow{x}}(y|\overrightarrow{x})$ by modeling the class-conditional density $f_{\overrightarrow{x}|y}(\overrightarrow{x}|y)$ together with the prior $P_y(y)$.  

Modeling the right-hand side of the equation is called generative modeling. If the data supports it and we can model the left-hand side directly, that is called discriminative modeling.  
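
One possible generative model, sketched here under the assumption that each class-conditional density is a multivariate Gaussian, estimates a prior, mean, and covariance per class and then applies Bayes' rule. The helper names (`gaussian_pdf`, `fit_generative`, `posterior`) and the synthetic data are hypothetical and only meant to illustrate the formula.

```python
import numpy as np

def gaussian_pdf(X, mu, C):
    """Multivariate Gaussian density evaluated at each row of X."""
    d = mu.shape[0]
    diff = X - mu
    quad = np.einsum("ni,ni->n", diff @ np.linalg.inv(C), diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(C))

def fit_generative(X, y):
    """Estimate the prior P_y(y) and a Gaussian f(x | y) for each class label."""
    params = {}
    for label in np.unique(y):
        Xc = X[y == label]
        params[label] = (len(Xc) / len(X), Xc.mean(axis=0), np.cov(Xc, rowvar=False))
    return params

def posterior(X, params):
    """P(y | x) via Bayes' rule; columns follow the sorted class labels."""
    labels = sorted(params)
    # Numerator f(x | y) * P_y(y) for every class.
    num = np.column_stack(
        [gaussian_pdf(X, mu, C) * prior for prior, mu, C in (params[l] for l in labels)]
    )
    # Denominator f(x) is the sum of the numerators over all classes.
    return num / num.sum(axis=1, keepdims=True), labels

# Synthetic two-class data (assumed): unit-variance clouds centered at (0, 0) and (4, 4).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(200, 2)),
    rng.normal(4.0, 1.0, size=(200, 2)),
])
y = np.array([0] * 200 + [1] * 200)

params = fit_generative(X, y)
post, labels = posterior(X, params)
y_hat = np.array(labels)[post.argmax(axis=1)]  # pick the most probable class
accuracy = (y_hat == y).mean()
print("training accuracy:", accuracy)
```

A discriminative alternative would skip the per-class densities and fit $P_{y|\overrightarrow{x}}(y|\overrightarrow{x})$ directly, for example with logistic regression.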

## Curse of Dimensionality  

As the number of random variables (dimensions) increases, the amount of data needed to model their joint distribution properly grows exponentially.
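
One way to see the growth, as a rough sketch: if each of $d$ discrete variables can take $k$ values, a full joint PMF table has $k^d$ cells, and a fixed-size dataset covers a vanishing fraction of them. The values of `k` and `n_samples` below are arbitrary assumptions.

```python
def joint_table_size(k, d):
    """Number of cells in a joint PMF table for d variables with k values each."""
    return k ** d

k = 10                 # assumed number of distinct values per variable
n_samples = 1_000_000  # assumed dataset size

for d in (1, 2, 3, 6, 10):
    cells = joint_table_size(k, d)
    # Even if every sample landed in a different cell, this is the most we could cover.
    coverage = min(1.0, n_samples / cells)
    print(f"d={d:2d}  cells={cells:>14,}  best-case coverage={coverage:.4%}")
```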