# Random Variables

A random variable is a variable whose value depends on the outcome of a random experiment; formally, it is a function that assigns a number to each outcome in the sample space. Random variables are used to model uncertainty in a system. For example, when predicting tomorrow's weather, we might use a random variable to represent the temperature: we know it will take some value, but we don't know which one.

Random variables are usually denoted by capital letters, such as $X$ or $Y$. The values that a random variable can take on are denoted by lowercase letters, such as $x$ or $y$, so $P(X = x)$ is the probability that $X$ takes the value $x$.

Random variables can be discrete or continuous. A discrete random variable takes values in a countable set, which is often finite. For example, when rolling a die, the random variable representing the number we roll can only take the values 1, 2, 3, 4, 5, or 6. A continuous random variable can take any value in an interval. For example, a random variable representing temperature in degrees Celsius can in principle take any value from -273.15 (absolute zero) upward.

## Sample Space $\Omega$

The sample space is the set of all possible outcomes of an experiment and is usually denoted by $\Omega$. For example, when rolling a die, the sample space is $\{1, 2, 3, 4, 5, 6\}$. When flipping a coin twice, each outcome is a pair of results:

Coin Flip 1 | Coin Flip 2 |
------------| ------------|
H | H |
H | T |
T | H |
T | T |

For the two coin flips, the sample space $\Omega$ (uppercase omega) is the set of all four outcomes, also called possible worlds:

$\Omega = \{HH, HT, TH, TT\}$
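As a minimal Python sketch (the names `faces` and `omega` are illustrative), the sample space for two coin flips can be enumerated as every ordered pair of single-flip results:

```python
from itertools import product

# Each flip has two possible results: heads ("H") or tails ("T").
faces = ["H", "T"]

# The sample space is every ordered pair of results from two flips.
omega = ["".join(flips) for flips in product(faces, repeat=2)]

print(omega)  # ['HH', 'HT', 'TH', 'TT']
```

Changing `repeat` enumerates the sample space for any number of flips; it has $2^n$ elements for $n$ flips.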

## Possible World $\omega_i$

A possible world is a single outcome of an experiment, that is, one element of the sample space. For example, when rolling a die, each number we could roll is a possible world. Possible worlds are usually denoted by $\omega_i$ (lowercase omega).

Possible world | Coin Flip 1 | Coin Flip 2 |
---------------|------------| ------------|
$\omega_1$ | H | H |
$\omega_2$ | H | T |
$\omega_3$ | T | H |
$\omega_4$ | T | T |

The sample space is $\Omega = \{\omega_1, \omega_2, \omega_3, \omega_4\}$.

## Probability Model $P(\omega)$

A probability model is a function that assigns a probability to each possible world. For example, when rolling a die, the probability model assigns a probability to each number we can roll. The probability model is usually denoted by $P(\omega)$. For two fair coin flips, each of the four possible worlds has probability 0.25:

Possible world | Coin Flip 1 | Coin Flip 2 | $P(\omega)$ |
---------------|------------| ------------|-------------|
$\omega_1$ | H | H | 0.25 |
$\omega_2$ | H | T | 0.25 |
$\omega_3$ | T | H | 0.25 |
$\omega_4$ | T | T | 0.25 |

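A valid probability model must satisfy two conditions: each possible world has a probability between 0 and 1, and the probabilities of all possible worlds sum to 1.

$$ 0 \le P(\omega) \le 1 \quad \text{for every } \omega \in \Omega $$

$$ \sum_{\omega \in \Omega} P(\omega) = 1 $$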
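A minimal sketch of these two conditions, assuming a probability model is stored as a Python dict from possible world to probability; `is_valid_model` is a hypothetical helper, and `Fraction` is used so the sum-to-one check is exact rather than subject to floating-point rounding:

```python
from fractions import Fraction

# Probability model for two fair coin flips: each possible world gets 1/4.
P = {"HH": Fraction(1, 4), "HT": Fraction(1, 4), "TH": Fraction(1, 4), "TT": Fraction(1, 4)}


def is_valid_model(model):
    """Return True if every probability is in [0, 1] and they sum to 1."""
    in_range = all(0 <= p <= 1 for p in model.values())
    sums_to_one = sum(model.values()) == 1
    return in_range and sums_to_one


print(is_valid_model(P))  # True
print(is_valid_model({"HH": Fraction(1, 2), "TT": Fraction(1, 3)}))  # False: sums to 5/6
```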

Random variables can be either independent or dependent. Two random variables are independent if knowing the value of one does not change the probability distribution of the other; formally, $P(X = x, Y = y) = P(X = x) \cdot P(Y = y)$ for all values $x$ and $y$. For example, when rolling two dice, the random variables representing the two numbers rolled are independent, because the result of one die carries no information about the other. Otherwise, the random variables are dependent. For example, temperature and humidity measured at the same place and time are dependent: knowing the temperature changes the probabilities we assign to different humidity levels.
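A small sketch that checks the independence condition exactly for two fair dice (the dict-based joint distribution and variable names are assumptions for illustration). Each die's marginal is computed by summing the joint over the other die, and a dependent pair, the first die and the sum of both dice, is included for contrast:

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Joint distribution of two fair dice: all 36 ordered pairs are equally likely.
joint = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

# Marginal distribution of each die (1/6 per face).
p_x = {x: sum(joint[(x, y)] for y in faces) for x in faces}
p_y = {y: sum(joint[(x, y)] for x in faces) for y in faces}

# Independence: P(X = x, Y = y) == P(X = x) * P(Y = y) for every pair.
independent = all(joint[(x, y)] == p_x[x] * p_y[y] for x, y in product(faces, faces))
print(independent)  # True

# A dependent pair: the first die X and the sum S of both dice.
p_x1_and_s2 = sum(p for (x, y), p in joint.items() if x == 1 and x + y == 2)  # only (1, 1)
p_s2 = sum(p for (x, y), p in joint.items() if x + y == 2)
print(p_x1_and_s2 == p_x[1] * p_s2)  # False: 1/36 vs 1/216
```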

## Joint Probability

Joint probability is the probability of two events occurring together and is usually denoted by $P(A, B)$. For example, when rolling two dice, the probability of rolling a 1 on the first die and a 2 on the second die is a joint probability; for fair dice it equals $1/36$.

$$ P(A, B) = P(A | B) \cdot P(B) = P(B | A) \cdot P(A)$$

In data science, we rarely know the true joint probability; instead, we estimate it from data. For example, with two dice we can estimate the probability of rolling a 1 on the first die and a 2 on the second die by counting how many pairs of rolls come up (1, 2) and dividing by the total number of pairs rolled.
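A minimal sketch of this counting estimate, assuming simulated rolls of fair dice stand in for real data; `estimate_joint` is a hypothetical helper, and the fixed seed only makes the run reproducible:

```python
import random


def estimate_joint(a, b, n_rolls=100_000, seed=0):
    """Estimate P(first die = a, second die = b) by counting simulated paired rolls."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(n_rolls):
        first, second = rng.randint(1, 6), rng.randint(1, 6)
        if first == a and second == b:
            hits += 1
    return hits / n_rolls


estimate = estimate_joint(1, 2)
print(estimate)  # close to 1/36 (about 0.0278); the exact value depends on the sample
```

With real data, the simulation loop would be replaced by a pass over the recorded pairs of rolls; the estimate gets closer to the true joint probability as the number of pairs grows.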

## Conditional Probability

Conditional probability is the probability of one event occurring given that another event has occurred, and is usually denoted by $P(A | B)$.

For example, when rolling two dice, the probability of rolling a 1 on the first die given that we rolled a 2 on the second die is a conditional probability.

$$ P(A | B) = \frac{P(A, B)}{P(B)}, \quad \text{provided } P(B) > 0 $$
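A sketch of this definition applied to the exact joint distribution of two fair dice; representing events as predicate functions on an outcome `(x, y)` is an assumption of this example, and `conditional` is a hypothetical helper:

```python
from fractions import Fraction
from itertools import product

# Exact joint distribution of two fair dice: 36 equally likely (first, second) pairs.
joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}


def conditional(event_a, event_b, joint):
    """Return P(A | B) = P(A, B) / P(B); events are predicates on an outcome (x, y)."""
    p_ab = sum(p for o, p in joint.items() if event_a(o) and event_b(o))
    p_b = sum(p for o, p in joint.items() if event_b(o))
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return p_ab / p_b


def first_is_1(outcome):
    return outcome[0] == 1


def second_is_2(outcome):
    return outcome[1] == 2


print(conditional(first_is_1, second_is_2, joint))  # 1/6
```

Because the two dice are independent, conditioning on the second die leaves the probability of the first die showing 1 unchanged at $1/6$.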

## Marginalizing from Joint Probability

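Marginal probability is the probability of a single event, regardless of the values of the other random variables. For example, when rolling two dice, the probability of rolling a 1 on the first die, whatever the second die shows, is a marginal probability. It is usually denoted by $P(A)$ and is obtained from the joint probability by summing over every value $b$ that $B$ can take: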

$$ P(A) = \sum_{b} P(A, B = b) $$
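A minimal sketch of marginalization, assuming the joint distribution of two fair dice is stored as a dict keyed by `(a, b)`:

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
joint = {(a, b): Fraction(1, 36) for a, b in product(faces, faces)}

# Marginal of the first die: for each value a, sum the joint over all values b.
marginal_first = {a: sum(joint[(a, b)] for b in faces) for a in faces}

print(marginal_first[1])             # 1/6
print(sum(marginal_first.values()))  # 1
```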

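## Bayes' Theorem

Bayes' theorem expresses a conditional probability $P(A | B)$ in terms of the reverse conditional probability $P(B | A)$. It follows from equating the two factorizations of the joint probability, $P(A | B) \cdot P(B) = P(B | A) \cdot P(A)$, and dividing by $P(B)$. For example, when rolling two dice, Bayes' theorem can be used to calculate the probability of rolling a 1 on the first die given that we rolled a 2 on the second die.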

$$ P(A | B) = \frac{P(B | A) \cdot P(A)}{P(B)} $$

In the context of Bayes' theorem, $P(A | B)$ is called the posterior probability.

$P(B | A)$ is called the likelihood.

$P(A)$ is called the prior probability.

$P(B)$ is called the marginal likelihood, or evidence.

$$ \text{Posterior} = \frac{\text{Likelihood} \cdot \text{Prior}}{\text{Evidence}} $$
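For two independent dice, the posterior in the example above simply equals the prior ($1/6$), so this sketch uses a different pair of events chosen for illustration: $A$ = "first die shows 1" and $B$ = "the two dice sum to 3". It plugs prior, likelihood, and evidence into Bayes' theorem and cross-checks the result by direct counting:

```python
from fractions import Fraction
from itertools import product

# Illustrative events for two fair dice (chosen for this sketch):
#   A = "first die shows 1", B = "the two dice sum to 3".
prior = Fraction(1, 6)       # P(A)
likelihood = Fraction(1, 6)  # P(B | A): after a 1, the second die must show 2
evidence = Fraction(2, 36)   # P(B): only (1, 2) and (2, 1) sum to 3

posterior = likelihood * prior / evidence  # P(A | B) via Bayes' theorem
print(posterior)  # 1/2

# Cross-check by counting: among outcomes with sum 3, how many have a 1 first?
outcomes_b = [o for o in product(range(1, 7), repeat=2) if sum(o) == 3]
direct = Fraction(sum(1 for o in outcomes_b if o[0] == 1), len(outcomes_b))
print(direct == posterior)  # True
```

Observing $B$ raises the probability of $A$ from the prior $1/6$ to the posterior $1/2$.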


## Expectation

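Expectation is the probability-weighted average of the values of a random variable; it is the long-run average we would observe over many repetitions of the experiment. For example, the expectation of a fair die roll is the average number we would see over many rolls. The expectation is usually denoted by $E(X)$. For a discrete random variable, the sum runs over every value $x$ that $X$ can take:

$$ E(X) = \sum_{x} x \cdot P(X = x) $$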
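A minimal sketch, assuming a discrete distribution is stored as a dict from value to probability (`expectation` is a hypothetical helper):

```python
from fractions import Fraction


def expectation(dist):
    """E(X) = sum of x * P(X = x) over all values x of a discrete distribution."""
    return sum(x * p for x, p in dist.items())


die = {x: Fraction(1, 6) for x in range(1, 7)}
print(expectation(die))  # 7/2, i.e. 3.5
```

Note that $3.5$ is not a value a die can show; the expectation is an average, not necessarily a possible outcome.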

## Variance

Variance measures how spread out the values of a random variable are around its expectation. For example, the variance of a die roll is the average squared distance between the number we roll and the average roll. The variance is usually denoted by $Var(X)$.

$$ Var(X) = E((X - E(X))^2) = E(X^2) - (E(X))^2 $$
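A short sketch computing the variance of a fair die with both forms of the formula, using exact fractions so the two results can be compared with `==`:

```python
from fractions import Fraction

die = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in die.items())                    # E(X) = 7/2
var_def = sum((x - mean) ** 2 * p for x, p in die.items())   # E((X - E(X))^2)
var_short = sum(x**2 * p for x, p in die.items()) - mean**2  # E(X^2) - (E(X))^2

print(var_def, var_short)  # 35/12 35/12
```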

## Covariance

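Covariance measures how two random variables vary together: it is positive when they tend to be above (or below) their expectations at the same time, negative when one tends to be above its expectation while the other is below, and zero for independent random variables (although zero covariance does not by itself imply independence). For example, the numbers rolled on two independent dice have covariance 0. The covariance is usually denoted by $Cov(X, Y)$.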

$$ Cov(X, Y) = E((X - E(X)) \cdot (Y - E(Y))) = E(X \cdot Y) - E(X) \cdot E(Y) $$
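A sketch of the shortcut formula on the exact joint distribution of two fair dice. Random variables are represented here as functions of the outcome `(x, y)`, an assumption of this example. Besides the two independent dice, it includes the first die paired with the sum of both dice, which do vary together:

```python
from fractions import Fraction
from itertools import product

# Joint distribution of two fair dice: 36 equally likely (x, y) pairs.
joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}


def expect(f):
    """E(f(X, Y)) under the joint distribution."""
    return sum(f(x, y) * p for (x, y), p in joint.items())


def cov(f, g):
    """Cov(F, G) = E(F * G) - E(F) * E(G), where F and G are functions of the two dice."""
    return expect(lambda x, y: f(x, y) * g(x, y)) - expect(f) * expect(g)


def first(x, y):
    return x


def second(x, y):
    return y


def total(x, y):
    return x + y


print(cov(first, second))  # 0: the dice are independent
print(cov(first, total))   # 35/12: Cov(X, X + Y) = Var(X) + Cov(X, Y) = Var(X)
```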

## Correlation

Correlation is covariance rescaled by the standard deviations of the two random variables, so it is unitless and always lies between -1 and 1. Values near 1 or -1 indicate a strong linear relationship, and 0 indicates no linear relationship. For example, the numbers rolled on two independent dice have correlation 0. The correlation is usually denoted by $Corr(X, Y)$.

$$ Corr(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X) \cdot Var(Y)}} $$
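A sketch of the correlation formula for the same two-dice setup, again treating random variables as functions of the outcome `(x, y)` (an assumption of this example) and using $Var(F) = Cov(F, F)$. The covariance is computed exactly with fractions; only the final square root switches to floating point:

```python
import math
from fractions import Fraction
from itertools import product

joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}


def expect(f):
    """E(f(X, Y)) under the joint distribution of two fair dice."""
    return sum(f(x, y) * p for (x, y), p in joint.items())


def cov(f, g):
    """Cov(F, G) = E(F * G) - E(F) * E(G)."""
    return expect(lambda x, y: f(x, y) * g(x, y)) - expect(f) * expect(g)


def corr(f, g):
    """Corr(F, G) = Cov(F, G) / sqrt(Var(F) * Var(G)), with Var(F) = Cov(F, F)."""
    return float(cov(f, g)) / math.sqrt(cov(f, f) * cov(g, g))


def first(x, y):
    return x


def second(x, y):
    return y


def total(x, y):
    return x + y


print(corr(first, second))            # 0.0: independent dice
print(round(corr(first, total), 4))   # 0.7071, i.e. 1 / sqrt(2)
```

Unlike covariance, the correlation does not change if a variable is rescaled (for example, doubling every die face), which is why it is used to compare the strength of relationships across variables with different units.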