# Fundamentals of Probability

The preceding notebook shows how a risk manager can characterize the risk of a portfolio using a frequency distribution. This process uses the tools of probability, a mathematical abstraction that constructs the distribution of random variables. These random variables are financial risk factors, such as movements in stock prices, bond prices, exchange rates, and commodity prices. These risk factors are then transformed into profits and losses on the portfolio, which can be described by a probability distribution function.

Table of contents:
- [Characterize Random Variables](#characterize_random_variables)
- [Multivariate Distribution Functions](#multivariate_distribution_functions)

## <a name="characterize_random_variables">Characterize Random Variables</a>
The classical approach to probability is based on the concept of the **random variable ($RV$)**. This can be viewed as the outcome from throwing a die. Each realization is generated from a fixed process. If the die is perfectly symmetrical, with six faces, we could say that the probability of observing a face with a specified number in one throw is $p = 1/6$.
### Univariate distribution functions
A random variable $X$ is characterized by a **distribution function**,

$F(x) = P(X \leq x)$

which is the probability that the realization of the random variable $X$ ends up less than or equal to the given number $x$. This is also called a **cumulative distribution function**.

When the variable $X$ takes discrete values, this distribution is obtained by summing the step values less than or equal to $x$. That is

$F(x) = \sum_{x_{j} \leq x}f(x_{j})$

where the function $f(x)$ is called the **frequency function** or the **probability density function ($p.d.f$)**.
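
As a minimal sketch of the discrete case, the fair die from the opening paragraph can be coded directly. The names `f` and `cdf` are illustrative, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction

# Frequency function of a fair six-sided die: f(x_j) = 1/6 for each face.
faces = range(1, 7)
f = {face: Fraction(1, 6) for face in faces}

def cdf(x):
    """F(x): sum of f(x_j) over all faces x_j <= x."""
    return sum(p for face, p in f.items() if face <= x)

print(cdf(2))    # P(X <= 2)
print(cdf(3.5))  # the step function is flat between faces
print(cdf(6))    # total probability
```

The distribution is a step function: it jumps by $1/6$ at each face and stays constant in between.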

When the variable is continuous, the distribution is given by 

$F(x) = \int^{x}_{-\infty}f(u)du$

The density can be obtained from the distribution using 

$f(x) = \frac{dF(x)}{dx}$

Often, the random variable will be described interchangeably by its distribution or its density.

The density $f(u)$ must be nonnegative for all $u$. As $x$ tends to infinity, the distribution tends to unity, as it represents the total probability of any draw for $x$:

$\int^{\infty}_{-\infty}f(u)du = 1$
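
The relationships between density and distribution can be checked numerically. The sketch below assumes a standard normal density (an illustrative choice, not one made in the text) and approximates the integral with the trapezoid rule, with $-10$ standing in for $-\infty$.

```python
import math

def f(u):
    """Standard normal density (assumed here purely for illustration)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def F(x, lower=-10.0, n=20_000):
    """Approximate F(x) = integral of f from -infinity to x (trapezoid rule).

    The lower limit -10 replaces -infinity; the mass below it is negligible.
    """
    if x <= lower:
        return 0.0
    h = (x - lower) / n
    total = 0.5 * (f(lower) + f(x))
    total += sum(f(lower + i * h) for i in range(1, n))
    return total * h

# Total probability: F(x) approaches 1 as x grows.
total_probability = F(10.0)

# The density is the derivative of the distribution: compare a central
# finite difference of F at x = 1 with f(1).
eps = 1e-3
slope_at_one = (F(1.0 + eps) - F(1.0 - eps)) / (2 * eps)

print(total_probability)
print(slope_at_one, f(1.0))
```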

### Moments

A random variable is characterized by its distribution function. We can summarize it by a few parameters, or **moments**.

For instance, the expected value for $x$, or **mean**, is given by the integral 

$\mu = E(X) = \int^{+\infty}_{-\infty}xf(x)dx$

which measures the central tendency, or center of gravity of the population.

The distribution can also be described by its **quantile**, which is the cutoff point $x$ with an associated probability $c$:

$F(x) = \int^{x}_{-\infty}f(u)du = c$

Define this quantile as $Q(X, c)$. The $50\%$ quantile is known as the **median**.

In fact, value at risk ($VAR$) can be interpreted as the cutoff point such that, with probability $p$ (say, $p = 95\%$), a larger loss will not occur. If $f(u)$ is the density of profits and losses on the portfolio, $VAR$ is defined from

$F(x) = \int^{x}_{-\infty}f(u)du = (1 - p)$

where $p$ is the right-tail probability, and $c = 1 - p$ the usual left-tail probability. $VAR$ can be defined as minus the quantile itself, or alternatively, the deviation between the expected value and the quantile,

$VAR(c) = E(X) - Q(X, c)$

Note that $VAR$ is typically reported as a loss (i.e., a positive number), which explains the negative sign.
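
To make the two definitions concrete, the sketch below assumes that profits and losses are normally distributed with a mean of 1 and a standard deviation of 10 (hypothetical figures, in millions of dollars). It uses `statistics.NormalDist` from the Python standard library to invert the distribution function.

```python
from statistics import NormalDist

# Hypothetical P&L distribution (in $ millions): normal, mean 1, SD 10.
pnl = NormalDist(mu=1.0, sigma=10.0)

p = 0.95       # right-tail probability (confidence level)
c = 1.0 - p    # left-tail probability

quantile = pnl.inv_cdf(c)            # Q(X, c): the cutoff x with F(x) = c
var_absolute = -quantile             # VAR as minus the quantile
var_relative = pnl.mean - quantile   # VAR(c) = E(X) - Q(X, c)

print(round(quantile, 2), round(var_absolute, 2), round(var_relative, 2))
```

Under the normal assumption, the second definition reduces to about $1.645\sigma$ at the $95\%$ level, regardless of the mean.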

Another useful moment is the squared dispersion around the mean, or **variance**:

$\sigma^{2} = V(X) = \int^{+\infty}_{-\infty}[x - E(X)]^{2}f(x)dx$

The **standard deviation** is more convenient to use as it has the same units as the original variable $X$:

$SD(X) = \sigma = \sqrt{V(X)}$

The scaled third moment is the **skewness**, which describes departures from symmetry. It is defined as

$\gamma = (\int^{+\infty}_{-\infty}[x - E(X)]^{3}f(x)dx)/\sigma^{3}$

Negative skewness indicates that the distribution has a long left tail, which implies a relatively high probability of observing large negative values. If this represents the distribution of profits and losses for a portfolio, this is a dangerous situation.

The scaled fourth moment is the **kurtosis**, which describes the degree of flatness of a distribution, or width of its tails. It is defined as 

$\delta = (\int^{+\infty}_{-\infty}[x - E(X)]^{4}f(x)dx)/\sigma^{4}$

Because of the fourth power, large observations in the tail will have a large weight and hence create large kurtosis. Such a distribution is called **leptokurtic**, or **fat-tailed**. This parameter is very important for risk management. A kurtosis of $3$ is considered average, and represents a normal distribution. High kurtosis indicates a higher probability of extreme movements. A distribution with kurtosis lower than $3$ is called **platykurtic**.
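
The four moments can be estimated from data by replacing each integral with an average over observations. The sketch below uses a small set of hypothetical profit-and-loss figures containing one large loss; the helper `central_moment` is illustrative, and the estimators divide by $n$ (population form) rather than applying small-sample corrections.

```python
import math

# Hypothetical P&L observations; the -8 is a single large loss.
pnl = [-8, -1, 0, 1, 1, 2, 2, 3]
n = len(pnl)

def central_moment(data, k):
    """Average of (x - mean)**k, the sample analogue of the k-th central moment."""
    m = sum(data) / len(data)
    return sum((x - m) ** k for x in data) / len(data)

mean = sum(pnl) / n
variance = central_moment(pnl, 2)
std_dev = math.sqrt(variance)
skewness = central_moment(pnl, 3) / std_dev ** 3   # scaled third moment
kurtosis = central_moment(pnl, 4) / std_dev ** 4   # scaled fourth moment

print(mean, variance)
print(round(skewness, 3), round(kurtosis, 3))
```

The single large loss produces negative skewness and a kurtosis above $3$, which is the fat left tail described above.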

## <a name="multivariate_distribution_functions">Multivariate Distribution Functions</a>
In practice, portfolio payoffs depend on numerous random variables. 
### Joint distributions
We can extend the univariate distribution function to

$F_{12}(x_{1}, x_{2}) = P(X_{1} \leq x_{1}, X_{2} \leq x_{2}) $

which defines a joint bivariate distribution function. In the continuous case, this is also

$F_{12}(x_{1}, x_{2}) = \int^{x_{1}}_{-\infty}\int^{x_{2}}_{-\infty}f_{12}(u_{1}, u_{2})du_{1}du_{2}$

where $f_{12}(u_{1}, u_{2})$ is now the **joint density**.

The analysis simplifies considerably if the variables are **independent**. In this case, the joint density separates out into the product of the densities:

$f_{12}(u_{1}, u_{2}) = f_{1}(u_{1}) \times f_{2}(u_{2})$

and the integral reduces to 

$F_{12}(x_{1}, x_{2}) = F_{1}(x_{1}) \times F_{2}(x_{2})$
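
The factorization under independence can be verified on a discrete analogue, where sums replace the integrals. The sketch below assumes two independent fair dice; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Frequency functions of two independent fair dice, and their joint
# frequency function built as the product f1(u1) * f2(u2).
f1 = {u: Fraction(1, 6) for u in faces}
f2 = {u: Fraction(1, 6) for u in faces}
f12 = {(u1, u2): f1[u1] * f2[u2] for u1, u2 in product(faces, faces)}

def F1(x):
    return sum(p for u, p in f1.items() if u <= x)

def F2(x):
    return sum(p for u, p in f2.items() if u <= x)

def F12(x1, x2):
    """Joint distribution: sum joint frequencies with u1 <= x1 and u2 <= x2."""
    return sum(p for (u1, u2), p in f12.items() if u1 <= x1 and u2 <= x2)

print(F12(2, 3), F1(2) * F2(3))  # both equal 1/6
```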

It's also useful to characterize the distribution of $x_{1}$ abstracting from $x_{2}$. By integrating over all values of $x_{2}$, we obtain the **marginal density**:

$f_{1}(x_{1}) = \int^{\infty}_{-\infty}f_{12}(x_{1}, u_{2})du_{2}$

and similarly for $x_{2}$. We can then define the **conditional density** as 

$f_{1 \cdot 2}(x_{1} | x_{2}) = \frac{f_{12}(x_{1}, x_{2})}{f_{2}(x_{2})}$

Here we keep $x_{2}$ fixed and divide the joint density by the marginal probability of $x_{2}$. This normalization is necessary to ensure that the conditional density is a proper density function that integrates to one. This relationship is also known as **Bayes' rule**.
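
A small discrete table makes the marginal and conditional definitions easy to trace. The joint frequencies below are hypothetical, and sums again stand in for the integrals.

```python
from fractions import Fraction

# Hypothetical joint frequency function for two binary variables (x1, x2).
f12 = {
    (0, 0): Fraction(4, 10),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(2, 10),
    (1, 1): Fraction(3, 10),
}

def marginal_1(x1):
    """f1(x1): sum the joint frequencies over all values of x2."""
    return sum(p for (u1, _), p in f12.items() if u1 == x1)

def marginal_2(x2):
    """f2(x2): sum the joint frequencies over all values of x1."""
    return sum(p for (_, u2), p in f12.items() if u2 == x2)

def conditional_1_given_2(x1, x2):
    """f(x1 | x2): joint frequency divided by the marginal of x2."""
    return f12[(x1, x2)] / marginal_2(x2)

print(marginal_1(0), marginal_2(1))   # 1/2 2/5
print(conditional_1_given_2(1, 1))    # 3/4
```

Dividing by the marginal is what makes the conditional frequencies for a fixed $x_{2}$ add up to one.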

### Covariances and correlations
When dealing with two random variables, the comovement can be described by the **covariance**:

$Cov(X_{1}, X_{2}) = \sigma_{12} = \int_{1}\int_{2}[x_{1} - E(X_{1})][x_{2} - E(X_{2})]f_{12}(x_{1}, x_{2})dx_{1}dx_{2}$

It's often useful to scale the covariance into a unitless number, called the **correlation coefficient**, obtained as 

$\rho(X_{1}, X_{2}) = \frac{Cov(X_{1}, X_{2})}{\sigma_{1}\sigma_{2}}$

The correlation coefficient is a measure of linear dependence. The correlation coefficient always lies in the $[-1, +1]$ interval. A correlation of $1$ means that the two variables always move in the same direction. A correlation of $-1$ means that the two variables always move in the opposite direction.

The above-mentioned equation is also called **Pearson correlation**. Another measure is the **Spearman correlation**, which replaces the value of the variables by their rank. This nonparametric measure is less sensitive to outliers, and hence more robust than the usual correlation when there might be errors in the data.
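
The difference between the two measures shows up clearly when one observation is wrong. The sketch below implements both from scratch on hypothetical data in which the last value of `y` was recorded as 100 instead of 12; the `ranks` helper assumes there are no ties.

```python
import math

def pearson(x, y):
    """Pearson correlation; the 1/n factors cancel in the ratio."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest) to n; assumes no ties, for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 6, 8, 10, 100]   # hypothetical data error: 12 recorded as 100

print(round(pearson(x, y), 3))
print(round(spearman(x, y), 3))
```

Because the erroneous point keeps its rank, the Spearman measure still reports a perfect monotonic relationship, while the Pearson measure is pulled well below one.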

If the correlation is zero, the two variables are said to be **uncorrelated**. Independence implies zero correlation; the reverse is not true, however.
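
A standard counterexample shows why the reverse fails. In the sketch below, $X$ takes the values $-1$, $0$, and $1$ with equal probability and $Y = X^{2}$ (an illustrative construction, not taken from the text), so $Y$ is completely determined by $X$ even though the covariance is zero.

```python
from fractions import Fraction

third = Fraction(1, 3)

# X is -1, 0, or 1 with equal probability; Y = X**2 is determined by X.
joint = {(-1, 1): third, (0, 0): third, (1, 1): third}

def expect(g):
    """Expected value of g(X, Y) under the joint frequency function."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

# Cov(X, Y) = E[XY] - E[X] E[Y]
covariance = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Independence would require P(X=0, Y=0) = P(X=0) * P(Y=0).
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)
p_x0_y0 = joint.get((0, 0), 0)

print(covariance)              # 0
print(p_x0_y0, p_x0 * p_y0)    # 1/3 1/9
```

The correlation only captures linear dependence, so a purely nonlinear relationship such as this one goes undetected.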