# Fundamentals of Probability

The preceding notebook shows how a risk manager can characterize the risk of a portfolio using a frequency distribution. This process uses the tools of probability, a mathematical abstraction that constructs the distribution of random variables. These random variables are financial risk factors, such as movements in stock prices, bond prices, exchange rates, and commodity prices. These risk factors are then transformed into profits and losses on the portfolio, which can be described by a probability distribution function.

Table of contents:
- [Characterize Random Variables](#characterize_random_variables)
- [Multivariate Distribution Functions](#multivariate_distribution_functions)
- [Functions of Random Variables](#functions_of_random_variables)
- [Important Distribution Functions](#important_distribution_functions)

## <a name="characterize_random_variables">Characterize Random Variables</a>
The classical approach to probability is based on the concept of the **random variable ($RV$)**. This can be viewed as the outcome from throwing a die. Each realization is generated from a fixed process. If the die is perfectly symmetrical, with six faces, we could say that the probability of observing a face with a specified number in one throw is $p = 1/6$.
### Univariate distribution functions
A random variable $X$ is characterized by a **distribution function**,

$F(x) = P(X \leq x)$

which is the probability that the realization of the random variable $X$ ends up less than or equal to the given number $x$. This is also called a **cumulative distribution function**.

When the variable $X$ takes discrete values, this distribution is obtained by summing the step values less than or equal to $x$. That is

$F(x) = \sum_{x_{j} \leq x}f(x_{j})$

where the function $f(x)$ is called the **frequency function** or the **probability density function ($p.d.f$)**.
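As a minimal sketch of the discrete case, the snippet below builds the distribution function of the fair six-sided die from the earlier example by summing the frequency function. The names `f` and `cdf` are illustrative choices, not part of the original text.

```python
from fractions import Fraction

# Frequency function of a fair six-sided die: f(x_j) = 1/6 for each face
faces = range(1, 7)
f = {face: Fraction(1, 6) for face in faces}

def cdf(x):
    """F(x): sum of f(x_j) over all faces x_j less than or equal to x."""
    return sum((p for face, p in f.items() if face <= x), Fraction(0))

for x in faces:
    print(f"F({x}) = {cdf(x)}")
```

Exact fractions are used so that the step values add up to exactly one at the largest face.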

When the variable is continuous, the distribution is given by 

$F(x) = \int^{x}_{-\infty}f(u)du$

The density can be obtained from the distribution using 

$f(x) = \frac{dF(x)}{dx}$

Often, the random variable will be described interchangeably by its distribution or its density.

The density $f(u)$ must be nonnegative for all $u$. As $x$ tends to infinity, the distribution tends to unity, as it represents the total probability of any draw for $x$:

$\int^{\infty}_{-\infty}f(u)du = 1$

### Moments

A random variable is characterized by its distribution function. We can summarize it by a few parameters, or **moments**.

For instance, the expected value of $X$, or **mean**, is given by the integral

$\mu = E(X) = \int^{+\infty}_{-\infty}xf(x)dx$

which measures the central tendency, or center of gravity of the population.

The distribution can also be described by its **quantile**, which is the cutoff point $x$ with an associated probability $c$:

$F(x) = \int^{x}_{-\infty}f(u)du = c$

Define this quantile as $Q(X, c)$. The $50\%$ quantile is known as the **median**.

In fact, value at risk ($VAR$) can be interpreted as the loss level that will not be exceeded with probability $p = 95\%$, say. If $f(u)$ is the density of profits and losses on the portfolio, $VAR$ is defined from 

$F(x) = \int^{x}_{-\infty}f(u)du = (1 - p)$

where $p$ is the right-tail probability, and $c$ the usual left-tail probability. $VAR$ can be defined as minus the quantile itself, or alternatively, the deviation between the expected value and the quantile, 

$VAR(c) = E(X) - Q(X, c)$

Note that $VAR$ is typically reported as a loss (i.e., a positive number), which explains the negative sign.
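The two $VAR$ definitions above can be sketched on a simulated sample. Everything in this snippet is an assumption made for illustration: the P&L sample is hypothetical and drawn from a normal distribution, and `empirical_quantile` is a simple helper written here, not a library function.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical P&L sample in $ millions; the normal shape is an assumption
pnl = [rng.gauss(0.1, 2.0) for _ in range(10_000)]

def empirical_quantile(sample, c):
    """Smallest observation x with at least a fraction c of the sample <= x."""
    ordered = sorted(sample)
    k = max(math.ceil(c * len(ordered)) - 1, 0)
    return ordered[k]

c = 0.05                        # left-tail probability, 1 - p with p = 95%
q = empirical_quantile(pnl, c)  # Q(X, c)
mean = statistics.fmean(pnl)    # sample estimate of E(X)

var_absolute = -q               # VAR as minus the quantile
var_relative = mean - q         # VAR as deviation from the expected value

print(f"quantile {q:.2f}, absolute VAR {var_absolute:.2f}, "
      f"relative VAR {var_relative:.2f}")
```

Both numbers are reported as positive losses; they differ only by the sample mean.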

Another useful moment is the squared dispersion around the mean, or **variance**:

$\sigma^{2} = V(X) = \int^{+\infty}_{-\infty}[x - E(X)]^{2}f(x)dx$

The **standard deviation** is more convenient to use as it has the same units as the original variable $X$:

$SD(X) = \sigma = \sqrt{V(X)}$

The scaled third moment is the **skewness**, which describes departures from symmetry. It is defined as

$\gamma = (\int^{+\infty}_{-\infty}[x - E(X)]^{3}f(x)dx)/\sigma^{3}$

Negative skewness indicates that the distribution has a long left tail, which indicates a high probability of observing large negative values. If this represents the distribution of profits and losses for a portfolio, this is a dangerous situation.

The scaled fourth moment is the **kurtosis**, which describes the degree of flatness of a distribution, or width of its tails. It is defined as 

$\delta = (\int^{+\infty}_{-\infty}[x - E(X)]^{4}f(x)dx)/\sigma^{4}$

Because of the fourth power, large observations in the tail will have a large weight and hence create large kurtosis. Such a distribution is called **leptokurtic**, or **fat-tailed**. This parameter is very important for risk management. A kurtosis of $3$ is considered average, and represents a normal distribution. High kurtosis indicates a higher probability of extreme movements. A distribution with kurtosis lower than $3$ is called **platykurtic**.
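To make the four moments concrete, here is a small sketch that computes them for a sample by replacing the integrals with averages. The function name `moments` and the P&L figures are hypothetical; the sample was chosen to have one large loss so that the skewness is negative and the kurtosis exceeds $3$.

```python
import math

def moments(sample):
    """Mean, variance, skewness and kurtosis, using population (1/n) averages."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / n
    sd = math.sqrt(var)
    skew = sum((x - mean) ** 3 for x in sample) / n / sd ** 3
    kurt = sum((x - mean) ** 4 for x in sample) / n / sd ** 4
    return mean, var, skew, kurt

# Hypothetical P&L observations with one large loss in the left tail
pnl = [-8.0, -1.0, 0.0, 0.5, 1.0, 1.0, 1.5, 2.0, 3.0]
mean, var, skew, kurt = moments(pnl)
print(f"mean {mean:.3f}, variance {var:.3f}, "
      f"skewness {skew:.3f}, kurtosis {kurt:.3f}")
```

Dividing by $n$ rather than $n - 1$ mirrors the population definitions in the text; an unbiased sample estimator would scale the variance slightly differently.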

## <a name="multivariate_distribution_functions">Multivariate Distribution Functions</a>
In practice, portfolio payoffs depend on numerous random variables. 
### Joint distributions
We can extend the univariate distribution function to

$F_{12}(x_{1}, x_{2}) = P(X_{1} \leq x_{1}, X_{2} \leq x_{2}) $

which defines a joint bivariate distribution function. In the continuous case, this is also

$F_{12}(x_{1}, x_{2}) = \int^{x_{1}}_{-\infty}\int^{x_{2}}_{-\infty}f_{12}(u_{1}, u_{2})du_{1}du_{2}$

where $f_{12}(u_{1}, u_{2})$ is now the **joint density**.

The analysis simplifies considerably if the variables are **independent**. In this case, the joint density separates out into the product of the densities:

$f_{12}(u_{1}, u_{2}) = f_{1}(u_{1}) \times f_{2}(u_{2})$

and the integral reduces to 

$F_{12}(x_{1}, x_{2}) = F_{1}(x_{1}) \times F_{2}(x_{2})$

It's also useful to characterize the distribution of $x_{1}$ abstracting from $x_{2}$. By integrating over all values of $x_{2}$, we obtain the **marginal density**:

$f_{1}(x_{1}) = \int^{\infty}_{-\infty}f_{12}(x_{1}, u_{2})du_{2}$

and similarly for $x_{2}$. We can then define the **conditional density** as 

$f_{1 \cdot 2}(x_{1} | x_{2}) = \frac{f_{12}(x_{1}, x_{2})}{f_{2}(x_{2})}$

Here we keep $x_{2}$ fixed and divide the joint density by the marginal probability of $x_{2}$. This normalization is necessary to ensure that the conditional density is a proper density function that integrates to one. This relationship is also known as **Bayes' rule**.
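The same operations can be illustrated with a discrete analogue, where sums replace integrals. The joint probability table below is hypothetical, and the helper names are illustrative only.

```python
from fractions import Fraction as Fr

# Hypothetical joint probabilities f12(x1, x2) for two binary variables
joint = {
    (0, 0): Fr(3, 10), (0, 1): Fr(1, 10),
    (1, 0): Fr(2, 10), (1, 1): Fr(4, 10),
}

def marginal_1(x1):
    """f1(x1): sum the joint probabilities over all values of x2."""
    return sum(p for (a, _), p in joint.items() if a == x1)

def marginal_2(x2):
    """f2(x2): sum the joint probabilities over all values of x1."""
    return sum(p for (_, b), p in joint.items() if b == x2)

def conditional_1_given_2(x1, x2):
    """f(x1 | x2): joint probability divided by the marginal of x2."""
    return joint[(x1, x2)] / marginal_2(x2)

print("f2(1)    =", marginal_2(1))
print("f(1 | 1) =", conditional_1_given_2(1, 1))
print("f(0 | 1) =", conditional_1_given_2(0, 1))

# Independence would require f12 = f1 * f2 in every cell
print("independent:", all(joint[(a, b)] == marginal_1(a) * marginal_2(b)
                          for (a, b) in joint))
```

Dividing by the marginal is what makes the conditional probabilities for a fixed $x_{2}$ sum to one.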

### Covariances and correlations
When dealing with two random variables, the comovement can be described by the **covariance**:

$Cov(X_{1}, X_{2}) = \sigma_{12} = \int_{1}\int_{2}[x_{1} - E(X_{1})][x_{2} - E(X_{2})]f_{12}(x_{1}, x_{2})dx_{1}dx_{2}$

It's often useful to scale the covariance into a unitless number, called the **correlation coefficient**, obtained as 

$\rho(X_{1}, X_{2}) = \frac{Cov(X_{1}, X_{2})}{\sigma_{1}\sigma_{2}}$

The correlation coefficient is a measure of linear dependence. The correlation coefficient always lies in the $[-1, +1]$ interval. A correlation of $1$ means that the two variables always move in the same direction. A correlation of $-1$ means that the two variables always move in the opposite direction.

The above-mentioned equation is also called **Pearson correlation**. Another measure is the **Spearman correlation**, which replaces the value of the variables by their rank. This nonparametric measure is less sensitive to outliers, and hence more robust than the usual correlation when there might be errors in the data.
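The difference between the two measures can be sketched on a small hypothetical data set with one extreme observation. The helpers `pearson`, `ranks`, and `spearman` are written here for illustration; the ranking helper assumes there are no ties.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank of each value (1 = smallest); assumes no ties, for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 3, 4, 5, 100]  # hypothetical data with one extreme observation

print(f"Pearson  {pearson(x, y):.3f}")
print(f"Spearman {spearman(x, y):.3f}")
```

The two series always move in the same direction, so the rank correlation is one, while the outlier pulls the Pearson coefficient well below one.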

When the correlation is zero, the two variables are said to be **uncorrelated**. Independence implies zero correlation (the reverse is not true, however).
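A standard counterexample shows why zero correlation does not imply independence. In this sketch, $X$ takes the values $-1$, $0$, $1$ with equal probability and $Y = X^{2}$; this particular choice is an assumption made for illustration.

```python
from fractions import Fraction as Fr

xs = [-1, 0, 1]   # X takes each value with probability 1/3
p = Fr(1, 3)

e_x = sum(p * x for x in xs)        # E(X)
e_y = sum(p * x ** 2 for x in xs)   # E(Y) with Y = X^2
cov = sum(p * (x - e_x) * (x ** 2 - e_y) for x in xs)

# Y is fully determined by X, so the joint probability does not factor
p_x0 = p      # P(X = 0)
p_y0 = p      # P(Y = 0), since Y = 0 only when X = 0
p_joint = p   # P(X = 0, Y = 0)

print("Cov(X, Y)           =", cov)
print("P(X=0, Y=0)         =", p_joint)
print("P(X=0) * P(Y=0)     =", p_x0 * p_y0)
```

The covariance is exactly zero because the dependence is purely nonlinear, which a measure of linear dependence cannot detect.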

## <a name="functions_of_random_variables">Functions of Random Variables</a>
Risk management is about uncovering the distribution of portfolio values. Consider a security that depends on a unique source of risk, such as a bond. The risk manager could model the change in the bond price as a random variable ($RV$) directly. The problem with this choice is that the distribution of the bond price is not stationary, because the price converges to the face value at expiration.

Instead, the practice is to model the change in yields as a random variable because its distribution is better behaved. The next step is to use the relationship between the bond price and the yield to uncover the distribution of the bond price.

This illustrates a general principle of risk management, which is to model the risk factor first, then to derive the distribution of the instrument from information about the function that links the instrument value to the risk factor.
### Linear transformation of random variables
Consider a transformation that multiplies the original random variable by a constant and adds a fixed amount, $Y = a + bX$. The expectation of $Y$ is 

$E(a + bX) = a + bE(X)$

and its variance is 

$V(a + bX) = b^{2}V(X)$

Note that adding a constant never affects the variance since the computation involves the difference between the variable and its mean. The standard deviation is 

$SD(a + bX) = |b|SD(X)$
### Sum of random variables
The expectation of the sum $Y = X_{1} + X_{2}$ can be written as 

$E(X_{1} + X_{2}) = E(X_{1}) + E(X_{2})$

and its variance is 

$V(X_{1} + X_{2}) = V(X_{1}) + V(X_{2}) + 2Cov(X_{1}, X_{2})$

When the variables are uncorrelated, the variance of the sum reduces to the sum of variances. Otherwise, we have to account for the cross-product term.
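The variance decomposition holds for sample moments as well, which gives a quick numerical check. The two return series below are hypothetical, and `pcov` is a helper defined here for the population covariance.

```python
import statistics

def pcov(u, v):
    """Population covariance of two equally long samples."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

# Hypothetical return series for two assets
x1 = [0.02, -0.01, 0.03, 0.00, -0.02]
x2 = [0.01, 0.02, -0.01, 0.03, -0.03]
total = [a + b for a, b in zip(x1, x2)]

lhs = statistics.pvariance(total)
rhs = (statistics.pvariance(x1) + statistics.pvariance(x2)
       + 2 * pcov(x1, x2))

print(f"V(X1 + X2)               = {lhs:.8f}")
print(f"V(X1) + V(X2) + 2 Cov    = {rhs:.8f}")
```

Dropping the covariance term would only reproduce the left-hand side if the two series happened to be uncorrelated in the sample.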

### Portfolios of random variables
More generally, consider a linear combination of a number of random variables. This could be a portfolio with fixed weights, for which the rate of return is 

$Y = \sum^{N}_{i=1}w_{i}X_{i}$

where $N$ is the number of assets, $X_{i}$ is the rate of return on asset $i$, and $w_{i}$ its weight.

In matrix notation:

$Y = w_{1}X_{1} + ... + w_{N}X_{N} = \begin{bmatrix} w_{1}&w_{2}&\dots &w_{N} \end{bmatrix} \begin{bmatrix} X_{1} \\ X_{2} \\ \vdots \\ X_{N} \end{bmatrix} = w'X$

The portfolio expected return is now 

$E(Y) = \mu_{p} = \sum^{N}_{i=1}w_{i}\mu_{i}$

which is a weighted average of the expected returns $\mu_{i} = E(X_{i})$. The variance is 

$V(Y) = \sigma^{2}_{p} = \sum^{N}_{i=1}w^{2}_{i}\sigma^{2}_{i} + \sum^{N}_{i=1}\sum^{N}_{j=1,j\neq i}w_{i}w_{j}\sigma_{ij} = \sum^{N}_{i=1}w^{2}_{i}\sigma^{2}_{i} + 2\sum^{N}_{i=1}\sum_{j<i}w_{i}w_{j}\sigma_{ij}$

Using matrix notation:

$\sigma^{2}_{p} = \begin{bmatrix} w_{1}&\dots&w_{N} \end{bmatrix} \begin{bmatrix} \sigma_{11}&\sigma_{12}&\sigma_{13}&\dots&\sigma_{1N} \\ \vdots & & & & \vdots \\ \sigma_{N1}&\sigma_{N2}&\sigma_{N3}&\dots&\sigma_{NN} \end{bmatrix} \begin{bmatrix} w_{1} \\ \vdots \\ w_{N} \end{bmatrix} = w'\Sigma w$

Define $\Sigma$ as the covariance matrix. This is a useful expression to describe the risk of the total portfolio.
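A two-asset example makes the quadratic form concrete. The weights, expected returns, volatilities, and correlation below are hypothetical inputs, and plain Python lists stand in for matrices to keep the sketch dependency-free.

```python
import math

# Hypothetical two-asset portfolio
w = [0.6, 0.4]          # portfolio weights
mu = [0.05, 0.08]       # expected returns
sigma = [0.10, 0.20]    # standard deviations
rho = 0.3               # correlation between the two assets

# Covariance matrix: variances on the diagonal, rho * s1 * s2 off the diagonal
off_diagonal = rho * sigma[0] * sigma[1]
cov = [[sigma[0] ** 2, off_diagonal],
       [off_diagonal, sigma[1] ** 2]]

def portfolio_mean(w, mu):
    """Weighted average of the expected returns."""
    return sum(wi * mi for wi, mi in zip(w, mu))

def portfolio_variance(w, cov):
    """Quadratic form w' Sigma w, written as a double sum."""
    n = len(w)
    return sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))

variance_p = portfolio_variance(w, cov)
print(f"expected return    {portfolio_mean(w, mu):.4f}")
print(f"variance           {variance_p:.5f}")
print(f"standard deviation {math.sqrt(variance_p):.4f}")
```

With a correlation below one, the portfolio standard deviation is lower than the weighted average of the two volatilities, which is the diversification effect the covariance matrix captures.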

### Product of random variables
The expectation of the product $Y = X_{1}X_{2}$ can be written as

$E(X_{1} X_{2}) = E(X_{1})E(X_{2}) + Cov(X_{1}, X_{2})$

When the variables are independent, this reduces to the product of the means. 

The variance is more complex to evaluate. With independence, it reduces to:

$V(X_{1} X_{2}) = E(X_{1})^{2}V(X_{2}) + V(X_{1})E(X_{2})^{2} + V(X_{1})V(X_{2})$
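The product formulas can be checked exactly by enumerating a small example. The two discrete distributions below are hypothetical and are assumed to be independent, so each joint probability is the product of the individual probabilities.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical independent discrete variables, stored as {value: probability}
x1 = {1: Fr(1, 2), 3: Fr(1, 2)}
x2 = {0: Fr(1, 4), 4: Fr(3, 4)}

def mean(dist):
    return sum(p * x for x, p in dist.items())

def variance(dist):
    m = mean(dist)
    return sum(p * (x - m) ** 2 for x, p in dist.items())

# Distribution of Y = X1 * X2 under independence
y = {}
for (a, pa), (b, pb) in product(x1.items(), x2.items()):
    y[a * b] = y.get(a * b, 0) + pa * pb

formula = (mean(x1) ** 2 * variance(x2)
           + variance(x1) * mean(x2) ** 2
           + variance(x1) * variance(x2))

print("E(Y) direct    =", mean(y), " product of means =", mean(x1) * mean(x2))
print("V(Y) direct    =", variance(y), " formula =", formula)
```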

### Distributions of transformations of random variables
The distribution of the transformed variable $Y = g(X)$ is usually complicated for all but the simplest transformations $g(\cdot)$ and densities $f(X)$. Even if there is no closed-form solution for the density, we can describe the cumulative distribution function of $Y$ when $g(X)$ is a one-to-one transformation from $X$ into $Y$. This implies that the function can be inverted, or that for a given $y$, we can find $x$ such that $x = g^{-1}(y)$. We can then write

$P[Y \leq y] = P[g(X) \leq y] = P[X \leq g^{-1}(y)] = F_{x}(g^{-1}(y))$

where $F_{x}(\cdot)$ is the cumulative distribution function of $X$. Here, we assumed the relationship is positive, that is, $g$ is increasing. Otherwise, the right-hand term is changed to $1 - F_{x}(g^{-1}(y))$.

This allows us to derive the quantile of the bond price from information about the probability distribution of the yield. Suppose we consider a zero-coupon bond, for which the market value $V$ is 

$V = \frac{100}{(1 + r)^{T}}$

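where $r$ is the yield and $T$ is the maturity in years. This gives the price as a function of the yield, $V = g(r)$, whose inverse is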

$r = (100/V)^{1/T} - 1$
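As a sketch of how a price quantile follows from a yield quantile, the snippet below assumes a 30-year zero-coupon bond and a normal distribution for the yield with a mean of 6% and a volatility of 0.8%. These numbers are hypothetical. Because the price falls when the yield rises, the lower tail of the price corresponds to the upper tail of the yield.

```python
from statistics import NormalDist

T = 30  # maturity in years (illustrative)

def price(r, T=T):
    """Zero-coupon bond value V = 100 / (1 + r)^T."""
    return 100 / (1 + r) ** T

def yield_from_price(v, T=T):
    """Inverse function r = (100 / V)^(1/T) - 1."""
    return (100 / v) ** (1 / T) - 1

# Hypothetical yield distribution: normal, mean 6%, volatility 0.8%
yield_dist = NormalDist(mu=0.06, sigma=0.008)

r_high = yield_dist.inv_cdf(0.95)  # yield exceeded only 5% of the time
v_low = price(r_high)              # matching lower quantile of the price

# Decreasing relationship: P[V <= v] = 1 - F_r(g^{-1}(v))
prob = 1 - yield_dist.cdf(yield_from_price(v_low))

print(f"price at r = 6%:      {price(0.06):.2f}")
print(f"95% yield quantile:   {r_high:.4%}")
print(f"5% price quantile:    {v_low:.2f}")
print(f"P[V <= {v_low:.2f}] = {prob:.4f}")
```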

Unfortunately, this method cannot be easily extended. For general density functions and transformations, risk managers turn to numerical methods, especially when the number of random variables is large. This is why credit risk models all describe the distribution of credit losses through simulations.

## <a name="important_distribution_functions">Important Distribution Functions</a>
### Uniform distribution
$f(x) = \frac{1}{b - a}$, $a \leq x \leq b$

This density function is constant and indeed integrates to unity. This distribution puts the same weight on each observation within the allowable range. We denote this distribution as $U(a, b)$. Its mean and variance are given by

$E(X) = \frac{a + b}{2}$

$V(X) = \frac{(b - a)^{2}}{12}$

The uniform distribution $U(0, 1)$ is widely used as a starting distribution for generating random variables from any distribution $F(Y)$ in simulations. We need to have analytical formulas for the $p.d.f$ $f(Y)$ and its cumulative distribution $F(Y)$. As any cumulative distribution function ranges from zero to unity, we first draw $X$ from $U(0, 1)$ and then compute $y = F^{-1}(x)$. The random variable $Y$ will then have the desired distribution $f(Y)$.
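The inversion step can be sketched with a distribution whose inverse is available in closed form. The exponential distribution, with $F(y) = 1 - e^{-\lambda y}$ and $F^{-1}(x) = -\ln(1 - x)/\lambda$, is used here purely as an illustration; it is not discussed in the text, and the rate $\lambda = 2$ is an arbitrary choice.

```python
import math
import random
import statistics

lam = 2.0  # rate of the illustrative exponential distribution

def inverse_cdf(x, lam=lam):
    """F^{-1}(x) for F(y) = 1 - exp(-lam * y), valid for 0 <= x < 1."""
    return -math.log(1.0 - x) / lam

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Step 1: draw x from U(0, 1); step 2: transform with y = F^{-1}(x)
draws = [inverse_cdf(rng.random()) for _ in range(100_000)]

sample_mean = statistics.fmean(draws)
print(f"sample mean      {sample_mean:.4f}")
print(f"theoretical mean {1 / lam:.4f}")
```

The sample mean should be close to the theoretical mean $1/\lambda$, which is a quick sanity check that the transformed draws follow the target distribution.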

### Normal distribution
The daily rate of return on a stock has a distribution similar to the normal $p.d.f$. The normal distribution can be characterized by its first two moments only, the mean $\mu$ and variance $\sigma^{2}$. The first parameter represents the location; the second, the dispersion. 

$f(x) = \frac{1}{\sqrt{2\pi \sigma^{2}}} \exp\left[-\frac{1}{2\sigma^{2}}(x - \mu)^{2}\right]$

Its mean is $E[X] = \mu$ and variance $V[X] = \sigma^{2}$. We denote this distribution as $N(\mu, \sigma^2)$. Because the function can be fully specified by these two parameters, it's called a **parametric function**.

Instead of having to deal with different parameters, it's often more convenient to use a **standard normal variable**, denoted $\epsilon$, which has been standardized, or normalized, so that $E(\epsilon) = 0$ and $V(\epsilon) = \sigma(\epsilon) = 1$.

First, note that the function is symmetrical around the mean. Its mean of zero is the same as its **mode** (which is also the most likely, or highest, point on this curve) and **median** (which is such that the area to the left is a $50\%$ probability). The skewness of a normal distribution is $0$, which indicates that it is symmetrical around the mean. The kurtosis of a normal distribution is $3$. Distributions with fatter tails have a greater kurtosis coefficient.

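The quantile $-\alpha$ of the standard normal at confidence level $c$ is defined from the right-tail probability:

$\int^{\infty}_{-\alpha}f(\epsilon)d\epsilon = c$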

For example, $-\alpha = -1.645$ is the quantile that corresponds to a $95\%$ probability (confidence level).
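These quantiles can be reproduced with `statistics.NormalDist` from the Python standard library. The helper name `alpha_for` is an illustrative choice; it converts a confidence level $c$ into the left-tail probability $1 - c$ before inverting the cumulative distribution.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def alpha_for(c):
    """Return alpha such that the area to the right of -alpha equals c."""
    return -std_normal.inv_cdf(1 - c)

for c in (0.95, 0.99):
    alpha = alpha_for(c)
    right_tail = 1 - std_normal.cdf(-alpha)
    print(f"c = {c:.2f}: alpha = {alpha:.3f}, "
          f"area to the right of -alpha = {right_tail:.4f}")
```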

This distribution plays a central role in finance because it represents adequately the behavior of many financial variables. For instance, it enters the Black-Scholes option pricing formula where the function $N(\cdot)$ represents the cumulative standardized normal distribution function.

The distribution of any normal variable can then be recovered from that of the standard normal, by defining

$X = \mu + \epsilon\sigma$

We can show that $X$ has indeed the desired moments, as $E(X) = \mu + E(\epsilon)\sigma = \mu$ and $V(X) = V(\epsilon)\sigma^{2} = \sigma^{2}$.

Define the random variable as the change in the dollar value of a portfolio. The expected value is $E(X) = \mu$. To find the quantile of $X$ at the specified confidence level $c$, we replace $\epsilon$ by $-\alpha$. This gives $Q(X, c) = \mu - \alpha\sigma$. We then can compute $VAR$ as

$VAR = E(X) - Q(X, c) = \mu - (\mu - \alpha\sigma) = \alpha\sigma$
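A short numerical sketch of this result follows, assuming a hypothetical portfolio whose change in value is normal with a mean of $1 million and a standard deviation of $10 million. The quantile built from $\mu - \alpha\sigma$ is compared with the one obtained directly from the $N(\mu, \sigma^{2})$ distribution.

```python
from statistics import NormalDist

# Hypothetical portfolio, in $ millions
mu, sigma = 1.0, 10.0
c = 0.95  # confidence level

alpha = -NormalDist().inv_cdf(1 - c)   # standard normal deviate, about 1.645
quantile = mu - alpha * sigma          # Q(X, c) = mu - alpha * sigma
var_normal = mu - quantile             # VAR = E(X) - Q(X, c) = alpha * sigma

# Cross-check against the quantile of N(mu, sigma^2) computed directly
check = NormalDist(mu, sigma).inv_cdf(1 - c)

print(f"alpha    = {alpha:.3f}")
print(f"Q(X, c)  = {quantile:.2f} (direct: {check:.2f})")
print(f"VAR      = {var_normal:.2f}")
```

The mean drops out of the final figure, so under normality the relative $VAR$ depends only on the volatility and the chosen confidence level.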

The normal distribution is one of the few distributions that is stable under addition. In other words, a linear combination of jointly normally distributed random variables has a normal distribution. This is extremely useful because we need to know only the mean and variance of the portfolio to reconstruct its whole distribution.
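The standard library's `NormalDist` supports scaling by a constant and adding independent normal variables, which gives a compact illustration of this stability. The two asset distributions and the weights are hypothetical, and independence is assumed here as a special case of joint normality.

```python
from statistics import NormalDist

# Hypothetical, independent, normally distributed asset returns
x1 = NormalDist(mu=0.05, sigma=0.10)
x2 = NormalDist(mu=0.08, sigma=0.20)

# A fixed-weight portfolio of the two assets is again normal
portfolio = 0.6 * x1 + 0.4 * x2

print(f"portfolio mean  {portfolio.mean:.4f}")
print(f"portfolio stdev {portfolio.stdev:.4f}")

# The whole distribution follows from these two numbers, e.g. the 5% quantile
q05 = portfolio.inv_cdf(0.05)
print(f"5% quantile     {q05:.4f}")
```

With correlated assets the same logic applies, but the portfolio variance would need the covariance term, which `NormalDist` arithmetic does not model.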