# Modeling Risk Factors
We are going to analyze the distribution of risk factors used in financial risk management. A common practice is to use the volatility as a single measure of dispersion. More generally, risk managers need to consider the entire shape of the distribution, as well as potential variation of this distribution over time.

The normal distribution is a useful starting point due to its attractive properties. Unfortunately, most financial time series are characterized by fatter tails than the normal distribution. In addition, there is ample empirical evidence that risk changes in a predictable fashion. This phenomenon, called **volatility clustering**, could also explain the appearance of fat tails: extreme observations tend to be drawn from periods of high volatility, so pooling periods of low and high volatility produces fatter tails than a single normal distribution would.

* [Real Data](#real-data)
* [Normal and Lognormal Distributions](#normal-and-lognormal-distributions)
* [Distributions with Fat Tails](#distributions-with-fat-tails)
* [Time Variation in Risk](#time-variation-in-risk)

## <a name="real-data">Real Data</a>

### Measuring Returns
We observe movements in the daily yen/dollar exchange rate and wish to characterize the distribution of tomorrow's exchange rate. The risk manager's job is to assess the range of potential gains and losses on a trader's position. There is a sequence of past prices $P_{0}, P_{1}, ...,P_{t}$, from which the distribution of tomorrow's price, $P_{t+1}$, should be inferred.

The truly random component in tomorrow's price is not its level, but rather its change relative to today's price. We measure the relative rate of change in the spot price:

$r_{t} = (P_{t} - P_{t-1})/P_{t-1}$

Alternatively, we could construct the logarithm of the price ratio:

$R_{t} = \ln(P_{t}/P_{t-1})$

which is equivalent to using continuous instead of discrete compounding. The log return is related to the discrete return by

$R_{t} = \ln[1 + (P_{t} - P_{t-1})/P_{t-1}] = \ln(1 + r_{t})$

Because $\ln(1 + x)$ is close to $x$ when $x$ is small, $R_{t}$ is close to $r_{t}$ provided the return is small. For daily data, there is typically little difference between $R_{t}$ and $r_{t}$.
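
As an illustration, the sketch below computes both return definitions from a short price sequence. The prices are hypothetical values chosen to look like daily moves, not actual yen/dollar data.

```python
import math

# Hypothetical daily prices P_0, ..., P_t (illustrative values, not market data)
prices = [100.0, 101.0, 100.5, 99.8, 100.2]

# Discrete returns r_t = (P_t - P_{t-1}) / P_{t-1}
r = [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]

# Log returns R_t = ln(P_t / P_{t-1}), which equals ln(1 + r_t)
R = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

for rt, Rt in zip(r, R):
    # Since ln(1 + x) <= x, the discrete return is never below the log return
    print(f"r = {rt:+.4%}   R = {Rt:+.4%}   r - R = {rt - Rt:.1e}")
```

For moves of about 1% or less, the gap $r_{t} - R_{t}$ stays below $10^{-4}$, which is why the two definitions are used interchangeably for daily data.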

The next question is whether the sequence of variables $r_{t}$ can be viewed as independent observations. Independent observations have the very nice property that their joint distribution is the product of their marginal distributions, which considerably simplifies the analysis. The obvious question is whether this assumption is a workable approximation. In fact, there are good economic reasons to believe that rates of change on financial prices are close to independent.

The hypothesis of **efficient markets** postulates that current prices convey all relevant information about the asset. If so, any change in the asset price must be due to news, or events that are by definition impossible to forecast (otherwise, the event would not be news). This implies that changes in prices are unpredictable and, hence, satisfy our definition of independent random variables.

This hypothesis, also known as **random walk** theory, implies that the conditional distribution of returns depends only on current prices, and not on the previous history of prices. If so, technical analysis, which tries to forecast price movements from past price patterns, must be a fruitless exercise. If, in addition, the distribution of returns is constant over time, the variables are said to be **independent and identically distributed** (i.i.d.).

### Time Aggregation
It is often necessary to translate parameters over a given horizon to another horizon. For example, we have data for daily returns, from which we compute a daily volatility that we want to extend to a monthly volatility. This is a **time aggregation** problem.

Returns can be easily aggregated when we use the log of the price ratio, because the log of a product is the sum of the logs of the individual terms. Over two periods, for instance, the two-period log return is the sum of the one-period log returns:

$R_{t,2} = \ln(P_{t}/P_{t-2}) = \ln(P_{t}/P_{t-1}) + \ln(P_{t-1}/P_{t-2}) = R_{t} + R_{t-1}$

The expected return and variance are then $E(R_{t,2}) = E(R_{t-1}) + E(R_{t})$ and $V(R_{t,2}) = V(R_{t-1}) + V(R_{t}) + 2Cov(R_{t-1}, R_{t})$. Assuming returns are uncorrelated (i.e., that the covariance term is zero) and have identical distributions across days, we have $E(R_{t,2}) = 2E(R_{t})$ and $V(R_{t,2}) = 2V(R_{t})$.
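
Both results can be checked with a short simulation. The sketch assumes i.i.d. normal daily log returns with hypothetical parameters; it rebuilds prices from the simulated returns to confirm that log returns add up, then compares the two-day mean and variance with their one-day counterparts.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
mu, sigma = 0.0005, 0.01          # hypothetical daily mean and volatility of log returns
n = 200_000                        # number of simulated two-day periods

# Two independent, identically distributed daily log returns per period
R_prev = rng.normal(mu, sigma, n)  # R_{t-1}
R_curr = rng.normal(mu, sigma, n)  # R_t

# Rebuild prices P_{t-2} = 100, P_{t-1}, P_t and check that log returns add up
P0 = 100.0
P1 = P0 * np.exp(R_prev)
P2 = P1 * np.exp(R_curr)
R_two_day = np.log(P2 / P0)
assert np.allclose(R_two_day, R_prev + R_curr)

print("mean ratio:", R_two_day.mean() / mu)          # close to 2
print("variance ratio:", R_two_day.var() / sigma**2)  # close to 2
```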

More generally, define $T$ as the number of steps. With $\mu$ and $\sigma$ the one-period expected return and volatility, the multiple-period expected return and volatility are 

$\mu_{T} = \mu T$

$\sigma_{T} = \sigma \sqrt{T}$

When successive returns are uncorrelated, the volatility increases as the horizon extends following the square root of time.

Assume now that the distribution is stable under addition, which means that it keeps the same shape whether over one period or over multiple periods. This is the case for the normal distribution. If so, we can use the same multiplier $\alpha$, which corresponds to a selected confidence level, for the one-period and the $T$-period return. With $W$ the initial value of the position, the multiple-period $VAR$ is 

$VAR_{T} = \alpha(\sigma \sqrt{T})W = VAR_{1}\sqrt{T}$

In other words, extension to a multiple period follows a square root of time rule. In summary, the square root of time rule applies to parametric $VAR$ under the following conditions:
* The distribution is the same at each period (i.e., there is no predictable time variation in expected return nor in risk).
* Returns are uncorrelated across each period.
* The distribution has the same form for one- and $T$-period returns, that is, it is stable under addition, as is the normal distribution.
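
For a numerical illustration, assume a hypothetical position of $W = \$1{,}000{,}000$, a daily volatility of $1\%$, and a $99\%$ one-tailed confidence level under the normal distribution. The sketch uses Python's `statistics.NormalDist` to obtain $\alpha$ and then applies the square root of time rule.

```python
from math import sqrt
from statistics import NormalDist

W = 1_000_000        # hypothetical position value in dollars
sigma_daily = 0.01   # hypothetical daily volatility
confidence = 0.99

# Multiplier alpha for the chosen one-tailed confidence level (about 2.326 at 99%)
alpha = NormalDist().inv_cdf(confidence)

var_1 = alpha * sigma_daily * W                    # one-day VAR
for T in (1, 10, 21):
    var_T = alpha * (sigma_daily * sqrt(T)) * W    # alpha (sigma sqrt(T)) W
    print(f"T = {T:2d}: VAR = {var_T:,.0f}  (VAR_1 * sqrt(T) = {var_1 * sqrt(T):,.0f})")
```

The two columns agree by construction; the scaling is only valid under the three conditions listed above.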

If returns are not independent, we may still be able to characterize longer-term risk by modeling the dependence. For instance, when returns follow a first-order autoregressive process, 

$R_{t} = \rho R_{t-1} + u_{t}$

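we can write the variance of two-day returns, with $\sigma^{2}$ denoting the variance of each daily return and $Cov(R_{t}, R_{t-1}) = \rho\sigma^{2}$, as 

$V[R_{t} + R_{t-1}] = \sigma^{2} + \sigma^{2} + 2\rho\sigma^{2} = \sigma^{2} \times 2[1 + \rho]$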

In this case, 

$VAR_{2} = \alpha(\sigma \sqrt{2(1 + \rho)})W = [VAR_{1}\sqrt{2}]\sqrt{1 + \rho}$

Because we are considering correlations in the time series of the same variable, $\rho$ is called the **autocorrelation coefficient**, or the **serial autocorrelation coefficient**. A positive value for $\rho$ describes a situation where a movement in one direction is likely to be followed by another in the same direction. This implies that markets display **trends**, or **momentum**. In this case, the longer-term volatility increases faster than with the usual square root of time rule.

A negative value for $\rho$, by contrast, describes a situation where a movement in one direction is likely to be reversed later. This is an example of **mean reversion**. In this case, the longer-term volatility increases more slowly than with the usual square root of time rule.
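
The two-day variance formula can be checked by simulation. The sketch below assumes a stationary AR(1) process with normal innovations $u_{t}$, whose variance is set to $\sigma^{2}(1 - \rho^{2})$ so that each daily return has variance $\sigma^{2}$; the values of $\rho$ and $\sigma$ are hypothetical.

```python
import numpy as np

def two_day_variance_ratio(rho, sigma=0.01, n=200_000, seed=0):
    """Simulate R_t = rho * R_{t-1} + u_t and return V[R_t + R_{t-1}] / sigma^2."""
    rng = np.random.default_rng(seed)
    # Innovation volatility chosen so that the stationary variance of R_t is sigma^2
    u = rng.normal(0.0, sigma * np.sqrt(1 - rho**2), n)
    R = np.empty(n)
    R[0] = rng.normal(0.0, sigma)          # start from the stationary distribution
    for t in range(1, n):
        R[t] = rho * R[t - 1] + u[t]
    two_day = R[1:] + R[:-1]               # overlapping two-day returns
    return two_day.var() / sigma**2

for rho in (-0.3, 0.0, 0.3):
    print(f"rho = {rho:+.1f}: simulated {two_day_variance_ratio(rho):.3f}, "
          f"formula 2(1 + rho) = {2 * (1 + rho):.3f}")
```

With $\rho = 0.3$ (momentum) the ratio is near $2.6$, above the i.i.d. value of $2$; with $\rho = -0.3$ (mean reversion) it is near $1.4$.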

### Portfolio Aggregation
Let us now turn to aggregation of returns across assets. Consider, for example, an equity portfolio consisting of investments in $N$ stocks. Define the number of shares of stock $i$ held as $q_{i}$, with unit price $S_{i}$. The portfolio value at time $t$ is then

$W_{t} = \sum^{N}_{i=1}q_{i}S_{i,t}$

We can write the weight assigned to asset $i$ as

$w_{i,t} = \frac{q_{i}S_{i,t}}{W_{t}}$

which by construction sum to unity. Using weights, however, rules out situations with zero net investment, $W_{t} = 0$, such as some derivatives positions. But we could have positive and negative weights if short selling is allowed, or weights greater than one if the portfolio can be leveraged.

The next period, the portfolio value is 

$W_{t+1} = \sum^{N}_{i=1}q_{i}S_{i,t+1}$

assuming that the unit price incorporates any income payment. The gross, or dollar, return is then

$W_{t+1} - W_{t} = \sum^{N}_{i=1}q_{i}(S_{i,t+1} - S_{i,t})$

and the rate of return is

$\frac{W_{t+1} - W_{t}}{W_{t}} = \sum^{N}_{i=1} \frac{q_{i}S_{i,t}}{W_{t}} \frac{S_{i,t+1} - S_{i,t}}{S_{i,t}} = \sum^{N}_{i=1} w_{i,t} \frac{S_{i,t+1} - S_{i,t}}{S_{i,t}}$

So, the portfolio rate of return is a linear combination of the asset returns

$r_{p,t+1} = \sum^{N}_{i=1}w_{i,t}r_{i,t+1}$

The dollar return is then 

$W_{t+1} - W_{t} = [\sum^{N}_{i=1}w_{i,t}r_{i,t+1}]W_{t}$

and has a normal distribution if the individual returns are jointly normally distributed.

Alternatively, we could express the individual positions in dollar terms,

$x_{i,t} = w_{i,t} W_{t} = q_{i}S_{i,t}$

The dollar return is also, using dollar amounts,

$W_{t+1} - W_{t} = [\sum^{N}_{i=1}x_{i,t}r_{i,t+1}]$

The variance of the portfolio dollar return is 

$V[W_{t+1} - W_{t}] = x'\Sigma x$

where $x$ is the vector of dollar positions $x_{i,t}$ and $\Sigma$ is the covariance matrix of the asset returns.

Because the portfolio follows a normal distribution, it is fully characterized by its expected return and variance. The portfolio $VAR$ is then 

$VAR = \alpha\sqrt{x'\Sigma x}$

where $\alpha$ depends on the confidence level and the selected density function.
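
A minimal sketch of this computation with NumPy follows. The two dollar positions, the volatilities, and the correlation are hypothetical values for illustration, and $\alpha$ is taken from the normal distribution at a $95\%$ confidence level.

```python
import numpy as np
from statistics import NormalDist

# Hypothetical dollar positions x_i = q_i * S_i (in $)
x = np.array([600_000.0, 400_000.0])

# Hypothetical daily volatilities and correlation of the asset returns
vol = np.array([0.012, 0.018])
corr = np.array([[1.0, 0.3],
                 [0.3, 1.0]])
Sigma = np.outer(vol, vol) * corr          # covariance matrix of returns

variance = x @ Sigma @ x                   # x' Sigma x, variance of the dollar return
alpha = NormalDist().inv_cdf(0.95)         # about 1.645 for a normal density
VAR = alpha * np.sqrt(variance)
print(f"portfolio dollar volatility: {np.sqrt(variance):,.0f}")
print(f"95% one-day VAR: {VAR:,.0f}")
```

Because the correlation is below one, the portfolio $VAR$ is smaller than the sum of the two individual $VAR$s, which is the diversification effect captured by the off-diagonal terms of $\Sigma$.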

## <a name="normal-and-lognormal-distributions">Normal and Lognormal Distributions</a>

### Normal Distribution
The normal, or Gaussian, distribution is usually the first choice when modeling asset returns. This distribution plays a special role in statistics, as it is easy to handle and is stable under addition, meaning that a combination of jointly normal variables is itself normal. It also provides the limiting distribution of the average of independent random variables (through the central limit theorem).

Empirically, the normal distribution provides a rough, first-order approximation to the distribution of many random variables: rates of change in currency prices, rates of change in stock prices, rates of change in bond prices, changes in yields, and rates of change in commodity prices. All of these are characterized by many occurrences of small moves and fewer occurrences of large moves. This provides a rationale for a distribution with more weight in the center, such as the bell-shaped normal distribution. For many applications, this is a sufficient approximation. This may not be appropriate for measuring tail risk, however.

### Computing Returns
Given the current price $P_{0}$ and the random new price $P_{1}$, define $r = (P_{1} - P_{0})/P_{0}$ as the rate of return. We can start with the assumption that this random variable is drawn from a normal distribution,

$r \sim \Phi(\mu, \sigma)$

with some mean $\mu$ and standard deviation $\sigma$. Turning to prices, we have $P_{1} = P_{0}(1 + r)$ and 

$P_{1} \sim P_{0} + \Phi(P_{0}\mu, P_{0}\sigma)$

For instance, starting from a stock price of $\$100$, if $\mu = 0\%$ and $\sigma = 15\%$, we have $P_{1} \sim \$100 + \Phi(\$0, \$15)$.

However, in this case, the normal distribution cannot even be theoretically correct. Because of limited liability, stock prices cannot go below zero. Similarly, commodity prices and yields cannot turn negative. This is why another popular distribution is the **lognormal distribution**, which is such that

$R = ln(P_{1}/P_{0}) \sim \Phi(\mu, \sigma)$

Inverting the logarithm, the price is given by $P_{1} = P_{0}\exp(R)$, which precludes prices from turning negative, as the exponential function is always positive.

Consider the normal and lognormal distributions over a one-year horizon with $\sigma = 15\%$ annually. The two distributions are very similar, except in the tails: the lognormal is skewed to the right.

The difference between the two distributions is driven by the size of the volatility parameter over the horizon. Small values of this parameter imply that the distributions are virtually identical. This can happen either when the asset is not very risky, that is, when the annual volatility is small, or when the horizon is very short. In this situation, there is very little chance of prices turning negative. The limited liability constraint is not important.
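
The role of the horizon can be checked numerically. The sketch assumes $\mu = 0$, a starting price of $\$100$, the $15\%$ annual volatility above, and 252 trading days per year to scale it to one day (an assumption); it compares the 1st and 99th percentiles of $P_{1}$ under the two models.

```python
from math import exp, sqrt
from statistics import NormalDist

def percentiles(sigma, P0=100.0, q=0.99):
    """Return (normal, lognormal) lower/upper q-percentiles of P_1, assuming mu = 0."""
    z = NormalDist().inv_cdf(q)
    # Normal model: P_1 = P_0 (1 + r) with r ~ N(0, sigma)
    normal = (P0 * (1 - z * sigma), P0 * (1 + z * sigma))
    # Lognormal model: P_1 = P_0 exp(R) with R ~ N(0, sigma)
    lognormal = (P0 * exp(-z * sigma), P0 * exp(z * sigma))
    return normal, lognormal

sigma_annual = 0.15
for label, sigma in (("daily", sigma_annual / sqrt(252)), ("annual", sigma_annual)):
    (n_lo, n_hi), (l_lo, l_hi) = percentiles(sigma)
    print(f"{label:>6}: normal [{n_lo:.2f}, {n_hi:.2f}]  lognormal [{l_lo:.2f}, {l_hi:.2f}]")
```

Over one day the two bands differ by a few cents. Over one year the lognormal band, roughly $70.5$ to $141.8$, is shifted up and stretched to the right relative to the normal band of roughly $65.1$ to $134.9$.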

|    | Daily | Annual |
|:---|:-----:|:------:|
| Initial price | $100$ | $100$ |
| Ending price | $101$ | $115$ |
| Discrete return (%) | $1.0000$ | $15.0000$ |
| Log return (%) | $0.9950$ | $13.9762$ |
| Relative difference | $0.50\%$ | $7.33\%$ |

The normal and lognormal distributions are very similar for short horizons or low volatilities.

The table above compares returns computed over a one-day horizon and a one-year horizon. The one-day returns are $1.000\%$ and $0.995\%$ for discrete and log returns, respectively, a relative difference of $0.5\%$, which is minor. Over one year, the returns are $15.00\%$ and $13.98\%$, a relative difference of $7.33\%$, which is far more significant.
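
The table values can be reproduced with a few lines of Python. The relative differences of $0.50\%$ and $7.33\%$ come out when the gap between the two returns is divided by the log return, so the sketch uses that convention.

```python
from math import log

def compare_returns(p0, p1):
    """Return discrete and log returns (in %) and their relative difference (in %)."""
    discrete = (p1 - p0) / p0
    log_ret = log(p1 / p0)
    rel_diff = (discrete - log_ret) / log_ret   # measured relative to the log return
    return 100 * discrete, 100 * log_ret, 100 * rel_diff

for label, p1 in (("Daily", 101.0), ("Annual", 115.0)):
    d, l, rd = compare_returns(100.0, p1)
    print(f"{label:>6}: discrete {d:.4f}%  log {l:.4f}%  relative difference {rd:.2f}%")
```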

## <a name="distributions-with-fat-tails">Distributions with Fat Tails</a>
Perhaps the most serious problem with the normal distribution is the fact that its tails disappear too fast, faster than what is empirically observed in financial data. We typically observe that every market experiences one or more daily moves of four standard deviations or more per year. Such frequency is incompatible with a normal distribution: under normality, the probability of a move of four standard deviations or more in a given direction is $0.0032\%$ on any one day, which implies a frequency of once every $125$ years. Yet in any year, there is usually at least one market that has a daily move greater than $10$ standard deviations.
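
The $0.0032\%$ and $125$-year figures can be verified with the standard library. The sketch computes the one-sided normal tail probability beyond four standard deviations and converts it into an expected waiting time, assuming 252 trading days per year.

```python
from math import erfc, sqrt

# One-sided probability that a standard normal variable falls below -4 (or above +4)
p = 0.5 * erfc(4 / sqrt(2))
trading_days = 252            # assumed number of trading days per year
years_between = 1 / (p * trading_days)

print(f"P(move beyond 4 sd in one direction) = {p:.4%}")
print(f"expected once every {years_between:.0f} years")
```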

The empirical observation can be explained in a number of ways:
1. The true distribution has fatter tails (e.g., the Student's $t$).
2. The observations are drawn from a mix of distributions (e.g., a mix of two normals, one with low risk, the other with high risk).
3. The distribution is nonstationary. 

The Student's $t$ density has fatter tails, which better reflect the occurrences of extreme observations in empirical financial data.
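
To see how much fatter the tails are, the sketch below compares the probability of a four-standard-deviation fall under the normal and under a Student's $t$ with 4 degrees of freedom. The choice of 4 degrees of freedom is an illustrative assumption; it uses the closed-form CDF available for that case so that only the standard library is needed, and rescales the threshold because a $t(4)$ variable has variance $4/(4-2) = 2$.

```python
from math import erfc, sqrt

def t4_cdf(t):
    """CDF of the Student's t distribution with 4 degrees of freedom (closed form)."""
    s = 1 + t * t / 4
    return 0.5 + 0.375 * (t / sqrt(s)) * (1 - (t * t / s) / 12)

k = 4.0                                  # size of the move in standard deviations
normal_tail = 0.5 * erfc(k / sqrt(2))    # P(Z < -k) for a standard normal
# k standard deviations of a t(4) variable correspond to k * sqrt(2) in t units
t_tail = 1 - t4_cdf(k * sqrt(2))

print(f"normal: {normal_tail:.5%}   Student's t(4): {t_tail:.3%}   "
      f"ratio: {t_tail / normal_tail:.0f}x")
```

Under these assumptions the $t(4)$ assigns a probability of about $0.24\%$ to the move, roughly 75 times the normal value, which corresponds to about once every year and a half of trading days rather than once every 125 years.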

## <a name="time-variation-in-risk">Time Variation in Risk</a>

Fat tails can also occur when risk factors are drawn from a distribution with time-varying volatility. To be practical, this time variation must have some predictability.

### Moving Average
Consider a traditional problem where a risk manager observes a sequence of $T$ returns $r_{t}$, from which the variance must be estimated. To simplify, ignore the mean return. At time $t$, the traditional variance estimate is 

$\sigma_{t}^{2} = (1/T)\sum^{T}_{i=1}r^{2}_{t-i}$

This is a simple average where the weight on each past observation is $w_{i} = 1/T$. This may not be the best use of the data, however, especially if more recent observations are more relevant for the next day.
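
The equally weighted estimator can be written directly from the formula. In the sketch below, the function name and the return series are hypothetical; the series is a calm stretch followed by a volatile one, to show how the choice of window $T$ changes the estimate.

```python
def moving_average_variance(returns, T):
    """Equally weighted variance estimate from the last T returns (mean ignored).

    Implements sigma_t^2 = (1/T) * sum_{i=1..T} r_{t-i}^2, where `returns`
    is ordered oldest to newest and ends with r_{t-1}.
    """
    if len(returns) < T:
        raise ValueError("need at least T observations")
    window = returns[-T:]                  # r_{t-T}, ..., r_{t-1}
    return sum(r * r for r in window) / T  # each weight w_i = 1/T

# Hypothetical daily returns: a calm stretch followed by a volatile one
returns = [0.005, -0.004, 0.006, -0.005, 0.03, -0.025, 0.035]
for T in (3, 7):
    print(f"T = {T}: volatility estimate {moving_average_variance(returns, T) ** 0.5:.4f}")
```

The short window reacts fully to the recent large moves, while the longer window dilutes them with the calm observations, which illustrates why equal weighting may not be the best use of the data.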

Plotting daily returns on the S&P 500 index reveals **clustering** in volatility after the Lehman Brothers bankruptcy in September 2008: large moves, of either sign, tend to be followed by further large moves.

### GARCH
A practical model for volatility clustering is the **generalized autoregressive conditional heteroskedastic (GARCH)** model. This class of models assumes that the return at time $t$ has a particular distribution such as the normal, conditional on parameters $\mu_{t}$ and $\sigma_{t}$:

$r_{t} \sim \Phi(\mu_{t},\sigma_{t})$
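
The text has not yet specified how $\sigma_{t}$ evolves. As an illustration only, the sketch below assumes the widely used GARCH(1,1) variance equation $\sigma_{t}^{2} = \omega + a\,r_{t-1}^{2} + b\,\sigma_{t-1}^{2}$ with $\mu_{t} = 0$; the names `a` and `b` and all parameter values are hypothetical. It simulates returns and measures clustering through the autocorrelation of squared returns.

```python
import numpy as np

def simulate_garch11(omega, a, b, n, seed=0):
    """Simulate r_t ~ N(0, sigma_t) with sigma_t^2 = omega + a*r_{t-1}^2 + b*sigma_{t-1}^2."""
    rng = np.random.default_rng(seed)
    sigma2 = np.empty(n)
    r = np.empty(n)
    sigma2[0] = omega / (1 - a - b)        # start at the long-run (unconditional) variance
    r[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
    for t in range(1, n):
        sigma2[t] = omega + a * r[t - 1] ** 2 + b * sigma2[t - 1]
        r[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    return r, sigma2

def lag1_autocorr(x):
    """Sample first-order autocorrelation of a series."""
    x = x - x.mean()
    return (x[1:] @ x[:-1]) / (x @ x)

# Hypothetical daily parameters (a + b < 1 keeps the variance stationary)
omega, a, b = 2e-6, 0.08, 0.90
r, sigma2 = simulate_garch11(omega, a, b, n=100_000)

print("long-run volatility:", np.sqrt(omega / (1 - a - b)))
print("sample volatility:  ", r.std())
print("lag-1 autocorrelation of r:  ", lag1_autocorr(r))       # near zero
print("lag-1 autocorrelation of r^2:", lag1_autocorr(r ** 2))  # positive: clustering
```

Returns themselves are nearly uncorrelated, consistent with the random walk view, while squared returns are positively autocorrelated, so risk is predictable even when the direction of the move is not.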

### EWMA