## Modelling Correlation between Risks
In credit derivative valuation and credit risk management, one of the most important issues is the estimation of default probabilities and of their correlations. 

Default correlation measures the tendency of two companies to default at about the same time. Broadly speaking, there are two ways to estimate it: using historical default data or using mathematical models such as copulas. 

Historical default data has played an important role in the estimation of default probabilities. However, because default events are rare, there is very limited default data available. Moreover, historical data reflects the historical default pattern only and it may not be a proper indicator of the future. This makes the estimation of default probabilities from historical data difficult and inexact. To use this same data to estimate default correlations is even more difficult and more inexact. 

Mathematical models, on the other hand, do not rely on historical default data. We have already seen how default probabilities can be derived from market data. Before going into the details of how copulas are applied to them, let's introduce two more kinds of credit derivatives.

## Basket Default Swaps
A basket default swap is a credit derivative on a portfolio of reference entities. The simplest basket default swaps are the nth-to-default swaps.

With respect to a basket of reference entities, a first-to-default swap provides insurance for only the first default, a second-to-default swap provides insurance for only the second default, an nth-to-default swap provides insurance for only the nth default. 

For example, in an nth-to-default swap, the protection seller makes no payment to the protection buyer for the first n−1 defaulted reference entities and pays only on the nth default. Once that payment has been made, the swap terminates. The contract behaves like a standard CDS, except that it refers to the nth entity to default.
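The payoff rule just described can be sketched in a few lines of Python. This is only an illustration: the function name, the use of `None` for names that never default, and the 40% recovery rate are assumptions made for the example.

```python
def nth_to_default_payoff(default_times, n, maturity, notional=1.0, recovery=0.4):
    """Return (payment_time, payoff) of an nth-to-default swap, or None if nothing is paid.

    default_times: default time in years of each name, None if the name never defaults.
    """
    # keep only the defaults that happen within the life of the contract
    observed = sorted(t for t in default_times if t is not None and t <= maturity)
    if len(observed) < n:
        return None  # fewer than n defaults: the seller never pays
    # the swap pays on the nth default and then terminates
    return observed[n - 1], notional * (1.0 - recovery)

times = [None, 3.2, 1.5, None, 4.1]   # hypothetical default times of 5 names
for n in (1, 2, 4):
    print(n, nth_to_default_payoff(times, n, maturity=5))
```

With three defaults inside the 5-year life of the contract, the first- and second-to-default swaps pay, while the fourth-to-default swap expires without a payment.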


### Collateralized Debt Obligation
A collateralized debt obligation (CDO) is a security backed by a diversified pool of
one or more kinds of debt obligations such as bonds, loans, credit default swaps or
structured products (mortgage-backed securities, asset-backed securities, and even
other CDOs). A CDO can be initiated by one or more of the following: banks,
nonbank financial institutions, and asset management companies. The initiator is referred to as
the sponsor. The sponsor of a CDO creates a company called the special purpose
vehicle (SPV). The SPV works as an independent entity. In this way, CDO investors
are isolated from the credit risk of the sponsor. Moreover, the SPV is responsible
for the administration. The SPV obtains the credit risk exposure by purchasing
debt obligations (bonds or residential and commercial loans) or selling CDSs; it
transfers the credit risk by issuing debt obligations (tranches/credit-linked notes).
The investors in the tranches of a CDO have the ultimate credit risk exposure to
the underlying reference entities.
The SPV issues four kinds of tranches. Each tranche has
an attachment percentage and a detachment percentage. When the cumulative
percentage loss of the portfolio reaches the attachment percentage, investors
in the tranche start to lose their principal, and when the cumulative percentage loss
of principal reaches the detachment percentage, the investors in the tranche lose all
their principal and no further loss can occur to them.
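
The attachment/detachment mechanism can be written as a one-line formula. The sketch below is illustrative only: the function name and the 3%-7% tranche are assumptions, and all quantities are expressed as fractions of the portfolio notional.

```python
def tranche_loss_fraction(portfolio_loss, attachment, detachment):
    """Fraction of the tranche principal lost for a given cumulative portfolio loss."""
    # losses below the attachment point do not touch the tranche,
    # losses above the detachment point cannot hurt it any further
    absorbed = min(max(portfolio_loss - attachment, 0.0), detachment - attachment)
    return absorbed / (detachment - attachment)

# hypothetical 3%-7% tranche
for loss in (0.02, 0.05, 0.10):
    print(loss, round(tranche_loss_fraction(loss, 0.03, 0.07), 4))
```

A 2% portfolio loss leaves the tranche untouched, a 5% loss wipes out half of it and a 10% loss wipes it out completely.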

In the literature, tranches of a CDO are classified as subordinate/equity tranche,
mezzanine tranches, and senior tranches according to their subordinate levels. 
Because the equity tranche is extremely
risky, the sponsor of a CDO holds the equity tranche and the SPV sells other tranches
to investors.

### Role of Correlation in Basket CDS and CDO
The cost of protection in an nth-to-default CDS or in a tranche of a CDO depends critically on default correlation. 

As we have seen in the example, correlation plays a very important role. Consequently, it clearly affects the price of a first-to-default CDS relative to that of, for example, a 10th-to-default CDS.

### Calculating nth-to-default Probabilities

The valuation of a basket default swap comes down to the calculation of relevant default probabilities. So let's see how we can compute them.

#### Independent Defaults
If the default times of the names in a basket are independent, nth-to-default probabilities can be calculated by multiplying and integrating the default probability curves of the basket components. 

As an example, consider the second-to-default probability in a 3-name basket. Let $\tau_i$ be the default time of name $i$ and $F_i(t)$ its cumulative distribution. Name 1 defaults second if exactly one of the other two names has already defaulted when it does, so the probability that name 1 is the second to default in the basket before time $t$ is: 

$$
\begin{align*}
&P((\tau_2\lt\tau_1)\cap (\tau_1\lt t)\cap (\tau_1\lt\tau_3)) +
P((\tau_3\lt\tau_1)\cap (\tau_1\lt t)\cap (\tau_1\lt\tau_2)) = \\
&\int_0^t{F_2 (s)\cdot (1-F_3 (s))~dF_1(s)} +  \int_0^t{F_3 (s)\cdot (1-F_2 (s))~dF_1(s)}
\end{align*}
$$

The formula for the nth-to-default probability in a general basket can be derived similarly; however, its complexity grows with the number of names.

Suppose the default probabilities of three companies, A, B and C are given as in the following table:

|time in years | A | B | C |
| :-:|:-:|:-:|:-:|
|0 | 0 | 0 | 0 |
|1 | 0.022032 | 0.0317 | 0.035 |
|2 | 0.046242 | 0.0655 | 0.075 |
|3 | 0.07266 | 0.1022 | 0.121 |
|4 | 0.101233 | 0.142 | 0.153 |
|5 | 0.131885 | 0.1752 | 0.205 |

and suppose that the default events of the three companies are independent. 

The default probabilities are assumed to be linear within each yearly interval, so integrals of this kind can be solved by substitution, one interval at a time. For instance, the probability that company A is the first to default between times $x_0$ and $x_1$ is

$$ \int_{x_0}^{x_1}{(1-F_B(x))(1-F_C(x))~dF_A(x)}$$

Setting $t=F_A(x)=m_A x + q_A$, where $m_A$ and $q_A$ are the slope and the intercept of the line joining the default probabilities of company A on that interval, it becomes

$$ \int_{m_A x_0 + q_A}^{m_A x_1 + q_A}{(1-F_B(x(t)))(1-F_C(x(t)))~dt}\qquad\textrm{, with}~x(t) = \cfrac{t -q_A}{m_A} $$
and similarly for companies B and C.

To translate this into Python we can use `scipy.integrate.quad` to perform the integral and `numpy.interp` to interpolate the default probabilities at intermediate times.

In [60]:
from scipy.integrate import quad
from numpy import interp

# cumulative default probabilities at the end of years 0..5
default_rates = {"A":(0, 0.022032, 0.046242, 0.07266, 0.101233, 0.131885), # company A
                 "B":(0, 0.0317, 0.0655, 0.1022, 0.142, 0.1752), # company B
                 "C":(0, 0.035, 0.075, 0.121, 0.153, 0.205)} # company C

def func(x, default, companies, t):
    # x is a value of the default probability of the first company within year t
    first = default[companies[0]]
    m = first[t] - first[t-1]       # slope of the line on year t
    q = first[t-1] - m * (t-1)      # intercept of the line on year t
    time = (x - q) / m              # invert the line: time at which F_1 equals x
    years = range(len(first))
    S2 = 1 - interp(time, years, default[companies[1]])  # survival of the second company
    S3 = 1 - interp(time, years, default[companies[2]])  # survival of the third company
    return S2 * S3

def integral(default, companies, t):
    # contribution of year t only: the line (m, q) is valid on [F_1(t-1), F_1(t)]
    first = default[companies[0]]
    return quad(func, first[t-1], first[t], args=(default, companies, t))[0]

for companies in [("A", "B", "C"), ("B", "A", "C"), ("C", "A", "B")]:
    prob = 0
    for t in range(1, 6):
        prob += integral(default_rates, companies, t)  # accumulate the yearly contributions
        print("First to default prob at time ({}) for company {}: {:.5f}".format(t, companies[0], prob))

First to default prob at time (1) for company A: 0.02131
First to default prob at time (2) for company A: 0.04307
First to default prob at time (3) for company A: 0.06491
First to default prob at time (4) for company A: 0.08656
First to default prob at time (5) for company A: 0.10774
First to default prob at time (1) for company B: 0.03080
First to default prob at time (2) for company B: 0.06166
First to default prob at time (3) for company B: 0.09280
First to default prob at time (4) for company B: 0.12416
First to default prob at time (5) for company B: 0.14825
First to default prob at time (1) for company C: 0.03407
First to default prob at time (2) for company C: 0.07083
First to default prob at time (3) for company C: 0.11047
First to default prob at time (4) for company C: 0.13612
First to default prob at time (5) for company C: 0.17478
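
The same year-by-year integration extends to the second-to-default formula given earlier. The sketch below is a minimal, standard-library-only illustration for company A; the helper names (`cdf`, `order_probability`) are made up for the example, and Simpson's rule replaces `quad` because the integrand is a quadratic in time on each yearly interval, for which Simpson's rule is exact.

```python
# cumulative default probabilities at years 0..5 (same table as above)
F = {"A": (0, 0.022032, 0.046242, 0.07266, 0.101233, 0.131885),
     "B": (0, 0.0317, 0.0655, 0.1022, 0.142, 0.1752),
     "C": (0, 0.035, 0.075, 0.121, 0.153, 0.205)}

def cdf(name, time):
    """Piecewise-linear cumulative default probability at a (fractional) time."""
    k = min(int(time), len(F[name]) - 2)          # index of the yearly interval
    return F[name][k] + (F[name][k + 1] - F[name][k]) * (time - k)

def order_probability(first, others, horizon, weight):
    """Integrate weight(F_j, F_k) dF_first year by year with Simpson's rule."""
    total = 0.0
    for k in range(horizon):
        density = F[first][k + 1] - F[first][k]   # dF_first/dt on year k
        g = lambda s: weight(cdf(others[0], s), cdf(others[1], s))
        total += density * (g(k) + 4 * g(k + 0.5) + g(k + 1)) / 6
    return total

first = lambda fj, fk: (1 - fj) * (1 - fk)              # no other name has defaulted
second = lambda fj, fk: fj * (1 - fk) + fk * (1 - fj)   # exactly one other name has
third = lambda fj, fk: fj * fk                          # both other names have

p1 = order_probability("A", ("B", "C"), 5, first)
p2 = order_probability("A", ("B", "C"), 5, second)
p3 = order_probability("A", ("B", "C"), 5, third)
print("A first: {:.5f}, second: {:.5f}, third: {:.5f}".format(p1, p2, p3))
print("sum: {:.6f}  F_A(5): {:.6f}".format(p1 + p2 + p3, F["A"][5]))
```

Since company A must default first, second or third whenever it defaults, the three probabilities add up to its 5-year default probability, which gives a quick consistency check on the integration.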


#### Correlated Defaults
When the defaults of the companies are correlated, the copula approach can be used, as in the example shown above.

Suppose we would like to simulate the defaults over the next 5 years for 6 companies. The copula default correlation between each pair of companies is 0.2 and the cumulative probability of default during the next 1, 2, 3, 4 and 5 years is 1%, 3%, 6%, 10% and 13% respectively for each company.

When a Gaussian copula is used to simulate the defaults, we sample a vector $\mathbf{x}$ from a multivariate normal distribution and then transform each $x_i$ into the corresponding probability $p_i = \Phi(x_i)$, where $\Phi$ is the standard normal cumulative distribution. Company $i$ defaults in the year whose interval of cumulative default probabilities contains $p_i$.

Let's check the 3rd-to-default probabilities for each year.

In [79]:
from scipy.stats import multivariate_normal, norm

# cumulative default probabilities at the end of years 0..5
p_default = [0, 0.01, 0.03, 0.06, 0.10, 0.13]

# 6 standard normal variables with pairwise correlation 0.2
mvnorm = multivariate_normal(mean=[0]*6,
                             cov = [[1, 0.2, 0.2, 0.2, 0.2, 0.2],
                                    [0.2, 1, 0.2, 0.2, 0.2, 0.2],
                                    [0.2, 0.2, 1, 0.2, 0.2, 0.2],
                                    [0.2, 0.2, 0.2, 1, 0.2, 0.2],
                                    [0.2, 0.2, 0.2, 0.2, 1, 0.2],
                                    [0.2, 0.2, 0.2, 0.2, 0.2, 1]])

trials = 100000
result = [0., 0., 0., 0., 0., 0.]  # result[i] counts the third defaults occurring in year i
x = mvnorm.rvs(size=trials)

for n in range(len(x)):
    # sorted percentiles: p[2] belongs to the third company to default
    p = sorted(norm.cdf(x[n]))
    for i in range(1, len(p_default)):
        if p_default[i-1] <= p[2] <= p_default[i]:
            result[i] += 1

print("3rd-to-default probabilities")
for i in range(len(p_default)):
    print("{}: {:.4f}".format(i, result[i]/trials))

3rd-to-default probabilities
0: 0.0000
1: 0.0003
2: 0.0033
3: 0.0109
4: 0.0250
5: 0.0267


### Standard Market Model
While there are several types of copula models, the first to be introduced was the one-factor Gaussian copula model. One-factor models have the advantage that they can be solved semi-analytically.

Consider a portfolio of $N$ companies and assume that the marginal probabilities of default are known for each company. Define:

* $t_i$, the time of default of the $i$th company;
* $Q_i(t)$, the cumulative probability that company $i$ will default before time $t$; that is, the probability that $t_i \le t$;
* $S_i(t) = 1 - Q_i(t)$, the probability that company $i$ will survive beyond time $t$; that is, the probability that $t_i > t$. 

To generate a one-factor model for the $t_i$ we define random variables $X_i$ $(1\le i \le N)$

$$X_i = a_i M + \sqrt{1-a_i^2}\,Z_i,\qquad i = 1, 2,\ldots, N$$

where $M$ and the $Z_i$ have independent zero-mean unit-variance distributions and $-1 \le a_i \lt 1$. 

The previous equation defines a correlation structure between the $X_i$ dependent on a single common factor $M$. The correlation between $X_i$ and $X_j$ is $a_i a_j$.
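
The claim that the correlation between $X_i$ and $X_j$ equals $a_i a_j$ can be checked with a small simulation. The sketch below assumes standard normal $M$ and $Z_i$; the loadings 0.5 and 0.6, the seed and the number of trials are arbitrary choices made for illustration.

```python
import random
from math import sqrt

random.seed(1)          # fixed seed so that the sketch is reproducible
a_i, a_j = 0.5, 0.6     # illustrative factor loadings (assumed values)
trials = 50000

xs, ys = [], []
for _ in range(trials):
    m = random.gauss(0, 1)                                    # common factor M
    xs.append(a_i * m + sqrt(1 - a_i**2) * random.gauss(0, 1))  # X_i
    ys.append(a_j * m + sqrt(1 - a_j**2) * random.gauss(0, 1))  # X_j

def correlation(u, v):
    """Sample Pearson correlation of two equally long sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    var_u = sum((p - mu) ** 2 for p in u)
    var_v = sum((q - mv) ** 2 for q in v)
    return cov / sqrt(var_u * var_v)

rho_hat = correlation(xs, ys)
print("sample correlation: {:.3f}  a_i * a_j: {:.3f}".format(rho_hat, a_i * a_j))
```

The sample correlation should land close to $a_i a_j = 0.3$, up to Monte Carlo noise.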

Let $F_i$ be the cumulative distribution of $X_i$. Under the copula model the $X_i$ are mapped to the $t_i$ using a *percentile-to-percentile* transformation. The five-percentile point in the probability distribution for $X_i$ is transformed to the five-percentile point in the probability distribution of $t_i$ and so on.

In general the point $X_i = x$ is transformed to $t_i = t$, where $t = Q_i^{-1}[F_i(x)]$.
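
As a minimal sketch of the percentile-to-percentile map, assume that $F_i$ is the standard normal distribution and that $Q_i(t) = 1 - e^{-\lambda t}$ with a constant default intensity $\lambda$ (both are assumptions made for this example), so that $Q_i^{-1}(u) = -\ln(1-u)/\lambda$.

```python
from math import log
from statistics import NormalDist

def default_time(x, hazard_rate):
    """Map a standard normal draw x to a default time, percentile to percentile.

    Assumes F_i is the standard normal CDF and Q(t) = 1 - exp(-hazard_rate * t).
    """
    u = NormalDist().cdf(x)              # percentile of x under F_i
    return -log(1.0 - u) / hazard_rate   # Q^{-1}(u)

for x in (-2.0, 0.0, 2.0):
    print(x, round(default_time(x, 0.02), 2))
```

Low values of $x$ map to early defaults and high values to late ones; the median $x = 0$ maps to the median default time $\ln 2/\lambda$.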

Let $H$ be the cumulative distribution of the $Z_i$.
It follows from the previous equation that 

$$\mathbb{P}(X_i < x|M) = H\left(\cfrac{x-a_i M}{\sqrt{1-a_i^2}}\right)$$

When $x = F_i^{-1}[Q_i(t)]$, $\mathbb{P}(t_i < t) = \mathbb{P}(X_i < x)$. Hence

$$\mathbb{P}(t_i < t|M) = H\left\{\cfrac{F_i^{-1}[Q_i(t)]-a_i M}{\sqrt{1-a_i^2}}\right\}$$

The conditional probability that the $i$th company will **survive** beyond time $T$ is therefore

$$S_i(T|M) = 1 - H\left\{\cfrac{F_i^{-1}[Q_i(T)]-a_i M}{\sqrt{1-a_i^2}}\right\}$$
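
To get a feeling for how strongly the factor drives defaults, the conditional default probability can be evaluated for a few values of $M$. The sketch assumes the Gaussian case (both $F_i$ and $H$ standard normal); the inputs $Q_i(t) = 5\%$ and $a_i = \sqrt{0.3}$ are illustrative values, not taken from the text.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()   # standard normal, used here for both F_i and H

def conditional_default_prob(q, a, m):
    """P(t_i < t | M = m) when the unconditional probability is q = Q_i(t)."""
    return PHI.cdf((PHI.inv_cdf(q) - a * m) / sqrt(1 - a * a))

q, a = 0.05, sqrt(0.3)   # illustrative inputs (assumed values)
for m in (-2, -1, 0, 1, 2):
    print("M = {:+d}: {:.4f}".format(m, conditional_default_prob(q, a, m)))
```

Negative values of $M$ (bad states of the market) push the conditional default probability well above its unconditional level, positive values push it below; averaging over the distribution of $M$ gives back $Q_i(t)$.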

Although in principle any distributions could be used for $M$ and the $Z_i$ (provided they have zero mean and unit variance),
one common choice is to let them be standard normal (resulting in a Gaussian copula). 

Different choices of distributions result in different copula models and in different natures of the default dependence. For example, copulas where $M$ has heavy tails generate models where there is a greater likelihood of a clustering of early defaults for several companies. 
<!----
Later on we
will explore the effect of using normal and t-distributions for the $M$'s and $Z$'s. 
Suppose that a CDO includes $n$ assets $i = 1, 2,\ldots, n$ and the default times $\tau_i$ of the $i$th asset has a default intensity $\lambda_i$. Then the probability of a default occurring before the time $t$ is
$$\mathbb{P}(\tau_i \lt t) = 1 - \mathrm{exp}(-\lambda_i t)$$
see Chapter~\ref{hazard}.----->

For simplicity, the following two assumptions are made:

* all the companies have the same default intensity, i.e., $\lambda_i = \lambda$;
* the pairwise default correlations are the same, i.e., $a_i = a$.

The second assumption means that the contribution of the market component is
the same for all the companies and the correlation between any two companies is
constant, $\rho = a^2$.

Under these assumptions, given the market situation $M = m$, all the companies
have the same cumulative default probability $D_{t|m}=\mathbb{P}(t_i < t|M=m)$. Moreover, for a
given value of the market component $M$, the defaults are mutually independent for
all the underlying companies. Letting $n$ be the number of companies and $N_{t|m}$ the total number of defaults that have occurred
by time $t$ conditional on the market condition $M = m$, $N_{t|m}$ follows a binomial
distribution, and

$$\mathbb{P}(N_{t|m} = j) = \cfrac{n!}{j!(n-j)!}D^j_{t|m}(1-D_{t|m})^{n-j},\qquad  j=0, 1, 2,\ldots,n$$

The probability that there will be exactly $j$ defaults by time $t$ is

$$\mathbb{P}(N_{t} = j) = \int_{-\infty}^{\infty}{\mathbb{P}(N_{t|m} = j)f_M(m)dm}$$

where $f_M(m)$ is the probability density function (PDF) of the random variable $M$.
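
The two formulas above can be combined numerically. The following sketch assumes a standard normal $M$ (Gaussian copula) and replaces the integral with a plain midpoint rule on $[-8, 8]$; the function name and the inputs ($n = 10$, $Q(t) = 5\%$, $\rho = 0.3$) are illustrative assumptions.

```python
from math import comb, sqrt
from statistics import NormalDist

PHI = NormalDist()

def default_count_distribution(n, q, rho, steps=2000, width=8.0):
    """P(N_t = j) for j = 0..n in the homogeneous one-factor Gaussian copula.

    q is the unconditional default probability Q(t) and rho = a^2.
    """
    a = sqrt(rho)
    c = PHI.inv_cdf(q)
    dm = 2 * width / steps
    probs = [0.0] * (n + 1)
    for k in range(steps):
        m = -width + (k + 0.5) * dm                  # midpoint of the k-th cell
        d = PHI.cdf((c - a * m) / sqrt(1 - rho))     # D_{t|m}
        w = PHI.pdf(m) * dm                          # f_M(m) dm
        for j in range(n + 1):
            # binomial probability of j defaults given M = m
            probs[j] += w * comb(n, j) * d**j * (1 - d) ** (n - j)
    return probs

dist = default_count_distribution(n=10, q=0.05, rho=0.3)
for j, p in enumerate(dist):
    print("{:2d} defaults: {:.5f}".format(j, p))
```

Two quick checks are that the probabilities add up to one and that the expected number of defaults equals $n\,Q(t)$, whatever the correlation.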

<!------- In a one-factor Gaussian
copula model, the distributions of the common market component $M$ and the individual component $Z_i$ are standard normal Gaussian distributions.
Because the sum of two independent Gaussian distributions is still a Gaussian distribution, the $X_i$ have a standard normal distribution.---->

The one-factor Gaussian copula model under the assumptions outlined above is the *market standard model*.

### Basket CDS Valuation under Market Standard Model

Consider a first-to-default basket: if the first-to-default probabilities are known, then the default probability of the basket is simply the sum of the first-to-default probabilities of the individual entities in it. The fair value of the premium payments of a basket default swap can then be calculated easily: since protection covers only the first entity to default, only the payoff for that entity needs to be valued. Similar arguments hold for an nth-to-default swap. 

We now present some numerical results for an $n$th to default basket. We assume that the
principals and expected recovery rates are the same for all underlying reference assets.
The valuation procedure is similar to that for a regular CDS where there is only one
reference entity.

In a regular CDS the valuation is based on the probability that a default
will occur between times $T_1$ and $T_2$. Here the valuation is based on the probability that the
$n$th default will occur between times $T_1$ and $T_2$.

We assume the buyer of protection makes quarterly payments in arrears at a specified rate
until the $n$th default occurs or the end of the life of the contract is reached. 

If the $n$th default occurs, the seller pays $N\cdot(1-R)$, where $N$ is the notional principal and $R$ the recovery rate. 
The contract can be valued by calculating the expected present value
of payments and the expected present value of payoffs in a risk-neutral world. 

Consider first a 5-year $n$th-to-default CDS on a basket of 10 reference entities in the
situation where the expected recovery rate, $R$, is 40%. The term structure of interest rates
is assumed to be flat at 5%. The default probabilities for the 10 entities are generated by
Poisson processes with constant default intensities $\lambda_i$ $(1 \le i \le 10)$, so that 

$$Q_i(t) = 1 - e^{-\lambda_i t}$$

The parameters are therefore: 10 CDS names, copula correlation $\rho = 0.3$, $R = 40\%$ and an interest rate of 5%. A tabulated alternative for the cumulative default probabilities at years 1 to 5 is $Q(t) = [0.02, 0.0396, 0.0588, 0.0776, 0.0961]$.


In [47]:
from finmarkets import DiscountCurve, CreditCurve, CreditDefaultSwap
from finmarkets import GaussianQuadrature
from datetime import date
from dateutil.relativedelta import relativedelta
from scipy.stats import norm, binom
from math import sqrt, exp

n_cds = 10   # number of reference entities in the basket
rho = 0.6    # copula correlation
l = 0.01     # default intensity, the same for every entity

# discount curve with yearly pillars and a 5% rate
pillar_dates = []
df = []
today = date.today()
for i in range(6):
    pillar_dates.append(today + relativedelta(years=i))
    df.append(1/(1+0.05*i))
dc = DiscountCurve(today, pillar_dates, df)
gq = GaussianQuadrature()
# unconditional cumulative default probabilities at years 0..5
Q = [1-exp(-(l*t)) for t in range(6)]
#Q = [0, 0.02, 0.0396, 0.0588, 0.0776, 0.0961]
values, weights = gq.M(60)  # used below as quadrature nodes and weights for the factor M
cds = CreditDefaultSwap(1, today, 0.01, 5)

for ndefault in range(1, 11):
    # S[j][i]: probability of fewer than ndefault defaults by year i, given M = values[j]
    S = []
    
    for j in range(len(values)):
        temp = []
        for i in range(6):
            # conditional default probability of a single name
            P = norm.cdf((norm.ppf(Q[i]) - sqrt(rho)*values[j])/
                         (sqrt(1-rho)))
            b = binom(n_cds, P)
            temp.append(1 - (b.cdf(n_cds)-b.cdf(ndefault-1)))
        S.append(temp)
    
    # weighted sum over the nodes of the conditional breakeven rates
    s = 0
    for j in range(len(values)):
        s += weights[j] * cds.breakevenRate(dc, 
                                            CreditCurve(pillar_dates, 
                                                        S[j]))

    print(s)

0.03214370055078652
0.015337587442302281
0.00725616955902986
0.004215577657531996
0.0022014564160901615
0.001225042309192564
0.0006383163501547495
0.0003071509287843383
0.00012742929384463388
3.669614592811156e-05


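Breakeven spreads (in basis points) of the $n$th-to-default swap for different values of the copula correlation $\rho$. The $\rho = 0.6$ column contains the rounded values printed above.

| $n$ | $\rho = 0$ | $\rho = 0.3$ | $\rho = 0.6$ |
| :-:|:-:|:-:|:-:|
| 1 | 153 | 181 | 321 |
| 2 | 25 | 51 | 153 |
| 3 | 3 | 18 | 73 |
| 4 | 0 | 7 | 42 |
| 5 | 0 | 3 | 22 |
| 6 | 0 | 0 | 12 |
| 7 | 0 | 0 | 6 |
| 8 | 0 | 0 | 3 |
| 9 | 0 | 0 | 1 |
| 10 | 0 | 0 | 0 |

Breakeven spreads (in basis points) of the $n$th-to-default swap for different values of the default intensity $\lambda$. The $\lambda = 0.01$ column coincides with the $\rho = 0.3$ column of the previous table.

| $n$ | $\lambda = 0.01$ | $\lambda = 0.02$ | $\lambda = 0.03$ |
| :-:|:-:|:-:|:-:|
| 1 | 181 | 393 | 627 |
| 2 | 51 | 136 | 238 |
| 3 | 18 | 57 | 109 |
| 4 | 7 | 25 | 52 |
| 5 | 3 | 11 | 25 |
| 6 | 0 | 5 | 12 |
| 7 | 0 | 2 | 5 |
| 8 | 0 | 0 | 2 |
| 9 | 0 | 0 | 0 |
| 10 | 0 | 0 | 0 |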

### Valuation of CDO
Suppose that the payment dates on a CDO tranche are at times $\tau_j$ $(1 \le j \le m)$, with $\tau_0 = 0$. Define $\mathbb{E}_j$ as the expected 
tranche principal at time $\tau_j$ and $D(\tau)$ as the discount factor at time $\tau$. Suppose also that the spread
on a particular tranche (i.e. the number of basis points paid per year for protection on the remaining tranche principal) is $s$. 

The present value of the expected regular spread payments on the CDO is given by
\begin{equation}
s\cdot A = s\cdot \sum_{j=1}^{m}(\tau_j - \tau_{j-1})\mathbb{E}_{j}D(\tau_j)
\label{eq:A}
\end{equation}
The expected loss between times $\tau_{j-1}$ and $\tau_j$ is $\mathbb{E}_{j-1}-\mathbb{E}_j$. For simplicity assume
the loss occurs only at the midpoint of the time interval, so the present value of the expected payoffs on the CDO tranche is
\begin{equation}
C=\sum_{j=1}^{m}(\mathbb{E}_{j-1}-\mathbb{E}_j)D\left(\frac{\tau_{j-1}+\tau_j}{2}\right)
\label{eq:C}
\end{equation}
The accrual payment due on the losses is finally given by
\begin{equation}
s\cdot B = s\cdot\sum_{j=1}^{m}\frac{1}{2}(\tau_j - \tau_{j-1})(\mathbb{E}_{j-1}-\mathbb{E}_j)D\left(\frac{\tau_{j-1}+\tau_j}{2}\right)
\label{eq:B}
\end{equation}

The value of the tranche from the point of view of the protection buyer is $C-sA-sB$. The breakeven spread 
on the tranche occurs when the present value of the payments equals the present value of the payoffs so

$$ s = \cfrac{C}{A+B}$$
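
The three sums and the breakeven spread translate directly into code. The sketch below is illustrative: the function name, the yearly payment dates, the expected principals and the continuously compounded 5% discount factor are all assumed inputs.

```python
from math import exp

def tranche_breakeven_spread(times, expected_principal, discount):
    """Breakeven spread s = C / (A + B) of a tranche.

    times: payment times tau_0 = 0, tau_1, ..., tau_m in years.
    expected_principal: E_0, ..., E_m, expected tranche principal at those times.
    discount: function returning the discount factor D(tau).
    """
    A = B = C = 0.0
    for j in range(1, len(times)):
        dt = times[j] - times[j - 1]
        mid = 0.5 * (times[j] + times[j - 1])
        loss = expected_principal[j - 1] - expected_principal[j]
        A += dt * expected_principal[j] * discount(times[j])   # regular spread payments
        B += 0.5 * dt * loss * discount(mid)                   # accrual on the losses
        C += loss * discount(mid)                              # payoffs, paid at the midpoint
    return C / (A + B)

# illustrative inputs (assumed values)
times = [0, 1, 2, 3, 4, 5]
E = [1.0, 0.98, 0.95, 0.91, 0.86, 0.80]
s = tranche_breakeven_spread(times, E, lambda t: exp(-0.05 * t))
print("breakeven spread: {:.1f} bp".format(s * 1e4))
```

In practice the expected principals $\mathbb{E}_j$ are not given but have to be computed from the default model, which is the subject of the rest of this section.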

Suppose that the tranche under consideration covers losses on the portfolio between $\alpha_L$ and $\alpha_H$ and define

$$n_L = \cfrac{\alpha_L n}{1-R}\qquad \mathrm{and}\qquad n_H = \cfrac{\alpha_H n}{1-R}$$
where $n$ is the number of names in the portfolio and $R$ is the recovery rate. Finally define $m(x)$ as the smallest integer greater than $x$.
By definition the tranche principal, expressed as a fraction of its initial value, stays 1 while the number of defaults $k$ is less than $m(n_L)$, is zero when the number of defaults is greater than or equal to $m(n_H)$, and otherwise is

$$\cfrac{\alpha_H -k(1-R)/n}{\alpha_H - \alpha_L}$$
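
The three cases can be collected in a small function. The sketch below is illustrative: the function name and the example (125 names, 40% recovery, a 3%-6% tranche) are assumptions, and the principal is expressed as a fraction of its initial value.

```python
from math import floor

def tranche_principal(k, n, recovery, alpha_l, alpha_h):
    """Remaining tranche principal (fraction of its initial value) after k defaults."""
    m = lambda x: floor(x) + 1                 # smallest integer greater than x
    n_l = alpha_l * n / (1 - recovery)
    n_h = alpha_h * n / (1 - recovery)
    if k < m(n_l):
        return 1.0                             # losses have not reached the tranche yet
    if k >= m(n_h):
        return 0.0                             # the tranche has been wiped out
    return (alpha_h - k * (1 - recovery) / n) / (alpha_h - alpha_l)

# hypothetical 3%-6% tranche on a portfolio of 125 names with 40% recovery
for k in (6, 7, 12, 13):
    print(k, round(tranche_principal(k, 125, 0.4, 0.03, 0.06), 4))
```

Here $n_L = 6.25$ and $n_H = 12.5$, so the tranche is intact up to 6 defaults, starts losing principal at the 7th and is exhausted at the 13th.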

The expected tranche principal at time $\tau_j$ conditional on the value of the factor $M$ is
\begin{equation}
\mathbb{E}_j(M) = \sum_{k=0}^{m(n_L)-1}\mathbb{P}(k, \tau_j|M) + \sum_{k=m(n_L)}^{m(n_H)-1}\mathbb{P}(k, \tau_j|M) \cfrac{\alpha_H -k(1-R)/n}{\alpha_H - \alpha_L}
\label{eq:E}
\end{equation}
where $\mathbb{P}(k, \tau_j|M)$ is the probability of exactly $k$ defaults by time $\tau_j$ conditional on $M$.

To compute the breakeven spread it is necessary to substitute Eq. \ref{eq:E} into Eq. \ref{eq:A}, \ref{eq:B} and \ref{eq:C}
and to integrate the result over the variable $M$ (remember that it has a standard normal distribution). 
The integration is best accomplished numerically with a technique called *Gaussian quadrature*, which exploits the approximation

$$\int_{-\infty}^{\infty}{\cfrac{1}{\sqrt{2\pi}}e^{-M^{2}/2}g(M)dM} \approx \sum_{k=1}^{L}w_k g(F_k)$$
where the $F_k$ are the quadrature nodes and the $w_k$ the corresponding weights. As $L$ increases, accuracy increases.
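
As a minimal illustration of the idea, the sketch below hard-codes the three-point rule ($L = 3$) for a standard normal weight, whose nodes are the roots of $x^3 - 3x$. The function name is made up for the example; larger rules are normally obtained from a library (NumPy's `numpy.polynomial.hermite_e.hermegauss` is one option, with weights that have to be divided by $\sqrt{2\pi}$).

```python
from math import sqrt, exp

# three-point Gauss-Hermite rule for a standard normal weight:
# the nodes are 0 and +/- sqrt(3), the weights add up to one
NODES = (-sqrt(3.0), 0.0, sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)

def normal_expectation(g):
    """Approximate E[g(M)] for a standard normal M with the three-point rule."""
    return sum(w * g(x) for w, x in zip(WEIGHTS, NODES))

print(normal_expectation(lambda m: m**2))            # the true expectation is 1
print(normal_expectation(lambda m: m**4))            # the true expectation is 3
print(normal_expectation(lambda m: exp(0.5 * m)),    # the true expectation is exp(0.125)
      exp(0.125))
```

With only three nodes the rule is already exact for polynomials up to degree 5 and close for a smooth function such as the exponential; the conditional tranche quantities are less smooth, which is why many more nodes are used in practice.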


## Complex Correlation Structures and the Financial Crisis

In the examples above we used the multivariate normal distribution, which gives rise to the Gaussian copula. However, other and more complex copulas can be used as well. For example, we might want to assume that the correlation is non-symmetric, which is useful in quantitative finance, where correlations become very strong during market crashes, when returns are very negative.

In fact, Gaussian copulas are said to have played a key role in the 2008 financial crisis, as tail correlations were severely underestimated. Consider a set of mortgages in CDOs (the contracts described above): they are clearly correlated, since if one mortgage fails, the likelihood of another failing increases. In the early 2000s, the banks only knew how to model the marginals of the default rates. An (in)famous paper by Li then suggested using copulas to model the correlations between those marginals. Rating agencies relied on this model heavily, severely underestimating risk and giving false ratings.

If you are interested in the topic, read [this paper](http://samueldwatts.com/wp-content/uploads/2016/08/Watts-Gaussian-Copula_Financial_Crisis.pdf) for an excellent description of Gaussian copulas and the financial crisis; it argues that different copula choices would not have made a difference and that the real problem was that the assumed correlation was far too low.