## Quantiles

Given a probability distribution $f(x)$ (e.g. Gaussian, Binomial, Poisson...) the *cumulative distribution function* (CDF) is defined as 

$$ F(x) = \mathbb{P}(X\le x)\qquad\textrm{or equivalently, for a continuous distribution,}\qquad F(x) = \int_{-\infty}^{x}f(t)\,dt$$

<img src="percentile.png">

The quantile function $Q$ associated with the probability distribution of a random variable returns, for a given probability $p$, the value $x_p$ such that the probability of the variable being less than or equal to $x_p$ equals $p$. It is also called the percent-point function (PPF) or inverse cumulative distribution function:

$$Q = F^{-1}\qquad\textrm{returns $x_p$ such that}~F(x_p) = p$$

In [None]:
# compute ppf and cdf 
from scipy.stats import norm
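
As a minimal sketch of the cell above, the standard normal distribution from `scipy.stats` exposes both functions as `norm.cdf` and `norm.ppf`. The evaluation points (1.0 and 0.975) are arbitrary choices made for illustration.

```python
from scipy.stats import norm

# CDF of the standard normal: probability that X <= 1.0
p = norm.cdf(1.0)

# PPF (quantile function): the value x such that F(x) = 0.975
x_975 = norm.ppf(0.975)

# ppf is the inverse of cdf, so the round trip returns the starting point
round_trip = norm.ppf(norm.cdf(1.0))

print(p, x_975, round_trip)
```

The second value is the familiar two-sided 95% threshold of the Gaussian, close to 1.96.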

Percentiles (or quantiles) can also be computed when there is no probability distribution but only a list of measurements, using $\tt{numpy.percentile}$.

N.B. The 50th percentile is equivalent to the 0.5-quantile.

In [None]:
# compute percentile on list from 0 to 9
import numpy as np
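
A possible version of this cell is sketched below. It relies on the default (linear interpolation) behaviour of `numpy.percentile` and `numpy.quantile`; the variable names are illustrative.

```python
import numpy as np

measurements = list(range(10))  # 0, 1, ..., 9

# percentiles take arguments in [0, 100], quantiles in [0, 1]
median_pct = np.percentile(measurements, 50)
median_q = np.quantile(measurements, 0.5)
p90 = np.percentile(measurements, 90)

print(median_pct, median_q, p90)
```

With an even number of equally spaced points the 50th percentile falls halfway between the two central values, which is also what `np.median` returns.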

The median of a distribution is the 50th percentile (it coincides with the mean if the distribution is symmetric).

## Credit curves

Just like a discount curve is a way of representing the underlying interest rates (or equivalently discount factors) implicit in the market quotes of a collection of real-world interest rate products, **credit curves** are a way of representing survival probabilities implied by credit default swaps.

**Credit default swaps** (**CDS**) are instruments whose value depends on the likelihood that a given company (the curve's **issuer**) will suffer a credit event over a given period.

A **credit event** can be a default, the failure to make payments, the issuer entering into bankruptcy proceedings, or the occurrence of other legal events. The exact definition of what constitutes a credit event depends on a series of factors and is usually defined in some kind of ISDA (International Swaps and Derivatives Association) master agreement.

In any case, we will generically call a credit event a *default*, and talk about **non-default probabilities** (**NDP**), i.e. the probability that the issuer will not suffer a credit event before a given date. Note that *the non-default probability is a cumulative probability*, since it refers to a time period.

<table>
  <tr width=200>
    <th>Discount Curve</th>
    <th>Credit Curve</th>
  </tr>
  <tr>
    <td>Represents underlying rates implicit in market quotes of IR products</td>
    <td>Represents default probability implied by credit default swaps</td>
  </tr>
  <tr>
    <td>made of pillar_dates and discount factors</td>
    <td>made of pillar_dates and survival probabilities</td>
  </tr>
  <tr>
    <td>discount factors</td>
    <td>non-default probabilities</td>
  </tr>
  <tr>
    <td>short rate</td>
    <td>hazard rate</td>
  </tr>
</table>   

The short rate, $r_t$, is the interest rate at which an entity can borrow money for an infinitesimally short period of time from time $t$.

### Hazard Rate

The hazard rate is often called a *conditional failure rate* since its expression is a direct
application of the conditional probability concept.

Conditional probability answers the question "how should you update probabilities of events when there is additional information available?". To derive the general formula let's start with an example.

A fair die is rolled. Let $A$ be the event that the outcome is an odd number ($A=\{1,3,5\}$). Also let $B$ be the event that the outcome is less than or equal to $3$ ($B=\{1,2,3\}$). What is the probability of $A$, $\mathbb{P}(A)$? What is the probability of $A$ given $B$, $\mathbb{P}(A|B)$?

Being a simple example we can compute the result by hand:

$$\mathbb{P}(A) = \cfrac{|A|}{|S|} = \cfrac{|\{1,3,5\}|}{6} = \cfrac{1}{2}\qquad\textrm{(where S is the entire sample space)}$$

Now let's find the conditional probability of $A$ given that $B$ occurred. If we know $B$ has occurred, the outcome must be among $\{1,2,3\}$. For $A$ to also happen the outcome must be in $A\cap B = \{1,3\}$. Since all die rolls are equally likely, we argue that $\mathbb{P}(A|B)$ must be equal to

$$\mathbb{P}(A|B) = \cfrac{|A\cap B|}{|B|} = \cfrac{2}{3}$$

To generalize our example we can rewrite the calculation by dividing the numerator and denominator by the entire space of the events $|S|$ hence:

$$\mathbb{P}(A|B) = \cfrac{|A\cap B|}{|B|} = \cfrac{\cfrac{|A\cap B|}{|S|}}{\cfrac{|B|}{|S|}} = \cfrac{\mathbb{P}(A\cap B)}{\mathbb{P}(B)}$$
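
The die example can also be checked by brute-force enumeration. The sketch below uses exact fractions from the standard library; the helper `prob` is just an illustrative name.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}            # sample space of a fair die
A = {x for x in S if x % 2 == 1}  # odd outcome
B = {x for x in S if x <= 3}      # outcome less than or equal to 3

def prob(event):
    # all outcomes are equally likely, so P(E) = |E| / |S|
    return Fraction(len(event), len(S))

p_A = prob(A)
p_A_given_B = prob(A & B) / prob(B)  # P(A|B) = P(A and B) / P(B)

print(p_A, p_A_given_B)  # 1/2 2/3
```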

<img src="conditional_b.png" width=500>

Hazard rate represents the instantaneous probability of the issuer defaulting *conditioned* on it not having defaulted until that moment.

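In formulas, let $DP$ denote the default probability, $N$ the survival probability, $\lambda$ the hazard rate and $\tau$ the time of default. Taking $A$ as the event of a default in $(t, t+dt)$ and $B$ as the event of no default up to $t$ (so that $A\cap B = A$), the conditional probability per unit time is:

$$\lambda(t) = \cfrac{1}{dt}\cdot\cfrac{\mathbb{P}(A\cap B)}{\mathbb{P}(B)} = \cfrac{1}{dt}\cdot\cfrac{\mathbb{P}(\tau \in (t, t+dt))}{\mathbb{P}(\tau\gt t)} = \cfrac{\cfrac{d(DP)}{dt}}{N(t_0, t)} = \cfrac{\cfrac{d(1-N)}{dt}}{N(t_0, t)} = -\cfrac{dN}{dt}\cfrac{1}{N(t_0, t)}$$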

where the minus sign derives from the fact that $N$ is a **non** default probability while the hazard rate is defined in terms of the probability of default $DP$.

In practice we will calculate it numerically, and therefore it'll be the (annualized) conditional probability of the issuer defaulting between the value date and the day after.
Conversely given the hazard rate the survival probability can be determined as:

$$\lambda(t) = -\cfrac{1}{dt}\cdot\cfrac{dN}{N} = -\cfrac{d(\textrm{log}N)}{dt}$$

$$N(t_0, t) = e^{-\int_{t_0}^{t}\lambda(s) ds}$$
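
To see the two relations at work, here is a small numerical check under the assumption of a constant hazard rate (the value 0.02 is arbitrary). In that case the integral is simply $\lambda\,(t - t_0)$, and a one-day finite difference of the survival probability should give back approximately the same $\lambda$.

```python
import math

lam = 0.02  # assumed constant hazard rate (per year)

def survival(t):
    # N(t0, t) = exp(-integral of lambda ds) = exp(-lam * t) for constant lambda
    return math.exp(-lam * t)

# recover the hazard rate numerically with a one-day finite difference
t, dt = 1.0, 1.0 / 365.0
hazard_numeric = -(survival(t + dt) - survival(t)) / (survival(t) * dt)

print(survival(t), hazard_numeric)
```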

### CreditCurve class

In [None]:
# implement CreditCurve class
import math, numpy
from dateutil.relativedelta import relativedelta
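
One possible shape for such a class is sketched below. It is an illustration, not a reference implementation: it assumes linear interpolation of the survival probabilities between pillars, a one-day finite difference annualized with a 365-day year for the hazard rate, and it uses `datetime.timedelta` instead of `relativedelta` only to stay within the standard library. The dummy curve at the end is invented.

```python
from datetime import date, timedelta
import numpy as np

class CreditCurve:
    """Sketch: survival probabilities linearly interpolated between pillars."""

    def __init__(self, obs_date, pillar_dates, ndps):
        self.obs_date = obs_date
        # pillars expressed as day counts from the observation date
        self.pillar_days = [(d - obs_date).days for d in pillar_dates]
        self.ndps = ndps

    def ndp(self, d):
        # survival probability between obs_date and d
        # (np.interp returns the end values outside the pillar range)
        days = (d - self.obs_date).days
        return float(np.interp(days, self.pillar_days, self.ndps))

    def hazard(self, d):
        # lambda = -(dN/dt) / N, with dt equal to one day (annualized)
        ndp_1 = self.ndp(d)
        ndp_2 = self.ndp(d + timedelta(days=1))
        return -(ndp_2 - ndp_1) / ndp_1 * 365.0

# dummy data: 80% survival probability two years after the observation date
obs = date(2024, 1, 1)
cc = CreditCurve(obs, [obs, date(2026, 1, 1)], [1.0, 0.8])
print(cc.ndp(date(2025, 1, 1)), cc.hazard(date(2025, 1, 1)))
```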

As usual we test the newly developed class with some dummy data.

In [None]:
# set observation_date and CreditCurve
from datetime import date

In [None]:
# check ndp in 1 year

In [None]:
# check hazard in 1 year

## Credit Default Swaps

Once we have implemented a $\tt{CreditCurve}$ class which allows us to interpolate survival probabilities, and also to calculate the hazard rate at arbitrary dates, we can use it to price **credit default swaps** (CDSs).

A CDS is a financial swap agreement that the seller of the CDS will compensate the buyer in the event of a debt default or other credit event. That is, the seller of the CDS insures the buyer against some reference asset defaulting. The buyer of the CDS makes a series of payments (the CDS "fee" or "spread") to the seller and, in exchange, may expect to receive a payoff if the asset defaults. 

CDSs are made up of two legs:

* the *default* leg: which pays $LGD = F(1 - R)$, known as the **loss given default**, if and when the credit event occurs, where $F$ is the face value of the contract and $R$ is the recovery rate (usually set around 40%);
* the *premium* leg: which pays the *spread* $S$ every $m$ months until the credit event occurs.

### Premium leg

Let's start with the premium leg. We will use the following notation:

* $d$ today's date;
* $d_0$ the start date of the CDS (could be different from $d$);
* $d_1, ..., d_n$ the payment dates of the premium leg, which occur at an $m$-month frequency (we assume that $d_n$ is the end date of the CDS);
* $D(d')$ the discount factor between $d$ and $d'$;
* $N(d')$ the survival probability between $d$ and $d'$;
* $\tau$ the random variable representing the date of the credit event.

At each payment date $d_i$, a flow $S$ is paid if and only if the credit event has not occurred before that date. Therefore the NPV of each flow is

$$f_{\textrm{premium}}^i = \mathbb{E}\left[ S \times D(d_i) \times \mathbb{1}(\tau > d_i) \right]$$

where $\mathbb{1}(\tau > d_i)$ is the indicator function, equal to 1 if $\tau > d_i$ (no credit event before the payment date) and 0 otherwise. Remember that if $x$ is a random variable with a finite number of finite outcomes $x_{1},x_{2},\ldots ,x_{k}$ occurring with probabilities $p_{1},p_{2},\ldots ,p_{k}$ respectively, the expectation of $x$ is:

$$\mathbb{E}[x] = \sum _{i=1}^{k}x_{i}\,p_{i}=x_{1}p_{1}+x_{2}p_{2}+\cdots +x_{k}p_{k}$$
which is the weighted sum of the $x_i$, with $p_{i}$ values being the weights. 

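In our case the only non-zero outcome is $S\cdot D(d_i)$, which occurs with probability $\mathbb{P}(\tau > d_i) = N(d_i)$, so that $f_{\textrm{premium}}^i = S\cdot D(d_i)\cdot N(d_i)$. Summing over all the payment dates and scaling by the face value $F$ (the spread $S$ is expressed per unit of notional), the NPV of the leg can be expressed as: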

$$\textrm{NPV}_{premium} = F\cdot S\cdot \sum_{i=1}^{n} D(d_i) \cdot N(d_i)$$
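
The sum above can be evaluated directly once $D$ and $N$ are known. The sketch below assumes flat, continuously compounded curves (a 3% rate and a 2% hazard rate), semiannual payments over three years and a hypothetical notional and spread; all of these numbers are made up for illustration.

```python
import math

F, S = 1_000_000, 0.01  # hypothetical notional and spread paid at each date
r, lam = 0.03, 0.02     # assumed flat interest rate and hazard rate

def D(t):
    # discount factor for a payment t years from today
    return math.exp(-r * t)

def N(t):
    # survival probability up to t years from today
    return math.exp(-lam * t)

payment_times = [0.5 * i for i in range(1, 7)]  # d_1..d_n in years

# NPV_premium = F * S * sum_i D(d_i) * N(d_i)
npv_premium = F * S * sum(D(t) * N(t) for t in payment_times)
print(npv_premium)
```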

### Default leg

The LGD is paid out on the same date on which the credit event occurs, i.e. it can potentially be paid out on any date between $d_0$ and $d_n$. Mathematically, therefore, the NPV of the default leg can be expressed as follows:

$$\mathrm{NPV_{default}} = \mathbb{E} \left[F(1-R) \times D(\tau) \times \mathbb{1}(\tau \leq d_n) \right] $$

Using the laws of probability, we can break this down into the sum of "daily NPVs" calculated as a function of the daily default probabilities $DP$:

$$
\begin{align*}
\mathbb{E}\left[F(1-R) \times D(\tau) \times \mathbb{1}(\tau \leq d_n) \right]
&= \sum_{d'=d_0}^{d_n} \mathbb{E}[ F(1-R) \times D(\tau) | \tau = d'] DP[ \tau = d' ] \\
&= F(1-R) \sum_{d'=d_0}^{d_n} D(d') \left( DP[ \tau \geq d' ] - DP[ \tau \geq d'+1 ] \right) \\
&= F(1-R) \sum_{d'=d_0}^{d_n} D(d') \left( N(d') - N(d'+1) \right)
\end{align*}
$$

where the last step holds since $DP[\tau\geq d'] = 1 - DP[\tau < d'] = 1 - (1-N(d')) = N(d')$, i.e. the probability that the credit event has not occurred before $d'$ is the survival probability up to $d'$.
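
The daily sum can be sketched in a few lines. As before, the flat curves, the notional, the recovery rate and the three-year horizon are assumptions chosen only for illustration, and days are counted from $d_0$ (taken equal to today).

```python
import math

F, R = 1_000_000, 0.4  # hypothetical notional and recovery rate
r, lam = 0.03, 0.02    # assumed flat interest rate and hazard rate
n_days = 3 * 365       # days between d_0 and d_n

def D(k):
    # discount factor k days from today
    return math.exp(-r * k / 365.0)

def N(k):
    # survival probability k days from today
    return math.exp(-lam * k / 365.0)

# NPV_default = F (1 - R) * sum over d' = d_0..d_n of D(d') * (N(d') - N(d' + 1))
npv_default = F * (1 - R) * sum(
    D(k) * (N(k) - N(k + 1)) for k in range(n_days + 1)
)
print(npv_default)
```

Since the daily default probabilities add up to $1 - N(d_n + 1)$ and every discount factor is at most 1, the result is always below $F(1-R)\,(1 - N(d_n+1))$, which is a quick sanity check on the output.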

<img src="timeline.png">

In [None]:
# credit default swap class with breakeven method
from finmarkets import generate_swap_dates
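
A compact sketch of how such a class might look is given below. To stay self-contained it makes several assumptions: the payment dates are passed in explicitly (instead of being built with `generate_swap_dates`), the curves are plain callables `df(d)` and `ndp(d)`, and the NPV is taken from the protection buyer's point of view. The breakeven spread is the value of $S$ for which the two legs have the same NPV.

```python
from datetime import date, timedelta
import math

class CreditDefaultSwap:
    """Sketch of a CDS priced from df(d) and ndp(d) callables (assumed interface)."""

    def __init__(self, notional, payment_dates, spread, recovery=0.4):
        # payment_dates[0] is the start date d_0, the others are d_1..d_n
        self.notional = notional
        self.payment_dates = payment_dates
        self.spread = spread
        self.recovery = recovery

    def annuity(self, df, ndp):
        # sum of D(d_i) * N(d_i) over the premium payment dates
        return sum(df(d) * ndp(d) for d in self.payment_dates[1:])

    def npv_premium_leg(self, df, ndp):
        return self.notional * self.spread * self.annuity(df, ndp)

    def npv_default_leg(self, df, ndp):
        one_day = timedelta(days=1)
        d, total = self.payment_dates[0], 0.0
        while d <= self.payment_dates[-1]:  # daily sum from d_0 to d_n
            total += df(d) * (ndp(d) - ndp(d + one_day))
            d += one_day
        return self.notional * (1 - self.recovery) * total

    def npv(self, df, ndp):
        # buyer's view (assumption): receives the default leg, pays the premium leg
        return self.npv_default_leg(df, ndp) - self.npv_premium_leg(df, ndp)

    def breakeven(self, df, ndp):
        # spread for which the NPV is zero
        return self.npv_default_leg(df, ndp) / (self.notional * self.annuity(df, ndp))

# dummy flat curves and a 3-year CDS with yearly payments
today = date(2024, 1, 1)
df = lambda d: math.exp(-0.03 * (d - today).days / 365.0)
ndp = lambda d: math.exp(-0.02 * (d - today).days / 365.0)
cds = CreditDefaultSwap(1_000_000, [date(2024 + i, 1, 1) for i in range(4)], 0.01)
be = cds.breakeven(df, ndp)
print(cds.npv(df, ndp), be)
```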

Below is a simple test of the class, using [discount_curve_ch_10.xlsx](https://drive.google.com/file/d/1mugHyet3H9tcSAvYvt8G4_kpfaEbVY7b/view?usp=sharing).

In [None]:
# test CDS class 
import pandas as pd
from finmarkets import DiscountCurve

In [None]:
# check default leg, premium leg and npv

## Estimate Default Probabilities from CDS

Much like discount curves can be derived from OIS market quotes, we can estimate default probabilities (hence credit curves) from CDS quotes using *bootstrap*.
Following the same steps seen for the discount curve, we can determine the default probabilities at discrete dates to fill our curve:

* collect market quotes for a number of CDS with different maturities;
* create the corresponding CDS objects;
* define a $\tt{CreditCurve}$ whose pillars are the CDS maturity dates and the survival probabilities are unknown;
* define an objective function equal to the sum of the squared NPVs of the CDSs, to be minimized;
* set the non-default probabilities to an initial value and bound them to $[0, 1]$, since they are probabilities; fix "today's" probability to 1, since no default has occurred yet;
* run the minimization.

The file for market quotes is [cds_quotes.xlsx](https://drive.google.com/file/d/1BOtwCFYk0CUwYkMhnowWTj0HNOpBefd_/view?usp=sharing).


In [None]:
# bootstrapping
from scipy.optimize import minimize
import pandas as pd
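
The steps listed above can be sketched end to end with `scipy.optimize.minimize`. Everything specific in this example is an assumption made to keep it self-contained: the three quotes are invented (they are not the contents of the spreadsheet), dates are replaced by day counts with 365-day years, premiums are paid yearly, the discount curve is flat, and today's survival probability is fixed to 1 by construction rather than through a bound.

```python
import numpy as np
from scipy.optimize import minimize

# hypothetical market quotes: maturity in years -> yearly spread
quotes = {1: 0.010, 2: 0.012, 3: 0.014}
notional, recovery, rate = 10_000.0, 0.4, 0.03  # assumed contract terms

# pillars: today plus the CDS maturities, in days
pillar_days = np.array([0] + [365 * m for m in quotes])

def df(days):
    # flat, continuously compounded discount curve (assumption)
    return np.exp(-rate * np.asarray(days) / 365.0)

def cds_npv(maturity, spread, ndps):
    # survival probabilities linearly interpolated between pillars
    ndp = lambda days: np.interp(days, pillar_days, ndps)
    pay_days = 365 * np.arange(1, maturity + 1)  # yearly premium dates
    premium = spread * np.sum(df(pay_days) * ndp(pay_days))
    days = np.arange(0, 365 * maturity + 1)      # daily grid d_0..d_n
    default = (1 - recovery) * np.sum(df(days) * (ndp(days) - ndp(days + 1)))
    return notional * (default - premium)

def objective(x):
    ndps = np.concatenate(([1.0], x))  # today's survival probability is 1
    return sum(cds_npv(m, s, ndps) ** 2 for m, s in quotes.items())

x0 = np.full(len(quotes), 0.9)          # initial guess
bounds = [(0.0, 1.0)] * len(quotes)     # probabilities live in [0, 1]
res = minimize(objective, x0, bounds=bounds)

ndps = np.concatenate(([1.0], res.x))
print(ndps)
```

After the minimization each quoted CDS should reprice close to zero on the bootstrapped curve, which is the natural check to run on the result.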

### Determine Default Probabilities from Bond Prices

The price of a bond is directly linked to the credit rating of its issuer, since there is always an associated default risk: the borrower might not be able to repay, fully or partially, the amount borrowed.
Bonds with low ratings, called junk bonds, are sold at lower prices (since riskier) while those with higher ratings, called investment-grade bonds, are sold at higher prices.

Let's see with an example how the default probability can be determined from bond prices.
Consider a bond and let $x$ represent the present value of its cash flow stream.

When the issuer has a non-zero default probability, to value the bond we need to take each possible value of $x$, multiply it by its probability and sum the results. In other words, the value of the bond should equal the mathematical expectation of $x$.

Consider a bond which pays $F$ at maturity, whose issuer has a default probability $DP$ (in case of default the recovery is $R$). What will be the price of this bond?

$$ V_{bond} =
\begin{cases}
& D \cdot R \cdot F\quad\textrm{(in case of default of the issuer)}\\
&D \cdot F\quad\textrm{(in case of no default)}\\
\end{cases}$$
where $D$ is the proper discount factor.
Since we don’t know if the issuer will default or not we can estimate the bond price as

$$V_{bond} = D \cdot R \cdot F \cdot DP ( \tau ) + D \cdot F \cdot ( 1 - DP ( \tau)) = D\cdot F \cdot ( 1 - ( 1 - R ) DP ( \tau ))$$

From this equation it is clear that the higher the default probability, the lower the bond price. Conversely, given the market price of the bond we can estimate the issuer's default probability.
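
For this single-payment bond the relation can be inverted in closed form, $DP = \left(1 - \cfrac{V_{bond}}{D\cdot F}\right)\cfrac{1}{1-R}$. The short sketch below checks the round trip with hypothetical numbers (face value, recovery and discount factor are arbitrary).

```python
F, R, D = 100.0, 0.4, 0.97  # hypothetical face value, recovery, discount factor

def bond_price(dp):
    # V = D * F * (1 - (1 - R) * DP)
    return D * F * (1 - (1 - R) * dp)

def implied_dp(price):
    # the same relation solved for DP
    return (1 - price / (D * F)) / (1 - R)

price = bond_price(0.10)
dp = implied_dp(price)
print(price, dp)
```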

In the notes there is a generalization of this formula to estimate the default probability of a bond issuer from its current price

$$\mathbb{E}(x) = \Big(CSD + RF(1-S)\Big)\cfrac{1-(SD)^N}{1-SD}+F(SD)^N$$
where $C$ is the coupon and $N$ the number of coupons, $F$ the bond face value, $R$ the recovery percentage, $D$ the discount factor over one coupon period and $S$ the survival probability between two coupons.

With $\mathbb{E}(x)$ equal to the price of the bond, this equation can be solved numerically for $S$. The default probability between two coupons is simply $(1-S)$, and the probability that the bond defaults anytime within the next $k$ coupons is $(1-S^k)$.

Consider for example a bond with a 4-year maturity and a face value of €100 which pays 5% yearly coupons. The recovery rate $R$ is 40% and the risk-free rate is constant at 3%. Let's estimate the default probability of the issuer of this bond by finding, from the previous equation, the value of $S$ which gives the current trading price (€80).

In [None]:
# with brentq determine default probability
from scipy.optimize import brentq
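
A possible solution is sketched below. It implements the formula exactly as written above and assumes that the one-period discount factor is $D = 1/(1+r)$ with annual compounding; a different compounding convention would change the result slightly. The root is searched in $[0, 1]$ since $S$ is a probability.

```python
from scipy.optimize import brentq

C, F, N = 5.0, 100.0, 4        # yearly coupon (5% of face), face value, coupons
R, r, price = 0.4, 0.03, 80.0  # recovery, risk-free rate, market price
D = 1.0 / (1.0 + r)            # assumed annually compounded discount factor

def expected_value(S):
    # E(x) = (C S D + R F (1 - S)) * (1 - (S D)^N) / (1 - S D) + F (S D)^N
    SD = S * D
    return (C * SD + R * F * (1 - S)) * (1 - SD**N) / (1 - SD) + F * SD**N

# find S such that the expected value matches the market price
S = brentq(lambda s: expected_value(s) - price, 0.0, 1.0)

print(S, 1 - S, 1 - S**N)  # survival, one-period default, default within N coupons
```

The bracket is valid because at $S=0$ the formula gives the recovery value $RF = 40$, below the price, while at $S=1$ it gives the risk-free bond price, above it.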