Previous: [Issues Related to Curran's Methodology](05.ipynb) | [Table of Contents](00.ipynb) | Next: [Applications to Derivatives Pricing](07.ipynb)

# The Willow Tree Lattice Handbook

## Xu, Hong, and Qin's Methodology

Xu _et al_. [14] enhance the selection process for the discrete density pairs $\big\{\big(z_i,q_i\big)\big\}_{i=1}^m$. 
- On the one hand, the authors depart from the uniform distribution as the only choice for $\big\{q_i\big\}_{i=1}^m$, allowing the user to give more weight to probabilities in the central nodes of the lattice so that, when the willow tree algorithm is used to price options, the figures obtained rapidly match those calculated with the expectation operator.


- On the other, they optimise the sampling strategy for $\big\{z_i\big\}_{i=1}^m$, devising three alternative procedures which select representative variates that mimic more closely the moments of the standard normal distribution.

The goal of this method is to increase the efficiency of the lattice in the space dimension, to get very accurate results with the smallest $m$ possible. This way, the computational resources made available can be directed towards refining the structure in the time dimension.

#### Parameter $\gamma$

Xu _et al_. observe that the larger the weight attached to the central nodes of the lattice, the faster the results of the willow tree match those computed with the expectation operator. For this reason, to boost the efficiency of the tree in pricing derivatives, the authors propose a sampling strategy that weighs each element of the sequence $\big\{q_i\big\}_{i=1}^m$ according to the position of the associated variate within the tree: more if the variate is close to the core, less if it is near the edges.

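For the nodes in the lower half of the lattice, the probabilities are retrieved by:

$$
q_i=\frac{{\big(i-0.5\big)}^\gamma}{m} \qquad \gamma\in\mathbb{R}^+, \quad 0\leq\gamma\leq1,
$$

while those in the upper half follow by symmetry, $q_{m+1-i}=q_i$.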

#### Considerations

- First of all, the equation is very similar to that of vector $\mathbf{z}$ (see [Mathematical Development of the Tree](02.ipynb)), and shows that the stratified sampling technique is also used to draw representative variables from the distribution of probabilities.


- Second, $\gamma$, the weighting parameter, determines the shape of the distribution. When $\gamma=0$, the distribution is uniform, as it is in Curran's sampling strategy, and the nodes are equiprobable. When $\gamma=1$, the distribution is triangular, and the central nodes carry the maximum weight possible. Choices of $\gamma$ anywhere in between flatten or sharpen the distribution [Figure 4].

#### Figure 4: Different Shapes for the Probability Distribution, $m$ = 30*
<img src="../handbook/img/figure-3.png" alt="Features of the Willow Tree" style="width: 425px;"/>

_* Adapted from Xu, Hong, and Qin [14]._

- Third, the larger the value of the parameter, the faster the convergence of the pricing process to the true price. The reason for this behaviour is that, on average, the standard Brownian motion stays close to the origin with high probability, a consequence of both the martingality condition and the shape of the standard normal density. 


- Fourth, when $\gamma$ is different from 0, the probabilities no longer sum to 1. To ensure that the variables always meet this requirement, the authors normalise the elements of the sequence as follows:

 $$
 q_i\leftarrow\frac{q_i}{\displaystyle\sum_{j=1}^mq_j}
 $$

 Also, when the distribution of $\big\{q_i\big\}_{i=1}^m$ is not uniform, the objective function in the linear programming problem (see [The Linear Programming Problem](03.ipynb)) must be adjusted as follows:

 $$
 \min_{p_{ij}^k}\sum_{i=1}^mq_i\sum_{j=1}^m {p_{ij}^k\big\vert z_j\sqrt{t_{k+1}}-z_i\sqrt{t_k}\big\vert}^3
 $$

 The function weighs paths according to their distance from the origin, giving more relevance to transitions close to the core, and less to those near the edges. Therefore, if the probabilities are equal, as happens with Curran's sampling strategy, the adjustment is unnecessary, because all paths have the same significance: the correction would simply scale the objective function by $1/m$, with no impact on the optimal set of transition probabilities.


- Fifth, $\gamma$ governs the width of the strata from which the representative variates $\big\{z_i\big\}_{i=1}^m$ are drawn, because the parameter affects the definition of the $\big\{q_i\big\}_{i=1}^m$ which, in the optimisation procedures to be introduced, enter the formula for the extremes of the intervals.


- Finally, to speed up the sampling process, it suffices to compute half of the elements of the sequence, because the distribution of the probabilities is symmetrical about the centre of the lattice, that is, $q_{m+1-i}=q_i$.
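
The construction of the probabilities described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `node_probabilities` is hypothetical, and the treatment of an odd $m$ (the central node receives its own weight) is an assumption made here so that the mirroring is well defined.

```python
def node_probabilities(m, gamma):
    """Return normalised probabilities q_1..q_m for a weighting parameter gamma."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    half = (m + 1) // 2  # nodes up to and including the centre (assumption for odd m)
    # Raw weights for the lower half: (i - 0.5)^gamma / m, i = 1..half
    lower = [(i - 0.5) ** gamma / m for i in range(1, half + 1)]
    # Mirror the lower half so that q_{m+1-i} = q_i
    upper = lower[: m - half][::-1]
    raw = lower + upper
    total = sum(raw)
    # Normalise so that the probabilities sum to 1
    return [w / total for w in raw]


uniform = node_probabilities(30, 0.0)     # equiprobable nodes
triangular = node_probabilities(30, 1.0)  # maximum weight on the central nodes
```

Only half of the raw weights are computed; the other half is obtained by reversing the list, which mirrors the shortcut mentioned in the last consideration.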

## Sampling Strategies

To improve the quality of the variates, Xu _et al_. propose three strategies, of which two are presented here.

- __Kurtosis matching strategy (KRT)__: provides a better representation of the variance and, most importantly, the kurtosis of the samples, so it works particularly well on the _tails_ of the distribution.


- __First partial moment matching strategy (FPM)__: focuses on the expectation, so it is more appropriate to describe the _body_ of the distribution.

### Kurtosis Matching Strategy

Despite Ho's corrections, when the number of space steps $m$ is small, Curran's sampling strategy leads to sequences whose value of kurtosis is systematically lower than 3.

Correctly proxying for kurtosis is important, because this parameter drives, above all, the accuracy of the prices of deep out-of-the-money options. Hence, the farther its value is from 3, the larger the bias injected into the derivatives pricing process.
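
The kurtosis shortfall can be checked numerically. The sketch below assumes that Curran's variates are the stratum representatives $z_i=\Phi^{-1}\big((i-0.5)/m\big)$ with equal probabilities $1/m$, and it does not apply Ho's corrections; the function name `curran_kurtosis` is hypothetical.

```python
from statistics import NormalDist


def curran_kurtosis(m):
    """Kurtosis of z_i = Phi^{-1}((i - 0.5) / m) under equal weights 1/m."""
    std_normal = NormalDist()
    z = [std_normal.inv_cdf((i - 0.5) / m) for i in range(1, m + 1)]
    second = sum(v ** 2 for v in z) / m  # second moment (the mean is 0 by symmetry)
    fourth = sum(v ** 4 for v in z) / m  # fourth moment
    return fourth / second ** 2


for m in (10, 30, 50):
    print(m, round(curran_kurtosis(m), 4))
```

Under these assumptions the printed values stay below 3 and move closer to it as $m$ grows.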

To cope with this issue, the authors devise an optimisation procedure that attempts to generate variates which, by construction, possess the basic properties of the standard normal distribution. The strategy involves choosing a set of probabilities $\big\{q_i\big\}_{i=1}^m$ from a user-defined distribution, and using this set to compute, first, the endpoints of the strata and, then, the sequence of variates. The variates are found by solving the following constrained nonlinear least-squares problem:

$$
\begin{align*}
&\min_{z_i}\bigg(\sum_{i=1}^{m}q_iz_i^4-3\bigg)^2\\
\text{subject to:}&\nonumber\\
&\sum_{i=1}^{m}q_iz_i=0\\
&\sum_{i=1}^{m}q_iz_i^2=1\\
&Z_{i-1}\leq z_i\leq Z_{i}
\end{align*}
$$

The above formulation has three merits.

- The first one is to explicitly state the zero mean and the unit variance requirements, which are quite cryptic in the martingality and time increment conditional variance equations (see [The Linear Programming Problem](03.ipynb), Key Convergence Properties).


- The second one is to provide a formula that minimises, according to the least-squares criterion, the degree of excess kurtosis of the sequence. As opposed to the original objective function, this formula omits the denominator, because the variance of the process is forced to be 1.


- The last one is to cover as wide an interval of the standard normal distribution as possible and, at the same time, to increase the precision of the solutions, by letting the end points of the retrieved sequence, $z_1$ and $z_m$, lie as far from the origin as the constraints allow. To this purpose, the variates are required to fall within $m$ intervals $\big\{[Z_{i-1},Z_i]\big\}_{i=1}^m$ (the strata defined in [Mathematical Development of the Tree](02.ipynb)), whose extremes are computed, for all $i$, by:

 $$
 Z_i=\Phi^{-1}\bigg(\sum_{j=1}^iq_j\bigg)
 $$

 with $Z_0=\Phi^{-1}\big(0\big)=-\infty$, and $Z_m=\Phi^{-1}\big(1\big)=+\infty$.
 
 Because the variates returned by the optimisation procedure tend towards the midpoints of their respective strata, $z_1$ and $z_m$, whose strata are unbounded, are pushed as far from the origin as the imposed constraints allow. Stretching the range therefore has two consequences.
 
 - First, the portion of the standard normal distribution represented increases: Xu _et al_. find that the extremes tend to grow, in absolute value, very close to 3, approximately covering 99.7% of the standard normal distribution, a wide portion corresponding to $\pm3$ standard deviations from the origin.
 
 - Second, the accuracy of the solution to the optimisation problem rises because, whatever the choice of $m$, larger end points allow a closer match to the properties of the standard normal, most importantly variance and, in relation to it, kurtosis. The latter effect is perfectly in line with Ho's $\delta$-correction strategy (see [Issues Related to Curran's Methodology](05.ipynb), Issue One), of which the outlined methodology is, in effect, an automated version.
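
The kurtosis matching problem can be prototyped with a general-purpose solver. The sketch below is an illustration under stated assumptions, not the authors' implementation: it uses SciPy's SLSQP method, starts from the probability midpoint of each stratum, and the names `strata_endpoints` and `krt_variates` are hypothetical. The solver, the starting point, and the tolerances are choices made here.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def strata_endpoints(q):
    """Z_0..Z_m with Z_i = Phi^{-1}(q_1 + ... + q_i), Z_0 = -inf, Z_m = +inf."""
    interior = np.cumsum(q)[:-1]  # cumulative probabilities strictly inside (0, 1)
    return np.concatenate(([-np.inf], norm.ppf(interior), [np.inf]))


def krt_variates(q):
    """Solve the kurtosis matching problem with SLSQP (assumed solver choice)."""
    q = np.asarray(q, dtype=float)
    Z = strata_endpoints(q)
    # Starting point (assumption): probability midpoint of each stratum
    cum = np.concatenate(([0.0], np.cumsum(q)))
    z0 = norm.ppf(0.5 * (cum[:-1] + cum[1:]))
    # Box constraints Z_{i-1} <= z_i <= Z_i; None marks an unbounded side
    bounds = [(None if np.isinf(lo) else lo, None if np.isinf(hi) else hi)
              for lo, hi in zip(Z[:-1], Z[1:])]
    constraints = [
        {"type": "eq", "fun": lambda z: q @ z},             # zero mean
        {"type": "eq", "fun": lambda z: q @ z**2 - 1.0},    # unit variance
    ]
    result = minimize(lambda z: (q @ z**4 - 3.0) ** 2, z0, method="SLSQP",
                      bounds=bounds, constraints=constraints,
                      options={"ftol": 1e-12, "maxiter": 500})
    return result.x, Z


# Example: m = 20 nodes with gamma = 0.6, probabilities mirrored and normalised
m, gamma = 20, 0.6
half = (np.arange(1, m // 2 + 1) - 0.5) ** gamma
q = np.concatenate((half, half[::-1]))
q /= q.sum()
z, Z = krt_variates(q)
print("mean:", q @ z, "variance:", q @ z**2, "fourth moment:", q @ z**4)
```

The unbounded sides of the first and last strata are what let $z_1$ and $z_m$ move outwards during the optimisation.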
 
### First Partial Moment Matching Strategy

This strategy involves replacing the objective function $\min_{z_i}\bigg(\sum_{i=1}^{m}q_iz_i^4-3\bigg)^2$ with one that fits the _first-order upper partial moment_ of the discrete density pairs $\big\{\big(z_i,q_i\big)\big\}_{i=1}^m$ to that of the standard normal distribution, whose density is denoted by $f$. The upper partial moment is a one-sided integral computed only for values above a specific threshold $\tau$:

$$
\mu_1^+(\tau)=\int_\tau^{+\infty}\big(x-\tau\big)f(x)dx = \int_{-\infty}^{+\infty}\big(x-\tau\big)^+f(x)dx
$$

with $\big(x-\tau\big)^+=\max\big(x-\tau,0\big)$.

Given $\big\{q_i\big\}_{i=1}^m$ and $\big\{Z_i\big\}_{i=0}^m$ generated by the methodology above, the set of variates $\big\{z_i\big\}_{i=1}^m$ is determined by minimising:

$$
\min_{z_i}\sum_{i=2}^m\Bigg\vert\sum_{j=1}^mq_j\big(z_j-Z_{i-1}\big)^+-\int_{-\infty}^{+\infty}\big(z-Z_{i-1}\big)^+f(z)dz\Bigg\vert
$$

subject to the same constraints as the kurtosis matching strategy.

The term on the left, inside the absolute value operator, represents a discretised version of the one on the right, which is equal to the RHS of the previous equation when $x$ is replaced by $z$, and $\tau$ by $\big\{Z_i\big\}_{i=1}^{m-1}$. To make the formula operative, the integral can be simplified, so as to obtain:

$$
\min_{z_i}\sum_{i=2}^m\Bigg\vert\sum_{j=1}^mq_j\big(z_j-Z_{i-1}\big)^+-\bigg(\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}Z_{i-1}^2}-Z_{i-1}\big(1-\Phi\big(Z_{i-1}\big)\big)\bigg)\Bigg\vert
$$
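
The simplification of the integral relies on the identity $f'(z)=-zf(z)$, which holds for the standard normal density. Writing $\tau$ for a generic threshold, the steps are:

$$
\begin{align*}
\int_{-\infty}^{+\infty}\big(z-\tau\big)^+f(z)dz&=\int_\tau^{+\infty}zf(z)dz-\tau\int_\tau^{+\infty}f(z)dz\\
&=\Big[-f(z)\Big]_\tau^{+\infty}-\tau\big(1-\Phi\big(\tau\big)\big)\\
&=\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}\tau^2}-\tau\big(1-\Phi\big(\tau\big)\big)
\end{align*}
$$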

The strategy is well suited to _near-_ and _very-near-the-money_ options, for whose pricing kurtosis is a less relevant parameter. In such cases, it achieves a higher level of accuracy than the kurtosis matching strategy does, and the precision increases as the distance from the at-the-money price shrinks.
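
As a rough illustration, the objective of the first partial moment strategy can be evaluated for a candidate sequence as follows. The function names are hypothetical, and the candidate used in the example (equal probabilities with the probability midpoint of each stratum as variate) is only an assumed starting point, not the optimised solution.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def normal_upper_partial_moment(tau):
    """Closed form of the first-order upper partial moment of N(0, 1) at tau."""
    density = exp(-0.5 * tau * tau) / sqrt(2.0 * pi)
    return density - tau * (1.0 - _STD_NORMAL.cdf(tau))


def fpm_objective(z, q, Z):
    """Sum of absolute moment gaps over the interior endpoints Z_1..Z_{m-1}."""
    total = 0.0
    for tau in Z[1:-1]:
        # Discrete counterpart: sum_j q_j * max(z_j - tau, 0)
        discrete = sum(qj * max(zj - tau, 0.0) for zj, qj in zip(z, q))
        total += abs(discrete - normal_upper_partial_moment(tau))
    return total


# Example with m = 10 equiprobable nodes
m = 10
q = [1.0 / m] * m
Z = [float("-inf")] + [_STD_NORMAL.inv_cdf(i / m) for i in range(1, m)] + [float("inf")]
z = [_STD_NORMAL.inv_cdf((i - 0.5) / m) for i in range(1, m + 1)]
print(fpm_objective(z, q, Z))
```

A solver would then minimise `fpm_objective` over `z`, subject to the zero mean, unit variance, and strata constraints.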

