Previous: [The Discrete Process Y](04.ipynb) | [Table of Contents](00.ipynb) | Next: [Xu, Hong, and Qin's Methodology](06.ipynb)

# The Willow Tree Lattice Handbook

## Issues Related to Curran's Methodology

Criticism of Curran's [4] methodology relates to the arbitrary choice of the pairs $\big\{\big(z_i,q_i\big)\big\}_{i=1}^m$ representing a discrete approximation of the standard normal distribution. Three issues arise.

### Issue One
The first issue concerns the __first two moments__ of the set of points, mean and variance, which should satisfy:

$$
\mathbf{q}^\text{T}\mathbf{z}=0
$$

$$
\mathbf{q}^\text{T}\mathbf{r}=1
$$


namely, zero mean and unit variance, where $r_i=z_i^2$. As Ho [8] points out, while the pairs selected according to Curran's strategy always satisfy the zero mean requirement (because the variates are symmetrical and equiprobable), they fall short of unit variance unless $m$ is large. For example, the variance is 0.8903 when $m=11$, and it is still only 0.9873 when $m=100$.
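
The figures above can be checked with a few lines of Python. The sketch below assumes Curran's pairs are the midpoint quantiles $z_i=\mathcal{N}^{-1}\big(\frac{i-0.5}{m}\big)$ with equal weights $q_i=1/m$; the function names `curran_pairs` and `mean_and_variance` are illustrative and do not come from the handbook.

```python
from statistics import NormalDist


def curran_pairs(m):
    """Return (z, q): midpoint quantiles of N(0, 1) with equal weights 1/m."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z = [std_normal.inv_cdf((i - 0.5) / m) for i in range(1, m + 1)]
    q = [1.0 / m] * m
    return z, q


def mean_and_variance(z, q):
    """First two moments of the discrete distribution: q'z and q'r, r_i = z_i**2."""
    mean = sum(qi * zi for zi, qi in zip(z, q))
    variance = sum(qi * zi ** 2 for zi, qi in zip(z, q))
    return mean, variance


for m in (11, 30, 100):
    mean, variance = mean_and_variance(*curran_pairs(m))
    print(f"m = {m:3d}: mean = {mean:+.4f}, variance = {variance:.4f}")
```

The mean vanishes up to rounding error, while the variance approaches one from below as $m$ grows.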

#### Ho's Proposed Correction

To meet the variance requirement, Ho suggests two modifications.

- The first consists of sampling from a _non-standard_ normal distribution, $\mathcal{N}_\epsilon\big(0,1+\epsilon\big)$, with $\epsilon>0$ chosen so that the variates $z_i=\mathcal{N}_\epsilon^{-1}\big(\frac{i-0.5}{m}\big)$, $i=1,\dots,m$, satisfy the unit variance requirement.


- The second replaces the two endpoints of $\mathbf{z}$ with $z_1-\delta$ and $z_m+\delta$, with $\delta>0$ chosen so that the resulting sequence of variates also satisfies the unit variance requirement.

Unfortunately, both corrections share a drawback: they depend on the particular choice of $m$, so $\epsilon$ or $\delta$ must be recomputed every time $m$ changes. As $m\to\infty$, the variance converges to 1, so the required $\epsilon$ and $\delta$ shrink as the number of nodes grows. Hence, when a correction calibrated for a small $m$ is reused with a larger $m$, the variance of the process may well exceed unity, and the effect is more pronounced for $\epsilon$. For instance, when $m=11$, $\epsilon\approx 0.0598$ and $\delta\approx 0.1698$; reusing the same values when $m=30$ leads to a variance of 1.0767 and 1.0087, respectively.
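
A minimal sketch of both corrections follows. It rests on two assumptions that are not stated in the text: that $1+\epsilon$ is the _standard deviation_ of $\mathcal{N}_\epsilon$ (a reading consistent with the value $\epsilon\approx 0.0598$ quoted above), and that $z_1=-z_m$. Under these assumptions both corrections have closed forms, derived here from the unit variance condition rather than taken from Ho: with $\sigma_m^2$ the variance of Curran's variates, $\epsilon=1/\sigma_m-1$ and $\delta=\sqrt{z_m^2+m\big(1-\sigma_m^2\big)/2}-z_m$. All function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def curran_z(m):
    """Midpoint quantiles of the standard normal, as in Curran's sampling."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.5) / m) for i in range(1, m + 1)]


def variance(z):
    """Variance of equiprobable, zero-mean variates: sum(z_i**2) / m."""
    return sum(zi ** 2 for zi in z) / len(z)


def calibrate_epsilon(m):
    """Assumed reading: 1 + epsilon is a standard deviation, so z scales by it."""
    return 1.0 / sqrt(variance(curran_z(m))) - 1.0


def calibrate_delta(m):
    """Shift both endpoints outwards by delta until the variance equals one."""
    z = curran_z(m)
    shortfall = 1.0 - variance(z)
    return sqrt(z[-1] ** 2 + m * shortfall / 2.0) - z[-1]


def apply_epsilon(m, epsilon):
    """Variates sampled from the rescaled normal."""
    return [(1.0 + epsilon) * zi for zi in curran_z(m)]


def apply_delta(m, delta):
    """Variates with the two endpoints pushed outwards by delta."""
    z = curran_z(m)
    z[0] -= delta
    z[-1] += delta
    return z


epsilon, delta = calibrate_epsilon(11), calibrate_delta(11)
print(f"m = 11: epsilon = {epsilon:.4f}, delta = {delta:.4f}")
# Reusing the m = 11 corrections on a finer grid overshoots unit variance.
print(f"m = 30: variance with epsilon = {variance(apply_epsilon(30, epsilon)):.4f}")
print(f"m = 30: variance with delta   = {variance(apply_delta(30, delta)):.4f}")
```

Because the closed forms make recalibration cheap, the practical cost of Ho's corrections lies less in computing $\epsilon$ or $\delta$ than in remembering to do so whenever $m$ changes.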

The corrections outlined may prove effective in two cases: when $m$ is rarely changed, for example when the lattice is built only once, and when using a small $m$ is vital to save computational time and memory.

On the one hand, finding the optimal value for $\epsilon$ or $\delta$ may take time, especially if the procedure is not automated, and doing so repeatedly could be prohibitive. On the other, scarce resources may force the user to choose a small value for $m$, one for which the _central limit theorem_ cannot be invoked, so opting for a patch may be the only way to obtain pairs with the desired properties. Ho, for instance, was writing in the early 2000s, when limited processing power weighed heavily on the choice of the parameter. Nowadays, computational speed and memory are less of an issue, and choosing a large value for $m$ is not as problematic as it was then. Nevertheless, picking a small $m$ is still advantageous: when building a lattice, a coarse (but well-defined) structure in space frees computational resources that can be directed towards refining the structure in time, the more relevant dimension for option pricing.


#### Xu, Hong, and Qin's Solution

A neater solution to the unit variance problem is put forth by Xu _et al._ [14], who resort to a constrained optimisation procedure (explained in [Xu, Hong, and Qin's Methodology](06.ipynb)) to define a sequence of pairs that meets the requirements of the equations above even with a small number of elements. To this end, the algorithm incorporates Ho's $\delta$-correction strategy among its constraints, demanding that the endpoints of the retrieved sequence $\{z_i\}_{i=1}^m$ be as large as possible.

### Issue Two

The second issue relates to __kurtosis__, the standardised fourth moment of the set of points. Kurtosis, a measure of the tail fatness of a distribution, is equal to 3 for a standard normal. Hence, when the pairs $\big\{\big(z_i,q_i\big)\big\}_{i=1}^m$ are sampled from a standard normal distribution, they should satisfy:

$$
\frac{\mathbf{q}^{\text{T}}\big(\mathbf{r}\circ\mathbf{r}\big)}{\big(\mathbf{q}^{\text{T}}\mathbf{r}\big)^2}
= \frac{\sum_{i=1}^mq_iz_i^4}{\big(\sum_{i=1}^mq_iz_i^2\big)^2}
= 3
$$

Variance, a core component of kurtosis, appears explicitly in the denominator of the equation. Moreover, $\mathbf{r}$, a transformation of $\mathbf{z}$, appears in both the numerator and the denominator. As a consequence, for small values of $m$, the pairs computed according to Curran's methodology show kurtosis lower than 3.
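
The shortfall can be observed numerically. The sketch below assumes Curran's pairs are the equiprobable midpoint quantiles of the standard normal and evaluates the ratio in the equation above; `curran_pairs` and `kurtosis` are illustrative names.

```python
from statistics import NormalDist


def curran_pairs(m):
    """Return (z, q): midpoint quantiles of N(0, 1) with equal weights 1/m."""
    std_normal = NormalDist()
    z = [std_normal.inv_cdf((i - 0.5) / m) for i in range(1, m + 1)]
    q = [1.0 / m] * m
    return z, q


def kurtosis(z, q):
    """q'(r∘r) / (q'r)**2 with r_i = z_i**2, valid for zero-mean variates."""
    second = sum(qi * zi ** 2 for zi, qi in zip(z, q))
    fourth = sum(qi * zi ** 4 for zi, qi in zip(z, q))
    return fourth / second ** 2


for m in (11, 30, 100):
    print(f"m = {m:3d}: kurtosis = {kurtosis(*curran_pairs(m)):.4f}")
```

The gap from 3 is wider than the gap from unit variance at the same $m$, since raising the variates to the fourth power puts more weight on the poorly represented tails.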

#### Ho's Proposed Correction

Simply modifying the endpoints of $\mathbf{z}$, as Ho suggests, is insufficient, because a shift calibrated to restore the second moment does not also restore the fourth.
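
As a rough check, the sketch below applies the endpoint shift for $m=11$ and recomputes the kurtosis. It assumes equiprobable midpoint quantiles and a closed form for $\delta$ derived here from the unit variance condition (not taken from Ho); the names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def kurtosis(z):
    """Kurtosis of equiprobable, zero-mean variates: mean(z**4) / mean(z**2)**2."""
    m = len(z)
    second = sum(zi ** 2 for zi in z) / m
    fourth = sum(zi ** 4 for zi in z) / m
    return fourth / second ** 2


m = 11
std_normal = NormalDist()
z = [std_normal.inv_cdf((i - 0.5) / m) for i in range(1, m + 1)]
variance = sum(zi ** 2 for zi in z) / m

# delta that restores unit variance when both endpoints move outwards
# (closed form derived from the variance condition, an assumption of this sketch).
delta = sqrt(z[-1] ** 2 + m * (1.0 - variance) / 2.0) - z[-1]
z_corrected = [z[0] - delta] + z[1:-1] + [z[-1] + delta]

corrected_variance = sum(zi ** 2 for zi in z_corrected) / m
print(f"variance after correction: {corrected_variance:.4f}")
print(f"kurtosis before: {kurtosis(z):.4f}, after: {kurtosis(z_corrected):.4f}")
```

The shift brings the variance to one and raises the kurtosis, but the latter remains below 3.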

#### Xu, Hong, and Qin's Solution

Xu _et al._ include a constraint on the value of kurtosis in their optimisation procedure, thus providing a solution to the problem.


### Issue Three

The last, and most serious, issue is that determining the sequences $\big\{z_i\big\}_{i=1}^m$ and $\big\{q_i\big\}_{i=1}^m$ _independently of each other_ will often lead to __non-solvable linear programming problems__.


To examine the issue, first noted by Haussmann and Yan [7], it helps to recall the following theorem [13]:

>__Theorem (Rouché-Capelli).__ A system of linear equations with $m$ variables has a solution if and only if the rank of its coefficient matrix $\mathbf{A}$ is equal to the rank of its augmented matrix $\big[\mathbf{A}\vert \mathbf{b}\big]$. If solutions exist, they form an affine subspace of $\mathbb{R}^m$ of dimension $m-\operatorname{rk}\big(\mathbf{A}\big)$. In particular:
>1. if $m=\operatorname{rk}\big(\mathbf{A}\big)$, the solution is unique;
>2. otherwise, there exist $\infty^{m-\operatorname{rk}(\mathbf{A})}$ solutions.

In the willow tree framework, the system of linear equations mentioned in the theorem is the one formed by constraints defined in [The Linear Programming Problem](03.ipynb).

To apply the theorem, the coefficients of the system must be stored into, respectively, a matrix $\mathbf{A}$, collecting those on the LHS of the equations, and a vector $\mathbf{b}$, collecting those on the RHS. To have a feasible system, such quantities cannot be chosen arbitrarily.

In particular, $\mathbf{b}$ must be such that the rank of $\big[\mathbf{A}\vert \mathbf{b}\big]$ equals that of $\mathbf{A}$. As _different transformations_ of $\mathbf{z}$ and $\mathbf{q}$ appear in $\mathbf{A}$ and $\mathbf{b}$, whenever $\big\{z_i\big\}_{i=1}^m$ and $\big\{q_i\big\}_{i=1}^m$ are chosen independently of each other, as in Curran's methodology, there is a high chance that $\mathbf{b}$ is _linearly independent_ of the columns of $\mathbf{A}$.

Formally, while the rank of $\mathbf{A}$, as found by Ho, is always $4m-3$, the rank of the augmented matrix $\big[\mathbf{A}\vert \mathbf{b}\big]$ is often $4m-2$. Since the Rouché-Capelli theorem specifies that a linear system is infeasible unless the two ranks coincide, choosing pairs in conformity with Curran's strategy will frequently lead to linear programming problems that admit no solutions.
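
The rank test itself is easy to illustrate on a toy system. The sketch below does not build the willow tree constraint matrix; it uses a hypothetical $3\times 3$ system whose third equation is the sum of the first two, and a small helper `rank` (an illustrative name) based on Gaussian elimination in exact rational arithmetic. Note that equal ranks only guarantee that the equality constraints are consistent; a linear programme must additionally satisfy its sign constraints.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix via Gaussian elimination in exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    rank_count = 0
    for col in range(n_cols):
        # Look for a non-zero pivot at or below the current pivot position.
        pivot = next(
            (r for r in range(rank_count, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue
        rows[rank_count], rows[pivot] = rows[pivot], rows[rank_count]
        pivot_row = rows[rank_count]
        # Eliminate the entries below the pivot.
        for r in range(rank_count + 1, len(rows)):
            factor = rows[r][col] / pivot_row[col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], pivot_row)]
        rank_count += 1
    return rank_count


def is_consistent(A, b):
    """Rouché-Capelli test: rk(A) must equal rk([A|b])."""
    augmented = [list(row) + [bi] for row, bi in zip(A, b)]
    return rank(A) == rank(augmented)


# Toy system: the third row of A is the sum of the first two, so rk(A) = 2.
A = [[1, 1, 0],
     [0, 1, 1],
     [1, 2, 1]]
b_consistent = [1, 2, 3]    # respects the same dependence: 3 = 1 + 2
b_inconsistent = [1, 2, 4]  # breaks it, so rk([A|b]) = 3

print(rank(A), is_consistent(A, b_consistent), is_consistent(A, b_inconsistent))
```

The second right-hand side plays the role of a vector $\mathbf{b}$ built from pairs that were chosen without regard to the dependence among the rows of $\mathbf{A}$: the augmented matrix gains one unit of rank and the system has no solution.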


#### The Cause of Infeasibility

According to Haussmann and Yan, the cause of infeasibility lies in the _stationarity constraint_. As it turns out, stationarity conflicts with the conditional variance requirement, so that solutions exist only if one of the two is removed. However, while workable, removing a constraint from the original formulation of the problem is a suboptimal course of action.

#### Xu, Hong, and Qin's Solution

The optimisation process put forth by Xu _et al._ deals with the issue successfully, because it determines the components of $\big\{z_i\big\}_{i=1}^m$ based on a prior choice of the elements of $\big\{q_i\big\}_{i=1}^m$. This introduces, _de facto_, some degree of dependence between the sequences, which helps the rank of the augmented matrix match that of $\mathbf{A}$.
