# pyhawkes

## Introduction 

Hawkes processes are a mathematical tool that extends a system of simple Poisson processes by allowing the individual counting processes to be interdependent. These interdependencies manifest through self-excitation and cross-excitation: the occurrence of an event changes the conditional intensity function of its own process and of the other processes.

In mathematical terms, an M-variate Hawkes process is defined as follows:

Let $\mathbb{F}^{m} = \{\mathscr{F}^{m}_t\}_{t\geq0}$ be a filtration, where $\mathscr{F}^{m}_{t}$ is the $\sigma$-algebra generated by a simple non-explosive counting process $N^{m}$ up to time $t$. Then define $\mathbb{F}^{'} = \{\mathscr{F}_{t}^{'}\}_{t\geq0}$, where $\mathscr{F}_{t}^{'} = \sigma\big(\bigcup_{m=1}^{M} \mathscr{F}^{m}_{t}\big)$ is the $\sigma$-algebra generated by the union of the individual $\sigma$-algebras.

A multivariate Hawkes process is a vector $\mathbf{N}(t) = (N^{1}(t),...,N^{M}(t))$ of such counting processes satisfying $\forall m \in \{1,..., M\}$:
    
1. $N^{m}(0) = 0$

2. The conditional intensity function $\lambda^{m}(t|\mathscr{F}_{t}^{'})$ is an $\mathscr{F}_{t}^{'}$-predictable process of the form:
\begin{equation}
\lambda^{m}(t|\mathscr{F}_{t}^{'}) = \mu_m + \sum_{n = 1}^{M} \int_{0}^{t}g^{(m,n)}(t-u)dN^{n}(u),
\end{equation}
where the integral is a Stieltjes integral and, for all $m, n \in \{1,...,M\}$, the kernel $g^{(m,n)}: \mathbb{R}^{+}_{0} \rightarrow \mathbb{R}^{+}_{0}$ is a non-negative function describing the effect of an event in component $n$ on the intensity of component $m$.

3. $P(N^{m}(t + h) - N^{m}(t) = 1 |\mathscr{F}_{t}^{'}) = \lambda^{m}(t|\mathscr{F}_{t}^{'})h + o(h)$

4. $P(N^{m}(t + h) - N^{m}(t) \geq 2 | \mathscr{F}_{t}^{'}) = o(h)$,

where $o(h)$ denotes little-o notation, i.e. $o(h)/h \to 0$ as $h \to 0$.
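The intensity in condition 2 can be evaluated by direct summation over all earlier events. The following is a minimal sketch, not part of the package: the function `intensity`, its arguments, and the example parameter values are hypothetical and chosen only for illustration.

```python
import math


def intensity(m, t, events, mu, kernel):
    """Evaluate lambda^m(t) by direct summation over all earlier events.

    events[n] holds the jump times of component n, mu[m] is the baseline
    rate, and kernel(m, n, dt) plays the role of g^{(m,n)}(dt).
    """
    total = mu[m]
    for n, times in enumerate(events):
        for s in times:
            if s < t:  # strictly earlier events only (predictability)
                total += kernel(m, n, t - s)
    return total


# Hypothetical bivariate example with exponential kernels.
alpha = [[0.3, 0.1], [0.2, 0.4]]
beta = 1.5


def exp_kernel(m, n, dt):
    return alpha[m][n] * math.exp(-beta * dt)


events = [[1.0, 2.0], [1.5]]
mu = [0.5, 0.2]
lam0 = intensity(0, 3.0, events, mu, exp_kernel)
```

Each evaluation visits every past event, which is where the quadratic cost of naive likelihood computations comes from.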

Positivity of the parameters is usually imposed. In addition, for an M-variate Hawkes process to be stationary, the $M \times M$ matrix $A$ with entries $a_{m,n} = \int_{0}^{\infty} g^{(m,n)}(v) dv$ needs to have a spectral radius strictly less than 1.
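As a worked example, take the exponential kernel from the Kernels section with per-pair parameters $\alpha_{m,n}, \beta_{m,n} > 0$ (this indexed notation is an assumption made here for illustration). The entries of $A$ then have a closed form:

$$
a_{m,n} = \int_{0}^{\infty} \alpha_{m,n} e^{-\beta_{m,n} v} dv = \frac{\alpha_{m,n}}{\beta_{m,n}}.
$$

In the univariate case the matrix has a single entry, so the stationarity condition reduces to $\alpha < \beta$.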


## Kernels

The exponential kernel $g(t|\alpha, \beta) = \alpha e^{-\beta t}$ is the most commonly used one, because a recursive formula reduces the usual $O(n^2)$ complexity of computing the log-likelihood, the compensator, or a simulation of the process to $O(n)$. The parameter $\alpha$ measures the instantaneous impact of an event, while $\beta$ controls the speed at which that impact decays.
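The recursion behind the $O(n)$ cost can be sketched as follows for a univariate process. With $R_i = \sum_{k<i} e^{-\beta (t_i - t_k)}$, the intensity at the $i$-th event is $\mu + \alpha R_i$, and $R_i = e^{-\beta (t_i - t_{i-1})}(1 + R_{i-1})$ with $R_1 = 0$. The function names below are hypothetical and are not the package's API.

```python
import math


def excitation_recursive(times, beta):
    """R_i = sum over k < i of exp(-beta * (t_i - t_k)), computed in O(n)."""
    r = [0.0] * len(times)
    for i in range(1, len(times)):
        decay = math.exp(-beta * (times[i] - times[i - 1]))
        r[i] = decay * (1.0 + r[i - 1])
    return r


def excitation_naive(times, beta):
    """Same quantity by direct O(n^2) summation, for comparison."""
    return [
        sum(math.exp(-beta * (ti - tk)) for tk in times[:i])
        for i, ti in enumerate(times)
    ]


times = [0.4, 1.1, 1.3, 2.7, 3.0]
r_fast = excitation_recursive(times, beta=1.2)
r_slow = excitation_naive(times, beta=1.2)
```

Both functions return the same values up to floating-point error; only the recursive one scales linearly in the number of events.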

Another popular kernel is the power-law kernel $\frac{\alpha}{(t + \gamma)^{\beta}}$, where $\beta > 1$. However, evaluating this kernel has $O(n^2)$ complexity, which is prohibitively slow for long event series. We therefore use a parametrization that approximates the power law by a sum of exponentials, as provided by [2]:

$$
g(t|n,\epsilon,\tau_0) = \frac{n}{Z} \Big(\sum_{i = 0}^{J-1}\xi_{i}^{-(1+\epsilon)}e^{-\frac{t}{\xi_i}}\Big),
$$

where $\xi_i = \tau_0 m^i$ for $0 \leq i < J$ (here $m$ is a scale factor, not the process index used above). The parameter $Z$ is chosen such that $\int_{0}^{\infty}g(t)dt = n$, therefore $Z = \sum_{i=0}^{J-1}(\tau_0 m^i)^{-\epsilon}$. The parameter $m$ controls the precision of the approximation and $J$ specifies its range. The resulting kernel approximates a power-law decay with exponent $1 + \epsilon$.
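The sum-of-exponentials kernel above can be evaluated directly from its definition. This is a minimal sketch: the function name `power_law_sum_exp` and the default values of `m` and `J` are assumptions made for illustration, not values taken from the package or from [2].

```python
import math


def power_law_sum_exp(t, n, eps, tau0, m=5.0, J=15):
    """Approximate power-law kernel as a weighted sum of J exponentials.

    xi_i = tau0 * m**i are the time scales and Z normalises the kernel so
    that its integral over [0, infinity) equals n.
    """
    xis = [tau0 * m ** i for i in range(J)]
    z = sum(xi ** (-eps) for xi in xis)
    total = sum(xi ** (-(1.0 + eps)) * math.exp(-t / xi) for xi in xis)
    return (n / z) * total


# Hypothetical parameter values.
g0 = power_law_sum_exp(0.0, n=0.8, eps=0.5, tau0=1.0)
g10 = power_law_sum_exp(10.0, n=0.8, eps=0.5, tau0=1.0)
```

Since each term integrates to $\xi_i^{-\epsilon}$, dividing by $Z$ makes the kernel integrate to $n$ regardless of the choice of `m` and `J`.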

## Simulation

For simulation, we use Ogata's thinning method. The basic idea is that, after an event in the multivariate Hawkes process and until the next one, the process behaves like an inhomogeneous Poisson process. The simulation therefore successively simulates the first point of a series of inhomogeneous Poisson processes. For a more detailed overview, see [1].
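The thinning idea can be sketched for the simplest case, a univariate process with an exponential kernel. Because the intensity only decays between events, its current value is an upper bound until the next event, so candidate points are drawn at that rate and accepted with probability equal to the ratio of the true intensity to the bound. The function `simulate_exp_hawkes` and its signature are hypothetical and are not the package's simulation routine.

```python
import math
import random


def simulate_exp_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon) by thinning (requires mu > 0)."""
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # sum of alpha * exp(-beta * (t - t_i)) at the current time
    while True:
        lam_bar = mu + excitation        # bound valid until the next event
        wait = rng.expovariate(lam_bar)  # candidate from a Poisson process
        t_new = t + wait
        if t_new >= horizon:
            break
        excitation *= math.exp(-beta * wait)  # decay up to the candidate
        lam_new = mu + excitation
        if rng.random() * lam_bar <= lam_new:  # accept with prob lam_new / lam_bar
            events.append(t_new)
            excitation += alpha  # jump of the intensity at an accepted event
        t = t_new
    return events


# Hypothetical stationary parameters (alpha < beta).
sample = simulate_exp_hawkes(mu=1.0, alpha=0.5, beta=1.0, horizon=100.0, seed=42)
```

A rejected candidate still advances the clock, and the bound is then recomputed from the decayed intensity, which keeps the rejection rate low.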


## Goodness-of-fit tests

Goodness of fit for Hawkes processes is mostly assessed through the random time change theorem [1], [3]. In short, we may transform the original Hawkes process series into a new series that should behave like a sequence of i.i.d. exponential r.v.s with unit rate. The transformation takes the form:

$$
V^{m}_{i} = \int_{T^{m}_{i}}^{T^{m}_{i+1}} \lambda^{m}(t|\mathscr{F}_{t}^{'})dt,
$$

where $T^{m}_{i}$ and $T^{m}_{i + 1}$ are consecutive jump times of the original Hawkes process. Then, $\{V^{m}_{i}\}_{i = 1, ..., N^{m}(T)}$ is a sequence of i.i.d. exponential r.v.s with rate 1. The Kolmogorov-Smirnov test and the Excess Dispersion test [4] are used to test the exponential property, and the Ljung-Box or Box-Pierce tests are used to test the independence property. The functions `comp_[type]_hawkes` (where type may be exp, power or gen) return the sequences $\{V^{m}_{i}\}_{i = 1, ..., N^{m}(T)}$ for all $m$.
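For a univariate exponential kernel the integral in the transformation has a closed form, so the residuals can be computed in one pass. The sketch below is an illustration under that assumption: `residuals_exp_hawkes` and `ks_statistic_unit_exp` are hypothetical names, not the package's `comp_[type]_hawkes` functions, and the second function returns only the Kolmogorov-Smirnov distance, not a p-value.

```python
import math


def residuals_exp_hawkes(times, mu, alpha, beta):
    """Integrated intensity between each pair of consecutive events."""
    residuals = []
    s = 1.0  # sum over k <= i of exp(-beta * (t_i - t_k)), including event i
    for prev, nxt in zip(times, times[1:]):
        dt = nxt - prev
        decay = math.exp(-beta * dt)
        residuals.append(mu * dt + (alpha / beta) * s * (1.0 - decay))
        s = s * decay + 1.0  # decay old events, then add the new one
    return residuals


def ks_statistic_unit_exp(sample):
    """Kolmogorov-Smirnov distance between the sample and Exp(1)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = 1.0 - math.exp(-x)
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


# Hypothetical event times and parameters.
times = [0.5, 1.0, 2.5, 2.7, 4.1]
v = residuals_exp_hawkes(times, mu=0.4, alpha=0.6, beta=1.1)
distance = ks_statistic_unit_exp(v)
```

Note that $n$ events yield $n - 1$ residuals here, one per interval between consecutive events.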


## Log-likelihood

The log-likelihood of an M-variate Hawkes process takes the following form [1]:

$$
\log L = \sum_{m=1}^{M} \Big[ \sum_{t_{i}^{m} < T} \log \lambda^{m}(t_{i}^{m}|\mathscr{F}_{t_{i}^{m}}^{'}) - \int_{0}^{T}\lambda^{m}(u|\mathscr{F}_{u}^{'})du \Big]
$$

The functions `lik_[type]_hawkes` (where type may be exp, power or gen) return the value of $\log L$.
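For the univariate exponential kernel, both terms of the log-likelihood can be computed in a single pass over the events. The following is a sketch under that assumption; `loglik_exp_hawkes` is a hypothetical name and should not be read as the implementation behind `lik_exp_hawkes`.

```python
import math


def loglik_exp_hawkes(times, horizon, mu, alpha, beta):
    """Log-likelihood of a univariate exponential Hawkes process on [0, horizon]."""
    log_sum = 0.0
    r = 0.0      # recursive sum of exp(-beta * (t_i - t_k)) over k < i
    prev = None
    for t in times:
        if prev is not None:
            r = math.exp(-beta * (t - prev)) * (1.0 + r)
        log_sum += math.log(mu + alpha * r)  # log-intensity at the event
        prev = t
    # Closed-form integral of the intensity over [0, horizon].
    compensator = mu * horizon + (alpha / beta) * sum(
        1.0 - math.exp(-beta * (horizon - t)) for t in times
    )
    return log_sum - compensator


# Hypothetical event times and parameters.
value = loglik_exp_hawkes([1.0, 2.0, 3.0], horizon=4.0, mu=0.5, alpha=0.4, beta=1.2)
```

With `alpha = 0` the expression collapses to the log-likelihood of a homogeneous Poisson process, which gives a quick sanity check.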

## Literature

[1] For a more detailed description, please refer to notes prepared by Yuanda Chen, https://www.math.fsu.edu/~ychen/hawkes.html and references provided there.

[2] LALLOUACHE, Mehdi; CHALLET, Damien. The limits of statistical significance of Hawkes processes fitted to financial data. *Quantitative Finance*, 2016, 16.1: 1-11.

[3] BOWSHER, Clive G. Modelling security market events in continuous time: Intensity based, multivariate point process models. *Journal of Econometrics*, 2007, 141.2: 876-912.

[4] ENGLE, Robert F.; RUSSELL, Jeffrey R. Autoregressive conditional duration: a new model for irregularly spaced transaction data. *Econometrica*, 1998, 1127-1162.