# A study of SDEs and stochastic processes

Questions that need answering:
* What is the underlying space here? Bounded continuous functions on $[0, T]$? Or its dual (some well-behaved measures)? What exactly is the dual space?
* What is the norm/metric/topology for strong convergence?
* Is weak convergence here the same thing as weak-$\star$ convergence?

Then let's rewrite the material below by summarizing [Higham D.J. (2002)](http://homepages.warwick.ac.uk/~masdr/JOURNALPUBS/stuart51.pdf),
[this doctoral thesis](https://core.ac.uk/download/pdf/70597247.pdf)
and [this paper on Euler-Maruyama](https://arxiv.org/abs/1610.07047.pdf).

In [None]:
import torch

import numpy as np

<br>

Consider an SDE
$$
dX_t
    = \mu_t dt + \Sigma_t^{\tfrac12} dW_t
    \,, $$
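where $\mu_t = \mu(X_t)$, $\Sigma_t = \Sigma(X_t)$, $\Sigma_t^{\tfrac12}$ is a matrix square root
in the sense $\Sigma_t^{\tfrac12} (\Sigma_t^{\tfrac12})^\top = \Sigma_t$ (e.g. the Cholesky factor), and $W_t$
is a multivariate Brownian motion.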

The Euler-Maruyama scheme for numerically integrating this SDE works as follows
(based on this lecture [on numerical methods in Computational Finance](https://www.mimuw.edu.pl/~apalczew/CFP_lecture5.pdf)).

The SDE is shorthand for the integral equation
$$
X_t - X_0
    = \int_0^t \mu_\tau d\tau + \int_0^t \Sigma_\tau^{\tfrac12} dW_\tau
  \,, $$

so on $[t, t+\delta]$ both integrals can be approximated by freezing the integrand
at the left endpoint $t$. These are the same left-point finite differences from which the Itô integral is constructed:

$$
\begin{align}
\int_t^{t+\delta} \Sigma_\tau^{\tfrac12} dW_\tau
    &\approx \Sigma_t^{\tfrac12} \bigl(W_{t+\delta} - W_t\bigr)
    \,, \\
\int_t^{t+\delta} \mu_\tau d\tau
    & \approx \mu_t \delta
    \,,
\end{align}
$$
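where the approximations hold in the mean-square ($L^2$) sense, $\|\xi\|_{L^2}^2 = \mathbb{E}\|\xi\|^2$,
as $\delta \to 0$. Since $W_{t+\delta} - W_t \sim \mathcal{N}_d(0, \delta I)$ is independent of $X_t$,
the $\delta$-increment is, conditionally on $X_t$, approximately Gaussian:

$$
X_{t+\delta} - X_t \mid X_t
    \;\overset{\text{approx.}}{\sim}\; \mathcal{N}_d(\mu_t \delta, \Sigma_t \delta)
    \,. $$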
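
A quick numerical check of the left-point construction follows. It is a minimal standard-library sketch, and the helper `ito_left_point_sum` is hypothetical rather than taken from the sources above. For $\int_0^T W_\tau\, dW_\tau$, Itô's formula gives $(W_T^2 - T)/2$, and left-point sums over a fine uniform grid reproduce this value on the same simulated path. The $-T/2$ correction is what separates the Itô integral from the naive calculus answer $W_T^2/2$.

```python
import math
import random


def ito_left_point_sum(n_steps, T=1.0, seed=0):
    """Left-point Riemann sum for int_0^T W dW on a uniform grid.

    Returns the sum together with the closed form (W_T^2 - T) / 2 on the same path.
    """
    rng = random.Random(seed)
    delta = T / n_steps
    w = 0.0        # W_{t_k}, starting from W_0 = 0
    total = 0.0    # running sum of W_{t_k} (W_{t_{k+1}} - W_{t_k})
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(delta))  # W_{t_{k+1}} - W_{t_k} ~ N(0, delta)
        total += w * dw                          # integrand frozen at the left endpoint
        w += dw
    return total, 0.5 * (w * w - T)


approx, exact = ito_left_point_sum(100_000)
print(approx, exact, abs(approx - exact))
```

The gap equals $\tfrac12\bigl(T - \sum_k (W_{t_{k+1}} - W_{t_k})^2\bigr)$ exactly, and its $L^2$ norm is $\sqrt{T\delta/2}$. It therefore shrinks like $\sqrt{\delta}$.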

Consider a partition $0 = t_0 < t_1 < \cdots < t_n = T$ with $
\max_{0 \leq k < n} (t_{k+1} - t_k) \leq \delta
$, let $X^\delta_{t_0} = X_0$ and define the Euler-Maruyama iterates
$$
X^\delta_{t_{k+1}}
    = X^\delta_{t_k}
    + \mu\bigl(X^\delta_{t_k}\bigr) (t_{k+1} - t_k)
    + \Sigma\bigl(X^\delta_{t_k}\bigr)^{\tfrac12} \bigl(W_{t_{k+1}} - W_{t_k}\bigr)
    \,. $$
Denote by $X^\delta_t$ the piecewise linear interpolation of $(X^\delta_{t_k})_{k=0}^n$:
$$
X^\delta_t
    = X^\delta_{t_k}
    + \frac{t - t_k}{t_{k+1} - t_k} \bigl(X^\delta_{t_{k+1}} - X^\delta_{t_k}\bigr)
    \quad \text{for } t \in [t_k, t_{k+1}]
    \,. $$
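
The scalar version of the Euler-Maruyama iterates and their piecewise linear interpolation can be sketched with the standard library alone. The helper names, the non-uniform grid and the Ornstein-Uhlenbeck-type coefficients $\mu(x) = -x$, $\sigma \equiv 0.5$ are illustrative assumptions.

```python
import math
import random
from bisect import bisect_right


def euler_maruyama_grid(mu, sigma, x0, grid, rng):
    """Scalar Euler-Maruyama iterates on a (possibly non-uniform) partition `grid`.

    `sigma(x)` plays the role of Sigma(x)^{1/2}, the diffusion coefficient.
    """
    xs = [x0]
    for t_k, t_next in zip(grid, grid[1:]):
        h = t_next - t_k
        dw = rng.gauss(0.0, math.sqrt(h))  # W_{t_{k+1}} - W_{t_k} ~ N(0, h)
        x_k = xs[-1]
        xs.append(x_k + mu(x_k) * h + sigma(x_k) * dw)
    return xs


def interpolate(grid, xs, t):
    """Piecewise linear interpolation X^delta_t of the iterates, for t in [t_0, t_n]."""
    k = min(bisect_right(grid, t) - 1, len(grid) - 2)  # chosen so that t_k <= t <= t_{k+1}
    w = (t - grid[k]) / (grid[k + 1] - grid[k])
    return xs[k] + w * (xs[k + 1] - xs[k])


rng = random.Random(0)
grid = [0.0, 0.1, 0.25, 0.5, 0.7, 1.0]  # non-uniform partition of [0, 1]
xs = euler_maruyama_grid(lambda x: -x, lambda x: 0.5, 1.0, grid, rng)
print(interpolate(grid, xs, 0.25) == xs[2])  # True: the interpolation passes through the nodes
print(interpolate(grid, xs, 0.6))            # halfway between xs[3] and xs[4]
```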

A numerical scheme for a given SDE has strong convergence of order $\gamma$
if there exists $C_T$ (depending on the SDE and the horizon $T$, but not on $\delta$)
such that, for all sufficiently small $\delta$,

$$
\biggl(
\mathbb{E} \sup_{t \in [0, T]} \bigl\| X_t - X^\delta_t \bigr\|_2^2
\biggr)^{\tfrac12}
    \leq C_T \delta^\gamma
    \,, $$
where $\|\cdot\|_2$ is the Euclidean norm on $\mathbb{R}^d$.
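
Below is a minimal Monte Carlo sketch of strong order $\tfrac12$, the expected rate for Euler-Maruyama with multiplicative noise. It uses geometric Brownian motion $dX_t = \lambda X_t\,dt + \sigma X_t\,dW_t$ as the test SDE, because the exact solution $X_T = X_0 \exp\bigl((\lambda - \sigma^2/2)T + \sigma W_T\bigr)$ can be evaluated on the same Brownian path. For simplicity the sketch measures only the endpoint error $\mathbb{E}\lvert X_T - X^\delta_T \rvert$. By the Cauchy-Schwarz inequality this error is bounded by the supremum-based quantity above, so it decays at least as fast. The helper `gbm_strong_error` and the parameter values $\lambda = 2$, $\sigma = 1$ are illustrative assumptions.

```python
import math
import random


def gbm_strong_error(n_steps, n_paths=2000, lam=2.0, sig=1.0, x0=1.0, T=1.0, seed=0):
    """Monte Carlo estimate of E|X_T - X^delta_T| for dX = lam X dt + sig X dW.

    The exact solution is evaluated on the same Brownian path that drives the
    Euler-Maruyama iterates, which makes this a strong (pathwise) error.
    """
    rng = random.Random(seed)
    delta = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        x, w = x0, 0.0
        for _ in range(n_steps):
            dw = rng.gauss(0.0, math.sqrt(delta))
            x += lam * x * delta + sig * x * dw  # Euler-Maruyama step
            w += dw                              # accumulate W_T on the same path
        exact = x0 * math.exp((lam - 0.5 * sig ** 2) * T + sig * w)
        total += abs(exact - x)
    return total / n_paths


coarse = gbm_strong_error(16)   # delta = 1/16
fine = gbm_strong_error(256)    # delta = 1/256, i.e. 16 times smaller
print(coarse, fine, coarse / fine)  # for order 1/2 a ratio of roughly 16 ** 0.5 = 4 is expected
```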

Another notion of convergence for SDEs (and in probability theory generally) is weak
convergence, i.e. convergence of the laws (weak-$\star$ convergence in functional analysis).
A scheme for an SDE has weak convergence of order $\gamma$ if for every sufficiently
smooth test function $g\colon \mathbb{R}^d \to \mathbb{R}$ (e.g. $g \in C^\infty$ with
bounded derivatives; the exact class varies between references) there is a $C_{T,g}$,
independent of $\delta$, such that

$$
\bigl\lvert
    \mathbb{E} g(X_T) - \mathbb{E} g(X^\delta_T)
\bigr\rvert
    \leq C_{T,g} \delta^\gamma
    \,. $$
Note that weak convergence concerns only the distribution at time $T$, not the individual paths.
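
For weak order, geometric Brownian motion $dX_t = \lambda X_t\,dt + \sigma X_t\,dW_t$ gives a case where both expectations are explicit if one takes $g(x) = x$. This $g$ is unbounded, so it lies strictly outside the class of test functions above and serves only as an illustration. Each Euler-Maruyama step multiplies the state by $1 + \lambda\delta + \sigma\Delta W_k$, which is independent of the current state and has mean $1 + \lambda\delta$. Hence $\mathbb{E} X^\delta_T = X_0(1 + \lambda\delta)^{T/\delta}$, while $\mathbb{E} X_T = X_0 e^{\lambda T}$. A minimal sketch (the helper `gbm_weak_error` and $\lambda = 2$ are illustrative assumptions) shows the gap roughly halving whenever $\delta$ halves, i.e. weak order 1:

```python
import math


def gbm_weak_error(n_steps, lam=2.0, x0=1.0, T=1.0):
    """|E X_T - E X^delta_T| for geometric Brownian motion with g(x) = x.

    E X_T = x0 exp(lam T), and E X^delta_T = x0 (1 + lam delta)^n because each
    Euler-Maruyama factor (1 + lam delta + sig dW_k) has mean 1 + lam delta and
    is independent of the current state, so sig does not enter at all.
    """
    delta = T / n_steps
    return abs(x0 * math.exp(lam * T) - x0 * (1.0 + lam * delta) ** n_steps)


for n in (100, 200, 400):
    print(n, gbm_weak_error(n))  # the error roughly halves each time delta halves
```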

<br>

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

In [None]:
import math

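def euler_maruyama(mu, sigma, x_0, *, delta=1e-4):
    """Infinite generator of Euler-Maruyama iterates for dX = mu dt + Sigma^{1/2} dW.

    `mu(x)` returns the drift with the same shape as `x` (batch of paths, d), and
    `sigma(x)` returns a square root of the covariance (e.g. its lower Cholesky
    factor), so that `sigma(x) @ dW` has covariance Sigma(x) * delta.
    """
    X_t = x_0.clone().detach()
    *head, _ = X_t.shape

    mu_t, sigma_t = mu(X_t), sigma(X_t)
    while True:
        # drift step; `mu_t * delta` is a new tensor, so `mu_t` is left untouched
        X_t.add_(mu_t * delta)

        # Brownian increment dW ~ N(0, delta I), one column vector per path;
        # if `sigma` returned the covariance itself, use
        # torch.linalg.cholesky(sigma_t) in place of `sigma_t` below
        dW_t = torch.randn(*head, sigma_t.shape[-1], 1,
                           dtype=X_t.dtype, device=X_t.device)
        dW_t.mul_(math.sqrt(delta))

        # diffusion step: X += Sigma^{1/2} dW (batched matrix-vector product)
        X_t.add_(torch.matmul(sigma_t, dW_t).squeeze(-1))

        yield X_t.clone()

        # re-evaluate the coefficients at the new state
        mu_t, sigma_t = mu(X_t), sigma(X_t)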

In [None]:
a = torch.randn(2, 20)
# lower-triangular square root of a random SPD covariance matrix
sigma_t = torch.linalg.cholesky(torch.mm(a, a.t()) * 1e-2)
mu_t = torch.zeros(5, 2)  # zero drift for each of the 5 paths
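
The `euler_maruyama` generator above multiplies a standard normal increment by `sigma_t`, a Cholesky factor $L$ with $L L^\top = \Sigma$. This works because $L\,\Delta W$ with $\Delta W \sim \mathcal{N}(0, \delta I)$ has covariance $L (\delta I) L^\top = \Sigma\delta$. The standard-library sketch below checks this empirically for the $2 \times 2$ case. The helpers and the example matrix $\Sigma = \begin{pmatrix} 2 & 0.6 \\ 0.6 & 1 \end{pmatrix}$ are illustrative assumptions.

```python
import math
import random


def cholesky_2x2(s11, s12, s22):
    """Entries (l11, l21, l22) of the lower-triangular L with L L^T = [[s11, s12], [s12, s22]].

    The input matrix must be symmetric positive definite.
    """
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 * l21)
    return l11, l21, l22


def sample_increment_cov(s11, s12, s22, delta, n=200_000, seed=0):
    """Empirical covariance of L dW with dW ~ N(0, delta I_2); should approach Sigma * delta."""
    l11, l21, l22 = cholesky_2x2(s11, s12, s22)
    rng = random.Random(seed)
    c11 = c12 = c22 = 0.0
    for _ in range(n):
        z1 = rng.gauss(0.0, math.sqrt(delta))
        z2 = rng.gauss(0.0, math.sqrt(delta))
        x1, x2 = l11 * z1, l21 * z1 + l22 * z2  # lower-triangular matrix-vector product
        c11 += x1 * x1
        c12 += x1 * x2
        c22 += x2 * x2
    return c11 / n, c12 / n, c22 / n  # the mean is known to be zero, so no centring


cov = sample_increment_cov(2.0, 0.6, 1.0, delta=1e-3)
print(cov)  # close to (2e-3, 6e-4, 1e-3)
```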

In [None]:
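# all 5 paths start at the same random point
paths = [torch.randn(1, 2).repeat(5, 1) * 1e-1]
# paths = [torch.randn(5, 2) * 1e-1]  # alternative: independent starting points
int_ = euler_maruyama(lambda x: mu_t, lambda x: sigma_t,
                      paths[-1], delta=1e-3)

# take the first 200 Euler-Maruyama steps from the infinite generator
paths.extend(X_t for _, X_t in zip(range(200), int_))

paths = torch.stack(paths, dim=1)  # shape (5, 201, 2): path, time step, coordinate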

In [None]:
fig = plt.figure(figsize=(14, 14))
ax = fig.add_subplot(111)

colors = plt.cm.Accent(np.linspace(0, 1, num=len(paths)))
for path, col in zip(paths, colors):
    path = path.numpy()
    # one arrow per Euler-Maruyama step, from X_{t_k} to X_{t_{k+1}}
    xy, uv = path[:-1], path[1:] - path[:-1]
    ax.quiver(xy[:, 0], xy[:, 1], uv[:, 0], uv[:, 1], color=col,
              angles="xy", units="xy", scale=1., scale_units="xy")

In [None]:
A = torch.mm(a, a.t())  # Cholesky needs a square SPD matrix, not the 2x20 `a` itself
assert torch.allclose(torch.linalg.cholesky(A, upper=True).t(), torch.linalg.cholesky(A))

<br>

Maybe we could start with a simple ODE of the form
$$ \label{eq:prob_3_2}
\frac{dy}{dx} = f(x, y)
    \,, \quad y(x_0) = y_0
    \,. \tag{3.2}
    $$
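
A minimal sketch of the explicit Euler method for \eqref{eq:prob_3_2} follows. It is the Euler-Maruyama scheme above with the noise switched off ($\Sigma \equiv 0$). The helper name `euler_ode` and the test problem $y' = y$, $y(0) = 1$ (exact value $y(1) = e$) are illustrative choices. The error shrinks roughly in proportion to the step size, i.e. the method is first order.

```python
import math


def euler_ode(f, x0, y0, x_end, n_steps):
    """Explicit Euler for dy/dx = f(x, y), y(x0) = y0; returns the approximation of y(x_end)."""
    h = (x_end - x0) / n_steps
    x, y = x0, y0
    for _ in range(n_steps):
        y += h * f(x, y)  # follow the slope at the left endpoint of the step
        x += h
    return y


for n in (10, 100, 1000):
    print(n, abs(euler_ode(lambda x, y: y, 0.0, 1.0, 1.0, n) - math.e))  # error roughly 1/n
```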

[p. 42 in Petrovskiy (1984)](#.pdf):

If $f\colon G \to \mathbb{R}$ is bounded and continuous on a domain $
  G \subset \mathbb{R}\times \mathbb{R}
$ (an open connected subset), then for any $(x_0, y_0) \in G$ there exists
at least one solution (integral curve) $
  \phi \colon [a, b] \to \mathbb{R}
$ with $a < x_0 < b$ whose graph $
  \{(x, \phi(x))\colon x \in [a, b]\}
$ lies in $G$ and which satisfies the problem \eqref{eq:prob_3_2}. If $f$ is (uniformly) Lipschitz
in $y$, then the solution is unique.
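
The Lipschitz condition cannot simply be dropped. Consider the standard example $f(x, y) = 2\sqrt{\lvert y \rvert}$, which is not taken from the pages cited here. It is bounded and continuous on, say, $G = (-1, 1) \times (-1, 1)$, so the existence statement applies. However, it is not Lipschitz in $y$ near $y = 0$, and the problem has at least two solutions through the origin:

$$
\begin{align}
\frac{dy}{dx} &= 2\sqrt{\lvert y \rvert}
    \,, \quad y(0) = 0
    \,, \\
\phi_1(x) &\equiv 0
    \,, \qquad
\phi_2(x) = x \lvert x \rvert
    \,,
\end{align}
$$

since $\phi_2'(x) = 2\lvert x \rvert = 2\sqrt{\lvert \phi_2(x) \rvert}$.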

[p. 57 in Petrovskiy (1984)](#.pdf):

If the map $f(x, y) \colon G \to \mathbb{R}^d$ on a domain $
  G \subset \mathbb{R} \times \mathbb{R}^d
$ is continuous in $x$ and Lipschitz in $y$ on every closed bounded (hence compact) subset of $G$,
then for any $(x_0, y_0) \in G$ there exists a closed interval $[a, b]$ with $a < x_0 < b$
over which a unique solution to the problem
\eqref{eq:prob_3_2} is defined, with its graph inside $G$. (Proof via the contraction mapping theorem.)
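
The contraction-mapping proof rests on the Picard iteration $\phi_{k+1}(x) = y_0 + \int_{x_0}^x f(s, \phi_k(s))\, ds$. Below is a minimal sketch on a uniform grid. It assumes the cumulative trapezoidal rule is accurate enough for the integral, and the helper `picard` and its parameters are hypothetical. For $y' = y$, $y(0) = 1$ the exact iterates are the Taylor polynomials $\sum_{j \le k} x^j / j!$ of $e^x$, so the last iterate should be close to $e$ at $x = 1$.

```python
import math


def picard(f, x0, y0, x_end, n_iter=30, n_grid=1000):
    """Picard iterates phi_{k+1}(x) = y0 + int_{x0}^{x} f(s, phi_k(s)) ds on a uniform grid.

    The integral is approximated by the cumulative trapezoidal rule.
    Returns the grid and the last iterate.
    """
    h = (x_end - x0) / n_grid
    xs = [x0 + i * h for i in range(n_grid + 1)]
    phi = [y0] * (n_grid + 1)  # phi_0 is the constant function y0
    for _ in range(n_iter):
        g = [f(x, y) for x, y in zip(xs, phi)]
        new = [y0]
        for i in range(n_grid):
            new.append(new[-1] + 0.5 * h * (g[i] + g[i + 1]))  # trapezoid on [x_i, x_{i+1}]
        phi = new
    return xs, phi


xs, phi = picard(lambda x, y: y, 0.0, 1.0, 1.0)
print(abs(phi[-1] - math.e))  # tiny: the iterates approach exp(x)
```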

[p. 78 in Petrovskiy (1984)](#.pdf):

If $f$ in \eqref{eq:prob_3_2} is $p$ times continuously differentiable with respect to both $x$ and $y$, then the solutions are $p+1$ times continuously differentiable.

[p. 80 in Petrovskiy (1984)](#.pdf):

If $f$ is continuous and bounded and the solution of \eqref{eq:prob_3_2} through every point of $G$ is unique,
then the solutions depend continuously on $x_0$, $y_0$ and $f$ (the latter in the $\sup$-norm).
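
When $f$ is additionally Lipschitz in $y$ with constant $L$, Grönwall's inequality makes this continuous dependence quantitative. This is a standard estimate, stated here for illustration rather than quoted from Petrovskiy. Suppose $\phi$ solves \eqref{eq:prob_3_2} and $\psi$ solves $\psi' = g(x, \psi)$, $\psi(x_0) = z_0$, with $\sup_G \lvert f - g \rvert \leq \varepsilon$. Then, on any common interval where both graphs stay in $G$,

$$
\lvert \phi(x) - \psi(x) \rvert
    \leq \lvert y_0 - z_0 \rvert e^{L \lvert x - x_0 \rvert}
    + \frac{\varepsilon}{L} \bigl(e^{L \lvert x - x_0 \rvert} - 1\bigr)
    \,. $$

The bound follows from $\lvert f(s, \phi) - g(s, \psi) \rvert \leq L \lvert \phi - \psi \rvert + \varepsilon$ together with the integral form of Grönwall's inequality.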

<br>

In [None]:
import gpytorch

In [None]:
# Training data: 100 regularly spaced points in [0, 1] inclusive
train_x = torch.linspace(0, 1, 100)
# True function is sin(2*pi*x) with Gaussian noise
train_y = torch.sin(train_x * (2 * math.pi)) + torch.randn(train_x.size()) * 0.2

In [None]:
# We will use the simplest form of GP model, exact inference
class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(ExactGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

# initialize likelihood and model
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(train_x, train_y, likelihood)

In [None]:
# Find optimal model hyperparameters
model.train()
likelihood.train()

# Use the adam optimizer
optimizer = torch.optim.Adam([
    {'params': model.parameters()},  # Includes GaussianLikelihood parameters
], lr=1e-2)

# "Loss" for GPs - the marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

training_iter = 50
for i in range(training_iter):
    # Zero gradients from previous iteration
    optimizer.zero_grad()
    # Output from model
    output = model(train_x)
    # Calc loss and backprop gradients
    loss = -mll(output, train_y)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f   lengthscale: %.3f   noise: %.3f' % (
        i + 1, training_iter, loss.item(),
        model.covar_module.base_kernel.lengthscale.item(),
        model.likelihood.noise.item()
    ))
    optimizer.step()

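```python
# assumes model.eval(), likelihood.eval() and test_x as in the cells below
f_preds = model(test_x)              # posterior over the latent function f
y_preds = likelihood(model(test_x))  # predictive distribution of the noisy y

f_mean = f_preds.mean
f_var = f_preds.variance
f_covar = f_preds.covariance_matrix
f_samples = f_preds.sample(sample_shape=torch.Size((1000,)))
```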

In [None]:
# Get into evaluation (predictive posterior) mode
model.eval()
likelihood.eval()

# Test points are regularly spaced along [0,1]
# Make predictions by feeding model through likelihood
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    test_x = torch.linspace(0, 1, 51)
    observed_pred = likelihood(model(test_x))

In [None]:
with torch.no_grad():
    # Initialize plot
    f, ax = plt.subplots(1, 1, figsize=(4, 3))

    # Get upper and lower confidence bounds
    lower, upper = observed_pred.confidence_region()
    # Plot training data as black stars
    ax.plot(train_x.numpy(), train_y.numpy(), 'k*')
    # Plot predictive means as blue line
    ax.plot(test_x.numpy(), observed_pred.mean.numpy(), 'b')
    # Shade between the lower and upper confidence bounds
    ax.fill_between(test_x.numpy(), lower.numpy(), upper.numpy(), alpha=0.5)
    ax.set_ylim([-3, 3])
    ax.legend(['Observed Data', 'Mean', 'Confidence'])

In [None]:
model.eval()

with torch.no_grad(), gpytorch.settings.fast_pred_var():
    mv = model(test_x)

smpl = mv.sample(torch.Size((11,)))

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(12, 7))

ax.plot(test_x, smpl.numpy().T);
ax.plot(train_x, train_y)