In [None]:
from preamble import *
plt.close('all')


In [None]:
from book_funs20 import *



# Generative models with kernels

## Aim of this section

Synthetic data generation consists of producing data from a model fitted to observed data. Synthetic financial time series have many applications, for instance in risk management, decision tools, or backtesting.

A classical approach to this problem relies on autoregressive, also called parametric, methods, which fit a known process to market observations, such as GARCH (Generalized Autoregressive Conditional Heteroscedasticity).

A more recent field of research concerns non-parametric models based on neural networks, for instance GANs (Generative Adversarial Networks). A number of works use GANs in finance for time series prediction, portfolio management or fraud detection, see for instance \cite{EFO:2021} for a review. However, this approach has yet to prove its efficiency for pricing purposes.

In this paper, we describe an alternative non-parametric approach that produces synthetic data using kernel methods. Kernel methods are explainable, since the accuracy of their predictions can be measured with error estimates. These estimates are based on a distance between measures, which provides a natural link to optimal transport theory. This allows us to reproduce any random variable from observations of its realizations, and to quantify the discretization error theoretically.

Indeed, the ability to reproduce a given random variable accurately is key to synthetic data. Section \ref{generative-models-with-kernels} describes our approach, whereas Section \ref{numerical-illustration} gives numerical illustrations of our construction.

Finally, we illustrate our approach with two financial applications in Section \ref{pricing-applications}. The first checks that the time series forecast can be used for Monte Carlo pricing. The second is a P\&L explanation that can be used for intraday, real-time P\&L approximation of large derivative portfolios. We compute various numerical metrics to show the convergence properties of our methods.

### Settings

Let $\mathbb{X}$ be an unknown probability measure, **absolutely continuous** with respect to the Lebesgue measure and supported on a convex set $\mathcal{X} \subset \RR^D$, where the dimension $D$ is, for financial applications, the number of risk sources. In this paper we focus on the discrete case, that is, let 
\begin{equation}\label{X}
X :=\{x^n_d\}_{n,d=1}^{N_x,D} \in \RR^{N_x,D}
\end{equation} be a set of **distinct** points in $\mathcal{X}$, defined as random samples of $\mathbb{X}$, and consider the discrete probability measure $\mathbb{X}_x = \frac{1}{N_x} \sum_{i=1}^{N_x} \delta_{x^i}$, where $\delta_x$ is the Dirac measure concentrated at $x$. Consider another probability measure $\mathbb{Y}$ with a known law, for instance the uniform distribution over $\mathcal{X}$, and define, as for $\mathbb{X}_x$, $\mathbb{Y}_y = \frac{1}{N_y} \sum_{i=1}^{N_y} \delta_{y^i}$.

### Kernel review

We refer to \cite{BerlinetThomasAgnan:2004} for a complete introduction to the theory of reproducing kernel Hilbert spaces (RKHS). We call a function $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ a kernel if it is symmetric and positive definite (see \cite{BerlinetThomasAgnan:2004} for a definition). A reproducing kernel Hilbert space $\mathcal{H}_k$ is a Hilbert space generated by the kernel $k$, whose scalar product satisfies the reproducing property $k(x,y) = \langle k(x,\cdot), k(y,\cdot) \rangle_{\mathcal{H}_k}$ for all $(x, y) \in \mathcal{X} \times \mathcal{X}$, see \cite{SHS:2001}.

The discrepancy between two probability measures $\mathbb{X}$ and $\mathbb{Y}$, induced by a kernel $k$ is
\begin{equation}\label{DkC}
\begin{array}{c}
    D_k\big(\mathbb{X},\mathbb{Y}\big)^2 := \int\int k(x,y)\,d\mathbb{X}(x)\,d\mathbb{X}(y) + \\
    \int\int k(x,y)\,d\mathbb{Y}(x)\,d\mathbb{Y}(y) - 2 \int\int k(x,y)\,d\mathbb{X}(x)\,d\mathbb{Y}(y). 
\end{array}
\end{equation}
For the discrete case, this amounts to the following formula, introduced in \cite{GR:2006}
\begin{equation}\label{Dk}
\begin{array}{c}
    D_k\big(\mathbb{X}_x,\mathbb{Y}_y\big)^2 := \alpha \sum_{n,m=1}^{N_x}  k(x^n,x^m) + \\ \beta\sum_{n,m=1}^{N_y} k(y^n,y^m) - \gamma\sum_{n,m=1}^{N_x,N_y} k(x^n,y^m), 
\end{array}
\end{equation}
where $\alpha = \frac{1}{N_x^2}$, $\beta = \frac{1}{N_y^2}$ and $\gamma = \frac{2}{N_x N_y}$. Using \eqref{DkC} or \eqref{Dk}, we define sharp discrepancy sequences (SDS, see \cite{LeFloch-Mercier:2020b}) as solutions of the following non-convex minimization problem
\begin{equation}\label{SDS}
\bar{Y} = \arg \inf_{y \in \mathcal{X}^{N_y}} D_k(\mathbb{X},\mathbb{Y}_y)
\end{equation}
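The discrete discrepancy \eqref{Dk} only involves kernel sums and can be evaluated directly. The Python sketch below is illustrative only: it assumes a Gaussian kernel with an arbitrary bandwidth (not the kernel \eqref{KER} used in the numerical sections) and stands in for the uniform law on $[0,1]$ with a fine midpoint grid. It compares 64 IID samples with 64 evenly spread points; the latter typically has a much smaller discrepancy, which is the property that the minimization problem \eqref{SDS} exploits.

```python
import numpy as np

def gaussian_gram(x, y, scale=0.5):
    """K(x, y) for a Gaussian kernel; kernel and scale are illustrative choices."""
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * scale**2))

def discrepancy(x, y, kernel=gaussian_gram):
    """D_k(X_x, Y_y) of (Dk): sqrt(alpha*sum k(x,x) + beta*sum k(y,y) - gamma*sum k(x,y))."""
    n_x, n_y = len(x), len(y)
    d2 = (kernel(x, x).sum() / n_x**2
          + kernel(y, y).sum() / n_y**2
          - 2.0 * kernel(x, y).sum() / (n_x * n_y))
    return np.sqrt(max(d2, 0.0))  # clip tiny negative values caused by round-off

rng = np.random.default_rng(0)
reference = (np.arange(1000)[:, None] + 0.5) / 1000  # fine grid standing in for U[0, 1]
iid = rng.uniform(size=(64, 1))                       # 64 IID samples
grid = (np.arange(64)[:, None] + 0.5) / 64            # 64 evenly spread points
print(discrepancy(reference, iid), discrepancy(reference, grid))
```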

We also introduce the discrepancy matrix induced by the kernel $k$, $M_k(X,Y):= (d_k(x^n,y^m))_{n,m=1}^{N_x,N_y}$, where
\begin{equation}\label{dk}
 d_k(x,y) = k(x,x)+k(y,y)-2k(x,y)
\end{equation}

Let $f:\mathcal{X} \to \RR^{D_f}$ be any vector-valued function. Kernel methods provide a simple interpolation / extrapolation procedure. Denoting by $z \mapsto f_z$ the interpolated function and by $f(z)$ the "ground truth" values, we introduce the following sets, using classical supervised machine learning terminology. Consider a training set $X,f(X) \in \mathcal{X}^{N_x} , \mathbb{R}^{N_x , D_f}$, a test set $Z,f(Z) \in \mathcal{X}^{N_z}, \mathbb{R}^{N_z , D_f}$, as well as a third set $Y \in \mathcal{X}^{N_y}$ of internal parameters (to fix ideas, $Y$ plays a role similar to the weights of a neural network).
Let $K(X,Y)$ be the kernel matrix induced by the kernel $k$, i.e. $K(X,Y):=(k(x^n,y^m))_{n,m = 1}^{N_x,N_y}$. We define the projection operator induced by the kernel $k$, $f_Z := \mathcal{P}_{k}(X,Y,Z)f(X)$, through the matrix
\begin{equation} \label{projection}
\mathcal{P}_{k}(X,Y,Z) := K(Z,Y) K(X,Y)^{-1}.
\end{equation}
The inverse is computed with a least-squares approach, $K(X,Y)^{-1} = ( K(Y,X)K(X,Y)+ \epsilon I_d)^{-1}K(Y,X)$, where $\epsilon \geq 0$ is an (optional) regularization term. The projection operator \eqref{projection} satisfies the following error estimate (see \cite{LeFloch-Mercier:2020b}), which provides confidence levels:
\begin{equation} \label{error}
\| f(Z) - f_Z \|_{\ell^2} \le D_k(X,Y,Z) \| f \|_{\mathcal{H}_k},
\end{equation}
where $D_k(X,Y,Z) := D_k(X,Y)+D_k(Y,Z)$.
Starting from formula \eqref{projection}, we can define all kinds of differential operators, for instance the gradient
\begin{equation} \label{grad}
\nabla f_Z = (\nabla_Z K)(Z,Y) K(X,Y)^{-1} f(X).
\end{equation}
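The projection operator \eqref{projection}, with its regularized least-squares inverse, and the gradient \eqref{grad} fit in a few lines of NumPy. The sketch below assumes a Gaussian kernel (chosen because its gradient is explicit) and arbitrary values for the bandwidth `scale` and the regularization `eps`; it is not the kernel used in the numerical sections. It interpolates $\sin$ from a training set $X$, also used as $Y$.

```python
import numpy as np

def gaussian_gram(x, y, scale):
    """K(x, y) for a Gaussian kernel with bandwidth `scale` (illustrative choice)."""
    sq = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * scale**2))

def fit_coefficients(x, y, fx, scale, eps):
    """K(X,Y)^{-1} f(X), with K(X,Y)^{-1} = (K(Y,X) K(X,Y) + eps I)^{-1} K(Y,X)."""
    kxy = gaussian_gram(x, y, scale)
    return np.linalg.solve(kxy.T @ kxy + eps * np.eye(len(y)), kxy.T @ fx)

def project(x, y, z, fx, scale=0.5, eps=1e-8):
    """f_Z = K(Z,Y) K(X,Y)^{-1} f(X), shape (N_z, D_f)."""
    return gaussian_gram(z, y, scale) @ fit_coefficients(x, y, fx, scale, eps)

def project_grad(x, y, z, fx, scale=0.5, eps=1e-8):
    """grad f_Z = (grad_Z K)(Z,Y) K(X,Y)^{-1} f(X), shape (N_z, D, D_f)."""
    coef = fit_coefficients(x, y, fx, scale, eps)
    diff = z[:, None, :] - y[None, :, :]                                 # (N_z, N_y, D)
    grad_k = -diff / scale**2 * gaussian_gram(z, y, scale)[:, :, None]   # d/dz of the kernel
    return np.einsum("nmd,mf->ndf", grad_k, coef)

x = np.linspace(-3.0, 3.0, 25)[:, None]   # training set, also used as Y
z = np.linspace(-2.0, 2.0, 7)[:, None]    # test set
fx = np.sin(x)
print(np.abs(project(x, x, z, fx) - np.sin(z)).max())
print(np.abs(project_grad(x, x, z, fx)[:, 0, 0] - np.cos(z[:, 0])).max())
```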
 

### Kernel-based transport maps

Consider a map $T$ that transports $\mathbb{Y}_y$ onto $\mathbb{X}_x$. In standard optimal transport notation, $T$ is a push-forward map, written $T_{\#} \mathbb{Y}_y = \mathbb{X}_x$. To fix ideas, in the discrete case with $N_x=N_y$, $T$ is defined through a permutation $\sigma: \{1,\ldots,N_x\} \mapsto \{1,\ldots,N_x\}$, as $T(Y):=X^{\sigma}:=\{x^{\sigma(n)}\}_{n=1}^{N_x}$.

To obtain a well-defined map, we choose $T$ as a convex map with respect to a non-Euclidean metric, namely the discrepancy \eqref{dk}:
\begin{equation}\label{sbar}
\bar{\sigma} = \arg \inf_{\sigma \in \Sigma} Tr(M_k(X^{\sigma},Y)), 
\end{equation}
where $\Sigma$ is the set of all permutations and Tr denotes the trace of the matrix $M_k$. This problem can be solved, for instance, as a linear sum assignment problem, and the resulting values can be used as an initial guess to solve problem \eqref{SDS} with a gradient descent algorithm.
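The assignment problem \eqref{sbar} can be handed to an off-the-shelf solver. The sketch below builds the discrepancy matrix from \eqref{dk} and passes it to `scipy.optimize.linear_sum_assignment`; the Gaussian kernel, its bandwidth and the toy data are assumptions made for illustration only.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def gaussian_gram(x, y, scale=1.0):
    """Gaussian kernel matrix (illustrative kernel choice)."""
    sq = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * scale**2))

def discrepancy_matrix(x, y, kernel=gaussian_gram):
    """M_k(X, Y) with entries d_k(x^n, y^m) = k(x^n, x^n) + k(y^m, y^m) - 2 k(x^n, y^m)."""
    kxx = np.diag(kernel(x, x))
    kyy = np.diag(kernel(y, y))
    return kxx[:, None] + kyy[None, :] - 2.0 * kernel(x, y)

def optimal_permutation(x, y, kernel=gaussian_gram):
    """sigma minimising Tr(M_k(X^sigma, Y)) = sum_n d_k(x^sigma(n), y^n)."""
    cost = discrepancy_matrix(y, x, kernel)   # cost[n, m] = d_k(y^n, x^m)
    _, sigma = linear_sum_assignment(cost)    # rows are 0..N-1, so sigma[n] = sigma(n)
    return sigma

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))               # samples of the target law
y = rng.uniform(-1.0, 1.0, size=(100, 2))   # samples of the known law
sigma = optimal_permutation(x, y)
x_sigma = x[sigma]                          # X^sigma, matched row by row with Y
```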

Once an optimal permutation $\bar{\sigma}$ is computed, we use the projection operator \eqref{projection} to define a continuous map $G$, as   
\begin{equation}\label{Gz}
z \mapsto G^{\mathbb{X}_x}(z) := \mathcal{P}_{k}(Y,Y,z) X^{\bar{\sigma}}.
\end{equation}
In particular, suppose that $Z \in \mathcal{X}^{N_z}$ consists of IID samples of the same random variable used to sample $Y$; then $\mathcal{P}_{k}(Y,Y,Z) X^{\bar{\sigma}}$ is a natural candidate for $N_z$ IID random samples of $\mathbb{X}$.
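The three steps above (assignment, projection, sampling) can be chained into a small 1-D sampler. This is a sketch under our own assumptions, not the CodPy implementation: a Gaussian kernel is used with two hypothetical bandwidths, a wide one for the assignment cost (on rescaled data, $d_k$ is then a convex function of the distance, so the 1-D matching is monotone) and a narrow one for the interpolation $\mathcal{P}_k(Y,Y,\cdot)$. $Y$ and $Z$ are drawn uniformly over the bounding box of $X$, and the target is a hypothetical bimodal mixture.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def gaussian_gram(x, y, scale):
    """Gaussian Gram matrix between point clouds x (n, D) and y (m, D)."""
    sq = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * scale**2))

def fit_sampler(x, rng, assign_scale=1.0, interp_scale=0.05, eps=1e-6):
    """Return n -> G(Z), G(z) = P_k(Y, Y, z) X^sigma (bandwidths are illustrative)."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    mean, width = x.mean(axis=0), hi - lo
    y = rng.uniform(lo, hi, size=x.shape)            # samples of the known law Y
    sx, sy = (x - mean) / width, (y - mean) / width  # rescaled coordinates
    # Step 1: sigma minimising sum_n d_k(x^sigma(n), y^n), with d_k = 2 - 2k here.
    _, sigma = linear_sum_assignment(2.0 - 2.0 * gaussian_gram(sy, sx, assign_scale))
    x_sigma = x[sigma]
    # Step 2: regularised least-squares fit of the map y^n -> x^sigma(n).
    kyy = gaussian_gram(sy, sy, interp_scale)
    coef = np.linalg.solve(kyy.T @ kyy + eps * np.eye(len(y)), kyy.T @ x_sigma)

    def sample(n):
        # Step 3: push fresh draws Z of the known law through the fitted map.
        z = rng.uniform(lo, hi, size=(n, x.shape[1]))
        return gaussian_gram((z - mean) / width, sy, interp_scale) @ coef

    return sample

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 250), rng.normal(2.0, 0.5, 250)])[:, None]
sample = fit_sampler(x, rng)
z = sample(2000)
print(x.mean(), x.std(), z.mean(), z.std())
```

Using a single bandwidth for both steps, as the text does, is equally possible; the split here only keeps the toy example well behaved.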


### Time series forecasting

In this paper we consider time series forecasting as fitting a model in order to match a stochastic process $t \mapsto X(t)\in \RR^D$, observed on a time grid $t_x^1<\ldots<t_x^{T_x}$, the data having the following shape
\begin{equation}\label{TS}
X:=\Big(x^{n,k}_d\Big)_{d=1\ldots D}^{n,k=1\ldots N_x, T_x} \in \RR^{N_x,D,T_x}.
\end{equation} 
In \eqref{TS}, $N_x$ is the number of observed trajectories, the third index of this 3-dimensional tensor corresponds to time, and the dimension is $D$, or $D+1$ if the time $t^k$ is added to the observations to take time dependencies into account. Note that market data usually consist of a single trajectory of a stochastic process, hence $N_x = 1$ in this paper. In other applications, however, $N_x \gg 1$, as is the case for instance with customer data.

Feature engineering is a classical approach in machine learning that consists of adding new features to target a model. In our case, we use an injective map $F : \RR^{N_x,D,T_x} \mapsto \RR^{N_F,D_F}$, assuming that $F(X)$ is a random variable. By reproducing this random variable, one can generate any number $N_z$ of trajectories as follows:

\begin{itemize}\setlength{\itemsep}{0pt}
    \item Consider any input data $X$ having shape \eqref{TS}, and use the map $F$ to retrieve $F(X) \in \RR^{N_F \times D_F}$, which are random samples of $\mathbb{X}$.
    \item Generate samples using \eqref{Gz}, considering $F(X) \in \RR^{N_F \times D_F}$ as the training set.
    \item From these samples, use $F^{-1}$ to output samples having shape $\RR^{N_z,D,T_z}$, at any time grid $t_z^1<\ldots<t_z^{T_z}$.
\end{itemize}

In this paper, we consider a simple model assumption\footnote{This choice is motivated by the need to provide benchmarks in this paper.}, adapted to stock markets modeled with Markovian processes: we fit any positive time series to a Markovian process $t \mapsto X_t \in \RR^D$ of the form
\begin{equation}\label{PLR}
X_t = X_s \exp( (t-s) \mu + \sqrt{t-s} \mathbb{X}),
\end{equation}
where the unknown random variable $\mathbb{X}$, which models the martingale component of the process, satisfies $\mathbb{E}(\mathbb{X}) = 0$ and is assumed to be **absolutely continuous** with respect to the Lebesgue measure. 

Hence we introduce the log-return map $F : \RR^{N_x,D,T_x} \mapsto \RR^{ N_x \times (T_x-1),D}$, defined as
\begin{equation}\label{LR}
 F(X) := \Big( \frac{ \ln(x^{n,k}_d)- \ln(x^{n,k-1}_d)} {\sqrt{t^{k} - t^{k-1}}} \Big) _{d=1\ldots D}^{n=1\ldots N_x,\; k=2\ldots T_x}.
\end{equation}
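Considering any time grid $t_z^1<\ldots<t_z^{T_z}$, we can define the inverse map $F^{-1} : \RR^{ N_z \times T_z,D} \mapsto \RR^{N_z,D,T_z}$ as an exponential-integral operator: starting from a given initial value, the log-returns are rescaled by the square roots of the time steps, summed cumulatively and exponentiated.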
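A minimal sketch of the log-return map \eqref{LR} and of its exponential-integral inverse follows. As an assumption of this sketch, the output grid `t` passed to the inverse holds the initial time, at which the starting value $x^0$ is known, followed by the $T_z$ output times. The round trip on random positive paths recovers the original prices.

```python
import numpy as np

def log_return_map(x, t):
    """F: prices (N_x, D, T_x) on grid t -> scaled log-returns (N_x * (T_x - 1), D)."""
    r = np.diff(np.log(x), axis=-1) / np.sqrt(np.diff(t))   # (N_x, D, T_x - 1)
    return r.transpose(0, 2, 1).reshape(-1, x.shape[1])

def inverse_log_return_map(r, x0, t):
    """F^{-1}: scaled log-returns (N_z * T_z, D) -> paths (N_z, D, T_z).

    t holds the initial time followed by the T_z output times; x0 has shape (D,) or (N_z, D).
    """
    dt = np.diff(t)
    r = r.reshape(-1, len(dt), r.shape[-1]).transpose(0, 2, 1)   # (N_z, D, T_z)
    log_x0 = np.log(np.asarray(x0, dtype=float))[..., None]        # broadcast over time
    return np.exp(log_x0 + np.cumsum(r * np.sqrt(dt), axis=-1))

rng = np.random.default_rng(0)
t = np.concatenate([[0.0], np.cumsum(rng.uniform(0.5, 1.5, size=9))])   # irregular grid
x = 100.0 * np.exp(np.cumsum(0.01 * rng.standard_normal((2, 3, 10)), axis=-1))
returns = log_return_map(x, t)                       # samples of the random variable, (18, 3)
paths = inverse_log_return_map(returns, x[:, :, 0], t)
```

In the generative pipeline, `returns` would be replaced by samples produced with \eqref{Gz} before applying the inverse map.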

### Recurrent methods for time series predictions


We now describe recurrent methods that can be implemented with any predictive machine \@ref(eq:Pms), and discuss an example of prediction.

Consider some historical observations $X$ as in \eqref{TS}, and two integers $H$ and $P$ satisfying $H+P \le T_X$; $H$ is called the historical depth and $P$ the prediction depth. This setting defines a sliding window of size $H+P$ over the data $X$, used to define the training set as follows (using slicing notation)
$$
X^{0}  = X^{[\cdot,\cdot,i:i+H]} \in \mathbb{R}^{\tilde{N}_X \times D \times H}, f(X^{0})  = X^{[\cdot,\cdot,i+H:i+H+P]} \in \mathbb{R}^{\tilde{N}_X \times D \times P}
$$ 
for any $i = 1,\ldots, \tilde{N}_X$, with $\tilde{N}_X = (T_X - H - P)N_X$. We can iterate the procedure, producing $P$ new predicted values at each step, by recursively using a predictive machine \@ref(eq:Pms) as follows
$$
 X^{k+1} =[X^k,f(X^{k})], \qquad f(X^{k+1})= \mathcal{P}_{m}(X^k,Y,X^{k+1}, f(X^{k})),(\#eq:RK)
$$
$[X^k,f(X^{k})]$ being the concatenation of these two tensors along the last (time) index. Such a construction allows one to produce predicted values of the time series at any future time.
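The sliding-window training set and the recursion \@ref(eq:RK) can be sketched as follows. The predictive machine here is a hypothetical Gaussian kernel ridge regression on flattened windows, standing in for any machine of the form \@ref(eq:Pms); its bandwidth and regularization are arbitrary. As a further assumption, only the last $H$ values of the growing path are fed to the predictor at each step, so that the machine trained on $H$-long windows can be reused. The toy series is a sine wave.

```python
import numpy as np

def sliding_windows(x, h, p):
    """Pairs (X^0, f(X^0)) from an (N_x, D, T_x) tensor: H inputs and the next P values."""
    starts = range(x.shape[-1] - h - p + 1)
    inputs = np.concatenate([x[:, :, i:i + h] for i in starts])            # (n, D, H)
    targets = np.concatenate([x[:, :, i + h:i + h + p] for i in starts])   # (n, D, P)
    return inputs, targets

def fit_predictor(inputs, targets, scale=1.0, eps=1e-6):
    """Hypothetical predictive machine: Gaussian kernel ridge regression on windows."""
    a = inputs.reshape(len(inputs), -1)
    b = targets.reshape(len(targets), -1)

    def gram(u, v):
        return np.exp(-((u[:, None, :] - v[None, :, :]) ** 2).sum(-1) / (2.0 * scale**2))

    coef = np.linalg.solve(gram(a, a) + eps * np.eye(len(a)), b)
    return lambda u: gram(u.reshape(len(u), -1), a) @ coef

def recursive_forecast(history, predict, h, p, n_steps):
    """Iterate (RK): append P predicted values per step along the time axis."""
    path = history.copy()
    for _ in range(n_steps):
        nxt = predict(path[:, :, -h:]).reshape(path.shape[0], path.shape[1], p)
        path = np.concatenate([path, nxt], axis=-1)   # [X^k, f(X^k)]
    return path

series = np.sin(0.2 * np.arange(200.0))[None, None, :]   # toy data, N_x = D = 1
inputs, targets = sliding_windows(series[:, :, :150], h=20, p=5)
predict = fit_predictor(inputs, targets)
forecast = recursive_forecast(series[:, :, :150], predict, h=20, p=5, n_steps=4)
print(np.abs(forecast[0, 0, 150:] - series[0, 0, 150:170]).max())
```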

## Numerical illustration

Let $X$ be as in \eqref{X}. We use kernels composed with maps, $(k\circ S)(x,y) := k(S(x),S(y))$, adapted to our sets, to ensure positive-definiteness. In the numerical sections that follow, our kernel choice is 
\begin{equation} \label{KER}
  k(x,y) = \prod_{d=1}^{D} \left(1 - |x_d-y_d| \right),
\end{equation}
which is the kernel counterpart of a ReLU activation function, together with the following scaling map: $S(x) = \Big(S_d(x_d)\Big)_{d=1\ldots D}$, with
\begin{equation} \label{MAP}
  S_d(x_d) = \frac{x_d - \overline{x_d}}{\max_{x^n \in X} x^n_d - \min_{x^n \in X} x^n_d},
\end{equation}
$\overline{x_d}$ being the mean $\frac{\sum_{n=1 \ldots N_x} x^n_d}{N_x}$.
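The kernel \eqref{KER} and the scaling map \eqref{MAP} translate directly into NumPy; the sketch below is our own transcription, with hypothetical test data whose coordinates live on very different scales. After scaling, each coordinate spans an interval of length one, so every factor $1-|x_d-y_d|$ stays non-negative. On that range the factor coincides with the triangular function $\max(1-|t|,0)$, which is positive definite, so the Gram matrix of the composed kernel $k \circ S$ has non-negative eigenvalues up to round-off.

```python
import numpy as np

def scaling_map(x_ref):
    """S of (MAP): centre each coordinate on its mean over x_ref and divide by its range."""
    mean = x_ref.mean(axis=0)
    width = x_ref.max(axis=0) - x_ref.min(axis=0)
    return lambda x: (x - mean) / width

def relu_kernel(x, y):
    """k(x, y) = prod_d (1 - |x_d - y_d|) of (KER), for arrays of shape (n, D) and (m, D)."""
    return np.prod(1.0 - np.abs(x[:, None, :] - y[None, :, :]), axis=-1)

rng = np.random.default_rng(1)
x = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 0.1])   # very different scales
s = scaling_map(x)
gram = relu_kernel(s(x), s(x))          # composed kernel (k o S)(x, y) = k(S(x), S(y))
print(np.linalg.eigvalsh(gram).min())   # non-negative up to round-off
```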

### One dimensional distributions

For illustration purposes, we apply the algorithm \eqref{Gz} to two bimodal distributions, based respectively on a Gaussian $\mathcal{N}(0,1)$ and a Student $t(\nu=5)$ distribution. We use two distinct sets, a training set $X$ and a test set $Z$, sampled in two ways to highlight some convergence properties of \eqref{Gz}:

\begin{itemize}\setlength{\itemsep}{0pt}
    \item IID : $X,Z$ are iid samples of $\mathbb{X}$.
    \item SDS : $X,Z$ are sharp discrepancy sequences (SDS) of $\mathbb{X}$, see \eqref{SDS}.
\end{itemize}
In both cases, the training set has size $N_x = 1000$ and the test set has size $N_z=500$. We plot the results of the sampler algorithm \eqref{Gz} in Figure \ref{plot1} (resp. Figure \ref{plotsds}) for the IID case (resp. SDS case).


In [None]:
xy, f_x, f_z = data_for_static_dist(**get_wilmott_param(),M=501)
figure1(xy, **get_wilmott_param())
table1 = table(f_x, f_z, f_names= ["Gaussian", "t-distribution"])


In [None]:
xy, f_x, f_z = data_for_static_dist(**get_wilmott_param(),M=501,grid_projection = True)
figure1(xy, **get_wilmott_param())
tablesds = table(f_x, f_z, f_names= ["Gaussian", "t-distribution"])


To measure the fit of a generated discrete distribution $G^{\mathbb{X}_x}$ to the original distribution $\mathbb{X}$, throughout this paper we compute summary statistics of the generated and original distributions, the result of the Kolmogorov-Smirnov (KS) test for the marginals and, where relevant, correlation matrices. The moments of the original distribution are quoted in parentheses. The reported KS value is the p-value; at the 95\% confidence level, the test is passed when this value is above $0.05$.
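The statistics described above can be computed with `scipy.stats`; the helper below is a hypothetical sketch of such a table, not the `table` function used in the cells of this notebook. For each marginal it returns (generated, original) pairs of moments and the two-sample KS p-value, and the correlation matrices are obtained with `np.corrcoef`. The toy data check the two extreme cases: identical samples, and a deliberately biased second marginal.

```python
import numpy as np
from scipy import stats

def compare_marginals(original, generated):
    """Per-marginal (generated, original) moments and the two-sample KS p-value."""
    rows = []
    for d in range(original.shape[1]):
        o, g = original[:, d], generated[:, d]
        rows.append({
            "mean": (g.mean(), o.mean()),
            "variance": (g.var(), o.var()),
            "skewness": (stats.skew(g), stats.skew(o)),
            "kurtosis": (stats.kurtosis(g), stats.kurtosis(o)),   # excess kurtosis
            "KS p-value": stats.ks_2samp(g, o).pvalue,           # passed if above 0.05
        })
    return rows

rng = np.random.default_rng(0)
original = rng.standard_normal((1000, 2))
shifted = original + np.array([0.0, 1.0])   # second marginal deliberately biased
report = compare_marginals(original, shifted)
print(report[0]["KS p-value"], report[1]["KS p-value"])
print(np.corrcoef(original, rowvar=False))   # compare with np.corrcoef(generated, rowvar=False)
```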



In [None]:
pyresults <- py$table1
knitr::kable(pyresults, caption = "Statistics of IID-generated distributions", escape = FALSE)  %>%
      kable_styling(latex_options = c("repeat_header","HOLD_position"))


In [None]:
pyresults <- py$tablesds
knitr::kable(pyresults, caption = "Statistics of SDS-generated distributions", escape = FALSE)  %>%
      kable_styling(latex_options = c("repeat_header","HOLD_position"))


These two tables show how an appropriate choice of sets can drastically improve the convergence performance of \eqref{Gz}. The convergence rate for the set of IID random samples $X \in \RR^{N_x,D}$ is of order $\mathcal{O}(\frac{1}{\sqrt{N_x}})$, while SDS's convergence is of order $\mathcal{O}(\frac{1}{N_x^2})$ for smooth distributions, see \cite{LeFloch-Mercier:2020b}.


### Time series forecasting illustration

We illustrate the time series forecasting algorithm of Section \ref{time-series-forecasting} on real market data, retrieved from January 1, 2016 to December 31, 2020, for three assets: Google, Apple and Amazon. 


In [None]:
f_z,data,f_x=RegenerateDistribution(**get_yf_AAG_params())
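import book_funs20
# Call the module function explicitly: the result reuses the name `table2` (read by the
# R cell below as py$table2), which would otherwise shadow the function on re-runs.
table2, corr_h, corr_g = book_funs20.table2(f_x, f_z, **get_yf_AAG_params())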


In [None]:
scatter_hist(f_x[:,0], f_x[:,2], **get_yf_AAG_params())



In [None]:
scatter_hist(f_z[:,0], f_z[:,2], **get_yf_AAG_params())



After applying the log-return map, we produce samples of this distribution using \eqref{Gz} and plot both the historical and the generated log-return distributions in Figures \ref{plot2} and \ref{plot3}, projected onto two of their components, Apple and Google. We check the match between the original and generated distributions using the same table of statistics as in Section \ref{one-dimensional-static-distribution}. Notice that the p-value of the KS test is higher than 0.05 for all three marginals (see Table \ref{tab:102}, line K-S).



In [None]:
pyresults <- py$table2
knitr::kable(pyresults, caption = "Summary statistics for Apple, Amazon and Google")  %>%
      kable_styling(latex_options = c("repeat_header","HOLD_position"))


We also check that the historical correlation matrix (Table \ref{tab:603}) is close to the generated one (Table \ref{tab:604}).



In [None]:
pyresults <- py$corr_h
knitr::kable(pyresults, caption = "Correlation matrix of historical data", escape = FALSE)  %>%
      kable_styling(latex_options = c("repeat_header","HOLD_position"))


In [None]:
pyresults <- py$corr_g
knitr::kable(pyresults, caption = "Correlation matrix of generated data", escape = FALSE)  %>%
      kable_styling(latex_options = c("repeat_header","HOLD_position"))


Using the generated distribution, we then reconstruct trajectories from January 1, 2021 to December 31, 2021, as described in Section \ref{time-series-forecasting}. Figure \ref{plot4} shows ten generated trajectories for Google, together with the historical Google chart for comparison.



In [None]:
Generated_Google()



### Recurrent kernels illustration

The recurrent method \@ref(eq:RK) allows one to draw a trajectory, which can be considered as an iid realization of the time series, based on the knowledge of its history. Figure \ref{fig:675} shows a toy example of a historical time series forecast with two components: the Bitcoin price and the hash-rate values. We considered $T_X$ covering daily observations from 01/01/2015 to 23/11/2020, while $H$ and $P$ are set to cover 6 months of data. Hence the settings used to produce Figure \ref{fig:675} correspond to $N_X=1, D=2, T_X=1460$ in \eqref{TS}. Starting from this setting, we predict the time series up to 31/12/2021 and compare it with the historically observed one, using a kernel implementation of the scheme \@ref(eq:RK).

\begin{figure}

{\centering \includegraphics[height = 50mm,width=0.45\linewidth]{CodPyFigs/1640295936328} 
\includegraphics[height = 50mm,width=0.45\linewidth]{CodPyFigs/1640295971717}
}

\caption{Recurrent kernels : Generated (yellow) BTC-USD left / HR right, versus historical (blue).}\label{fig:675}
\end{figure}

This method has many forecasting applications in practice. However, in the context of time series forecasts, it faces a number of challenges. First, it introduces two extra parameters, $H$ and $P$. Second, it is not clear how to generate other realizations of the studied time series; as a consequence, it is not clear either how to build a relevant mean estimator with this construction. Finally, we have no argument ensuring the stability of the recurrent scheme \@ref(eq:RK).


## Monte Carlo pricing

We show that the synthetic data generated in Section \ref{time-series-forecasting}, illustrated in Figure \ref{plot4}, can be used for Monte Carlo pricing\footnote{An alternative pricing method here would be to consider the Kolmogorov, or Black-Scholes, partial differential equations, similar to a Cox tree but for any number of underlying assets, as presented in \cite{LeFloch-Mercier:2020a} and references therein.}, as they share close statistical properties with paths simulated from known processes. 

### Experiment settings


Consider a bivariate geometric Brownian motion (gBm) with initial values \(X_0=[100, 120]\):
\begin{equation} \label{gbm}
dX_t = \sigma X_t\,dW_t,
\end{equation}
where $W_t$ is a two-dimensional Brownian motion with correlation $0.5$, and the volatilities are $\sigma=[0.1,0.2]$. Consider the payoff of a standard basket option, given by the function
\begin{equation} \label{PVO}
  P(x) = \max(x \cdot\omega - K, 0)
\end{equation} where \(x\) denotes the input market data, \(\omega= [0.5,0.5]\) the weights, and \(K = 108\) the option's strike.

We define the reference value for this test as $\mathbb{E}(P(X_T) | X_0)$, where the maturity \(T\) is set to one year ($1Y$); this value can be computed using Monte Carlo methods.
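A plain Monte Carlo estimate of this reference value can be sketched by sampling the exact solution of the driftless gBm \eqref{gbm}, $X_T = X_0 \exp(-\frac{1}{2}\sigma^2 T + \sigma W_T)$ componentwise, with the correlation introduced through a Cholesky factor. The number of paths and the seed are arbitrary choices. The function also returns the standard error, from which confidence bands such as the dotted lines of Figure \ref{convergence} can be drawn.

```python
import numpy as np

def basket_mc_price(x0, sigma, rho, weights, strike, maturity, n_paths, seed=0):
    """Monte Carlo estimate of E[max(X_T . w - K, 0) | X_0] for the driftless gBm (gbm)."""
    rng = np.random.default_rng(seed)
    corr = np.array([[1.0, rho], [rho, 1.0]])
    # Correlated Brownian values W_T: standard normals times the Cholesky factor, times sqrt(T).
    w_t = rng.standard_normal((n_paths, 2)) @ np.linalg.cholesky(corr).T * np.sqrt(maturity)
    x_t = x0 * np.exp(-0.5 * sigma**2 * maturity + sigma * w_t)   # exact gBm solution
    payoff = np.maximum(x_t @ weights - strike, 0.0)
    return payoff.mean(), payoff.std(ddof=1) / np.sqrt(n_paths)   # price, standard error

price, stderr = basket_mc_price(np.array([100.0, 120.0]), np.array([0.1, 0.2]), 0.5,
                                np.array([0.5, 0.5]), 108.0, 1.0, 200_000)
print(f"{price:.3f} +/- {1.96 * stderr:.3f}")
```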

#### Reproducing a Bivariate Gaussian 

We simulate one trajectory $X \in \mathbb{R}^{1,2,1000}$ of the gBm \eqref{gbm}. Following Section \ref{time-series-forecasting}, we compute the log-returns, which are random samples of a bivariate normal distribution. We then use \eqref{Gz} to produce 1000 random samples, $Z = G^{\mathbb{X}_x} \in \mathbb{R}^{1000,2}$. Table \ref{statgbm} shows that both distributions $\mathbb{X}_x$ and $\mathbb{Z}_z$ match.


In [None]:
params = get_bgm_params(**get_params_BSMultiple(**get_params_MC(**get_wilmott_param())))
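import book_funs20
# Call the module function explicitly: the result reuses the name `RGBM` (read by the
# R cell below as py$RGBM), which would otherwise shadow the function on re-runs.
RGBM, a, b = book_funs20.RGBM(**params)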


In [None]:
pyresults <- py$RGBM
knitr::kable(pyresults, caption = "\\label{statgbm}Stats for BGM", escape = FALSE)  %>%
      kable_styling(latex_options = c("repeat_header","HOLD_position"))


#### Basket option pricing

In this paragraph, we study numerically the convergence properties of our approach as the size of the training set, that is the number of observed gBm samples, increases. We consider as input two sequences of observed gBm samples $X^n_1,X^n_2$ of varying size $n=100,\ldots,100\times N$, with $N=10$. The sets $X^n_1$ (resp. $X^n_2$) correspond to a randomly sampled gBm (resp. sharp discrepancy sequences), as in Section \ref{one-dimensional-static-distribution}. 

Our approach then follows the steps described in Section \ref{time-series-forecasting}: we generate IID samples $Z_1^n\in \mathbb{R}^{10000,2}$ (resp. SDS samples $Z_2^n\in \mathbb{R}^{10000,2}$), and we evaluate the option's price at the maturity time $T$ with the estimator $\frac{1}{N_z}\sum_{n=1}^{N_z}P(z^n)$, which approximates the option's reference value. The results are plotted in Figure \ref{convergence}.


In [None]:
T_bsm_multiple_scenario(**params)



The blue dashed line in Figure \ref{convergence} is the reference price computed with a Monte Carlo method; the red (resp. green) lines correspond to the prices computed using the randomly generated samples $Z_{1}^n$ (resp. SDS $Z_{2}^n$). Note that the results lie within the Monte Carlo statistical error, shown as dotted lines in the figure. 




## P\&L explanation

### Experiment settings

We illustrate our approach with an application to real-time P\&L explanation for large multi-asset portfolios, which, for clarity, we outline in the one-dimensional case.

Consider a function $P(t,x)\in \RR^{D_P}$, $x \in \RR^{D}$, corresponding to an external engine pricing a portfolio of $D_P$ instruments, given that the risk source values are $x$ at time $t$. Pricing engines are often computationally intensive and can hardly be used in real time. This experiment proposes a quicker alternative based on nightly batches. For illustrative purposes, we take as pricing engine a Black-Scholes formula with predefined values, see below.

Consider a historical market data set, consisting for this test of $253$ closing values of the S\&P500, denoted $x^{-252},\ldots,x^{0}$, between \(t^{-252}=\) June 1, 2021 and \(t^{0}=\) June 1, 2022, retrieved from Yahoo Finance. Considering \eqref{TS}, this set is thus described by a tensor with \(N_x=1, D=1, T_x = 253\).

We use the historical data set to produce synthetic data at a future horizon date $t^1 = t^0 + 4$ days, following Section \ref{time-series-forecasting}, simulating night-batch computed Value-at-Risk (VaR) scenarios. Similarly, we produce a test set $Z \in \RR^{N_z}$ with the same method, corresponding to simulated real-time data at date $t^1$.

To benchmark our approach, we compute the P\&L on the test set \(Z\) using three methods:

\begin{itemize}
\item Analytical P\&L: computed as $P(t^1,Z) - P(t^0,x^0)$, it is the reference value for our tests.
\item Predicted P\&L: the price function $P(t^1,Z)$ is approximated using the formula \eqref{projection}, as $\mathcal{P}_{k}(X,X,Z)P(t^1,X) - P(t^0,x^0)$.
\item Taylor approximation: the price function $P(t^1,Z)$ is computed using a second order Taylor formula approximation around $P(t^0,x^0)$. \footnote{We compare to a Taylor approximation, as this method is currently used by some banks to estimate their P\&L on a real time basis.}
\end{itemize}



### Training set

According to \eqref{error}, the interpolation error of the projection operator $\mathcal{P}_k$ \eqref{projection}, defined on a set \(X\), is driven at any point \(z\) by the quantity \(d_k(z,X)\). Figure \ref{plot10} shows the isocontours of this error function for two distinct sets:

\begin{itemize}
\item (a) $X$ is generated as VaR scenarios for the three dates $t^{-1},t^0,t^1$.
\item (b) $X$ is the historical data set.
\end{itemize}

The blue dots in Figure \ref{plot10} represent the test set $Z$ and correspond to simulated intraday market values.

The figure shows that the interpolation error is smaller when the VaR scenario dataset, on the left-hand side, is used. Since banks must produce VaR data for regulatory purposes, such data are available, and we use them in this paper as the training set to extrapolate the P\&L. Using only the historical data set is also possible, at the expense of less accurate results; this may be of interest when only historical data are available.


In [None]:
params_option = option_param_var().get_param()
results = maturity_scenario_amine(**params_option)


In [None]:
training_set_generation(params_option,results)



Finally, note that there are three sets of red points in Figure \ref{plot10}-(a): we consider VaR scenarios at three different times $t^{-1},t^0,t^1$, since we are interested in approximating time derivatives for risk management, such as the theta $\partial_t P$ (see the section below).

### P\&L explanation of S\&P 500 options

To benchmark our results, we consider the following values: the S\&P value is $x^0=4101$ as of date $t^0=$ June 1, 2022. The pricing engine is a Black-Scholes formula $C(x,K,r,T,\sigma)$ with strike $K=4050$, close to the spot value, volatility $\sigma = 25\%$, and zero risk-free interest rate ($r=0$). We consider two maturities: a short one, $T=10$ days, and a longer one, $T=365$ days.
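A toy version of this comparison can be sketched with the values above for the short maturity. Several choices are our own assumptions: a 365-day year, a maturity counted from $t^0$ (so 6 days remain at $t^1$), a hypothetical training set of 50 log-spaced spot scenarios at $t^1$ standing in for the VaR scenarios, test spots drawn from a lognormal 4-day move, and a Taylor expansion that includes the theta term for the 4-day time shift. The kernel is \eqref{KER} after the scaling map \eqref{MAP}, which in one dimension reduces to $1-|u-v|/\mathrm{width}$ (the mean cancels) and yields a piecewise-linear interpolant.

```python
import numpy as np
from scipy.stats import norm

def bs_call(x, strike, tau, vol):
    """Black-Scholes call price with zero interest rate; tau is time to maturity in years."""
    d1 = (np.log(x / strike) + 0.5 * vol**2 * tau) / (vol * np.sqrt(tau))
    return x * norm.cdf(d1) - strike * norm.cdf(d1 - vol * np.sqrt(tau))

def bs_greeks(x, strike, tau, vol):
    """Delta, gamma and theta (derivative w.r.t. calendar time t) of bs_call."""
    d1 = (np.log(x / strike) + 0.5 * vol**2 * tau) / (vol * np.sqrt(tau))
    return (norm.cdf(d1),
            norm.pdf(d1) / (x * vol * np.sqrt(tau)),
            -x * norm.pdf(d1) * vol / (2.0 * np.sqrt(tau)))

def relu_projection(x, z, fx):
    """P_k(X, X, Z) f(X) with the kernel (KER) after the scaling map (MAP), 1-D case."""
    width = x.max() - x.min()   # S(u) - S(v) = (u - v) / width, the mean cancels
    k = lambda u, v: 1.0 - np.abs((u[:, None] - v[None, :]) / width)
    return k(z, x) @ np.linalg.solve(k(x, x), fx)

x0, strike, vol, day = 4101.0, 4050.0, 0.25, 1.0 / 365.0
tau0, tau1 = 10 * day, 6 * day                               # at t^0 and at t^1 = t^0 + 4 days
move = vol * np.sqrt(4 * day)
rng = np.random.default_rng(0)
x_train = x0 * np.exp(move * np.linspace(-4.0, 4.0, 50))     # hypothetical scenarios at t^1
z = x0 * np.exp(move * np.clip(rng.standard_normal(200), -3.5, 3.5))   # test spots
f_train = bs_call(x_train, strike, tau1, vol)
p0 = bs_call(x0, strike, tau0, vol)

pnl_exact = bs_call(z, strike, tau1, vol) - p0
pnl_kernel = relu_projection(x_train, z, f_train) - p0
delta, gamma, theta = bs_greeks(x0, strike, tau0, vol)
pnl_taylor = delta * (z - x0) + 0.5 * gamma * (z - x0) ** 2 + theta * 4 * day
print(np.abs(pnl_kernel - pnl_exact).max(), np.abs(pnl_taylor - pnl_exact).max())
```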

\textbf{European option P\&L}. 

We plot the results of three methods on the test set $Z$ (exact P\&L, our approximation, Taylor approximation) in Figure \ref{plot11} for two maturities. 


In [None]:
european_pnl(params_option,results)



The predicted values are accurate, especially for the short maturity, where the predicted P\&L is more precise than the Taylor approximation, in particular deep in the money (DIM) and deep out of the money (DOM). Indeed, our method becomes more competitive than a Taylor approximation as the pricing function becomes more nonlinear.

Table \ref{tab:506} shows the errors between the analytical and predicted P\&L, and between the analytical P\&L and the Taylor approximation, for different maturity scenarios. The error is the relative mean squared error (RMSE), expressed in percent:
\begin{equation}
 RMSE(f,g) = \frac{\|f-g\|_{\ell^2}}{\|f\|_{\ell^2}+\|g\|_{\ell^2}}
\end{equation}
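This error measure is a one-liner; the sketch below expresses it in percent. By the triangle inequality $\|f-g\| \le \|f\|+\|g\|$, so the value always lies between 0\% and 100\%; it is undefined only when both inputs are zero.

```python
import numpy as np

def rmse_percent(f, g):
    """RMSE(f, g) = ||f - g||_2 / (||f||_2 + ||g||_2), expressed in percent."""
    f, g = np.asarray(f, dtype=float), np.asarray(g, dtype=float)
    return 100.0 * np.linalg.norm(f - g) / (np.linalg.norm(f) + np.linalg.norm(g))

print(rmse_percent([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```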


In [None]:
error_codpy = european_pnl_error(results)



In [None]:
pyresults <- py$error_codpy
knitr::kable(pyresults, caption = "PnL error in percentage ", escape = FALSE)  %>% 
      kable_styling(latex_options = c("repeat_header","HOLD_position"))


\textbf{European greeks.}

Using the differential operators (see \eqref{grad}), we approximate the first- and second-order derivatives of $P(t^1,Z)$, called greeks. Figure \ref{plot12} plots $\partial_x P$ (delta) and $\partial_t P$ (theta), and Figure \ref{plot13} plots the second-order derivatives $\partial_x^2 P$ (gamma) and $\partial_t \partial_x P$. For the delta, we added the linear Taylor approximation (green line). The results of our machine learning technique are accurate, especially DIM and DOM, where the Taylor approximation tends to diverge. 


In [None]:
results = single_scenario(**params_option)
single_scenario_exact_deltas_output_amine(results,**params_option)


In [None]:
single_scenario_exact_gammas_output_amine(results,**params_option)



## The Bachelier problem


**Problem description**. This section benchmarks the methods \@ref(eq:Pi) approximating the conditional expectation \@ref(eq:CE) on the Bachelier problem, which we now describe. Consider a martingale process $t \mapsto X(t) \in \mathbb{R}^D$ given by the Brownian motion $dX=\sigma dW_t$, where the matrix $\sigma \in \mathbb{R}^{D \times D}$ is randomly generated. The initial condition is $X(0)=(1,\cdots,1)$, w.l.o.g. Consider two times $1=t^1<t^2=2$, $t^2$ being the maturity of an option with payoff $f(x) = \max(b(x)-K,0)$, where $K=1.1$ and $b(x) = x \cdot a$ with random weights $a \in \mathbb{R}^D$. It is straightforward to verify that $b(X(t))$ follows a one-dimensional Brownian motion $db = \theta dW_t$, with $\theta = |\sigma^T a|$. To get a fixed value for $\theta$ (set to 0.2 in our tests), we normalize the diffusion matrix $\sigma$ above.

With these settings, the conditional expectation \@ref(eq:CE) admits a closed-form expression, which provides the reference values
$$
  f(x) = \theta \sqrt{t^2 - t^1} pdf(d) + (b(x)-K) cdf(d),\qquad d(x,K)  = \frac{b(x)-K}{\theta \sqrt{t^2 - t^1}}, (\#eq:BACH)
$$
where pdf (resp. cdf) denotes the probability density function (resp. cumulative distribution function) of the standard normal law.
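The closed form \@ref(eq:BACH) can be checked against a direct Monte Carlo simulation of $X(t^2) \,|\, X(t^1)$. The sketch below follows the settings above ($D=2$, $K=1.1$, $\theta=0.2$, $t^1=1$, $t^2=2$), with randomly drawn weights $a$ and matrix $\sigma$ normalized so that $|\sigma^T a| = \theta$. The number of Monte Carlo paths and the four test points are arbitrary choices.

```python
import numpy as np
from scipy.stats import norm

def bachelier_reference(b, strike, theta, tau):
    """Closed form (BACH): E[max(b + theta * sqrt(tau) * N(0,1) - K, 0)], tau = t^2 - t^1."""
    d = (b - strike) / (theta * np.sqrt(tau))
    return theta * np.sqrt(tau) * norm.pdf(d) + (b - strike) * norm.cdf(d)

rng = np.random.default_rng(0)
D, strike, theta, t1, t2 = 2, 1.1, 0.2, 1.0, 2.0
a = rng.uniform(size=D)                           # random basket weights
sigma = rng.standard_normal((D, D))
sigma *= theta / np.linalg.norm(sigma.T @ a)      # normalise so that |sigma^T a| = theta
x1 = np.ones(D) + np.sqrt(t1) * rng.standard_normal((4, D)) @ sigma.T   # samples of X(t^1)
reference = bachelier_reference(x1 @ a, strike, theta, t2 - t1)

# Monte Carlo: simulate X(t^2) | X(t^1) and average the payoff max(b(x) - K, 0).
increments = np.sqrt(t2 - t1) * rng.standard_normal((400_000, D)) @ sigma.T
mc = np.array([np.maximum((x + increments) @ a - strike, 0.0).mean() for x in x1])
print(np.c_[reference, mc])
```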


### Methodology and input/output data

We test different numerical methods implementing \@ref(eq:Pi), with the following inputs:


* $X \in \mathbb{R}^{ N_X \times D}$ is given by iid samples of the Brownian motion $X(t^1)$ at time $t^1 = 1$. The reference values $f(Z | X) \in \mathbb{R}^{ N_X \times 1}$ are computed using \@ref(eq:BACH).
* $Z \in \mathbb{R}^{ N_Z \times D}$ is an iid realization of the Brownian motion $X(t^2)| X(t^1)$ at time $t^2 = 2$, and $f(Z) \in \mathbb{R}^{ N_Z \times 1}$ are the corresponding payoff values.

  
For each method, the outputs $f_{Z|X} \in \mathbb{R}^{ N_Z \times D_f}$ approximate \@ref(eq:CE) and are compared to $f(Z|X)$ in our experiments. Figure \ref{fig:Bachelier} shows the generated training and test sets, comparing the observed variable \(f_{Z}\) and the reference values \(f(Z|X)\). The problem can thus be stated as follows: knowing the noisy data on the left-hand side, deduce the data on the right-hand side.

\begin{figure}
\centering
\includegraphics[width=100mm]{CodPyFigs/Bachelier.pdf}
\caption{Bachelier problem. Left training set $b(Z),f(X)$, right test set $b(X),f(Z|X)$.}
\label{fig:Bachelier}
\end{figure}


### Four methods to tackle the Bachelier problem

We compare four methods for the Bachelier problem. Two of them follow a standard approach, which uses predictive machines of the form \@ref(eq:Pms) to approximate the conditional expectation \@ref(eq:Pi) as

$$
	f_{Z | X} = \mathcal{P}_{m}(Z,Y,X, f(Z))(\#eq:pmB)
$$
The first machine $m$ is a neural network and the second a kernel method; they are labeled ANN and CodPy pred in the figures. The third method solves \@ref(eq:Pixy) and is labeled Pi:iid in the figures. The fourth follows a similar approach, but picks $X$ (resp. $Z$) as sharp discrepancy sequences (SDS) of $X(t^1)$ (resp. $X(t^2)$), and is labeled Pi:sharp in our figures. 

To illustrate a typical benchmark run of one of these four methods, Figure \ref{fig:994} shows the predicted values \(f_{Z|X}\) against the exact ones \(f(Z|X)\), as functions of the basket values \(b(Z)\), for the last method (SDS). We show five runs of the method with $N_X = N_Z = 32,64,128,256,512$.

\begin{figure}
\centering
\includegraphics[height=50mm,width=150mm]{CodPyFigs/BachelierSharp.pdf}
\caption{\label{fig:994}Exact and predicted values for sharp discrepancy sequences}
\end{figure}

Figure \ref{fig:995} presents benchmark scores, computed according to the RMSE \% \@ref(eq:rmse) (lower is better), for the two-dimensional case $D=2$; the results are similar in other dimensions. 

\begin{figure}
\centering
\includegraphics{CodPyFigs/BachelierScores.pdf}
\caption{\label{fig:995}Benchmark of scores}
\end{figure}

### Concluding remarks

We emphasize that the axis of Figure \ref{fig:995} representing the training set size \(N_x\) is in log scale. This test shows numerically that both predictive methods based on \@ref(eq:Pms) do not converge. The method Pi:iid (yellow) converges at the statistical rate $\frac{1}{\sqrt{N_X}}$, which is the rate expected with randomly sampled data. The method Pi:sharp (green) illustrates the performance gains obtained with the proposed sharp discrepancy sequences.
