In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
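# The 'seaborn' style was renamed to 'seaborn-v0_8' in Matplotlib 3.6
plt.style.use('seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn')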
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

import helper
helper.ENCODER_BASE_NAME='stacked_autoencoder_tf2'

The swaption volatility data are stored in a CSV file in the 'input_data' directory. The helper function ``get_df_from_csv(filename)`` reads this data and parses some auxiliary data from the header.
Note that our git repository contains only an example data set. The images shown below were built on the larger data set described in the paper. You are invited to use your own data set.

In [None]:
from helper import get_df_from_csv
dfInput=get_df_from_csv('EURSWVOLN.csv')

With the ``plot_swaption_volas(dfInput, trade_date)`` function you can plot all volatilities at a given date, in the following example 28.12.2018. (All plot functions save their images in the 'images' directory.)

In [None]:
from helper import plot_swaption_volas
plot_swaption_volas(dfInput, '20181228')

To plot just the feature vector, use ``plot_feature_vector(dfInput, trade_date)``:

In [None]:
from helper import plot_feature_vector
plot_feature_vector(dfInput, '20181228')

As explained in the paper, we implanted anomalies at the 20Y-05Y-100bps point of the 28.12.2018 surface and try to find them (at least the largest ones) with the algorithm. The implanted data sets are stored under synthetic dates in the year 1800.

In [None]:
from helper import plot_swaption_volas2
trade_dates=['18000101','18000102','18000103','18000104','18000105','20181228']
labels=['20','10','5','3','2','original']
optionPeriodSymbol='20Y'
swapPeriodSymbol='05Y'
plot_swaption_volas2(dfInput, trade_dates, labels, optionPeriodSymbol, swapPeriodSymbol)
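
Implanting an anomaly amounts to copying the surface of one trade date, shifting a single grid point, and storing the result under a synthetic date. The following sketch illustrates this on a toy DataFrame. The column names, the helper `implant_anomaly`, and the bump sizes (which simply mirror the labels above) are assumptions for illustration and not the code used for the paper.

```python
import pandas as pd

def implant_anomaly(df, source_date, synthetic_date, column, bump):
    """Return a copy of df with one extra row: the source_date row with `column` shifted by `bump`."""
    row = df.loc[source_date].copy()
    row[column] += bump
    out = df.copy()
    out.loc[synthetic_date] = row  # adds a new row under the synthetic date
    return out.sort_index()

# Toy volatility data (hypothetical values), indexed by date strings as in the notebook.
toy = pd.DataFrame(
    {'20Y-05Y-100bps': [60.0, 61.0], '10Y-10Y-ATM': [55.0, 54.5]},
    index=['20181227', '20181228'],
)

bumps = [20, 10, 5, 3, 2]  # hypothetical bump sizes, one per synthetic date
for i, bump in enumerate(bumps, start=1):
    toy = implant_anomaly(toy, '20181228', '1800010{}'.format(i), '20Y-05Y-100bps', bump)

print(toy)
```

Because the synthetic dates sort before every real date, they can later be separated from the real data with a simple comparison on the index.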

To get an impression of the dimensionality of the problem, one option is to perform a PCA (the special case of an autoencoder with linear activation functions), which is easily done with sklearn, and to look at the cumulative explained variance of the principal components.

In [None]:
from sklearn.decomposition import PCA
pca=PCA()
pca.fit(dfInput['data'].values)
explained_variance_ratio_cumsum=pd.DataFrame(data=pca.explained_variance_ratio_.cumsum())
print(explained_variance_ratio_cumsum[:15])
explained_variance_ratio_cumsum.plot()

So 5 principal components explain more than 91% of the variance, and 10 explain more than 97%. In the following we train autoencoders with bottleneck sizes of 5 and 10, each with and without denoising, and compare the results.

In [None]:
n_bottlenecks=[5,5,10,10]
noise_stddevs=[0,0.01,0,0.01]
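
The number of components needed for a given share of the variance can also be read off the cumulative sum programmatically. The sketch below does this on synthetic stand-in data. The helper `n_components_for` and the generated data are assumptions for illustration, so the printed counts do not correspond to the swaption data set.

```python
import numpy as np
from sklearn.decomposition import PCA

def n_components_for(cumulative_ratio, threshold):
    """Smallest number of leading components whose cumulative explained variance reaches threshold."""
    # searchsorted gives the first position where cumulative_ratio >= threshold
    return int(np.searchsorted(cumulative_ratio, threshold) + 1)

# Synthetic stand-in data: 3 latent factors spread over 12 columns plus small noise.
rng = np.random.default_rng(0)
factors = rng.normal(size=(500, 3))
loadings = rng.normal(size=(3, 12))
X = factors @ loadings + 0.05 * rng.normal(size=(500, 12))

cumsum = PCA().fit(X).explained_variance_ratio_.cumsum()
for threshold in (0.91, 0.97):
    print(threshold, n_components_for(cumsum, threshold))
```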

As a first step we create a 90% training set and a 10% test set; the implanted data points (dated in 1800) are excluded from both.

In [None]:
dataset_synth=dfInput['data'][dfInput['data'].index<'1900']
dataset_train=dfInput['data'][dfInput['data'].index>'1900'].sample(frac=0.9,random_state=42)
dataset_test = dfInput['data'][dfInput['data'].index>'1900'].drop(dataset_train.index)
print("test.shape: {}".format(dataset_test.shape))
print("train.shape: {}".format(dataset_train.shape))
print("synthetic.shape: {}".format(dataset_synth.shape))
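
The split relies on the index holding date strings of the form YYYYMMDD, which sort chronologically, so a plain string comparison separates the synthetic 1800 dates from the real ones. A minimal sketch on toy data (the DataFrame and its values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy data: 5 synthetic dates in 1800 followed by 100 real business dates.
synthetic_dates = ['1800010{}'.format(i) for i in range(1, 6)]
real_dates = pd.bdate_range('2018-01-01', periods=100).strftime('%Y%m%d').tolist()
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(105, 4)), index=synthetic_dates + real_dates)

# String comparison works because 'YYYYMMDD' strings sort chronologically.
is_synthetic = toy.index < '1900'
synth = toy[is_synthetic]
real = toy[~is_synthetic]

train = real.sample(frac=0.9, random_state=42)  # reproducible 90% sample
test = real.drop(train.index)                   # the remaining 10%

print(train.shape, test.shape, synth.shape)
```

Fixing `random_state` keeps the split identical across runs, so the test set is the same for every model configuration.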

In [None]:
from neuralnetwork_tf2 import calibrate_stacked_autoencoder
n_epoch=10000
n_bottlenecks=[5,5,10,10]
noise_stddevs=[0,0.01,0,0.01]
# The training and test arrays are the same for every configuration
input_train=dataset_train.values
input_test=dataset_test.values
for n_bottleneck, noise_stddev in zip(n_bottlenecks, noise_stddevs):
    calibrate_stacked_autoencoder(n_epoch, input_train=input_train, input_test=input_test, n_bottleneck=n_bottleneck, noise_stddev_=noise_stddev)
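
To illustrate the roles of the bottleneck size and the noise standard deviation, here is a deliberately simplified stand-in: a single-layer linear denoising autoencoder trained by plain gradient descent in NumPy. It is not the stacked TensorFlow network in `neuralnetwork_tf2`; the function names, learning rate, and synthetic data are assumptions for illustration. The key point is that noise is added to the input only, while the loss is measured against the clean data.

```python
import numpy as np

def train_linear_denoising_autoencoder(X, n_bottleneck, noise_stddev,
                                       n_epoch=2000, lr=0.05, seed=0):
    """Fit encoder/decoder matrices by gradient descent on the MSE against the clean input."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = 0.1 * rng.normal(size=(d, n_bottleneck))
    W_dec = 0.1 * rng.normal(size=(n_bottleneck, d))
    for _ in range(n_epoch):
        X_noisy = X + noise_stddev * rng.normal(size=X.shape)  # corrupt the input only
        H = X_noisy @ W_enc                                    # bottleneck code
        grad_out = 2.0 * (H @ W_dec - X) / (n * d)             # d(MSE)/d(output), clean target
        grad_dec = H.T @ grad_out
        grad_enc = X_noisy.T @ (grad_out @ W_dec.T)
        W_enc -= lr * grad_enc
        W_dec -= lr * grad_dec
    return W_enc, W_dec

def reconstruction_mse(X, W_enc, W_dec):
    """Mean squared error of reconstructing clean data through the bottleneck."""
    return float(np.mean((X @ W_enc @ W_dec - X) ** 2))

# Synthetic stand-in data with 3 latent factors, standardised column-wise.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(200, 8))
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Baseline: an all-zero model that reconstructs every input as zero.
initial_mse = reconstruction_mse(X, np.zeros((8, 3)), np.zeros((3, 8)))
W_enc, W_dec = train_linear_denoising_autoencoder(X, n_bottleneck=3, noise_stddev=0.01)
final_mse = reconstruction_mse(X, W_enc, W_dec)
print(initial_mse, final_mse)
```

Setting `noise_stddev=0` turns this into an ordinary (non-denoising) autoencoder, which mirrors the two variants compared in this notebook.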

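The function ``get_prediction_from_model(dfInput, noise_stddev, n_bottleneck, index_for_statistics)`` returns the encoder name and a results DataFrame for one model configuration. Here only the real (non-synthetic) dates are passed as ``index_for_statistics``.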

In [None]:
from neuralnetwork_tf2 import get_prediction_from_model
dfResults={}
# Pass only the real (non-synthetic) dates as index_for_statistics
index_for_statistics=dfInput['data'].index[dfInput['data'].index>'1900']
for n_bottleneck, noise_stddev in zip(n_bottlenecks, noise_stddevs):
    encoder_name, dfResult = get_prediction_from_model(dfInput, noise_stddev, n_bottleneck, index_for_statistics)
    dfResults[encoder_name] = dfResult

In [None]:
from helper import plot_hist
for n_bottleneck, noise_stddev in zip(n_bottlenecks, noise_stddevs):
    plot_hist(dfResults, dataset_train.index, dataset_test.index, n_bottleneck, noise_stddev)
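
A common way to use such a model for anomaly detection is to score each date by its reconstruction error. Whether the histograms above show exactly this quantity depends on the helper code, so the following is only a sketch under that assumption. It uses PCA as the linear stand-in for the autoencoder, and the synthetic data, the implanted shift, and the row-wise score are all made up for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in: 300 "dates" x 10 "grid points" driven by 3 latent factors.
X = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(300, 10))

# Implant an anomaly: shift a single grid point on one date.
anomalous_row, anomalous_col = 123, 4
X_anom = X.copy()
X_anom[anomalous_row, anomalous_col] += 2.0

# Fit on the clean data, then reconstruct through a 3-component bottleneck.
pca = PCA(n_components=3).fit(X)
reconstruction = pca.inverse_transform(pca.transform(X_anom))

# Score each date by its squared reconstruction error summed over all grid points.
row_score = ((X_anom - reconstruction) ** 2).sum(axis=1)
flagged_row = int(row_score.argmax())
print(flagged_row, row_score[flagged_row])
```

The date with the implanted shift stands out because a single-point move does not fit the low-dimensional structure learned from the remaining data.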