# Data Analysis of Interest Rates

This notebook presents a basic analysis of the interest rate series. It starts by simply plotting the series and then
analyses its structure.

## Plotting Interest Rates

First, we look at the interest rates and the cumulative sum.

In [None]:
train_start = '2018-10-01 00:00:00'
train_end = '2020-01-01 00:00:00'

In [None]:
import pandas as pd
import numpy as np
import cufflinks as cf

cf.go_offline()

df = pd.read_csv('../data/interest_rates_p1.csv')
df['timestamp'] = pd.to_datetime(df['timestamp'])
df.set_index('timestamp', inplace=True)
df['cum_interest_rate'] = df['interest_rate'].cumsum()
df

In [None]:
df['train_interest_rate'] = df['interest_rate']
df['test_interest_rate'] = df['interest_rate']
df.loc[df.index >= pd.to_datetime(train_end), 'train_interest_rate'] = np.nan
df.loc[df.index < pd.to_datetime(train_end), 'test_interest_rate'] = np.nan
fig = df[['train_interest_rate', 'test_interest_rate']].iplot(title='Interest rate history',
                                                              yaxis_title='Interest rate in %', asFigure=True)
fig.update_layout(yaxis=dict(tickformat=".2%"))
fig

Next, we plot the cumulative sum of the interest rates.

In [None]:
df['cum_train_interest_rate'] = df['cum_interest_rate']
df['cum_test_interest_rate'] = df['cum_interest_rate']
df.loc[df.index >= pd.to_datetime(train_end), 'cum_train_interest_rate'] = np.nan
df.loc[df.index < pd.to_datetime(train_end), 'cum_test_interest_rate'] = np.nan
fig = df[['cum_train_interest_rate', 'cum_test_interest_rate']].iplot(title='Cumulative Interest rate history',
                                                                      yaxis_title='Cumulative interest rate in %', asFigure=True)
fig.update_layout(yaxis=dict(tickformat=".2%"))
fig

The cumulative interest rate shows that simply being long all the time yields a return of 5 %.
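
Note that a cumulative sum adds the per-period rates without reinvesting them, so it corresponds to a simple (non-compounded) return. The following minimal sketch contrasts the two on a handful of made-up rates; the `rates` values are hypothetical and not taken from the data set.

```python
import numpy as np

# Hypothetical per-period interest rates in decimal form (not from the data set)
rates = np.array([0.001, 0.002, -0.0005, 0.0015])

# Cumulative sum: simple return of being long in every period, without reinvestment
simple_return = rates.cumsum()[-1]

# Compounded return: the interest earned is reinvested in each following period
compounded_return = np.prod(1.0 + rates) - 1.0

print(f"simple: {simple_return:.4%}, compounded: {compounded_return:.4%}")
```

For small per-period rates the two numbers are close, which is why the cumulative sum is a reasonable first approximation.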

## Modelling
Assuming that the interest rate series is stationary (for simplicity, this is not shown here), we can use the partial
autocorrelation to determine the number of lags needed to predict the next interest rate. This gives us a better
understanding of the process underlying the series.

In [None]:
# separate data into training and test subsets
train_df = df[df.index < pd.to_datetime(train_end)]
test_df = df[df.index >= pd.to_datetime(train_end)]

## Partial Autocorrelation
The partial autocorrelation is used to determine the number of significant lags (if modelled linearly) for predicting
the next interest rate (more details can be found under "Auto Regressive Models"). This tells us how many past
interest rates are needed as input. However, this is not necessarily the optimal number of lags for determining the
trading action.

In [None]:
from statsmodels.graphics.tsaplots import plot_pacf
from statsmodels.graphics.tsaplots import plot_acf
from matplotlib import pyplot as plt

# Plot the auto-correlation of the interest rates:
# the correlation of the series with lagged copies of itself
plot_acf(train_df['interest_rate'], lags=10)
plt.show()

# Plot the partial auto-correlation: the correlation of the series with a lagged copy of itself,
# keeping only the direct effect of that lag on the next interest rate
plot_pacf(train_df['interest_rate'], lags=10)
plt.show()


The partial autocorrelation shows that the number of significant lags (if modelled linearly) is 6.
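
To make the notion of a "significant lag" concrete, the partial autocorrelation at lag k can be computed as the last coefficient of a least-squares AR(k) fit and compared against an approximate 95 % band of ±1.96/√n. The sketch below does this with NumPy on a synthetic AR(2) series. The function `pacf_ols` and the simulated data are assumptions for illustration, and the values may differ slightly from those plotted by `plot_pacf`, depending on the estimation method `statsmodels` uses.

```python
import numpy as np


def pacf_ols(x, max_lag):
    """Partial autocorrelation via successive AR(k) least-squares fits."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    values = []
    for k in range(1, max_lag + 1):
        # Columns are x[t-1], ..., x[t-k]; the target is x[t]
        X = np.column_stack([x[k - j:len(x) - j] for j in range(1, k + 1)])
        y = x[k:]
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        values.append(coef[-1])  # direct effect of the k-th lag
    return np.array(values)


# Synthetic AR(2) series: x[t] = 0.6 * x[t-1] - 0.3 * x[t-2] + noise
rng = np.random.default_rng(42)
n = 2000
noise = rng.normal(size=n)
series = np.zeros(n)
for t in range(2, n):
    series[t] = 0.6 * series[t - 1] - 0.3 * series[t - 2] + noise[t]

pacf = pacf_ols(series, max_lag=10)
band = 1.96 / np.sqrt(n)  # approximate 95 % band under the white-noise null
significant = [lag for lag, value in enumerate(pacf, start=1) if abs(value) > band]
print(f"Lags outside the band: {significant}")
```

For an AR(2) process the first two lags should fall clearly outside the band, while higher lags should mostly stay inside it. Since roughly one lag in twenty exceeds a 95 % band by chance, an isolated borderline lag should not be over-interpreted.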

## Linear Regression
In this section, we model the interest rate history as an autoregressive (AR) model. This will give us some initial
weights for our neural network.
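
Written out, the model fitted in the cells below appears to be an autoregression without intercept on normalized rates. The symbols $r_t$ (raw interest rate), $x_t$ (normalized rate) and $w_k$ (regression coefficient of lag $k$) are notation introduced here for illustration; the constants are read off the code (`normalize_interest_rate`, `lags = 6`, `fit_intercept=False`).

$$
x_t = \frac{r_t - 0.002}{0.002} + 1 = \frac{r_t}{0.002},
\qquad
\hat{x}_{t+1} = \sum_{k=0}^{5} w_k \, x_{t-k}
$$

Here $x_{t-k}$ corresponds to the column `interest_rate_k` and $\hat{x}_{t+1}$ to the prediction of `next_interest_rate`.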

In [None]:
def normalize_interest_rate(interest_rate):
    # scale the rate by the constant 0.002; algebraically equal to interest_rate / 0.002
    return (interest_rate - 0.002) / 0.002 + 1


lags = 6

for lag in range(lags):
    # normalize interest rate and shift it
    df[f'interest_rate_{lag}'] = normalize_interest_rate(df['interest_rate'].shift(lag))

# target of the linear regression: the next (normalized) interest rate
df['next_interest_rate'] = df['interest_rate_0'].shift(-1)

train_df = df[df.index < pd.to_datetime(train_end)]
test_df = df[df.index >= pd.to_datetime(train_end)]
train_df

In [None]:
from sklearn.linear_model import LinearRegression

# fit the AR model on the training period
lr = LinearRegression(fit_intercept=False)
ranged = train_df[[f'interest_rate_{lag}' for lag in range(lags)] + ['next_interest_rate']].dropna()
lr.fit(ranged[[f'interest_rate_{lag}' for lag in range(lags)]], ranged['next_interest_rate'])
pd.Series(lr.coef_).iplot(kind='bar',
                          title='Coefficients Interest rate prediction, Training',
                          yaxis_title='Coefficients in decimal',
                          xaxis_title='Lags Interest Rate')
print(f"Training Coefficients: {lr.coef_}")

# fit a separate model on the test period to compare its coefficients with the training fit
lr = LinearRegression(fit_intercept=False)
ranged = test_df[[f'interest_rate_{lag}' for lag in range(lags)] + ['next_interest_rate']].dropna()
lr.fit(ranged[[f'interest_rate_{lag}' for lag in range(lags)]], ranged['next_interest_rate'])
pd.Series(lr.coef_).iplot(kind='bar',
                          title='Coefficients Interest rate prediction, Test',
                          yaxis_title='Coefficients in decimal',
                          xaxis_title='Lags Interest Rate')
print(f"Test Coefficients: {lr.coef_}")

## Tasks

1. Load a different product (e.g. p2 or p3) and analyze the correlations there. Are they similar to those of p1?


