# Volatility and Correlations

In this notebook, we calculate risk factor returns and estimate volatilities and correlations.

In [None]:
import lzma
import numpy as np
import pandas as pd

In [None]:
from describe_data_set import \
    describe_values, \
    describe_dates, \
    normalised_term, \
    plot_data_table, \
    plot_volatility_distribution, \
    plot_correlation_distribution

We use the stored consolidated data set, which originates from the [ParseInputData](ParseInputData.ipynb) notebook.

In [None]:
path = "../data/consolidated/"
file_name = "data_set_full.csv"

zipped_file_name = file_name + ".xz"

In [None]:
with lzma.open(path + zipped_file_name) as zf:
    file_content = zf.read()
    with open(path + file_name, "wb") as f:
        f.write(file_content)

In [None]:
data_set_full = pd.read_csv(path + file_name)
data_set_full["DATE"] = pd.to_datetime(data_set_full["DATE"])
data_set_full["TERM"] = data_set_full["TERM"].fillna("")
print(data_set_full.shape)

## Data Selection

In [None]:
first_date = pd.Timestamp("2004-09-07")
last_date = pd.Timestamp("2024-04-30")

currencies = [ "EUR", "USD", "GBP", "USD-EUR", "GBP-EUR" ]
# terms = ["", "1Y", "5Y", "10Y"]
terms = ["", "1Y", "2Y", "5Y", "10Y", "20Y"]

In [None]:
data_set = data_set_full[
    (data_set_full["DATE"] >= first_date) &
    (data_set_full["DATE"] <= last_date) &
    data_set_full["CURRENCY"].isin(currencies) &
    data_set_full["TERM"].isin(terms)
]

We check that all available time series have equal length and contain no gaps (except for weekends and bank holidays).
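The implementation of `describe_dates` is not shown in this notebook. A minimal sketch of such a check, assuming the long format of `data_set` (columns `DATE`, `CURRENCY`, `TERM`), could count observations per series and flag unusually long gaps between consecutive dates. The helper `find_date_gaps` and its 5-day threshold are hypothetical choices for illustration.

```python
import pandas as pd


def find_date_gaps(data_set: pd.DataFrame, max_gap_days: int = 5) -> pd.DataFrame:
    """Hypothetical helper: list gaps between consecutive observation dates
    that exceed max_gap_days calendar days, per (CURRENCY, TERM) series.

    The 5-day default (an assumption) tolerates weekends and long holiday
    weekends such as Easter (Thursday to Tuesday).
    """
    rows = []
    for (ccy, term), group in data_set.groupby(["CURRENCY", "TERM"]):
        dates = group["DATE"].sort_values()
        gaps = dates.diff().dt.days  # calendar days since the previous observation
        mask = gaps > max_gap_days
        for end, gap in zip(dates[mask], gaps[mask]):
            rows.append({"CURRENCY": ccy, "TERM": term, "END": end, "GAP_DAYS": int(gap)})
    return pd.DataFrame(rows, columns=["CURRENCY", "TERM", "END", "GAP_DAYS"])


# Toy series: Friday, Monday, Tuesday, then an 11-day gap.
toy = pd.DataFrame({
    "DATE": pd.to_datetime(["2024-01-05", "2024-01-08", "2024-01-09", "2024-01-20"]),
    "CURRENCY": "EUR",
    "TERM": "1Y",
    "VALUE": [1.0, 1.1, 1.2, 1.3],
})
lengths = toy.groupby(["CURRENCY", "TERM"]).size()  # series lengths should all be equal
gaps = find_date_gaps(toy)
print(lengths)
print(gaps)
```

Only the jump to 2024-01-20 is reported; the Friday-to-Monday weekend gap stays below the threshold.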

In [None]:
describe_dates(data_set)

We also double-check that the available data are plausible.

In [None]:
describe_values(data_set)

## Data Interpolation

Data gaps due to weekends and bank holidays require special treatment.

We choose to model time as calendar time. Furthermore, we assume that the time series correspond to continuous stochastic processes. As a proxy for the actual values at gap dates, we linearly interpolate between neighbouring data points.
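To illustrate the interpolation on a toy example (the values below are made up), a Friday and a Monday observation are placed on a daily calendar index, and the weekend values are filled linearly:

```python
import pandas as pd

# Hypothetical toy series: observations on Friday 2024-01-05 and Monday 2024-01-08 only.
observed = pd.Series([1.0, 1.3], index=pd.to_datetime(["2024-01-05", "2024-01-08"]))

# Reindex to calendar days; Saturday and Sunday become NaN.
calendar = observed.reindex(pd.date_range("2024-01-05", "2024-01-08", freq="D"))

# Linear interpolation over an equally spaced daily index is linear in calendar time.
filled = calendar.interpolate(method="linear")
print(filled.round(2).tolist())  # [1.0, 1.1, 1.2, 1.3]
```

Because the index is already daily, `method="linear"` (which assumes equally spaced rows) and `method="time"` give the same result here.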

In [None]:
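# Empty frame on a daily calendar index with the (CURRENCY, MONTHS) column layout of the pivot table.
empty_cols = pd.MultiIndex(levels=[[],[],], codes=[[],[],], names=["CURRENCY", "MONTHS",])
all_days = pd.DataFrame(index=pd.date_range(first_date, last_date, freq='d'), columns=empty_cols)
# One row per observation date, one column per risk factor; aggfunc="sum" leaves single observations unchanged.
data_table = pd.pivot_table(data_set, index="DATE", columns=["CURRENCY", "MONTHS"], values="VALUE", aggfunc="sum")
# Left-join onto the calendar so that weekends and bank holidays appear as NaN rows.
data_table = pd.merge(all_days, data_table, left_index=True, right_index=True, how="left")
# Fill gap dates linearly; with a daily index this is linear in calendar time.
data_table = data_table.interpolate(method='linear', axis=0)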

## Data Inspection

To get some intuition about the data, we plot normalised time series for FX rates and interest rates.

FX rates are converted to log-rates.

Furthermore, FX log-rates and interest rates are shifted to start at zero. This makes the time series plots comparable.
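The implementation of `plot_data_table` is not shown here. As a sketch of the normalisation described above, assuming it is applied per series, a hypothetical helper `normalise_for_plot` takes logs of FX rates and then subtracts the first value:

```python
import numpy as np
import pandas as pd


def normalise_for_plot(series: pd.Series, is_fx: bool) -> pd.Series:
    """Hypothetical helper: log-transform FX rates, then shift the series
    so that its first value is zero. Interest rates are only shifted."""
    values = np.log(series) if is_fx else series
    return values - values.iloc[0]


fx = pd.Series([1.10, 1.21, 1.00])       # made-up FX rates
rate = pd.Series([0.020, 0.025, 0.015])  # made-up interest rates
fx_norm = normalise_for_plot(fx, is_fx=True)
rate_norm = normalise_for_plot(rate, is_fx=False)
print(fx_norm.round(4).tolist())
print(rate_norm.round(4).tolist())
```

After the shift, an FX curve shows cumulative log-returns and a rate curve shows cumulative absolute changes, both starting at zero.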

In [None]:

plot_data_table(data_table[["USD-EUR", "GBP-EUR"]]).show()
plot_data_table(data_table[["EUR",]]).show()
plot_data_table(data_table[["USD",]]).show()
plot_data_table(data_table[["GBP",]]).show()

## Return Calculation

We calculate overlapping $n$-day returns. For this analysis, we set $n=30$ calendar days, so returns roughly correspond to monthly returns.

The selection of non-overlapping sub-samples is handled in the subsequent analysis.

For interest rates, returns are absolute changes in the rate. For FX rates, we first take log-prices, so that the differences are log-returns.
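In formulas, with $x_t$ denoting the (interpolated) risk factor value on calendar day $t$, the overlapping $n$-day return computed below reads as follows; the symbols $x_t$, $S_t$ and $R_t$ are notation introduced here for illustration:

$$
r^{(n)}_t = x_t - x_{t-n},
\qquad
x_t =
\begin{cases}
\ln S_t, & \text{for an FX rate } S_t,\\
R_t, & \text{for an interest rate } R_t.
\end{cases}
$$

For FX rates this gives $r^{(n)}_t = \ln\left(S_t / S_{t-n}\right)$, and for interest rates $r^{(n)}_t = R_t - R_{t-n}$.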

In [None]:
return_days = 30

In [None]:
table = data_table.copy()
table["USD-EUR"] = np.log(table["USD-EUR"])
table["GBP-EUR"] = np.log(table["GBP-EUR"])

In [None]:
return_table = pd.DataFrame(
    index = table.index[return_days:],
    columns = table.columns,
    data = table.values[return_days:,:] - table.values[:-return_days,:],
)
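The slicing `values[n:] - values[:-n]` above subtracts the row $n$ positions earlier. Since the index has one row per calendar day, this matches pandas' `DataFrame.diff(n)` after dropping the first $n$ rows. A small check on made-up data:

```python
import numpy as np
import pandas as pd

n = 3  # small window for illustration; the notebook uses return_days = 30
table = pd.DataFrame(
    {
        "A": np.log([1.00, 1.02, 1.01, 1.05, 1.04, 1.08]),  # log FX rate (made up)
        "B": [0.010, 0.012, 0.011, 0.015, 0.014, 0.013],    # interest rate (made up)
    },
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
)

# Slicing formulation used in the notebook: x_t - x_{t-n}
sliced = pd.DataFrame(
    index=table.index[n:],
    columns=table.columns,
    data=table.values[n:, :] - table.values[:-n, :],
)

# Equivalent pandas formulation: diff over n rows, dropping the first n (NaN) rows
diffed = table.diff(n).iloc[n:]

print(np.allclose(sliced.values, diffed.values))  # True
```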

Summary statistics of the overlapping return sample give a first impression of the data.

In [None]:
return_table.describe().T

Recall that the standard deviation is calculated for monthly (30-day) returns. Assuming serially uncorrelated returns (square-root-of-time scaling), the corresponding annualised volatility is obtained by multiplying the standard deviation by $\sqrt{365/30}\approx 3.5$.
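As a quick arithmetic check of the scaling factor, using 365 calendar days per year consistent with the calendar-time model (the 3% monthly standard deviation is a made-up input):

```python
import math

return_days = 30
annualisation_factor = math.sqrt(365 / return_days)
print(round(annualisation_factor, 4))  # 3.4881

monthly_std = 0.03  # hypothetical standard deviation of 30-day returns
annual_vol = monthly_std * annualisation_factor
print(round(annual_vol, 4))  # 0.1046
```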

## Volatility for Non-Overlapping Returns

Overlapping returns exhibit considerable autocorrelation: consecutive $n$-day returns share $n-1$ daily increments. To remove this effect, we select sub-samples of non-overlapping returns.
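A simulation makes the effect visible. Assuming i.i.d. Gaussian daily increments (an illustrative assumption, not a property of the data set), the lag-1 autocorrelation of overlapping $n$-day returns is theoretically $(n-1)/n$, while non-overlapping returns are uncorrelated:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)
n = 30

# Simulated random walk with i.i.d. standard normal daily increments.
x = pd.Series(np.cumsum(rng.standard_normal(60_000)))

overlapping = (x - x.shift(n)).dropna()  # n-day return on every day
non_overlapping = overlapping.iloc[::n]  # every n-th return only

# Theory for i.i.d. increments: (n - 1) / n for overlapping, 0 for non-overlapping.
print(round(overlapping.autocorr(lag=1), 3))
print(round(non_overlapping.autocorr(lag=1), 3))
```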

For our data set of daily observations and $n$-day returns, we can select $n$ distinct sub-samples of non-overlapping returns.
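Sub-sample $k$ takes the returns at positions $k, k+n, k+2n, \dots$, which is what the slice `iloc[offset::return_days]` selects. A toy example with $n=3$ and ten returns shows that the sub-samples partition the data:

```python
import numpy as np
import pandas as pd

n = 3
returns = pd.Series(np.arange(10, dtype=float))  # stand-in for 10 overlapping returns

# Sub-sample k: positions k, k + n, k + 2n, ...
sub_samples = {k: returns.iloc[k::n] for k in range(n)}
for k, s in sub_samples.items():
    print(k, s.index.tolist())
# 0 [0, 3, 6, 9]
# 1 [1, 4, 7]
# 2 [2, 5, 8]
```

Every return belongs to exactly one sub-sample, and within a sub-sample consecutive returns are $n$ days apart and therefore do not overlap.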

There is no natural criterion for choosing which sub-sample to use for estimating return properties. Consequently, we calculate the standard deviation (and hence the annualised volatility) for each sub-sample. The distribution across sub-samples serves as an indication of the estimation uncertainty.

In [None]:
# One column per sub-sample offset; each column holds the standard deviation
# of every risk factor's returns within that non-overlapping sub-sample.
std_table = pd.DataFrame({
    offset: return_table.iloc[offset::return_days].std()
    for offset in range(return_days)
})
std_table

We inspect the volatility distribution via box-plots.

In [None]:
plot_volatility_distribution(std_table.T[["USD-EUR", "GBP-EUR"]], return_days).show()
plot_volatility_distribution(std_table.T[["EUR",]], return_days).show()
plot_volatility_distribution(std_table.T[["USD",]], return_days).show()
plot_volatility_distribution(std_table.T[["GBP",]], return_days).show()

## Correlations of Non-Overlapping Returns

For correlation estimation, we also use sub-samples of non-overlapping returns.

Correlations are calculated for risk factor pairs. We analyse the following sets of risk factor pairs:
  - interest rate risk factors within a single currency,
  - FX versus FX risk factors,
  - FX versus interest rate risk factors, and
  - interest rate risk factors from different currencies.
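To illustrate the sub-sample approach for correlations, the following sketch simulates two hypothetical risk factors as random walks whose daily increments have correlation $\rho = 0.6$ (an assumption for illustration), computes overlapping 30-day returns, and estimates one correlation per non-overlapping sub-sample:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=7)
n, days, rho = 30, 60_000, 0.6

# Hypothetical pair of risk factors with daily increment correlation rho.
z1 = rng.standard_normal(days)
z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(days)
levels = pd.DataFrame({"A": np.cumsum(z1), "B": np.cumsum(z2)})

returns = levels.diff(n).iloc[n:]  # overlapping n-day returns

# One correlation estimate per non-overlapping sub-sample.
corrs = []
for k in range(n):
    sub = returns.iloc[k::n]
    corrs.append(sub["A"].corr(sub["B"]))
sub_sample_corrs = pd.Series(corrs)
print(sub_sample_corrs.describe())
```

For i.i.d. increments the $n$-day returns inherit the daily correlation, so the sub-sample estimates scatter around 0.6; their spread plays the same role as the box plots below.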

In [None]:
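def correlations(
    return_table,
    col_pair_list,
    return_days,
    ):
    """Estimate the correlation of each column pair on each sub-sample of
    non-overlapping returns.

    Returns a DataFrame with one row per sub-sample offset (0, ..., return_days - 1)
    and one column per risk factor pair.
    """
    # Column label per pair: currency and normalised term (MONTHS / 12, in years) of both factors.
    table = pd.DataFrame(
        columns = [ p[0][0]+"_"+normalised_term(p[0][1]/12)+"__"+ p[1][0]+"_"+normalised_term(p[1][1]/12) for p in col_pair_list],
        index = range(return_days),
        dtype = float,
    )
    for offset in range(return_days):
        # Non-overlapping sub-sample: rows offset, offset + return_days, offset + 2 * return_days, ...
        tmp = return_table.iloc[offset::return_days]
        table.loc[offset] = [ tmp[c[0]].corr(tmp[c[1]]) for c in col_pair_list ]
    return table


def make_correlation_idx_list(cols):
    """Return all pairs (cols[i], cols[j]) with i < j."""
    idx_list = []
    for i in range(len(cols)):
        for j in range(i+1,len(cols)):
            idx_list.append( (cols[i], cols[j]) )
    return idx_list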
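The helper `make_correlation_idx_list` enumerates each unordered pair of columns exactly once, in the same order as `itertools.combinations(cols, 2)`. For $m$ columns this yields $m(m-1)/2$ pairs. A check with hypothetical column labels:

```python
from itertools import combinations


def make_correlation_idx_list(cols):
    # Same logic as the notebook's helper, repeated so this cell is self-contained.
    idx_list = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            idx_list.append((cols[i], cols[j]))
    return idx_list


cols = [("EUR", 12), ("EUR", 60), ("USD", 12), ("USD-EUR", 0)]  # hypothetical labels
pairs = make_correlation_idx_list(cols)
print(pairs == list(combinations(cols, 2)))          # True
print(len(pairs), len(cols) * (len(cols) - 1) // 2)  # 6 6
```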

### Single Currency Interest Rate Correlations

In [None]:
s = return_table.columns.get_loc_level("EUR")[0]
col_pair_list = make_correlation_idx_list(return_table.columns[s])
plot_correlation_distribution(correlations(return_table, col_pair_list, return_days))

In [None]:
s = return_table.columns.get_loc_level("USD")[0]
col_pair_list = make_correlation_idx_list(return_table.columns[s])
plot_correlation_distribution(correlations(return_table, col_pair_list, return_days))

In [None]:
s = return_table.columns.get_loc_level("GBP")[0]
col_pair_list = make_correlation_idx_list(return_table.columns[s])
plot_correlation_distribution(correlations(return_table, col_pair_list, return_days))

### FX Correlations

In [None]:
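# Each FX rate has exactly one column; look up its full (CURRENCY, MONTHS) label.
gbp_eur = return_table.columns[return_table.columns.get_loc_level("GBP-EUR")[0]][0]
usd_eur = return_table.columns[return_table.columns.get_loc_level("USD-EUR")[0]][0]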

#### GBP-EUR versus USD-EUR

In [None]:
col_pair_list = [(gbp_eur, usd_eur),]
plot_correlation_distribution(correlations(return_table, col_pair_list, return_days))

#### GBP-EUR versus Rates

In [None]:
idx = return_table.columns[return_table.columns.get_loc_level("EUR")[0]]
col_pair_list = [(gbp_eur, k) for k in idx]
plot_correlation_distribution(correlations(return_table, col_pair_list, return_days))

In [None]:
idx = return_table.columns[return_table.columns.get_loc_level("GBP")[0]]
col_pair_list = [(gbp_eur, k) for k in idx]
plot_correlation_distribution(correlations(return_table, col_pair_list, return_days))

In [None]:
idx = return_table.columns[return_table.columns.get_loc_level("USD")[0]]
col_pair_list = [(gbp_eur, k) for k in idx]
plot_correlation_distribution(correlations(return_table, col_pair_list, return_days))

#### USD-EUR versus Rates

In [None]:
idx = return_table.columns[return_table.columns.get_loc_level("EUR")[0]]
col_pair_list = [(usd_eur, k) for k in idx]
plot_correlation_distribution(correlations(return_table, col_pair_list, return_days))

In [None]:
idx = return_table.columns[return_table.columns.get_loc_level("USD")[0]]
col_pair_list = [(usd_eur, k) for k in idx]
plot_correlation_distribution(correlations(return_table, col_pair_list, return_days))

In [None]:
idx = return_table.columns[return_table.columns.get_loc_level("GBP")[0]]
col_pair_list = [(usd_eur, k) for k in idx]
plot_correlation_distribution(correlations(return_table, col_pair_list, return_days))

### Rates versus Rates

In [None]:
one = return_table.columns[return_table.columns.get_loc_level("EUR")[0]]
two = return_table.columns[return_table.columns.get_loc_level("USD")[0]]
col_pair_list = [(a, b) for a in one for b in two]
plot_correlation_distribution(correlations(return_table, col_pair_list, return_days))

In [None]:
one = return_table.columns[return_table.columns.get_loc_level("EUR")[0]]
two = return_table.columns[return_table.columns.get_loc_level("GBP")[0]]
col_pair_list = [(a, b) for a in one for b in two]
plot_correlation_distribution(correlations(return_table, col_pair_list, return_days))

In [None]:
one = return_table.columns[return_table.columns.get_loc_level("GBP")[0]]
two = return_table.columns[return_table.columns.get_loc_level("USD")[0]]
col_pair_list = [(a, b) for a in one for b in two]
plot_correlation_distribution(correlations(return_table, col_pair_list, return_days))

## Save Result Data

We save the volatility and correlation estimates for subsequent model calibration.

### Volatilities

In [None]:
path = "../data/output/"

In [None]:
file_name = "standard_deviation_" + str(return_days) + "days.csv"
std_table.to_csv(path + file_name)

### Correlations

In [None]:
col_pair_list = make_correlation_idx_list(return_table.columns)
multi_idx = pd.MultiIndex.from_tuples(
    [ (t[0][0], t[0][1], t[1][0], t[1][1]) for t in col_pair_list ],
    names=["CURRENCY1", "MONTHS1", "CURRENCY2", "MONTHS2"],
)
corrs_table = correlations(return_table, col_pair_list, return_days)
corrs_table.columns = multi_idx
# corrs_table.T

In [None]:
file_name = "correlations_" + str(return_days) + "days.csv"
corrs_table.T.to_csv(path + file_name)