# Assignment 4

1. Please estimate the daily 95% VaR and the 1-year 99% VaR events of a theoretical, typical Hungarian university student, together with their monetary values where applicable. The text answers will be judged by the relevance of the example; they need not be personal or true for any specific person.
2. Download the daily closing price history of ZWACK from WSJ (https://www.wsj.com/market-data/quotes/HU/XBUD/ZWACK/historical-prices/download?MOD_VIEW=page&num_rows=3768.9583333333335&range_days=3768.9583333333335&startDate=01/01/2010&endDate=04/27/2020) between 2010.01.01 and 2020.04.24. Plot the time series. Compute the daily absolute price return time series, and plot its cumulative distribution function. Determine the daily:

  - standard deviation
  - lower and upper VaR values at 99% confidence
  - lower and upper CVaR values at 95% confidence
  - expected shortfall value at 95% confidence

Summarize your observations on the usefulness of these quantities as risk measures in a single paragraph.



## Solution

### Example 1
The Value-at-Risk (VaR) measure estimates the loss that an investment will not exceed, with a given probability, over a given time horizon. To estimate VaR values in the student example, we first have to formulate a reasonable definition for the "value" of a university student.

Such a definition can be given by relating the amount of money a student spends daily during his or her studies to his or her expected future earnings. In this framework, "investing in a student" means the money he or she spends, and the return is his or her expected future earnings, which grow steadily as the student progresses in the studies.

The money spent by the student fluctuates around a mean with some standard deviation. If our investment spends more than the mean, it becomes loss-making; if it spends less, it pays off more easily. VaR can be calculated by measuring this fluctuation relative to the payoff. The derivation of the exact function relating these two quantities is left to the reader as an exercise.
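To illustrate the mechanics only, here is a minimal sketch that treats the student's overspending (daily spending above the planned mean) as the loss. It assumes, purely hypothetically, that daily spending is normally distributed and independent from day to day, so a 1-year VaR scales with the square root of the number of days. The 2,000 HUF daily standard deviation and the helper `overspend_var` are invented placeholders, not estimates.

```python
from math import sqrt
from statistics import NormalDist

# HYPOTHETICAL input for illustration only; not an estimate for real students.
DAILY_SPEND_STD_HUF = 2_000.0  # assumed day-to-day fluctuation of spending


def overspend_var(daily_std, confidence, days=1):
    """Parametric VaR of overspending relative to the planned (mean) budget.

    Assumes i.i.d. normal daily spending, so the total overspend over `days`
    days has standard deviation daily_std * sqrt(days).
    """
    z = NormalDist().inv_cdf(confidence)  # one-sided standard normal quantile
    return z * daily_std * sqrt(days)


daily_var_95 = overspend_var(DAILY_SPEND_STD_HUF, 0.95)
yearly_var_99 = overspend_var(DAILY_SPEND_STD_HUF, 0.99, days=365)
print(f"Daily 95% VaR:  {daily_var_95:,.0f} HUF above the planned daily budget")
print(f"1-year 99% VaR: {yearly_var_99:,.0f} HUF above the planned yearly budget")
```

Real student spending is lumpy (rent, tuition, trips) and far from independent or normal, so the square-root-of-time scaling here is a simplifying assumption rather than a recommendation.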

### Example 2

In [None]:
import os
import numpy as np
import pandas as pd
from scipy.stats import norm

import seaborn as sns
import matplotlib as mpl
import matplotlib.cm as cm
import matplotlib.pyplot as plt

from tabulate import tabulate

#### Just some matplotlib and seaborn parameter tuning

In [None]:
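out = os.path.join('.', 'out', '')  # output directory, with a trailing separator
os.makedirs(out, exist_ok=True)     # make sure plt.savefig() can write into it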
figsave_format = 'png'
figsave_dpi = 200

axistitlesize = 22
axisticksize = 17
axislabelsize = 26
axislegendsize = 18

# Set axtick dimensions
major_size = 6
major_width = 1.2
minor_size = 3
minor_width = 1
mpl.rcParams['xtick.major.size'] = major_size
mpl.rcParams['xtick.major.width'] = major_width
mpl.rcParams['xtick.minor.size'] = minor_size
mpl.rcParams['xtick.minor.width'] = minor_width
mpl.rcParams['ytick.major.size'] = major_size
mpl.rcParams['ytick.major.width'] = major_width
mpl.rcParams['ytick.minor.size'] = minor_size
mpl.rcParams['ytick.minor.width'] = minor_width

# Seaborn style settings
sns.set_style({'axes.axisbelow': True,
               'axes.edgecolor': '.1',
               'axes.facecolor': 'white',
               'axes.grid': True,
               'axes.labelcolor': '.15',
               'axes.spines.bottom': True,
               'axes.spines.left': True,
               'axes.spines.right': True,
               'axes.spines.top': True,
               'figure.facecolor': 'white',
               'font.family': ['sans-serif'],
               'font.sans-serif': ['Arial',
                'DejaVu Sans',
                'Liberation Sans',
                'Bitstream Vera Sans',
                'sans-serif'],
               'grid.color': '.8',
               'grid.linestyle': '--',
               'image.cmap': 'rocket',
               'lines.solid_capstyle': 'round',
               'patch.edgecolor': 'w',
               'patch.force_edgecolor': True,
               'text.color': '.15',
               'xtick.bottom': True,
               'xtick.color': '.15',
               'xtick.direction': 'in',
               'xtick.top': True,
               'ytick.color': '.15',
               'ytick.direction': 'in',
               'ytick.left': True,
               'ytick.right': True})

#### Load dataset

In [None]:
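data_dir = os.path.join('.', 'data', '')
# Use the first CSV file in the data directory (sorted, so the choice is deterministic)
data_file = data_dir + sorted(f for f in os.listdir(data_dir) if f.lower().endswith('.csv'))[0]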
zwack_data = pd.read_csv(data_file)

In [None]:
zwack_data.head()

In [None]:
zwack_data.tail()

#### Reverse the DataFrame into chronological order

In [None]:
# The newest day comes first in the file; flip to chronological order and renumber rows 0..n-1
zwack_data = zwack_data.iloc[::-1].reset_index(drop=True)

In [None]:
cols = zwack_data.columns

In [None]:
date_jump = 90  # label only every 90th trading day on the x axis

In [None]:
# Extract the individual price columns (already in chronological order)
dates = np.array(zwack_data[cols[0]])
zw_open = zwack_data[cols[1]]
zw_close = zwack_data[cols[4]]
zw_high = zwack_data[cols[2]]
zw_low = zwack_data[cols[3]]

#### First visualization

In [None]:
nrows = 2
ncols = 2
fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(ncols*18, nrows*10),
                         facecolor='black', subplot_kw=dict(facecolor='black'))
fig.subplots_adjust(hspace=0.25, wspace=0.15)

fill_alpha = 0.2

y_values = [zw_open, zw_close, zw_high, zw_low]
labels = ['Open values', 'Close values', 'High values', 'Low values']
colors = ['yellow', 'tab:orange', 'tab:green', 'tab:red']

for i in range(nrows):
    for j in range(ncols):
        axes[i][j].plot(dates, y_values[i*ncols + j],
                        c=colors[i*ncols + j], lw=3)
        axes[i][j].fill_between(dates, y_values[i*ncols + j],
                                color=colors[i*ncols + j], alpha=fill_alpha)

for i in range(nrows):
    for j in range(ncols):
        # Source text
        axes[i][j].text(x=0.12, y=-0.17, s='Source of data: https://www.wsj.com/market-data/',
                        c='white', fontsize=13, fontweight='book',
                        horizontalalignment='center', verticalalignment='center', transform=axes[i][j].transAxes,
                        bbox=dict(facecolor='black', alpha=0.2, lw=0))

        axes[i][j].set_title('ZWC1 values -- {}'.format(labels[i*ncols + j]), 
                       fontsize=axistitlesize, fontweight='bold', color='white')

        axes[i][j].set_xlabel('Time', fontsize=axislabelsize, color='white')
        axes[i][j].set_ylabel('Values', fontsize=axislabelsize, color='white')

        axes[i][j].tick_params(axis='both', which='major', labelsize=axisticksize, colors='white')
        axes[i][j].set_xticks(dates[::date_jump])
        axes[i][j].set_xticklabels(dates[::date_jump], rotation=42, ha='center')

        # Should be placed after setting x-ticks!!!
        axes[i][j].set_xlim(dates[0], dates[-1])
        axes[i][j].set_ylim(10000, None)

        #axes[i][j].legend(loc='lower right', fontsize=axislegendsize)

plt.savefig(out + 'time_series.png',
            format=figsave_format, dpi=figsave_dpi,
            facecolor='black', edgecolor='black')

plt.show()

### Calculating absolute price return and plotting its distribution and cumulative distribution

In [None]:
# Daily price return P_t - P_(t-1) in HUF; the first row is NaN
zwack_data['Return'] = zw_close.diff()
zwack_data['AbsReturn'] = zwack_data['Return'].abs()

# Daily percentage return P_t / P_(t-1) - 1
zwack_data['ReturnPerc'] = zw_close.pct_change()
zwack_data['AbsReturnPerc'] = zwack_data['ReturnPerc'].abs()

In [None]:
zwack_data.head()

Plot the return values

In [None]:
nrows = 1
ncols = 1
fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(ncols*18, nrows*10),
                         facecolor='black', subplot_kw=dict(facecolor='black'))
fig.subplots_adjust(hspace=0.25, wspace=0.15)

fill_alpha = 0.2

axes.plot(dates, zwack_data['Return'], label='Daily returns',
          c='tab:olive', lw=3)
axes.fill_between(dates, zwack_data['Return'],
                  color='tab:olive', alpha=fill_alpha)

# Source text
axes.text(x=0.12, y=-0.17, s='Source of data: https://www.wsj.com/market-data/',
                c='white', fontsize=13, fontweight='book',
                horizontalalignment='center', verticalalignment='center', transform=axes.transAxes,
                bbox=dict(facecolor='black', alpha=0.2, lw=0))

axes.set_title('ZWC1 values -- {}'.format('Daily returns'), 
               fontsize=axistitlesize, fontweight='bold', color='white')

axes.set_xlabel('Time', fontsize=axislabelsize, color='white')
axes.set_ylabel('Values', fontsize=axislabelsize, color='white')

axes.tick_params(axis='both', which='major', labelsize=axisticksize, colors='white')
axes.set_xticks(dates[::date_jump])
axes.set_xticklabels(dates[::date_jump], rotation=42, ha='center')

# Should be placed after setting x-ticks!!!
axes.set_xlim(dates[0], dates[-1])
#axes.set_ylim(10000, None)

#axes.legend(loc='lower right', fontsize=axislegendsize)

#plt.savefig(out + 'time_series.png',
#            format=figsave_format, dpi=figsave_dpi,
#            facecolor='black', edgecolor='black')

plt.show()

Plot the distribution (left, logarithmic y axis) and the cumulative distribution (right) of the absolute price returns.

In [None]:
nrows = 1
ncols = 2
fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(ncols*10, nrows*10))

axes[0].hist(zwack_data['AbsReturn'], bins=40, alpha=0.5, density=True)
axes[1].hist(zwack_data['AbsReturn'], bins=40, alpha=0.5, density=True, cumulative=True)

axes[0].set_yscale('log')

for i in range(ncols):
    axes[i].set_xlabel('Abs. price return', fontsize=axislabelsize)
    axes[i].set_ylabel('P(Abs. price return)', fontsize=axislabelsize)

    axes[i].tick_params(axis='both', which='major', labelsize=axisticksize)

plt.show()
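The cumulative histogram above approximates the CDF with 40 bins. A bin-free alternative is the empirical CDF itself, $F(x) = \#\{\text{observations} \le x\}/n$. A minimal NumPy sketch (the helper `ecdf` is hypothetical, not part of the original notebook):

```python
import numpy as np


def ecdf(values):
    """Return the sorted observations and the empirical CDF at each of them.

    NaNs (e.g. the first row of a differenced series) are dropped first.
    With ties, the last of the equal values carries the correct CDF value,
    which is what a step plot with where='post' displays.
    """
    x = np.asarray(values, dtype=float)
    x = np.sort(x[~np.isnan(x)])
    F = np.arange(1, x.size + 1) / x.size
    return x, F
```

In the notebook it could be applied as `x_abs, F_abs = ecdf(zwack_data['AbsReturn'])` and drawn with `plt.step(x_abs, F_abs, where='post')`.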

Calculate the mean and the (daily) standard deviation of the price returns.

In [None]:
rtrn_mean = np.mean(zwack_data['Return'])
rtrn_std = np.std(zwack_data['Return'])
rtrn_mean_prc = np.mean(zwack_data['ReturnPerc'])
rtrn_std_prc = np.std(zwack_data['ReturnPerc'])

Plot the empirical distribution of the price returns together with a fitted normal PDF.

In [None]:
nrows = 1
ncols = 2
fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(ncols*10, nrows*10))

xrange = (-1000,1000)

ha = axes[0].hist(zwack_data['Return'], bins=40, alpha=0.5, density=True)
axes[1].hist(zwack_data['Return'], bins=40, alpha=0.5, density=True, cumulative=True)


# Plot a PDF over the distribution
# Calculate the parameters (mean and std)
sigma_conf = 3.5
x = np.linspace(rtrn_mean - sigma_conf*rtrn_std, rtrn_mean + sigma_conf*rtrn_std)
axes[0].plot(x, norm.pdf(x, rtrn_mean, rtrn_std), label='Fitted PDF',
             c='tab:red', lw=3)

# Mean of PDF
axes[0].axvline(x=rtrn_mean, label='Mean of returns',
                color='black', ls='-.', lw=3, alpha=0.6)

# Sigma conf. lines, centred on the mean of the returns
colors = ['tab:green', 'tab:olive', 'tab:purple']
for sig in range(1,4):
    axes[0].axvline(x=rtrn_mean + sig*rtrn_std, label='${0} \\sigma$ conf.'.format(sig),
                    color=colors[sig-1], ls='--', lw=3, alpha=0.6)
    axes[0].axvline(x=rtrn_mean - sig*rtrn_std,
                    color=colors[sig-1], ls='--', lw=3, alpha=0.6)

axes[0].set_xlim(-1000, 1000)
    
for i in range(ncols):
    axes[i].set_xlabel('Price return', fontsize=axislabelsize)
    axes[i].set_ylabel('P(Price return)', fontsize=axislabelsize)

    axes[i].tick_params(axis='both', which='major', labelsize=axisticksize)

axes[0].legend(loc='upper left', fontsize=axislegendsize)
    
plt.show()

In [None]:
print('Mean daily price return: {0:.3f} HUF, standard deviation: {1:.3f} HUF.'.format(rtrn_mean, rtrn_std))

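### Calculating daily std

For every day, the mean and the (population) standard deviation of that day's high and low prices are computed. For two values, the standard deviation equals half of the high-low range, so this column measures the intraday price spread; the standard deviation of the daily returns themselves is `rtrn_std` above.

In [None]:
# Vectorized equivalent of np.mean / np.std over [high, low] for each day
zwack_data['DailyMean'] = (zw_high + zw_low) / 2
zwack_data['DailyStd'] = (zw_high - zw_low).abs() / 2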

In [None]:
zwack_data.head()

## - Calculating VaR at 99% confidence level <br /> - Determining lower and upper VaR in the discrete distribution

### VaR - Method I. - Normal distribution 

In [None]:
# 99% VaR from the normal distribution fitted to the daily price returns (1% quantile)
alpha = 0.99
VaR_99 = norm.ppf(1-alpha, rtrn_mean, rtrn_std)

In [None]:
print(tabulate([['99%', '+-{0:.3f} HUF'.format(-VaR_99)]],
               headers=['Confidence level', 'Value-at-Risk']))

The `+-` prefix indicates that VaR can be reported either as a positive loss or as a negative return, depending on the sign convention.

In [None]:
nrows = 1
ncols = 1
fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(ncols*15, nrows*10))

ha = axes.hist(zwack_data['Return'], bins=40, alpha=0.5, density=True)

# Plot a PDF over the distribution
# Calculate the parameters (mean and std)
sigma_conf = 4
x = np.linspace(rtrn_mean - sigma_conf*rtrn_std, rtrn_mean + sigma_conf*rtrn_std)
axes.plot(x, norm.pdf(x, rtrn_mean, rtrn_std), label='Fitted PDF',
             c='tab:red', lw=3)
axes.fill_between(x[x<=VaR_99], norm.pdf(x[x<=VaR_99], rtrn_mean, rtrn_std),
                  color='tab:orange', alpha=0.6, zorder=3)
axes.fill_between(x[x>VaR_99], norm.pdf(x[x>VaR_99], rtrn_mean, rtrn_std),
                  color='tab:grey', alpha=0.6, zorder=3)

# Mark 99% VaR
axes.axvline(x=VaR_99, label='Theor. 99% VaR',
             color='tab:orange', ls='-.', lw=4, alpha=0.6)

# Mean of PDF
axes.axvline(x=rtrn_mean, label='Mean of returns',
             color='black', ls='-.', lw=3, alpha=0.6)

# Sigma conf. lines, centred on the mean of the returns
colors = ['tab:green', 'tab:olive', 'tab:purple']
for sig in range(1,4):
    axes.axvline(x=rtrn_mean + sig*rtrn_std, label='${0} \\sigma$ conf.'.format(sig),
                 color=colors[sig-1], ls='--', lw=3, alpha=0.6)
    axes.axvline(x=rtrn_mean - sig*rtrn_std,
                 color=colors[sig-1], ls='--', lw=3, alpha=0.6)

axes.set_xlim(-1000, 1000)
axes.set_xlabel('Price return', fontsize=axislabelsize)
axes.set_ylabel('P(Price return)', fontsize=axislabelsize)

axes.tick_params(axis='both', which='major', labelsize=axisticksize)

axes.legend(loc='upper left', fontsize=axislegendsize)
    
plt.show()

By definition, the 99% VaR is the loss that is not exceeded with 99% probability over the given horizon (here, one trading day); it is not the maximal possible loss. According to my calculation, if the daily price returns are approximated with a normal distribution, the daily loss on one share exceeds $390.937$ HUF with only 1% probability.
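In formula form, `norm.ppf(1-alpha, rtrn_mean, rtrn_std)` returns the 1% quantile $\mu + \sigma z_{0.01}$ of the fitted normal distribution (with $\mu$ = `rtrn_mean`, $\sigma$ = `rtrn_std`), so the reported positive VaR is:

$$
\text{VaR}_{99\%}
=
-\left( \mu + \sigma\, z_{0.01} \right)
=
-\mu + 2.326\,\sigma ,
\qquad
z_{0.01} = \Phi^{-1}\left( 0.01 \right) \approx -2.326
$$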

### VaR - Method II. - Using the actual dataset

To find the discrete values (lower and upper VaR) that bound this theoretical VaR value, we turn to the original list of returns: after sorting it, we take the value at index $\lfloor n(1-\alpha) \rfloor$ and its right neighbor. For these calculations, VaR is defined as a positive value.

In [None]:
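def VaR(X, alpha):
    """Return the (upper, lower) empirical VaR candidates as raw returns.

    The sorted returns are cut at index int(n * (1 - alpha)); the value at
    that index is the lower VaR and its right neighbour is the upper VaR.
    Both are negative in the loss tail and are negated for reporting.
    """
    X_sorted = sorted(X)
    THRES = int(len(X_sorted) * (1-alpha))
    VaR_lower = X_sorted[THRES]
    VaR_upper = X_sorted[THRES+1]

    return VaR_upper, VaR_lower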

In [None]:
alpha = 0.99
VaR_99_upper, VaR_99_lower = VaR(X=zwack_data['Return'][1:], alpha=alpha)

In [None]:
print(tabulate([['Upper 99% VaR', '+-{0:.3f} HUF'.format(-VaR_99_upper)],
                ['Lower 99% VaR', '+-{0:.3f} HUF'.format(-VaR_99_lower)]],
               headers=['Measure', 'Value-at-Risk']))

Following the definition given in the lecture, these values are multiplied by -1, so that the lower and upper VaR are reported as positive numbers.
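For comparison, the textbook definitions of the lower and upper empirical quantile, $q_- = \inf\{x : F(x) \ge p\}$ and $q_+ = \inf\{x : F(x) > p\}$ with $p = 1-\alpha$, can also be coded directly. This is a sketch: that these match the lecture's definitions is an assumption, and `empirical_quantiles` is a hypothetical helper. The corresponding VaR values are the negated quantiles.

```python
import numpy as np


def empirical_quantiles(returns, p):
    """Lower and upper p-quantiles of the empirical distribution of `returns`.

    q_lower = inf{x : F(x) >= p},  q_upper = inf{x : F(x) > p},
    where F is the empirical CDF. Requires 0 < p < 1.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    x = np.sort(np.asarray(returns, dtype=float))
    F = np.arange(1, x.size + 1) / x.size
    q_lower = x[np.argmax(F >= p)]  # first sorted value where F reaches p
    q_upper = x[np.argmax(F > p)]   # first sorted value where F exceeds p
    return q_lower, q_upper
```

The two quantiles differ only when $np$ is an integer; e.g. `empirical_quantiles(zwack_data['Return'][1:], 1 - 0.99)` would give the two candidates whose negatives are the lower and upper 99% VaR under this definition.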

## - Calculating CVaR/ES at 95% confidence level <br/> - Determining lower and upper CVaR in the discrete distribution

By definition, the $\alpha\%$ CVaR, also called expected shortfall (ES), is the expected value of the returns that fall beyond the VaR. It can be formulated as follows:

$$
\text{CVaR}
=
\frac{1}{1 - \alpha}
\int_{-\,\infty}^{\text{VaR}} x\, p\left( x \right)\,\text{d}x
$$

or, as in the cited source, in terms of relative (percentage) returns, which cannot fall below $-1$, i.e. a total loss of the investment:

$$
\text{CVaR}
=
\frac{1}{1 - \alpha}
\int_{-1}^{\text{VaR}} x\, p\left( x \right)\,\text{d}x
$$

#### Source
https://www.investopedia.com/terms/c/conditional_value_at_risk.asp
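For a continuous distribution with CDF $F$, substituting $u = F(x)$ (so that $\text{d}u = p(x)\,\text{d}x$ and $F(\text{VaR}) = 1 - \alpha$) rewrites the first formula as an average of the quantile function over the worst $1 - \alpha$ probability levels:

$$
\text{CVaR}
=
\frac{1}{1 - \alpha}
\int_{-\,\infty}^{\text{VaR}} x\, p\left( x \right)\,\text{d}x
=
\frac{1}{1 - \alpha}
\int_{0}^{1 - \alpha} F^{-1}\left( u \right)\,\text{d}u
$$

In the discrete case, this integral becomes an average over the lowest sorted returns.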

### CVaR - Method I. - Normal distribution 

To obtain the $95\%$ CVaR, we first need the $95\%$ VaR, which is the upper limit of the integral. As a first step, it is calculated from the fitted normal distribution.

In [None]:
# 95% VaR from the normal distribution fitted to the daily price returns (5% quantile)
alpha = 0.95
VaR_95 = norm.ppf(1-alpha, rtrn_mean, rtrn_std)

In [None]:
print(tabulate([['95%', '+-{0:.3f} HUF'.format(-VaR_95)]],
               headers=['Confidence level', 'Value-at-Risk']))

Now the integral in the definition of CVaR can be evaluated with symbolic integration, using the density of the fitted normal distribution.

In [None]:
import sympy as sp

In [None]:
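# SymPy symbols (note: they shadow the numeric `x` and `alpha` used above)
x = sp.Symbol('x')
alpha = sp.Symbol('\\alpha')
mu = sp.Symbol('\\mu')
sigma = sp.Symbol('\\sigma', positive=True)  # sigma > 0 helps SymPy evaluate the Gaussian integral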

In [None]:
result = sp.integrate(x * sp.exp(-1/2 * ((x - mu)/sigma)**2), (x, -sp.oo, VaR_95))

In [None]:
CVaR_95 = (1/(1-alpha) * 1/(sigma * sp.sqrt(2 * sp.pi)) * result).evalf(subs={alpha: 0.95,
                                                                              mu: rtrn_mean,
                                                                              sigma: rtrn_std})

In [None]:
print(tabulate([['95%', '+-{0:.3f} HUF'.format(-float(CVaR_95))]],
               headers=['Confidence level', 'Conf. Value-at-Risk']))

In absolute value, CVaR is always at least as large as VaR, because it averages the losses beyond the VaR threshold; this holds here as well.
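For a normal distribution, the integral also has a closed form, which gives an independent check of the SymPy result. With $p = 1 - \alpha$ and $z_p = \Phi^{-1}(p)$, substituting $x = \mu + \sigma t$ gives $\int_{-\infty}^{\mu + \sigma z_p} x\, p(x)\,\text{d}x = \mu p - \sigma \varphi(z_p)$, so the tail mean is $\mu - \sigma \varphi(z_p)/p$. A standard-library sketch (`normal_tail_mean` is a hypothetical helper):

```python
from statistics import NormalDist


def normal_tail_mean(mu, sigma, alpha):
    """E[X | X <= q] for X ~ N(mu, sigma^2), where q is the (1 - alpha) quantile.

    Closed form: mu - sigma * phi(z_p) / p, with p = 1 - alpha and z_p = Phi^{-1}(p).
    """
    p = 1.0 - alpha
    std_normal = NormalDist()
    z_p = std_normal.inv_cdf(p)
    return mu - sigma * std_normal.pdf(z_p) / p
```

Under these assumptions, `-normal_tail_mean(rtrn_mean, rtrn_std, 0.95)` should agree with the 95% CVaR printed above up to numerical precision; for a standard normal the 95% value is about 2.063.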

### CVaR - Method II. - Using the actual dataset 

Following the lecture, the upper and lower CVaR are the negated expected values of the price returns that lie at or below, or strictly below, the VaR threshold, respectively. First, the lower and upper 95% VaR values are needed.

In [None]:
alpha = 0.95
VaR_95_upper, VaR_95_lower = VaR(X=zwack_data['Return'][1:], alpha=alpha)

In [None]:
print(tabulate([['Upper 95% VaR', '{0:.3f} HUF'.format(-VaR_95_upper)],
                ['Lower 95% VaR', '{0:.3f} HUF'.format(-VaR_95_lower)]],
               headers=['Measure', 'Value-at-Risk']))

Next come the upper and lower CVaR, defined in the lecture as follows:

$$
\text{CVaR}^{\alpha} \left( X \right) = - \mathbb{E} \left[ X | X \leq \text{VaR}^{\alpha} \left( X \right) \right]
$$
$$
\text{CVaR}_{\alpha} \left( X \right) = - \mathbb{E} \left[ X | X < \text{VaR}^{\alpha} \left( X \right) \right]
$$
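The two definitions differ only when observations coincide with the threshold. A toy sketch with a tie makes the difference visible; the sample and the helper `upper_lower_cvar` are invented for illustration.

```python
import numpy as np


def upper_lower_cvar(returns, var_threshold):
    """Upper and lower CVaR following the two formulas above, as positive losses.

    upper: -E[X | X <= threshold]   (ties at the threshold are included)
    lower: -E[X | X <  threshold]   (ties at the threshold are excluded)
    At least one observation must lie strictly below the threshold.
    """
    x = np.asarray(returns, dtype=float)
    upper = -x[x <= var_threshold].mean()
    lower = -x[x < var_threshold].mean()
    return upper, lower


# Invented toy sample with two observations tied at the threshold -3
toy_returns = [2.0, -3.0, 0.0, -5.0, -3.0]
cvar_upper, cvar_lower = upper_lower_cvar(toy_returns, var_threshold=-3.0)
print(cvar_upper, cvar_lower)  # upper averages -5, -3, -3; lower uses only -5
```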

In [None]:
CVaR_95_upper_idx = np.where(zwack_data['Return'][1:].sort_values() <= VaR_95_upper)[0]
CVaR_95_lower_idx = np.where(zwack_data['Return'][1:].sort_values() < VaR_95_upper)[0]

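The expected value of this slice of the distribution has to be normalized, so that the weights sum up to 1. Two variants are computed: in II./a) each tail value is weighted by the fitted normal density, while II./b) takes the simple average of the discrete values. Since every observed return already carries the same empirical weight $1/n$, the simple average is the direct empirical estimate.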

In [None]:
def normal(X, mu, sigma):
    """
    Returns the value of the normal distribution in the given points.
    """
    return 1/(sigma*np.sqrt(2*np.pi)) * np.exp(-1/2 * ((X - mu)/sigma)**2)

In [None]:
CVaR_95_upper_vals = zwack_data['Return'][1:].sort_values().iloc[CVaR_95_upper_idx]
CVaR_95_lower_vals = zwack_data['Return'][1:].sort_values().iloc[CVaR_95_lower_idx]

#### Method II./a) Normalize by normal distribution

In [None]:
# Normalizing constant, because P(x) values in the formula of the
# discrete expected value calculation should sum up to 1.
normed_up = 1/np.sum(normal(CVaR_95_upper_vals, mu=rtrn_mean, sigma=rtrn_std))
normed_lw = 1/np.sum(normal(CVaR_95_lower_vals, mu=rtrn_mean, sigma=rtrn_std))
# Calculating upper and lower CVaR
CVaR_95_upper = np.sum(CVaR_95_upper_vals * normal(CVaR_95_upper_vals, mu=rtrn_mean, sigma=rtrn_std)) * normed_up
CVaR_95_lower = np.sum(CVaR_95_lower_vals * normal(CVaR_95_lower_vals, mu=rtrn_mean, sigma=rtrn_std)) * normed_lw

In [None]:
print(tabulate([['Upper 95% CVaR', '+-{0:.3f} HUF'.format(-CVaR_95_upper)],
                ['Lower 95% CVaR', '+-{0:.3f} HUF'.format(-CVaR_95_lower)]],
               headers=['Measure', 'Conf. Value-at-Risk']))

#### Method II./b) Simple average

In [None]:
CVaR_95_upper = np.mean(CVaR_95_upper_vals)
CVaR_95_lower = np.mean(CVaR_95_lower_vals)

In [None]:
print(tabulate([['Upper 95% CVaR', '+-{0:.3f} HUF'.format(-CVaR_95_upper)],
                ['Lower 95% CVaR', '+-{0:.3f} HUF'.format(-CVaR_95_lower)]],
               headers=['Measure', 'Conf. Value-at-Risk']))

## Expected shortfall at 95% confidence level

The expected shortfall is the average of the returns that fall into the worst 5% of all daily price returns.

In [None]:
alpha = 0.95
THRES = int(len(zwack_data['Return'][1:]) * (1-alpha))

ES_95 = np.mean(sorted(zwack_data['Return'][1:])[:THRES+1])

In [None]:
print(tabulate([['95% ES', '+-{0:.3f} HUF'.format(-ES_95)]],
               headers=['Confidence level', 'Expected Shortfall']))
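The cell above averages the `THRES + 1` worst returns with equal weight. A common alternative, usually credited to Acerbi and Tasche, integrates the empirical quantile function over exactly the worst $1 - \alpha$ probability mass, so the boundary observation gets a fractional weight. A minimal sketch (the helper name is hypothetical, not the lecture's definition):

```python
import numpy as np


def expected_shortfall(returns, alpha=0.95):
    """Empirical expected shortfall, reported as a positive loss.

    With p = 1 - alpha and w = floor(n * p), the w smallest returns get full
    weight and the (w + 1)-th smallest gets the leftover weight n * p - w:
        ES = -(sum_{i<=w} X_(i) + (n*p - w) * X_(w+1)) / (n * p)
    """
    x = np.asarray(returns, dtype=float)
    x = np.sort(x[~np.isnan(x)])  # drop NaNs (e.g. the first differenced row)
    p = 1.0 - alpha
    if not 0.0 < p < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    n = x.size
    w = int(np.floor(n * p))
    tail_sum = x[:w].sum() + (n * p - w) * x[w]
    return -tail_sum / (n * p)
```

When $n(1-\alpha)$ is an integer, this reduces to the plain average of the worst $n(1-\alpha)$ returns; on the ZWACK data, `expected_shortfall(zwack_data['Return'], 0.95)` can be compared with `-ES_95`.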