In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn import linear_model
import scipy.stats as stats
from scipy.stats import norm
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
pd.set_option("display.precision", 4)

In [14]:
norm.cdf(-1.175)

0.11999735789901245

## Helpers

In [287]:
def performance_summary(return_data, annualization = 12):
    """ 
        Returns the performance stats for a given set of returns.
        Inputs: 
            return_data - DataFrame with Date index and periodic returns for different assets/strategies.
            annualization - number of return periods per year (12 for monthly, 52 for weekly).
        Output:
            summary_stats - DataFrame with annualized mean return, vol, and Sharpe ratio; skewness, excess kurtosis,
                            VaR (0.05), CVaR (0.05), and max drawdown (with peak, bottom, and recovery dates),
                            all based on the periodic returns.
    """
    summary_stats = return_data.mean().to_frame('Mean').apply(lambda x: x*annualization)
    summary_stats['Volatility'] = return_data.std().apply(lambda x: x*np.sqrt(annualization))
    summary_stats['Sharpe Ratio'] = summary_stats['Mean']/summary_stats['Volatility']

    summary_stats['Skewness'] = return_data.skew()
    summary_stats['Excess Kurtosis'] = return_data.kurtosis()
    summary_stats['VaR (0.05)'] = return_data.quantile(.05, axis = 0)
    summary_stats['CVaR (0.05)'] = return_data[return_data <= return_data.quantile(.05, axis = 0)].mean()
    
    wealth_index = 1000*(1+return_data).cumprod()
    previous_peaks = wealth_index.cummax()
    drawdowns = (wealth_index - previous_peaks)/previous_peaks

    summary_stats['Max Drawdown'] = drawdowns.min()
    summary_stats['Peak'] = [previous_peaks[col][:drawdowns[col].idxmin()].idxmax() for col in previous_peaks.columns]
    summary_stats['Bottom'] = drawdowns.idxmin()
    
    recovery_date = []
    for col in wealth_index.columns:
        prev_max = previous_peaks[col][:drawdowns[col].idxmin()].max()
        recovery_wealth = pd.DataFrame([wealth_index[col][drawdowns[col].idxmin():]]).T
        recovery_date.append(recovery_wealth[recovery_wealth[col] >= prev_max].index.min())
    summary_stats['Recovery'] = recovery_date
    
    return summary_stats

## Q1 

1. Using multiple momentum factors will increase trading costs, since we need to manage more assets over the period. The extra buying and selling raises transaction costs and also the tax burden. However, updating the portfolio gives us a better chance of holding the assets with high momentum. 

2. The quadratic market factor does lead to a slight improvement in the R-squared of the regression, but the value remains too low to indicate any relation between LTCM and the market returns. The increased R-squared could also simply reflect multicollinearity: the alpha in the regression also increases, meaning more of the return is left unexplained after adding the quadratic market factor. The huge negative beta on SPY^2 is a feature of the factor's scale. Monthly returns are small, so squared returns are smaller still, and the beta has to be larger in magnitude to fit these small values properly.

3. We have an inefficient MV frontier, which indicates that the generated portfolio combinations are not robust. Also, in order to get the tangency portfolio, we require the existence of a risk-free asset. We can use the upper part of the MV frontier to choose the portfolio that we desire. 

4. TS: The regression sample runs from Jan 2001 to Dec 2022 (22 * 12 = 264 monthly observations). We will run 40 linear regressions, one for each portfolio. The return of the portfolio over time is our y and the factors of the pricing model are our x's. 
    CS: The regression sample size is 40, since the TS regressions generate 40 sets of betas. We only need 1 regression for CS: the mean returns of the 40 portfolios regressed on the 40 sets of betas. 

5. This means that GMO believed that prices would revert to fundamental value over time. Specifically, at times of high prices GMO would take the contrarian view that expected returns were low. GMWAX has been quite successful relative to the benchmark, with returns almost double the benchmark's and a Sharpe ratio more than double. However, we don't have a long enough series of returns to do a more involved statistical analysis of the performance. That is, we can NOT assess the statistical significance of this result. 

6. Harvard places bounds on the portfolio allocations, along with a long-only constraint on non-cash assets, rather than implementing whatever numbers come out of the MV optimization problem. The solution is numerical (rather than an explicit formula) due to the inequality constraints. While the solution is computationally easy, it requires many boundary parameters, which greatly influence the solution. Thus, the solution may be overparameterized, with little guidance on how to set the parameters, or with a motive to parameterize the problem to achieve a certain solution. (This is actually the question in mid1.) 

7. The model for convergence in the long run was based on i.i.d. log returns. To the extent that means and volatilities change over time, the model would need to be adjusted to capture these conditional statistics. This means that the SR will differ between the short run and the long run. 

8. In my view, CIP implies UIP, since the futures contract is in practice conceptually the expectation of the exchange rate for the next period. CIP therefore holds without doubt. UIP, on the other hand, is only a hypothesis: we can run a linear regression of the change in the exchange rate on the risk-free rate differential of the two currencies to see whether UIP holds. 

9. One reason could be the high volatility of the data, as its payoffs could be non-linear. This makes it hard to estimate betas and correlations. Measuring the tail of a distribution is also considerably hard, so estimates will tend to overstate excess returns and understate risk.

10. Run a simple linear regression of ri on SPY. The estimated beta is the hedge ratio: for each unit held long in ri, we short beta units of SPY if beta is positive, and go long SPY if beta is negative. 
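
The hedge in item 10 can be sketched as follows. This is only an illustration on synthetic data: `spy`, `r_i`, and the true beta of 0.8 are made-up assumptions, not the exam data.

```python
import numpy as np

# Synthetic monthly excess returns (hypothetical values for illustration only)
rng = np.random.default_rng(0)
spy = rng.normal(0.005, 0.04, size=240)
r_i = 0.002 + 0.8 * spy + rng.normal(0, 0.01, size=240)  # asset with true beta 0.8

# OLS of r_i on a constant and SPY; the slope is the hedge ratio
X = np.column_stack([np.ones_like(spy), spy])
alpha_hat, beta_hat = np.linalg.lstsq(X, r_i, rcond=None)[0]

# Hedged position: long one unit of r_i, short beta_hat units of SPY
hedged = r_i - beta_hat * spy

print(f"hedge ratio: {beta_hat:.3f}")
print(f"corr(hedged, SPY): {np.corrcoef(hedged, spy)[0, 1]:.2e}")
```

By construction of OLS, the hedged series is uncorrelated with SPY in sample; out of sample the hedge is only as good as the stability of beta.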

## Q2

In [288]:
fore = pd.read_excel('final_exam_data.xlsx',sheet_name = 'forecasting (weekly)', index_col = 'Date')
fore.head()

Date,GLD,Tbill rate,Tbill change
2009-04-19,-0.0126,0.13,-0.045
2009-04-26,0.0528,0.095,-0.035
2009-05-03,-0.0309,0.145,0.05
2009-05-10,0.0348,0.165,0.02
2009-05-17,0.0174,0.155,-0.01


In [289]:
data = fore[['GLD']]
p1 = performance_summary(data) 
p1[['VaR (0.05)', 'CVaR (0.05)']]

Unnamed: 0,VaR (0.05),CVaR (0.05)
GLD,-0.0333,-0.0471


In [290]:
VaR_full_sample = (norm.ppf(0.05) * data.std())
VaR_full_sample.to_frame('Full Sample Var')  

Unnamed: 0,Full Sample Var
GLD,-0.0347


In [291]:
# rolling 150-week volatility, lagged one week so each estimate uses only past data
sigma_rolling = data.shift(1).rolling(150).std().dropna()
rolling_var = norm.ppf(0.05) * sigma_rolling  # zero-mean normal VaR at the 5% level
rolling_var.tail(1)

Date,GLD
2022-12-04,-0.038


3. We judge which method works best by comparing how often the realized returns fall below the VaR with the VaR level itself (5%). A good estimate is one where the realized frequency is equal to or less than what the VaR implies. We may prefer the empirical method, since the normal estimate assumes the mean (mu) is zero.
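
A minimal sketch of this comparison, on synthetic fat-tailed returns rather than the GLD series (the Student-t draw and its scale are assumptions for illustration). It computes the share of returns that fall below each VaR estimate, using the same lagged 150-period rolling window as the cell above.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

# Synthetic fat-tailed returns (hypothetical stand-in for the weekly GLD series)
rng = np.random.default_rng(1)
r = pd.Series(rng.standard_t(df=4, size=2000) * 0.015)

q = 0.05
var_empirical = r.quantile(q)             # historical quantile
var_normal = norm.ppf(q) * r.std()        # zero-mean normal, full-sample vol

# zero-mean normal with lagged rolling vol (only past data in each estimate)
sigma_rolling = r.shift(1).rolling(150).std()
var_rolling = norm.ppf(q) * sigma_rolling
valid = sigma_rolling.notna()

# hit ratio: share of observations below the VaR; the target is q
hit_ratios = pd.Series({
    "empirical": (r < var_empirical).mean(),
    "normal": (r < var_normal).mean(),
    "rolling normal": (r[valid] < var_rolling[valid]).mean(),
})
print(hit_ratios)
```

The first two hit ratios are in-sample, so the empirical one sits at 5% almost by construction; only the rolling estimate is a genuine out-of-sample check.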

## Q3

In [292]:
factors = pd.read_excel('final_exam_data.xlsx',sheet_name = 'factors (excess returns)', index_col = 'Date')
factors.tail()

Date,MKT,UMD
2021-02-28,0.0278,-0.0789
2021-03-31,0.0308,-0.0614
2021-04-30,0.0493,0.011
2021-05-31,0.0029,0.0088
2021-06-30,0.0275,0.0219


In [293]:

futures = pd.read_excel('final_exam_data.xlsx',sheet_name = 'futures (excess returns)', index_col = 'Date')
futures.tail()

Date,NG1,KC1,CC1,LB1,CT1,SB1,LC1,W1,S1,C1,GC1,SI1,HG1,PA1
2021-02-28,0.0807,0.1131,0.0672,0.1224,0.089,0.0392,-0.0169,-0.0121,0.0257,0.0155,-0.0654,-0.019,0.1458,0.0424
2021-03-31,-0.0588,-0.0976,-0.1307,0.0136,-0.079,-0.1021,0.0696,-0.0565,0.0224,0.0158,-0.0092,-0.0722,-0.0223,0.1324
2021-04-30,0.1238,0.1332,-0.0026,0.487,0.081,0.1496,-0.0411,0.2015,0.0934,0.3115,0.0304,0.0538,0.1189,0.1274
2021-05-31,0.0188,0.1601,0.0299,0.0437,-0.0607,0.0224,-0.0011,-0.1064,-0.0258,-0.1125,0.0763,0.0836,0.0454,-0.0438
2021-06-30,0.2157,-0.0163,-0.0211,0.0,0.0532,-0.0075,0.0576,-0.0358,-0.1116,0.0575,-0.073,-0.0765,-0.0867,-0.0454


In [294]:
ts = pd.DataFrame(data = None, index = futures.columns, columns = ['a', 'MKT', 'UMD', 'R^2'])

for a in ts.index:
    y = futures[a]
    X = sm.add_constant(factors[['MKT','UMD']])
    reg = sm.OLS(y, X).fit()
    ts.loc[a] = [reg.params[0] * 12, reg.params[1], reg.params[2], reg.rsquared]
    
ts

Unnamed: 0,a,MKT,UMD,R^2
NG1,0.112,0.3541,0.3812,0.0173
KC1,0.0232,0.3151,-0.0275,0.0259
CC1,0.0708,0.2073,-0.0358,0.012
LB1,0.0645,0.9421,-0.0048,0.1368
CT1,0.0249,0.5042,-0.1786,0.099
SB1,0.0931,0.058,-0.3192,0.0327
LC1,0.0154,0.1831,0.0661,0.02
W1,0.0545,0.2989,0.0224,0.0213
S1,0.0425,0.3995,0.0273,0.0529
C1,0.0609,0.3404,0.062,0.0282


In [295]:
MAE_alpha = (100 * ts['a']).abs().mean()
print('MAE = {:.2f} %'.format(MAE_alpha))

MAE = 5.89 %


In [296]:
r2_mean = ts['R^2'].mean() 
r2_mean

0.058747179429465106

1.b. If the model works perfectly, the alphas and their MAE should be zero. To be clear, these would be zero in population; in any given sample they may be non-zero, but only by a statistically insignificant amount. Nothing can be said about the R-squared, since the time-series test does not require a high R-squared, so the average R-squared statistic is unrestricted. Nothing needs to be said about the regression betas either, as they reflect each asset's exposure to the respective factor and vary across assets. 

In [297]:
y = pd.DataFrame(futures.mean(), index=futures.columns, columns = ['Mean'])
X = sm.add_constant(ts[['MKT', 'UMD']].astype(float))
reg = sm.OLS(y,X,missing='drop').fit() 

In [298]:
reg.params[0] * 12 

0.0611209101800308

In [299]:
reg.params[1:] * 12

MKT    0.0620
UMD    0.0735
dtype: float64

In [300]:
reg.rsquared

0.39137127926546056

In [301]:
pred_cs = reg.params[0] + ts[['MKT', 'UMD']] @ reg.params[1:] 
mae = (futures.mean() - pred_cs).abs().mean() 
print('MAE = {:.2f} %'.format(mae*12*100))

MAE = 1.80 %


2.b. The R-squared should be 1, as we expect the factors to explain all of the expected returns of the portfolios. The MAE should also be zero. We would expect the alpha (intercept) of the cross-sectional regression to be zero. A non-zero cross-sectional intercept means the model's pricing is off by a fixed amount, potentially due to risk-free rate mismeasurement. We cannot say anything specific about the factor premia (the CS betas), as the cross-sectional regression leaves the factor premia free to be anything: they are estimated as the regression coefficients. 

In [302]:
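ts[['MKT','UMD']].mean() # TS; note this averages the estimated betas, whereas the TS premia themselves would be factors[['MKT','UMD']].mean() * 12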

MKT    0.4001
UMD    0.0229
dtype: float64

In [303]:
reg.params[1:] * 12 #CS

MKT    0.0620
UMD    0.0735
dtype: float64

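3. It looks like the market risk premium is higher under TS than under CS, while the UMD risk premium is lower under TS than under CS. The gap between the two TS figures is also very large, whereas in CS the two factors have very similar risk premia. Note that the TS figures shown above are averages of the estimated betas; the time-series premia proper are the annualized sample means of the factors. 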
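
To make the two definitions concrete, here is a sketch on simulated data. The factor means, betas, and noise level are all assumptions, and true alphas are zero by construction. The time-series premium is the annualized sample mean of each factor; the cross-sectional premium is the slope from regressing mean returns on the estimated betas.

```python
import numpy as np

# Simulated monthly data (hypothetical parameters, not the exam data)
rng = np.random.default_rng(2)
T, N = 600, 12
factors = rng.normal([0.006, 0.004], [0.045, 0.04], size=(T, 2))  # stand-ins for MKT, UMD
true_betas = rng.uniform(-0.5, 1.5, size=(N, 2))
returns = factors @ true_betas.T + rng.normal(0, 0.03, size=(T, N))

# Time-series regressions: one per asset, all solved in a single least-squares call
X = np.column_stack([np.ones(T), factors])
coef = np.linalg.lstsq(X, returns, rcond=None)[0]   # shape (3, N)
alphas, betas = coef[0], coef[1:].T                 # betas has shape (N, 2)

# TS premia: annualized factor sample means
ts_premia = factors.mean(axis=0) * 12

# CS premia: regress mean returns on the estimated betas (with an intercept)
Z = np.column_stack([np.ones(N), betas])
cs_coef = np.linalg.lstsq(Z, returns.mean(axis=0), rcond=None)[0]
cs_intercept, cs_premia = cs_coef[0] * 12, cs_coef[1:] * 12

print("TS premia:", ts_premia.round(4))
print("CS premia:", cs_premia.round(4))
print("annualized TS alpha MAE:", (np.abs(alphas).mean() * 12).round(4))
```

When the model holds, the two sets of premia should agree up to sampling error, and both the TS alphas and the CS intercept should be close to zero.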

## Q4

In [304]:
fore1 = pd.read_excel('final_exam_data.xlsx',sheet_name = 'forecasting (weekly)', index_col = 'Date')
fore1

Date,GLD,Tbill rate,Tbill change
2009-04-19,-0.0126,0.130,-0.045
2009-04-26,0.0528,0.095,-0.035
2009-05-03,-0.0309,0.145,0.050
2009-05-10,0.0348,0.165,0.020
2009-05-17,0.0174,0.155,-0.010
...,...,...,...
2022-11-06,0.0216,4.013,0.043
2022-11-13,0.0517,4.063,0.050
2022-11-20,-0.0108,4.135,0.072
2022-11-27,0.0026,4.175,0.040


In [305]:
fore1['Tr_lag'] = fore1['Tbill rate'].shift() 
fore1['Tc_lag'] = fore1['Tbill change'].shift()  
forecast = fore1.dropna()
forecast

Date,GLD,Tbill rate,Tbill change,Tr_lag,Tc_lag
2009-04-26,0.0528,0.095,-0.035,0.130,-0.045
2009-05-03,-0.0309,0.145,0.050,0.095,-0.035
2009-05-10,0.0348,0.165,0.020,0.145,0.050
2009-05-17,0.0174,0.155,-0.010,0.165,0.020
2009-05-24,0.0284,0.175,0.020,0.155,-0.010
...,...,...,...,...,...
2022-11-06,0.0216,4.013,0.043,3.970,0.085
2022-11-13,0.0517,4.063,0.050,4.013,0.043
2022-11-20,-0.0108,4.135,0.072,4.063,0.050
2022-11-27,0.0026,4.175,0.040,4.135,0.072


In [306]:
y = forecast['GLD']
x = sm.add_constant(forecast[['Tr_lag', 'Tc_lag']]) 
reg1 = sm.OLS(y, x).fit() 
reg1.params

const     0.0010
Tr_lag    0.0003
Tc_lag    0.0005
dtype: float64

In [307]:
reg1.rsquared

0.00010651991509946779

In [308]:
wt = 0.2 + 80 * reg1.fittedvalues
wt

Date
2009-04-26    0.2804
2009-05-03    0.2800
2009-05-10    0.2841
2009-05-17    0.2834
2009-05-24    0.2821
               ...  
2022-11-06    0.3620
2022-11-13    0.3613
2022-11-20    0.3626
2022-11-27    0.3648
2022-12-04    0.3644
Length: 711, dtype: float64

In [309]:
rx = (wt.T * y).T
rx

Date
2009-04-26    0.0148
2009-05-03   -0.0086
2009-05-10    0.0099
2009-05-17    0.0049
2009-05-24    0.0080
               ...  
2022-11-06    0.0078
2022-11-13    0.0187
2022-11-20   -0.0039
2022-11-27    0.0010
2022-12-04   -0.0011
Length: 711, dtype: float64

In [310]:
performance_summary(forecast[['GLD']], annualization=52)

Unnamed: 0,Mean,Volatility,Sharpe Ratio,Skewness,Excess Kurtosis,VaR (0.05),CVaR (0.05),Max Drawdown,Peak,Bottom,Recovery
GLD,0.059,0.1523,0.387,-0.2566,1.4907,-0.0333,-0.0471,-0.4474,2011-09-04,2015-11-29,2020-08-02


In [311]:
performance_summary(rx.to_frame(), annualization=52)

Unnamed: 0,Mean,Volatility,Sharpe Ratio,Skewness,Excess Kurtosis,VaR (0.05),CVaR (0.05),Max Drawdown,Peak,Bottom,Recovery
0,0.0173,0.0436,0.3979,-0.2144,1.1954,-0.0097,-0.0133,-0.1426,2011-09-04,2015-11-29,2020-05-17


In [312]:
data = sm.add_constant(forecast[['GLD']])
lfd = sm.OLS(rx, data).fit()
lfd.params[0]*52

0.0004985357096263617

In [313]:
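lfd.params[1]*52  # caution: a regression beta is unitless and should not be annualized; the beta itself is this value divided by 52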

14.851159557898223

In [314]:
tracking_error = (lfd.resid.std()*np.sqrt(52)) 
information_ratio = lfd.params[0]*52/tracking_error 
information_ratio

0.21349311946966631

5. These two factors could behave very similarly, as their betas are close to each other.

In [315]:
START_PREDICT = pd.to_datetime('2016-12-25')

forecasts_OOS = pd.DataFrame(columns=['Forecast'],index=forecast.index, dtype='float64')

est = linear_model.LinearRegression()

Xlag = forecast[['Tr_lag', 'Tc_lag']]
X = forecast[['Tbill rate', 'Tbill change']]
y = forecast[['GLD']]
forecast_err = [] 
null_err = []

for t in forecast.loc[START_PREDICT:,:].index:
    yt = y.loc[:t].values.reshape(-1,1)
    Xlag_t = Xlag.loc[:t,:].values
    x_t = X.loc[t,:].values.reshape(1,-1)
    est.fit(Xlag_t,yt);
    predval = est.predict(x_t)[0,0]

    # this timing is assigning forecast to datestamp of info used to make the forecast
    forecasts_OOS.loc[t,'Forecast'] = predval
    null_forecast = yt.mean()
    actual = y.loc[t]
    forecast_err.append(predval - actual)
    null_err.append(null_forecast - actual)

# expanding mean (baseline forecast) is computed on the full GLD series in y
forecasts_OOS.insert(0,'Mean', y.expanding().mean().dropna())

# more convenient to have datestamp reflect date of the forecasted value
forecasts_OOS = forecasts_OOS.shift(1).dropna()
forecasts_OOS

Date,Mean,Forecast
2017-01-01,0.0008,0.0043
2017-01-08,0.0009,0.0083
2017-01-15,0.0009,0.0045
2017-01-22,0.0010,0.0069
2017-01-29,0.0010,0.0110
...,...,...
2022-11-06,0.0011,0.0006
2022-11-13,0.0011,0.0011
2022-11-20,0.0012,0.0025
2022-11-27,0.0011,0.0022


In [316]:
RSS = (np.array(forecast_err)**2).sum()
TSS = (np.array(null_err)**2).sum()
1 - RSS/TSS # R^2
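# Note: this direct calculation does not match the oos_rsquared function below.
# Here predval (a forecast for t+1, made with information at t) and the null mean (which already
# includes y at t) are both compared with y at t, so these errors are not aligned out of sample.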

0.04453971803404222

In [317]:
def oos_rsquared(data,forecasts,null=None):
    data = data.copy()
    forecasts = forecasts.copy()

    # if no Null forecast given, use expanding mean (lagged so it only uses past data)
    if null is None:
        null = data.expanding().mean().shift()
    else:
        null = null.copy()

    # label Data and Null accordingly--input may be series or dataframe
    if isinstance(null, pd.DataFrame):
        null.columns = ['Null']
    elif isinstance(null,pd.Series):
        null.name = 'Null'
    if isinstance(data, pd.DataFrame):
        data.columns = ['Data']
    elif isinstance(data,pd.Series):
        data.name = 'Data'

    # double check data is aligned and no NaN (null will have NaN in first value by default)
    alldata = forecasts.join(data,how='inner',rsuffix='_Data').join(null,how='inner',rsuffix='_Null').dropna(axis=0)
    null = alldata[['Null']]
    data = alldata[['Data']]
    forecasts = alldata.drop(columns=['Data','Null'])


    # Forecast MSE
    err_forecast = forecasts.subtract(data.values)
    mse_forecast = (err_forecast**2).sum()

    # Null MSE
    err_null = null.subtract(data.values)
    mse_null = (err_null**2).sum()

    # OOS R-squared
    r2oos = (1 - mse_forecast/mse_null.values).to_frame().T
    r2oos.index = ['OOS-Rsquared']

    return r2oos

In [318]:
GLD_OOS, _ = forecast[['GLD']].align(forecasts_OOS, join='right', axis=0)

oos_rsquared(GLD_OOS,forecasts_OOS,forecasts_OOS[['Mean']])

Unnamed: 0,Mean,Forecast
OOS-Rsquared,0.0,-0.145


In [319]:
GLD_OOS

Date,GLD
2017-01-01,0.0156
2017-01-08,0.0195
2017-01-15,0.0220
2017-01-22,0.0074
2017-01-29,-0.0136
...,...
2022-11-06,0.0216
2022-11-13,0.0517
2022-11-20,-0.0108
2022-11-27,0.0026


In [320]:
corr_val = forecasts_OOS.corrwith(GLD_OOS['GLD'])
corr_val.to_frame('Corr. to GLD')

Unnamed: 0,Corr. to GLD
Mean,-0.1055
Forecast,-0.1952


8. Neither the baseline forecast nor the model forecast is positively correlated with the actual GLD returns. I do not find this forecast informative: the OOS R^2 is negative, and the correlations show that the forecasts do not track the actual values. 
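
For reference, a sketch of an expanding-window OOS R^2 in which every forecast error compares a forecast for t+1 with the realized value at t+1. The data are simulated (the signal `x`, its coefficient, and the noise level are assumptions), so the number it prints says nothing about GLD.

```python
import numpy as np

# Simulated data: y[t+1] depends weakly on x[t] (hypothetical parameters)
rng = np.random.default_rng(3)
T = 400
x = rng.normal(size=T)
y = np.zeros(T)
y[1:] = 0.001 + 0.002 * x[:-1] + rng.normal(0, 0.02, size=T - 1)

start = 200
err_model, err_null = [], []
for t in range(start, T - 1):
    # fit on pairs (x[s-1], y[s]) for s = 1..t, i.e. only data known at time t
    X_fit = np.column_stack([np.ones(t), x[:t]])
    b = np.linalg.lstsq(X_fit, y[1:t + 1], rcond=None)[0]

    forecast = b[0] + b[1] * x[t]      # forecast of y[t+1] made at time t
    null = y[1:t + 1].mean()           # expanding mean through time t

    # both errors are measured against the same realized value y[t+1]
    err_model.append(y[t + 1] - forecast)
    err_null.append(y[t + 1] - null)

r2_oos = 1 - np.sum(np.square(err_model)) / np.sum(np.square(err_null))
print(f"OOS R^2: {r2_oos:.4f}")
```

A negative value means the model forecast did worse than the expanding mean over the evaluation window.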

In [321]:
wt_OOS = 0.2 + 80 * forecasts_OOS
wt_OOS

Date,Mean,Forecast
2017-01-01,0.2678,0.5458
2017-01-08,0.2708,0.8638
2017-01-15,0.2745,0.5631
2017-01-22,0.2786,0.7502
2017-01-29,0.2799,1.0791
...,...,...
2022-11-06,0.2843,0.2446
2022-11-13,0.2866,0.2890
2022-11-20,0.2923,0.4017
2022-11-27,0.2910,0.3750


In [322]:
df = wt_OOS * GLD_OOS.values
df

Date,Mean,Forecast
2017-01-01,0.0042,0.0085
2017-01-08,0.0053,0.0169
2017-01-15,0.0060,0.0124
2017-01-22,0.0020,0.0055
2017-01-29,-0.0038,-0.0146
...,...,...
2022-11-06,0.0061,0.0053
2022-11-13,0.0148,0.0149
2022-11-20,-0.0031,-0.0043
2022-11-27,0.0008,0.0010


In [323]:
performance_summary(df[['Mean']],52)

Unnamed: 0,Mean,Volatility,Sharpe Ratio,Skewness,Excess Kurtosis,VaR (0.05),CVaR (0.05),Max Drawdown,Peak,Bottom,Recovery
Mean,0.0213,0.0406,0.526,-0.2494,3.2217,-0.009,-0.0126,-0.0635,2020-08-09,2022-09-25,NaT


In [324]:
performance_summary(df[['Forecast']], 52)

Unnamed: 0,Mean,Volatility,Sharpe Ratio,Skewness,Excess Kurtosis,VaR (0.05),CVaR (0.05),Max Drawdown,Peak,Bottom,Recovery
Forecast,-0.0345,0.1874,-0.1838,-14.2605,236.5081,-0.0125,-0.0439,-0.45,2020-03-08,2022-10-16,NaT


In [325]:
data = sm.add_constant(GLD_OOS)
lfd1 = sm.OLS(df[['Mean']], data).fit()
lfd1.params[0]*52

-0.0019172427471378646

In [326]:
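lfd1.params[1]*52  # caution: a regression beta is unitless and should not be annualized; the beta itself is this value divided by 52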

15.460511123396993

In [327]:
tracking_error = (lfd1.resid.std()*np.sqrt(52)) 
information_ratio = lfd1.params[0]*52/tracking_error 
information_ratio

-1.0359595374735697

In [328]:
data = sm.add_constant(GLD_OOS)
lfd2 = sm.OLS(df[['Forecast']], data).fit()
lfd2.params[0]*52

-0.08899021013239446

In [329]:
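lfd2.params[1]*52  # caution: a regression beta is unitless and should not be annualized; the beta itself is this value divided by 52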

36.269181714953255

In [330]:
tracking_error = (lfd2.resid.std()*np.sqrt(52)) 
information_ratio = lfd2.params[0]*52/tracking_error 
information_ratio

-0.5507928525330712

## Q5

In [331]:
fx = pd.read_excel('final_exam_data.xlsx',sheet_name = 'fx (daily)', index_col = 'DATE')
fx.tail()

DATE,GBP,SOFR,SONIA
2022-11-18,1.1902,0.038,0.0293
2022-11-21,1.1785,0.038,0.0293
2022-11-22,1.188,0.038,0.0293
2022-11-23,1.204,0.0379,0.0293
2022-11-25,1.2102,0.038,0.0293


In [332]:
rf_ex = pd.DataFrame() 
rf_ex['GBP'] = np.log(fx['GBP'])
rf_ex['rf_USD'] = np.log(1+fx['SOFR'])
rf_ex['rf_GBP'] = np.log(1+fx['SONIA'])
rf_ex

DATE,GBP,rf_USD,rf_GBP
2018-04-03,0.3413,0.0181,0.0046
2018-04-04,0.3419,0.0173,0.0046
2018-04-05,0.3358,0.0173,0.0046
2018-04-06,0.3427,0.0173,0.0047
2018-04-09,0.3461,0.0173,0.0046
...,...,...,...
2022-11-18,0.1741,0.0373,0.0289
2022-11-21,0.1642,0.0373,0.0289
2022-11-22,0.1723,0.0373,0.0289
2022-11-23,0.1856,0.0372,0.0289


2. UIP states that the mean excess return on these currency positions should be zero, since the expected change in the spot FX rate is completely explained by the difference in risk-free rates. 

In [333]:
rf_ex['ex_rf'] = (rf_ex['rf_USD'] - rf_ex['rf_GBP']).shift()
rf_ex['ret'] = rf_ex['GBP'].diff(1) 
rf_ex['ex_ret'] = rf_ex['ret'] - rf_ex['ex_rf']
rf_ex.dropna()

DATE,GBP,rf_USD,rf_GBP,ex_rf,ret,ex_ret
2018-04-04,0.3419,0.0173,0.0046,0.0135,0.0006,-0.0129
2018-04-05,0.3358,0.0173,0.0046,0.0126,-0.0061,-0.0187
2018-04-06,0.3427,0.0173,0.0047,0.0127,0.0069,-0.0058
2018-04-09,0.3461,0.0173,0.0046,0.0127,0.0034,-0.0093
2018-04-10,0.3479,0.0173,0.0047,0.0127,0.0018,-0.0109
...,...,...,...,...,...,...
2022-11-18,0.1741,0.0373,0.0289,0.0084,0.0076,-0.0009
2022-11-21,0.1642,0.0373,0.0289,0.0084,-0.0099,-0.0183
2022-11-22,0.1723,0.0373,0.0289,0.0084,0.0080,-0.0004
2022-11-23,0.1856,0.0372,0.0289,0.0084,0.0134,0.0049


In [334]:
performance_summary(rf_ex[['ex_ret']])  # note: the default annualization=12 is applied here although the data is daily

Unnamed: 0,Mean,Volatility,Sharpe Ratio,Skewness,Excess Kurtosis,VaR (0.05),CVaR (0.05),Max Drawdown,Peak,Bottom,Recovery
ex_ret,-0.0741,0.0328,-2.2585,-0.1669,-0.056,-0.0213,-0.025,-0.9991,2018-04-04,2022-11-22,NaT


In [335]:
rf_ex['ex_rf'].mean()

0.006042283356448891

In [336]:
rf_ex['ret'].mean()

-0.0001334503425634808

4. The interest rate spread (USD minus GBP) is positive, so it lowers the excess return on GBP over this sample rather than helping it. 

    The FX growth is negative over the sample, which indicates depreciation of GBP and hence appreciation of USD. 
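
One thing worth checking in this decomposition is units. If the SOFR and SONIA columns are annualized rates (an assumption; the sheet does not say), they need to be scaled to a daily horizon before being netted against daily log FX changes. A sketch using a few rows copied from the `fx.tail()` output above, with a 252-day convention that is itself an assumption (360 is also common for money-market rates):

```python
import numpy as np
import pandas as pd

# A few rows copied from the fx.tail() output above
fx = pd.DataFrame(
    {"GBP": [1.1902, 1.1785, 1.1880, 1.2040],
     "SOFR": [0.0380, 0.0380, 0.0380, 0.0379],
     "SONIA": [0.0293, 0.0293, 0.0293, 0.0293]},
    index=pd.to_datetime(["2022-11-18", "2022-11-21", "2022-11-22", "2022-11-23"]),
)

DAYS_PER_YEAR = 252  # assumption: day-count used to de-annualize the rates

log_spot = np.log(fx["GBP"])
rf_usd_daily = np.log(1 + fx["SOFR"]) / DAYS_PER_YEAR
rf_gbp_daily = np.log(1 + fx["SONIA"]) / DAYS_PER_YEAR

# spread known at t-1 applies to the return earned from t-1 to t
spread = (rf_usd_daily - rf_gbp_daily).shift()
ex_ret = (log_spot.diff() - spread).dropna()

print(pd.DataFrame({"fx_change": log_spot.diff(), "spread": spread, "ex_ret": ex_ret}).dropna())
```

With daily scaling the spread is a few basis points of a percent per day, so the daily excess return is dominated by the FX move rather than by the rate differential.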

In [337]:
data = rf_ex[['ex_ret']]
prob_rx = norm.cdf(-np.sqrt(5*252)*(data.mean()/data.std()))[0]
print(str(prob_rx * 100) + '%')

100.0%


In [338]:
y = rf_ex[['ret']] 
x = sm.add_constant(rf_ex[['ex_rf']]) 
reg = sm.OLS(y, x, missing='drop').fit() 
reg.params

const   -6.9182e-05
ex_rf   -1.0636e-02
dtype: float64

In [339]:
reg.rsquared

0.00016217295905862628

7. From Lecture 8, slide 19, UIP implies alpha to be 0 and beta to be 1. UIP has no implication for R^2. 

8. If the GBP interest rate increases, the risk-free rate spread (USD minus GBP) will decrease. Other factors equal, that will lead to appreciation of GBP. Therefore, USD will weaken and depreciate. 

9. By CIP, the forward premium equals the USD risk-free rate minus the GBP risk-free rate. Therefore, if the USD risk-free rate increases relative to the GBP rate, the forward premium rises, and with a positive spread we expect the forward exchange rate to be higher than the spot exchange rate. 
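
The CIP relation in item 9 can be written out numerically. This sketch uses the last spot and rate values printed in the `fx.tail()` output and assumes the rates are annualized and the forward horizon is one year (both assumptions for illustration).

```python
import numpy as np

# Last values from the fx.tail() output; GBP is quoted as USD per GBP
spot = 1.2102
rf_usd = np.log(1 + 0.0380)   # log risk-free rate, USD (assumed annualized)
rf_gbp = np.log(1 + 0.0293)   # log risk-free rate, GBP (assumed annualized)
horizon = 1.0                 # assumed forward horizon in years

# CIP in logs: f - s = (rf_usd - rf_gbp) * horizon
forward = spot * np.exp((rf_usd - rf_gbp) * horizon)
forward_premium = np.log(forward) - np.log(spot)

print(f"forward: {forward:.4f}")
print(f"forward premium: {forward_premium:.4%}")
```

Because the USD rate exceeds the GBP rate here, the forward rate sits above the spot rate, and a wider spread pushes it further above.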

In [340]:
rf_ex['pred_ret'] = reg.params[0] + reg.params[1] * rf_ex['ex_rf']
rf_ex['pred_ex_ret'] = rf_ex['pred_ret'] - rf_ex['ex_rf'] 
rf_ex[['pred_ret', 'pred_ex_ret']].dropna()

DATE,pred_ret,pred_ex_ret
2018-04-04,-0.0002,-0.0137
2018-04-05,-0.0002,-0.0128
2018-04-06,-0.0002,-0.0129
2018-04-09,-0.0002,-0.0129
2018-04-10,-0.0002,-0.0129
...,...,...
2022-11-18,-0.0002,-0.0086
2022-11-21,-0.0002,-0.0086
2022-11-22,-0.0002,-0.0086
2022-11-23,-0.0002,-0.0086
