$$
r_{n, t+1}=\bar{a}_n+\bar{c}_n \cdot\left(\frac{f_{n, t}^{\mathrm{Bmk}}-\bar{m}_n^{\mathrm{Bmk}}}{\bar{s}_n^{\mathrm{Bmk}}}\right)+e_{n, t+1} 
$$

Podemos ter duas abordagens econometricas na hora de analisar o out-of-sample fit. A primeira abordagem se dá na Time-Series dos dados, isso é, podemos regredir para cada ativo $n$, a time-series obtendo um $\bar{R}^2$ para cada ativo. A segunda abordagem é no Cross-Section dos dados, i.e. regredindo para cada $t$ a regressão no cross-section obtendo um $bar{R}^2$ para cada minuto.

### Time-Series

In [73]:
# pacotes
import numpy as np
import pandas as pd
import statsmodels.formula.api as sm

In [74]:
# oculta mensagens de avisos
import warnings
warnings.filterwarnings("ignore")

In [75]:
f_bmk = pd.read_csv('../../output/data/20030102_f_bmk.csv', index_col=0)

In [76]:
# padronização
f_bmk = ( f_bmk - f_bmk.mean() ) / f_bmk.std()

In [77]:
y = pd.read_csv('../../output/data/20030102_y.csv', index_col=0)

In [78]:
# também para o índice que começa a partir das 10:04 e termina em 15:59
y = y.loc[100400:155900]

In [79]:
def adjusted_Rsquared(result):
    r2 = result.rsquared
    n = len(y)
    k = len(result.params) - 1
    ar2 = 1 - ( 1 - r2) * ( ( n - 1) / ( n - k ) )
    return ar2

In [80]:
bmk_timeseries = pd.DataFrame(columns=['Adj. R-Squared', 'a', 'c'], index=y.columns)

In [81]:
ols = pd.DataFrame()

In [82]:
for col in f_bmk.columns:
    ols['y'] = y[col]
    ols['x'] = f_bmk[col]
    result = sm.ols(formula="y ~ x", data=ols).fit()
    bmk_timeseries.at[col, 'a'] = result.params[0]
    bmk_timeseries.at[col, 'c'] = result.params[1]
    bmk_timeseries.at[col, 'Adj. R-Squared'] = adjusted_Rsquared(result)

In [83]:
bmk_timeseries

Unnamed: 0,Adj. R-Squared,a,c
FITB(t),0.004352,0.000029,0.000049
AGN(t),0.003372,0.000032,0.000033
ZBRA(t),0.001261,0.00004,-0.000034
ADBE(t),0.001346,0.000048,-0.000056
CKFR(t),0.00432,0.000149,0.000124
...,...,...,...
FRX(t),0.006518,0.000028,0.000048
OSIP(t),0.002817,0.000073,-0.000093
SAFC(t),0.010825,0.000058,0.000085
YUM(t),0.000017,0.000049,-0.000005


In [84]:
bmk_timeseries['Adj. R-Squared'].mean()

0.008151051692000163

### Cross-Section

In [85]:
bmk_crosssection = pd.DataFrame(columns=['Adj. R-Squared', 'a', 'c'], index=y.index)

In [86]:
ols = pd.DataFrame()

In [87]:
for index in f_bmk.index:
    ols['y'] = y.loc[index]
    ols['x'] = f_bmk.loc[index]
    result = sm.ols(formula="y ~ x", data=ols).fit()
    bmk_crosssection.at[index, 'a'] = result.params[0]
    bmk_crosssection.at[index, 'c'] = result.params[1]
    bmk_crosssection.at[index, 'Adj. R-Squared'] = adjusted_Rsquared(result)

In [88]:
bmk_crosssection

Unnamed: 0_level_0,Adj. R-Squared,a,c
Time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
100400,0.001191,0.000377,-0.000034
100500,0.004683,0.000445,-0.000058
100600,0.000251,-0.000181,-0.000015
100700,0.004242,-0.000146,-0.000084
100800,0.002772,0.001213,0.00008
...,...,...,...
155500,0.004749,-0.000086,0.000111
155600,0.00042,0.000073,-0.000033
155700,0.001264,-0.000034,0.000052
155800,0.004339,-0.000063,-0.000115


In [89]:
bmk_crosssection['Adj. R-Squared'].mean()

0.008938459746264506