# Factor Analysis Using the CAPM and NEFIN Risk Factors

_Notebook inspired by work by Vijay Vaidyanathan of EDHEC Business School for the course Advanced Portfolio Construction and Analysis_

The central idea of Factor Analysis is to decompose a series of observed returns into a set of explanatory (predictor) returns.

The methodology follows Chapter 10 of _Asset Management_ (Ang 2014, Oxford University Press), which analyzes the returns of Berkshire Hathaway. In our case, however, we will analyze the returns of Dynamo Cougar Fic Fia (data from CVM) using the risk factors published by NEFIN.

First, we need the returns of Dynamo Cougar. They are stored in `data/dyco_d_ret.csv`.

In [17]:
import statsmodels.api as sm
import pandas as pd

import utils as erk

%load_ext autoreload
%autoreload 2



In [18]:
dyco_d = pd.read_csv("data/dyco_d_ret.csv", parse_dates=True, index_col=0)
dyco_d

DATE,DYCO
2005-01-04,-0.009900
2005-01-05,-0.012740
2005-01-06,-0.008097
2005-01-07,0.007186
2005-01-10,-0.003828
...,...
2022-02-11,-0.006056
2022-02-14,0.007631
2022-02-15,0.018741
2022-02-16,0.007259


Next, we need to convert these to monthly returns. The simplest way to do so is the `.resample` method, which runs an aggregation function on each group of returns in a time series. We give it the grouping rule `'M'`, which means _monthly_ (consult the `pandas` documentation for other codes; in pandas 2.2 and later the month-end rule is spelled `'ME'`, and `'M'` triggers a deprecation warning).

We want to compound the returns, and we already have the `compound` function in our toolkit, so let's load that up now, and then apply it to the daily returns.

In [19]:
dyco_m = dyco_d.resample('M').apply(erk.compound).to_period('M')
dyco_m.tail()

DATE,DYCO
2021-10,-0.11994
2021-11,-0.046247
2021-12,0.023074
2022-01,0.01364
2022-02,0.012428
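
The body of `erk.compound` is not shown in this notebook, so the following is only a minimal sketch of what such a helper might look like, assuming it compounds simple returns as $\prod_t (1 + r_t) - 1$. The tiny `daily` series is made up for illustration, and grouping by `to_period("M")` is used here instead of `.resample` only to sidestep the `'M'`/`'ME'` alias difference between pandas versions.

```python
import numpy as np
import pandas as pd


def compound(r):
    """Compound a series of simple returns: prod(1 + r) - 1.

    Hypothetical stand-in for erk.compound (an assumption, not the toolkit's code).
    Summing log(1 + r) and exponentiating is equivalent to multiplying (1 + r).
    """
    return np.expm1(np.log1p(r).sum())


# Made-up daily returns spanning two calendar months (illustrative only)
daily = pd.Series(
    [0.01, 0.02, -0.01],
    index=pd.to_datetime(["2005-01-28", "2005-01-31", "2005-02-01"]),
)

# Group the daily returns by calendar month and compound within each month
monthly = daily.groupby(daily.index.to_period("M")).apply(compound)
print(monthly)
```

The January value is the compounded product of the two January days, not their sum, which is the reason for using a compounding function rather than `.sum()` when aggregating returns.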


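Next, we need to load the explanatory variables, which are the NEFIN risk factors. The toolkit returns them at daily frequency, so we load them first and then compound them to monthly returns in the same way as the fund returns: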

In [20]:
nefin_d = erk.get_nefin_returns()
nefin_d

DATE,Rm_minus_Rf,SMB,HML,WML,IML,Risk_free
2001-01-02,0.006601,0.000524,0.065490,-0.006308,0.014109,0.000579
2001-01-03,0.062427,0.005390,0.009390,-0.028644,0.004510,0.000577
2001-01-04,-0.000310,0.006690,-0.002327,-0.000946,-0.009227,0.000574
2001-01-05,-0.012839,0.003523,-0.002397,0.005985,0.025124,0.000573
2001-01-08,0.003982,0.007883,0.001948,-0.004099,-0.001175,0.000573
...,...,...,...,...,...,...
2022-01-25,0.022372,0.016604,0.004664,-0.015129,0.015340,0.000385
2022-01-26,0.007715,0.003557,-0.012190,-0.010335,0.002393,0.000387
2022-01-27,0.012028,0.006480,-0.000307,-0.013335,0.007697,0.000389
2022-01-28,-0.006167,0.005071,-0.000685,0.002404,0.010792,0.000391


In [21]:
nefin_m = nefin_d.resample('M').apply(erk.compound).to_period('M')
nefin_m

DATE,Rm_minus_Rf,SMB,HML,WML,IML,Risk_free
2001-01,0.139540,0.029623,0.121934,-0.026541,0.107242,0.011949
2001-02,-0.085317,0.011963,0.065037,0.087819,0.081714,0.010134
2001-03,-0.077331,0.029730,0.029967,0.063062,0.015762,0.013251
2001-04,0.026855,-0.073174,-0.106460,-0.049865,-0.147651,0.013104
2001-05,-0.003461,-0.015901,-0.134612,-0.020359,-0.048654,0.014241
...,...,...,...,...,...,...
2021-09,-0.068977,-0.040151,0.007806,0.018478,-0.045193,0.004840
2021-10,-0.063858,-0.056333,0.020102,-0.023760,-0.026544,0.005317
2021-11,-0.022863,-0.031745,0.035546,0.002780,0.004318,0.006325
2021-12,0.015980,0.071106,0.027794,0.030698,0.034039,0.007283


In [23]:
ret_df = pd.merge(dyco_m, nefin_m, on=['DATE'], how='inner')
ret_df

DATE,DYCO,Rm_minus_Rf,SMB,HML,WML,IML,Risk_free
2005-01,-0.035876,-0.065619,0.004147,-0.039983,0.033270,-0.007002,0.013351
2005-02,0.073606,0.126742,-0.027138,-0.005885,-0.008687,-0.051169,0.012361
2005-03,-0.052271,-0.070649,0.014393,-0.002859,-0.048353,-0.005237,0.015361
2005-04,-0.085861,-0.080757,0.049373,0.056512,-0.020819,0.044312,0.014176
2005-05,0.006115,0.000172,0.035812,0.056853,-0.029017,0.057646,0.015071
...,...,...,...,...,...,...,...
2021-09,-0.069062,-0.068977,-0.040151,0.007806,0.018478,-0.045193,0.004840
2021-10,-0.119940,-0.063858,-0.056333,0.020102,-0.023760,-0.026544,0.005317
2021-11,-0.046247,-0.022863,-0.031745,0.035546,0.002780,0.004318,0.006325
2021-12,0.023074,0.015980,0.071106,0.027794,0.030698,0.034039,0.007283


Now we will decompose the observed DYCO returns between 2005-01 (the start of our sample) and 2012-05, as described in Ang (2014), into the portion explained by the market premium and the remainder, using the CAPM as the explanatory model.

i.e.

$$ R_{dyco,t} - R_{f,t} = \alpha + \beta(R_{mkt,t} - R_{f,t}) + \epsilon_t $$

We can use `statsmodels.api` (imported above as `sm`) for the linear regression as follows:

In [57]:
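# Excess returns of the fund over the risk-free rate, 2005-01 to 2012-05
dyco_excess = ret_df.loc["2005":"2012-05", ['DYCO']] - ret_df.loc["2005":"2012-05", ['Risk_free']].values
# The market factor is already expressed in excess of the risk-free rate
mkt_excess = ret_df.loc["2005":"2012-05", ['Rm_minus_Rf']]

exp_var = mkt_excess.copy()
# sm.OLS does not add an intercept automatically, so add a constant column to estimate alpha
exp_var["Constant"] = 1

lm = sm.OLS(dyco_excess, exp_var).fit()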

In [58]:
lm.summary()

Dep. Variable:,DYCO,R-squared:,0.788
Model:,OLS,Adj. R-squared:,0.785
Method:,Least Squares,F-statistic:,322.7
Date:,"Sat, 19 Feb 2022",Prob (F-statistic):,5.10e-31
Time:,22:11:03,Log-Likelihood:,201.91
No. Observations:,89,AIC:,-399.8
Df Residuals:,87,BIC:,-394.8
Df Model:,1,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
Rm_minus_Rf,0.7648,0.043,17.964,0.000,0.680,0.849
Constant,0.0034,0.003,1.267,0.209,-0.002,0.009

Omnibus:,10.513,Durbin-Watson:,1.996
Prob(Omnibus):,0.005,Jarque-Bera (JB):,12.686
Skew:,-0.597,Prob(JB):,0.00176
Kurtosis:,4.413,Cond. No.,15.9
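
For a single-factor regression such as the CAPM above, the OLS estimates have a simple closed form: $\beta = \mathrm{Cov}(x, y) / \mathrm{Var}(x)$ and $\alpha = \bar{y} - \beta \bar{x}$. The sketch below checks this against a generic least-squares solve. It uses synthetic data generated with made-up parameters (not the DYCO or NEFIN series), and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic monthly excess returns (illustrative only, not the notebook's data)
mkt_excess = rng.normal(0.005, 0.06, size=89)
fund_excess = 0.003 + 0.75 * mkt_excess + rng.normal(0.0, 0.02, size=89)

# Closed-form OLS for one regressor plus an intercept
beta = np.cov(mkt_excess, fund_excess, ddof=1)[0, 1] / np.var(mkt_excess, ddof=1)
alpha = fund_excess.mean() - beta * mkt_excess.mean()

# Cross-check with a least-squares solve using an explicit constant column,
# mirroring the "Constant" column added to exp_var in the notebook
X = np.column_stack([mkt_excess, np.ones_like(mkt_excess)])
beta_ls, alpha_ls = np.linalg.lstsq(X, fund_excess, rcond=None)[0]

print(f"closed form:   beta={beta:.4f}, alpha={alpha:.4f}")
print(f"least squares: beta={beta_ls:.4f}, alpha={alpha_ls:.4f}")
```

The two pairs of estimates agree up to floating-point error, which is also what `sm.OLS` computes for the coefficient column of the summary.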


### The CAPM benchmark interpretation

This implies that the CAPM benchmark consists of 24 cents in the risk-free asset and 76 cents in the market, i.e. each real invested in Dynamo Cougar is equivalent to 24 cents in the risk-free asset and 76 cents in the market portfolio. Relative to this benchmark, Dynamo Cougar adds (i.e. has an $\alpha$ of) 0.34% _per month_, although the estimate is not statistically significant at conventional levels (t = 1.27, p = 0.209).
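
The benchmark weights follow directly from the regression coefficients: the market loading is the weight in the market portfolio, and the remainder sits in the risk-free asset. A minimal sketch of that arithmetic, with the coefficients copied by hand from the summary above; the annualization by monthly compounding is our own illustrative assumption rather than part of the original analysis.

```python
# Coefficients copied from the CAPM regression summary above
beta = 0.7648           # loading on Rm_minus_Rf
alpha_monthly = 0.0034  # intercept ("Constant")

# CAPM replicating benchmark: beta in the market, the rest in the risk-free asset
weight_market = beta
weight_risk_free = 1.0 - beta

# Annualize the monthly alpha by compounding over 12 months (an illustrative convention)
alpha_annual = (1.0 + alpha_monthly) ** 12 - 1.0

print(f"market weight:    {weight_market:.2f}")
print(f"risk-free weight: {weight_risk_free:.2f}")
print(f"annualized alpha: {alpha_annual:.2%}")
```

Because the intercept is not statistically significant, the annualized figure should be read as a point estimate with a wide margin of error.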

Now, let's add in some additional explanatory variables, namely Value and Size.
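
Written out, the extended regression adds the two factor returns to the CAPM equation. The coefficient names $\beta_{mkt}$, $\beta_{smb}$ and $\beta_{hml}$ are our own labels, and $SMB_t$ and $HML_t$ stand for the NEFIN columns used for Size and Value:

$$ R_{dyco,t} - R_{f,t} = \alpha + \beta_{mkt}(R_{mkt,t} - R_{f,t}) + \beta_{smb}\,SMB_t + \beta_{hml}\,HML_t + \epsilon_t $$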

In [61]:
# Add the NEFIN Size (SMB) and Value (HML) factors over the same 2005-01 to 2012-05 sample
exp_var["Size"] = ret_df.loc["2005":"2012-05", "SMB"]
exp_var["Value"] = ret_df.loc["2005":"2012-05", "HML"]
#exp_var["Momentum"] = ret_df.loc["2005":"2012-05", "WML"]
#exp_var["Illiquidity"] = ret_df.loc["2005":"2012-05", "IML"]
exp_var.head()

DATE,Rm_minus_Rf,Constant,Size,Value
2005-01,-0.065619,1,0.004147,-0.039983
2005-02,0.126742,1,-0.027138,-0.005885
2005-03,-0.070649,1,0.014393,-0.002859
2005-04,-0.080757,1,0.049373,0.056512
2005-05,0.000172,1,0.035812,0.056853


In [62]:
lm = sm.OLS(dyco_excess, exp_var).fit()
lm.summary()

Dep. Variable:,DYCO,R-squared:,0.825
Model:,OLS,Adj. R-squared:,0.819
Method:,Least Squares,F-statistic:,133.5
Date:,"Sat, 19 Feb 2022",Prob (F-statistic):,4.64e-32
Time:,22:11:28,Log-Likelihood:,210.5
No. Observations:,89,AIC:,-413.0
Df Residuals:,85,BIC:,-403.0
Df Model:,3,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
Rm_minus_Rf,0.7028,0.042,16.798,0.000,0.620,0.786
Constant,0.0036,0.002,1.449,0.151,-0.001,0.009
Size,0.2287,0.058,3.935,0.000,0.113,0.344
Value,-0.0534,0.073,-0.734,0.465,-0.198,0.091

Omnibus:,1.16,Durbin-Watson:,2.03
Prob(Omnibus):,0.56,Jarque-Bera (JB):,0.674
Skew:,-0.174,Prob(JB):,0.714
Kurtosis:,3.246,Cond. No.,30.5


### The Fama-French Benchmark Interpretation

The alpha is essentially unchanged, moving from 0.34% to about 0.36% per month, and it remains statistically insignificant (p = 0.151). The loading on the market has fallen from 0.76 to 0.70, which means that adding these new explanatory factors did change things. If we had added irrelevant variables that were uncorrelated with the market factor, the loading on the market would be unaffected.

The loading on Value is slightly negative (-0.05) but statistically indistinguishable from zero (p = 0.465), so this sample gives no evidence of either a Value or a Growth tilt. The loading on Size, by contrast, is positive and significant (0.23, t = 3.94), which suggests that Dynamo Cougar tilts toward smaller companies relative to the market.

In other words, over 2005-01 to 2012-05 Dynamo Cougar looks like an investor with a small-cap tilt and no clear Value or Growth bias. You might have guessed this if you followed the fund, but the point here is that the numbers reveal it.

The new way to interpret each real invested in Dynamo Cougar is: 70 cents in the market, 30 cents in the risk-free asset, long 23 cents in SmallCap stocks and short 23 cents in LargeCap stocks, short 5 cents in Value stocks and long 5 cents in Growth stocks. If you did all this, the point estimate says you would still underperform Dynamo Cougar by about 36 basis points per month, although that gap is not statistically distinguishable from zero.
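
The remark that the market loading shifts only when the added factors matter can be illustrated with synthetic data. The sketch below is an assumption-laden toy example (made-up parameters, hypothetical names such as `correlated` and `irrelevant`), not a re-analysis of the fund: it fits the same simulated fund returns with the market alone, with an added factor that is independent of everything, and with an added factor that is correlated with the market and truly drives returns.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Synthetic factor returns (illustrative only)
mkt = rng.normal(0.0, 0.05, n)
correlated = 0.5 * mkt + rng.normal(0.0, 0.03, n)  # co-moves with the market
irrelevant = rng.normal(0.0, 0.03, n)              # independent of everything

# The simulated fund truly loads 0.7 on the market and 0.3 on the correlated factor
fund = 0.7 * mkt + 0.3 * correlated + rng.normal(0.0, 0.01, n)


def ols(y, *factors):
    """Return OLS coefficients [intercept, loading_1, loading_2, ...]."""
    X = np.column_stack([np.ones_like(y), *factors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


beta_alone = ols(fund, mkt)[1]
beta_with_irrelevant = ols(fund, mkt, irrelevant)[1]
beta_with_correlated = ols(fund, mkt, correlated)[1]

print(f"market only:            {beta_alone:.3f}")
print(f"plus irrelevant factor: {beta_with_irrelevant:.3f}")
print(f"plus correlated factor: {beta_with_correlated:.3f}")
```

With the market alone, the estimated loading absorbs part of the omitted factor's effect. Adding the irrelevant factor leaves it almost unchanged, while adding the correlated factor moves it back toward the true value of 0.7.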