# Asset Pricing: Empirical Analysis #2

## Implementation of the APT approach proposed by Ross

Goal: Estimate the multi-beta relationship for a global stock index and a sectoral sub-index


APT rests on the idea that arbitrage opportunities cannot persist. If asset A were as risky as asset B but offered a higher expected return, demand for A would rise and push its price up until its expected return matched that of B, eliminating the arbitrage opportunity.

The other basic assumption of APT is that the expected return of a stock can be modelled as a linear function of various macroeconomic or sector-specific factors, each weighted by a beta coefficient that measures the stock's sensitivity to that factor.

These factors are diverse and can range from oil prices to US GDP, from European key rates to the exchange rate of a currency pair. These are all factors likely to influence the price of the asset under study.

The model proposed by Ross is therefore a multi-factor model: the returns of an asset are related to several macroeconomic factors, such as inflation rates, interest rates, and other economic indicators, and the expected return on the asset is a linear function of these factors.
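
Written out, these two assumptions take the following standard form, where $F_k$ denotes the unexpected movement in factor $k$, $\beta_{ik}$ the sensitivity of asset $i$ to that factor, and $\lambda_k$ the associated risk premium (the notation is introduced here for illustration and does not appear elsewhere in this notebook):

$$R_i = E[R_i] + \sum_{k=1}^{K} \beta_{ik} F_k + \varepsilon_i$$

$$E[R_i] = \lambda_0 + \sum_{k=1}^{K} \beta_{ik} \lambda_k$$

The first equation is the factor model for realized returns; the second is the no-arbitrage pricing restriction, in which $\lambda_0$ plays the role of the risk-free rate when one exists.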

The study period runs from 2014 to 2019.

**Factors are:**
* I - Inflation: log relative of the US Consumer Price Index
* TB - Treasury bill rate: end-of-period return on 1-month bills
* LGB - Long-term government bonds: return on long-term government bonds
* IP - Industrial production: industrial production during the month
* Baa - Low-grade bonds
* EWNY - Return on an equally weighted portfolio of NYSE-listed stocks
* VWNY - Return on a value-weighted portfolio of NYSE-listed stocks
* CG - Growth rate in real per capita consumption
* OG - Log relative of the producer price index/crude petroleum series

**Derived factors:**
* Monthly growth of IP: $MP(t) = \log\left(IP(t)/IP(t-1)\right)$
* Annual growth of IP: $YP(t) = \log\left(IP(t)/IP(t-12)\right)$
* Expected inflation: $E[I(t) \mid t-1]$
* Unexpected inflation: $UI(t) = I(t) - E[I(t) \mid t-1]$
* Ex post real interest rate: $RHO(t) = TB(t-1) - I(t)$
* Change in expected inflation: $DEI(t) = E[I(t+1) \mid t] - E[I(t) \mid t-1]$
* Risk premium: $UPR(t) = Baa(t) - LGB(t)$
* Term structure: $UTS(t) = LGB(t) - TB(t-1)$

To collect the data, we will use the FRED API and the Yahoo Finance API.

In [1]:
# basic libs
import pandas as pd
import numpy as np
from datetime import datetime

# stats libs
import statsmodels.api as sm
import statsmodels.regression.linear_model as lm

# import yahoo finance to collect stocks data
import yfinance as yf

In [2]:
import os

from fredapi import Fred

# Get data from FRED.
# The API key is read from the FRED_API_KEY environment variable
# instead of being hardcoded in the notebook.
def get_FRED_series(ticker):
    fred = Fred(api_key=os.environ["FRED_API_KEY"])
    data = fred.get_series(ticker).dropna()
    data = pd.DataFrame(data)
    data.index = pd.to_datetime(data.index)
    return data

### Import the data

#### Inflation Factor (I)

Frequency: Annual (forward-filled to monthly below)

Inflation, consumer prices for the United States (FPCPITOTLZGUSA)

In [3]:
I_factor = get_FRED_series("FPCPITOTLZGUSA")["2013":"2019"]
I_factor = I_factor.resample('M').ffill()
I_factor.index = I_factor.index + pd.DateOffset(days=1)
I_factor = I_factor.rename(columns={0: 'I_factor'})
I_factor

Date,I_factor
2013-02-01,1.464833
2013-03-01,1.464833
2013-04-01,1.464833
2013-05-01,1.464833
2013-06-01,1.464833
...,...
2018-10-01,2.442583
2018-11-01,2.442583
2018-12-01,2.442583
2019-01-01,2.442583
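
The factor list defines I as the log relative of the consumer price index, whereas the cell above uses an annual inflation rate forward-filled to months. As a minimal sketch, assuming a monthly CPI level series is available (for example FRED's CPIAUCSL, which is not used in this notebook), the log relative could be computed as follows. The numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly CPI levels (illustrative numbers, not real data)
cpi = pd.Series(
    [100.0, 101.0, 102.01],
    index=pd.to_datetime(["2014-01-01", "2014-02-01", "2014-03-01"]),
    name="CPI",
)

# Log relative: I(t) = log(CPI(t) / CPI(t-1)); the first month has no predecessor
I_log_relative = np.log(cpi / cpi.shift(1)).dropna()
print(I_log_relative)
```

A series built this way varies month by month, which a forward-filled annual rate cannot do.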


#### Treasury bill rate factor (TB)

Frequency : daily 
Let's resample to monthly frequency

4-Week Treasury Bill Secondary Market Rate, Discount Basis (DTB4WK)

In [4]:
TB_factor = get_FRED_series("DTB4WK")["2013":"2019"].interpolate()
TB_factor = TB_factor.resample("M").mean()
TB_factor.index = TB_factor.index + pd.DateOffset(days=1)
TB_factor = TB_factor.rename(columns={0: 'TB_factor'})
TB_factor

Date,TB_factor
2013-02-01,0.051429
2013-03-01,0.076842
2013-04-01,0.077500
2013-05-01,0.049091
2013-06-01,0.020455
...,...
2019-09-01,2.031364
2019-10-01,1.952000
2019-11-01,1.696818
2019-12-01,1.553684
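
The daily-to-monthly step used in this cell (average within each month, then label the result with the first day of the following month) recurs for several factors. A minimal sketch of it as a reusable helper, run on synthetic data; the function name `to_next_month_start` is hypothetical and not part of the notebook:

```python
import numpy as np
import pandas as pd

def to_next_month_start(daily: pd.DataFrame) -> pd.DataFrame:
    """Average daily observations per calendar month and label each
    average with the first day of the following month."""
    monthly = daily.resample("MS").mean()  # labelled with the month start
    monthly.index = monthly.index + pd.DateOffset(months=1)
    return monthly

# Synthetic daily series standing in for a FRED download (illustrative values)
idx = pd.date_range("2013-01-01", "2013-03-31", freq="D")
daily = pd.DataFrame({"TB_factor": np.arange(len(idx), dtype=float)}, index=idx)

monthly = to_next_month_start(daily)
print(monthly)
```

With this labelling, the row dated 2013-02-01 holds the January average, so each row only uses information available at that date.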


#### Long-term government bonds factor (LGB)

Frequency : Daily

Let's resample to monthly frequency

Market Yield on U.S. Treasury Securities at 30-Year Constant Maturity, Quoted on an Investment Basis (DGS30)

In [5]:
LGB_factor = get_FRED_series("DGS30")["2013":"2019"].interpolate()
LGB_factor =  LGB_factor.resample("M").mean()
LGB_factor.index = LGB_factor.index + pd.DateOffset(days=1)
LGB_factor = LGB_factor.rename(columns={0: 'LGB_factor'})
LGB_factor

Date,LGB_factor
2013-02-01,3.080476
2013-03-01,3.165263
2013-04-01,3.162500
2013-05-01,2.932727
2013-06-01,3.112727
...,...
2019-09-01,2.119091
2019-10-01,2.158000
2019-11-01,2.190455
2019-12-01,2.280526


#### Industrial Production factor (IP)

Frequency : Monthly

Industrial Production: Total Index

The industrial production (IP) index measures the real output of all relevant establishments located in the United States, regardless of their ownership, but not those located in U.S. territories.

In [6]:
IP_factor = get_FRED_series("INDPRO")["2013":"2020"]
IP_factor = IP_factor.rename(columns={0: 'IP_factor'})
IP_factor

Date,IP_factor
2013-01-01,98.2029
2013-02-01,98.6733
2013-03-01,99.0788
2013-04-01,98.9658
2013-05-01,99.0567
...,...
2020-08-01,95.8881
2020-09-01,95.8444
2020-10-01,96.4292
2020-11-01,96.8564


#### Low grade bonds factor (Baa)

Frequency : Monthly

Moody's Seasoned Baa Corporate Bond Yield
Financial instruments are based on bonds with maturities 20 years and above.

In [7]:
Baa_factor = get_FRED_series("BAA")["2014":"2020"]
Baa_factor = Baa_factor.rename(columns={0: 'Baa_factor'})
Baa_factor

Date,Baa_factor
2014-01-01,5.19
2014-02-01,5.10
2014-03-01,5.06
2014-04-01,4.90
2014-05-01,4.76
...,...
2020-08-01,3.27
2020-09-01,3.36
2020-10-01,3.44
2020-11-01,3.30


#### Equally weighted equities factor (EWNY)

Frequency: Daily
Let's resample to monthly frequency

Invesco Equally-Weighted S&P 500 Fund, Class A (VADAX)

In [8]:
EWNY_factor = yf.download("VADAX", start = "2012-01-01", end = "2019-12-31")['Adj Close']
EWNY_factor = pd.DataFrame(EWNY_factor)
EWNY_factor =  EWNY_factor.resample("M").mean()
EWNY_factor = EWNY_factor.loc["2010":"2020"]
EWNY_factor.index = EWNY_factor.index + pd.DateOffset(days=1)
EWNY_factor = EWNY_factor.rename(columns={'Adj Close': 'EWNY_factor'})
EWNY_factor

Date,EWNY_factor
2012-02-01,18.732402
2012-03-01,19.662437
2012-04-01,20.029884
2012-05-01,19.866584
2012-06-01,19.158003
...,...
2019-09-01,44.961274
2019-10-01,46.626997
2019-11-01,46.395535
2019-12-01,48.360168


#### Value weighted equities factor (VWNY)

Frequency: Daily
Let's resample to monthly frequency

S&P 500 (SP500), a capitalization-weighted index

In [9]:
VWNY_factor = get_FRED_series("SP500")["2013":"2019"]
VWNY_factor = VWNY_factor.resample("M").mean()
VWNY_factor.index = VWNY_factor.index + pd.DateOffset(days=1)
VWNY_factor = VWNY_factor.rename(columns={0: 'VWNY_factor'})
VWNY_factor

Date,VWNY_factor
2013-12-01,1797.738889
2014-01-01,1807.775238
2014-02-01,1822.356667
2014-03-01,1817.034737
2014-04-01,1863.523333
...,...
2019-09-01,2897.498182
2019-10-01,2982.156000
2019-11-01,2977.675217
2019-12-01,3104.904500


#### Consumption factor (CG)

Frequency: Quarterly (forward-filled to monthly below)

Real personal consumption expenditures per capita (A794RX0Q048SBEA)

In [10]:
CG_factor = get_FRED_series("A794RX0Q048SBEA")["2013":"2020"].pct_change().dropna()
CG_factor = CG_factor.resample("M").ffill()
CG_factor.index = CG_factor.index + pd.DateOffset(days=1)
CG_factor = CG_factor.rename(columns={0: 'CG_factor'})
CG_factor

Date,CG_factor
2013-05-01,0.001151
2013-06-01,0.001151
2013-07-01,0.001151
2013-08-01,0.001978
2013-09-01,0.001978
...,...
2020-07-01,-0.086576
2020-08-01,0.088262
2020-09-01,0.088262
2020-10-01,0.088262


#### Oil price factor (OG)

Frequency: Monthly

Spot Crude Oil Price: West Texas Intermediate (WTI) (WTISPLC)

In [11]:
OG_factor = get_FRED_series("WTISPLC")["2010":"2020"].pct_change().dropna()
OG_factor = OG_factor.rename(columns={0: 'OG_factor'})
OG_factor

Date,OG_factor
2010-02-01,-0.023012
2010-03-01,0.063072
2010-04-01,0.039882
2010-05-01,-0.125947
2010-06-01,0.020450
...,...
2020-08-01,0.040039
2020-09-01,-0.064006
2020-10-01,-0.005804
2020-11-01,0.039086


#### EI factor (Expected inflation)

Frequency: Monthly

Median expected price change next 12 months, Surveys of Consumers. The most recent value is not shown due to an agreement with the source.

Source : University of Michigan: Inflation Expectation (MICH)

In [12]:
EI_factor = get_FRED_series("MICH")["2014":"2020"]
EI_factor = EI_factor.rename(columns={0: 'EI_factor'})
EI_factor

Date,EI_factor
2014-01-01,3.1
2014-02-01,3.2
2014-03-01,3.2
2014-04-01,3.2
2014-05-01,3.3
...,...
2020-08-01,3.1
2020-09-01,2.6
2020-10-01,2.6
2020-11-01,2.8


In [13]:
from functools import reduce

list_df = [I_factor, TB_factor, LGB_factor, IP_factor, Baa_factor,
           EWNY_factor, VWNY_factor, CG_factor, OG_factor, EI_factor]

# Inner-join all factor frames on their date index, one after another
merged_df = reduce(
    lambda left, right: pd.merge(left, right, left_index=True, right_index=True),
    list_df,
)
merged_df

Date,I_factor,TB_factor,LGB_factor,IP_factor,Baa_factor,EWNY_factor,VWNY_factor,CG_factor,OG_factor,EI_factor
2014-01-01,1.464833,0.017143,3.889048,99.9990,5.19,27.764893,1807.775238,0.006670,-0.030831,3.1
2014-02-01,1.622223,0.016667,3.769048,100.7583,5.10,28.103847,1822.356667,0.001776,0.065525,3.2
2014-03-01,1.622223,0.046842,3.662632,101.7767,5.06,28.257299,1817.034737,0.001776,-0.000198,3.2
2014-04-01,1.622223,0.051429,3.620952,101.8425,4.90,29.104841,1863.523333,0.001776,0.012599,3.2
2014-05-01,1.622223,0.023333,3.517619,102.2594,4.76,29.115611,1864.263333,0.007778,0.001078,3.3
...,...,...,...,...,...,...,...,...,...,...
2018-10-01,2.442583,2.000526,3.151053,103.9397,5.07,45.449986,2901.500526,0.003037,0.007404,2.9
2018-11-01,2.442583,2.139091,3.339545,104.0007,5.22,43.194252,2785.464783,0.001658,-0.194912,2.8
2018-12-01,2.442583,2.192500,3.361000,103.9946,5.13,42.759029,2723.229524,0.001658,-0.130618,2.7
2019-01-01,2.442583,2.323684,3.095789,103.3730,5.12,40.273488,2567.307368,0.001658,0.037561,2.7


### Add Fama Factors

In [14]:
df_fama_5 = pd.read_csv('data/fama_french_5_factors.csv', skiprows=3, header=0, names=['Date', 'Mkt-RF', 'SMB', 'HML', 'RMW', 'CMA', 'RF'])

# Convert the 'Date' column to datetime and set it as the index
df_fama_5['Date'] = pd.to_datetime(df_fama_5['Date'], format='%Y%m%d')
df_fama_5.set_index('Date', inplace=True)

df_fama_5

Date,Mkt-RF,SMB,HML,RMW,CMA,RF
1963-07-01,-0.67,0.02,-0.35,0.03,0.13,0.012
1963-07-02,0.79,-0.28,0.28,-0.08,-0.21,0.012
1963-07-03,0.63,-0.18,-0.10,0.13,-0.25,0.012
1963-07-05,0.40,0.09,-0.28,0.07,-0.30,0.012
1963-07-08,-0.63,0.07,-0.20,-0.27,0.06,0.012
...,...,...,...,...,...,...
2023-07-25,0.25,-0.23,-0.79,0.47,-0.41,0.022
2023-07-26,0.02,0.87,1.03,-0.35,0.65,0.022
2023-07-27,-0.74,-0.80,0.27,0.38,0.14,0.022
2023-07-28,1.14,0.41,-0.33,-0.75,-0.40,0.022


In [15]:
fama_factors = df_fama_5[["Mkt-RF", "SMB", "HML", "RMW", "CMA"]]
fama_factors

Date,Mkt-RF,SMB,HML,RMW,CMA
1963-07-01,-0.67,0.02,-0.35,0.03,0.13
1963-07-02,0.79,-0.28,0.28,-0.08,-0.21
1963-07-03,0.63,-0.18,-0.10,0.13,-0.25
1963-07-05,0.40,0.09,-0.28,0.07,-0.30
1963-07-08,-0.63,0.07,-0.20,-0.27,0.06
...,...,...,...,...,...
2023-07-25,0.25,-0.23,-0.79,0.47,-0.41
2023-07-26,0.02,0.87,1.03,-0.35,0.65
2023-07-27,-0.74,-0.80,0.27,0.38,0.14
2023-07-28,1.14,0.41,-0.33,-0.75,-0.40


In [16]:
merged_df = pd.merge(merged_df, fama_factors, left_index=True, right_index=True)
merged_df

Date,I_factor,TB_factor,LGB_factor,IP_factor,Baa_factor,EWNY_factor,VWNY_factor,CG_factor,OG_factor,EI_factor,Mkt-RF,SMB,HML,RMW,CMA
2014-04-01,1.622223,0.051429,3.620952,101.8425,4.9,29.104841,1863.523333,0.001776,0.012599,3.2,0.87,0.65,-0.37,-0.18,-0.29
2014-05-01,1.622223,0.023333,3.517619,102.2594,4.76,29.115611,1864.263333,0.007778,0.001078,3.3,0.04,-0.18,-0.16,-0.5,-0.16
2014-07-01,1.622223,0.024286,3.42,102.8163,4.73,30.493863,1947.087619,0.007778,-0.020796,3.3,0.74,0.43,-0.39,0.07,-0.1
2014-08-01,1.622223,0.024091,3.331818,102.6562,4.69,30.818152,1973.1,0.007692,-0.068057,3.2,-0.32,-0.23,-0.05,0.09,0.16
2014-10-01,1.622223,0.011429,3.26,102.9892,4.69,31.084982,1993.22619,0.007692,-0.094518,2.9,-1.39,-0.14,0.3,0.2,-0.07
2014-12-01,1.622223,0.041667,3.038333,103.6345,4.74,31.91717,2044.572105,0.009327,-0.217707,2.8,-0.89,-0.89,0.62,0.1,0.48
2015-04-01,0.118627,0.022727,2.626364,101.244,4.48,32.897358,2079.990455,0.005937,0.138645,2.6,-0.38,0.34,0.44,-0.17,0.24
2015-05-01,0.118627,0.015909,2.585909,100.783,4.89,33.284008,2094.862857,0.00508,0.088522,2.8,1.01,-0.31,-0.6,0.25,-0.1
2015-06-01,0.118627,0.013,2.955,100.4781,5.13,33.39166,2111.9435,0.00508,0.00928,2.7,0.17,-0.06,-0.22,0.24,-0.34
2015-07-01,0.118627,0.004091,3.111818,101.1052,5.2,33.126581,2099.283636,0.00508,-0.149114,2.8,0.61,-0.76,-0.03,0.21,0.08
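
The Fama-French file loaded above is daily, so the inner merge on first-of-month dates keeps only the months whose first day is a trading day (2014-06-01 and 2014-09-01, for instance, are missing from the output), and each retained row carries a single day's factor values. One possible remedy, shown here as a sketch on synthetic data, is to compound the daily percentage returns into monthly ones before merging. The helper name `daily_pct_to_monthly` is hypothetical.

```python
import numpy as np
import pandas as pd

def daily_pct_to_monthly(daily_pct: pd.DataFrame) -> pd.DataFrame:
    """Compound daily returns quoted in percent into monthly returns in percent.

    Summing log returns within a month is equivalent to compounding
    the simple daily returns.
    """
    log_returns = np.log1p(daily_pct / 100.0)
    monthly_log = log_returns.resample("MS").sum()  # labelled with the month start
    return np.expm1(monthly_log) * 100.0

# Synthetic daily factors on business days only (illustrative values)
idx = pd.bdate_range("2014-05-01", "2014-06-30")
toy_factors = pd.DataFrame({"Mkt-RF": 0.1, "SMB": 0.0}, index=idx)

monthly_factors = daily_pct_to_monthly(toy_factors)
print(monthly_factors)
```

The result has one row per calendar month, including months that start on a weekend. To match the notebook's labelling of the macro factors (first day of the following month), the index would still need to be shifted by one month before merging.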


### Compute derived factors

* Monthly growth of IP: $MP(t) = \log\left(IP(t)/IP(t-1)\right)$
* Annual growth of IP: $YP(t) = \log\left(IP(t)/IP(t-12)\right)$
* Expected inflation: $E[I(t) \mid t-1]$
* Unexpected inflation: $UI(t) = I(t) - E[I(t) \mid t-1]$
* Ex post real interest rate: $RHO(t) = TB(t-1) - I(t)$
* Change in expected inflation: $DEI(t) = E[I(t+1) \mid t] - E[I(t) \mid t-1]$
* Risk premium: $UPR(t) = Baa(t) - LGB(t)$
* Term structure: $UTS(t) = LGB(t) - TB(t-1)$

In [17]:
merged_df["MP_derived_factor"] = np.log(merged_df["IP_factor"] / merged_df["IP_factor"].shift(1))

merged_df["YP_derived_factor"] = np.log(merged_df["IP_factor"]/merged_df["IP_factor"].shift(12))

merged_df["UI_derived_factor"] = merged_df["I_factor"] - merged_df["EI_factor"].shift(1)

merged_df["RHO_derived_factor"] = merged_df["TB_factor"].shift(1)-merged_df["I_factor"]

merged_df["DEI_derived_factor"] = merged_df["EI_factor"] - merged_df["EI_factor"].shift(1)

merged_df["UPR_derived_factor"] = merged_df["Baa_factor"] - merged_df["LGB_factor"]

merged_df["UTS_derived_factor"] = merged_df["LGB_factor"] - merged_df["TB_factor"].shift(1)

merged_df.dropna(inplace=True)

In [18]:
merged_df

Date,I_factor,TB_factor,LGB_factor,IP_factor,Baa_factor,EWNY_factor,VWNY_factor,CG_factor,OG_factor,EI_factor,...,HML,RMW,CMA,MP_derived_factor,YP_derived_factor,UI_derived_factor,RHO_derived_factor,DEI_derived_factor,UPR_derived_factor,UTS_derived_factor
2015-12-01,0.118627,0.068947,3.03,98.939,5.46,32.149798,2080.6165,0.001956,-0.123704,2.6,...,0.25,-0.08,-0.11,-0.012536,-0.028924,-2.581373,-0.119103,-0.1,2.43,3.030476
2016-02-01,1.261583,0.224211,2.858421,98.9136,5.34,29.352911,1918.597895,0.005756,-0.042929,2.5,...,-1.0,0.06,-0.35,-0.000257,-0.033266,-1.338417,-1.192636,-0.1,2.481579,2.789474
2016-03-01,1.261583,0.251,2.623,98.1907,5.13,29.295104,1904.4185,0.005756,0.238456,2.7,...,0.39,-0.58,-0.61,-0.007335,-0.046032,-1.238417,-1.037373,0.2,2.507,2.398789
2016-04-01,1.261583,0.250909,2.684545,98.4669,4.79,31.709133,2021.954091,0.005756,0.08522,2.8,...,-0.62,-0.39,-0.1,0.002809,-0.041665,-1.438417,-1.010583,0.1,2.105455,2.433545
2016-06-01,1.261583,0.221429,2.627619,98.7275,4.53,32.607006,2065.550952,0.003151,0.043888,2.6,...,-0.2,-0.29,0.02,0.002643,-0.042261,-1.538417,-1.010674,-0.2,1.902381,2.37671
2016-07-01,1.261583,0.218636,2.452273,98.836,4.22,33.052939,2083.891364,0.003151,-0.08429,2.7,...,-0.43,-0.07,0.27,0.001098,-0.047408,-1.338417,-1.040155,0.1,1.767727,2.230844
2016-08-01,1.261583,0.2585,2.227,98.7554,4.24,34.094875,2148.902,0.004926,0.001568,2.5,...,-0.9,0.47,-0.74,-0.000816,-0.024887,-1.438417,-1.042947,-0.2,2.013,2.008364
2016-09-01,1.261583,0.257826,2.261739,98.6596,4.31,34.637547,2177.482174,0.004926,0.010286,2.4,...,-0.5,0.1,-0.14,-0.000971,-0.021294,-1.238417,-1.003083,-0.1,2.048261,2.003239
2016-11-01,1.261583,0.236,2.5005,98.3452,4.71,34.071154,2143.020952,0.003201,-0.082764,2.4,...,0.2,-0.5,-0.06,-0.003192,-0.021456,-1.138417,-1.003757,0.0,2.2095,2.242674
2016-12-01,1.261583,0.2895,2.862,99.0314,4.83,34.725018,2164.985714,0.003201,0.138195,2.2,...,2.03,0.31,0.62,0.006953,-0.020725,-1.138417,-1.025583,-0.2,1.968,2.626


In [19]:
derived_factor = ["MP_derived_factor","YP_derived_factor","UI_derived_factor","RHO_derived_factor","DEI_derived_factor","UPR_derived_factor","UTS_derived_factor","EI_factor","Mkt-RF","SMB","HML"]
df_derived_factor = merged_df[derived_factor]
df_derived_factor

Date,MP_derived_factor,YP_derived_factor,UI_derived_factor,RHO_derived_factor,DEI_derived_factor,UPR_derived_factor,UTS_derived_factor,EI_factor,Mkt-RF,SMB,HML
2015-12-01,-0.012536,-0.028924,-2.581373,-0.119103,-0.1,2.43,3.030476,2.6,0.97,-0.63,0.25
2016-02-01,-0.000257,-0.033266,-1.338417,-1.192636,-0.1,2.481579,2.789474,2.5,-0.04,-0.3,-1.0
2016-03-01,-0.007335,-0.046032,-1.238417,-1.037373,0.2,2.507,2.398789,2.7,2.34,-0.65,0.39
2016-04-01,0.002809,-0.041665,-1.438417,-1.010583,0.1,2.105455,2.433545,2.8,0.64,-0.32,-0.62
2016-06-01,0.002643,-0.042261,-1.538417,-1.010674,-0.2,1.902381,2.37671,2.6,0.2,0.63,-0.2
2016-07-01,0.001098,-0.047408,-1.338417,-1.040155,0.1,1.767727,2.230844,2.7,0.24,0.47,-0.43
2016-08-01,-0.000816,-0.024887,-1.438417,-1.042947,-0.2,2.013,2.008364,2.5,-0.16,0.06,-0.9
2016-09-01,-0.000971,-0.021294,-1.238417,-1.003083,-0.1,2.048261,2.003239,2.4,0.03,0.07,-0.5
2016-11-01,-0.003192,-0.021456,-1.138417,-1.003757,0.0,2.2095,2.242674,2.4,-0.68,-0.38,0.2
2016-12-01,0.006953,-0.020725,-1.138417,-1.025583,-0.2,1.968,2.626,2.2,-0.36,-0.39,2.03


### Build the portfolio for US stock market

To build the portfolio, we select companies across different market capitalizations:

* Large-Cap (Large Capitalization): market cap typically between \$10B and \$200B
* Mid-Cap (Medium Capitalization): market cap typically between \$2B and \$10B
* Small-Cap (Small Capitalization): market cap typically under \$2B

**Large-Cap**
* McDonald's Corporation Common Stock (MCD)
* Amazon.com, Inc. Common Stock (AMZN)

**Mid-Cap**
* EMCOR Group, Inc. Common Stock (EME)
* Bank OZK Common Stock (OZK)

**Small-Cap**
* First Financial Bancorp. Common Stock (FFBC)
* Douglas Emmett, Inc. Common Stock (DEI)

#### Let's import these stocks using the Yahoo Finance API

In [20]:
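# Download historical stock price data using yfinance
symbols = ["MCD", "AMZN", "EME", "OZK", "FFBC", "DEI"]

df_portfolio = yf.download(symbols, "2014-10-31", end="2019-12-31")['Adj Close']
# Aggregate to monthly frequency. Note: sum() adds up the daily prices, so each
# monthly value scales with the number of trading days in that month; mean() or
# last() would avoid this (the output below was produced with sum()).
df_portfolio = df_portfolio.resample('M').sum()
# Label each month with the first day of the following month, as for the factors
df_portfolio.index = df_portfolio.index + pd.DateOffset(days=1)
df_portfolio = df_portfolio['2015-11-01':'2019-02-01']
df_portfolio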

Date,AMZN,DEI,EME,FFBC,MCD,OZK
2015-11-01,623.417501,505.317141,972.11797,313.32098,1905.12632,825.270676
2015-12-01,657.695496,467.035675,948.56245,296.976921,1853.377609,844.967258
2016-01-01,736.188499,508.266535,1036.999561,306.444736,2128.295303,892.780865
2016-02-01,571.008501,419.743717,811.704872,231.999832,1854.394844,685.180843
2016-03-01,530.619999,408.554642,864.76178,245.358757,1957.997826,632.546032
2016-04-01,629.611498,489.490284,1008.602489,301.150006,2230.914101,736.175747
2016-05-01,644.273998,507.826378,978.213242,299.090363,2230.564117,720.69664
2016-06-01,732.347492,533.843159,952.772678,307.120089,2220.94532,637.621273
2016-07-01,788.029499,576.720175,1016.318577,326.409743,2239.266975,666.312452
2016-08-01,741.467499,562.85428,995.086601,308.926917,2060.482994,606.099115


#### Compute portfolio return

We treat it as an equally weighted portfolio.

In [21]:
# Calculate the monthly returns for each asset
df_portfolio_return = df_portfolio.pct_change()

# Assume an equally weighted portfolio
weights = [1/len(df_portfolio.columns)] * len(df_portfolio.columns)

# Calculate the equally weighted portfolio returns
df_portfolio["return"] = (df_portfolio_return * weights).sum(axis=1)

# Display the portfolio returns
df_portfolio = df_portfolio['2015-12-01':'2019-02-01']
df_portfolio

Date,AMZN,DEI,EME,FFBC,MCD,OZK,return
2015-12-01,657.695496,467.035675,948.56245,296.976921,1853.377609,844.967258,-0.016744
2016-01-01,736.188499,508.266535,1036.999561,306.444736,2128.295303,892.780865,0.08961
2016-02-01,571.008501,419.743717,811.704872,231.999832,1854.394844,685.180843,-0.203325
2016-03-01,530.619999,408.554642,864.76178,245.358757,1957.997826,632.546032,0.000768
2016-04-01,629.611498,489.490284,1008.602489,301.150006,2230.914101,736.175747,0.180266
2016-05-01,644.273998,507.826378,978.213242,299.090363,2230.564117,720.69664,0.000433
2016-06-01,732.347492,533.843159,952.772678,307.120089,2220.94532,637.621273,0.011532
2016-07-01,788.029499,576.720175,1016.318577,326.409743,2239.266975,666.312452,0.056517
2016-08-01,741.467499,562.85428,995.086601,308.926917,2060.482994,606.099115,-0.054632
2016-09-01,879.564999,665.370586,1262.474537,378.35425,2260.337013,696.021929,0.184531


In [22]:
df_derived_factor = pd.merge(merged_df, df_portfolio["return"], left_index=True, right_index=True)
df_derived_factor

Date,I_factor,TB_factor,LGB_factor,IP_factor,Baa_factor,EWNY_factor,VWNY_factor,CG_factor,OG_factor,EI_factor,...,RMW,CMA,MP_derived_factor,YP_derived_factor,UI_derived_factor,RHO_derived_factor,DEI_derived_factor,UPR_derived_factor,UTS_derived_factor,return
2015-12-01,0.118627,0.068947,3.03,98.939,5.46,32.149798,2080.6165,0.001956,-0.123704,2.6,...,-0.08,-0.11,-0.012536,-0.028924,-2.581373,-0.119103,-0.1,2.43,3.030476,-0.016744
2016-02-01,1.261583,0.224211,2.858421,98.9136,5.34,29.352911,1918.597895,0.005756,-0.042929,2.5,...,0.06,-0.35,-0.000257,-0.033266,-1.338417,-1.192636,-0.1,2.481579,2.789474,-0.203325
2016-03-01,1.261583,0.251,2.623,98.1907,5.13,29.295104,1904.4185,0.005756,0.238456,2.7,...,-0.58,-0.61,-0.007335,-0.046032,-1.238417,-1.037373,0.2,2.507,2.398789,0.000768
2016-04-01,1.261583,0.250909,2.684545,98.4669,4.79,31.709133,2021.954091,0.005756,0.08522,2.8,...,-0.39,-0.1,0.002809,-0.041665,-1.438417,-1.010583,0.1,2.105455,2.433545,0.180266
2016-06-01,1.261583,0.221429,2.627619,98.7275,4.53,32.607006,2065.550952,0.003151,0.043888,2.6,...,-0.29,0.02,0.002643,-0.042261,-1.538417,-1.010674,-0.2,1.902381,2.37671,0.011532
2016-07-01,1.261583,0.218636,2.452273,98.836,4.22,33.052939,2083.891364,0.003151,-0.08429,2.7,...,-0.07,0.27,0.001098,-0.047408,-1.338417,-1.040155,0.1,1.767727,2.230844,0.056517
2016-08-01,1.261583,0.2585,2.227,98.7554,4.24,34.094875,2148.902,0.004926,0.001568,2.5,...,0.47,-0.74,-0.000816,-0.024887,-1.438417,-1.042947,-0.2,2.013,2.008364,-0.054632
2016-09-01,1.261583,0.257826,2.261739,98.6596,4.31,34.637547,2177.482174,0.004926,0.010286,2.4,...,0.1,-0.14,-0.000971,-0.021294,-1.238417,-1.003083,-0.1,2.048261,2.003239,0.184531
2016-11-01,1.261583,0.236,2.5005,98.3452,4.71,34.071154,2143.020952,0.003201,-0.082764,2.4,...,-0.5,-0.06,-0.003192,-0.021456,-1.138417,-1.003757,0.0,2.2095,2.242674,-0.001795
2016-12-01,1.261583,0.2895,2.862,99.0314,4.83,34.725018,2164.985714,0.003201,0.138195,2.2,...,0.31,0.62,0.006953,-0.020725,-1.138417,-1.025583,-0.2,1.968,2.626,0.060186


#### Fit the following regression

$\text{Portfolio}_{\text{return}}(t) = \alpha + \beta_1 MP(t) + \beta_2 YP(t) + \beta_3 UI(t) + \beta_4 RHO(t) + \beta_5 DEI(t) + \beta_6 UPR(t) + \beta_7 UTS(t) + \beta_8 EI(t) + \beta_9 \text{(Mkt-RF)}(t) + \beta_{10} SMB(t) + \beta_{11} HML(t) + \varepsilon(t)$

$\alpha$: the constant of the regression, interpreted as a performance indicator.

Alpha is used in finance as a performance measure, indicating when a strategy has managed to beat the market return over a certain period.

Alpha refers to excess returns earned on an investment above the benchmark return. Active portfolio managers seek to generate alpha in diversified portfolios, with diversification aimed at eliminating unsystematic risks.

In other words, alpha is the return on an investment that is not the result of a general market movement. Thus, an alpha equal to zero would indicate that the portfolio or fund is tracking the benchmark perfectly, and that the manager has not added or lost any additional value relative to the overall market.
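
To make the roles of $\alpha$ and the betas concrete, here is a minimal sketch on synthetic data. It uses `numpy.linalg.lstsq` rather than the statsmodels call used in this notebook, and the two factors, the true coefficients, and the noise level are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs = 200

# Two synthetic factors and a return series generated from known coefficients
factors = rng.normal(size=(n_obs, 2))
true_alpha = 0.01
true_betas = np.array([0.5, -0.3])
returns = true_alpha + factors @ true_betas + 0.001 * rng.normal(size=n_obs)

# Prepend a column of ones so that the first coefficient is the constant (alpha)
X = np.column_stack([np.ones(n_obs), factors])
coef, *_ = np.linalg.lstsq(X, returns, rcond=None)
alpha_hat, betas_hat = coef[0], coef[1:]
print(alpha_hat, betas_hat)
```

With 200 observations and little noise, the estimates land close to the true values. The regression in this notebook has far fewer observations per regressor, so its estimates are much less precise.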


In [23]:
df_derived_factor.iloc[:,-12:-1]

Date,SMB,HML,RMW,CMA,MP_derived_factor,YP_derived_factor,UI_derived_factor,RHO_derived_factor,DEI_derived_factor,UPR_derived_factor,UTS_derived_factor
2015-12-01,-0.63,0.25,-0.08,-0.11,-0.012536,-0.028924,-2.581373,-0.119103,-0.1,2.43,3.030476
2016-02-01,-0.3,-1.0,0.06,-0.35,-0.000257,-0.033266,-1.338417,-1.192636,-0.1,2.481579,2.789474
2016-03-01,-0.65,0.39,-0.58,-0.61,-0.007335,-0.046032,-1.238417,-1.037373,0.2,2.507,2.398789
2016-04-01,-0.32,-0.62,-0.39,-0.1,0.002809,-0.041665,-1.438417,-1.010583,0.1,2.105455,2.433545
2016-06-01,0.63,-0.2,-0.29,0.02,0.002643,-0.042261,-1.538417,-1.010674,-0.2,1.902381,2.37671
2016-07-01,0.47,-0.43,-0.07,0.27,0.001098,-0.047408,-1.338417,-1.040155,0.1,1.767727,2.230844
2016-08-01,0.06,-0.9,0.47,-0.74,-0.000816,-0.024887,-1.438417,-1.042947,-0.2,2.013,2.008364
2016-09-01,0.07,-0.5,0.1,-0.14,-0.000971,-0.021294,-1.238417,-1.003083,-0.1,2.048261,2.003239
2016-11-01,-0.38,0.2,-0.5,-0.06,-0.003192,-0.021456,-1.138417,-1.003757,0.0,2.2095,2.242674
2016-12-01,-0.39,2.03,0.31,0.62,0.006953,-0.020725,-1.138417,-1.025583,-0.2,1.968,2.626


In [25]:
derived_factor = ["MP_derived_factor","YP_derived_factor","UI_derived_factor","RHO_derived_factor","DEI_derived_factor","UPR_derived_factor","UTS_derived_factor","EI_factor","Mkt-RF","SMB","HML"] 

y = df_derived_factor["return"]
X = df_derived_factor[derived_factor]  # regressors only; the portfolio return is the dependent variable

X = sm.add_constant(X)  # add a constant (i.e. alpha)
apt = lm.OLS(y, X).fit()
apt.summary()

Dep. Variable:,return,R-squared:,0.598
Model:,OLS,Adj. R-squared:,0.283
Method:,Least Squares,F-statistic:,1.895
Date:,"Fri, 17 Nov 2023",Prob (F-statistic):,0.13
Time:,00:13:09,Log-Likelihood:,32.157
No. Observations:,26,AIC:,-40.31
Df Residuals:,14,BIC:,-25.22
Df Model:,11,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,1.9553,0.659,2.967,0.010,0.542,3.369
MP_derived_factor,6.6422,4.192,1.584,0.135,-2.349,15.634
YP_derived_factor,-0.7179,1.838,-0.391,0.702,-4.661,3.225
UI_derived_factor,-0.1286,0.143,-0.898,0.384,-0.436,0.178
RHO_derived_factor,0.0922,0.140,0.658,0.521,-0.209,0.393
DEI_derived_factor,0.4488,0.244,1.840,0.087,-0.074,0.972
UPR_derived_factor,-0.1869,0.111,-1.685,0.114,-0.425,0.051
UTS_derived_factor,-0.1854,0.109,-1.701,0.111,-0.419,0.048
EI_factor,-0.4717,0.229,-2.060,0.059,-0.963,0.019

Omnibus:,1.38,Durbin-Watson:,2.682
Prob(Omnibus):,0.502,Jarque-Bera (JB):,0.418
Skew:,0.226,Prob(JB):,0.811
Kurtosis:,3.426,Cond. No.,939.0
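
As a consistency check on the summary above, the adjusted R-squared and the F-statistic can be recomputed from the reported R-squared, the number of observations, and the number of regressors. The sketch below only uses values read from the table; small differences from the reported figures are expected because R-squared is rounded to three decimals.

```python
# Values read from the summary table above
r_squared = 0.598
n_obs = 26
k_regressors = 11  # "Df Model", constant excluded

df_resid = n_obs - k_regressors - 1  # 14, matching "Df Residuals"

# Adjusted R-squared penalizes the fit for the number of regressors
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid

# F-statistic for the joint significance of all regressors
f_statistic = (r_squared / k_regressors) / ((1 - r_squared) / df_resid)

print(round(adj_r_squared, 3), round(f_statistic, 3))
```

This gives roughly 0.282 and 1.893, in line with the reported 0.283 and 1.895. The gap between R-squared and adjusted R-squared reflects how few residual degrees of freedom remain with 11 regressors and 26 observations.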
