# Factor Models
- References
    - Dr. Kempthorne (2013): [Factor Models](https://ocw.mit.edu/courses/mathematics/18-s096-topics-in-mathematics-with-applications-in-finance-fall-2013/lecture-notes/MIT18_S096F13_lecnote15.pdf)

## Arbitrage Pricing Theory (APT)
- The CAPM is derived from market equilibrium, the equality of asset demand and supply.
    - This equality implies that the market portfolio must be mean-variance efficient, and a typical investor holds the market portfolio.
    - The systematic risk embodied in the beta coefficients determines the risk premia.
- Unlike the CAPM, the APT requires no assumption about investors' utility functions and does not assume that returns follow a normal distribution.
- Ross (1976): [The arbitrage theory of capital asset pricing](https://www.sciencedirect.com/science/article/pii/0022053176900466)
    - By **the law of one price**, two portfolios that have the same risk must have the same expected return (or equivalently the same price); otherwise, an **arbitrage opportunity** exists!
        - [Arbitrage](https://en.wikipedia.org/wiki/Arbitrage): zero initial investment, bearing no risk (sure win), and positive return.
- Let $K$ be the number of factors.
- The arbitrage-free model can be formulated as $$R_i - R_f = \sum_{j = 1}^{K}\beta_{i, j}\left(\mathbf{E}(R_j) - R_f\right) + \varepsilon_{i}.$$ 
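    - With $K = 1$ and the market portfolio as the only factor, this takes the same form as the CAPM, so the CAPM can be viewed as a special case of the APT.
- APT factors need not include the market factor; any systematic source of risk can serve as a factor.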
    - Chen, Roll, and Ross (1986): [Economic Forces and the Stock Market](https://www.jstor.org/stable/2352710)
        - Monthly growth in industrial production
        - Change in expected inflation
        - Unanticipated inflation
        - Unanticipated change in the risk premium between risky bonds and default-free bonds
        - Unanticipated change in the difference between the return on long-term government bonds and the return on the short-term government bonds
- Applications of APT:
    - Asset allocation;
    - Computation of the cost of capital;
    - Performance evaluation of managed funds.
- References
    - https://www0.gsb.columbia.edu/faculty/ghuberman/APT-Huberman-Wang.pdf
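
As a small numerical illustration of the pricing relation in this section, the sketch below computes an asset's expected return from its factor betas and the factors' expected returns. All inputs (the risk-free rate, the betas, the factor expected returns) are made-up numbers, and the helper name `apt_expected_return` is hypothetical; nothing here is taken from Ross (1976) or Chen, Roll, and Ross (1986).

```python
import numpy as np

def apt_expected_return(rf, betas, factor_expected_returns):
    """Expected return implied by E(R_i) = R_f + sum_j beta_{i,j} * (E(R_j) - R_f)."""
    betas = np.asarray(betas, dtype=float)
    premia = np.asarray(factor_expected_returns, dtype=float) - rf  # factor risk premia
    return rf + float(betas @ premia)

# Hypothetical inputs: K = 2 factors, annualized rates.
rf = 0.01
betas = [1.2, 0.5]
factor_expected_returns = [0.06, 0.03]

expected = apt_expected_return(rf, betas, factor_expected_returns)
print(f"two-factor expected return: {expected:.4f}")

# With K = 1 and the market as the only factor, the relation has the CAPM form.
capm = apt_expected_return(rf, [1.2], [0.06])
print(f"single-factor (CAPM-form) expected return: {capm:.4f}")
```

Here the factor premia are 0.05 and 0.02, so the two-factor expected return is 0.01 + 1.2 × 0.05 + 0.5 × 0.02 = 0.08.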

## Fama-French 3-Factor Model
- Fama and French (1993): [Common risk factors in the returns on stocks and bonds](https://www.sciencedirect.com/science/article/pii/0304405X93900235)
    - The CAPM is a single-factor model: it explains the risk premium of stocks with the market factor alone.
    - Fama and French add two explanatory variables to the model: (1) firm size (market capitalization), captured by **SMB (small minus big)**, and (2) the book-to-market ratio, captured by **HML (high minus low)**:

$$R_{i} - R_f = \beta_{M, i} (R_M - R_f) + \beta_{\text{SMB}, i} \text{SMB} + \beta_{\text{HML}, i} \text{HML} + \varepsilon_i.$$

- Empirically, value stocks (high book-to-market, $\beta_{\text{HML}, i} > 0$) have tended to outperform growth stocks (low book-to-market, $\beta_{\text{HML}, i} < 0$); the HML factor captures this value premium.
- Fama shared the 2013 Nobel Memorial Prize in Economic Sciences with Lars Peter Hansen and Robert J. Shiller.
- References
    - https://www.nobelprize.org/prizes/economic-sciences/2013/fama/facts/
    - Kenneth R. French's data library [link](https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html)
    - 石川 (2019): [Eugene Fama，一段50年的傳奇](http://www.liang-xin.com/website/w/h?mt=2&mc=3194266&cc=3021364&fp=d&c=23909780) (in Chinese: "Eugene Fama: a 50-year legend")
    - 投資思維－主動投資或被動投資？ (in Chinese: "Investment thinking: active or passive investing?"; The Limit of Theory) [link1](https://justininvesting.wordpress.com/2018/04/19/the-limits-of-theory/) [link2](https://justininvesting.wordpress.com/2018/04/24/the-limits-of-theory-2/)
    - [產業價值鏈資訊平台](https://ic.tpex.org.tw/index.php) (Industry Value Chain Information Platform)
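
To make the three-factor regression concrete, here is a minimal sketch that simulates factor returns, generates one asset's excess return from known loadings, and recovers them by ordinary least squares. Everything in it is synthetic: the factor means and volatilities, the "true" loadings, and the noise level are assumptions chosen for illustration, and `numpy.linalg.lstsq` stands in for a full regression package.

```python
import numpy as np

rng = np.random.default_rng(0)
n_months = 600

# Synthetic monthly factor realizations (hypothetical means and volatilities).
mkt_excess = rng.normal(0.005, 0.04, n_months)  # R_M - R_f
smb = rng.normal(0.002, 0.03, n_months)
hml = rng.normal(0.003, 0.03, n_months)
factors = np.column_stack([mkt_excess, smb, hml])

# "True" loadings used to generate the asset's excess return R_i - R_f.
# A negative HML loading corresponds to a growth tilt.
true_betas = np.array([1.1, 0.4, -0.3])
noise = rng.normal(0.0, 0.01, n_months)
asset_excess = factors @ true_betas + noise

# OLS with an intercept; the intercept estimates the asset's alpha.
X = np.column_stack([np.ones(n_months), factors])
coef, *_ = np.linalg.lstsq(X, asset_excess, rcond=None)
alpha_hat, beta_hat = coef[0], coef[1:]

print("alpha:", round(float(alpha_hat), 4))
print("betas (market, SMB, HML):", np.round(beta_hat, 2))
```

With enough observations the estimated betas should land close to the loadings used in the simulation, and the estimated alpha close to zero.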
    

### Example: Biotech Stocks in the Taiwan Market

In [None]:
%%capture

!wget https://www.csie.ntu.edu.tw/~d00922011/python/data/ff3_monthly_data.csv
!pip install yfinance
!pip install --upgrade pandas-datareader

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import statsmodels.api as sm
import yfinance as yf

raw = pd.read_csv("ff3_monthly_data.csv")
print(raw["Code"].unique())

# https://pchome.megatime.com.tw/group/mkt0/cid22.html



[1316 1701 1707 1708 1709 1710 1711 1712 1713 1714 1717 1718 1720 1721
 1722 1723 1724 1725 1726 1727 1730 1731 1732 1733 1734 1735 1736 1762
 1773 1783 1786 1789 3164 3705 4104 4106 4108 4119 4133 4137 4141 4142
 4144 4164 4190 4720 4722 4725 4737 4739 4746 4755 4763 6452]


In [None]:
df = raw.groupby('Date')
df.head()

|  | Code | Date | r | MV | PB |
|---|---|---|---|---|---|
| 0 | 1316 | 201601 | 1.5463 | 1673 | 1.36 |
| 1 | 1701 | 201601 | -2.2788 | 5752 | 1.06 |
| 2 | 1707 | 201601 | -0.8197 | 23649 | 7.45 |
| 3 | 1708 | 201601 | -2.1809 | 5737 | 1.13 |
| 4 | 1709 | 201601 | -2.9900 | 6277 | 0.97 |
| ... | ... | ... | ... | ... | ... |
| 2106 | 1316 | 201904 | -4.1915 | 2871 | 1.80 |
| 2107 | 1701 | 201904 | -0.2644 | 5618 | 1.00 |
| 2108 | 1707 | 201904 | 7.7892 | 29233 | 5.50 |
| 2109 | 1708 | 201904 | 1.3592 | 5257 | 0.92 |


In [None]:
def cal_smb_hml(df):
    """Return (SMB, HML) for one month's cross-section via a 2x3 sort on size and book-to-market."""
    # Work on a copy so the columns added below do not touch the caller's data
    # (this avoids SettingWithCopyWarning without silencing all warnings).
    df = df.copy()

    # Size: two groups, B(ig) and S(mall), split at the median market value.
    med = df["MV"].median()
    df["SB"] = df["MV"].ge(med).map({True: "B", False: "S"})

    # Value: three groups (H, M, L), split at the 30th and 70th percentiles of BM.
    df["BM"] = 1 / df["PB"]  # BM: book-to-market ratio; PB: price-to-book ratio
    low_threshold, high_threshold = df["BM"].quantile([0.3, 0.7])
    df["HML"] = "M"
    df.loc[df["BM"] >= high_threshold, "HML"] = "H"
    df.loc[df["BM"] <= low_threshold, "HML"] = "L"

    def vw_return(size, value):
        # Value-weighted return of one size/value portfolio; r is given in percent.
        g = df[(df["SB"] == size) & (df["HML"] == value)]
        return (g["r"] * g["MV"]).sum() / g["MV"].sum() / 100

    # Returns of the six portfolios.
    R_SL, R_SM, R_SH = vw_return("S", "L"), vw_return("S", "M"), vw_return("S", "H")
    R_BL, R_BM, R_BH = vw_return("B", "L"), vw_return("B", "M"), vw_return("B", "H")

    # SMB: average small-cap portfolio return minus average big-cap portfolio return.
    smb = (R_SL + R_SM + R_SH - R_BL - R_BM - R_BH) / 3
    # HML: average high-BM portfolio return minus average low-BM portfolio return.
    hml = (R_SH + R_BH - R_SL - R_BL) / 2
    return smb, hml
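
To check the 2x3 sort by hand, the sketch below applies the same steps to a six-stock toy cross-section in which every size/value portfolio ends up holding exactly one stock. The data are invented, and `two_by_three_sort` is a hypothetical, compact restatement of the procedure (it uses a `groupby` instead of six explicit queries), not the notebook's function.

```python
import pandas as pd

# Hypothetical one-month cross-section: r in percent, MV market value, PB price-to-book.
toy = pd.DataFrame({
    "Code": [1, 2, 3, 4, 5, 6],
    "r":    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "MV":   [10, 20, 30, 100, 200, 300],
    "PB":   [5.0, 2.0, 1.0, 4.0, 2.0, 1.25],
})

def two_by_three_sort(cross_section):
    df = cross_section.copy()
    # Size split at the median market value.
    df["size"] = df["MV"].ge(df["MV"].median()).map({True: "B", False: "S"})
    # Value split at the 30th and 70th percentiles of book-to-market.
    bm = 1 / df["PB"]
    low, high = bm.quantile([0.3, 0.7])
    df["value"] = "M"
    df.loc[bm >= high, "value"] = "H"
    df.loc[bm <= low, "value"] = "L"
    # Value-weighted return of each of the six portfolios (as a decimal).
    df["weighted_r"] = df["r"] * df["MV"]
    sums = df.groupby(["size", "value"])[["weighted_r", "MV"]].sum()
    vw = sums["weighted_r"] / sums["MV"] / 100
    smb = (vw.xs("S", level="size").sum() - vw.xs("B", level="size").sum()) / 3
    hml = (vw.xs("H", level="value").sum() - vw.xs("L", level="value").sum()) / 2
    return smb, hml

smb, hml = two_by_three_sort(toy)
print(round(smb, 4), round(hml, 4))
```

In this toy case the small portfolios return 1%, 2%, and 3% and the big ones 4%, 5%, and 6%, so SMB = (0.06 - 0.15) / 3 = -0.03 and HML = (0.03 + 0.06 - 0.01 - 0.04) / 2 = 0.02. One difference to keep in mind: if a portfolio is empty, this compact version simply sums over the portfolios that exist, whereas the explicit six-portfolio version yields NaN.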

In [None]:
factors = []
for date, group in df:
    smb, hml = cal_smb_hml(group)
    factors.append([date, smb, hml])
    
df_factor = pd.DataFrame(factors, columns = ['Date', 'SMB', 'HML'])
del df_factor["Date"]
df_factor.head()

|  | SMB | HML |
|---|---|---|
| 0 | -0.013187 | 0.016115 |
| 1 | -0.019467 | 0.024062 |
| 2 | 0.04195 | -0.019357 |
| 3 | 0.008583 | 0.040254 |
| 4 | 0.031335 | -0.085267 |


In [None]:
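def monthly_return_rate_calculator(prices):
    # Return from the first to the last price in the period.
    # .iloc is positional, so this also works on a date-indexed Series.
    return prices.iloc[-1] / prices.iloc[0] - 1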
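
The calculator above measures each month's return from the first to the last price inside that month, which leaves out the move from the previous month's last close to the current month's first close. The sketch below contrasts it with a month-end to month-end return on a made-up price series; the prices and the variable names are hypothetical, and grouping by calendar month via `to_period("M")` is just one way to do it.

```python
import pandas as pd

# Hypothetical daily closes over three months (business days only), rising by 1 each day.
dates = pd.bdate_range("2016-01-01", "2016-03-31")
prices = pd.Series(range(100, 100 + len(dates)), index=dates, dtype=float)

month_key = prices.index.to_period("M")

# Within-month return: first to last close of each month.
within_month = prices.groupby(month_key).agg(lambda s: s.iloc[-1] / s.iloc[0] - 1)

# Month-end to month-end return: last close vs. the previous month's last close.
month_end = prices.groupby(month_key).last()
end_to_end = month_end.pct_change()

print(pd.DataFrame({"within_month": within_month, "end_to_end": end_to_end}))
```

The first month has no previous month-end, so its month-end to month-end return is missing; whether that trade-off is acceptable depends on how the factor returns it is merged with were computed.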

In [None]:
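# Assumption: selected_ticks is not defined earlier in this notebook; here it is built
# from the stock codes in the CSV plus the ".TW" suffix used by Yahoo Finance.
selected_ticks = [f"{code}.TW" for code in raw["Code"].unique()]

# auto_adjust = False keeps the "Adj Close" column in recent yfinance versions.
prices = yf.download(",".join(selected_ticks), start = "2016-01-01", end = "2019-04-30", auto_adjust = False)["Adj Close"]
df_Ri = pd.DataFrame(prices)
df_Ri.columns = [x[:4] + "tw" for x in df_Ri.columns]  # e.g. "1316.TW" -> "1316tw"
df_Ri = df_Ri.resample("M").apply(monthly_return_rate_calculator)
df_Ri.reset_index(inplace = True)
del df_Ri["Date"]
df_Ri.head()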

[*********************100%***********************]  54 of 54 completed

1 Failed download:
- 4725.TW: No data found, symbol may be delisted


Unnamed: 0,1316tw,1701tw,1707tw,1708tw,1709tw,1710tw,1711tw,1712tw,1713tw,1714tw,1717tw,1718tw,1720tw,1721tw,1722tw,1723tw,1724tw,1725tw,1726tw,1727tw,1730tw,1731tw,1732tw,1733tw,1734tw,1735tw,1736tw,1762tw,1773tw,1783tw,1786tw,1789tw,3164tw,3705tw,4104tw,4106tw,4108tw,4119tw,4133tw,4137tw,4141tw,4142tw,4144tw,4164tw,4190tw,4720tw,4722tw,4725tw,4737tw,4739tw,4746tw,4755tw,4763tw,6452tw
0,0.019689,-0.010363,0.008333,-0.025157,-0.010135,-0.039702,0.046154,-0.015504,0.022951,-0.043176,0.035256,-0.015757,-0.080429,-0.102041,-0.005917,0.02381,-0.063333,-0.043011,0.0,-0.080268,0.005714,-0.01845,-0.007722,-0.094522,-0.026549,-0.016807,-0.025341,0.061039,-0.01875,-0.055804,-0.065359,-0.034749,-0.039501,0.0,0.012146,0.027586,0.018519,-0.185841,-0.115152,-0.051095,-0.011945,-0.092742,-0.045956,-0.054146,0.0,-0.009259,0.019003,,-0.118644,-0.109317,-0.002849,0.03386,0.130484,-0.062674
1,0.012295,0.052494,0.01355,0.01461,0.095563,0.069136,0.009662,0.055118,0.009615,0.097561,0.012384,0.019656,-0.016854,0.107143,0.056604,0.017467,0.08156,-0.01107,0.080556,0.032727,0.051873,0.00738,0.037879,0.032787,0.027027,0.004274,0.058252,-0.006105,0.018963,0.084906,-0.054101,0.097416,0.16849,0.031579,0.009881,0.087883,-0.033688,0.031788,0.02349,-0.063291,0.003472,0.072052,0.007619,0.008803,-0.124088,0.042945,0.060465,,-0.04945,0.056604,0.05042,0.021834,-0.00463,0.036212
2,0.06998,-0.0325,-0.016,-0.055732,0.031056,-0.107456,-0.042654,0.07037,0.003174,-0.038561,0.003053,-0.019254,-0.002861,0.03252,0.065718,-0.025751,0.121311,0.04461,-0.002571,-0.007042,-0.012162,0.091575,-0.011029,0.056561,0.024433,0.144068,-0.031136,-0.045399,0.069512,0.041304,0.1527,-0.107914,-0.075926,0.020325,-0.00789,0.064087,0.682569,0.032175,0.001639,-0.064189,0.206294,0.023669,-0.075145,-0.045375,-0.016667,-0.017544,-0.015965,,-0.01711,0.257985,-0.002642,0.023707,0.04386,0.057718
3,-0.023365,-0.026042,0.062842,-0.038786,-0.054217,0.022222,-0.002475,-0.041379,-0.028125,-0.043941,-0.021244,-0.033457,-0.049929,0.050781,-0.053937,-0.04,-0.025937,-0.02847,-0.078205,-0.031579,0.032787,-0.060606,-0.007435,-0.068478,-0.060811,0.066176,-0.065934,-0.1,-0.005701,-0.079167,-0.123397,-0.127441,-0.120316,-0.038,-0.011976,-0.06214,-0.204819,0.079747,-0.058253,-0.181655,0.053296,-0.122047,-0.083682,-0.102334,-0.142259,-0.015015,-0.013158,,-0.069307,-0.121469,0.041444,0.077731,-0.179592,-0.102564
4,0.08134,0.010753,0.03,-0.017794,0.022364,-0.050481,0.02,0.025271,0.003236,-0.017021,0.021875,-0.037958,0.027108,0.117424,-0.001163,0.023256,-0.002985,0.018248,0.022315,-0.065217,-0.02122,-0.00722,0.007663,0.195828,0.003617,-0.087413,-0.020243,0.013024,0.044725,-0.002326,0.003759,0.104583,0.045249,-0.019648,0.034413,0.020356,0.037448,0.019977,0.300172,-0.132898,0.070876,0.06682,0.188095,0.030303,-0.009756,0.012121,0.073025,,0.163755,0.002132,0.181234,-0.003891,0.145,0.066667


In [None]:
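# Market return from the ^twii index; auto_adjust = False keeps "Adj Close" in recent yfinance versions.
df_Rm = pd.DataFrame(yf.download("^twii", start = "2016-01-01", end = "2019-04-30", auto_adjust = False)["Adj Close"])
df_Rm = df_Rm.resample("M").apply(monthly_return_rate_calculator)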
df_Rm.columns = ["Rm"]
df_Rm = df_Rm.reset_index()
del df_Rm["Date"]
df_Rm.head()

[*********************100%***********************]  1 of 1 completed


|  | Rm |
|---|---|
| 0 | -0.004148 |
| 1 | 0.031165 |
| 2 | 0.030538 |
| 3 | -0.032301 |
| 4 | 0.029113 |


In [None]:
# Monthly risk-free rate, assuming a 1% annual rate with monthly compounding.
Rf = 1.01 ** (1 / 12) - 1
print(Rf)

0.0008295381143461622
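
The risk-free rate above appears to come from converting an assumed 1% annual rate into its monthly equivalent under compounding. A minimal sketch of that conversion follows; the helper name `annual_to_monthly` is hypothetical, and the 1% figure is the notebook's assumption rather than an observed rate.

```python
def annual_to_monthly(annual_rate):
    """Monthly rate r_m such that (1 + r_m) ** 12 == 1 + annual_rate."""
    return (1 + annual_rate) ** (1 / 12) - 1

rf_monthly = annual_to_monthly(0.01)  # 1% per year, as assumed in the cell above
print(rf_monthly)

# Compounding the monthly rate over 12 months recovers the annual rate.
recovered = (1 + rf_monthly) ** 12 - 1
print(recovered)

# The simple (non-compounded) approximation is slightly larger.
simple = 0.01 / 12
print(simple)
```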


In [None]:
df_RimRf = df_Ri - Rf
df_RmmRf = df_Rm - Rf

df2 = pd.merge(df_RmmRf, df_factor, left_index = True, right_index = True, how = 'inner')
df3 = pd.merge(df2, df_RimRf, left_index = True, right_index = True, how = 'inner')
df3.head()

Unnamed: 0,Rm,SMB,HML,1316tw,1701tw,1707tw,1708tw,1709tw,1710tw,1711tw,1712tw,1713tw,1714tw,1717tw,1718tw,1720tw,1721tw,1722tw,1723tw,1724tw,1725tw,1726tw,1727tw,1730tw,1731tw,1732tw,1733tw,1734tw,1735tw,1736tw,1762tw,1773tw,1783tw,1786tw,1789tw,3164tw,3705tw,4104tw,4106tw,4108tw,4119tw,4133tw,4137tw,4141tw,4142tw,4144tw,4164tw,4190tw,4720tw,4722tw,4725tw,4737tw,4739tw,4746tw,4755tw,4763tw,6452tw
0,-0.004978,-0.013187,0.016115,0.01886,-0.011192,0.007504,-0.025987,-0.010965,-0.040532,0.045324,-0.016333,0.022121,-0.044005,0.034427,-0.016587,-0.081259,-0.10287,-0.006747,0.02298,-0.064163,-0.04384,-0.00083,-0.081097,0.004885,-0.01928,-0.008552,-0.095352,-0.027378,-0.017636,-0.026171,0.060209,-0.01958,-0.056633,-0.066189,-0.035579,-0.040331,-0.00083,0.011316,0.026757,0.017689,-0.18667,-0.115981,-0.051924,-0.012775,-0.093572,-0.046785,-0.054975,-0.00083,-0.010089,0.018173,,-0.119474,-0.110146,-0.003678,0.03303,0.129654,-0.063504
1,0.030336,-0.019467,0.024062,0.011465,0.051664,0.012721,0.013781,0.094734,0.068306,0.008832,0.054289,0.008786,0.096731,0.011554,0.018826,-0.017683,0.106313,0.055774,0.016638,0.080731,-0.0119,0.079726,0.031898,0.051044,0.006551,0.037049,0.031957,0.026198,0.003444,0.057423,-0.006934,0.018134,0.084076,-0.054931,0.096586,0.167661,0.030749,0.009052,0.087053,-0.034518,0.030959,0.02266,-0.064121,0.002643,0.071223,0.006789,0.007973,-0.124917,0.042115,0.059636,,-0.05028,0.055774,0.049591,0.021005,-0.005459,0.035382
2,0.029709,0.04195,-0.019357,0.06915,-0.033329,-0.01683,-0.056562,0.030226,-0.108286,-0.043483,0.069541,0.002345,-0.03939,0.002224,-0.020083,-0.003691,0.031691,0.064888,-0.026581,0.120482,0.04378,-0.0034,-0.007872,-0.012992,0.090745,-0.011859,0.055732,0.023603,0.143238,-0.031965,-0.046228,0.068683,0.040475,0.151871,-0.108743,-0.076755,0.019496,-0.008719,0.063257,0.681739,0.031346,0.00081,-0.065019,0.205464,0.022839,-0.075974,-0.046205,-0.017496,-0.018373,-0.016795,,-0.01794,0.257156,-0.003472,0.022877,0.04303,0.056888
3,-0.033131,0.008583,0.040254,-0.024194,-0.026871,0.062012,-0.039615,-0.055046,0.021393,-0.003305,-0.042209,-0.028955,-0.044771,-0.022074,-0.034287,-0.050758,0.049952,-0.054767,-0.04083,-0.026766,-0.029299,-0.079035,-0.032409,0.031957,-0.061436,-0.008264,-0.069308,-0.06164,0.065347,-0.066764,-0.10083,-0.006531,-0.079996,-0.124227,-0.128271,-0.121145,-0.038829,-0.012805,-0.06297,-0.205649,0.078917,-0.059082,-0.182484,0.052466,-0.122877,-0.084512,-0.103163,-0.143089,-0.015845,-0.013987,,-0.070136,-0.122298,0.040614,0.076902,-0.180421,-0.103394
4,0.028284,0.031335,-0.085267,0.08051,0.009923,0.02917,-0.018623,0.021535,-0.05131,0.01917,0.024441,0.002407,-0.017851,0.021045,-0.038788,0.026279,0.116595,-0.001992,0.022426,-0.003815,0.017419,0.021486,-0.066047,-0.02205,-0.00805,0.006833,0.194999,0.002787,-0.088242,-0.021073,0.012195,0.043895,-0.003155,0.00293,0.103753,0.044419,-0.020478,0.033584,0.019527,0.036619,0.019147,0.299342,-0.133727,0.070047,0.065991,0.187266,0.029474,-0.010586,0.011292,0.072196,,0.162926,0.001303,0.180404,-0.004721,0.14417,0.065837


In [None]:
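# Regress each stock's excess return on the three factors.
# Note: the "Rm" column already holds the market excess return (Rm - Rf).
X = sm.add_constant(df3[['Rm', 'SMB', 'HML']])  # keeping the DataFrame preserves factor names in the summary
for stock in df3.columns[3:]:
    if df3[stock].isna().all():  # e.g. 4725tw, whose download failed
        continue
    model = sm.OLS(df3[stock], X, missing = 'drop')
    result = model.fit()
    print(result.summary())
    print()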

                            OLS Regression Results                            
Dep. Variable:                 1316tw   R-squared:                       0.204
Model:                            OLS   Adj. R-squared:                  0.138
Method:                 Least Squares   F-statistic:                     3.080
Date:                Sat, 03 Jul 2021   Prob (F-statistic):             0.0396
Time:                        13:16:34   Log-Likelihood:                 55.996
No. Observations:                  40   AIC:                            -104.0
Df Residuals:                      36   BIC:                            -97.24
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0066      0.011      0.603      0.5
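
Printing one full summary per stock gets long with 54 tickers. As an alternative, the sketch below collects the intercept and factor loadings of every stock into a single table. It runs on synthetic data shaped like `df3` (three factor columns followed by stock columns); the tickers `AAAtw`, `BBBtw`, and `CCCtw` are invented, and plain least squares via `numpy.linalg.lstsq` is swapped in for `statsmodels`, so no standard errors or t-statistics are produced.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 40  # number of monthly observations

# Synthetic stand-in for df3: factor columns first, then one column per stock.
data = pd.DataFrame({
    "Rm": rng.normal(0.0, 0.04, n),
    "SMB": rng.normal(0.0, 0.03, n),
    "HML": rng.normal(0.0, 0.03, n),
})
data["AAAtw"] = 1.0 * data["Rm"] + 0.5 * data["SMB"] + rng.normal(0, 0.02, n)
data["BBBtw"] = 0.8 * data["Rm"] - 0.4 * data["HML"] + rng.normal(0, 0.02, n)
data["CCCtw"] = np.nan  # mimics a ticker with no price data

factor_cols = ["Rm", "SMB", "HML"]
X = np.column_stack([np.ones(n), data[factor_cols].to_numpy()])

rows = {}
for stock in data.columns[3:]:
    y = data[stock].to_numpy()
    mask = ~np.isnan(y)             # drop months with missing returns
    if mask.sum() <= X.shape[1]:    # too few observations to fit four coefficients
        continue
    coef, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    rows[stock] = coef

loadings = pd.DataFrame.from_dict(rows, orient="index", columns=["alpha"] + factor_cols)
print(loadings.round(3))
```

Each row holds one stock's estimated alpha and its loadings on the market, SMB, and HML factors, which makes it easy to sort or filter stocks by a given loading.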

## Extension: Carhart 4-Factor Model
- Carhart (1997): [On Persistence in Mutual Fund Performance](https://onlinelibrary.wiley.com/doi/full/10.1111/j.1540-6261.1997.tb03808.x)
    - Adds a momentum factor (past-year winners minus losers) to the Fama-French 3-factor model.
- References
    - https://en.wikipedia.org/wiki/Carhart_four-factor_model

## Extension: Fama-French 5-Factor Model
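- Adds profitability (RMW, robust minus weak) and investment (CMA, conservative minus aggressive) factors to the 3-factor model.
- References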
    - Fama and French (2015): [A Five-Factor Asset Pricing Model](https://www.sciencedirect.com/science/article/abs/pii/S0304405X14002323)
    - Description of Fama/French 5 Factors (2x3) [link](https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data_Library/f-f_5_factors_2x3.html)

## AQR: Buffett's Alpha
- Frazzini, Kabiller, and Pedersen (2018): [Buffett's Alpha](https://www.tandfonline.com/doi/full/10.2469/faj.v74.n4.3)
    - Asness, Moskowitz, and Pedersen (2013): [Value and Momentum Everywhere](https://www.aqr.com/Insights/Research/Journal-Article/Value-and-Momentum-Everywhere)
    - Frazzini and Pedersen (2014): [Betting Against Beta](https://www.sciencedirect.com/science/article/pii/S0304405X13002675)
        - 石川 (2019): [BAB vs. BABAB](https://zhuanlan.zhihu.com/p/58479814)
    - Asness, Frazzini, and Pedersen (2018): [Quality Minus Junk](https://link.springer.com/article/10.1007/s11142-018-9470-2)
    - Gupta and Kelly (2019): [Factor Momentum Everywhere](https://www.aqr.com/Insights/Research/Working-Paper/Factor-Momentum-Everywhere)
- Glossary
    - [名詞解釋：何謂13F報告](https://news.cnyes.com/news/id/4568967) (in Chinese: "Glossary: what is a 13F report?")


## Example: Factor ETFs
- [Vanguard U.S. Momentum Factor ETF](https://www.etf.com/VFMO) (VFMO)
- [Vanguard U.S. Multifactor ETF](https://www.etf.com/VFMF) (VFMF)
- [Top 25 Quality Factor ETFs](https://etfdb.com/themes/quality-factor-etfs/)

### Smart Beta?
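- [富邦道瓊臺灣優質高息30ETF基金](https://www.moneydj.com/ETF/X/Basic/Basic0004.xdjhtm?etfid=00730.TW) (00730.TW; Fubon Dow Jones Taiwan Quality High Dividend 30 ETF)
    - [富邦臺灣優質高息ETF（00730）完整介紹：平均殖利率4.85%！能取代元大高股息嗎？](https://earning.tw/what-is-00730-etf), 2018.02.21 (in Chinese: "A complete introduction to the Fubon Taiwan Quality High Dividend ETF (00730): average yield of 4.85%! Can it replace the Yuanta High Dividend ETF?")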
- References
    - Jason Hsu and Vitali Kalesnik (2014): [Finding Smart Beta in the Factor Zoo](https://www.researchaffiliates.com/en_us/publications/articles/223_finding_smart_beta_in_the_factor_zoo.html)
    - Saud AlMahdi (2015): [Smart beta portfolio optimization](https://file.scirp.org/pdf/JMF_2015052615053472.pdf)

## MSCI Barra Multi-Factor Analysis
- MSCI stands for **M**organ **S**tanley **C**apital **I**nternational.
    - [MSCI 是什麼 ? 最新增減個股一覽](https://www.stockfeel.com.tw/msci-%e5%8f%b0%e7%81%a3-%e6%88%90%e5%88%86%e8%82%a1-%e6%ac%8a%e9%87%8d/) (in Chinese: "What is MSCI? The latest list of added and removed stocks")
    - [MSCI Taiwan Index](https://histock.tw/stock/mscitaiwan.aspx)
- Barra Risk Factor Analysis, proposed by [Barr Rosenberg](http://www.barrrosenberg.com/) in 1975, incorporates over 40 data metrics, including earnings growth, share turnover and senior debt rating.
- References
    - [MSCI Barra Factor Indexes Methodology](https://www.msci.com/eqb/methodology/meth_docs/MSCI_Barra_Factor_Indexes_Methodology_Mar18.pdf), 2018

## Factor Zoo & Factor War
- Kakushadze (2015): [WorldQuant 101 Alphas](https://arxiv.org/ftp/arxiv/papers/1601/1601.00991.pdf)
- Feng, Giglio, and Xiu (2018): [Taming the Factor Zoo](https://www.aqr.com/About-Us/AQR-Insight-Award/2018/Taming-the-Factor-Zoo)
    - Feng, Giglio, and Xiu (2020): [Taming the Factor Zoo: A Test of New Factors](https://onlinelibrary.wiley.com/doi/abs/10.1111/jofi.12883)
- Kewei Hou, Chen Xue, and Lu Zhang (2020): [Replicating Anomalies](http://global-q.org/uploads/1/2/2/6/122679606/houxuezhang2020rfs.pdf)
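    - 石川 (2019): [从 Factor Zoo 到 Factor War，实证资产定价走向何方？](https://zhuanlan.zhihu.com/p/72957469) (in Chinese: "From Factor Zoo to Factor War: where is empirical asset pricing heading?")
- 石川 (2019): [股票多因子模型的回归检验](https://zhuanlan.zhihu.com/p/40984029) (in Chinese: "Regression tests of multi-factor stock models")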
    - Hansen (1982): [Large Sample Properties of Generalized Method of Moments Estimators](https://www.jstor.org/stable/1912775)
    - Shanken (1992): [On the Estimation of Beta-Pricing Models](https://www.jstor.org/stable/2962011)
    - Petersen (2009): [Estimating Standard Errors in Finance Panel Data Sets: Comparing Approaches](https://www.jstor.org/stable/40056916)
- References
    - Mateus, Mateus, and Todorovic (2019): [Review of new trends in the literature on factor models and mutual fund performance](https://www.sciencedirect.com/science/article/abs/pii/S1057521918305222)
    