# Section 1: Linear Regression Application 1: CAPM
The Capital Asset Pricing Model (CAPM) is one of the most important pricing models in finance. It is defined as:
$$
E(R_i)=r_f+\beta \times E(R_m - r_f)
$$
where:
- **$E(R_i)$**: Expected return of asset i.
- **$r_f$**: Risk-free rate.
- **$R_m$**: Return of the market portfolio.
- **$\beta$**: Sensitivity of the asset's expected return $E(R_i)$ to changes in the expected market return $E(R_m)$.
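
As a quick numerical illustration of the formula, here is a minimal sketch. The function name `capm_expected_return` and all input values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def capm_expected_return(r_f: float, beta: float, expected_market_return: float) -> float:
    """Expected return implied by the CAPM: r_f + beta * (E(R_m) - r_f)."""
    return r_f + beta * (expected_market_return - r_f)


# Hypothetical annual inputs, for illustration only
r_f = 0.03                     # risk-free rate
beta = 1.2                     # sensitivity to the market
expected_market_return = 0.08  # expected return of the market portfolio

expected_return = capm_expected_return(r_f, beta, expected_market_return)
print(f"Expected return: {expected_return:.4f}")
```

With a market risk premium of 5% and a beta of 1.2, the asset earns the risk-free rate plus 1.2 times that premium.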
In its excess-return form, the CAPM implies a zero intercept, whereas a linear regression estimates the intercept freely from the data. To test the model empirically, we therefore add an intercept $\alpha$ and a disturbance term, and regress realized excess returns:
$$
R_i - r_f = \alpha +\beta \times (R_m - r_f) + \epsilon_i
$$
where: 
- **$\alpha$**: Jensen's alpha (or simply alpha). Under the CAPM, alpha should be zero. Asset i is considered underpriced (overpriced) if alpha is statistically significant and > 0 (< 0).
- **$\epsilon_i$**: Disturbance term in the linear regression.
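
To make the roles of alpha and beta concrete, the sketch below computes the OLS intercept and slope by hand, using the textbook formulas (slope = covariance / variance, intercept = mean of Y minus slope times mean of X). The helper `ols_alpha_beta` and the return series are made up for illustration. The notebook itself uses `statsmodels` for the actual estimation.

```python
def ols_alpha_beta(x, y):
    """Return (alpha, beta) from a simple OLS regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    beta = cov_xy / var_x
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Made-up monthly excess returns (not real data)
market_excess = [0.02, -0.01, 0.03, 0.00, -0.02]
# Asset excess returns built to lie exactly on a line with alpha=0.001, beta=0.5
asset_excess = [0.001 + 0.5 * m for m in market_excess]

alpha_hat, beta_hat = ols_alpha_beta(market_excess, asset_excess)
print(f"alpha = {alpha_hat:.4f}, beta = {beta_hat:.4f}")
```

Because the made-up data contain no noise, the estimates recover the alpha and beta used to build them. With real returns the disturbance term makes both estimates uncertain, which is why we look at their standard errors and p-values.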


In this section, we will estimate our first financial regression model, based on the CAPM.

In [1]:
import pandas as pd
import numpy as np 
import yfinance as yf 
import statsmodels.api as sm 


In [2]:
# Step 1: Select a security and download the data with the yfinance package

# I selected Walmart (WMT); SPY serves as the proxy for the market portfolio
tickers = ["WMT", "SPY"]

# I use historical data from 2014 through 2024 (about 11 years)
start_date = "2014-01-01"
end_date = "2025-01-01"

# I use monthly data
freq = "1mo"

# Finally, download the data from yfinance and keep the closing prices only
p_close = yf.download(tickers, start=start_date, end=end_date, interval=freq)["Close"]

p_close


[*********************100%***********************]  2 of 2 completed


Date,SPY,WMT
2014-01-01 00:00:00+00:00,178.179993,24.893333
2014-02-01 00:00:00+00:00,186.289993,24.900000
2014-03-01 00:00:00+00:00,187.009995,25.476667
2014-04-01 00:00:00+00:00,188.309998,26.570000
2014-05-01 00:00:00+00:00,192.679993,25.590000
...,...,...
2024-08-01 00:00:00+00:00,563.679993,77.230003
2024-09-01 00:00:00+00:00,573.760010,80.750000
2024-10-01 00:00:00+00:00,568.640015,81.949997
2024-11-01 00:00:00+00:00,602.549988,92.500000


In [3]:
# Step 2: Calculate the monthly log returns of Walmart and SPY.

r = np.log(p_close) - np.log(p_close.shift(1))

returns = r.dropna()

returns

Date,SPY,WMT
2014-02-01 00:00:00+00:00,0.044510,0.000268
2014-03-01 00:00:00+00:00,0.003857,0.022895
2014-04-01 00:00:00+00:00,0.006927,0.042020
2014-05-01 00:00:00+00:00,0.022941,-0.037581
2014-06-01 00:00:00+00:00,0.015654,-0.022393
...,...,...
2024-08-01 00:00:00+00:00,0.023097,0.117913
2024-09-01 00:00:00+00:00,0.017725,0.044570
2024-10-01 00:00:00+00:00,-0.008964,0.014751
2024-11-01 00:00:00+00:00,0.057923,0.121099
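
The returns above are log returns, not simple percentage returns. As a small check, the sketch below recomputes the first SPY return from the first two SPY closing prices shown in the price table and compares it with the simple return. Only the two prices come from the notebook; the variable names are illustrative.

```python
import math

# First two SPY monthly closing prices from the price table (2014-01 and 2014-02)
p_jan = 178.179993
p_feb = 186.289993

simple_return = p_feb / p_jan - 1               # percentage change
log_return = math.log(p_feb) - math.log(p_jan)  # same formula as np.log(p) - np.log(p.shift(1))

print(f"simple return: {simple_return:.6f}")
print(f"log return:    {log_return:.6f}")
```

The two definitions are linked by log return = ln(1 + simple return), so they are close for small monthly moves and drift apart as returns get larger.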


In [4]:
# Step 3: Determine the risk-free rate

# To save time, I assume an annual risk-free rate of 3% for the entire class and semester.

# We need to convert this annual rate to a monthly rate.

r_f = (1+0.03)**(1/12) - 1

r_f


0.0024662697723036864
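
There is more than one way to turn an annual rate into a monthly one. The sketch below compares the compounding conversion used above with two common alternatives. The variable names are illustrative, and which convention to prefer is a modelling choice rather than something the notebook prescribes.

```python
import math

annual_rate = 0.03

# Compounding conversion (the one used in the cell above)
monthly_compound = (1 + annual_rate) ** (1 / 12) - 1

# Simple division, a common approximation
monthly_simple = annual_rate / 12

# Continuously compounded (log) monthly rate
monthly_log = math.log(1 + annual_rate) / 12

print(f"compound: {monthly_compound:.6f}")
print(f"simple:   {monthly_simple:.6f}")
print(f"log:      {monthly_log:.6f}")
```

Since the stock returns in this notebook are log returns, the log rate is the strictly matching counterpart, but at a 3% annual rate the three values differ only in the fifth or sixth decimal place.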

In [5]:
# Step 4: Compute the excess return of Walmart (Y) and the market risk premium (X)

Y = returns["WMT"] - r_f

X = returns["SPY"] - r_f




In [6]:
# Step 5: Run the linear regression

X = sm.add_constant(X)

capm = sm.OLS(Y, X).fit()

print(capm.summary())

                            OLS Regression Results                            
Dep. Variable:                    WMT   R-squared:                       0.150
Model:                            OLS   Adj. R-squared:                  0.144
Method:                 Least Squares   F-statistic:                     22.82
Date:                Tue, 31 Dec 2024   Prob (F-statistic):           4.77e-06
Time:                        19:01:33   Log-Likelihood:                 209.66
No. Observations:                 131   AIC:                            -415.3
Df Residuals:                     129   BIC:                            -409.6
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0042      0.004      0.969      0.3

In [7]:
#Step 6: Interpret the regression result.
# Based on the regression, the beta (the slope of the regression) of Walmart is 0.4755, which I round to 0.48. The alpha (the constant term)
# of Walmart is 0.0042. Based on this information, what is your comment on Walmart's stock?
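
One way to start the interpretation is to check whether alpha is statistically different from zero before reading anything into its sign. The sketch below uses the constant's coefficient and t-statistic as printed in the regression summary. The critical value of about 1.98 is an approximation for a two-sided 5% test with roughly 129 degrees of freedom, and multiplying by 12 is only a rough annualization for log returns.

```python
# Values read from the "const" row of the regression summary
alpha_monthly = 0.0042
t_stat_alpha = 0.969

# Approximate two-sided 5% critical value for about 129 degrees of freedom (assumption)
critical_value = 1.98

alpha_is_significant = abs(t_stat_alpha) > critical_value

# Rough annualization: monthly log excess returns add up across months
alpha_annual = 12 * alpha_monthly

print(f"alpha significant at 5%? {alpha_is_significant}")
print(f"annualized alpha (rough): {alpha_annual:.4f}")
```

Here the t-statistic is well below the critical value, so the positive alpha is not distinguishable from zero at the 5% level, and the data do not support calling the stock underpriced.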







In [8]:
# In-Class Exercise 1 (10 minutes): Select a stock and replicate the entire regression process. Based on your estimation results, introduce this stock and tell me your thoughts.













# Section 2: Linear Regression Application 2: Multifactor Model and Arbitrage Pricing Theory (APT)
The CAPM assumes that a stock's return is driven by market fluctuations only. In practice, other information also helps explain returns and can be added to the regression model: for example, macro indicators such as interest rates and GDP growth, or firm-specific information such as size (total assets) and revenue. Multifactor models are widely used in hedging, stock pitches, and statistical arbitrage.
The multifactor model is shown below:
$$
E(R_i)=r_f+\beta_1 \times \lambda_1 + \beta_2 \times \lambda_2+...+\beta_n \times \lambda_n
$$
where:
- **$E(R_i)$**: Expected return of asset i.
- **$r_f$**: Risk-free rate.
- **$\lambda_n$**: Risk premium of factor n (or factor premium), calculated as $R_n-r_f$.
- **$\beta_n$**: Sensitivity of $E(R_i)$ to changes in factor n.
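
The multifactor formula is a sum of beta-times-premium terms on top of the risk-free rate. A minimal sketch, with a hypothetical helper `multifactor_expected_return` and made-up betas and factor premiums:

```python
def multifactor_expected_return(r_f, betas, premiums):
    """Expected return: r_f + sum over factors of beta_k * lambda_k."""
    if len(betas) != len(premiums):
        raise ValueError("betas and premiums must have the same length")
    return r_f + sum(b * lam for b, lam in zip(betas, premiums))


# Hypothetical annual inputs for a three-factor example
r_f = 0.03
betas = [1.0, 0.5, -0.2]       # exposures to factors 1..3
premiums = [0.05, 0.02, 0.03]  # factor risk premiums lambda_1..lambda_3

expected_return = multifactor_expected_return(r_f, betas, premiums)
print(f"Expected return: {expected_return:.4f}")
```

With a single factor equal to the market risk premium, the same function reduces to the CAPM.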

Again, the multifactor model implies a zero intercept, whereas a linear regression estimates the intercept freely from the data. To test the model empirically, we add an intercept $\alpha$ and a disturbance term, and regress realized excess returns on the factor premiums:
$$
R_i - r_f = \alpha +\beta_1 \times \lambda_1 + \beta_2 \times \lambda_2+\cdots+\beta_n \times \lambda_n+\epsilon_i
$$
where: 
- **$\alpha$**: Jensen's alpha (or simply alpha). Asset i is considered underpriced (overpriced) if alpha is statistically significant and > 0 (< 0).
- **$\epsilon_i$**: Disturbance term in the multiple linear regression.


In this section, we will estimate a regression based on the multifactor model.

## Section 2.1: Fama-French 3-factor and 5-factor models
Eugene Fama and Kenneth French developed their 3-factor model in 1992. It expands the CAPM by adding size and value risk factors to the market risk factor. Fama shared the 2013 Nobel Memorial Prize in Economic Sciences for his empirical research on asset prices, including his work on the efficient market hypothesis. The 3-factor model is:

$$
R_{i,t}-r_{f,t}=\alpha_i+\beta_1(R_{m,t}-r_{f,t})+\beta_2SMB_t+\beta_3HML_t+\epsilon_{i,t}
$$
where:
- **$R_{i,t}$**: Return of asset i at time t.
- **$r_{f,t}$**: Risk-free rate at time t.
- **$R_{m,t}$**: Return of the market portfolio at time t, so that $R_{m,t}-r_{f,t}$ is the market risk premium.
- **$SMB_t$**: Size premium (small minus big) at time t.
- **$HML_t$**: Value premium (high minus low book-to-market) at time t.
- **$\beta_1, \beta_2, \beta_3$**: Factor coefficients.

The 5-factor model is:

$$
R_{i,t}-r_{f,t}=\alpha_i+\beta_1(R_{m,t}-r_{f,t})+\beta_2SMB_t+\beta_3HML_t+\beta_4RMW_t+\beta_5CMA_t+\epsilon_{i,t}
$$
where:
- **$RMW_t$**: Profitability premium: the return of firms with robust profitability minus that of firms with weak profitability (robust minus weak) at time t.
- **$CMA_t$**: Investment premium: the return of firms that invest conservatively minus that of firms that invest aggressively (conservative minus aggressive) at time t.
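
Estimating these models is an ordinary multiple regression of excess returns on the factor series. The sketch below illustrates the 3-factor case on synthetic data with NumPy's least-squares solver. Every number in it is simulated and none of it is real Fama-French data. The 5-factor version would simply add two more columns to the design matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n_months = 240

# Synthetic monthly factor returns in decimals (NOT real Fama-French data)
mkt_rf = rng.normal(0.006, 0.04, n_months)
smb = rng.normal(0.002, 0.03, n_months)
hml = rng.normal(0.003, 0.03, n_months)
factors = np.column_stack([mkt_rf, smb, hml])

# Simulate an asset with known alpha and betas plus noise
true_alpha = 0.001
true_betas = np.array([0.5, -0.2, 0.3])
noise = rng.normal(0.0, 0.01, n_months)
excess_return = true_alpha + factors @ true_betas + noise

# Design matrix: a column of ones for alpha, then the three factors
design = np.column_stack([np.ones(n_months), factors])
coefs, *_ = np.linalg.lstsq(design, excess_return, rcond=None)
alpha_hat, beta_hat = coefs[0], coefs[1:]

residuals = excess_return - design @ coefs
print("alpha:", round(float(alpha_hat), 4))
print("betas:", np.round(beta_hat, 3))
```

The estimated betas land close to the simulated ones, which is a useful sanity check of the setup before it is applied to real returns.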

Let's apply these two models to re-estimate the regression for Walmart's returns.


In [10]:
factor_3 = pd.read_csv("3_factor.CSV")

factor_3

Unnamed: 0,date,Mkt-RF,SMB,HML,RF
0,192607,2.96,-2.56,-2.43,0.22
1,192608,2.64,-1.17,3.82,0.25
2,192609,0.36,-1.40,0.13,0.23
3,192610,-3.24,-0.09,0.70,0.32
4,192611,2.53,-0.10,-0.51,0.31
...,...,...,...,...,...
1175,202406,2.77,-3.06,-3.31,0.41
1176,202407,1.24,6.80,5.74,0.45
1177,202408,1.61,-3.55,-1.13,0.48
1178,202409,1.74,-0.17,-2.59,0.40
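
Before the factor data can be combined with the stock returns, the `date` column (stored as integers such as 192607) needs to become a proper date, and the values need to be on the same scale as the returns. The sketch below assumes the factor values are quoted in percent, as in Kenneth French's data library, and retypes the first three rows shown above so that it runs on its own. The name `factors` is illustrative.

```python
import pandas as pd

# First three rows of the factor table, typed in by hand so the example is self-contained
factor_3 = pd.DataFrame({
    "date": [192607, 192608, 192609],
    "Mkt-RF": [2.96, 2.64, 0.36],
    "SMB": [-2.56, -1.17, -1.40],
    "HML": [-2.43, 3.82, 0.13],
    "RF": [0.22, 0.25, 0.23],
})

factors = factor_3.copy()

# Convert YYYYMM integers to month-start timestamps and use them as the index
factors["date"] = pd.to_datetime(factors["date"].astype(str), format="%Y%m")
factors = factors.set_index("date")

# Assumption: values are in percent, so divide by 100 to match decimal returns
factors = factors / 100

print(factors)
```

When joining with the yfinance returns, the two date indexes also have to match. The return index printed earlier is timezone-aware, so it would likely need its timezone removed first.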
