# **Runtime Dependencies: Must Run First!**

In [None]:
import pandas as pd
from pandas.tseries.offsets import MonthEnd
from datetime import datetime
from matplotlib import pyplot as plt

# Statsmodels API - Standard
import statsmodels.api as sm

# Statsmodels API - Formulaic
import statsmodels.formula.api as smf

# Bonus: Multiple Outputs Per Cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# **Module 9 - Topic 1: Linear Regression with Statsmodels, Part 2**


## **Module 9.1.5: Multivariate Linear Regression**

In the last notebook, I introduced CAPM and regressed excess stock returns on excess market returns to estimate each stock's alpha (y-intercept) and beta (slope coefficient).

In this notebook, I'm going to cover running a Fama-French 3-factor regression, which adds two more x's (SMB and HML) alongside the market risk premium!

I'm starting below with importing the data sets from the first notebook:

In [None]:
loc = "https://github.com/mhall-simon/python/blob/main/data/misc/stocks-factors-capm.xlsx?raw=true"
aapl = pd.read_excel(loc, sheet_name="AAPL", index_col=0, parse_dates=True)
aapl['Return'] = (aapl.Close - aapl.Close.shift(1)) / aapl.Close.shift(1)
aapl.dropna(inplace=True)
aapl.index = aapl.index + MonthEnd(1)
R = pd.DataFrame(aapl.Return)
R = R.rename(columns={"Return":"AAPL"})
amzn = pd.read_excel(loc, sheet_name="AMZN", index_col=0, parse_dates=True)
tsla = pd.read_excel(loc, sheet_name="TSLA", index_col=0, parse_dates=True)
amzn['Return'] = (amzn.Close - amzn.Close.shift(1)) / amzn.Close.shift(1)
tsla['Return'] = (tsla.Close - tsla.Close.shift(1)) / tsla.Close.shift(1)
amzn.dropna(inplace=True)
tsla.dropna(inplace=True)
amzn.index = amzn.index + MonthEnd(1)
tsla.index = tsla.index + MonthEnd(1)
R = pd.merge(R, tsla.Return, left_index=True, right_index=True)
R = pd.merge(R, amzn.Return, left_index=True, right_index=True)
R = R.rename(columns={"Return_x":"TSLA","Return_y":"AMZN"})

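# Fama-French factor sheet: the header is on row index 3 and dates are YYYYMM
# Note: `date_parser` is deprecated as of pandas 2.0 in favor of `date_format`
dp = lambda x: datetime.strptime(x, "%Y%m")
ff = pd.read_excel(loc, sheet_name="MktRf", index_col=0, parse_dates=True, date_parser=dp, header=3)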
ff.index = ff.index + MonthEnd(1)

R = pd.merge(R, ff, left_index=True, right_index=True)

R.AAPL = R.AAPL - R.RF/12
R.TSLA = R.TSLA - R.RF/12
R.AMZN = R.AMZN - R.RF/12

R = R.rename(columns={'Mkt-RF':'MRP'})

R.head()
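
The return calculation and the month-end alignment in the cell above can be checked on a tiny example. This is a minimal sketch with made-up prices, assuming the price rows are stamped at the start of each month; `prices`, `manual`, `shortcut`, and `month_end_index` are hypothetical names used only here.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Hypothetical month-start closing prices, for illustration only
prices = pd.DataFrame(
    {"Close": [100.0, 110.0, 99.0, 108.9]},
    index=pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01"]),
)

# Manual simple return, written the same way as in the cell above
manual = (prices.Close - prices.Close.shift(1)) / prices.Close.shift(1)

# pandas shortcut for the same calculation
shortcut = prices.Close.pct_change()

# Month-start stamps roll forward to the end of the same month
month_end_index = prices.index + MonthEnd(1)

print(manual)
print(month_end_index)
```

One thing to watch: if a date already sits on a month end, `+ MonthEnd(1)` moves it to the *next* month end, while `+ MonthEnd(0)` leaves it where it is. Which one is right depends on how the raw dates are stamped.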

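When we run a multivariate linear regression, we get a fitted equation of the form below, where $b$ is the y-intercept (alpha) and each $m_i$ is the slope coefficient on factor $x_i$:

$$y = b + m_1x_1 + m_2x_2 + \dots + m_nx_n$$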

And here's how we can easily run the regression using statsmodels:

In [None]:
Y = R.AAPL

X = R[['MRP','SMB','HML']]
X = sm.add_constant(X)

model = sm.OLS(Y,X)
res = model.fit()

print(res.summary())

Very easy!

By including these additional factors, our R squared value increased from 0.365 to 0.495!

Another interesting point is that the y-intercept moved from -0.0792 to 0.0040, becoming much closer to zero! Over the 5 years of observed data, once the additional factors are included, there is very little alpha left for Apple!

*Tip: The coefficient of determination (R squared) is the share of the variance in the response that the model explains. If it's unfamiliar, Google search it to learn more about OLS and linear regression!*

## **Module 9.1.6: Multivariate Linear Regression, Formulaic Method**

We can also use the formulaic API for statsmodels!

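This is the same three-factor regression with different syntax! The formula API adds the intercept for us, so there's no `add_constant` step. Note that the cell below uses AMZN as the response, so its numbers will differ from the AAPL results above.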

In [None]:
resf = smf.ols(formula='AMZN ~ MRP + SMB + HML', data=R).fit()

print(resf.summary())

## **Statsmodels Linear Regression Summarized**

Using statsmodels, if our data is in a DataFrame, it's really easy to run a linear regression!

Using the standard API it looks like this:

```python
Y = ___ # The Response Variable

X = ___ # The Predictor Variable(s)
X = sm.add_constant(X)

model = sm.OLS(Y,X)
res = model.fit()

print(res.summary())
```

Or we can use the formulaic method:

```python
resf = smf.ols(formula='response ~ predictors', data=df).fit()

print(resf.summary())
```

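And we can pull out our coefficients via the following, where `'name'` is the label of a predictor (or `'const'` for the intercept when using `sm.add_constant`):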

```python
res.params['name']
```