## Ridge regression

- Consider the linear model

. . .

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n + \varepsilon$$

- The usual (OLS) estimate $\hat\beta$ of $\beta=(\beta_0, \ldots, \beta_n)$ minimizes SSE = sum of squared errors

- Ridge regression minimizes SSE plus a penalty for large $\hat \beta$'s, with penalty weight $\alpha \ge 0$

$$\text{SSE} + \alpha \sum_{i=0}^n \hat \beta_i^2$$
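The objective above can be evaluated directly. The following is a minimal sketch on synthetic data, not the course data: the function name `ridge_objective`, the simulated `X` and `y`, and the value `alpha = 10` are all assumptions made for illustration. The ridge coefficients are obtained from the standard closed-form solution of the penalized least-squares problem, with the intercept penalized as in the formula above.

```python
import numpy as np

def ridge_objective(beta_hat, X, y, alpha):
    """SSE plus alpha times the sum of squared coefficients (intercept included)."""
    resid = y - X @ beta_hat
    return resid @ resid + alpha * np.sum(beta_hat ** 2)

# Synthetic data (illustrative only): m observations, n regressors plus a column of 1's
rng = np.random.default_rng(0)
m, n = 200, 3
X = np.column_stack([np.ones(m), rng.normal(size=(m, n))])
y = X @ np.array([0.5, 1.0, -2.0, 0.0]) + rng.normal(size=m)

alpha = 10.0  # hypothetical penalty weight

# OLS minimizes SSE alone
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Ridge minimizes SSE + alpha * sum of squares: solve (X'X + alpha I) b = X'y
beta_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(n + 1), X.T @ y)

sse_ols = ridge_objective(beta_ols, X, y, alpha=0.0)
sse_ridge = ridge_objective(beta_ridge, X, y, alpha=0.0)

print("SSE (OLS, ridge):", sse_ols, sse_ridge)
print("Sum of squared coefficients (OLS, ridge):",
      np.sum(beta_ols ** 2), np.sum(beta_ridge ** 2))
```

Ridge accepts a somewhat larger SSE in exchange for smaller coefficients. Note that scikit-learn's `Ridge` with its default `fit_intercept=True` leaves the intercept unpenalized, so its estimates can differ slightly from this sketch.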

## Why the name "ridge"?

- Let $m$ = number of observations, $y$ = $m$-vector of dependent variable observations, $X$ = $m \times (n+1)$ matrix of independent variable observations whose first column is a column of 1's, and $\beta$ = $(n+1)$-vector of coefficients.
- SSE = $(y-X\hat \beta)^\top (y-X \hat\beta) = y^\top y - 2 y^\top X \hat\beta + \hat\beta^\top X^\top X \hat\beta$
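
One standard way to finish the argument is sketched below. The symbol $I$, the $(n+1) \times (n+1)$ identity matrix, is notation introduced here and not defined above. Adding the penalty $\alpha \hat\beta^\top \hat\beta$ to the expansion of SSE gives

$$\text{SSE} + \alpha \hat\beta^\top \hat\beta = y^\top y - 2 y^\top X \hat\beta + \hat\beta^\top (X^\top X + \alpha I) \hat\beta$$

Setting the gradient with respect to $\hat\beta$ equal to zero:

$$-2 X^\top y + 2 (X^\top X + \alpha I) \hat\beta = 0 \quad \Longrightarrow \quad \hat\beta = (X^\top X + \alpha I)^{-1} X^\top y$$

Compared with the OLS solution $(X^\top X)^{-1} X^\top y$, the penalty adds $\alpha$ to every diagonal element of $X^\top X$. This added "ridge" along the diagonal is the explanation commonly given for the name.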


## New imports

. . .

```{.p}
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import Ridge
```

. . .

Create the pipeline as before, but use `Ridge` instead of `LinearRegression`
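
A minimal sketch of that swap, run on synthetic data so that it is self-contained: the simulated `X` and `y`, the helper `qt`, the `alpha` grid, `n_quantiles=100`, and `check_inverse=False` are assumptions made for illustration and are not taken from the course code. `GridSearchCV` chooses the penalty weight by cross-validation. The long parameter name follows from the step names that `make_pipeline` generates automatically.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, QuantileTransformer

# Synthetic stand-in for the three predictors and the excess return
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 0.1 * X[:, 0] - 0.05 * X[:, 1] + rng.normal(size=300)

def qt():
    # Hypothetical helper: n_quantiles is reduced to suit the small sample
    return QuantileTransformer(output_distribution="normal", n_quantiles=100)

# Same structure as the earlier pipeline, with Ridge as the regressor
model = TransformedTargetRegressor(
    transformer=qt(),
    regressor=Ridge(),
    check_inverse=False,  # assumption: skip the inverse check to keep the demo quiet
)
pipe = make_pipeline(qt(), PolynomialFeatures(degree=2), qt(), model)

# Search over the Ridge penalty weight; the grid values are illustrative
param_grid = {
    "transformedtargetregressor__regressor__alpha": [0.01, 0.1, 1.0, 10.0, 100.0]
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
```

After fitting, `search.best_estimator_` is the pipeline refit on all of the data with the selected `alpha`, and `search.predict` uses it directly.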

##


In [None]:
from sqlalchemy import create_engine
import pymssql
import pandas as pd

# Database connection details
server = "mssql-82792-0.cloudclusters.net:16272"
username = "user"
password = "RiceOwls1912"
database = "ghz"
string = f"mssql+pymssql://{username}:{password}@{server}/{database}"
conn = create_engine(string).connect()

df = pd.read_sql(
    """
    select date, ticker, ret, roeq, mom12m
    from data
    where date>='2010-01' and date<='2017-12'
    order by date, ticker
    """, 
    conn
)
df = df.dropna()

In [None]:
from pandas_datareader import DataReader as pdr

mkt = pdr(
  "F-F_Research_Data_Factors",
  "famafrench",
  start="2009-12",
  end="2017-12"
)

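mkt = mkt[0] / 100  # monthly factors, converted from percent to decimal
mkt.index = mkt.index.astype(str)  # string index so it can be merged on df's date column
mkt["mkt"] = mkt["Mkt-RF"] + mkt["RF"]  # total market return
mkt["lagmkt"] = mkt["mkt"].shift()  # previous month's market return
df = df.merge(mkt, left_on="date", right_index=True, how="inner")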

In [None]:
from sklearn.preprocessing import QuantileTransformer
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

model = TransformedTargetRegressor(
    transformer=QuantileTransformer(
      output_distribution="normal"
    ),
    regressor=LinearRegression()
)

In [None]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

pipe = make_pipeline(
  QuantileTransformer(output_distribution="normal"),
  PolynomialFeatures(degree=2),
  QuantileTransformer(output_distribution="normal"),
  model
)

X = df[["roeq", "mom12m", "lagmkt"]]
y = df.ret - df.mkt
_ = pipe.fit(X, y)