# Reconciling alpha in statsmodels and sklearn ridge regression

This analysis by Paul Zivich (https://sph.unc.edu/adv_profile/paul-zivich/) explains how to get the same ridge regression results from statsmodels and sklearn. The difference is that sklearn's Ridge in effect scales the 'alpha' regularization input during execution as alpha / n_samples, where n_samples is the number of samples, whereas statsmodels does not apply this scaling. The two ridge implementations match if you pass statsmodels the rescaled value alpha_statsmodels = alpha_sklearn / n_samples. Note that this rescaling applies only to ridge regression: for Lasso regression, sklearn and statsmodels give matching results with exactly the same alpha values, without rescaling.

Here is a link to the original post of this analysis by Paul Zivich on stackoverflow.com:  

https://stackoverflow.com/questions/72260808/mismatch-between-statsmodels-and-sklearn-ridge-regression

While comparing statsmodels and sklearn, Paul found that the two libraries produce different output for ridge regression. Below is a simple example of the difference:

In [12]:
import numpy as np
import pandas as pd 
import statsmodels.api as sm
from sklearn.linear_model import Lasso, Ridge

np.random.seed(142131)

n = 500
d = pd.DataFrame()
d['A'] = np.random.normal(size=n)
d['B'] = d['A'] + np.random.normal(scale=0.25, size=n)
d['C'] = np.random.normal(size=n)
d['D'] = np.random.normal(size=n)
d['intercept'] = 1
d['Y'] = 5 - 2*d['A'] + 1*d['D'] + np.random.normal(size=n)

y = np.asarray(d['Y'])
X = np.asarray(d[['intercept', 'A', 'B', 'C', 'D']])

First, using sklearn and ridge:

In [13]:
alpha_sklearn = 1
ridge = Ridge(alpha=alpha_sklearn, fit_intercept=True)
ridge.fit(X=np.asarray(d[['A', 'B', 'C', 'D']]), y=y)
print('ridge params from sklearn intercept and coefs: \n',ridge.intercept_, ridge.coef_)

ridge params from sklearn intercept and coefs: 
 4.997208595888691 [-2.00968258  0.03363013 -0.02144874  1.02895154]


Next, statsmodels and OLS.fit_regularized:

In [14]:
alpha_statsmodels = np.array([0, 1., 1., 1., 1.])
ols = sm.OLS(y, X).fit_regularized(L1_wt=0., alpha=alpha_statsmodels)
print('ridge params from statsmodels: \n',ols.params)

ridge params from statsmodels: 
 [ 5.01623298e+00 -6.91643749e-01 -6.39008772e-01  1.55825435e-03
  5.51575433e-01]


The statsmodels fit gives [5.01623, -0.69164, -0.63901, 0.00156, 0.55158], whereas the sklearn fit above gives [4.99721, -2.00968, 0.03363, -0.02145, 1.02895]. Since both implement ridge regression, Paul expected them to be the same.

Note that neither of these penalizes the intercept term (Paul checked that as a possible difference). Paul also found that statsmodels and sklearn provide the same output for LASSO regression. Below is a demonstration with the previous data:

In [15]:
# sklearn LASSO
alpha_sklearn = 0.5
lasso = Lasso(alpha=alpha_sklearn, fit_intercept=True)
lasso.fit(X=np.asarray(d[['A', 'B', 'C', 'D']]), y=y)
print('lasso params from sklearn intercept and coefs: \n',lasso.intercept_, lasso.coef_)

# statsmodels LASSO
alpha_statsmodels = np.array([0, 0.5, 0.5, 0.5, 0.5])
ols = sm.OLS(y, X).fit_regularized(L1_wt=1., alpha=alpha_statsmodels)
print('lasso params from statsmodels: \n',ols.params)

lasso params from sklearn intercept and coefs: 
 5.014649977131442 [-1.5183174  -0.          0.          0.57799164]
lasso params from statsmodels: 
 [ 5.01464998 -1.51831729  0.          0.          0.57799166]


Both output [5.01465, -1.51832, 0., 0., 0.57799].

So Paul's question is: why do the estimated coefficients for ridge regression differ between the sklearn and statsmodels implementations?

After digging around a little more, Paul found by trial and error why the statsmodels and sklearn ridge regression results differ. In effect, sklearn's Ridge scales the regularization term as alpha_scaled = alpha_input / n, where n is the number of observations and alpha_input is the alpha value passed to sklearn. statsmodels does not apply this scaling. The statsmodels and sklearn ridge implementations match if you divide the alpha used for sklearn by n when preparing the alpha input for statsmodels.

In other words, if you use the following input values of alpha for sklearn:

alpha_sklearn = 1

then you need to pass the following rescaled alpha to statsmodels to get the same result:

alpha_statsmodels = alpha_sklearn / n_samples

where n_samples is the number of samples (n_samples = X.shape[0]).
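The conversion can be wrapped in a small helper. This is a minimal sketch: `statsmodels_ridge_alpha` and its parameters are hypothetical names introduced here for illustration, not part of either library, and it assumes the intercept is the first column of the design matrix, as in the examples in this post.

```python
import numpy as np

def statsmodels_ridge_alpha(alpha_sklearn, n_samples, n_features, has_intercept=True):
    """Hypothetical helper: build the alpha vector for
    sm.OLS(...).fit_regularized(L1_wt=0., alpha=...) that corresponds to
    Ridge(alpha=alpha_sklearn) fitted on n_samples rows."""
    # Rescale the sklearn alpha by the number of observations.
    penalties = np.full(n_features, alpha_sklearn / n_samples, dtype=float)
    if has_intercept:
        # A leading 0 leaves the intercept column unpenalized
        # (assumes the intercept is the first column of X).
        penalties = np.concatenate(([0.0], penalties))
    return penalties

# Values from the example in this post: alpha = 1, n = 500, four predictors.
alpha_vec = statsmodels_ridge_alpha(alpha_sklearn=1.0, n_samples=500, n_features=4)
print(alpha_vec)
```

For the example data this gives a zero for the intercept followed by four entries of 1/500 = 0.002.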

Using Paul's posted example, here is how to make the ridge regression parameters match between statsmodels and sklearn:

In [16]:
# sklearn 
# NOTE: there is no difference from above
alpha_sklearn = 1
ridge = Ridge(alpha=alpha_sklearn, fit_intercept=True)
ridge.fit(X=np.asarray(d[['A', 'B', 'C', 'D']]), y=y)
print('ridge params from sklearn intercept and coefs: \n',ridge.intercept_, ridge.coef_)

# statsmodels
# NOTE: going to re-scale the regularization parameter based on n observations
n_samples = X.shape[0]
alpha_statsmodels = np.array([0, 1., 1., 1., 1.]) / n_samples  # scaling of alpha by n
ols = sm.OLS(y, X).fit_regularized(L1_wt=0., alpha=alpha_statsmodels)
print('ridge params from statsmodels with alpha=alpha/n: \n',ols.params)

ridge params from sklearn intercept and coefs: 
 4.997208595888691 [-2.00968258  0.03363013 -0.02144874  1.02895154]
ridge params from statsmodels with alpha=alpha/n: 
 [ 4.9972086  -2.00968258  0.03363013 -0.02144874  1.02895154]


Now both output [4.99721, -2.00968, 0.03363, -0.02145, 1.02895].
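The same relationship can be checked without either library by solving the two penalized least-squares problems in closed form with NumPy. This is only a sketch under stated assumptions: it assumes one estimator minimizes the residual sum of squares without dividing by n, and the other divides it by 2n and halves the squared penalty. It also uses freshly simulated data rather than Paul's dataset, and the function names are made up for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Simulated design matrix: intercept column first, then four predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
y = 5 - 2 * X[:, 1] + X[:, 4] + rng.normal(size=n)

def ridge_unscaled_loss(X, y, penalty):
    """Minimize ||y - Xw||^2 + sum_j penalty_j * w_j^2 (loss not divided by n)."""
    return np.linalg.solve(X.T @ X + np.diag(penalty), X.T @ y)

def ridge_scaled_loss(X, y, penalty):
    """Minimize ||y - Xw||^2 / (2n) + sum_j penalty_j * w_j^2 / 2."""
    n_samples = X.shape[0]
    return np.linalg.solve(X.T @ X / n_samples + np.diag(penalty),
                           X.T @ y / n_samples)

# Penalty of 1 on each predictor, 0 on the intercept (left unpenalized).
penalty = np.array([0.0, 1.0, 1.0, 1.0, 1.0])

w_unscaled = ridge_unscaled_loss(X, y, penalty)
w_scaled_same_alpha = ridge_scaled_loss(X, y, penalty)     # same alpha: differs
w_scaled_rescaled = ridge_scaled_loss(X, y, penalty / n)   # alpha / n: matches

print(np.allclose(w_unscaled, w_scaled_same_alpha))
print(np.allclose(w_unscaled, w_scaled_rescaled))
```

Under these assumptions, the two solutions agree only once the penalty passed to the scaled-loss version is divided by n, which mirrors the behaviour seen with sklearn and statsmodels above.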

Paul posted this analysis in the hope that anyone else trying to match ridge regression results between statsmodels and sklearn can find the answer more easily, since Paul had not seen any discussion of this difference before. It is also noteworthy that sklearn's Ridge re-scales the tuning parameter but sklearn's Lasso does not. Paul was not able to find an explanation of this behaviour in the sklearn documentation for Ridge and LASSO.
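One possible reading of the difference, based on the objective functions stated in the documentation of sklearn's Ridge and Lasso and of statsmodels' OLS.fit_regularized rather than on Paul's post, is sketched below. The symbols are assumptions of this sketch: $w$ is the coefficient vector, $n$ is the number of samples, and $\alpha_{\text{sk}}$, $\alpha_{\text{sm}}$ are the alpha inputs to sklearn and statsmodels. The intercept is left out of the penalty for simplicity.

```latex
\begin{align*}
\text{sklearn Ridge:}\quad
  & \min_{w}\ \lVert y - Xw \rVert_2^2 + \alpha_{\text{sk}} \lVert w \rVert_2^2 \\
\text{statsmodels, L1\_wt}=0\text{:}\quad
  & \min_{w}\ \frac{1}{2n} \lVert y - Xw \rVert_2^2
    + \frac{\alpha_{\text{sm}}}{2} \lVert w \rVert_2^2 \\
\text{statsmodels objective} \times 2n\text{:}\quad
  & \min_{w}\ \lVert y - Xw \rVert_2^2 + n\,\alpha_{\text{sm}} \lVert w \rVert_2^2
    \quad\Longrightarrow\quad \alpha_{\text{sm}} = \frac{\alpha_{\text{sk}}}{n} \\
\text{sklearn Lasso:}\quad
  & \min_{w}\ \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1 \\
\text{statsmodels, L1\_wt}=1\text{:}\quad
  & \min_{w}\ \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
\end{align*}
```

Read this way, the factor of n comes from how the squared-error loss is normalized: the Ridge objective in sklearn uses the raw residual sum of squares, while the other three objectives divide it by 2n. That would account for Lasso matching with identical alpha values while ridge needs alpha / n in statsmodels.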