# Ridge Regression
* Ridge Regression shares the same assumptions as OLS (listed below), but it adds an L2 penalty on the coefficients, which helps address multicollinearity and reduce overfitting. Ridge is particularly useful when you have many features and want to keep all of them: it shrinks weights toward 0, whereas Lasso can set them exactly to 0
    * Linearity: There should be a linear relationship between the features and the target
    * Independence of Errors: Errors (residuals) should be independent of each other. Patterns in a residual plot may suggest a lack of independence
    * Homoscedasticity: Variance of errors should be constant. An example of heteroscedasticity (bad) is a cone (funnel) shape in your residual plot
    * Normality of Errors: The errors should be normally distributed. Check the Q-Q plot - if the residuals are normally distributed, then the points should fall approximately along a straight line
    * No perfect collinearity: No feature should be an exact linear combination of the other features
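
To make the contrast with OLS and Lasso concrete, here is a minimal sketch on a small synthetic dataset. Everything in it is an assumption chosen for illustration and not part of the notebook's own data: the names `x1`, `x2`, `x3`, the noise levels, and the penalty strengths `alpha=1.0` and `alpha=0.1`. Two of the features are nearly identical, and the third is unrelated to the target.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 200

# Hypothetical features: x2 is almost a copy of x1, x3 is unrelated to y
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
x3 = rng.normal(size=n)
X_small = np.column_stack([x1, x2, x3])
y_small = 3.0 * x1 + rng.normal(scale=0.5, size=n)

ols = LinearRegression().fit(X_small, y_small)
ridge = Ridge(alpha=1.0).fit(X_small, y_small)
lasso = Lasso(alpha=0.1).fit(X_small, y_small)

# Compare how each model distributes weight across the three features
print("OLS  :", ols.coef_)
print("Ridge:", ridge.coef_)
print("Lasso:", lasso.coef_)
```

With near-duplicate features, the OLS weights on `x1` and `x2` tend to be unstable, Ridge shrinks the coefficient vector while keeping every weight non-zero, and Lasso typically drops at least one feature entirely. The exact numbers depend on the seed and the penalty strength.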

In [1]:
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import RidgeCV
from sklearn.metrics import mean_squared_error

In [2]:
# Generate a synthetic regression problem and wrap the features in a labeled DataFrame
X, y = make_regression(n_samples=10000, n_features=500, noise=10, random_state=42)
feature_names = [f"feature {i}" for i in range(1, X.shape[1] + 1)]
X = pd.DataFrame(X, columns=feature_names)

In [3]:
# Hold out 20% of the data here
# If you already have a separate test set, treat this held-out split as a validation set: use it to check that
# the data satisfies the assumptions of linear regression, then retrain on the entire dataset before predicting on the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)

In [4]:
# Scale the features before regularizing so that the penalty treats every coefficient on the same scale

# Import Standard Scaler
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

# Fit the scaler on the training data
scaler.fit(X_train)

# Transform X_train and X_test
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
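
The fit-on-train, transform-both pattern above can also be bundled into a single estimator. The following is a sketch only, assuming scikit-learn's `make_pipeline` and a small synthetic dataset; the names `X_demo`, `y_demo`, `pipe`, and the 20-value alpha grid are made up for this example. The pipeline fits the scaler on whatever data is passed to `fit` and reuses those statistics at prediction time, so the test set never influences the scaling.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small illustrative dataset (not the notebook's 10000 x 500 data)
X_demo, y_demo = make_regression(n_samples=500, n_features=20, noise=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Scaler and model travel together: fit() learns the scaling from X_tr only
pipe = make_pipeline(StandardScaler(), RidgeCV(alphas=np.logspace(-2, 2, 20)))
pipe.fit(X_tr, y_tr)

# score() scales X_te with the training statistics before predicting
print(f"Held-out R-Squared: {pipe.score(X_te, y_te):.4f}")
```

One design note: the scaler here is still fit once on the full training split before `RidgeCV` runs its internal cross-validation. If per-fold scaling matters, one option is to search over a pipeline of `StandardScaler` and `Ridge` with `GridSearchCV` instead.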

In [5]:
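# Ridge Regression on the scaled data
# RidgeCV picks alpha from 100 log-spaced candidates between 0.01 and 100 using 5-fold cross-validation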
model = RidgeCV(alphas = 10**np.linspace(-2, 2, 100), cv = 5)
model.fit(X_train_scaled, y_train)
print(f"Best Ridge alpha: {model.alpha_}")
print(f"Training R-Squared: {model.score(X_train_scaled, y_train):.4f}")
print(f"Testing R-Squared: {model.score(X_test_scaled, y_test):.4f}")
print(f"Mean Squared Error: {mean_squared_error(y_test, model.predict(X_test_scaled)):.4f}")

Best Ridge alpha: 2.4201282647943834
Training R-Squared: 0.9981
Testing R-Squared: 0.9978
Mean Squared Error: 104.0683
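
The assumptions listed at the top (normal errors, constant variance) can be checked numerically on held-out residuals once a model is fit. Below is a self-contained sketch, assuming SciPy is available and using a small synthetic dataset with a fixed `alpha=1.0`. The names `X_demo`, `resid`, `qq_r`, and the low/high split are illustrative choices, not part of the notebook above.

```python
import numpy as np
from scipy import stats
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Small illustrative dataset and a held-out validation split
X_demo, y_demo = make_regression(n_samples=1000, n_features=20, noise=10, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

fitted = Ridge(alpha=1.0).fit(X_tr, y_tr)
pred = fitted.predict(X_va)
resid = y_va - pred

# Normality: probplot returns the Q-Q points plus a least-squares line through them;
# a correlation close to 1 means the points fall near a straight line
(_, _), (_, _, qq_r) = stats.probplot(resid, dist="norm")
print(f"Q-Q correlation: {qq_r:.3f}")

# Homoscedasticity: compare residual spread for the lower and upper half of predictions;
# a cone-shaped residual plot would show up as clearly different spreads
order = np.argsort(pred)
half = len(order) // 2
low_std = resid[order[:half]].std()
high_std = resid[order[half:]].std()
print(f"Residual std (low predictions):  {low_std:.2f}")
print(f"Residual std (high predictions): {high_std:.2f}")
```

These numbers are a rough complement to the residual and Q-Q plots, not a replacement for looking at them. Plotting `resid` against `pred` shows patterns that a two-bucket comparison can miss.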
