# Regularization
**Regularization**: a method for controlling model complexity by discouraging extreme parameter values during training, in order to prevent overfitting. It *reduces variance* at the cost of *introducing some bias*, by adding terms to the objective function that penalize large or unstable parameter values. Regularization helps in two common situations:
- When a model is highly expressive relative to the amount of data, its fitted parameters have high variance. Regularization restricts how extreme the parameters can become, limiting the effective capacity of the model without changing its functional form.
- When features are highly correlated (multicollinearity) or the number of features approaches or exceeds the number of samples, coefficients can explode to large magnitudes that cancel each other out. Regularization keeps the coefficients small and numerically stable.
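
The multicollinearity case can be sketched on synthetic data. This is a minimal illustration, not a benchmark: the two nearly identical features, the noise levels, and `alpha=1.0` are all assumptions chosen for the demo.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 50

# Two almost identical features: x2 is x1 plus a tiny perturbation.
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)
X = np.column_stack([x1, x2])

# The true signal uses both features equally (coefficients 1 and 1).
y = x1 + x2 + 0.5 * rng.normal(size=n)

ols = LinearRegression().fit(X, y)   # no penalty
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty on the coefficients

print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
print("OLS coefficient norm:  ", np.linalg.norm(ols.coef_))
print("Ridge coefficient norm:", np.linalg.norm(ridge.coef_))
```

In runs like this the unpenalized coefficients are often large and of opposite sign, while the ridge coefficients stay close to each other. The only guaranteed relationship is that the ridge coefficient norm is never larger than the unpenalized one.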

There are many regularization techniques. Aside from the linear model penalties covered below, common ones include:
- **Data Augmentation**: creating artificial training samples by transforming existing ones (e.g., rotating or grayscaling an image).
- **Early Stopping**: halting training before the model starts to overfit, typically by capping the number of iterations or stopping once performance on held-out data stops improving.
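
Both techniques can be sketched briefly. The toy 4x4 "image" and the choice of `SGDRegressor` are assumptions made for illustration; any iteratively trained model with a validation split would work for the early stopping part.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# --- Data augmentation on a toy image (height x width x RGB channels) ---
rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))

flipped = np.fliplr(image)        # horizontal mirror
rotated = np.rot90(image, k=1)    # 90-degree rotation in the height/width plane
grayscale = image.mean(axis=2)    # naive grayscale: average the three channels

# Extra training samples that would share the label of `image`.
augmented = [flipped, rotated]

# --- Early stopping with a gradient-based linear model ---
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)

sgd = SGDRegressor(
    max_iter=1000,
    early_stopping=True,        # hold out part of the training data for validation
    validation_fraction=0.1,
    n_iter_no_change=5,         # stop after 5 epochs without validation improvement
    random_state=0,
)
sgd.fit(X, y)
print("Epochs actually run:", sgd.n_iter_)
```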

## Linear Model Regularization
*Note*: for the techniques below, we generally want to tune the hyperparameters `alpha` (the penalty strength) and, for elastic net, `l1_ratio` (the mix between L1 and L2).
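
One common way to tune these hyperparameters is cross-validation. The sketch below uses scikit-learn's `ElasticNetCV`; the synthetic dataset, the candidate `l1_ratios` list, and the scaling step are assumptions for illustration (scaling is included because the penalty depends on the scale of each feature).

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)

# Candidate L1/L2 mixes; the grid of alpha values is chosen automatically.
l1_ratios = [0.1, 0.5, 0.9, 1.0]

model = make_pipeline(
    StandardScaler(),
    ElasticNetCV(l1_ratio=l1_ratios, cv=5),   # 5-fold cross-validation
)
model.fit(X, y)

enet = model.named_steps["elasticnetcv"]
print("Selected alpha:", enet.alpha_)
print("Selected l1_ratio:", enet.l1_ratio_)
```

`LassoCV` and `RidgeCV` follow the same pattern when only `alpha` needs tuning.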

### L1 Regularization
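**Lasso Regression**: technique that penalizes large coefficients by adding a penalty term to the loss function equal to the sum of the absolute values of the coefficients. Because this penalty can drive coefficients exactly to zero, lasso can perform feature selection; among a group of correlated features it tends to keep one and zero out the rest. Its strength is controlled by $\lambda$ (`alpha`).

$$ loss=\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2 + \lambda\sum_{j=1}^{p}|w_j| $$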

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression problem: 100 samples, 5 features, little noise
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# alpha is the L1 penalty strength (lambda in the formula above)
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)

# Evaluate on the held-out 20% of the data
y_pred = lasso.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")

print("Coefficients:", lasso.coef_)
```

```text
Mean Squared Error: 0.06362439921332558
Coefficients: [60.50305581 98.52475354 64.3929265  56.96061238 35.52928502]
```
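
The feature-selection effect of the L1 penalty is easier to see when some features are irrelevant. The following is a minimal sketch under assumed settings (10 features with 3 informative, and an arbitrary list of `alpha` values); it counts exactly-zero coefficients for lasso and, for contrast, for an L2-penalized model.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# 10 features, only 3 of which actually influence the target.
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

for alpha in [0.01, 1.0, 10.0]:
    lasso = Lasso(alpha=alpha).fit(X, y)
    ridge = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha={alpha:>5}: "
          f"lasso zeros={np.sum(lasso.coef_ == 0)}, "
          f"ridge zeros={np.sum(ridge.coef_ == 0)}")

# Above this threshold every lasso coefficient is exactly zero.
# It follows from scikit-learn's lasso objective, which centers X and y
# when an intercept is fitted.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
alpha_max = np.max(np.abs(Xc.T @ yc)) / len(y)

all_zero = Lasso(alpha=1.01 * alpha_max).fit(X, y)
print("alpha_max:", alpha_max)
print("Non-zero coefficients above alpha_max:", np.count_nonzero(all_zero.coef_))
```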


### L2 Regularization
**Ridge Regression**: technique that also penalizes large coefficients, except that its penalty term is the sum of the squared coefficients. While lasso regression can perform feature selection (removing features outright), ridge regression shrinks coefficients towards zero but does not, in general, set them exactly to zero. Its strength is controlled by $\lambda$ (`alpha`).

$$ loss=\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2 + \lambda\sum_{j=1}^{p}w_j^2 $$

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Same synthetic dataset and split as in the lasso example
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# alpha is the L2 penalty strength (lambda in the formula above)
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)
y_pred = ridge.predict(X_test)

# Evaluate on the held-out 20% of the data
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
print("Coefficients:", ridge.coef_)
```

```text
Mean Squared Error: 4.114050771972588
Coefficients: [59.87954432 97.15091098 63.24364738 56.31999433 35.34591136]
```
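
Ridge regression has a closed-form solution, which the sketch below checks against scikit-learn. Two assumptions to note: the intercept is switched off to keep the algebra simple, and scikit-learn's `Ridge` penalizes the unnormalized sum of squared errors, so its `alpha` corresponds to $n\lambda$ in the $\frac{1}{n}$-scaled loss above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)
lam = 1.0  # penalty strength, passed to scikit-learn as `alpha`

# Closed form for  min_w ||y - Xw||^2 + lam * ||w||^2  (no intercept):
#   w = (X^T X + lam * I)^{-1} X^T y
n_features = X.shape[1]
w_closed = np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

ridge = Ridge(alpha=lam, fit_intercept=False).fit(X, y)

print("Closed-form coefficients: ", w_closed)
print("scikit-learn coefficients:", ridge.coef_)
print("Match:", np.allclose(w_closed, ridge.coef_))
```

Adding `lam` to the diagonal of `X.T @ X` is what keeps the system well conditioned when features are strongly correlated.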


### Elastic Net Regularization
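**Elastic Net Regularization**: combines L1 and L2 regularization by adding both penalty terms to the loss function. The L2 term stabilizes the coefficients of correlated features (multicollinearity), while the L1 term enables feature selection. The overall strength is still controlled by $\lambda$ (`alpha`), and $\alpha$ (`l1_ratio`) is introduced to control the mix: $\alpha=1$ gives a pure L1 penalty and $\alpha=0$ a pure L2 penalty. Note the naming clash: the $\alpha$ in this formula is `l1_ratio` in scikit-learn, not `alpha`.

$$ loss=\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2 + \lambda\left(\alpha\sum_{j=1}^{p}|w_j|+(1-\alpha)\sum_{j=1}^{p}w_j^2\right) $$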


```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Note: 10 features here (the lasso and ridge examples use 5),
# so the errors are not directly comparable across examples
X, y = make_regression(n_samples=100, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# alpha is the overall penalty strength; l1_ratio=0.5 weights L1 and L2 equally
model = ElasticNet(alpha=1.0, l1_ratio=0.5)
model.fit(X_train, y_train)

# Evaluate on the held-out 20% of the data
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)

print("Mean Squared Error:", mse)
print("Coefficients:", model.coef_)
```

```text
Mean Squared Error: 7785.886176938016
Coefficients: [16.84528938 31.77080959  4.05901996 40.18486737 57.25856154 45.81463318
 58.97979422 -0.          3.82816854 41.1096051 ]
```
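
The role of `l1_ratio` can be explored by sweeping it while holding `alpha` fixed. This is a sketch under assumed settings (the dataset parameters and the list of ratios are arbitrary); it also checks that `l1_ratio=1.0` reproduces lasso, since the L2 term then vanishes.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso

X, y = make_regression(n_samples=100, n_features=10, n_informative=4,
                       noise=1.0, random_state=42)

# Same penalty strength, different L1/L2 mixes.
for l1_ratio in [0.1, 0.5, 0.9, 1.0]:
    enet = ElasticNet(alpha=1.0, l1_ratio=l1_ratio).fit(X, y)
    print(f"l1_ratio={l1_ratio}: "
          f"zeros={np.sum(enet.coef_ == 0)}, "
          f"coefficient norm={np.linalg.norm(enet.coef_):.2f}")

# With l1_ratio=1.0 only the L1 penalty remains, so the fit matches Lasso.
lasso = Lasso(alpha=1.0).fit(X, y)
pure_l1 = ElasticNet(alpha=1.0, l1_ratio=1.0).fit(X, y)
print("Matches Lasso:", np.allclose(pure_l1.coef_, lasso.coef_))
```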
