# **Homework Assignment: Analyzing and Plotting Bias in Penalized Regression**

-------------------------------

In this assignment, you will explore how **Ridge** and **Lasso** regression introduce **bias** into a model to reduce **variance**, and how the choice of the regularization parameter $\lambda$ affects this trade-off. The goal is to visualize and analyze the **bias-variance trade-off** and understand the conditions under which penalization helps or hinders model performance.

## **The Question**

**How does varying the regularization parameter $\lambda$ in Ridge and Lasso regression impact the trade-off between bias and variance?**

- Generate a synthetic dataset based on a **known** linear relationship:
  
  $$
  y = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p + \epsilon
  $$

  where $\epsilon \sim \mathcal{N}(0, \sigma^2)$.

  Use a **high-dimensional** setting (e.g., 50 predictors) with only a few non-zero true coefficients to emphasize the effects of regularization. To stress the point: the $\beta_i$ coefficients must be known for this experiment, and most of them should be 0, with only a few non-zero.

- Investigate how increasing $\lambda$ influences the model’s **bias**, **variance**, and **Mean Squared Error (MSE)**.
- Plot **Bias²**, **Variance**, and **MSE** on a single graph for both Ridge and Lasso models.
- Explain the decomposition of MSE into squared bias, variance, and irreducible noise. Read up on the MSE decomposition if you need to.
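
For reference, the standard form of the decomposition at a fixed input $x$ is sketched here. The notation is introduced for illustration and is not part of the assignment text: $f(x)$ denotes the true noiseless linear function, $\hat{f}(x)$ the fitted model, and the expectation is taken over training sets and over the noise in a new response $y = f(x) + \epsilon$, with $\epsilon$ independent of $\hat{f}$.

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
+ \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{Variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

If the error is measured against the noiseless $f(x)$ instead of a noisy $y$, the $\sigma^2$ term drops out and MSE is simply Bias² plus Variance.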

**Does the regularization lead to an optimal trade-off point where MSE is minimized? Explain why this point exists.**




**Expected Outcome:**
 - As $\lambda$ increases:
   - **Bias** increases (the model becomes too simple).
   - **Variance** decreases (the model becomes more stable).
   - **MSE** forms a **U-shape**, revealing the optimal trade-off.
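
For Ridge, the first two trends can be checked analytically, without any simulation. The sketch below is only an illustration under stated assumptions: a fixed design matrix, no intercept, the closed-form estimator $\hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top y$, and bias and variance measured at the training inputs. The problem size, the coefficient values, and the helper name `ridge_bias_variance` are hypothetical choices, not part of the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small problem (sizes and coefficients are illustrative assumptions)
n, p, sigma = 100, 10, 1.0
X = rng.uniform(-1, 1, size=(n, p))
beta = np.zeros(p)
beta[:3] = [5.0, -3.0, 2.0]


def ridge_bias_variance(X, beta, sigma, lam):
    """Analytic in-sample bias^2 and variance of ridge predictions (fixed design, no intercept)."""
    n_rows, n_cols = X.shape
    # Hat matrix H maps responses to fitted values: y_hat = H @ y
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(n_cols), X.T)
    f = X @ beta                                    # noiseless signal at the design points
    bias2 = np.mean((H @ f - f) ** 2)               # squared bias, averaged over points
    variance = sigma**2 * np.trace(H @ H.T) / n_rows  # prediction variance, averaged over points
    return bias2, variance


lambdas = np.logspace(-2, 2, 9)
curves = np.array([ridge_bias_variance(X, beta, sigma, lam) for lam in lambdas])
bias2, variance = curves[:, 0], curves[:, 1]
# Expected squared error for a new noisy response at the same inputs
mse = bias2 + variance + sigma**2

for lam, b, v, m in zip(lambdas, bias2, variance, mse):
    print(f"lambda={lam:8.2f}  bias2={b:.6f}  variance={v:.6f}  mse={m:.6f}")
```

In this setting the squared bias grows and the variance shrinks monotonically as $\lambda$ increases. Whether the printed MSE column shows a clear interior minimum depends on the noise level and the grid, so treat it as a sanity check for the simulated curves rather than a replacement for them.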

- Analyze how **Ridge** and **Lasso** differ in terms of their bias-variance trade-offs.
- Discuss situations where one method may outperform the other, considering factors like **feature sparsity** and **multicollinearity**.
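
One difference worth seeing before running the full experiment is how each penalty treats irrelevant features. The following minimal sketch fits both models once on a sparse problem and counts exact zeros among the estimated coefficients. The data sizes, coefficient values, and `alpha` settings are illustrative assumptions, not values prescribed by the assignment.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)

# Illustrative sparse problem: 50 features, only 5 carry signal (assumed values)
n_samples, n_features = 200, 50
X = rng.uniform(-1, 1, size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:5] = [5.0, 12.0, -3.0, 9.0, 7.0]
y = X @ true_coef + rng.normal(0.0, 1.0, size=n_samples)

# Note: scikit-learn scales the two objectives differently (Lasso divides the
# squared-error term by 2 * n_samples, Ridge does not), so equal alpha values
# do not correspond to equal penalty strength.
ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

ridge_zeros = int(np.sum(ridge.coef_ == 0.0))
lasso_zeros = int(np.sum(lasso.coef_ == 0.0))
print("exact zeros  ridge:", ridge_zeros, " lasso:", lasso_zeros)
```

Ridge shrinks every coefficient toward zero but leaves them non-zero, while Lasso sets many of the irrelevant ones exactly to zero. This is the mechanism behind the feature-sparsity argument in the comparison.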


## **Colab Notebook Requirements**
- Your **Colab notebook** should:
  - Simulate the synthetic dataset and apply Ridge and Lasso regression.
  - Plot **Bias²**, **Variance**, and **MSE** against $\lambda$ for both models.
  - Include a section answering the questions.
  - Be well-documented with comments and explanations for each step.


## **Publish on GitHub**
- Upload your Colab notebook to your **GitHub repository** for this course.
- In your repository’s **README**, include a **link** to the notebook.
- In the notebook, include an **“Open in Colab”** badge so the notebook can be launched directly from GitHub.
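
The badge is a Markdown image link that points Colab at the notebook's path on GitHub. As a small sketch, the helper below assembles that Markdown string. The function name `colab_badge_markdown` and the user, repository, and file names are hypothetical placeholders to be replaced with your own.

```python
def colab_badge_markdown(user, repo, notebook_path, branch="main"):
    """Return Markdown for an "Open in Colab" badge that opens a notebook hosted on GitHub."""
    badge_svg = "https://colab.research.google.com/assets/colab-badge.svg"
    target = (
        "https://colab.research.google.com/github/"
        f"{user}/{repo}/blob/{branch}/{notebook_path}"
    )
    return f"[![Open In Colab]({badge_svg})]({target})"


# Hypothetical repository details, for illustration only
badge = colab_badge_markdown("your-username", "your-course-repo", "hw_bias_variance.ipynb")
print(badge)
```

Paste the printed line into the first Markdown cell of the notebook, and into the README if you want the badge there as well.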

## **Solution**

First, we create a synthetic dataset.

In [19]:
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

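n_samples = 200
n_features = 50  # includes the intercept column

# Sparse ground truth: only six of the 50 coefficients are non-zero
true_coef = np.zeros(n_features)
non_zero_indices = [0, 1, 4, 16, 22, 36]  # index 0 is the intercept
beta_values = [5, 12, -3, 5, 9, 12]
true_coef[non_zero_indices] = beta_values

# Fixed test inputs
X_test = np.random.uniform(-1, 1, (n_samples, n_features - 1))
X_test = np.hstack((np.ones((n_samples, 1)), X_test))  # prepend intercept column of ones

# Noiseless targets: the true function values at the test inputs
# (add noise here only if MSE should include the irreducible error)
y_test = X_test @ true_coef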

In [20]:
def true_function(X):
    # Noiseless response for a design matrix X whose first column is the intercept
    return X @ true_coef

def generate_train_data(n_features, n_samples, betas, sigma2):
    # Draw a fresh training set; sigma2 is the noise variance
    X_train = np.random.uniform(-1, 1, (n_samples, n_features - 1))
    X_train = np.hstack((np.ones((n_samples, 1)), X_train))  # prepend intercept column of ones

    errors = np.sqrt(sigma2) * np.random.randn(n_samples)  # standard deviation is sqrt(sigma2)
    y_train = X_train @ betas + errors
    return X_train, y_train

Next, we set up the models and the simulation settings.

In [18]:
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

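alphas = np.logspace(-2, 2, 50)  # regularization strengths (lambda), log-spaced from 0.01 to 100
# Per-model curves, one value per alpha
results = {'ridge': {'bias2': [], 'variance': [], 'mse': []},
           'lasso': {'bias2': [], 'variance': [], 'mse': []}}
n_simulations = 100  # number of simulated training sets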