# LINEAR REGRESSION WITH REGULARIZATION AND ONLINE UPDATES

Follow the instructions below to solve a regression problem using ridge regression with varying regularization
parameters (λ) and analyze the bias-variance trade-off. Use only the ‘numpy’ library for computations and
‘matplotlib’ for plotting. When uploading to Gradescope, you will need to produce a PDF version of your
solutions and code. One way to do this is to use a notebook (https://jupyter.org).

## Regularized Linear Regression
In many real-world applications, predictive models often encounter challenges such as multicollinearity among features, small sample sizes, or noisy data. These challenges can lead to overfitting, where the model captures noise instead of meaningful patterns. To address this, regularized linear regression, such as ridge regression, introduces a penalty term to the loss function, which helps constrain the model complexity and improve generalization.

Consider the following scenarios where regularized linear regression plays a critical role:

- __Financial Forecasting__: Predicting stock prices or company revenues often involves highly correlated features, such as economic indicators or market trends. Ridge regression can reduce the impact of multicollinearity, giving more stable coefficient estimates and predictions.
- __Healthcare Analytics__: In clinical studies with limited patient data, predictive models for treatment outcomes may suffer from high variance due to noise. Regularization helps to avoid overfitting by penalizing extreme weight values.
- __Marketing Campaign Analysis__: Estimating the impact of various advertisement strategies on sales can involve sparse and noisy data. Regularization ensures that the model remains robust despite inconsistencies in the data.

In this problem, you will explore ridge regression on synthetic data to understand its behavior and implications for real-world applications. Specifically, you will:

- __Analyze the behavior of learned coefficients__: Observe how the learned coefficients vary as the regularization parameter λ changes, helping to illustrate how ridge regression stabilizes the model.
- __Assess model performance__: Measure the model’s performance using Root Mean Squared Error (RMSE) on validation data to identify the optimal regularization parameter λ.
- __Investigate the bias-variance trade-off__: Quantify and visualize how regularization influences bias and variance, providing insights into the trade-offs inherent in predictive modeling.


## SYNTHETIC DATA GENERATION

Generate a synthetic dataset for training, validation, and testing. The dataset consists of N = 500 data points and d = 12 features. Each data point is generated as follows:

$y = x^T \textbf{w} + \epsilon$

where $x \in \mathbb{R}^d$ has independent entries drawn from $N(0, 1)$, $\textbf{w}$ is a vector of length d with linearly spaced values between 1 and 5, and $\epsilon \sim N(0, 0.5^2)$.

Split the data into:

- __Training set (70%)__: For training the model.
- __Validation set (15%)__: For selecting the optimal λ.
- __Test set (15%)__: For final evaluation.

In [2]:
# Step 1: Generate synthetic data
import numpy as np
np.random.seed(42)
N = 500 # Total data points
d = 12 # Number of features
train_ratio = 0.7
val_ratio = 0.15
# Generate feature matrix and true weights
X = np.random.normal(0, 1, (N, d))
true_weights = np.linspace(1, 5, d) # Linearly spaced true weights
epsilon = np.random.normal(0, 0.5, N) # Noise
y = X @ true_weights + epsilon # Generate target values
# Split data into train, validation, and test sets
train_size = int(N * train_ratio)
val_size = int(N * val_ratio)
test_size = N - train_size - val_size
X_train, X_val, X_test = X[:train_size], X[train_size:train_size+val_size], X[train_size+val_size:]
y_train, y_val, y_test = y[:train_size], y[train_size:train_size+val_size], y[train_size+val_size:]

## TASKS

### Plot coefficients vs. $\lambda$:

- Train ridge regression models for $\lambda \in \{a \cdot 10^b : a \in \{1, 2, \ldots, 9\}, b \in \{-5, -4, \ldots, 2\}\}$.
- Plot the learned coefficients ($w_i$) for all 12 features and the bias term against $\lambda$ (use a log scale on the vertical axis).
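The grid of regularization values described above can be built in a few lines. This is a minimal sketch; the names `mantissas`, `exponents`, and `lambdas` are illustrative choices rather than names required by the assignment.

```python
import numpy as np

# Mantissas a = 1..9 and exponents b = -5..2, as stated in the task.
mantissas = np.arange(1, 10)
exponents = np.arange(-5, 3)

# The outer product yields every a * 10^b; sorting keeps the grid increasing.
lambdas = np.sort(np.outer(10.0 ** exponents, mantissas).ravel())
```

The resulting array has 9 × 8 = 72 entries, ranging from $10^{-5}$ to $9 \cdot 10^{2}$.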

### Validate RMSE vs $\lambda$:

- Calculate the RMSE (root mean squared error) on the validation set for each λ.
- Plot the validation RMSE against λ (use a logarithmic scale on the vertical axis) and identify $\lambda^*$, the value that minimizes the RMSE on the validation dataset.
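A possible way to organize the RMSE computation and the selection of $\lambda^*$ is sketched below. It assumes you already have one trained weight vector per λ; the helper names `rmse` and `select_best_lambda` are hypothetical and not part of the provided starter code.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def select_best_lambda(lambdas, weights_per_lambda, X_val, y_val):
    """Return (best_lambda, rmse_values) given one weight vector per lambda."""
    rmse_values = np.array([rmse(y_val, X_val @ w) for w in weights_per_lambda])
    best_idx = int(np.argmin(rmse_values))  # index of the smallest validation RMSE
    return lambdas[best_idx], rmse_values
```

The returned `rmse_values` array can be passed directly to the plotting code, so the same numbers drive both the plot and the choice of $\lambda^*$.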

### Predicted vs. True Values:

- Use $\lambda^*$ to train the model on the combined training and validation sets.
- Plot the predicted values against the true values for the test set. Your result should be a scatter plot with each point representing a (predicted value, true value) pair for one data point in the test set.
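The two steps above might be sketched as follows. The helper names `combine_train_val` and `scatter_predicted_vs_true`, the marker size, and the dashed reference diagonal are all assumptions made for illustration; the output path is left as an argument.

```python
import numpy as np

def combine_train_val(X_train, y_train, X_val, y_val):
    """Stack the training and validation sets into one dataset."""
    X_full = np.vstack([X_train, X_val])
    y_full = np.concatenate([y_train, y_val])
    return X_full, y_full

def scatter_predicted_vs_true(y_true, y_pred, path):
    """Save a scatter plot with predictions on x and true values on y."""
    import matplotlib.pyplot as plt  # imported here so the helper above works without matplotlib

    fig, ax = plt.subplots()
    ax.scatter(y_pred, y_true, s=12)
    # Dashed diagonal marks perfect predictions (predicted == true).
    lo = min(np.min(y_true), np.min(y_pred))
    hi = max(np.max(y_true), np.max(y_pred))
    ax.plot([lo, hi], [lo, hi], linestyle="--", color="gray")
    ax.set_xlabel("Predicted value")
    ax.set_ylabel("True value")
    fig.savefig(path, dpi=150)
    plt.close(fig)
```

Points that fall close to the diagonal correspond to test examples the model predicts well.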

### Bias-Variance Trade-off:

- Generate L = 20 independent training datasets of size $N_{sub}$ = 50 by sampling with replacement from the training data.
- Train models for each dataset and calculate the bias and variance for each λ.
- Plot the variance against λ (use a logarithmic scale for the vertical axis).
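For reference, one common way to write the two quantities is given below. The notation is introduced here and is not part of the assignment: $\hat{y}^{(l)}_\lambda(x_n)$ is the prediction at held-out point $x_n$ of the model trained on the $l$-th resampled dataset, and $M$ is the number of held-out points $(x_n, y_n)$.

$$
\bar{y}_\lambda(x_n) = \frac{1}{L} \sum_{l=1}^{L} \hat{y}^{(l)}_\lambda(x_n)
$$

$$
\text{bias}^2(\lambda) \approx \frac{1}{M} \sum_{n=1}^{M} \left( \bar{y}_\lambda(x_n) - y_n \right)^2,
\qquad
\text{variance}(\lambda) \approx \frac{1}{M} \sum_{n=1}^{M} \frac{1}{L} \sum_{l=1}^{L} \left( \hat{y}^{(l)}_\lambda(x_n) - \bar{y}_\lambda(x_n) \right)^2
$$

Because the targets $y_n$ contain noise, the first expression estimates the squared bias plus a noise term rather than the squared bias alone.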

## Hints

- Use gradient descent to solve ridge regression:

$$
w^* = \arg\min_w \|y - Xw\|_2^2 + \lambda \|w\|_2^2
$$

- Initialize $w = 0$ and use a learning rate of 0.01. Stop when the Euclidean norm of the change in $w$ between consecutive iterations falls below $10^{-6}$.
- For sampling with replacement, use `numpy.random.choice`.
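As a reference for the gradient step, differentiating the objective above term by term gives the expressions below. Here $\eta$ denotes the learning rate and $I$ the $d \times d$ identity matrix; both symbols are introduced for this note.

$$
\nabla_w \left( \|y - Xw\|_2^2 + \lambda \|w\|_2^2 \right) = -2 X^T (y - Xw) + 2 \lambda w
$$

$$
w \leftarrow w - \eta \left( -2 X^T (y - Xw) + 2 \lambda w \right)
$$

Setting the gradient to zero yields the linear system $(X^T X + \lambda I)\, w^* = X^T y$, which can serve as an independent check on the gradient descent result. A fixed step size converges only when $\eta$ is smaller than $2$ divided by the largest eigenvalue of $2 (X^T X + \lambda I)$. If the iterates blow up with $\eta = 0.01$, one option (an assumption, not something the assignment states) is to divide the whole objective by the number of training samples, which rescales the gradient but leaves $w^*$ unchanged.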

In [None]:
# Step 2: Ridge regression functions
def ridge_loss(w, X, y, lam):
    """Calculate the ridge regression loss."""
    residuals = y - X @ w
    return NotImplemented
def ridge_gradient(w, X, y, lam):
    """Calculate the gradient of the ridge regression loss."""
    residuals = y - X @ w
    grad = NotImplemented
    return grad
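def gradient_descent(loss_fn, grad_fn, w_init, X, y, lam, lr=0.01, tol=1e-6, max_iters=1000):
    """Perform gradient descent to minimize the ridge regression loss.

    loss_fn is not used by the update itself; it is available if you want to track the loss.
    """
    w = w_init
    for i in range(max_iters):
        grad = grad_fn(w, X, y, lam)
        w_new = w - lr * grad
        # Stop once the update step is smaller than the tolerance
        converged = np.linalg.norm(w_new - w, ord=2) < tol
        w = w_new  # keep the latest iterate, including on the final step
        if converged:
            break
    return w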
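One way to sanity-check a gradient descent implementation is to compare its output with the closed-form ridge solution on a small problem. The sketch below is illustrative only: the names `ridge_closed_form` and `scaled_ridge_gradient`, the demo sizes, and the seed are assumptions, and the gradient is that of the objective divided by the number of samples `n` (same minimizer, smaller gradient) so that a step size of 0.01 stays stable.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y for w."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def scaled_ridge_gradient(w, X, y, lam):
    """Gradient of (||y - Xw||^2 + lam * ||w||^2) / n."""
    n = X.shape[0]
    residuals = y - X @ w
    return (-2.0 * X.T @ residuals + 2.0 * lam * w) / n

# Small synthetic check (sizes and seed are arbitrary choices for illustration).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
w_true = np.array([1.0, 2.0, 3.0])
y_demo = X_demo @ w_true + rng.normal(scale=0.5, size=200)

lam = 1.0
w = np.zeros(3)
for _ in range(5000):
    w_new = w - 0.01 * scaled_ridge_gradient(w, X_demo, y_demo, lam)
    converged = np.linalg.norm(w_new - w) < 1e-6
    w = w_new
    if converged:
        break

w_exact = ridge_closed_form(X_demo, y_demo, lam)
```

If the two vectors `w` and `w_exact` differ noticeably, the gradient or the step size is the first place to look.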


In [None]:
# Step 3: Variance and bias calculation
def calculate_bias_variance(X_train, y_train, X_val, y_val, lambdas, num_datasets=20, sub_sample_size=50):
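    """
    Calculate the bias and variance for ridge regression models trained on multiple datasets.

    Returns (biases, variances), one entry per value in lambdas. The bias entry is the mean
    squared difference between the average prediction and y_val (a squared-bias estimate);
    the variance entry is the per-point variance across datasets, averaged over X_val.
    """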
    biases, variances = [], []
    for lam in lambdas:
        predictions = []
        for _ in range(num_datasets):
            # Sample with replacement
            indices = np.random.choice(len(X_train), size=sub_sample_size, replace=True)
            X_sample, y_sample = X_train[indices], y_train[indices]
            # Train ridge regression
            w_init = np.zeros(X_sample.shape[1])  # one weight per feature; avoids relying on the global d
            w = gradient_descent(ridge_loss, ridge_gradient, w_init, X_sample, y_sample, lam)
            # Predict on validation data
            predictions.append(X_val @ w)
        # Average predictions
        predictions = np.array(predictions)
        mean_prediction = np.mean(predictions, axis=0)
        bias = np.mean((mean_prediction - y_val)**2)
        variance = np.mean(np.var(predictions, axis=0))
        biases.append(bias)
        variances.append(variance)
    return biases, variances

# Empty sections for students to complete
def plot_coefficients_vs_lambda():
    pass
def plot_rmse_vs_lambda():
    pass
def plot_predicted_vs_true():
    pass
def plot_bias_variance_tradeoff():
    pass


## Deliverables

- Python code for all tasks, including plots.
- Saved figures:
    - coefficients vs lambda.png
    - rmse vs lambda.png
    - predicted vs true.png
    - bias variance tradeoff.png
- Analysis discussing:
    - How coefficients behave as λ increases.
    - The trade-off between RMSE and λ.
    - Observations from the bias-variance trade-off plot.

## Evaluation Criteria

- [10 points] Correct implementation of ridge regression and bias-variance analysis.
- [4 points] Visualization of coefficients with respect to λ.
- [6 points] Validation RMSE plot and identification of $\lambda^*$.
- [4 points] Scatter plot for predictions versus true values.
- [6 points] Plot and meaningful analysis of bias-variance trade-off with respect to λ.
- Make sure that all figures, plots, and diagrams referenced in your work are embedded directly in the PDF file you submit.