# CS485 - Data Science and Applications

  **Homework 2**

  Feel free to ignore the already implemented code and use your own.

## **Multiple Linear Regression Using Gradient Descent**
The goal of this exercise is to train a linear regression model using **many features simultaneously** and implement **gradient descent** to minimize a loss function: first the Mean Squared Logarithmic Error (MSLE), then the Mean Squared Error (MSE).

---

## **Step 1: Load the Data**
1. Extract all **feature columns** and the **target column**.
2. Print the shape of `X` (features) and `y` (target).
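
A minimal sketch of the splitting step, assuming the CSV holds only numeric columns, has one header row, and stores the target in the last column. None of this is confirmed for `car_data.csv`, so the inline text below is a made-up stand-in with hypothetical column names.

```python
import io
import numpy as np

# Hypothetical stand-in for the downloaded file: numeric columns, one header row,
# target in the last column. The real car_data.csv may be laid out differently.
csv_text = """feature_a,feature_b,target
1.0,10.0,100.0
2.0,20.0,200.0
3.0,30.0,300.0
"""

# np.loadtxt accepts any file-like object; skiprows=1 drops the header line
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

X = data[:, :-1]  # every column except the last one
y = data[:, -1]   # last column as the target

print(X.shape, y.shape)
```

If the real file contains text columns or a different target position, the column selection has to be adapted accordingly.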

In [None]:
import numpy as np

url = 'https://raw.githubusercontent.com/lpoly/public_data/main/car_data.csv'

# Load data from CSV
##################
# YOUR CODE HERE #
##################

Mean Squared Logarithmic Error (MSLE) measures the **log difference** between predicted and actual values.
$$
\text{MSLE} = \frac{1}{n} \sum_{i=1}^{n} \left( \log(1 + y_i) - \log(1 + \hat{y}_i) \right)^2
$$

Where:
- $ y_i $ is the actual target value.
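- $ \hat{y}_i = w \cdot x_i + b $ is the predicted value for the $i$-th sample, where $x_i$ is its feature vector (the $i$-th row of $X$).
- The **log transformation** compares values by their ratio, $\log\frac{1 + y_i}{1 + \hat{y}_i}$, so an error of a given size counts for less when the target is large. It requires $1 + \hat{y}_i > 0$.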
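
The formula can be sketched in NumPy as follows. This is one possible implementation, not the required one; the toy arrays `X_toy` and `y_toy` are made up for illustration.

```python
import numpy as np

def msle(w, b, X, y):
    """Mean squared logarithmic error of the linear model X @ w + b."""
    y_hat = X @ w + b
    # np.log1p(t) computes log(1 + t); every entry needs 1 + t > 0
    return np.mean((np.log1p(y) - np.log1p(y_hat)) ** 2)

# Toy data (made up for illustration)
X_toy = np.array([[1.0, 2.0], [3.0, 4.0]])
y_toy = np.array([3.0, 7.0])

# Zero initialization: every prediction is 0, so log(1 + y_hat) is 0 as well
w0 = np.zeros(X_toy.shape[1])
b0 = 0.0
print(msle(w0, b0, X_toy, y_toy))
```

With `w = [1, 1]` and `b = 0` the toy predictions match the targets exactly, so the loss is zero, which makes a convenient sanity check.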


## **Step 2: Define the MSLE Loss Function**

1. Implement the **Mean Squared Logarithmic Error (MSLE)** function. It should take `w` (weights), `b` (bias), `X` (features), and `y` (target) as arguments.

2. Initialize the weights and bias.
3. Test the MSLE with the initial values.


In [None]:
# Define MSLE function
##################
# YOUR CODE HERE #
##################

# Test the function
##################
# YOUR CODE HERE #
##################

## **Step 3: Analytically Derive the Gradient of the Loss Function**
1. Derive the gradient formulas for MSLE.
2. Implement a Python function `gradient_msle` that computes the **partial derivatives** with respect to `w` and `b`.
3. Test the function with initial values.
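
One way to check a derivation is to compare it with finite differences. The sketch below assumes the chain-rule result $\frac{\partial \text{MSLE}}{\partial w} = -\frac{2}{n} \sum_i \frac{\log(1 + y_i) - \log(1 + \hat{y}_i)}{1 + \hat{y}_i} x_i$, with the same expression without $x_i$ for $b$. The helper `numerical_gradient` and the toy values are hypothetical and exist only for this check.

```python
import numpy as np

def msle(w, b, X, y):
    y_hat = X @ w + b
    return np.mean((np.log1p(y) - np.log1p(y_hat)) ** 2)

def gradient_msle(w, b, X, y):
    """Partial derivatives of the MSLE with respect to w and b."""
    n = len(y)
    y_hat = X @ w + b
    # log-space residual divided by (1 + y_hat), the inner derivative of log(1 + y_hat)
    scaled = (np.log1p(y) - np.log1p(y_hat)) / (1.0 + y_hat)
    grad_w = -(2.0 / n) * (X.T @ scaled)
    grad_b = -(2.0 / n) * np.sum(scaled)
    return grad_w, grad_b

def numerical_gradient(w, b, X, y, eps=1e-6):
    """Central finite differences, used only as a sanity check."""
    grad_w = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = eps
        grad_w[j] = (msle(w + step, b, X, y) - msle(w - step, b, X, y)) / (2 * eps)
    grad_b = (msle(w, b + eps, X, y) - msle(w, b - eps, X, y)) / (2 * eps)
    return grad_w, grad_b

# Toy values (made up), chosen so that 1 + y_hat stays positive
X_toy = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 1.0]])
y_toy = np.array([3.0, 7.0, 6.0])
w_toy = np.array([0.5, 0.25])
b_toy = 0.1

analytic = gradient_msle(w_toy, b_toy, X_toy, y_toy)
numeric = numerical_gradient(w_toy, b_toy, X_toy, y_toy)
print(analytic)
print(numeric)
```

If the two printed gradients disagree beyond rounding, the analytic formula or its implementation likely has a sign or scaling error.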


*Write your derivation of the gradient here, in Markdown.*

In [None]:
# Define the gradient of MSLE
##################
# YOUR CODE HERE #
##################

# Test the function
##################
# YOUR CODE HERE #
##################

## **Step 4: Standardization**
1. Compute the **mean** and **standard deviation** of each feature.
2. Standardize the feature matrix $X$ using the per-feature mean $\mu$ and standard deviation $\sigma$:
   $$
   X_{\text{standardized}} = \frac{X - \mu}{\sigma}
   $$
3. Print the **mean** and **standard deviation** after transformation to verify standardization.
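
A short sketch of these three steps on a made-up matrix `X_toy`. It assumes the population standard deviation (NumPy's default, `ddof=0`) and that no feature is constant, since a zero standard deviation would cause a division by zero.

```python
import numpy as np

# Toy feature matrix (made up): two features on very different scales
X_toy = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 900.0]])

mu = X_toy.mean(axis=0)        # per-feature mean
sigma = X_toy.std(axis=0)      # per-feature standard deviation (ddof=0)
X_std = (X_toy - mu) / sigma   # broadcasting applies mu and sigma column-wise

# After the transformation each column should have mean ~0 and std ~1
print(X_std.mean(axis=0), X_std.std(axis=0))
```

Keeping `mu` and `sigma` around matters later, because the same values are needed to map the trained parameters back to the original scale.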


In [None]:
# Compute mean and standard deviation of each feature
##################
# YOUR CODE HERE #
##################

# Standardize the features
##################
# YOUR CODE HERE #
##################

# Verify standardization (mean ≈ 0, std ≈ 1)
##################
# YOUR CODE HERE #
##################

## **Step 5: Implement Gradient Descent**
1. Implement a `gradient_descent` or a `momentum_gradient_descent` function to **iteratively update weights and bias** using MSLE.
2. Run it for **500 iterations** with step size (learning rate) $\tau = 0.5$.
3. Store the loss at each step for visualization.
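
A possible shape for the plain (non-momentum) variant is sketched below, assuming $\tau$ is the step size and the update is $w \leftarrow w - \tau \, \partial \text{MSLE} / \partial w$. The toy data is made up and the signature of `gradient_descent` is only one reasonable choice.

```python
import numpy as np

def msle(w, b, X, y):
    y_hat = X @ w + b
    return np.mean((np.log1p(y) - np.log1p(y_hat)) ** 2)

def gradient_msle(w, b, X, y):
    n = len(y)
    y_hat = X @ w + b
    scaled = (np.log1p(y) - np.log1p(y_hat)) / (1.0 + y_hat)
    return -(2.0 / n) * (X.T @ scaled), -(2.0 / n) * np.sum(scaled)

def gradient_descent(X, y, w, b, tau, n_iters):
    """Plain gradient descent; returns final parameters and the loss history."""
    losses = []
    for _ in range(n_iters):
        grad_w, grad_b = gradient_msle(w, b, X, y)
        w = w - tau * grad_w  # step against the gradient
        b = b - tau * grad_b
        losses.append(msle(w, b, X, y))  # loss after this update
    return w, b, losses

# Toy data (made up): positive targets, features standardized before training
X_raw = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0], [5.0, 6.0]])
y_toy = np.array([2.0, 3.0, 4.0, 5.0, 6.0])
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

w_fit, b_fit, losses = gradient_descent(
    X_std, y_toy, w=np.zeros(2), b=0.0, tau=0.5, n_iters=500
)
print(w_fit, b_fit, losses[-1])
```

The sketch does not guard against $1 + \hat{y}_i \le 0$, which would produce `nan` in the logarithm. On other data a smaller step size or a different initialization may be needed.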

In [None]:
# Define Gradient Descent function
##################
# YOUR CODE HERE #
##################

# Perform gradient descent on all features using MSLE
##################
# YOUR CODE HERE #
##################

# Display final weights and bias
##################
# YOUR CODE HERE #
##################

## **Step 6: Plot the Loss Curve**
1. Plot the loss (MSLE) over **iterations**.
2. Observe whether the model is converging.

In [None]:
import matplotlib.pyplot as plt

# Plot loss over iterations
##################
# YOUR CODE HERE #
##################

## **Step 7: Unstandardization of the Model**
1. Since we trained on **standardized features**, we must **convert** the model parameters back to the **original scale**.
2. The model trained on standardized features is:
   $$
   y = w_{\text{standardized}} X_{\text{standardized}} + b_{\text{standardized}}
   $$
   We need to convert it back to:
   $$
   y = w_{\text{original}} X_{\text{original}} + b_{\text{original}}
   $$
3. Use the following transformations:
   - **Weights:**
     $$
     w_{\text{original}} = \frac{w_{\text{standardized}}}{\sigma}
     $$
   - **Bias:**
     $$
     b_{\text{original}} = b_{\text{standardized}} - \sum \left( \frac{w_{\text{standardized}} \cdot \mu}{\sigma} \right)
     $$
4. Print the **converted weights and bias** and compare with the standardized values.
5. Discuss: When should we **keep using standardized features** vs. when to **convert back to the original scale**?
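
The transformations above can be checked numerically: both parameter sets should give identical predictions. In this sketch `w_std` and `b_std` are hypothetical values standing in for the output of training, and `X_toy` is made up.

```python
import numpy as np

# Toy features (made up) and their standardization statistics
X_toy = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 900.0], [4.0, 100.0]])
mu = X_toy.mean(axis=0)
sigma = X_toy.std(axis=0)
X_std = (X_toy - mu) / sigma

# Hypothetical parameters, standing in for a model trained on X_std
w_std = np.array([1.5, -0.5])
b_std = 4.0

# Map the parameters back to the original feature scale
w_orig = w_std / sigma
b_orig = b_std - np.sum(w_std * mu / sigma)

# Both parameterizations describe the same model
pred_std = X_std @ w_std + b_std
pred_orig = X_toy @ w_orig + b_orig
print(np.allclose(pred_std, pred_orig))
```

The check works because $w_{\text{standardized}} \cdot \frac{x - \mu}{\sigma} + b_{\text{standardized}}$ expands to exactly $w_{\text{original}} \cdot x + b_{\text{original}}$.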


In [None]:
# Convert weights and bias back to original scale
##################
# YOUR CODE HERE #
##################

# Display the final model parameters
##################
# YOUR CODE HERE #
##################

## **Step 8: Solve Linear Regression Using the Least Squares Solution**
1. Solve the linear regression problem using the **closed-form Least Squares solution**:
   $$
   (X^T X)w_{\text{LS}} = X^T y
   $$
   $$
   b_{\text{LS}} = \text{mean}(y) - w_{\text{LS}} \cdot \text{mean}(X)
   $$
2. Compare the weights and bias with those obtained from **gradient descent**.
3. Why do you think gradient descent yields different weights?
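
A sketch of the closed-form solution on made-up data. It assumes that $X$ in the normal equations above is the centered feature matrix (standardized features are already centered), which is what lets the bias be recovered separately from the means. The cross-check with `np.linalg.lstsq` is an addition for illustration.

```python
import numpy as np

# Toy data (made up for illustration)
X_toy = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0], [5.0, 6.0]])
y_toy = np.array([3.0, 4.5, 8.0, 6.0, 10.5])

# Center the features so the normal equations need no intercept column
X_mean = X_toy.mean(axis=0)
Xc = X_toy - X_mean

# Solve (Xc^T Xc) w = Xc^T y; solve() avoids forming an explicit inverse
w_ls = np.linalg.solve(Xc.T @ Xc, Xc.T @ y_toy)
b_ls = y_toy.mean() - w_ls @ X_mean

# Cross-check: append a column of ones and fit weights and bias together
A = np.column_stack([X_toy, np.ones(len(y_toy))])
theta, *_ = np.linalg.lstsq(A, y_toy, rcond=None)

print(w_ls, b_ls)
print(theta)
```

If $X^T X$ is singular or badly conditioned, `np.linalg.solve` fails or becomes inaccurate, and `np.linalg.lstsq` is the more robust choice.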

In [None]:
# Solve for w using the Least Squares solution
##################
# YOUR CODE HERE #
##################

# Extract bias and weights separately
##################
# YOUR CODE HERE #
##################

# Display the least squares solution
##################
# YOUR CODE HERE #
##################

# Compare with Gradient Descent Results
##################
# YOUR CODE HERE #
##################

## **Step 9: Compare with Built-in Linear Regression**
1. Use **scikit-learn’s** built-in [LinearRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html) to fit the model on **standardized features**.
2. Extract the learned **weights and bias**.
3. Compare the results with:
   - **Gradient Descent**
   - **Least Squares Solution**

In [None]:
from sklearn.linear_model import LinearRegression

# Train a linear regression model using scikit-learn
##################
# YOUR CODE HERE #
##################

# Extract weights and bias
##################
# YOUR CODE HERE #
##################

# Display results
##################
# YOUR CODE HERE #
##################

# Compare with Gradient Descent & Least Squares
##################
# YOUR CODE HERE #
##################

## **Step 10: Retrain**
1. Retrain the model by minimizing MSE instead of the MSLE.
2. Run it for **500 iterations** with step size (learning rate) $\tau = 0.1$.
3. What do you observe? Why do you think that happens?
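
For reference, the MSE and its gradient can be sketched as below, using $\text{MSE} = \frac{1}{n} \sum_i (\hat{y}_i - y_i)^2$. The toy data is made up, so the behavior seen here need not match what happens on the car dataset.

```python
import numpy as np

def mse(w, b, X, y):
    return np.mean((X @ w + b - y) ** 2)

def gradient_mse(w, b, X, y):
    """Partial derivatives of the MSE with respect to w and b."""
    n = len(y)
    residual = X @ w + b - y
    return (2.0 / n) * (X.T @ residual), (2.0 / n) * np.sum(residual)

# Toy data (made up), standardized before training
X_raw = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0], [5.0, 6.0]])
y_toy = np.array([3.0, 4.5, 8.0, 6.0, 10.5])
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

w = np.zeros(X_std.shape[1])
b = 0.0
tau = 0.1  # step size from the exercise
losses = []
for _ in range(500):
    grad_w, grad_b = gradient_mse(w, b, X_std, y_toy)
    w = w - tau * grad_w
    b = b - tau * grad_b
    losses.append(mse(w, b, X_std, y_toy))

print(w, b, losses[-1])
```

Comparing `w` and `b` with the least squares solution computed on the same standardized features is a useful starting point for answering question 3.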


In [None]:
# Define function and derivative
##################
# YOUR CODE HERE #
##################

# Train the model
##################
# YOUR CODE HERE #
##################

# Display final weights and bias
##################
# YOUR CODE HERE #
##################

*Write an explanation for question 3: What do you observe? Why do you think that happens?*