> https://chat.openai.com/share/1a8d2884-7cf4-4f26-a3b6-87fdb254d492

**Objective:** The goal of this lab is to reinforce your understanding of gradient descent by implementing it from scratch to solve a simple linear regression problem. You will be using Python, Jupyter notebooks, and PyTorch for this assignment.

**Problem Description:** You are given a dataset with two variables, x and y, where y is a linear function of x with some added noise. Your task is to find the line of best fit for this data using gradient descent.

**Instructions:**

1. **Data Generation:** Generate a synthetic dataset of 100 points for this task. You can use the torch.rand function to generate x and then define y as y = 2x + 3 + noise, where noise is a random value added to each point to simulate real-world data. The coefficients 2 and 3 are the true weight and bias, respectively, that you will try to learn with gradient descent.

2. **Model Definition:** Define a simple linear regression model y = wx + b, where w is the weight and b is the bias. Initialize w and b to any values of your choice.

3. **Loss Function:** Define the mean squared error loss function, which is the function you will minimize using gradient descent.

4. **Gradient Descent:** Implement the gradient descent algorithm. At each step, compute the gradients of the loss with respect to w and b, and then update w and b in the direction that reduces the loss. Repeat this process for a fixed number of iterations, or until w and b converge to the true values within a certain tolerance.

5. **Evaluation:** Plot the original data along with the line of best fit found by your model. Also, plot the loss over time to see how it decreases as the model learns.
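
The gradients needed in step 4 follow directly from the loss in step 3. As a sketch, write the mean squared error over the $N$ data points (here $N = 100$) and differentiate with respect to each parameter. The learning-rate symbol $\eta$ is notation assumed here, not taken from the assignment text.

$$
L(w, b) = \frac{1}{N} \sum_{i=1}^{N} \left( w x_i + b - y_i \right)^2
$$

$$
\frac{\partial L}{\partial w} = \frac{2}{N} \sum_{i=1}^{N} x_i \left( w x_i + b - y_i \right),
\qquad
\frac{\partial L}{\partial b} = \frac{2}{N} \sum_{i=1}^{N} \left( w x_i + b - y_i \right)
$$

$$
w \leftarrow w - \eta \, \frac{\partial L}{\partial w},
\qquad
b \leftarrow b - \eta \, \frac{\partial L}{\partial b}
$$

Each gradient is a single number per parameter, because the sum runs over all data points. An implementation should therefore reduce over the batch (for example with `.mean()`) before updating `w` and `b`.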

**Questions to Consider:**

- How do different initial values of w and b affect the number of iterations needed for convergence?
- How does the learning rate affect the speed of convergence and the final result?
- What happens if the learning rate is set too high or too low?
- How does the model perform if you increase the amount of noise in the data?

Remember, you can ask for help at any time if you're unsure about how to proceed. Good luck!
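
One way to explore the learning-rate questions is to run the same gradient descent loop with several rates and compare the results. The sketch below is only an illustration. The helper `run_gradient_descent`, the three rates, the noise scale of 0.1, and the seed are all assumptions chosen for the demonstration, not part of the assignment.

```python
import torch

def run_gradient_descent(xs, ys, learning_rate, epochs=200, w0=1.0, b0=1.0):
    """Plain gradient descent on MSE for y = w*x + b; returns (w, b, losses)."""
    w, b = torch.tensor(w0), torch.tensor(b0)
    losses = []
    for _ in range(epochs):
        residual = w * xs + b - ys                   # shape (N, 1)
        losses.append((residual ** 2).mean().item())
        # Batch-averaged gradients of the mean squared error
        w = w - learning_rate * (2 * xs * residual).mean()
        b = b - learning_rate * (2 * residual).mean()
    return w.item(), b.item(), losses

torch.manual_seed(0)
xs_demo = torch.rand(100, 1)                           # x in [0, 1)
ys_demo = 2 * xs_demo + 3 + 0.1 * torch.randn(100, 1)  # true weight 2, true bias 3

for lr in (0.001, 0.1, 1.0):
    w, b, losses = run_gradient_descent(xs_demo, ys_demo, lr)
    print(f"lr={lr}: w={w:.3f}, b={b:.3f}, final loss={losses[-1]:.4g}")
```

With this setup, the smallest rate should barely move the parameters in 200 steps, the middle rate should land close to the true values, and the largest rate should make the loss grow instead of shrink. The exact thresholds depend on the scale of `x`, so they will differ for other datasets.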

In [20]:
import torch
from torch.testing import assert_close

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

xs = torch.randn(100, 1) * 10
noise = torch.randn(100, 1) * 3
# The bias passed to linear() is the scalar 3.0; the per-sample noise is added afterwards
ys = torch.nn.functional.linear(xs, torch.tensor([[2.0]]), torch.tensor([3.0])) + noise


assert_close(ys, 2 * xs + 3 + noise)

In [43]:
initial_weight = 1.0
initial_bias = 1.0

def mean_squared_loss(y_pred, y_true):
    return ((y_pred - y_true) ** 2).mean()

def compute_gradient(xs, ys, params):
    # Analytic gradient of the mean squared error for y = wx + b:
    #   dw = mean(2 * x * (wx + b - y))
    #   db = mean(2 * (wx + b - y))
    # Each gradient is averaged over the batch and reshaped to match its
    # parameter. Assigning a per-sample (100, 1) tensor as a .grad raises
    # "RuntimeError: assigned grad has data of a different size".
    w, b = params
    residual = xs @ w + b - ys  # shape (100, 1)
    dw = (2 * xs * residual).mean().reshape(w.shape)
    db = (2 * residual).mean().reshape(b.shape)
    return dw, db

def linear_regression(xs, ys, epochs, learning_rate):
    w = torch.tensor([[initial_weight]], dtype=torch.float32, requires_grad=False)
    b = torch.tensor([initial_bias], dtype=torch.float32, requires_grad=False)
    print(xs.shape)
    print(w.shape)
    print(b.shape)
    for epoch in range(epochs):
        # Predict with the current parameters, not the true coefficients
        y_pred = xs @ w + b
        loss = mean_squared_loss(y_pred, ys)  # tracked for monitoring/plotting
        dw, db = compute_gradient(xs, ys, [w, b])
        # Manual update; autograd is not involved, so no .grad bookkeeping is needed
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b
w, b = linear_regression(xs, ys, epochs=100, learning_rate=0.001)

torch.Size([100, 1])
torch.Size([1, 1])
torch.Size([1])
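
As a cross-check on hand-derived gradients, PyTorch's autograd can compute the same quantities. Parameters created with `requires_grad=True` get a populated `.grad` attribute after `loss.backward()` is called. The sketch below is a minimal illustration under stated assumptions: it builds its own small dataset, and the names `xs_check`, `ys_check`, `w_auto`, and `b_auto` are hypothetical. The comparison tolerances are loosened to allow for float32 rounding.

```python
import torch

torch.manual_seed(0)
xs_check = torch.randn(100, 1) * 10
ys_check = 2 * xs_check + 3 + torch.randn(100, 1) * 3

# Parameters that autograd should track
w_auto = torch.tensor([[1.0]], requires_grad=True)
b_auto = torch.tensor([1.0], requires_grad=True)

loss = ((xs_check @ w_auto + b_auto - ys_check) ** 2).mean()
loss.backward()  # populates w_auto.grad and b_auto.grad

# Hand-derived gradients of the same mean squared error
with torch.no_grad():
    residual = xs_check @ w_auto + b_auto - ys_check
    dw_manual = (2 * xs_check * residual).mean().reshape(w_auto.shape)
    db_manual = (2 * residual).mean().reshape(b_auto.shape)

torch.testing.assert_close(w_auto.grad, dw_manual, rtol=1e-4, atol=1e-4)
torch.testing.assert_close(b_auto.grad, db_manual, rtol=1e-4, atol=1e-4)

# One autograd-driven update step, then reset the accumulated gradients
with torch.no_grad():
    w_auto -= 0.001 * w_auto.grad
    b_auto -= 0.001 * b_auto.grad
    w_auto.grad.zero_()
    b_auto.grad.zero_()
```

If both assertions pass, the manual formula and autograd agree. The `.grad.zero_()` calls only make sense in this autograd variant, where gradients accumulate across `backward()` calls.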