# Linear Regression and Gradient Descent

## Linear Regression Model

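For a single input feature, the linear regression model is defined as:

$f_{w,b}(x) = wx + b$

Where:
- $w$ is the weight (slope)
- $b$ is the bias (y-intercept)
- $x$ is the input variable

With several input features, as in the `MultipleLinearRegression` class below, $w$ and $x$ become vectors and the product becomes a dot product: $f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b$.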
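
As a small illustration of the model, the sketch below evaluates $f_{w,b}(x)$ for a few inputs. The helper name `predict` and the numbers are made up for this example and are not part of the notebook's code.

```python
import numpy as np

def predict(x, w, b):
    """Return f_{w,b}(x) = w*x + b for a scalar or an array of inputs (hypothetical helper)."""
    return w * np.asarray(x, dtype=float) + b

# Illustrative values only: slope 2, intercept 1
x = np.array([0.0, 1.0, 2.0])
y_hat = predict(x, w=2.0, b=1.0)
print(y_hat)  # [1. 3. 5.]
```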

## Cost Function

The cost function $J(w,b)$ is given by:

$J(w,b) = \frac{1}{2m} \sum_{i=1}^m (f_{w,b}(x^{(i)}) - y^{(i)})^2$

Where:
- $m$ is the number of training examples
- $x^{(i)}$ is the i-th input
- $y^{(i)}$ is the i-th target output
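
To make the formula concrete, here is a minimal sketch that evaluates $J(w,b)$ on a toy dataset. The function name `compute_cost` and the data are assumptions chosen so the result is easy to check by hand.

```python
import numpy as np

def compute_cost(x, y, w, b):
    """Cost J(w, b) = 1/(2m) * sum of squared errors (hypothetical helper)."""
    m = len(y)
    errors = (w * x + b) - y          # f_{w,b}(x^(i)) - y^(i) for every example
    return np.sum(errors ** 2) / (2 * m)

# Toy data that lies exactly on the line y = 2x
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

cost_exact = compute_cost(x, y, w=2.0, b=0.0)  # perfect fit, so the cost is 0
cost_off = compute_cost(x, y, w=1.0, b=0.0)    # errors -1, -2, -3, so the cost is 14 / 6
print(cost_exact, cost_off)
```

The factor $\frac{1}{2m}$ means the value is half the mean squared error; the extra $\frac{1}{2}$ only simplifies the derivatives.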

## Gradient Descent Algorithm

The gradient descent algorithm is used to minimize the cost function:


$w = w - \alpha \frac{\partial}{\partial w} J(w,b)$

$b = b - \alpha \frac{\partial}{\partial b} J(w,b)$

Where:
- $\alpha$ is the learning rate
- $\frac{\partial}{\partial w} J(w,b)$ is the partial derivative of $J$ with respect to $w$
- $\frac{\partial}{\partial b} J(w,b)$ is the partial derivative of $J$ with respect to $b$

The partial derivatives are calculated as:

$\frac{\partial}{\partial w} J(w,b) = \frac{1}{m} \sum_{i=1}^m (f_{w,b}(x^{(i)}) - y^{(i)})x^{(i)}$

$\frac{\partial}{\partial b} J(w,b) = \frac{1}{m} \sum_{i=1}^m (f_{w,b}(x^{(i)}) - y^{(i)})$

These formulas are used to update the parameters w and b in each iteration of the gradient descent algorithm.
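
The update rule and the two partial derivatives can be sketched for the single-feature case as follows. This is an illustrative version under stated assumptions, not the notebook's class: the names `compute_gradients` and `gradient_descent`, the toy data, and the values `alpha=0.1` and `iterations=2000` are all chosen for the example.

```python
import numpy as np

def compute_gradients(x, y, w, b):
    """Partial derivatives of J with respect to w and b (hypothetical helper)."""
    m = len(y)
    errors = (w * x + b) - y
    dj_dw = np.sum(errors * x) / m
    dj_db = np.sum(errors) / m
    return dj_dw, dj_db

def gradient_descent(x, y, w=0.0, b=0.0, alpha=0.1, iterations=2000):
    """Repeat the update rule; both parameters are updated from the same gradients."""
    for _ in range(iterations):
        dj_dw, dj_db = compute_gradients(x, y, w, b)
        w, b = w - alpha * dj_dw, b - alpha * dj_db
    return w, b

# Toy data generated from y = 2x + 1
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])

w, b = gradient_descent(x, y)
print(w, b)  # close to 2.0 and 1.0
```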



In [None]:
%pip install numpy
%pip install matplotlib

In [35]:
import numpy as np
import matplotlib.pyplot as plt
import copy, math

In [36]:
class MultipleLinearRegression:
    def __init__(self, x_train, y_train):
        """
        Initializes the Multiple Linear Regression model with training data.

        Parameters:
        - x_train: Matrix of input features.
        - y_train: Array of corresponding target values.
        """
        self.x_train = x_train
        self.y_train = y_train
        self.b_in = 0
        self.m = len(y_train)  # Number of training examples
        self.w_in = np.zeros(len(self.x_train[0]))  # Initialize the weight with zeros

    def cost_function(self, w, b):
        """
        Computes the cost function for multiple linear regression, which measures the difference between
        predicted values and actual values.

        The cost function used here is half the Mean Squared Error (MSE),
        matching the 1/(2m) factor in J(w, b).

        Parameters:
        - w: Weight vector for the multiple linear regression model.
        - b: Bias for the linear regression model.

        Returns:
        - The computed cost (half the MSE).
        """
        total_cost = 0

        # Sum of squared differences between predicted and actual values
        for i in range(self.m):
            y_predicted = np.dot(self.x_train[i], w) + b
            total_cost += (y_predicted - self.y_train[i]) ** 2

        # Return half the mean of the squared differences
        return (1 / (2 * self.m)) * total_cost

    def compute_prediction_array(self, w, b):
        """
        Computes the predicted values (y_hat) for the training set using the current weight and bias.

        Parameters:
        - w: Weight vector for the multiple linear regression model.
        - b: Bias for the linear regression model.

        Returns:
        - An array of predicted values for each training example.
        """
        result = np.zeros(self.m)
        for i in range(self.m):
            result[i] = np.dot(self.x_train[i], w) + b

        return result

    def gradient_w(self, w, b):
        """
        Computes the gradient of the cost function with respect to the weight vector (w).
        Each component indicates how much the cost would change with a small change in that weight.

        Parameters:
        - w: Current weight vector.
        - b: Current bias value.

        Returns:
        - The gradient vector, with one entry per weight.
        """
        gradient_value = np.zeros(len(self.x_train[0]))

        for i in range(self.m):
            y_predicted = np.dot(self.x_train[i], w) + b
            for j in range(len(self.x_train[0])):  
                gradient_value[j] += (y_predicted - self.y_train[i]) * self.x_train[i][j]
                
        # Average gradient over all training examples
        return (1 / self.m) * gradient_value

    def gradient_b(self, w, b):
        """
        Computes the gradient of the cost function with respect to the bias (b).
        This gradient indicates how much the cost would change with a small change in the bias.

        Parameters:
        - w: Current weight vector.
        - b: Current bias value.

        Returns:
        - The gradient value for the bias.
        """
        gradient_value = 0

        for i in range(self.m):
            y_predicted = np.dot(self.x_train[i], w) + b
            gradient_value += y_predicted - self.y_train[i]

        # Average gradient over all training examples
        return (1 / self.m) * gradient_value

    def gradient_descent(self, iterations=1200, learning_rate=0.01):
        """
        Performs gradient descent to find the weight vector (w) and bias (b)
        that minimize the cost function.

        Parameters:
        - iterations: Number of iterations to run the gradient descent (default is 1200).
        - learning_rate: Step size for each iteration of gradient descent (default is 0.01).

        Returns:
        - Optimized weight vector (w_optimised) and bias (b_optimised).
        """
        # Work on a copy so the initial weights stored in self.w_in are never modified
        w_optimised = copy.deepcopy(self.w_in)
        b_optimised = self.b_in  # Start the bias from its initial value

        for i in range(iterations):
            # Compute gradients for weight matrix and bias
            dj_dw = self.gradient_w(w_optimised, b_optimised)
            dj_db = self.gradient_b(w_optimised, b_optimised)

            # Update the weight and bias by moving in the opposite direction of the gradients
            temp_w = w_optimised - learning_rate * dj_dw
            temp_b = b_optimised - learning_rate * dj_db

            # Assign the updated values back to the optimized variables
            w_optimised = temp_w
            b_optimised = temp_b

            # Print the progress every 100 iterations
            if (i + 1) % 100 == 0:
                print(f"After {i + 1} iterations: w -> {w_optimised}      b -> {b_optimised}.       cost -> {self.cost_function(w_optimised, b_optimised)}")

        return w_optimised, b_optimised
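
The explicit loops in `gradient_w` and `gradient_b` can also be written with matrix operations. The sketch below is one possible vectorized equivalent, not the notebook's implementation; the function name `compute_gradients_vectorized` and the toy arrays are hypothetical.

```python
import numpy as np

def compute_gradients_vectorized(X, y, w, b):
    """Gradients of J for all weights at once (hypothetical helper).

    X has shape (m, n), y has shape (m,), w has shape (n,), b is a scalar.
    """
    m = X.shape[0]
    errors = X @ w + b - y        # prediction error for every training example
    dj_dw = X.T @ errors / m      # one entry per feature
    dj_db = errors.sum() / m
    return dj_dw, dj_db

# Toy inputs chosen only to exercise the function
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
dj_dw, dj_db = compute_gradients_vectorized(X, y, np.zeros(2), 0.0)
print(dj_dw, dj_db)
```

Computing both gradients from a single `errors` array also avoids recomputing the predictions twice per iteration, which the two separate methods do.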




In [37]:
# Training data
x_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

# Initialize the Linear Regression model
model = MultipleLinearRegression(x_train, y_train)


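# Perform gradient descent to find the optimal weight and bias.
# Note: the features are not scaled here, and with values as large as 2104
# this learning rate makes the updates diverge (see the nan output below).
w, b = model.gradient_descent(iterations=1000, learning_rate=0.001)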

# # Scatter plot of the training data
# plt.scatter(x_train, y_train, marker = "x", color = "red")

# # Display the final optimized weight and bias
# print(f"w -> {w}.     b -> {b}")

# # Show the plot
# plt.show()

After 100 iterations: w -> [nan nan nan nan]      b -> nan.       cost -> nan
After 200 iterations: w -> [nan nan nan nan]      b -> nan.       cost -> nan
After 300 iterations: w -> [nan nan nan nan]      b -> nan.       cost -> nan
After 400 iterations: w -> [nan nan nan nan]      b -> nan.       cost -> nan
After 500 iterations: w -> [nan nan nan nan]      b -> nan.       cost -> nan
After 600 iterations: w -> [nan nan nan nan]      b -> nan.       cost -> nan
After 700 iterations: w -> [nan nan nan nan]      b -> nan.       cost -> nan
After 800 iterations: w -> [nan nan nan nan]      b -> nan.       cost -> nan
After 900 iterations: w -> [nan nan nan nan]      b -> nan.       cost -> nan
After 1000 iterations: w -> [nan nan nan nan]      b -> nan.       cost -> nan


  gradient_value[j] += (y_predicted - self.y_train[i]) * self.x_train[i][j]
  temp_w = w_optimised - learning_rate * dj_dw
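
The `nan` values above indicate that gradient descent diverged. The first feature has values in the thousands, so the gradient is very large, and a learning rate of 0.001 overshoots on every step until the numbers overflow. The two source lines echoed above appear to be what remains of the runtime warnings raised at that point. Two common remedies are a much smaller learning rate or scaling the features first. The sketch below tries the second option with z-score normalization. It is a standalone illustration, not part of the class: `zscore_normalize` is a hypothetical helper, and `alpha = 0.1` with 1000 iterations are illustrative choices.

```python
import numpy as np

def zscore_normalize(X):
    """Scale each feature column to zero mean and unit standard deviation (hypothetical helper)."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

# Same training data as in the notebook, stored as floats
x_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]], dtype=float)
y_train = np.array([460, 232, 178], dtype=float)

x_norm, mu, sigma = zscore_normalize(x_train)

w = np.zeros(x_norm.shape[1])
b = 0.0
alpha = 0.1          # illustrative learning rate for the scaled features
m = len(y_train)
costs = []

for _ in range(1000):
    errors = x_norm @ w + b - y_train
    costs.append(np.sum(errors ** 2) / (2 * m))   # cost before this update
    w = w - alpha * (x_norm.T @ errors) / m
    b = b - alpha * errors.sum() / m

print(f"first cost: {costs[0]}, last cost: {costs[-1]}")
print(f"w -> {w}, b -> {b}")
```

With the features on a comparable scale, the cost decreases instead of overflowing. To predict for a new input, scale it with the same `mu` and `sigma` before applying `w` and `b`.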


In [5]:
x_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

print(x_train[0])

[2104    5    1   45]
