## Linear Regression Using Gradient Descent
Easy
Machine Learning

Write a Python function that performs linear regression using gradient descent. The function should take NumPy arrays X (features with a column of ones for the intercept) and y (target) as input, along with learning rate alpha and the number of iterations, and return the coefficients of the linear regression model as a NumPy array. Round your answer to four decimal places. -0.0 is a valid result for rounding a very small number.
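
For reference, the matrix implementation below performs batch gradient descent. Its gradient has no factor of 2, which matches the half-MSE cost written here. That $\frac{1}{2m}$ scaling is inferred from the code rather than stated in the problem:

$$
J(\theta) = \frac{1}{2m}\lVert X\theta - y\rVert^2, \qquad
\nabla J(\theta) = \frac{1}{m} X^\top (X\theta - y), \qquad
\theta \leftarrow \theta - \alpha\, \nabla J(\theta)
$$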

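Example:

Input: `X = np.array([[1, 1], [1, 2], [1, 3]])`, `y = np.array([1, 2, 3])`, `alpha = 0.01`, `iterations = 1000`
Output: `np.array([0.1107, 0.9513])`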

In [None]:
'''
Clarify:
- Which GD variant to use: all samples in every iteration (batch GD)?

'''

In [None]:
import numpy as np


In [None]:
# Matrix computation (batch gradient descent: all samples in every iteration)
def linear_regression_gradient_descent(X: np.ndarray, y: np.ndarray,
                                       alpha: float, iterations: int) -> np.ndarray:
    m, n = X.shape
    theta = np.zeros((n, 1))
    y = y.reshape(-1, 1)  # ensure column vector so y_pred - y stays (m, 1)

    for _ in range(iterations):
        y_pred = X @ theta                      # (m, 1) predictions
        gradient = (X.T @ (y_pred - y)) / m     # gradient of (1/2m) * ||X theta - y||^2
        theta -= alpha * gradient

    # Return a flat NumPy array rounded to 4 decimals, as the problem requires
    return np.round(theta.flatten(), 4)
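
A quick way to sanity-check the update rule is to run it long enough to converge and compare the result with the closed-form least-squares fit from `np.linalg.lstsq`. The sketch below re-implements the same update as a hypothetical helper `gd_fit` so it runs on its own; the 20,000-iteration budget is an assumption chosen so that this example data converges.

```python
import numpy as np

def gd_fit(X, y, alpha, iterations):
    """Batch gradient descent on the cost (1/2m) * ||X theta - y||^2 (hypothetical helper)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        theta -= alpha * X.T @ (X @ theta - y) / m
    return theta

X = np.array([[1, 1], [1, 2], [1, 3]], dtype=float)
y = np.array([1, 2, 3], dtype=float)

theta_gd = gd_fit(X, y, alpha=0.01, iterations=20_000)
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)   # exact least-squares solution
print(np.round(theta_gd, 4), np.round(theta_ls, 4))
```

With only 1,000 iterations the example stops well short of the least-squares solution, which is why the expected output is `[0.1107, 0.9513]` rather than `[0, 1]`.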


In [None]:
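# Loop version for simple linear regression with one feature: sales ≈ w * spendings + b
def update_w_and_b(spendings, sales, w, b, alpha):
    """One gradient-descent step on the MSE loss mean[(sales - (w * spendings + b))^2]."""
    dl_dw = 0.0
    dl_db = 0.0
    N = len(spendings)
    for i in range(N):
        # partial derivatives of the squared error for sample i
        dl_dw += -2*spendings[i]*(sales[i] - (w*spendings[i] + b))
        dl_db += -2*(sales[i] - (w*spendings[i] + b))

    # update w and b with the gradient averaged over the N samples
    w = w - (1/float(N))*dl_dw*alpha
    b = b - (1/float(N))*dl_db*alpha

    return w, b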
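
The per-sample loop above maps directly onto NumPy array operations: summing the terms and dividing by `N` is a mean. A sketch of a vectorized equivalent (the name `update_w_and_b_vectorized` and the sample data are hypothetical), checked against a copy of the loop version. Note that this loss keeps the factor 2 from differentiating the squared error, whereas the matrix implementation uses the half-MSE cost.

```python
import numpy as np

def update_w_and_b(spendings, sales, w, b, alpha):
    # Loop version, copied from the cell above
    dl_dw = 0.0
    dl_db = 0.0
    N = len(spendings)
    for i in range(N):
        dl_dw += -2 * spendings[i] * (sales[i] - (w * spendings[i] + b))
        dl_db += -2 * (sales[i] - (w * spendings[i] + b))
    w = w - (1 / float(N)) * dl_dw * alpha
    b = b - (1 / float(N)) * dl_db * alpha
    return w, b

def update_w_and_b_vectorized(spendings, sales, w, b, alpha):
    """Same MSE gradient step, computed with NumPy array operations."""
    x = np.asarray(spendings, dtype=float)
    t = np.asarray(sales, dtype=float)
    residual = t - (w * x + b)            # sales[i] - prediction[i] for every i
    dl_dw = -2 * np.mean(x * residual)    # equals (1/N) * sum of the loop terms
    dl_db = -2 * np.mean(residual)
    return w - alpha * dl_dw, b - alpha * dl_db

spendings = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]
w_loop, b_loop = update_w_and_b(spendings, sales, 0.0, 0.0, alpha=0.01)
w_vec, b_vec = update_w_and_b_vectorized(spendings, sales, 0.0, 0.0, alpha=0.01)
print(w_loop, b_loop, w_vec, b_vec)
```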

In [18]:
X = np.array([[1, 1], [1, 2], [1, 3]])
y = np.array([1, 2, 3])
alpha = 0.01
iterations = 1000

print(linear_regression_gradient_descent(X, y, alpha, iterations))
# Output: np.array([0.1107, 0.9513])

[0.1107, 0.9513]


In [22]:
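# np.insert / np.delete return NEW arrays; X itself is not modified,
# and only the last expression (X) is displayed
np.insert(X, 0, 1, axis=1)
np.delete(X, 0, axis=1)
X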

array([[1, 1],
       [1, 2],
       [1, 3]])
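
Because `np.insert` and `np.delete` return new arrays, their result has to be assigned to a name to be used. A small sketch with a hypothetical single-feature matrix `X_raw`, including `np.column_stack` as an alternative way to prepend the intercept column:

```python
import numpy as np

X_raw = np.array([[1], [2], [3]])                       # one feature, no intercept column yet
X_b = np.insert(X_raw, 0, 1, axis=1)                    # new (3, 2) array with a leading column of ones
X_alt = np.column_stack([np.ones(len(X_raw)), X_raw])   # same values, but float dtype
X_back = np.delete(X_b, 0, axis=1)                      # also a new array: drops the intercept again

print(X_b)
print(X_raw.shape)  # (3, 1): the original array is unchanged
```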

In [19]:
y = y.reshape(-1, 1)
y

array([[1],
       [2],
       [3]])
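
Mixing a column vector with a flat array is an easy way to get a silent shape bug: NumPy broadcasting turns `(m, 1) - (m,)` into an `(m, m)` matrix instead of raising an error. A minimal illustration, using a zero prediction vector as hypothetical data:

```python
import numpy as np

y_col = np.array([1, 2, 3]).reshape(-1, 1)   # shape (3, 1), as in the cell above
y_pred = np.zeros(3)                         # shape (3,), e.g. the result of .ravel()

wrong = y_col - y_pred                       # broadcasts to (3, 3): every pair of entries
right = y_col.ravel() - y_pred               # shape (3,): element-wise residuals
print(wrong.shape, right.shape)              # (3, 3) (3,)
```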

In [None]:

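# One manual update step (scratch check of the vectorized rule)
m, n = X.shape
theta = np.zeros((n, 1))

y_pred = (X @ theta).ravel()
# y was reshaped to (m, 1) above; flatten it so y - y_pred stays (m,)
# instead of broadcasting to (m, m), and average the gradient over m samples
theta += (alpha / m) * ((y.ravel() - y_pred) @ X).reshape(n, 1)
theta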

## Implement Gradient Descent Variants with MSE Loss
Medium
Machine Learning


In this problem, you need to implement a single function that can perform three variants of gradient descent: Stochastic Gradient Descent (SGD), Batch Gradient Descent, and Mini-Batch Gradient Descent, using Mean Squared Error (MSE) as the loss function. The function takes an additional parameter that specifies which variant to use.

Note: **Do not shuffle** the data
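
The loss and update can be written for a batch of row indices $B$ containing $|B|$ samples; batch GD uses $|B| = m$, SGD uses $|B| = 1$, and mini-batch GD uses $1 < |B| < m$:

$$
\mathrm{MSE}_B(w) = \frac{1}{|B|}\sum_{i \in B} (x_i^\top w - y_i)^2, \qquad
\nabla_w \mathrm{MSE}_B = \frac{2}{|B|} X_B^\top (X_B w - y_B), \qquad
w \leftarrow w - \eta\, \nabla_w \mathrm{MSE}_B
$$

Here $X_B$ and $y_B$ are the rows of $X$ and $y$ indexed by $B$, and $\eta$ is the learning rate.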


In [24]:
'''
Clarify:
- Include the intercept in the parameters?
- Mini-batch: no shuffling -> use the data in its original order?
- batch_size: can we assume the sample size is divisible by batch_size?

MSE = mean[(y_pred - y_true) ^ 2]
gradient = 2 * X.T @ (y_pred - y_true) / batch_size
theta = theta - lr * gradient
'''
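
On the divisibility question: slicing past the end of a NumPy array is safe, so stepping through `range(0, m, batch_size)` simply yields a shorter final batch. A sketch with hypothetical sizes `m = 5` and `batch_size = 2`; dividing the gradient by the actual batch length rather than `batch_size` keeps that last step correctly scaled.

```python
import numpy as np

m, batch_size = 5, 2                 # hypothetical sizes; m is not divisible by batch_size
indices = np.arange(m)               # original order, no shuffling
batches = [indices[start:start + batch_size] for start in range(0, m, batch_size)]
print([b.tolist() for b in batches])  # [[0, 1], [2, 3], [4]]
```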

In [31]:
import numpy as np

def gradient_descent(X, y, weights, learning_rate, n_iterations, batch_size=1, method='batch', shuffle=False):
    """
    X: (m, n) input features
    y: (m,) target values
    weights: (n,) initial weights (not modified; a float copy is updated)
    learning_rate: float
    n_iterations: int, number of passes (epochs) over the data
    batch_size: int, used only when method='mini_batch'
    method: 'batch', 'stochastic', or 'mini_batch'
    shuffle: whether to shuffle the data each epoch (default False, as the problem requires)
    """
    m, n = X.shape
    # Work on a float copy so the caller's array is not mutated between calls
    weights = np.array(weights, dtype=float)

    if method == 'batch':
        batch_size = m
    elif method == 'stochastic':
        batch_size = 1
    elif method == 'mini_batch':
        assert 1 < batch_size < m, "Mini-batch size must be strictly between 1 and m"
    else:
        raise ValueError(f"Unknown method: {method!r}")

    for _ in range(n_iterations):
        indices = np.random.permutation(m) if shuffle else np.arange(m)

        for start in range(0, m, batch_size):
            batch_idx = indices[start : start + batch_size]

            X_batch = X[batch_idx]
            y_true = y[batch_idx]
            y_pred = X_batch @ weights

            # Gradient of the batch MSE; divide by the actual batch length,
            # since the last batch is smaller when m % batch_size != 0
            gradient = 2 * X_batch.T @ (y_pred - y_true) / len(batch_idx)
            weights -= learning_rate * gradient

    return weights
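
`weights -= ...` on a NumPy array updates it in place, so a function that receives `weights` and updates it that way without copying also changes the caller's array. The test cell below reuses one `weights` array for all three methods, so each run would then start from the previous run's result. A minimal illustration with two hypothetical helpers:

```python
import numpy as np

def step_in_place(w):
    w -= 1.0          # augmented assignment writes into the caller's array
    return w

def step_rebind(w):
    w = w - 1.0       # creates a new array; the caller's array is untouched
    return w

weights = np.zeros(2)
step_rebind(weights)
print(weights)        # [0. 0.]
step_in_place(weights)
print(weights)        # [-1. -1.]
```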


In [None]:
# Sample data
X = np.array([[1, 1], [2, 1], [3, 1], [4, 1]])
y = np.array([2, 3, 4, 5])

# Parameters
learning_rate = 0.01
n_iterations = 100
batch_size = 2

# Initialize weights
weights = np.zeros(X.shape[1])

# Test Batch Gradient Descent
final_weights = gradient_descent(X, y, weights, learning_rate, n_iterations, method='batch')
print(final_weights)

# Test Stochastic Gradient Descent
final_weights = gradient_descent(X, y, weights, learning_rate, n_iterations, method='stochastic')
print(final_weights)

# Test Mini-Batch Gradient Descent
final_weights = gradient_descent(X, y, weights, learning_rate, n_iterations, batch_size, method='mini_batch')
print(final_weights)

[1.10334065 0.68329431]
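
Since the three variants differ only in how many rows feed each update, they can be sketched as one hypothetical helper, `minibatch_gd`, parameterised by batch size. The sample data lies exactly on y = x + 1, so all three should approach the least-squares weights `[1, 1]` given enough passes; 10,000 epochs at learning rate 0.01 is an assumption chosen for this data, not a tuned setting. The 100-iteration results above are still short of that point.

```python
import numpy as np

def minibatch_gd(X, y, w0, lr, epochs, batch_size):
    """Cyclic (unshuffled) mini-batch GD on MSE; batch_size=m gives batch GD, 1 gives SGD."""
    m = len(y)
    w = np.array(w0, dtype=float)
    for _ in range(epochs):
        for start in range(0, m, batch_size):
            Xb, yb = X[start:start + batch_size], y[start:start + batch_size]
            w = w - lr * 2 * Xb.T @ (Xb @ w - yb) / len(yb)
    return w

X = np.array([[1, 1], [2, 1], [3, 1], [4, 1]], dtype=float)
y = np.array([2, 3, 4, 5], dtype=float)      # y = 1 * x + 1 exactly
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)  # closed-form least-squares weights

for bs in (len(y), 1, 2):                     # batch, stochastic, mini-batch
    w = minibatch_gd(X, y, np.zeros(2), lr=0.01, epochs=10_000, batch_size=bs)
    print(bs, np.round(w, 4))
```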
