<a href="https://colab.research.google.com/github/huguryildiz/MLbasics/blob/main/gradient_descent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ML Basics

# Gradient Descent

In [None]:
# GRADIENT DESCENT - SIMPLE EXAMPLE

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Generate synthetic data
np.random.seed(42)
x = np.linspace(0, 10, 100)
y = 3 * x + 7 + np.random.randn(100) * 2  # y = 3x + 7 + noise

# Initialize parameters
w = np.random.randn()
b = np.random.randn()

# Hyperparameters
learning_rate = 0.01
iterations = 1000

# Gradient Descent
cost_history = []
w_history = []
b_history = []
for i in range(iterations):
    y_pred = w * x + b
    error = y_pred - y
    cost = np.mean(error ** 2)  # Mean Squared Error
    cost_history.append(cost)
    w_history.append(w)
    b_history.append(b)

    # Compute gradients
    dw = (2 / len(x)) * np.sum(error * x)
    db = (2 / len(x)) * np.sum(error)

    # Update parameters
    w -= learning_rate * dw
    b -= learning_rate * db

    # Print status every 100 iterations
    if i % 100 == 0:
        print(f"Iteration {i}: Cost = {cost:.4f}, w = {w:.4f}, b = {b:.4f}")

# Final parameter values
print(f"Cost = {cost:.4f}, Final parameters: w = {w:.4f}, b = {b:.4f}")

# Plot cost function in 3D
w_values = np.linspace(w - 2, w + 2, 50)
b_values = np.linspace(b - 2, b + 2, 50)
J_values = np.zeros((50, 50))

for i in range(50):
    for j in range(50):
        w_temp = w_values[i]
        b_temp = b_values[j]
        y_pred_temp = w_temp * x + b_temp
        J_values[i, j] = np.mean((y_pred_temp - y) ** 2)

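# indexing='ij' makes W[i, j] = w_values[i] and B[i, j] = b_values[j], matching J_values[i, j]
W, B = np.meshgrid(w_values, b_values, indexing='ij')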
fig = plt.figure(figsize=(10, 6))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(W, B, J_values, cmap='turbo', alpha=1)
ax.set_xlabel('Weight (w)')
ax.set_ylabel('Bias (b)')
ax.set_zlabel('Cost J(w,b)')
ax.set_title('Cost Function Surface')
ax.view_init(elev=25, azim=135)  # Adjust these values for a better perspective
# Plot gradient descent path
ax.scatter(w_history, b_history, cost_history, color='red', marker='o', s=5, label='Gradient Descent Path')
ax.legend()

plt.show()


Iteration 0: Cost = 1028.9745, w = 2.2691, b = 0.1652
Iteration 100: Cost = 6.9422, w = 3.5975, b = 2.8642
Iteration 200: Cost = 4.6208, w = 3.3741, b = 4.3501
Iteration 300: Cost = 3.7627, w = 3.2382, b = 5.2534
Iteration 400: Cost = 3.4456, w = 3.1557, b = 5.8026
Iteration 500: Cost = 3.3284, w = 3.1054, b = 6.1365
Iteration 600: Cost = 3.2850, w = 3.0749, b = 6.3396
Iteration 700: Cost = 3.2690, w = 3.0564, b = 6.4630
Iteration 800: Cost = 3.2631, w = 3.0451, b = 6.5380
Iteration 900: Cost = 3.2609, w = 3.0382, b = 6.5836
Cost = 3.2601, Final parameters: w = 3.0341, b = 6.6111
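
As a sanity check on the run above, the gradient descent estimates can be compared with the closed-form least-squares fit. The sketch below regenerates the same synthetic data and uses `np.polyfit` as a reference; it is not part of the original cell, and the names `w_ls`, `b_ls`, and `mse_ls` are introduced here only for illustration. In the log above the bias is still moving between iteration 900 and the final value, so the closed-form intercept shows roughly where gradient descent is heading.

```python
import numpy as np

# Regenerate the same synthetic data as in the cell above
np.random.seed(42)
x = np.linspace(0, 10, 100)
y = 3 * x + 7 + np.random.randn(100) * 2

# Degree-1 polynomial fit: returns [slope, intercept] of the least-squares line
w_ls, b_ls = np.polyfit(x, y, 1)

# The MSE at the least-squares solution is the lowest value the cost can reach
mse_ls = np.mean((w_ls * x + b_ls - y) ** 2)

print(f"Least squares: w = {w_ls:.4f}, b = {b_ls:.4f}, MSE = {mse_ls:.4f}")
```

If the gradient descent cost is still noticeably above `mse_ls`, more iterations or a larger learning rate may be worth trying.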




# Gradient Descent for Multivariable Regression

The cost function for multivariable linear regression is defined as:


$ J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right)^2 $

where:

$ h_{\theta}(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n = \theta^T x, \quad \text{with } x_0 = 1 $


*   $J(\theta)$ is the cost function,
*   $m$ is the number of training examples,
*   $n$ is the number of features (not counting the bias),
*   $h_{\theta}(x^{(i)})$ is the hypothesis (the prediction) for the $i$-th example,
*   $y^{(i)}$ is the actual target value of the $i$-th example,
*   $X$ is the feature matrix with a leading column of ones for the bias term $\theta_0$ [$m \times (n+1)$],
*   $\theta$ is the parameter vector [$(n+1) \times 1$],
*   $\nabla J(\theta)$ is the gradient vector [$(n+1) \times 1$].
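
The definitions above can be made concrete with a tiny worked example. This is a minimal sketch on a hypothetical three-point dataset (the data and the helper name `cost` are assumptions made for illustration), with the bias handled by a leading column of ones so that the hypothesis is simply `X @ theta`.

```python
import numpy as np

# Tiny hypothetical dataset: m = 3 examples, n = 1 feature
x1 = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

# Design matrix with a leading column of ones (x_0 = 1), shape (m, n + 1)
X = np.column_stack([np.ones(len(x1)), x1])

def cost(X, y, theta):
    """J(theta) = 1/(2m) * sum((X @ theta - y)^2)."""
    m = len(y)
    errors = X @ theta - y
    return (errors @ errors) / (2 * m)

print(X.shape)                           # (3, 2)
print(cost(X, y, np.array([0.0, 1.0])))  # 0.0, since h(x) = x fits the data exactly
print(cost(X, y, np.array([0.0, 0.0])))  # (1 + 4 + 9) / 6 = 2.3333...
```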


The gradient of $ J(\theta) $ with respect to $ \theta_j $ is:

$ \frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) x_j^{(i)}, \quad j = 0, \ldots, n $
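
This expression follows from the chain rule applied to the cost function defined earlier. A short derivation, assuming the convention $x_0^{(i)} = 1$ so that $\partial h_{\theta}(x^{(i)}) / \partial \theta_j = x_j^{(i)}$ for every $j$:

$
\begin{aligned}
\frac{\partial J(\theta)}{\partial \theta_j}
&= \frac{1}{2m} \sum_{i=1}^{m} \frac{\partial}{\partial \theta_j} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right)^2 \\
&= \frac{1}{2m} \sum_{i=1}^{m} 2 \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) \frac{\partial h_{\theta}(x^{(i)})}{\partial \theta_j} \\
&= \frac{1}{m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
\end{aligned}
$

The factor $\frac{1}{2}$ in the cost function is there so that it cancels the 2 produced by differentiating the square.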

Or in vectorized form:

$
\nabla J(\theta) = \frac{1}{m} X^T (X\theta - y)
 $
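
To see that the vectorized form is the same quantity as the per-component sums, the two can be compared numerically. The sketch below uses small random data (the seed, the sizes, and the names `grad_loop` and `grad_vec` are arbitrary choices for illustration) and assumes the first column of `X` is all ones.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
m, n = 6, 3
X = np.column_stack([np.ones(m), rng.random((m, n))])  # shape (m, n + 1), x_0 = 1
y = rng.random(m)
theta = rng.random(n + 1)

# Component-wise form: one partial derivative per theta_j
grad_loop = np.zeros(n + 1)
for j in range(n + 1):
    for i in range(m):
        h_i = X[i] @ theta  # prediction for example i
        grad_loop[j] += (h_i - y[i]) * X[i, j]
grad_loop /= m

# Vectorized form: (1/m) * X^T (X theta - y)
grad_vec = X.T @ (X @ theta - y) / m

print(np.allclose(grad_loop, grad_vec))  # True
```

The vectorized version replaces the double loop with two matrix products, which is why the code that follows uses it.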

Repeat until convergence, updating all $\theta_j$ simultaneously, where $\alpha$ is the learning rate:

$ \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \quad \text{for } j = 0, \ldots, n $
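
"Repeat until convergence" needs a concrete stopping rule in practice. One common choice, assumed here and not specified by the text, is to stop once the cost decreases by less than a small tolerance. In the sketch below the function name and the parameters `tol` and `max_iters` are hypothetical, and the data are a noise-free line chosen so the result is easy to check.

```python
import numpy as np

def gradient_descent_until_converged(X, y, alpha=0.1, tol=1e-10, max_iters=100_000):
    """Batch gradient descent that stops when the cost improves by less than tol."""
    m, n_params = X.shape
    theta = np.zeros(n_params)
    prev_cost = np.inf
    for it in range(max_iters):
        errors = X @ theta - y
        cost = (errors @ errors) / (2 * m)
        # Stop when the improvement is below tol (this also stops if the cost goes up)
        if prev_cost - cost < tol:
            break
        prev_cost = cost
        theta -= alpha * (X.T @ errors) / m  # simultaneous update of all theta_j
    return theta, cost, it

# Hypothetical noise-free data generated from y = 1 + 2x
x1 = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones(len(x1)), x1])  # leading column of ones for the bias
y = 1 + 2 * x1

theta, cost, iters = gradient_descent_until_converged(X, y)
print(theta, cost, iters)  # theta should be close to [1, 2]
```

A tolerance on the cost is only one option; a threshold on the gradient norm or on the change in $\theta$ would serve the same purpose.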

Further reading:

1. https://mrandri19.github.io/2019/04/01/deriving-gradient-descent-multivariate-linear-regression.html
2. https://medium.com/@IwriteDSblog/gradient-descent-for-multivariable-regression-in-python-d430eb5d2cd8

In [14]:
import numpy as np

def compute_cost(X, y, theta):
    """
    Compute the cost function for linear regression.
    """
    m = len(y)
    predictions = X.dot(theta)  # Matrix multiplication
    errors = predictions - y
    cost = (1 / (2 * m)) * np.sum(errors ** 2)
    return cost

def gradient_descent(X, y, theta, alpha, num_iters, print_every=100):
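    """
    Perform batch gradient descent to optimize theta.

    Returns the optimized theta and the cost recorded after each iteration.
    """
    m = len(y)
    theta = np.array(theta, dtype=float)  # Work on a copy so the caller's array is not modified
    J_history = []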

    for i in range(num_iters):
        gradient = (1 / m) * X.T.dot(X.dot(theta) - y)  # Vectorized gradient computation
        theta -= alpha * gradient
        J_history.append(compute_cost(X, y, theta))

        # Print results at specified intervals
        if (i + 1) % print_every == 0:
            print(f"Iteration {i+1}: Cost = {J_history[-1]:.6f}, Theta = {theta} \n")

    return theta, J_history

# Set random seed for reproducibility
np.random.seed(42)

# Generate a random dataset with m=10 samples and n=4 features
m, n = 10, 4
X = np.random.rand(m, n)

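# Prepend an extra column to X for theta_0. Note: this column is random, so
# theta[0] acts as an ordinary feature weight; a true bias (intercept) term
# would need a column of ones instead, i.e. np.ones((m, 1)).
X = np.c_[np.random.rand(m, 1), X]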

# Generate random target values
y = np.random.rand(m)

# Initialize theta randomly
theta = np.random.rand(n + 1)

# Set hyperparameters
alpha = 0.1  # Learning rate
num_iters = 5000  # Number of iterations

print(f" X = {X} \n")
print(f" y = {y} \n")
print(f" theta = {theta} \n")

# Perform gradient descent
optimal_theta, cost_history = gradient_descent(X, y, theta, alpha, num_iters)

# Display final results
print("\nFinal Theta:", optimal_theta)
print("Final Cost:", cost_history[-1])



 X = [[0.12203823 0.37454012 0.95071431 0.73199394 0.59865848]
 [0.49517691 0.15601864 0.15599452 0.05808361 0.86617615]
 [0.03438852 0.60111501 0.70807258 0.02058449 0.96990985]
 [0.9093204  0.83244264 0.21233911 0.18182497 0.18340451]
 [0.25877998 0.30424224 0.52475643 0.43194502 0.29122914]
 [0.66252228 0.61185289 0.13949386 0.29214465 0.36636184]
 [0.31171108 0.45606998 0.78517596 0.19967378 0.51423444]
 [0.52006802 0.59241457 0.04645041 0.60754485 0.17052412]
 [0.54671028 0.06505159 0.94888554 0.96563203 0.80839735]
 [0.18485446 0.30461377 0.09767211 0.68423303 0.44015249]] 

 y = [0.96958463 0.77513282 0.93949894 0.89482735 0.59789998 0.92187424
 0.0884925  0.19598286 0.04522729 0.32533033] 

 theta = [0.38867729 0.27134903 0.82873751 0.35675333 0.28093451] 

Iteration 100: Cost = 0.053449, Theta = [ 0.33148146  0.51336996  0.29014402 -0.15460217  0.24508331] 

Iteration 200: Cost = 0.046753, Theta = [ 0.27096828  0.65826256  0.16396269 -0.23993096  0.37005219] 

Iteration 300: C