# 3.4 Logistic Regression

Logistic regression models the probability of a binary outcome by passing a linear combination of the features through the sigmoid function σ(t) = 1 / (1 + e^(-t)), which maps any real number to a probability in (0, 1). For a feature matrix A (one row per sample) and a label vector b of 0s and 1s, the predicted probabilities are σ(Ax), applied entrywise. The `cross_entropy_loss` function measures how far these predictions are from the labels, and `gradient` computes the gradient of that loss with respect to the parameters x. Each iteration of `gradient_descent` moves x a step against the gradient, the direction of steepest decrease of the loss, with the step size set by `learning_rate`. The loop stops when the gradient norm falls below `tolerance` or after `max_iterations` steps, whichever comes first; reaching the iteration limit does not guarantee that x minimizes the loss. Finally, we print the matrix A, the label vector b, the final iterate x (labeled "optimal parameters" in the output), and the cross-entropy loss at that point.
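For reference, the formulas behind the three functions can be written out explicitly. The notation is chosen here for illustration: $\alpha_i^\top$ is the $i$-th row of $A$, $n$ is the number of samples, $\sigma$ is applied entrywise to vectors, and $\beta$ is the step size (`learning_rate` in the code). The key fact is $\sigma'(t) = \sigma(t)(1 - \sigma(t))$, which makes each sample's contribution to the gradient collapse to $(b_i - \sigma(\alpha_i^\top x))\,\alpha_i$:

$$
\sigma(t) = \frac{1}{1 + e^{-t}}, \qquad \sigma'(t) = \sigma(t)\bigl(1 - \sigma(t)\bigr)
$$

$$
\ell(x; A, b) = -\frac{1}{n} \sum_{i=1}^{n} \Bigl[ b_i \log \sigma(\alpha_i^\top x) + (1 - b_i) \log\bigl(1 - \sigma(\alpha_i^\top x)\bigr) \Bigr]
$$

$$
\nabla_x \log \sigma(\alpha_i^\top x) = \bigl(1 - \sigma(\alpha_i^\top x)\bigr)\,\alpha_i, \qquad
\nabla_x \log\bigl(1 - \sigma(\alpha_i^\top x)\bigr) = -\sigma(\alpha_i^\top x)\,\alpha_i
$$

$$
\nabla \ell(x; A, b) = -\frac{1}{n} \sum_{i=1}^{n} \bigl(b_i - \sigma(\alpha_i^\top x)\bigr)\,\alpha_i = -\frac{1}{n} A^\top \bigl(b - \sigma(Ax)\bigr)
$$

$$
x_{k+1} = x_k - \beta \, \nabla \ell(x_k; A, b)
$$

The matrix form of the gradient is what `gradient` evaluates with `-np.dot(A.T, (b - predictions)) / n`, and the last line is the update performed inside `gradient_descent`.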

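A cheap way to confirm that an analytic gradient matches its loss is to compare it with central finite differences, (ℓ(x + h·e_j) - ℓ(x - h·e_j)) / (2h) for each coordinate j. The sketch below re-defines the loss and gradient so it runs on its own; the random test problem, the step `h = 1e-6`, and the helper `numerical_gradient` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np


def sigmoid(t):
    return 1 / (1 + np.exp(-t))


def cross_entropy_loss(x, A, b):
    # mean cross-entropy of the predictions sigmoid(Ax) against the labels b
    p = sigmoid(A @ x)
    return -np.mean(b * np.log(p) + (1 - b) * np.log(1 - p))


def gradient(x, A, b):
    # analytic gradient: -(1/n) A^T (b - sigmoid(Ax))
    return -A.T @ (b - sigmoid(A @ x)) / len(b)


def numerical_gradient(f, x, h=1e-6):
    # hypothetical helper: central difference (f(x + h e_j) - f(x - h e_j)) / (2h) per coordinate j
    grad = np.zeros_like(x)
    for j in range(len(x)):
        step = np.zeros_like(x)
        step[j] = h
        grad[j] = (f(x + step) - f(x - step)) / (2 * h)
    return grad


# hypothetical random test problem: 20 samples, 3 features, random 0/1 labels
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))
b = rng.integers(0, 2, size=20)
x = rng.normal(size=3)

analytic = gradient(x, A, b)
numeric = numerical_gradient(lambda z: cross_entropy_loss(z, A, b), x)
max_diff = np.max(np.abs(analytic - numeric))
print("analytic gradient: ", analytic)
print("numerical gradient:", numeric)
print("max abs difference:", max_diff)
```

The two vectors should agree to many decimal places; a sign error or a missing factor of 1/n in the analytic gradient would show up immediately as a large difference.
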
```python
import numpy as np

# sigmoid function σ(t) = 1 / (1 + e^(-t))
def sigmoid(t):
    return 1 / (1 + np.exp(-t))

# cross-entropy loss ℓ(x; A, b) for logistic regression, averaged over the n samples
def cross_entropy_loss(x, A, b):
    predictions = sigmoid(np.dot(A, x))
    # clip probabilities away from 0 and 1 so np.log never receives 0
    eps = 1e-12
    predictions = np.clip(predictions, eps, 1 - eps)
    loss = -np.mean(b * np.log(predictions) + (1 - b) * np.log(1 - predictions))
    return loss

# gradient of the cross-entropy loss: ∇ℓ(x) = -(1/n) Aᵀ (b - σ(Ax))
def gradient(x, A, b):
    n = len(b)
    predictions = sigmoid(np.dot(A, x))
    grad = -np.dot(A.T, (b - predictions)) / n
    return grad

# gradient descent for logistic regression:
# stops when ‖∇ℓ(x_k)‖ < tolerance or after max_iterations steps, whichever comes first
def gradient_descent(A, b, starting_point, learning_rate, tolerance, max_iterations):
    x_k = starting_point
    for _ in range(max_iterations):
        grad = gradient(x_k, A, b)
        # stop early once the gradient is close enough to zero
        if np.linalg.norm(grad) < tolerance:
            break
        x_k = x_k - learning_rate * grad  # step against the gradient
    return x_k

# example data: feature matrix A (one row per sample, no intercept column) and label vector b
A = np.array([[0.5, 1.5], [1.0, 1.8], [1.5, 2.5], [2.0, 2.8], [2.5, 3.3]])
b = np.array([0, 0, 1, 1, 1])  # binary labels

# parameters for gradient descent
starting_point = np.zeros(A.shape[1])  # initial guess (zero vector)
learning_rate = 0.1                    # step size
tolerance = 1e-6                       # stop when the gradient norm drops below this
max_iterations = 1000                  # maximum number of iterations

# run gradient descent; optimal_x is the final iterate
optimal_x = gradient_descent(A, b, starting_point, learning_rate, tolerance, max_iterations)

# output results
print("matrix A:\n", A)
print("vector b:", b)
print("optimal parameters (x):", optimal_x)
print("cross-entropy loss at optimal x:", cross_entropy_loss(optimal_x, A, b))
```

```
matrix A:
 [[0.5 1.5]
 [1.  1.8]
 [1.5 2.5]
 [2.  2.8]
 [2.5 3.3]]
vector b: [0 0 1 1 1]
optimal parameters (x): [ 4.39880837 -2.34876124]
cross-entropy loss at optimal x: 0.31080199638903494
```
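The loss reported above is not the smallest achievable on this data. The five samples can be separated perfectly by a line through the origin, so scaling up a separating parameter vector drives the cross-entropy loss toward 0 without ever reaching it. The sketch below checks this with a hand-picked direction x = (7, -4); that direction is an illustrative assumption, and any separating direction would behave the same way.

```python
import numpy as np


def sigmoid(t):
    return 1 / (1 + np.exp(-t))


def cross_entropy_loss(x, A, b):
    # clip so np.log never sees exactly 0 when probabilities saturate at large scales
    p = np.clip(sigmoid(A @ x), 1e-12, 1 - 1e-12)
    return -np.mean(b * np.log(p) + (1 - b) * np.log(1 - p))


# same example data as the notebook cell
A = np.array([[0.5, 1.5], [1.0, 1.8], [1.5, 2.5], [2.0, 2.8], [2.5, 3.3]])
b = np.array([0, 0, 1, 1, 1])

# hand-picked separating direction (illustrative; any separating direction works)
direction = np.array([7.0, -4.0])
margins = A @ direction                      # the sign of a_i . x decides the predicted class
predicted_labels = (margins > 0).astype(int)
print("margins:", margins)
print("predicted labels:", predicted_labels)

# loss along the ray s * direction: strictly decreasing toward 0 as s grows
scales = [1, 2, 5, 10, 20]
losses = [cross_entropy_loss(s * direction, A, b) for s in scales]
for s, loss in zip(scales, losses):
    print(f"scale {s:>2}: loss = {loss:.6f}")
```

Every sample is classified correctly, the loss at scale 1 is already below the 0.31 printed above, and it keeps shrinking as the scale grows. Since the loss is positive for every finite x but approaches 0 along this ray, it has no minimizer on this data; the x printed by the notebook is simply where gradient descent stopped.
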
