# Logistic Regression

## Logistic Regression Vectorized Functions


### 1. Hypothesis Function in Vector Form
The hypothesis function calculates the predicted probability for all training examples in a vectorized manner using the sigmoid function:

$$
h = \sigma(X \theta)
$$

Where:
- \( h \): \( m \times 1 \) vector of predicted probabilities.
- \( X \): \( m \times n \) matrix of training examples, where each row represents an input vector \( x^{(i)} \) and \( m \) is the number of examples.
- \( \theta \): \( n \times 1 \) vector of parameters. It holds the weights and, when \( X \) includes a column of ones, the bias as well.
- \( \sigma(z) = \frac{1}{1 + e^{-z}} \): Sigmoid function, applied element-wise, that maps real-valued inputs into the open interval (0, 1).

The sigmoid function ensures that each entry of \( h \) lies strictly between 0 and 1, so it can be interpreted as the probability that the label is 1 for the corresponding input.
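
The hypothesis can be sketched in NumPy as follows. The function names `sigmoid` and `hypothesis` and the small data values are illustrative assumptions, not part of the original notebook:

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(X, theta):
    """Return h = sigmoid(X @ theta) for every row of X at once."""
    return sigmoid(X @ theta)

# Made-up data: m = 3 examples, n = 2 features (values chosen for illustration)
X = np.array([[1.0, 2.0],
              [0.0, 0.0],
              [-1.0, 3.0]])
theta = np.array([[0.5], [-0.25]])  # n x 1 column vector

h = hypothesis(X, theta)  # m x 1 vector of probabilities
print(h.shape)
print(h.ravel())
```

The first two rows give \( X\theta = 0 \), so their predicted probability is exactly 0.5, the point where the model is undecided.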

### 2. Cost Function in Vector Form
The cost function measures the error between the predicted probabilities and the actual labels for all training examples:

$$
J(\theta) = -\frac{1}{m} \left[ y^T \log(h) + (1 - y)^T \log(1 - h) \right]
$$

Where:
- \( J(\theta) \): Cost function, which is a scalar value representing the average error across all examples.
- \( y \): \( m \times 1 \) vector of actual labels (0 or 1).
- \( h \): \( m \times 1 \) vector of predicted probabilities, calculated from the hypothesis function.

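The cost function computes the cross-entropy loss, which evaluates how well the logistic regression model fits the data. The penalty grows without bound as a predicted probability approaches the wrong label (for example, \( h \to 0 \) when \( y = 1 \)), so confident mistakes cost far more than uncertain ones. The loss is also convex in \( \theta \), which makes it well suited to gradient-based training.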
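
A minimal sketch of the vectorized cost is shown here. The function name `cost`, the `eps` clipping (added as an assumption to avoid `log(0)`), and the example values are all illustrative:

```python
import numpy as np

def cost(h, y, eps=1e-12):
    """Cross-entropy J = -(1/m) [y^T log(h) + (1 - y)^T log(1 - h)]."""
    m = y.shape[0]
    # Assumption: clip probabilities so that log never receives exactly 0
    h = np.clip(h, eps, 1 - eps)
    total = y.T @ np.log(h) + (1 - y).T @ np.log(1 - h)
    return float(-total.item() / m)

# Made-up labels and two sets of predictions (illustrative values)
y = np.array([[1.0], [0.0], [1.0]])
h_good = np.array([[0.9], [0.1], [0.8]])  # mostly agrees with y
h_bad = np.array([[0.1], [0.9], [0.2]])   # confidently wrong
h_half = np.full((3, 1), 0.5)             # undecided everywhere

cost_good = cost(h_good, y)
cost_bad = cost(h_bad, y)
cost_half = cost(h_half, y)
print(cost_good, cost_bad, cost_half)
```

Predicting 0.5 for every example gives a cost of \( \log 2 \), which is a useful baseline: a trained model should do better than that.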

### 3. Gradient of the Cost Function in Vector Form
The gradient of the cost function is used to determine the direction to update the parameters \( \theta \):

$$
\nabla J(\theta) = \frac{1}{m} X^T (h - y)
$$

Where:
- \( \nabla J(\theta) \): Gradient vector, which is \( n \times 1 \) and represents the partial derivatives of the cost function with respect to each parameter \( \theta_j \).
- \( X^T \): Transpose of the input matrix \( X \), resulting in an \( n \times m \) matrix.
- \( h - y \): \( m \times 1 \) vector of errors between predicted probabilities and actual labels.

The gradient helps in updating the model parameters during gradient descent to reduce the overall cost.
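
As a hedged sketch, the gradient can drive batch gradient descent with the update \( \theta \leftarrow \theta - \alpha \nabla J(\theta) \). The step size `alpha`, the synthetic data, and `true_theta` are assumptions made only for this illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Cross-entropy cost, written as a mean over the examples."""
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    """Vectorized gradient (1/m) X^T (h - y)."""
    return X.T @ (sigmoid(X @ theta) - y) / X.shape[0]

# Synthetic data (assumption): a column of ones plus two random features
rng = np.random.default_rng(0)
m = 200
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, 2))])
true_theta = np.array([0.5, 2.0, -1.0])  # hypothetical generating parameters
y = (rng.random(m) < sigmoid(X @ true_theta)).astype(float)

theta = np.zeros(3)
alpha = 0.1  # hypothetical learning rate
costs = [cost(theta, X, y)]
for _ in range(500):
    theta = theta - alpha * gradient(theta, X, y)  # step against the gradient
    costs.append(cost(theta, X, y))

print(costs[0], costs[-1])
```

Starting from \( \theta = 0 \), every prediction is 0.5, so the first recorded cost equals \( \log 2 \); later values should be lower if the step size is small enough.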

### Parameters Explanation
- **\( X \)**: Input feature matrix with dimensions \( m \times n \), where \( m \) is the number of training examples and \( n \) is the number of features.
- **\( \theta \)**: Parameter vector with dimensions \( n \times 1 \), representing the model weights (and the bias, when \( X \) includes a column of ones).
- **\( y \)**: True labels vector with dimensions \( m \times 1 \), containing binary labels (0 or 1).
- **\( h \)**: Predicted probability vector with dimensions \( m \times 1 \), containing the probabilities calculated using the sigmoid function.
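
The shapes above can be checked with a short sketch. It also shows one common way to let \( \theta \) carry the bias: prepend a column of ones to the feature matrix. The names `X_raw`, `X_aug`, `w`, and `b` are hypothetical and used only for this example:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n_features = 5, 3
X_raw = rng.normal(size=(m, n_features))  # features only
w = rng.normal(size=n_features)           # weights
b = 0.7                                   # bias

# Fold the bias into theta by adding a leading column of ones to X
X_aug = np.hstack([np.ones((m, 1)), X_raw])  # m x (n_features + 1)
theta = np.concatenate([[b], w])             # bias first, then weights

z_separate = X_raw @ w + b  # weights and bias kept apart
z_folded = X_aug @ theta    # single matrix-vector product

print(X_aug.shape, theta.shape)
print(np.allclose(z_separate, z_folded))
```

Both forms produce the same linear combination, so the choice between a separate bias and an augmented \( X \) is a matter of convenience.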

### Summary of Functions
- **Hypothesis Function**: Calculates the predicted probabilities for all training examples using a linear combination of features and a sigmoid function.
  
  $$
  h = \sigma(X \theta)
  $$

- **Cost Function**: Measures the average error across all examples using cross-entropy loss.
  
  $$
  J(\theta) = -\frac{1}{m} \left[ y^T \log(h) + (1 - y)^T \log(1 - h) \right]
  $$

- **Gradient of the Cost Function**: Points in the direction of steepest increase of the cost; gradient descent steps in the opposite direction, scaled by the learning rate, to reduce the cost.

  $$
  \nabla J(\theta) = \frac{1}{m} X^T (h - y)
  $$

These vectorized forms are efficient for implementation, especially when dealing with large datasets, as they take advantage of matrix operations for faster computation.
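
To illustrate what vectorization replaces, the sketch below computes the same gradient twice: once with explicit Python loops over examples and features, and once with a single matrix product. The function names and random data are assumptions for this example, and no timing figures are claimed:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_loop(theta, X, y):
    """Gradient computed one example and one feature at a time."""
    m, n = X.shape
    grad = np.zeros(n)
    for i in range(m):
        h_i = sigmoid(np.dot(X[i], theta))
        for j in range(n):
            grad[j] += (h_i - y[i]) * X[i, j]
    return grad / m

def gradient_vectorized(theta, X, y):
    """Same gradient as a single expression: (1/m) X^T (h - y)."""
    return X.T @ (sigmoid(X @ theta) - y) / X.shape[0]

# Random data for the comparison (illustrative only)
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100).astype(float)
theta = rng.normal(size=4)

g_loop = gradient_loop(theta, X, y)
g_vec = gradient_vectorized(theta, X, y)
print(np.allclose(g_loop, g_vec))
```

The two results agree up to floating-point rounding; the vectorized version delegates the inner loops to NumPy's compiled routines.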



In [12]:
import numpy as np

# Define the LogisticRegression Class
class LogisticRegression:

    def __init__(self, lr=0.01, n_iters=1000):
        self.lr = lr                # learning rate for gradient descent
        self.n_iters = n_iters      # number of gradient descent iterations
        self.weights = None
        self.bias = None

    def sigmoid(self, x):
        x = np.clip(x, -500, 500)  # Clip values to prevent overflow
        return 1/(1+np.exp(-x))

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0

        for _ in range(self.n_iters):
            linear_comb = np.dot(X, self.weights) + self.bias
            predictions = self.sigmoid(linear_comb)

            # Gradients of the cost with respect to the weights and the bias
            dw = (1/n_samples) * np.dot(X.T, (predictions - y))
            db = (1/n_samples) * np.sum(predictions - y)

            # Gradient descent update: step against the gradient
            self.weights = self.weights - self.lr * dw
            self.bias = self.bias - self.lr * db

    def predict(self, X):
        linear_comb = np.dot(X, self.weights) + self.bias
        y_pred = self.sigmoid(linear_comb)

        # Use a threshold of 0.5 to turn probabilities into class labels
        return (y_pred > 0.5).astype(int)

    def accuracy(self, y_test, y_pred):
        # Fraction of predictions that match the true labels
        return np.mean(np.asarray(y_pred) == np.asarray(y_test))


## Testing

In [13]:
from sklearn.model_selection import train_test_split
from sklearn import datasets

# load_breast_cancer returns a Bunch with .data (features) and .target (labels)
data = datasets.load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
model_accuracy = model.accuracy(y_test, y_pred)

print(f'the accuracy for the logistic regression model is:{model_accuracy}')

the accuracy for the logistic regression model is:0.9473684210526315
