# Gradient Descent on Cross Entropy
Sigmoid Function: $$h(x) = \frac{1}{1+e^{-(wx+b)}}$$
Loss Function (Cross Entropy): $$L = - \frac{1}{N} \sum_{i=1}^{N}[y_i \log(h(x_i)) + (1-y_i) \log(1-h(x_i))]$$
Gradient with respect to the weight $w$: $$dw = - \frac{1}{N} \sum_{i=1}^{N} x_i(y_i - h(x_i))$$
Gradient with respect to the bias $b$: $$db = - \frac{1}{N} \sum_{i=1}^{N} (y_i - h(x_i))$$
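One way to sanity-check the gradient formulas above is to compare them with central finite differences of the cross-entropy loss. The sketch below assumes small random data; the helper names (`cross_entropy`, `analytic_gradients`, `numeric_gradients`) and the `*_demo` variables are made up for illustration and are not part of the notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(w, b, X, y):
    # Mean binary cross entropy of the logistic model h(x) = sigmoid(x.w + b)
    p = sigmoid(X @ w + b)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def analytic_gradients(w, b, X, y):
    # Same form as the formulas above: -(1/N) * sum(x_i * (y_i - h(x_i)))
    p = sigmoid(X @ w + b)
    dw = -(X.T @ (y - p)) / len(y)
    db = -np.sum(y - p) / len(y)
    return dw, db

def numeric_gradients(w, b, X, y, eps=1e-6):
    # Central finite differences, one parameter at a time
    dw = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = eps
        dw[j] = (cross_entropy(w + step, b, X, y)
                 - cross_entropy(w - step, b, X, y)) / (2 * eps)
    db = (cross_entropy(w, b + eps, X, y)
          - cross_entropy(w, b - eps, X, y)) / (2 * eps)
    return dw, db

# Hypothetical toy data, chosen only to exercise the formulas
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = (rng.random(50) < 0.5).astype(float)
w_demo = rng.normal(size=3)
b_demo = 0.3

dw_a, db_a = analytic_gradients(w_demo, b_demo, X_demo, y_demo)
dw_n, db_n = numeric_gradients(w_demo, b_demo, X_demo, y_demo)
print("max |dw difference|:", np.max(np.abs(dw_a - dw_n)))
print("|db difference|:", abs(db_a - db_n))
```

If the two sets of gradients agree to within numerical error, the sign and the 1/N scaling of the analytic expressions are consistent with the loss.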

In [15]:
import numpy as np


# Binary Classification

In [16]:
def sigmoid(x):
    return 1/(1+np.exp(-x))

class LogisticRegression():

    def __init__(self, lr=0.001, n_iters=1000):
        self.lr = lr
        self.n_iters = n_iters
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0

        for _ in range(self.n_iters):
            linear_pred = np.dot(X, self.weights) + self.bias
            predictions = sigmoid(linear_pred)

            dw = (1/n_samples) * np.dot(X.T, (predictions - y))
            db = (1/n_samples) * np.sum(predictions-y)
            
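            # Negating dw and db (as in the commented-out lines below) turns the
            # update into gradient ascent and gives low accuracy. The factor 2
            # belongs to the mean-squared-error gradient, not to cross entropy.
            # dw = -(2/n_samples) * np.dot(X.T, (predictions - y))
            # db = -(2/n_samples) * np.sum(predictions-y)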

            self.weights = self.weights - self.lr*dw
            self.bias = self.bias - self.lr*db


    def predict(self, X):
        linear_pred = np.dot(X, self.weights) + self.bias
        y_pred = sigmoid(linear_pred)
        # Threshold the predicted probabilities at 0.5 to get class labels
        class_pred = [0 if p <= 0.5 else 1 for p in y_pred]
        return class_pred

In [17]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets
import matplotlib.pyplot as plt

bc = datasets.load_breast_cancer()
X, y = bc.data, bc.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1234)

clf = LogisticRegression(lr=0.01)
clf.fit(X_train,y_train)
y_pred = clf.predict(X_test)

def accuracy(y_pred, y_test):
    return np.sum(y_pred==y_test)/len(y_test)

acc = accuracy(y_pred, y_test)
print(acc)

0.9210526315789473




In gradient descent for logistic regression, a negative learning rate makes the weights and bias diverge instead of converging to the optimal values. The update w = w - lr*dw moves against the gradient only when lr is positive. With a negative learning rate the parameters move along the gradient, so the loss increases instead of decreasing.

The learning rate must therefore be positive. In the code above, putting -1 in front of dw and db has the same effect as a negative learning rate: the update becomes gradient ascent on the loss, which explains the poor performance and low accuracy.
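The effect of the sign flip can be seen on a small example. The sketch below assumes synthetic, linearly separable data rather than the breast cancer dataset; `run`, `log_loss`, and the `sign` parameter are hypothetical names introduced only for this comparison.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(w, b, X, y, eps=1e-12):
    # Clip probabilities so log() stays finite when predictions saturate
    p = np.clip(sigmoid(X @ w + b), eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def run(X, y, sign, lr=0.1, n_iters=100):
    # sign=+1 is ordinary gradient descent; sign=-1 negates dw and db
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)
        dw = sign * (X.T @ (p - y)) / len(y)
        db = sign * np.sum(p - y) / len(y)
        w -= lr * dw
        b -= lr * db
    return log_loss(w, b, X, y)

# Hypothetical toy data: two features, label depends on their sum
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 2))
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).astype(float)

start_loss = log_loss(np.zeros(2), 0.0, X_toy, y_toy)
descent_loss = run(X_toy, y_toy, sign=+1)
ascent_loss = run(X_toy, y_toy, sign=-1)
print("loss at zero weights:", start_loss)
print("loss after descent:  ", descent_loss)
print("loss after sign flip:", ascent_loss)
```

Because the cross-entropy loss is convex in the weights, stepping along the gradient can only raise it, so the sign-flipped run should end with a loss above the starting value while the normal run ends below it.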

Choose a positive learning rate that is small enough for the algorithm to converge without overshooting or oscillating. Values between 0.01 and 0.1 are a common starting range, but the best value depends on the problem and dataset, so expect to try several learning rates before settling on one.
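A simple way to experiment is to train with a few candidate learning rates and compare the final loss. This is a minimal sketch on synthetic data whose features are already on a unit scale; the candidate values, the iteration count, and the `final_loss` helper are assumptions for illustration, and the ranking on the breast cancer data may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def final_loss(X, y, lr, n_iters=200, eps=1e-12):
    # Plain batch gradient descent, returning the cross entropy at the end
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.sum(p - y) / len(y)
    p = np.clip(sigmoid(X @ w + b), eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical toy data for the sweep
rng = np.random.default_rng(42)
X_syn = rng.normal(size=(300, 2))
y_syn = (X_syn[:, 0] - 0.5 * X_syn[:, 1] > 0).astype(float)

candidate_lrs = [0.001, 0.01, 0.1, 1.0]
losses = {lr: final_loss(X_syn, y_syn, lr) for lr in candidate_lrs}
for lr, loss in losses.items():
    print(f"lr={lr:<6} final loss={loss:.4f}")

best_lr = min(losses, key=losses.get)
print("lowest final loss at lr =", best_lr)
```

In practice the comparison is better made on a held-out validation split than on the training loss, since a learning rate that fits the training data fastest is not guaranteed to generalize best.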