In [1]:
import numpy as np

# Logistic Regression

The goal of a binary logistic regression is to model the probability of a random variable $Y$ being 0 or 1 given experimental data. It may be trained on multiclass problems as well, but the focus will be on binary classes.

The logistic regression curve is defined by the sigmoid function, with equation presented below.

<center>
    $ \Pr(Y=1\mid X; \hat{\beta}) = \dfrac{1}{1+e^{-(X^{T}\hat{\beta} + \beta_0)}} = h_{\hat{\beta}}(X) \implies \Pr(Y=0\mid X; \hat{\beta}) = 1 - h_{\hat{\beta}}(X) $
</center>

Where $\hat{\beta}$ is the vector of coefficients, $\beta_0$ is the bias and $X$ is the vector of features.

Let's start by defining the sigmoid and the prediction function.

In [5]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

In [12]:
def make_prediction(weights, features, bias):
    y_hat = sigmoid(np.matmul(weights, features) + bias)
    return y_hat
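
As a quick sanity check, the two functions can be tried on a tiny input. This is only a sketch: the weights, features and bias below are made-up values, chosen so that the linear term is exactly zero and the predicted probability should come out as 0.5.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def make_prediction(weights, features, bias):
    return sigmoid(np.matmul(weights, features) + bias)


# Hypothetical values, chosen only for illustration.
weights = np.array([0.5, -0.25])
features = np.array([2.0, 4.0])
bias = 0.0

# Linear term: 0.5 * 2.0 + (-0.25) * 4.0 + 0.0 = 0.0, and sigmoid(0) = 0.5
p = make_prediction(weights, features, bias)
print(p)
```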

### Loss Function

For logistic regression, least squares estimation is not capable of producing minimum variance unbiased estimators for the actual parameters. Instead, maximum likelihood estimation, which is equivalent to minimizing the cross-entropy, is used to find the parameters that best fit the data.

The cross-entropy is defined by

<center>
    $ H(y, p) = - \sum\limits_{i=1}^n \left[ y_i\ln(p_i)+(1-y_i)\ln(1-p_i) \right] = Error$
</center>

Where $y_i$ is the true binary label of sample $i$ and $p_i$ is the predicted probability that $y_i = 1$.

In [18]:
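def loss_function(y, p):
    # Cross-entropy summed over all samples, matching H(y, p) above.
    y = np.asarray(y, dtype=float)
    # Clip probabilities away from 0 and 1 so np.log never receives 0.
    p = np.clip(np.asarray(p, dtype=float), 1e-15, 1 - 1e-15)
    ce = -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return ce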
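
To see how the cross-entropy behaves, it helps to compare predictions that agree with the labels against predictions that contradict them. The sketch below uses a hypothetical helper, `cross_entropy`, that sums the per-sample terms as in the equation for $H(y, p)$; the labels and probabilities are made-up values.

```python
import numpy as np


def cross_entropy(y, p):
    # Hypothetical helper: sums the per-sample terms of H(y, p).
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))


y = [1, 0, 1]

# Probabilities that agree with the labels give a small loss ...
good = [0.9, 0.1, 0.8]
# ... while confident but wrong probabilities are penalized heavily.
bad = [0.1, 0.9, 0.2]

loss_good = cross_entropy(y, good)
loss_bad = cross_entropy(y, bad)
print(loss_good, loss_bad)
```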

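To minimize the loss function, gradient descent may be used. It works by repeatedly updating the weights and bias in the direction opposite to the gradient of the loss, with a step proportional to that gradient. For a single sample with features $x$, label $y$ and prediction $\hat{y} = p$, the gradient of the cross-entropy with respect to the weights and bias is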

<center>
    $ \nabla Error =  -(y - \hat{y})(x_1, x_2, \dots, x_n, 1) $
</center>
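
The gradient above follows from the chain rule. As a sketch for a single sample, write $z = \sum_i w_i x_i + b$ and $\hat{y} = \sigma(z)$, where $\sigma$ is the sigmoid (the symbols $z$ and $\sigma$ are introduced here only for this derivation). Since $\sigma'(z) = \hat{y}(1 - \hat{y})$:

<center>
    $ \dfrac{\partial Error}{\partial z} = \left( -\dfrac{y}{\hat{y}} + \dfrac{1-y}{1-\hat{y}} \right) \hat{y}(1-\hat{y}) = -y(1-\hat{y}) + (1-y)\hat{y} = -(y - \hat{y}) $
</center>

<center>
    $ \dfrac{\partial Error}{\partial w_i} = -(y - \hat{y})\, x_i, \qquad \dfrac{\partial Error}{\partial b} = -(y - \hat{y}) $
</center>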

Therefore, the weights and bias are updated in the following way:

<center>
    $ w_i^{\prime} \leftarrow w_i + \alpha (y - \hat{y}) x_i $
</center>

<center>
    $b^{\prime} \leftarrow b + \alpha (y - \hat{y})$
</center>

Where $\alpha$ is the learning rate.

In [21]:
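def update_weights(x, y, y_hat, weights, bias, learning_rate):
    # One gradient-descent step for a single sample (x, y).
    error = y - y_hat
    # Return new values rather than modifying the caller's arrays in place.
    weights = weights + learning_rate * error * x
    bias = bias + learning_rate * error
    return weights, bias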
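
The gradient formula behind this update can be checked numerically. The sketch below compares the analytic gradient $-(y - \hat{y})(x_1, \dots, x_n, 1)$ with a central finite-difference approximation for one sample. The helper `sample_loss` and all numeric values are assumptions made for this check, not part of the notebook's code.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def sample_loss(w, b, x, y):
    # Cross-entropy of a single sample (hypothetical helper for this check).
    y_hat = sigmoid(np.dot(w, x) + b)
    return -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)


# Made-up parameters and a single made-up sample.
w = np.array([0.3, -0.7])
b = 0.1
x = np.array([1.5, 2.0])
y = 1.0

# Analytic gradient with respect to (w_1, w_2, b).
y_hat = sigmoid(np.dot(w, x) + b)
analytic = np.append(-(y - y_hat) * x, -(y - y_hat))

# Central finite differences, one parameter at a time.
eps = 1e-6
numeric = np.zeros(3)
for i in range(2):
    step = np.zeros(2)
    step[i] = eps
    numeric[i] = (sample_loss(w + step, b, x, y) - sample_loss(w - step, b, x, y)) / (2 * eps)
numeric[2] = (sample_loss(w, b + eps, x, y) - sample_loss(w, b - eps, x, y)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))
```

If the two vectors agree, moving the parameters by $+\alpha (y - \hat{y}) x_i$ and $+\alpha (y - \hat{y})$ is indeed a step against the gradient.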

Ok! We are ready to train a logistic regression. For simplicity, batch gradient descent will be used, but for larger datasets stochastic gradient descent typically converges faster.

In [None]:
def fit():
    # WIP: training loop not implemented yet.
    raise NotImplementedError
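
Until `fit` is written, here is a minimal sketch of what batch gradient descent could look like with the update rule described earlier. It is not the notebook's final implementation: the name `fit_sketch`, its signature, the zero initialization, the default `learning_rate` and `epochs`, and averaging the gradient over the samples are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def fit_sketch(X, y, learning_rate=0.1, epochs=1000):
    # Hypothetical batch gradient descent; X has shape (n_samples, n_features).
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_samples, n_features = X.shape
    weights = np.zeros(n_features)  # assumed initialization
    bias = 0.0
    for _ in range(epochs):
        y_hat = sigmoid(X @ weights + bias)  # predictions for every sample
        error = y - y_hat
        # Average the per-sample updates over the whole batch.
        weights += learning_rate * (X.T @ error) / n_samples
        bias += learning_rate * error.mean()
    return weights, bias


# Toy, linearly separable data with one feature centered around zero.
X = np.array([[-1.5], [-0.5], [0.5], [1.5]])
y = np.array([0, 0, 1, 1])

weights, bias = fit_sketch(X, y)
predictions = (sigmoid(X @ weights + bias) >= 0.5).astype(int)
print(predictions)
```

Each epoch uses every sample once to compute a single update, which is what distinguishes the batch variant from the stochastic one, where the parameters change after each sample.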