# Logistic Regression

Despite its name, Logistic Regression is a classification algorithm rather than a regression algorithm. It is also one of the major stepping stones towards Neural Networks, which makes it worth studying. This notebook demonstrates how Logistic Regression works.


# Data

We can use a classifier on the Wine data set by converting the target variable to a binary label: $y = 1$ if the quality is at least 6 and $y = 0$ otherwise. The easiest way is to create a new column that holds this label, indicating whether the wine is 'good'.
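
As a rough illustration of this thresholding step, independent of the `ml` helper used in this notebook, the sketch below builds such a label with pandas. The tiny DataFrame and the column name `good` are made up for the example.

```python
import pandas as pd

# Hypothetical mini data set; the real Wine data has many more rows and columns.
wines = pd.DataFrame({
    "alcohol": [9.4, 10.5, 12.8, 11.2],
    "pH":      [3.51, 3.20, 3.35, 3.16],
    "quality": [5, 6, 7, 4],
})

# y = 1 if quality >= 6, otherwise 0
wines["good"] = (wines["quality"] >= 6).astype(int)
print(wines)
```

The comparison yields a boolean column; `astype(int)` turns it into the 0/1 label used as $y$.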

In [1]:
from ml import *
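import numpy as np  # used below as np (possibly already exported by ml)
# expit is the logistic (sigmoid) function; it is aliased as logit here and is more stable in case of overflows
from scipy.special import expit as logit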
data = wines_binary('quality', 'alcohol', 'pH', threshold=6)
data.plot2d()

In [2]:
# We plot first, so that the plot shows only alcohol and pH, and only then add a bias.
data.bias = 1
data.column_y = True

# Cost Function and Update Rule

To estimate $\theta$ we use the vectorized version of the cost function:

$$ J(\theta) = -\frac{1}{m} \left[ y^T \cdot \log \left( logit( X \cdot \theta ) \right) + (1 - y)^T \cdot \log \left( 1 - logit( X \cdot \theta ) \right) \right] $$

and update rule:

$$ \theta := \theta - \frac{\alpha}{m} \cdot X^T \cdot (\ logit(X \cdot \theta) - y )$$
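
The two formulas can be checked against each other numerically. The sketch below is only an illustration under assumptions: it reads $m$ as the number of training examples (rows of $X$), uses made-up random data, and the helper names `sigmoid`, `cost` and `gradient` are not part of the notebook. It compares the gradient used in the update rule with a finite-difference approximation of the cost function.

```python
import numpy as np

def sigmoid(z):
    # Plain logistic function; fine for the moderate values used here.
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, theta):
    """Vectorized cost J(theta); m is taken to be the number of rows of X."""
    m = X.shape[0]
    p = sigmoid(X @ theta)
    return (-(y.T @ np.log(p) + (1 - y).T @ np.log(1 - p)) / m).item()

def gradient(X, y, theta):
    """Gradient of J(theta); the update rule subtracts alpha times this."""
    m = X.shape[0]
    return X.T @ (sigmoid(X @ theta) - y) / m

# Made-up data: 20 examples, a bias column plus two random features.
rng = np.random.default_rng(0)
X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 2))])
y = (rng.random((20, 1)) < 0.5).astype(float)
theta = rng.normal(size=(3, 1))

# Central finite differences of the cost, one coefficient at a time.
eps = 1e-6
numeric = np.zeros_like(theta)
for j in range(theta.shape[0]):
    step = np.zeros_like(theta)
    step[j] = eps
    numeric[j] = (cost(X, y, theta + step) - cost(X, y, theta - step)) / (2 * eps)

print(np.allclose(numeric, gradient(X, y, theta), atol=1e-6))
```

If the two agree, the term $X^T \cdot (logit(X \cdot \theta) - y) / m$ in the update rule is indeed the gradient of $J(\theta)$.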

Next we write a function `fit_model` that uses Batch Gradient Descent to estimate $\theta$ by applying the update rule for a fixed number of iterations. We start by defining the hypothesis $h(X, \theta)$. Remember that the output of the logistic function (imported above as `logit`) can be interpreted as the probability that $y=1$.

In [3]:
def h(X, 𝜃):
    return logit(X @ 𝜃)
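
To get a feeling for the values the hypothesis produces, the sketch below evaluates the logistic function on a few inputs and compares `scipy.special.expit` with the textbook formula. The function `naive_sigmoid` and the chosen inputs are just for illustration.

```python
import numpy as np
from scipy.special import expit

def naive_sigmoid(z):
    # Textbook formula; np.exp(-z) overflows to inf for very negative z.
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0])

with np.errstate(over="ignore"):   # silence the overflow warning of the naive version
    naive = naive_sigmoid(z)
stable = expit(z)

print(naive)
print(stable)
```

Both versions map large negative inputs towards 0, an input of 0 to 0.5 and large positive inputs towards 1. Without the `np.errstate` guard the naive version would emit an overflow warning for the most negative input, which `expit` avoids.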

In [4]:
"""
Updates parameters theta for #iterations using the logistic regression update rule
X: n x m matrix containing the input for n training examples, each having m features
y: n x 1 matrix containing the correct class {0,1} for the n training examples
alpha: learning rate
iterations: number of iterations
returns: theta
"""
def fit_model(X, y, alpha=0.00001, iterations=50000):
    m = X.shape[1]            # the number of coefficients (columns of X)
    𝜃 = np.zeros((m, 1))      # initialize theta with zeros
    for _ in range(iterations):
        # note: m is the number of coefficients here, so each step is scaled by alpha / m
        𝜃 -= (alpha / m) * X.T @ ( h(X, 𝜃) - y )
    return 𝜃
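
To see the update rule at work without the Wine data, the following sketch runs the same kind of loop on made-up one-feature data and records the cost after every step. It is not the notebook's `fit_model`: the helper names `sigmoid`, `cost` and `fit_with_history` are hypothetical, the step is divided by the number of examples, and the learning rate and number of iterations are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, theta):
    # Mean cross-entropy over all examples.
    p = sigmoid(X @ theta)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def fit_with_history(X, y, alpha=0.1, iterations=500):
    n = X.shape[0]                       # number of examples
    theta = np.zeros((X.shape[1], 1))    # start from all zeros
    history = []
    for _ in range(iterations):
        theta -= (alpha / n) * X.T @ (sigmoid(X @ theta) - y)
        history.append(cost(X, y, theta))
    return theta, history

# Made-up data: one feature plus a bias column, noisy labels.
rng = np.random.default_rng(42)
x = rng.normal(size=(200, 1))
X = np.hstack([np.ones((200, 1)), x])
y = (x + 0.5 * rng.normal(size=(200, 1)) > 0).astype(float)

theta, history = fit_with_history(X, y)
print(history[0], history[-1])   # cost after the first and after the last step
```

A cost that keeps decreasing is a quick sanity check that the learning rate is not too large for the data at hand.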

# Interpreting the coefficients

Now we fit the model and look at the values of $\theta$. Our analysis of the logistic function told us that values of $\theta^T \cdot x > 0$ contribute to predicting class 1 and values of $\theta^T \cdot x < 0$ contribute to predicting class 0. Therefore, the positive sign of $\theta_1$ tells us that a higher alcohol percentage is associated with good wine, and the negative sign of $\theta_2$ that a higher pH value is associated with bad wine.

In [5]:
𝜃 = fit_model(data.train_X, data.train_y)
𝜃

array([[-1.12640594],
       [ 1.0476586 ],
       [-2.88445623]])
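
One common way to read such coefficients, beyond their sign, is as odds ratios: $e^{\theta_j}$ is the factor by which the odds $P(y=1)/P(y=0)$ change when feature $j$ increases by one unit, in whatever units the feature was fed to the model. The sketch below applies this to the values printed above; the ordering bias, alcohol, pH follows the text, and the variable names are only for illustration.

```python
import numpy as np

# Coefficients as printed above, in the order bias, alcohol, pH.
theta = np.array([-1.12640594, 1.0476586, -2.88445623])
names = ["bias", "alcohol", "pH"]

# exp(theta_j): multiplicative change in the odds per unit increase of feature j.
odds_ratios = np.exp(theta)

for name, coef, ratio in zip(names, theta, odds_ratios):
    print(f"{name:8s} theta = {coef:+.3f}   odds ratio = {ratio:.3f}")
```

An odds ratio above 1 (alcohol) pushes the prediction towards 'good', one below 1 (pH) pushes it towards 'bad'.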

# Evaluation

# Prediction function

To use the trained model, we need a `predict` function that classifies a set of cases `X`. Since the outcome of the logistic function can be interpreted as the probability $P(y = 1| x; \theta)$, we return `True` if the model's estimate is greater than or equal to `0.5`, indicating $y=1$, and `False` otherwise, indicating $y=0$.

In [6]:
"""
X: n x m matrix containing the input for n training examples, each having m features
theta: m x 1 matrix containing the coefficients for the model
Returns an n x 1 boolean matrix that is True where the hypothesis for a given x >= 0.5 and False otherwise
"""
def predict(X, 𝜃):
    return h(X, 𝜃) >= 0.5
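
Because the logistic function equals 0.5 exactly at 0, thresholding the probability at 0.5 amounts to checking the sign of $X \cdot \theta$. The sketch below illustrates this on made-up data with an arbitrary coefficient vector; `sigmoid` is a stand-in for the notebook's `logit` import.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up cases (bias column plus two features) and arbitrary coefficients.
rng = np.random.default_rng(1)
X = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 2))])
theta = np.array([[0.3], [1.5], [-2.0]])

by_probability = sigmoid(X @ theta) >= 0.5   # threshold on the probability
by_sign = (X @ theta) >= 0                   # threshold on the linear score

print(np.array_equal(by_probability, by_sign))
```

The set of points where $X \cdot \theta = 0$ is therefore the decision boundary of the model, which is a straight line when there are two features.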

We can then evaluate our model by comparing its predictions on a set of test cases with the true labels and computing the fraction of cases for which they match.

In [7]:
"""
X: n x m matrix containing the input for n training examples, each having m features
y: n x 1 matrix containing the correct class {0,1} for the n training examples
theta: m x 1 matrix containing the coefficients for the model
Returns the fraction of correctly predicted cases in X
"""
def evaluate(𝜃, X, y):
    return sum( predict(X, 𝜃) == y ) / len(X)

Our model has an accuracy of 70.8% on the training set; in other words, it correctly predicts whether a bottle is good or bad for 70.8% of the training examples.

In [8]:
evaluate(𝜃, data.train_X, data.train_y)

array([0.70758405])
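
A detail worth keeping in mind when computing accuracy this way is the shape of the labels. With an m x 1 coefficient matrix the predictions form an n x 1 column, so the labels need to be an n x 1 column as well, which is presumably why `column_y` was set earlier. The toy arrays in the sketch below are made up to show what happens otherwise.

```python
import numpy as np

pred = np.array([[True], [False], [True], [True]])   # shape (4, 1), a column of predictions
y_col = np.array([[1], [0], [0], [1]])               # shape (4, 1), labels as a column
y_flat = y_col.ravel()                               # shape (4,), labels as a flat vector

print((pred == y_col).shape)    # (4, 1): one comparison per case
print((pred == y_flat).shape)   # (4, 4): broadcasting compares every prediction with every label

accuracy = float(np.mean(pred == y_col))
print(accuracy)                 # 0.75
```

Using `np.mean` on the element-wise comparison gives the accuracy as a plain number rather than a one-element array.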

# Adding features

So what can we do to improve the model's effectiveness? One possibility is to use more features. Unfortunately, we cannot visualize this in the same way as we did with two features, but trying it out is very easy. In the results below we see that the model using all features correctly classifies 72.9% of the training examples.

In [9]:
data = wines_binary('quality', threshold=6, bias=True, column_y=True)

In [10]:
theta = fit_model(data.train_X, data.train_y)
evaluate(theta, data.train_X, data.train_y)

array([0.72869429])