# Log Loss and Gradient (Binary Classification)

In logistic regression, the model outputs a probability p between 0 and 1:
its estimate that the example belongs to class 1.

This notebook studies:
1. Why log loss is used to measure error in classification
2. How log loss punishes confident wrong predictions
3. How its gradient provides a clean learning signal

The focus is on understanding learning behavior, not memorizing formulas.


In [1]:
import numpy as np

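def log_loss(y, p, eps=1e-15):
    # Per-example binary log loss for a label y (0 or 1) and predicted P(class 1) = p.
    # Clip p away from exactly 0 and 1 so np.log never receives 0.
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))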


In [2]:
y = 1
for p in [0.9, 0.5, 0.1, 0.01]:
    print(p, log_loss(y, p))


0.9 0.10536051565782628
0.5 0.6931471805599453
0.1 2.3025850929940455
0.01 4.605170185988091


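Log loss penalizes a prediction according to how much probability it
assigned to the true class. With y = 1 the loss is −log(p): about 0.105 at
p = 0.9, but about 4.6 at p = 0.01, and it grows without bound as p approaches 0.

Being wrong with high confidence results in a large loss,
while being correct with high confidence results in a very small loss.

This makes log loss suitable for probabilistic classification, where the
predicted probability itself matters and not only the predicted label.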
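
The same behavior holds for both labels. As a minimal sketch, the snippet below averages the per-example loss over a small batch, once for confident correct predictions and once for confident wrong ones. The helper name `mean_log_loss` and the sample arrays are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def log_loss(y, p, eps=1e-15):
    # Same per-example loss as defined earlier in the notebook.
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def mean_log_loss(y, p):
    # Hypothetical helper: average the per-example loss over a batch.
    return float(np.mean(log_loss(np.asarray(y), np.asarray(p))))

# Made-up labels and predictions, chosen only for illustration.
y_true = np.array([1, 0, 1, 0])
p_confident_right = np.array([0.9, 0.1, 0.9, 0.1])
p_confident_wrong = np.array([0.1, 0.9, 0.1, 0.9])

loss_right = mean_log_loss(y_true, p_confident_right)
loss_wrong = mean_log_loss(y_true, p_confident_wrong)
print("confident and right:", loss_right)
print("confident and wrong:", loss_wrong)
```

With these inputs the first average is roughly 0.105 and the second roughly 2.303, so the same level of confidence costs about twenty times more when it points at the wrong class.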


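When a sigmoid output is combined with log loss, the gradient of the loss
with respect to the model’s score z (the input to the sigmoid) simplifies to:

gradient = predicted_probability − true_label

The sigmoid's derivative cancels the denominators that come from the loss,
so the gradient is simply the prediction error. Because the probability lies
between 0 and 1, the gradient always lies between −1 and 1, which gives a
clean and stable learning signal.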
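
The simplification follows from the chain rule. A short derivation, writing z for the score, p = σ(z) for the sigmoid output, and L for the per-example log loss (these symbols are introduced here for illustration):

```latex
\begin{aligned}
L &= -\bigl[\, y \log p + (1 - y)\log(1 - p) \,\bigr],
  \qquad p = \sigma(z) = \frac{1}{1 + e^{-z}} \\
\frac{\partial L}{\partial p} &= -\frac{y}{p} + \frac{1 - y}{1 - p}
  = \frac{p - y}{p(1 - p)} \\
\frac{\partial p}{\partial z} &= p(1 - p) \\
\frac{\partial L}{\partial z} &= \frac{\partial L}{\partial p} \cdot \frac{\partial p}{\partial z}
  = p - y
\end{aligned}
```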


In [3]:
def log_loss_gradient(y, p):
    # Gradient of the log loss with respect to the score z, where p = sigmoid(z).
    return p - y


In [4]:
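def numerical_gradient(y, z, eps=1e-5):
    # Central-difference estimate of d(loss)/dz at score z; eps is the step size.
    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    # Predicted probabilities just above and just below z.
    p1 = sigmoid(z + eps)
    p2 = sigmoid(z - eps)

    loss1 = log_loss(y, p1)
    loss2 = log_loss(y, p2)

    return (loss1 - loss2) / (2 * eps)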

# test
z = 0.7
y = 1
p = 1 / (1 + np.exp(-z))

print("Analytical:", log_loss_gradient(y, p))
print("Numerical :", numerical_gradient(y, z))


Analytical: -0.33181222783183384
Numerical : -0.33181222783562614


## Key Takeaway

Log loss measures how wrong a probability prediction is.

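Paired with a sigmoid output, its gradient with respect to the score simplifies to:
prediction − truth

The size of the correction therefore matches the size of the error: close to 1
in magnitude for confident mistakes and close to 0 for confident correct
predictions. This keeps learning in logistic regression stable.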
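
To make the last point concrete, here is a small sketch that evaluates the gradient at several scores for a true label of 1, then takes a few gradient-descent steps on a single score. The score values, the learning rate `lr`, and the step count are arbitrary choices for illustration, not values from the notebook.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

y = 1  # true label

# Gradient p - y at several scores, from confidently wrong to confidently right.
scores = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
grads = sigmoid(scores) - y
for z, g in zip(scores, grads):
    print(f"z={z:+.1f}  p={sigmoid(z):.3f}  gradient={g:+.3f}")

# A few gradient-descent steps on one score, starting confidently wrong.
z_start = -4.0
lr = 1.0  # assumed learning rate
z_final = z_start
for step in range(5):
    z_final -= lr * (sigmoid(z_final) - y)
    print(f"step {step + 1}: z={z_final:+.3f}  p={sigmoid(z_final):.3f}")
```

The gradient magnitude shrinks steadily as the score moves toward the correct side, so the early steps are large and later steps taper off as the prediction improves.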
