# Setup and Loss Definition

In [6]:
# sample data
import numpy as np
y_true = np.array([1, 0, 1, 1, 0])
y_pred = np.array([0.9, 0.2, 0.8, 0.7, 0.1])

In [2]:
def cross_entropy(y_true, y_pred, lambda_reg=None):
    # Binary cross-entropy (log loss), summed over all samples
    loss = -np.sum(
        y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)
    )
    # Optional penalty on the squared predictions (not on model weights)
    if lambda_reg is not None:
        loss += lambda_reg * np.sum(y_pred**2)
    return loss
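
One caveat with the loss above: `np.log` returns `-inf` when a prediction is exactly 0 or 1, which turns the sum into `nan`. A minimal sketch of one common workaround is to clip the predictions first. The name `cross_entropy_clipped` and the tolerance `eps` are assumptions made for illustration, not part of the original notebook.

```python
import numpy as np

def cross_entropy_clipped(y_true, y_pred, lambda_reg=None, eps=1e-12):
    # Hypothetical variant: keep predictions inside [eps, 1 - eps]
    # so np.log never receives exactly 0 (eps is an assumed tolerance)
    p = np.clip(y_pred, eps, 1 - eps)
    loss = -np.sum(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    # Same optional penalty on the squared predictions as above
    if lambda_reg is not None:
        loss += lambda_reg * np.sum(p**2)
    return loss

y_true = np.array([1, 0, 1, 1, 0])
hard_pred = np.array([1.0, 0.0, 1.0, 1.0, 0.0])  # exact 0s and 1s
clipped_loss = cross_entropy_clipped(y_true, hard_pred)
```

For predictions that already sit strictly between 0 and 1, such as the sample data, clipping leaves the values unchanged, so the result should match the unclipped function.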

In [None]:
# unregularized log loss
cross_entropy(y_true, y_pred)

np.float64(1.0136830778828045)

In [9]:
# regularized log loss
cross_entropy(y_true, y_pred, .5)

np.float64(2.0086830778828046)

# Modify Predictions, Understand Impact to Regularized Loss

Let's scale the predictions down by a range of factors and see how that affects the regularized loss.

In [16]:
adjustments = np.arange(.1, 1, .1)
for adj in adjustments:
    y_pred_adj = y_pred*adj
    log_loss=cross_entropy(y_true, y_pred_adj)
    regularized_loss=cross_entropy(y_true, y_pred_adj, .5)
    print(f"Trying adj of {np.round(adj,2)}: Log loss {np.round(log_loss,2)}, Regularized Log loss {np.round(regularized_loss,2)}")

Trying adj of 0.1: Log loss 7.62, Regularized Log loss 7.63
Trying adj of 0.2: Log loss 5.57, Regularized Log loss 5.61
Trying adj of 0.3: Log loss 4.39, Regularized Log loss 4.48
Trying adj of 0.4: Log loss 3.56, Regularized Log loss 3.72
Trying adj of 0.5: Log loss 2.92, Regularized Log loss 3.17
Trying adj of 0.6: Log loss 2.41, Regularized Log loss 2.77
Trying adj of 0.7: Log loss 1.98, Regularized Log loss 2.47
Trying adj of 0.8: Log loss 1.61, Regularized Log loss 2.25
Trying adj of 0.9: Log loss 1.29, Regularized Log loss 2.1
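
The gap between the two columns above is just the penalty term. Since every prediction is multiplied by the same factor, the penalty should equal `lambda_reg * adj**2 * np.sum(y_pred**2)`, so it grows quadratically with the adjustment. The sketch below checks this for the same sample data; the names `gaps` and `closed_form` are introduced here for illustration only.

```python
import numpy as np

y_pred = np.array([0.9, 0.2, 0.8, 0.7, 0.1])
lambda_reg = 0.5
adjustments = np.arange(.1, 1, .1)

# Penalty added by the regularizer for each scaled prediction set
gaps = np.array(
    [lambda_reg * np.sum((y_pred * adj) ** 2) for adj in adjustments]
)

# Closed form: the scale factor comes out of the sum as adj**2
closed_form = lambda_reg * adjustments**2 * np.sum(y_pred**2)

print(np.allclose(gaps, closed_form))
```

With these numbers `np.sum(y_pred**2)` is 1.99, so the penalty is roughly `0.995 * adj**2`, ranging from about 0.01 at an adjustment of 0.1 to about 0.81 at 0.9.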


## Discussion of results

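What is clear to me here is that the smaller the predictions are, the closer the regularized log loss gets to the unregularized one: at an adjustment of 0.1 the two differ by about 0.01, while at 0.9 they differ by about 0.81. The penalty term, `lambda_reg*np.sum(y_pred**2)`, grows with the square of the predictions, so it penalizes predictions close to 1 and makes the model a bit more conservative about predicting the positive class. Normally this kind of regularization would be applied to the model's weights rather than to its predictions, to discourage overfitting by keeping the weights small. Note that a squared (L2) penalty like this one shrinks weights toward zero without making them exactly zero; it is an L1 penalty that tends to make a model sparse.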
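
For comparison, here is a minimal sketch of what penalizing the weights instead of the predictions could look like for a logistic model. Everything in it is an assumption made for illustration: the feature matrix `X`, the weights `w`, and the helper names `sigmoid` and `weight_regularized_loss` do not come from the notebook above.

```python
import numpy as np

def sigmoid(z):
    # Logistic function mapping real-valued scores to (0, 1)
    return 1 / (1 + np.exp(-z))

def weight_regularized_loss(w, X, y_true, lambda_reg=0.0):
    # Hypothetical logistic model: predictions are derived from weights w
    y_pred = sigmoid(X @ w)
    log_loss = -np.sum(
        y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)
    )
    # L2 penalty on the weights, not on the predictions
    return log_loss + lambda_reg * np.sum(w**2)

# Made-up feature matrix (first column is an intercept) and weights
X = np.array([[1.0, 2.0], [1.0, -1.5], [1.0, 1.0], [1.0, 0.5], [1.0, -2.0]])
y_true = np.array([1, 0, 1, 1, 0])
w = np.array([0.1, 1.2])

plain = weight_regularized_loss(w, X, y_true)
penalized = weight_regularized_loss(w, X, y_true, lambda_reg=0.5)
```

Here the penalty depends only on the size of `w`, so confident predictions are discouraged indirectly, through smaller weights, rather than being penalized directly.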