# Tutorial 14: Entropy, Cross-Entropy, and KL Divergence - Code

This notebook provides interactive code examples to accompany the theory and exercises.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style="whitegrid")

## 1. Implementing the Core Functions

First, let's implement the three core concepts from scratch. We'll use these to verify the examples in the exercises.
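For reference, these are the standard definitions that the functions below compute. They are written for discrete distributions $P = (p_1, \dots, p_n)$ and $Q = (q_1, \dots, q_n)$ over the same outcomes, with base-2 logarithms so that results are in bits:

$$
\begin{aligned}
H(P) &= -\sum_{i} p_i \log_2 p_i \\
H(P, Q) &= -\sum_{i} p_i \log_2 q_i \\
D_{\mathrm{KL}}(P \parallel Q) &= \sum_{i} p_i \log_2 \frac{p_i}{q_i} = H(P, Q) - H(P)
\end{aligned}
$$

Terms with $p_i = 0$ contribute nothing, following the usual convention $0 \log 0 = 0$.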

In [None]:
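def entropy(P):
    """Shannon entropy H(P) in bits, using the convention 0 * log(0) = 0."""
    P = np.asarray(P, dtype=float)
    nz = P > 0  # zero-probability outcomes contribute nothing
    return -np.sum(P[nz] * np.log2(P[nz]))

def cross_entropy(P, Q, eps=1e-12):
    """Cross-entropy H(P, Q) in bits."""
    P = np.asarray(P, dtype=float)
    # Clip only Q: a zero in Q where P > 0 would make log2(Q) = -inf.
    # P is left untouched so it remains a valid distribution.
    Q = np.clip(np.asarray(Q, dtype=float), eps, 1.0)
    return -np.sum(P * np.log2(Q))

def kl_divergence(P, Q, eps=1e-12):
    """KL divergence D_KL(P || Q) in bits."""
    P = np.asarray(P, dtype=float)
    Q = np.clip(np.asarray(Q, dtype=float), eps, 1.0)
    nz = P > 0  # terms with P = 0 contribute 0
    return np.sum(P[nz] * np.log2(P[nz] / Q[nz]))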

### Verification using the Weather Example

Let's test our functions with the weather example from Exercise A3:
-   True Distribution $P = [0.5, 0.5]$
-   Model Distribution $Q = [0.8, 0.2]$
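Working the weather example by hand gives values to compare against the printed output:

$$
\begin{aligned}
H(P) &= -\left(0.5\log_2 0.5 + 0.5\log_2 0.5\right) = 1 \text{ bit} \\
H(P, Q) &= -\left(0.5\log_2 0.8 + 0.5\log_2 0.2\right) \approx 0.5(0.322) + 0.5(2.322) = 1.322 \text{ bits} \\
D_{\mathrm{KL}}(P \parallel Q) &= 0.5\log_2\frac{0.5}{0.8} + 0.5\log_2\frac{0.5}{0.2} \approx 0.5(-0.678) + 0.5(1.322) = 0.322 \text{ bits}
\end{aligned}
$$

As expected, $H(P) + D_{\mathrm{KL}}(P \parallel Q) = 1 + 0.322 = 1.322 = H(P, Q)$.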

In [None]:
P_weather = np.array([0.5, 0.5])
Q_weather = np.array([0.8, 0.2])

H_P = entropy(P_weather)
H_PQ = cross_entropy(P_weather, Q_weather)
D_KL_PQ = kl_divergence(P_weather, Q_weather)

print(f"Entropy H(P) = {H_P:.3f} bits")
print(f"Cross-Entropy H(P, Q) = {H_PQ:.3f} bits")
print(f"KL Divergence D_KL(P || Q) = {D_KL_PQ:.3f} bits")
print("-"*30)
print(f"H(P) + D_KL(P || Q) = {H_P + D_KL_PQ:.3f}")
print(f"Does H(P,Q) == H(P) + D_KL(P||Q)? {'Yes' if np.isclose(H_PQ, H_P + D_KL_PQ) else 'No'}")
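The decomposition is not specific to the weather numbers. As a minimal sketch, we can check it, together with the non-negativity of KL divergence, on randomly generated distributions. The helper name `check_decomposition`, the Dirichlet sampling, and the seed are illustrative choices rather than part of the exercises, and the formulas are written inline so the cell runs on its own:

```python
import numpy as np

rng = np.random.default_rng(0)

def check_decomposition(n_outcomes, n_trials=1000):
    """Check H(P, Q) = H(P) + D_KL(P || Q) and D_KL >= 0 on random distributions."""
    for _ in range(n_trials):
        # Dirichlet(1, ..., 1) samples are strictly positive and sum to 1,
        # so no epsilon is needed inside the logarithms.
        P = rng.dirichlet(np.ones(n_outcomes))
        Q = rng.dirichlet(np.ones(n_outcomes))
        h_p = -np.sum(P * np.log2(P))
        h_pq = -np.sum(P * np.log2(Q))
        d_kl = np.sum(P * np.log2(P / Q))
        # Allow a tiny tolerance for floating-point rounding when P is close to Q.
        if not np.isclose(h_pq, h_p + d_kl) or d_kl < -1e-12:
            return False
    return True

for n in (2, 5, 10):
    print(f"{n} outcomes: identity and D_KL >= 0 hold? {check_decomposition(n)}")
```

Because $D_{\mathrm{KL}}(P \parallel Q) \ge 0$, the identity also implies $H(P, Q) \ge H(P)$: encoding data with the wrong distribution never costs fewer bits on average.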

## 2. Visualizing Cross-Entropy as a Loss Function

This visualization corresponds to Exercise B2. It shows how the loss behaves as the model's predicted probability for the correct class varies. The correct class is the first one, so the true distribution is the one-hot vector `[1, 0]`.
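With a one-hot target $P = [1, 0]$ and a model prediction $Q = [p, 1 - p]$, the cross-entropy sum collapses to a single term:

$$
H(P, Q) = -\left(1 \cdot \log_2 p + 0 \cdot \log_2 (1 - p)\right) = -\log_2 p
$$

Since $H(P) = 0$ for a one-hot distribution, this loss also equals $D_{\mathrm{KL}}(P \parallel Q)$, so minimizing cross-entropy here is the same as minimizing KL divergence.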

In [None]:
# The true distribution for a single data point where the correct class is the first one.
P_true = np.array([1.0, 0.0])

# A range of model predictions for the correct class.
pred_correct_prob = np.linspace(0.01, 0.99, 200)

# Calculate the cross-entropy loss for each prediction.
losses = [cross_entropy(P_true, np.array([p, 1-p])) for p in pred_correct_prob]

plt.figure(figsize=(12, 7))
plt.plot(pred_correct_prob, losses, label='Cross-Entropy Loss', color='darkred', lw=2.5)
plt.xlabel("Model's Predicted Probability for the Correct Class", fontsize=12)
plt.ylabel("Loss", fontsize=12)
plt.title("Behavior of Cross-Entropy Loss", fontsize=14)
plt.axvline(0.5, color='gray', linestyle='--', label='Uncertain Prediction (0.5)')
plt.ylim(0, 7) # Limit y-axis to see the curve shape better
plt.legend()
plt.grid(True, which='both', linestyle='--', linewidth=0.5)
plt.show()

**Insight from the plot:**

The loss falls toward 0 as the predicted probability for the correct class approaches 1. Conversely, the loss grows without bound as the prediction approaches 0. This steep penalty for being confidently wrong is what makes cross-entropy such an effective loss function for classification tasks.
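To put numbers on this steepness, the following sketch evaluates the one-hot loss $-\log_2 p$ at a few predicted probabilities (chosen arbitrarily for illustration). The helper `one_hot_ce_bits` is a hypothetical name introduced here, not one of the functions defined earlier:

```python
import numpy as np

def one_hot_ce_bits(p):
    """Cross-entropy in bits for a one-hot target when the correct class gets probability p."""
    return -np.log2(p)

probs = np.array([0.99, 0.9, 0.5, 0.1, 0.01, 0.001])
for p in probs:
    print(f"p = {p:<6} loss = {one_hot_ce_bits(p):6.3f} bits")

# Halving p repeatedly: 0.5, 0.25, 0.125, 0.0625, 0.03125.
halvings = 0.5 ** np.arange(1, 6)
# Successive differences in loss; each should be 1 bit.
print(np.diff(one_hot_ce_bits(halvings)))
```

The last line illustrates why the growth is unbounded rather than exponential: every halving of the predicted probability adds exactly one more bit of loss, and there is no limit to how many times $p$ can be halved.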