### Implementing **Cross-Entropy Loss from scratch** in a **PyTorch-like way**

The plan:

1. Use **pure Python with PyTorch tensors**.
2. Support **manual formula-based computation**, with an optional extension via `torch.autograd.Function` if you want full control over gradients.

---

## Cross-Entropy Loss – Quick Recap

For binary classification:

$$
\mathcal{L} = -[y \cdot \log(p) + (1 - y) \cdot \log(1 - p)]
$$

For multi-class classification (with softmax):

$$
\mathcal{L} = -\sum_{i} y_i \cdot \log(p_i)
$$

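Where:

* $y \in \{0, 1\}$, $p$: binary label and predicted probability of the positive class
* $y_i$: one-hot encoded label (1 for the true class, 0 otherwise)
* $p_i$: predicted probability for class $i$

Both formulas give the loss for a single sample; the implementations below average it over the batch.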
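
As a quick sanity check of the two formulas, here is a minimal sketch that evaluates them on one sample. The probabilities and labels are made-up illustrative values, not taken from any model.

```python
import torch

# One sample, three classes; the true class is index 0 (one-hot label).
y = torch.tensor([1.0, 0.0, 0.0])
p = torch.tensor([0.7, 0.2, 0.1])  # illustrative probabilities that sum to 1

# Multi-class formula: only the true-class term survives the sum.
multi_class_loss = -(y * torch.log(p)).sum()
print(multi_class_loss.item())  # equals -log(0.7), roughly 0.357

# Binary formula: one probability for the positive class and a 0/1 label.
y_bin = torch.tensor(1.0)
p_pos = torch.tensor(0.7)
binary_loss = -(y_bin * torch.log(p_pos) + (1 - y_bin) * torch.log(1 - p_pos))
print(binary_loss.item())  # for y = 1 this is also -log(0.7)
```

With a one-hot label, the multi-class sum reduces to the negative log-probability of the true class, which is why the implementation can index the correct class directly instead of building one-hot vectors.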

---

## ✅ 1. Cross-Entropy Loss (Multi-Class) – Manual Implementation

We assume:

* `inputs` = logits (raw outputs, shape `[batch_size, num_classes]`)
* `targets` = class indices (e.g., `[2, 0, 1]`)

### 📦 Implementation

In [1]:
import torch
import torch.nn as nn

class MyCrossEntropyLoss(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, logits, targets):
        # logits: [batch_size, num_classes], raw scores
        # targets: [batch_size], integer class indices (not one-hot)

        # Step 1: Log-softmax via log-sum-exp. Computing log(softmax(x))
        # in two steps can yield log(0) = -inf when one logit dominates.
        log_probs = logits - torch.logsumexp(logits, dim=1, keepdim=True)

        # Step 2: Pick the log-probability of the correct class per sample
        batch_idx = torch.arange(logits.shape[0], device=logits.device)
        correct_log_probs = log_probs[batch_idx, targets]

        # Step 3: Average the negative log-likelihood over the batch
        return -correct_log_probs.mean()

### Example Usage

In [2]:
logits = torch.tensor([[2.0, 1.0, 0.1], 
                       [0.5, 2.5, 0.3]], requires_grad=True)  # shape [2, 3]

targets = torch.tensor([0, 1])  # Ground-truth class indices

criterion = MyCrossEntropyLoss()
loss = criterion(logits, targets)

print(f"Cross-Entropy Loss: {loss.item():.4f}")  # Scalar

Cross-Entropy Loss: 0.3185
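
To check the result, the sketch below recomputes the loss by hand on the same example tensors and compares it with `torch.nn.functional.cross_entropy` at its default settings. It also shows, on a deliberately extreme logit pair chosen for illustration, why taking the log-softmax in one step is safer than `log(softmax(x))`.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 2.5, 0.3]])
targets = torch.tensor([0, 1])

# Manual loss: log-softmax, pick the true-class entry, average the negatives.
log_probs = torch.log_softmax(logits, dim=1)
manual_loss = -log_probs[torch.arange(logits.shape[0]), targets].mean()

# Built-in reference (mean reduction by default).
reference_loss = F.cross_entropy(logits, targets)
print(torch.allclose(manual_loss, reference_loss, atol=1e-6))  # True

# Extreme logits: softmax underflows to exactly 0 for the second class.
extreme = torch.tensor([[1000.0, 0.0]])
naive = torch.log(torch.softmax(extreme, dim=1))   # second entry is -inf
stable = torch.log_softmax(extreme, dim=1)         # second entry is -1000
print(naive, stable)
```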


## Notes:
* This mimics `nn.CrossEntropyLoss` with its default settings (mean reduction, no class weights or label smoothing), which **combines `LogSoftmax + NLLLoss`**.
* `torch.autograd` will handle the backward pass since we used PyTorch operations.
* You can call `loss.backward()` to get gradients with respect to logits.
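
If you want full control over the gradients instead of relying on autograd, one option is a custom `torch.autograd.Function`. The following is a minimal sketch, not a reference implementation: the class name `CrossEntropyFunction` is hypothetical, and the backward pass uses the standard result that the gradient of the mean cross-entropy with respect to the logits is `(softmax(logits) - one_hot(targets)) / batch_size`.

```python
import torch
import torch.nn.functional as F


class CrossEntropyFunction(torch.autograd.Function):
    """Hypothetical name: cross-entropy with a hand-written backward pass."""

    @staticmethod
    def forward(ctx, logits, targets):
        log_probs = torch.log_softmax(logits, dim=1)
        ctx.save_for_backward(log_probs, targets)
        batch_idx = torch.arange(logits.shape[0], device=logits.device)
        return -log_probs[batch_idx, targets].mean()

    @staticmethod
    def backward(ctx, grad_output):
        log_probs, targets = ctx.saved_tensors
        batch_size, num_classes = log_probs.shape
        # d(loss)/d(logits) = (softmax(logits) - one_hot(targets)) / batch_size
        one_hot = F.one_hot(targets, num_classes).to(log_probs.dtype)
        grad_logits = (log_probs.exp() - one_hot) * grad_output / batch_size
        # targets are integer indices, so they get no gradient
        return grad_logits, None


logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 2.5, 0.3]], requires_grad=True)
targets = torch.tensor([0, 1])

loss = CrossEntropyFunction.apply(logits, targets)
loss.backward()

# Compare with the gradient autograd derives for the built-in loss.
ref_logits = logits.detach().clone().requires_grad_(True)
F.cross_entropy(ref_logits, targets).backward()
print(torch.allclose(logits.grad, ref_logits.grad, atol=1e-6))  # True
```

For a loss this simple, the hand-written backward buys nothing over autograd; the pattern becomes useful when you need a custom or fused gradient.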

---

## 🧠 Bonus: Binary Cross-Entropy (BCE) Version

For binary classification with sigmoid:

In [3]:
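class MyBinaryCrossEntropyLoss(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, logits, targets):
        # logits: raw outputs [batch_size]
        # targets: [batch_size] with values 0 or 1 (cast to the logits' dtype)
        targets = targets.to(logits.dtype)

        # log(p) = logsigmoid(x) and log(1 - p) = logsigmoid(-x).
        # This stays accurate for large |x|, unlike log(p + 1e-8).
        log_p = nn.functional.logsigmoid(logits)
        log_one_minus_p = nn.functional.logsigmoid(-logits)

        loss = -(targets * log_p + (1 - targets) * log_one_minus_p)
        return loss.mean()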
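
A quick usage check, sketched under the assumption that labels are floats in {0, 1} and with made-up logits: the manual binary cross-entropy on logits should agree with PyTorch's `F.binary_cross_entropy_with_logits` at its default mean reduction.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([1.2, -0.8, 3.0])   # illustrative raw scores
targets = torch.tensor([1.0, 0.0, 1.0])   # float labels in {0, 1}

# Manual BCE on logits, using log(1 - sigmoid(x)) = logsigmoid(-x).
manual = -(targets * F.logsigmoid(logits)
           + (1 - targets) * F.logsigmoid(-logits)).mean()

# Built-in reference that also takes raw logits.
reference = F.binary_cross_entropy_with_logits(logits, targets)
print(torch.allclose(manual, reference, atol=1e-6))  # True
```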