<a href="https://colab.research.google.com/github/shubhamgiri0905/assignment-demo/blob/main/ContinualLearningExample.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install torch torchvision



### **Documentation for Catastrophic Forgetting and Continual Learning**

This documentation explains the concept of **Catastrophic Forgetting** and how it can be tackled using **Continual Learning**. We break down the theory and then walk through the code used to demonstrate these concepts. The core ideas are explained so that non-technical readers can follow them as well.

---

### **1. What is Catastrophic Forgetting?**

**Catastrophic Forgetting** refers to the phenomenon in which a neural network abruptly loses previously learned knowledge when it is trained on new data. Imagine that a neural network learns to recognize digits 0–4 (Task A). Once it is trained on digits 5–9 (Task B), it may perform very well on Task B yet almost completely fail on Task A. This happens because both tasks share the same weights: training on Task B updates them to reduce the Task B loss only, and nothing in that objective preserves Task A performance, so the new information overwrites what was learned before.
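A minimal sketch of one mechanism behind this, assuming a single 10-way output layer shared by both tasks (as in the code below). The gradient of the cross-entropy loss with respect to the logits is softmax(z) − one_hot(y), so for a Task B label every other class, including digits 0–4, receives a positive gradient, and a gradient-descent step pushes those logits down. The logit values here are made up for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ce_grad_wrt_logits(logits, target):
    """Gradient of the cross-entropy loss w.r.t. the logits: softmax(z) - one_hot(target)."""
    probs = softmax(logits)
    return [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]

# Hypothetical logits from a model that has just learned Task A (digits 0-4):
# high scores for old classes, low scores for digits 5-9.
logits = [4.0, 1.0, 0.5, 0.2, 0.1, -1.0, -1.0, -1.0, -1.0, -1.0]

# One Task B training example with label 7.
grad = ce_grad_wrt_logits(logits, target=7)

for k, g in enumerate(grad):
    # Gradient descent moves each logit by -lr * g, so a positive g pushes that score down.
    direction = "down" if g > 0 else "up"
    print(f"class {k}: grad={g:+.4f} -> logit pushed {direction}")
```

Repeated over thousands of Task B batches, this steady pressure on the Task A logits is one reason accuracy on digits 0–4 can collapse almost entirely.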

### **2. What is Continual Learning?**

**Continual Learning** is an approach used to address the problem of Catastrophic Forgetting: the goal is for a model to learn a sequence of new tasks without forgetting the tasks it learned earlier. A widely used strategy is to **replay** stored examples from past tasks; another family of methods regularizes learning to discourage changes that would cause forgetting.

In this document, we demonstrate a basic example of Catastrophic Forgetting using the MNIST dataset and a small convolutional neural network (CNN), and we show how Continual Learning using **Naïve Replay** can help mitigate this issue.

---

### **3. Overview of the Code**

The provided Python code does the following:

1. **Demonstrates Catastrophic Forgetting**: We train a neural network on Task A (digits 0–4), then continue training on Task B (digits 5–9). We then show how the accuracy on Task A drops as the model learns Task B, demonstrating **Catastrophic Forgetting**.

2. **Introduces Continual Learning**: We introduce a simple **Naïve Replay** strategy, where we store a small number of samples from Task A in a **replay buffer**. When training on Task B, we mix the replayed samples from Task A with the current data from Task B. This helps the model "remember" Task A while learning Task B.

Let’s break down the core sections of the code:

---

### **4. Data Preparation (Task A & Task B)**

We use the **MNIST dataset**, which consists of images of handwritten digits (0-9). The task is to train the neural network on two separate groups of digits (0–4 for Task A and 5–9 for Task B).

* **Task A**: Digits 0, 1, 2, 3, 4
* **Task B**: Digits 5, 6, 7, 8, 9
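Splitting MNIST into tasks is just a filter on the labels. Here is a minimal plain-Python sketch; `split_indices_by_label` is a hypothetical helper and the short label list is a toy stand-in for the MNIST targets (with torchvision, `full_train.targets.tolist()` would supply the real list without decoding any images).

```python
def split_indices_by_label(labels, task_classes):
    """Return the positions of all samples whose label belongs to task_classes."""
    wanted = set(task_classes)
    return [i for i, y in enumerate(labels) if y in wanted]

# Toy label list standing in for the MNIST training targets.
labels = [5, 0, 4, 1, 9, 2, 1, 3, 1, 4, 3, 5, 3, 6, 1, 7, 2, 8, 6, 9]

task_a = split_indices_by_label(labels, range(0, 5))   # Task A: digits 0-4
task_b = split_indices_by_label(labels, range(5, 10))  # Task B: digits 5-9

print("Task A indices:", task_a)
print("Task B indices:", task_b)
```

The resulting index lists are what `torch.utils.data.Subset` expects, so each task becomes a view onto the same underlying dataset.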

#### **Why do we separate the data like this?**

By dividing the MNIST data into two tasks, we simulate a situation in which the neural network first learns one set of classes (Task A) and then a completely new set (Task B), ideally without forgetting the first. Both tasks share the same 10-way output layer (a class-incremental setup), which makes catastrophic forgetting easy to observe.

---

### **5. Model Architecture**

We define a small **Convolutional Neural Network (CNN)** in PyTorch. The CNN has the following components:

1. **Convolutional Layers**: Two 3×3 convolutions (16 and then 32 channels, each followed by a ReLU) learn spatial patterns in images (e.g., edges, corners, strokes).
2. **MaxPool Layers**: Each 2×2 max-pooling step halves the height and width of the feature maps (28×28 → 14×14 → 7×7), keeping the strongest responses and reducing computation.
3. **Fully Connected Layers**: A hidden layer of 100 units, followed by an output layer that produces one score per digit class (0–9).
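The layer settings determine why the first fully connected layer expects `32*7*7` inputs. A small sketch of the arithmetic, assuming the settings in the notebook's `SimpleCNN` (3×3 convolutions with stride 1 and padding 1, 2×2 max pooling with stride 2); `conv2d_out` is a hypothetical helper implementing the standard output-size formula floor((n + 2p − k) / s) + 1:

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Output height/width of a conv or pooling layer on a square input (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                      # input: 1 x 28 x 28
size = conv2d_out(size, kernel=3, padding=1)   # Conv2d(1, 16, 3, 1, 1)  -> 16 x 28 x 28
size = conv2d_out(size, kernel=2, stride=2)    # MaxPool2d(2)            -> 16 x 14 x 14
size = conv2d_out(size, kernel=3, padding=1)   # Conv2d(16, 32, 3, 1, 1) -> 32 x 14 x 14
size = conv2d_out(size, kernel=2, stride=2)    # MaxPool2d(2)            -> 32 x 7 x 7
flat_features = 32 * size * size               # input width of nn.Linear(32*7*7, 100)

# Weights + biases per layer
params = {
    "conv1": 16 * 1 * 3 * 3 + 16,
    "conv2": 32 * 16 * 3 * 3 + 32,
    "fc1": flat_features * 100 + 100,
    "fc2": 100 * 10 + 10,
}
print(size, flat_features)     # 7 1568
print(sum(params.values()))    # 162710
```

Roughly 96% of the parameters sit in the first fully connected layer, so the "small" network is mostly that one matrix.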

#### **Why CNN?**

CNNs are commonly used in image classification tasks because they are very efficient at learning patterns from images, such as edges and textures, which are important for recognizing digits in the MNIST dataset.

---

### **6. Training the Model on Task A and Task B**

* **Task A Training**: We first train the model on digits 0–4 (Task A). During this phase, the model learns to recognize the patterns in these digits.

* **Task B Training**: After the model is trained on Task A, we then continue training it on digits 5–9 (Task B). **Catastrophic Forgetting** happens at this stage, and the model may lose its ability to correctly classify Task A digits.

#### **Why sequential training?**

Sequential training simulates a real-world scenario where models often need to learn new tasks over time. For example, a model that learns to identify objects in one domain might later need to learn to recognize objects in a completely different domain.

---

### **7. Evaluating the Model for Forgetting**

After training on Task A and then on Task B, we evaluate the model on Task A again. We compare the accuracy before and after training on Task B:

* **Before Task B Training**: The model’s accuracy on Task A is high because it has just finished learning Task A.
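* **After Task B Training**: The model’s accuracy on Task A drops sharply (to 0.35% in the run below), demonstrating **Catastrophic Forgetting**. Because Task B training only ever rewards labels 5–9, the model appears to end up predicting one of those classes for almost every input, including Task A digits.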

#### **Why do we evaluate again?**

We want to observe how well the model retains knowledge of Task A after being trained on Task B. The difference in accuracy shows the impact of catastrophic forgetting.
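The evaluation boils down to two numbers: accuracy on the Task A test set, and the drop in that accuracy after Task B training. A plain-Python sketch of both metrics (the function names and the five-sample predictions are illustrative, not taken from the notebook); the last line plugs in the accuracies printed by the notebook's run, 98.99% before and 0.35% after:

```python
def accuracy(preds, labels):
    """Percentage of predictions that match the labels (same formula as the notebook's evaluate)."""
    correct = sum(p == y for p, y in zip(preds, labels))
    return 100 * correct / len(labels)

def forgetting(acc_before, acc_after):
    """Drop in accuracy on an old task after training on a new one, in percentage points."""
    return acc_before - acc_after

# Toy example: five Task A test images and the predictions after Task B training.
labels_a = [0, 1, 2, 3, 4]
preds_after_b = [7, 1, 5, 9, 6]              # mostly predicts digits 5-9 now
print(accuracy(preds_after_b, labels_a))     # 20.0

# Accuracies printed by the notebook's run.
print(f"{forgetting(98.99, 0.35):.2f}")      # 98.64
```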

---

### **8. Introducing Naïve Replay for Continual Learning**

We implement a **Naïve Replay** strategy to tackle Catastrophic Forgetting:

1. **Replay Buffer**: We store 200 randomly chosen training samples from Task A in a buffer.
2. **Replay Training**: We train a fresh model on Task A as before; then, while training it on Task B, we mix the stored Task A samples with the Task B samples. The model therefore keeps receiving some signal to classify digits 0–4 while it learns Task B.
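A minimal sketch of the buffer bookkeeping, using integer indices instead of images. The helpers are hypothetical; `random.Random(seed).sample` plays the role of the notebook's `random.sample`, and the single shuffle stands in for one epoch of `DataLoader(ConcatDataset(...), shuffle=True)`. The list sizes are the MNIST training-set counts for digits 0–4 (30,596) and 5–9 (29,404).

```python
import random

def build_replay_buffer(old_task_indices, buffer_size, seed=0):
    """Naive replay: keep a fixed, uniformly random subset of old-task samples."""
    rng = random.Random(seed)
    return rng.sample(old_task_indices, buffer_size)

def joint_training_order(replay_indices, new_task_indices, seed=0):
    """Concatenate replayed and new samples, then shuffle them into one epoch's order."""
    rng = random.Random(seed)
    order = [("A", i) for i in replay_indices] + [("B", i) for i in new_task_indices]
    rng.shuffle(order)
    return order

task_a_idx = list(range(30_596))   # stand-ins for the Task A training indices
task_b_idx = list(range(29_404))   # stand-ins for the Task B training indices

buffer = build_replay_buffer(task_a_idx, buffer_size=200)
order = joint_training_order(buffer, task_b_idx)

replay_fraction = len(buffer) / len(order)
print(f"replay share of each epoch: {replay_fraction:.2%}")
print(f"expected replayed samples per 64-sample batch: {64 * replay_fraction:.2f}")
```

With 200 buffered samples, replayed digits make up well under 1% of each Task B epoch, fewer than one per 64-sample batch on average. That small share plausibly explains why replay recovers only part of the Task A accuracy; a larger buffer, or reserving a fixed number of replay samples in every batch, are common ways to strengthen it.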

#### **Why Naïve Replay?**

Naïve Replay is simple and often effective. Re-exposing the model to examples from previous tasks counteracts, at least partly, the pressure to overwrite what it learned on them. It is loosely analogous to how people retain memories by revisiting past experiences while learning new ones.

---

### **9. Final Evaluation and Comparison**

We compare the model’s performance:

* **Task A Before Task B Training**: High accuracy (98.99% in the run below).
* **Task A After Task B Training (No Replay)**: Accuracy collapses (0.35% in the same run), demonstrating catastrophic forgetting.
* **Task A After Task B Training (With Replay)**: Accuracy is far higher (about 70.8%), showing that replay substantially reduces, but does not eliminate, forgetting.

#### **Why is this comparison important?**

It shows that Continual Learning with Naïve Replay substantially reduces catastrophic forgetting: with a buffer of only 200 previous examples, the model retains much of its Task A knowledge while still learning Task B.

---

### **10. Conclusion**

**Catastrophic Forgetting** is a major challenge in machine learning, especially when dealing with sequential tasks. However, using techniques like **Naïve Replay**, where past examples are replayed during training on new tasks, we can significantly reduce the impact of forgetting.

This code serves as a basic demonstration of how this phenomenon works and of one way to mitigate it. Task A accuracy after Task B training is far higher with replay than without, showing that even a simple technique can considerably reduce catastrophic forgetting in a continual learning setup.

---

### **11. Real Case Scenario To Relate**

Imagine you are using a recommendation system on an e-commerce website. Over time, the system needs to learn your preferences for different types of products. If it learns about new products but forgets your earlier preferences, it could start recommending irrelevant items. A replay strategy like the one in this code would help such a system keep recommending both new and familiar kinds of products, "remembering" past preferences while incorporating new ones.

This type of continual learning helps machines keep learning without losing important information, which matters for real-world applications such as customer recommendations, autonomous driving, and healthcare monitoring systems.



In [2]:
# continual_forgetting_demo.py

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, Subset, ConcatDataset
import random

# ----------------------------
# 1. Hyperparameters & Setup
# ----------------------------
batch_size = 64
epochs = 5
replay_buffer_size = 200    # number of samples to store from Task A
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# ----------------------------
# 2. Data Preparation
# ----------------------------
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

full_train = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
full_test  = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

# Task A: digits 0–4
# Filter on the label tensor directly; enumerating the dataset would decode
# and transform every image just to read its label.
idx_a_train = torch.where(full_train.targets < 5)[0].tolist()
idx_a_test  = torch.where(full_test.targets < 5)[0].tolist()
taskA_train = Subset(full_train, idx_a_train)
taskA_test  = Subset(full_test,  idx_a_test)

# Task B: digits 5–9
idx_b_train = torch.where(full_train.targets >= 5)[0].tolist()
idx_b_test  = torch.where(full_test.targets >= 5)[0].tolist()
taskB_train = Subset(full_train, idx_b_train)
taskB_test  = Subset(full_test,  idx_b_test)

loader_A_train = DataLoader(taskA_train, batch_size=batch_size, shuffle=True)
loader_A_test  = DataLoader(taskA_test,  batch_size=batch_size, shuffle=False)
loader_B_train = DataLoader(taskB_train, batch_size=batch_size, shuffle=True)
loader_B_test  = DataLoader(taskB_test,  batch_size=batch_size, shuffle=False)

# ----------------------------
# 3. Model Definition
# ----------------------------
class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, 1, 1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3,1,1),  nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32*7*7, 100),
            nn.ReLU(),
            nn.Linear(100, num_classes),
        )
    def forward(self, x):
        x = self.features(x)
        return self.classifier(x)

def train(model, optimizer, criterion, loader):
    model.train()
    running_loss = 0.0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        out = model(x)
        loss = criterion(out, y)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * x.size(0)
    return running_loss / len(loader.dataset)

def evaluate(model, loader):
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            out = model(x)
            preds = out.argmax(dim=1)
            correct += (preds == y).sum().item()
            total += y.size(0)
    return 100 * correct / total

# ----------------------------
# 4. Phase I: Sequential Training (Shows Forgetting)
# ----------------------------
model = SimpleCNN().to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

print("\n== Training on Task A (digits 0–4) ==")
for ep in range(1, epochs+1):
    loss = train(model, optimizer, criterion, loader_A_train)
    acc  = evaluate(model, loader_A_test)
    print(f"Epoch {ep}/{epochs}  Loss: {loss:.4f}  TaskA Acc: {acc:.2f}%")

print("\n== Evaluate on Task A BEFORE Task B Training ==")
acc_A_before = evaluate(model, loader_A_test)
print(f"Accuracy on Task A: {acc_A_before:.2f}%")

print("\n== Continue Training on Task B (digits 5–9) ==")
for ep in range(1, epochs+1):
    loss = train(model, optimizer, criterion, loader_B_train)
    accB = evaluate(model, loader_B_test)
    print(f"Epoch {ep}/{epochs}  Loss: {loss:.4f}  TaskB Acc: {accB:.2f}%")

print("\n== Evaluate on Task A AFTER Task B Training ==")
acc_A_after = evaluate(model, loader_A_test)
print(f"Accuracy on Task A: {acc_A_after:.2f}%")
print(f"Catastrophic forgetting: ΔAcc = {acc_A_before - acc_A_after:.2f}%")

# ----------------------------
# 5. Phase II: Continual Learning with Naïve Replay
# ----------------------------
print("\n\n== Continual Learning: Replay Buffer ==")
# 5.1 Build replay buffer from Task A
buffer_indices = random.sample(idx_a_train, replay_buffer_size)
replay_buffer = Subset(full_train, buffer_indices)
replay_loader = DataLoader(replay_buffer, batch_size=batch_size, shuffle=True)

# 5.2 New model for fair comparison
model_replay = SimpleCNN().to(device)
optimizer_replay = optim.SGD(model_replay.parameters(), lr=0.01)

# 5.3 Train Task A (again)
for ep in range(1, epochs+1):
    train(model_replay, optimizer_replay, criterion, loader_A_train)

# 5.4 Joint training on Task B + replayed Task A
joint_loader = DataLoader(
    ConcatDataset([replay_buffer, taskB_train]),
    batch_size=batch_size, shuffle=True
)

for ep in range(1, epochs+1):
    loss = train(model_replay, optimizer_replay, criterion, joint_loader)
    accB = evaluate(model_replay, loader_B_test)
    print(f"Epoch {ep}/{epochs}  Loss: {loss:.4f}  TaskB Acc: {accB:.2f}%")

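# 5.5 Evaluate on Task A again
# Note: acc_A_before was measured on the first model; model_replay was trained on
# Task A separately, so its own pre-Task-B accuracy may differ slightly.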
acc_A_replay = evaluate(model_replay, loader_A_test)
print(f"\nPost-replay Accuracy on Task A: {acc_A_replay:.2f}%")
print(f"Reduced forgetting: ΔAcc = {acc_A_before - acc_A_replay:.2f}% (vs {acc_A_before - acc_A_after:.2f}% before)")

# ----------------------------
# 6. Final Summary
# ----------------------------
print("\nSummary:")
print(f"- TaskA before TaskB: {acc_A_before:.2f}%")
print(f"- TaskA after TaskB (no replay): {acc_A_after:.2f}%")
print(f"- TaskA after TaskB (with replay): {acc_A_replay:.2f}%")
print("You can see how a small replay buffer helps mitigate catastrophic forgetting.")


Using device: cuda





== Training on Task A (digits 0–4) ==
Epoch 1/5  Loss: 0.3664  TaskA Acc: 97.31%
Epoch 2/5  Loss: 0.1075  TaskA Acc: 98.31%
Epoch 3/5  Loss: 0.0833  TaskA Acc: 98.15%
Epoch 4/5  Loss: 0.0691  TaskA Acc: 98.46%
Epoch 5/5  Loss: 0.0571  TaskA Acc: 98.99%

== Evaluate on Task A BEFORE Task B Training ==
Accuracy on Task A: 98.99%

== Continue Training on Task B (digits 5–9) ==
Epoch 1/5  Loss: 0.3037  TaskB Acc: 95.64%
Epoch 2/5  Loss: 0.1054  TaskB Acc: 96.56%
Epoch 3/5  Loss: 0.0781  TaskB Acc: 97.80%
Epoch 4/5  Loss: 0.0637  TaskB Acc: 96.81%
Epoch 5/5  Loss: 0.0545  TaskB Acc: 98.50%

== Evaluate on Task A AFTER Task B Training ==
Accuracy on Task A: 0.35%
Catastrophic forgetting: ΔAcc = 98.64%


== Continual Learning: Replay Buffer ==
Epoch 1/5  Loss: 0.3365  TaskB Acc: 95.41%
Epoch 2/5  Loss: 0.1352  TaskB Acc: 96.40%
Epoch 3/5  Loss: 0.1012  TaskB Acc: 96.05%
Epoch 4/5  Loss: 0.0829  TaskB Acc: 97.57%
Epoch 5/5  Loss: 0.0725  TaskB Acc: 98.07%

Post-replay Accuracy on Task A: 70.8