# Day 74: Unlearning and Forgetting

Machine unlearning seeks to remove the influence of specific training data or concepts from a model without retraining it from scratch. This matters for privacy (for example, the "Right to be Forgotten") and for safety (removing dangerous knowledge).

In this lab, we simulate:
1. **Knowledge Erasure**: Lowering the model's confidence on a 'forget set'.
2. **Utility Preservation**: Ensuring unrelated knowledge (the 'retain set') stays intact.
3. **Evaluation**: Quantifying how well the unlearning process worked.

In [None]:
import sys
import os
import numpy as np

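# Add the repository root to sys.path so that `src` can be imported.
# The relative path assumes this notebook sits two directories below the root.
sys.path.append(os.path.abspath('../../'))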

from src.alignment.unlearning import UnlearningManager

## 1. Initial Knowledge State

Observe the model's confidence in harmful vs. benign concepts.

In [None]:
manager = UnlearningManager()

concepts = ["physics", "harmful_chemistry", "general_knowledge"]
for c in concepts:
    print(f"{c.capitalize()} Confidence: {manager.predict_confidence(c):.2f}")
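
To compare against these numbers later, it can help to record them before any unlearning happens. The following is a minimal sketch, assuming only that `predict_confidence` takes a concept name and returns a number, as it is used above. `snapshot_confidences` is a hypothetical helper, not part of `UnlearningManager`, and the stub predictor uses made-up values so the sketch runs on its own.

```python
from typing import Callable, Dict, Iterable


def snapshot_confidences(predict: Callable[[str], float],
                         concepts: Iterable[str]) -> Dict[str, float]:
    """Record the current confidence for each concept (hypothetical helper)."""
    return {c: float(predict(c)) for c in concepts}


# Stand-in predictor with made-up values (an assumption for illustration).
# In the notebook you would pass manager.predict_confidence instead.
_stub_values = {"physics": 0.9, "harmful_chemistry": 0.8}
baseline = snapshot_confidences(lambda c: _stub_values.get(c, 0.0),
                                ["physics", "harmful_chemistry"])
print(baseline)
```

Keeping the baseline in a plain dictionary makes it easy to subtract the post-unlearning values concept by concept.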

## 2. Perform Unlearning

We target `harmful_chemistry` for removal while trying to retain `physics`.

In [None]:
print("--- Executing Unlearning Steps ---")
for _ in range(3):
    manager.unlearn_step(forget_set_concepts=["harmful_chemistry"], retain_set_concepts=["physics"])
    print("Unlearning step complete.")
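
The internals of `unlearn_step` are not shown in this notebook. To build intuition for what repeated steps do, the toy below mimics the qualitative effect on a plain dictionary of confidences; it does not train anything. `toy_unlearn_step`, the `decay`, `retain_leak`, and `other_leak` factors, and the starting values are all assumptions for illustration, not the behaviour of `UnlearningManager`.

```python
from typing import Dict, List


def toy_unlearn_step(conf: Dict[str, float],
                     forget: List[str],
                     retain: List[str],
                     decay: float = 0.5,
                     retain_leak: float = 0.02,
                     other_leak: float = 0.05) -> Dict[str, float]:
    """Return a new confidence table after one simulated unlearning step.

    - Concepts in `forget` are scaled down by `decay`.
    - Concepts in `retain` lose a small `retain_leak` fraction.
    - All other concepts lose a slightly larger `other_leak` fraction,
      since nothing protects them.
    """
    updated = {}
    for name, value in conf.items():
        if name in forget:
            updated[name] = value * decay
        elif name in retain:
            updated[name] = value * (1.0 - retain_leak)
        else:
            updated[name] = value * (1.0 - other_leak)
    return updated


# Made-up starting confidences (assumption for illustration).
state = {"physics": 0.9, "harmful_chemistry": 0.8, "general_knowledge": 0.85}
for step in range(3):
    state = toy_unlearn_step(state, ["harmful_chemistry"], ["physics"])
    print(f"step {step + 1}: " + ", ".join(f"{k}={v:.3f}" for k, v in state.items()))
```

In this toy, the forget-set confidence halves at every step while the other concepts erode slowly, which is the trade-off the verification section looks for.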

## 3. Verify Results

Did we forget what we intended without breaking everything else?

In [None]:
print("Final Knowledge State:")
for c in concepts:
    confidence = manager.predict_confidence(c)
    # 0.3 is a fixed display cutoff; it only affects the printed status label.
    status = "FORGOTTEN" if confidence < 0.3 else "RETAINED"
    print(f"{c.capitalize():<18}: {confidence:.2f} ({status})")

success = manager.evaluate_unlearning("harmful_chemistry")
print(f"\nUnlearning Success: {success}")

## Key Challenge: Catastrophic Forgetting

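Notice that `physics` confidence decreased slightly, even though it was in the retain set. This is collateral damage: the update that erased the forget set also degraded knowledge we wanted to keep. When the loss is severe, it is usually called catastrophic forgetting. Ideal unlearning removes only the targeted knowledge and leaves the model's performance on everything else unchanged.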
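
One way to quantify both goals is to compare confidences before and after unlearning: the drop on the forget set should be large, and the drop on the retain set should be close to zero. The sketch below assumes confidences are available as plain dictionaries. `unlearning_report` is a hypothetical helper with made-up numbers, and it is not how `evaluate_unlearning` is implemented.

```python
from typing import Dict, List


def unlearning_report(before: Dict[str, float],
                      after: Dict[str, float],
                      forget: List[str],
                      retain: List[str]) -> Dict[str, float]:
    """Mean confidence drop on the forget set and on the retain set."""
    def mean_drop(names: List[str]) -> float:
        return sum(before[n] - after[n] for n in names) / len(names)

    return {
        "forget_drop": mean_drop(forget),  # larger is better
        "retain_drop": mean_drop(retain),  # closer to zero is better
    }


# Made-up confidences (assumption for illustration).
before = {"physics": 0.90, "harmful_chemistry": 0.80}
after = {"physics": 0.85, "harmful_chemistry": 0.10}
report = unlearning_report(before, after, ["harmful_chemistry"], ["physics"])
print({k: round(v, 2) for k, v in report.items()})
```

Reporting the two drops separately keeps a large forget-set effect from hiding damage to the retain set.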