# Noise-Based Machine Unlearning Methods for Tabular Data

This notebook implements various machine unlearning techniques that add noise to model parameters.

## Methods Implemented:
1. **Gaussian Noise Injection** - Add Gaussian noise to weights
2. **Laplacian Noise Injection** - Add heavier-tailed Laplacian noise to weights
3. **Adaptive Noise Scaling** - Scale noise based on parameter importance
4. **Layer-wise Noise Injection** - Different noise levels per layer
5. **Gradient-based Noise** - Noise proportional to gradient magnitudes

## Evaluation Metrics:
- Forget set accuracy (should decrease, ideally toward the retrain-from-scratch baseline)
- Retain set accuracy (should maintain)
- Test set accuracy (overall performance)
- Parameter distance from original model


In [1]:
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader, Subset
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report
from copy import deepcopy
import warnings
warnings.filterwarnings('ignore')

# Set random seeds for reproducibility
np.random.seed(42)
torch.manual_seed(42)
if torch.cuda.is_available():
    torch.cuda.manual_seed(42)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")
print(f"PyTorch version: {torch.__version__}")


Using device: cpu
PyTorch version: 2.7.1


## 1. Generate Tabular Dataset


In [None]:
# Generate synthetic tabular dataset for binary classification
n_samples = 2000
n_features = 20
n_informative = 15
n_redundant = 3

X, y = make_classification(
    n_samples=n_samples,
    n_features=n_features,
    n_informative=n_informative,
    n_redundant=n_redundant,
    n_classes=2,
    random_state=42,
    flip_y=0.1  # Assign random labels to 10% of samples (label noise)
)

print(f"Dataset shape: {X.shape}")
print(f"Number of features: {n_features}")
print(f"Class distribution: {np.bincount(y)}")

# Split into train and test sets first
# The train set is then split into forget and retain sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Further split training set into forget (10%) and retain (90%) sets
forget_ratio = 0.1
X_retain, X_forget, y_retain, y_forget = train_test_split(
    X_train, y_train, test_size=forget_ratio, random_state=42, stratify=y_train
)

print(f"\nData splits:")
print(f"Retain set: {X_retain.shape[0]} samples")
print(f"Forget set: {X_forget.shape[0]} samples")
print(f"Test set: {X_test.shape[0]} samples")

# Standardize features
scaler = StandardScaler()
X_retain_scaled = scaler.fit_transform(X_retain)
X_forget_scaled = scaler.transform(X_forget)
X_test_scaled = scaler.transform(X_test)

# Convert to PyTorch tensors
X_retain_tensor = torch.FloatTensor(X_retain_scaled).to(device)
y_retain_tensor = torch.LongTensor(y_retain).to(device)
X_forget_tensor = torch.FloatTensor(X_forget_scaled).to(device)
y_forget_tensor = torch.LongTensor(y_forget).to(device)
X_test_tensor = torch.FloatTensor(X_test_scaled).to(device)
y_test_tensor = torch.LongTensor(y_test).to(device)

print("\nData preparation complete!")


Dataset shape: (2000, 20)
Number of features: 20
Class distribution: [ 986 1014]

Data splits:
Retain set: 1440 samples
Forget set: 160 samples
Test set: 400 samples

Data preparation complete!


## 2. Define Neural Network Model


In [None]:
class TabularClassifier(nn.Module):
    """Neural network for tabular binary classification"""
    
    def __init__(self, input_size, hidden_sizes=(64, 32, 16), num_classes=2, dropout=0.3):
        super().__init__()
        
        layers = []
        prev_size = input_size
        
        # Build hidden layers
        for hidden_size in hidden_sizes:
            layers.append(nn.Linear(prev_size, hidden_size))
            layers.append(nn.BatchNorm1d(hidden_size))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(dropout))
            prev_size = hidden_size
        
        # Output layer
        layers.append(nn.Linear(prev_size, num_classes))
        
        self.network = nn.Sequential(*layers)
        
    def forward(self, x):
        return self.network(x)
    
    def get_layer_names(self):
        """Get names of linear layers for targeted noise injection"""
        return [name for name, module in self.named_modules() if isinstance(module, nn.Linear)]

# Initialize model
model = TabularClassifier(
    input_size=n_features,
    hidden_sizes=[64, 32, 16],
    num_classes=2,
    dropout=0.3
).to(device)

print("Model Architecture:")
print(model)
print(f"\nTotal parameters: {sum(p.numel() for p in model.parameters())}")
print(f"Trainable parameters: {sum(p.numel() for p in model.parameters() if p.requires_grad)}")


## 3. Training and Evaluation Functions


In [None]:
def train_model(model, X_train, y_train, epochs=50, batch_size=32, lr=0.001, verbose=True):
    """Train the model on given data"""
    model.train()
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    
    # Create DataLoader
    train_dataset = TensorDataset(X_train, y_train)
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    
    losses = []
    
    for epoch in range(epochs):
        epoch_loss = 0.0
        
        for batch_X, batch_y in train_loader:
            optimizer.zero_grad()
            outputs = model(batch_X)
            loss = criterion(outputs, batch_y)
            loss.backward()
            optimizer.step()
            
            epoch_loss += loss.item()
        
        avg_loss = epoch_loss / len(train_loader)
        losses.append(avg_loss)
        
        if verbose and (epoch + 1) % 10 == 0:
            print(f'Epoch [{epoch+1}/{epochs}], Loss: {avg_loss:.4f}')
    
    return losses

def evaluate_model(model, X, y):
    """Evaluate model accuracy"""
    model.eval()
    with torch.no_grad():
        outputs = model(X)
        predicted = outputs.argmax(dim=1)
        accuracy = 100 * (predicted == y).sum().item() / y.size(0)
    return accuracy

def get_model_state_dict_copy(model):
    """Get a deep copy of model's state dict"""
    return {name: param.clone().detach() for name, param in model.state_dict().items()}

def calculate_parameter_distance(state_dict1, state_dict2):
    """Calculate L2 distance between two model state dicts"""
    distance = 0.0
    for key in state_dict1.keys():
        if 'weight' in key or 'bias' in key:
            distance += torch.norm(state_dict1[key] - state_dict2[key]).item() ** 2
    return np.sqrt(distance)

print("Training and evaluation functions defined!")


## 4. Train Original Model


In [None]:
# Train on full training set (retain + forget)
X_full_train = torch.cat([X_retain_tensor, X_forget_tensor], dim=0)
y_full_train = torch.cat([y_retain_tensor, y_forget_tensor], dim=0)

print("Training original model on full training set...")
train_losses = train_model(model, X_full_train, y_full_train, epochs=100, lr=0.001)

# Evaluate original model
retain_acc = evaluate_model(model, X_retain_tensor, y_retain_tensor)
forget_acc = evaluate_model(model, X_forget_tensor, y_forget_tensor)
test_acc = evaluate_model(model, X_test_tensor, y_test_tensor)

print("\n=== Original Model Performance ===")
print(f"Retain Set Accuracy: {retain_acc:.2f}%")
print(f"Forget Set Accuracy: {forget_acc:.2f}%")
print(f"Test Set Accuracy: {test_acc:.2f}%")

# Save original model state
original_model_state = get_model_state_dict_copy(model)

# Plot training loss
plt.figure(figsize=(10, 4))
plt.plot(train_losses)
plt.title('Original Model Training Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid(True, alpha=0.3)
plt.show()


## 5. Machine Unlearning Methods

### Method 1: Gaussian Noise Injection


In [None]:
def gaussian_noise_unlearning(model, sigma=0.01):
    """
    Add Gaussian noise to all model parameters.
    
    Args:
        model: PyTorch model
        sigma: Standard deviation of Gaussian noise
    """
    unlearned_model = deepcopy(model)
    
    with torch.no_grad():
        for name, param in unlearned_model.named_parameters():
            if 'weight' in name or 'bias' in name:
                # Add Gaussian noise
                noise = torch.randn_like(param) * sigma
                param.add_(noise)
    
    return unlearned_model

# Test different noise levels
sigma_values = [0.001, 0.005, 0.01, 0.05, 0.1]
gaussian_results = []

print("Testing Gaussian Noise Unlearning...\n")

for sigma in sigma_values:
    # Apply unlearning
    unlearned_model = gaussian_noise_unlearning(model, sigma=sigma)
    
    # Evaluate
    retain_acc = evaluate_model(unlearned_model, X_retain_tensor, y_retain_tensor)
    forget_acc = evaluate_model(unlearned_model, X_forget_tensor, y_forget_tensor)
    test_acc = evaluate_model(unlearned_model, X_test_tensor, y_test_tensor)
    
    # Calculate parameter distance
    param_dist = calculate_parameter_distance(
        original_model_state,
        get_model_state_dict_copy(unlearned_model)
    )
    
    gaussian_results.append({
        'sigma': sigma,
        'retain_acc': retain_acc,
        'forget_acc': forget_acc,
        'test_acc': test_acc,
        'param_dist': param_dist
    })
    
    print(f"Sigma={sigma:.3f}: Retain={retain_acc:.2f}%, Forget={forget_acc:.2f}%, Test={test_acc:.2f}%, Dist={param_dist:.4f}")

# Convert to DataFrame for easier analysis
gaussian_df = pd.DataFrame(gaussian_results)
print("\nGaussian Noise Unlearning Results:")
print(gaussian_df)
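
As a sanity check on the `Dist` column: the L2 norm of i.i.d. Gaussian noise with standard deviation `sigma` over `d` parameters concentrates around `sigma * sqrt(d)` when `d` is large. The sketch below checks this with NumPy alone. `num_params` is a hypothetical count, not the size of the model above, and `expected_gaussian_distance` is a helper introduced here for illustration.

```python
import numpy as np

def expected_gaussian_distance(num_params: int, sigma: float) -> float:
    """Approximate L2 norm of i.i.d. N(0, sigma^2) noise over num_params entries."""
    return sigma * float(np.sqrt(num_params))

rng = np.random.default_rng(0)
num_params = 4_000   # hypothetical parameter count, for illustration only
sigma = 0.01

# Draw one noise vector, as a single call to a noise-injection routine would.
noise = rng.normal(0.0, sigma, size=num_params)
observed = float(np.linalg.norm(noise))
predicted = expected_gaussian_distance(num_params, sigma)

print(f"observed={observed:.4f}, predicted={predicted:.4f}")
```

If the measured parameter distance for a given `sigma` is far from this estimate, the noise is probably not reaching every parameter tensor that the distance function counts.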


### Method 2: Laplacian Noise Injection


In [None]:
def laplacian_noise_unlearning(model, scale=0.01):
    """
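    Add Laplacian noise to model parameters.

    Note: heavier tails alone do not give a differential privacy guarantee;
    that would require calibrating `scale` to a sensitivity bound.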
    
    Args:
        model: PyTorch model
        scale: Scale parameter of Laplacian distribution
    """
    unlearned_model = deepcopy(model)
    
    with torch.no_grad():
        for name, param in unlearned_model.named_parameters():
            if 'weight' in name or 'bias' in name:
                # Add Laplacian noise
                noise = torch.from_numpy(
                    np.random.laplace(0, scale, param.shape)
                ).float().to(param.device)
                param.add_(noise)
    
    return unlearned_model

# Test different scale values
scale_values = [0.001, 0.005, 0.01, 0.05, 0.1]
laplacian_results = []

print("Testing Laplacian Noise Unlearning...\n")

for scale in scale_values:
    # Apply unlearning
    unlearned_model = laplacian_noise_unlearning(model, scale=scale)
    
    # Evaluate
    retain_acc = evaluate_model(unlearned_model, X_retain_tensor, y_retain_tensor)
    forget_acc = evaluate_model(unlearned_model, X_forget_tensor, y_forget_tensor)
    test_acc = evaluate_model(unlearned_model, X_test_tensor, y_test_tensor)
    
    # Calculate parameter distance
    param_dist = calculate_parameter_distance(
        original_model_state,
        get_model_state_dict_copy(unlearned_model)
    )
    
    laplacian_results.append({
        'scale': scale,
        'retain_acc': retain_acc,
        'forget_acc': forget_acc,
        'test_acc': test_acc,
        'param_dist': param_dist
    })
    
    print(f"Scale={scale:.3f}: Retain={retain_acc:.2f}%, Forget={forget_acc:.2f}%, Test={test_acc:.2f}%, Dist={param_dist:.4f}")

laplacian_df = pd.DataFrame(laplacian_results)
print("\nLaplacian Noise Unlearning Results:")
print(laplacian_df)
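
When comparing the Laplacian and Gaussian sweeps, note that equal `scale` and `sigma` values do not mean equal noise strength: a Laplace distribution with scale `b` has standard deviation `sqrt(2) * b`. A minimal NumPy sketch of that relationship follows; the sample size and seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
scale = 0.01   # same numeric value used for both distributions
n = 200_000    # arbitrary sample size for the illustration

laplace_std = float(rng.laplace(0.0, scale, size=n).std())
gaussian_std = float(rng.normal(0.0, scale, size=n).std())

# Laplace(b) has variance 2 * b**2, so its standard deviation is sqrt(2) * b.
print(f"Laplace std:  {laplace_std:.5f} (theory {np.sqrt(2) * scale:.5f})")
print(f"Gaussian std: {gaussian_std:.5f} (theory {scale:.5f})")

# Laplace scale that matches the standard deviation of Gaussian noise with sigma = scale.
matched_scale = scale / np.sqrt(2)
print(f"Matched Laplace scale: {matched_scale:.5f}")
```

Dividing the Laplace scale by `sqrt(2)` is one way to put the two methods on the same footing before comparing their accuracy curves.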


### Method 3: Adaptive Noise Scaling (Parameter Importance-Based)


In [None]:
def adaptive_noise_unlearning(model, X_forget, y_forget, base_sigma=0.01, importance_weight=2.0):
    """
    Add noise scaled by parameter importance (gradient magnitude on forget set).
    Parameters with larger gradients on forget set get more noise.
    
    Args:
        model: PyTorch model
        X_forget: Forget set features
        y_forget: Forget set labels
        base_sigma: Base noise level
        importance_weight: How much to weight importance in noise scaling
    """
    unlearned_model = deepcopy(model)
    unlearned_model.eval()
    
    # Compute gradients on forget set to measure parameter importance
    criterion = nn.CrossEntropyLoss()
    unlearned_model.zero_grad()
    
    outputs = unlearned_model(X_forget)
    loss = criterion(outputs, y_forget)
    loss.backward()
    
    # Store gradient magnitudes
    grad_magnitudes = {}
    for name, param in unlearned_model.named_parameters():
        if param.grad is not None and ('weight' in name or 'bias' in name):
            grad_magnitudes[name] = param.grad.abs().mean().item()
    
    # Normalize by the largest gradient magnitude (fall back to 1.0 if it is zero or missing)
    max_grad = max(grad_magnitudes.values(), default=0.0) or 1.0
    
    # Add adaptive noise
    with torch.no_grad():
        for name, param in unlearned_model.named_parameters():
            if 'weight' in name or 'bias' in name:
                # Scale noise by gradient magnitude (importance)
                if name in grad_magnitudes:
                    importance = grad_magnitudes[name] / max_grad
                    adaptive_sigma = base_sigma * (1 + importance_weight * importance)
                else:
                    adaptive_sigma = base_sigma
                
                noise = torch.randn_like(param) * adaptive_sigma
                param.add_(noise)
    
    return unlearned_model

# Test different base sigma values
base_sigma_values = [0.001, 0.005, 0.01, 0.05]
adaptive_results = []

print("Testing Adaptive Noise Unlearning...\n")

for base_sigma in base_sigma_values:
    # Apply unlearning
    unlearned_model = adaptive_noise_unlearning(
        model, X_forget_tensor, y_forget_tensor,
        base_sigma=base_sigma, importance_weight=2.0
    )
    
    # Evaluate
    retain_acc = evaluate_model(unlearned_model, X_retain_tensor, y_retain_tensor)
    forget_acc = evaluate_model(unlearned_model, X_forget_tensor, y_forget_tensor)
    test_acc = evaluate_model(unlearned_model, X_test_tensor, y_test_tensor)
    
    # Calculate parameter distance
    param_dist = calculate_parameter_distance(
        original_model_state,
        get_model_state_dict_copy(unlearned_model)
    )
    
    adaptive_results.append({
        'base_sigma': base_sigma,
        'retain_acc': retain_acc,
        'forget_acc': forget_acc,
        'test_acc': test_acc,
        'param_dist': param_dist
    })
    
    print(f"Base Sigma={base_sigma:.3f}: Retain={retain_acc:.2f}%, Forget={forget_acc:.2f}%, Test={test_acc:.2f}%, Dist={param_dist:.4f}")

adaptive_df = pd.DataFrame(adaptive_results)
print("\nAdaptive Noise Unlearning Results:")
print(adaptive_df)
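
The scaling rule `base_sigma * (1 + importance_weight * importance)` bounds each tensor's noise level between `base_sigma` and `base_sigma * (1 + importance_weight)`. The sketch below isolates that arithmetic in plain Python. The helper name `adaptive_sigmas`, the tensor names, and the gradient values are hypothetical and only illustrate the formula.

```python
def adaptive_sigmas(grad_magnitudes, base_sigma=0.01, importance_weight=2.0):
    """Map per-tensor mean |grad| values to per-tensor noise levels."""
    max_grad = max(grad_magnitudes.values(), default=0.0)
    if max_grad == 0.0:
        # All gradients are zero (or the dict is empty): fall back to uniform noise.
        return {name: base_sigma for name in grad_magnitudes}
    return {
        name: base_sigma * (1 + importance_weight * g / max_grad)
        for name, g in grad_magnitudes.items()
    }

# Hypothetical mean absolute gradients for three parameter tensors.
example = {"layer_a.weight": 0.5, "layer_b.weight": 0.25, "layer_c.bias": 0.0}
sigmas = adaptive_sigmas(example)
for name, s in sigmas.items():
    print(f"{name}: sigma={s:.3f}")
```

With `importance_weight=2.0`, the tensor with the largest gradient gets three times the base noise (0.030), a tensor with half that gradient gets twice the base (0.020), and a tensor with zero gradient gets the base level (0.010).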


### Method 4: Layer-wise Noise Injection


In [None]:
def layerwise_noise_unlearning(model, layer_sigmas=None):
    """
    Add different noise levels to different layers.
    Typically, add more noise to later layers (closer to output).
    
    Args:
        model: PyTorch model
        layer_sigmas: Dict mapping layer indices to sigma values
                     If None, uses increasing sigma for later layers
    """
    unlearned_model = deepcopy(model)
    
    # Get all linear layers
    linear_layers = [(name, module) for name, module in unlearned_model.named_modules() 
                     if isinstance(module, nn.Linear)]
    
    num_layers = len(linear_layers)
    
    # Default: increasing noise for later layers
    if layer_sigmas is None:
        layer_sigmas = {i: 0.005 * (i + 1) for i in range(num_layers)}
    
    with torch.no_grad():
        for layer_idx, (name, layer) in enumerate(linear_layers):
            sigma = layer_sigmas.get(layer_idx, 0.01)
            
            # Add noise to weights
            if layer.weight is not None:
                noise = torch.randn_like(layer.weight) * sigma
                layer.weight.add_(noise)
            
            # Add noise to bias
            if layer.bias is not None:
                noise = torch.randn_like(layer.bias) * sigma
                layer.bias.add_(noise)
    
    return unlearned_model

# Test different layer-wise strategies
strategies = [
    {'name': 'Uniform', 'sigmas': {0: 0.01, 1: 0.01, 2: 0.01, 3: 0.01}},
    {'name': 'Increasing', 'sigmas': {0: 0.005, 1: 0.01, 2: 0.015, 3: 0.02}},
    {'name': 'Decreasing', 'sigmas': {0: 0.02, 1: 0.015, 2: 0.01, 3: 0.005}},
    {'name': 'Output-heavy', 'sigmas': {0: 0.005, 1: 0.005, 2: 0.01, 3: 0.03}},
]

layerwise_results = []

print("Testing Layer-wise Noise Unlearning...\n")

for strategy in strategies:
    # Apply unlearning
    unlearned_model = layerwise_noise_unlearning(model, layer_sigmas=strategy['sigmas'])
    
    # Evaluate
    retain_acc = evaluate_model(unlearned_model, X_retain_tensor, y_retain_tensor)
    forget_acc = evaluate_model(unlearned_model, X_forget_tensor, y_forget_tensor)
    test_acc = evaluate_model(unlearned_model, X_test_tensor, y_test_tensor)
    
    # Calculate parameter distance
    param_dist = calculate_parameter_distance(
        original_model_state,
        get_model_state_dict_copy(unlearned_model)
    )
    
    layerwise_results.append({
        'strategy': strategy['name'],
        'retain_acc': retain_acc,
        'forget_acc': forget_acc,
        'test_acc': test_acc,
        'param_dist': param_dist
    })
    
    print(f"{strategy['name']:15s}: Retain={retain_acc:.2f}%, Forget={forget_acc:.2f}%, Test={test_acc:.2f}%, Dist={param_dist:.4f}")

layerwise_df = pd.DataFrame(layerwise_results)
print("\nLayer-wise Noise Unlearning Results:")
print(layerwise_df)
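
The strategies above hard-code four layer indices, which matches the four `nn.Linear` layers of this model. For a different depth, a schedule could be generated instead. The following is a minimal sketch, assuming linear interpolation between a first-layer and last-layer sigma is an acceptable schedule; `linear_sigma_schedule` is a hypothetical helper, not part of the notebook.

```python
def linear_sigma_schedule(num_layers, start_sigma, end_sigma):
    """Return {layer_index: sigma}, interpolated linearly from first to last layer."""
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if num_layers == 1:
        return {0: start_sigma}
    step = (end_sigma - start_sigma) / (num_layers - 1)
    return {i: start_sigma + i * step for i in range(num_layers)}

# Approximately reproduces the 'Increasing' and 'Decreasing' strategies for four layers.
increasing = linear_sigma_schedule(4, 0.005, 0.02)
decreasing = linear_sigma_schedule(4, 0.02, 0.005)
print(increasing)
print(decreasing)
```

The returned dict has the same shape as the `layer_sigmas` argument of `layerwise_noise_unlearning`, so it could be passed in directly.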


### Method 5: Gradient-based Noise (Noise Proportional to Gradient)


In [None]:
def gradient_based_noise_unlearning(model, X_forget, y_forget, noise_multiplier=0.1):
    """
    Add noise proportional to gradient magnitude on forget set.
    This targets parameters most responsible for remembering the forget set.
    
    Args:
        model: PyTorch model
        X_forget: Forget set features
        y_forget: Forget set labels
        noise_multiplier: Multiplier for gradient-based noise
    """
    unlearned_model = deepcopy(model)
    unlearned_model.eval()
    
    # Compute gradients on forget set
    criterion = nn.CrossEntropyLoss()
    unlearned_model.zero_grad()
    
    outputs = unlearned_model(X_forget)
    loss = criterion(outputs, y_forget)
    loss.backward()
    
    # Add noise proportional to gradient
    with torch.no_grad():
        for name, param in unlearned_model.named_parameters():
            if param.grad is not None and ('weight' in name or 'bias' in name):
                # Noise magnitude proportional to gradient magnitude
                grad_magnitude = param.grad.abs()
                noise = torch.randn_like(param) * grad_magnitude * noise_multiplier
                param.add_(noise)
    
    return unlearned_model

# Test different noise multipliers
noise_multipliers = [0.05, 0.1, 0.2, 0.5, 1.0]
gradient_results = []

print("Testing Gradient-based Noise Unlearning...\n")

for multiplier in noise_multipliers:
    # Apply unlearning
    unlearned_model = gradient_based_noise_unlearning(
        model, X_forget_tensor, y_forget_tensor,
        noise_multiplier=multiplier
    )
    
    # Evaluate
    retain_acc = evaluate_model(unlearned_model, X_retain_tensor, y_retain_tensor)
    forget_acc = evaluate_model(unlearned_model, X_forget_tensor, y_forget_tensor)
    test_acc = evaluate_model(unlearned_model, X_test_tensor, y_test_tensor)
    
    # Calculate parameter distance
    param_dist = calculate_parameter_distance(
        original_model_state,
        get_model_state_dict_copy(unlearned_model)
    )
    
    gradient_results.append({
        'multiplier': multiplier,
        'retain_acc': retain_acc,
        'forget_acc': forget_acc,
        'test_acc': test_acc,
        'param_dist': param_dist
    })
    
    print(f"Multiplier={multiplier:.2f}: Retain={retain_acc:.2f}%, Forget={forget_acc:.2f}%, Test={test_acc:.2f}%, Dist={param_dist:.4f}")

gradient_df = pd.DataFrame(gradient_results)
print("\nGradient-based Noise Unlearning Results:")
print(gradient_df)
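
One consequence of scaling noise element-wise by `|grad|` is that any entry whose forget-set gradient is exactly zero receives no noise at all (for instance, weights feeding a ReLU unit that is inactive on every forget sample). The NumPy sketch below illustrates this on a tiny tensor; the parameter and gradient values are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
noise_multiplier = 0.1

# Hypothetical parameter values and forget-set gradients for one small tensor.
param = np.array([0.5, -1.2, 0.3, 0.8])
grad = np.array([0.0, 0.1, -2.0, 0.0])

# Same rule as the method above: a standard normal draw scaled
# element-wise by |grad| * noise_multiplier.
noise = rng.standard_normal(param.shape) * np.abs(grad) * noise_multiplier
noisy_param = param + noise

unchanged = noisy_param == param
print("per-entry noise std:", np.abs(grad) * noise_multiplier)
print("entries left unchanged:", unchanged)
```

Entries with zero gradient come out bit-for-bit identical, which is worth keeping in mind when interpreting the parameter distance for this method.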


## 6. Comparison and Visualization


In [None]:
# Create comparison plots
fig, axes = plt.subplots(2, 3, figsize=(18, 10))
fig.suptitle('Comparison of Noise-Based Machine Unlearning Methods', fontsize=16, y=1.00)

# 1. Gaussian Noise
ax = axes[0, 0]
ax.plot(gaussian_df['sigma'], gaussian_df['retain_acc'], 'o-', label='Retain', linewidth=2)
ax.plot(gaussian_df['sigma'], gaussian_df['forget_acc'], 's-', label='Forget', linewidth=2)
ax.plot(gaussian_df['sigma'], gaussian_df['test_acc'], '^-', label='Test', linewidth=2)
ax.set_xlabel('Sigma')
ax.set_ylabel('Accuracy (%)')
ax.set_title('Gaussian Noise')
ax.legend()
ax.grid(True, alpha=0.3)

# 2. Laplacian Noise
ax = axes[0, 1]
ax.plot(laplacian_df['scale'], laplacian_df['retain_acc'], 'o-', label='Retain', linewidth=2)
ax.plot(laplacian_df['scale'], laplacian_df['forget_acc'], 's-', label='Forget', linewidth=2)
ax.plot(laplacian_df['scale'], laplacian_df['test_acc'], '^-', label='Test', linewidth=2)
ax.set_xlabel('Scale')
ax.set_ylabel('Accuracy (%)')
ax.set_title('Laplacian Noise')
ax.legend()
ax.grid(True, alpha=0.3)

# 3. Adaptive Noise
ax = axes[0, 2]
ax.plot(adaptive_df['base_sigma'], adaptive_df['retain_acc'], 'o-', label='Retain', linewidth=2)
ax.plot(adaptive_df['base_sigma'], adaptive_df['forget_acc'], 's-', label='Forget', linewidth=2)
ax.plot(adaptive_df['base_sigma'], adaptive_df['test_acc'], '^-', label='Test', linewidth=2)
ax.set_xlabel('Base Sigma')
ax.set_ylabel('Accuracy (%)')
ax.set_title('Adaptive Noise')
ax.legend()
ax.grid(True, alpha=0.3)

# 4. Layer-wise Noise
ax = axes[1, 0]
x_pos = np.arange(len(layerwise_df))
width = 0.25
ax.bar(x_pos - width, layerwise_df['retain_acc'], width, label='Retain')
ax.bar(x_pos, layerwise_df['forget_acc'], width, label='Forget')
ax.bar(x_pos + width, layerwise_df['test_acc'], width, label='Test')
ax.set_xlabel('Strategy')
ax.set_ylabel('Accuracy (%)')
ax.set_title('Layer-wise Noise')
ax.set_xticks(x_pos)
ax.set_xticklabels(layerwise_df['strategy'], rotation=45, ha='right')
ax.legend()
ax.grid(True, alpha=0.3, axis='y')

# 5. Gradient-based Noise
ax = axes[1, 1]
ax.plot(gradient_df['multiplier'], gradient_df['retain_acc'], 'o-', label='Retain', linewidth=2)
ax.plot(gradient_df['multiplier'], gradient_df['forget_acc'], 's-', label='Forget', linewidth=2)
ax.plot(gradient_df['multiplier'], gradient_df['test_acc'], '^-', label='Test', linewidth=2)
ax.set_xlabel('Noise Multiplier')
ax.set_ylabel('Accuracy (%)')
ax.set_title('Gradient-based Noise')
ax.legend()
ax.grid(True, alpha=0.3)

# 6. Parameter Distance Comparison
ax = axes[1, 2]
methods = ['Gaussian\n(σ=0.01)', 'Laplacian\n(s=0.01)', 'Adaptive\n(σ=0.01)', 
           'Layer-wise\n(Increasing)', 'Gradient\n(m=0.1)']
distances = [
    gaussian_df[gaussian_df['sigma'] == 0.01]['param_dist'].values[0],
    laplacian_df[laplacian_df['scale'] == 0.01]['param_dist'].values[0],
    adaptive_df[adaptive_df['base_sigma'] == 0.01]['param_dist'].values[0],
    layerwise_df[layerwise_df['strategy'] == 'Increasing']['param_dist'].values[0],
    gradient_df[gradient_df['multiplier'] == 0.1]['param_dist'].values[0]
]
ax.bar(methods, distances, color=['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd'])
ax.set_ylabel('Parameter Distance')
ax.set_title('Parameter Distance from Original')
ax.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()


## 7. Summary Table and Metrics


In [None]:
# Create summary table using one fixed configuration per method (chosen by index, not tuned)
summary_data = [
    {
        'Method': 'Original Model',
        'Retain Acc': evaluate_model(model, X_retain_tensor, y_retain_tensor),
        'Forget Acc': evaluate_model(model, X_forget_tensor, y_forget_tensor),
        'Test Acc': evaluate_model(model, X_test_tensor, y_test_tensor),
        'Param Dist': 0.0,
        'Config': 'N/A'
    },
    {
        'Method': 'Gaussian Noise',
        'Retain Acc': gaussian_df.iloc[2]['retain_acc'],
        'Forget Acc': gaussian_df.iloc[2]['forget_acc'],
        'Test Acc': gaussian_df.iloc[2]['test_acc'],
        'Param Dist': gaussian_df.iloc[2]['param_dist'],
        'Config': f"σ={gaussian_df.iloc[2]['sigma']}"
    },
    {
        'Method': 'Laplacian Noise',
        'Retain Acc': laplacian_df.iloc[2]['retain_acc'],
        'Forget Acc': laplacian_df.iloc[2]['forget_acc'],
        'Test Acc': laplacian_df.iloc[2]['test_acc'],
        'Param Dist': laplacian_df.iloc[2]['param_dist'],
        'Config': f"scale={laplacian_df.iloc[2]['scale']}"
    },
    {
        'Method': 'Adaptive Noise',
        'Retain Acc': adaptive_df.iloc[2]['retain_acc'],
        'Forget Acc': adaptive_df.iloc[2]['forget_acc'],
        'Test Acc': adaptive_df.iloc[2]['test_acc'],
        'Param Dist': adaptive_df.iloc[2]['param_dist'],
        'Config': f"base_σ={adaptive_df.iloc[2]['base_sigma']}"
    },
    {
        'Method': 'Layer-wise Noise',
        'Retain Acc': layerwise_df.iloc[1]['retain_acc'],
        'Forget Acc': layerwise_df.iloc[1]['forget_acc'],
        'Test Acc': layerwise_df.iloc[1]['test_acc'],
        'Param Dist': layerwise_df.iloc[1]['param_dist'],
        'Config': layerwise_df.iloc[1]['strategy']
    },
    {
        'Method': 'Gradient-based Noise',
        'Retain Acc': gradient_df.iloc[1]['retain_acc'],
        'Forget Acc': gradient_df.iloc[1]['forget_acc'],
        'Test Acc': gradient_df.iloc[1]['test_acc'],
        'Param Dist': gradient_df.iloc[1]['param_dist'],
        'Config': f"mult={gradient_df.iloc[1]['multiplier']}"
    }
]

summary_df = pd.DataFrame(summary_data)

print("\n" + "="*80)
print("SUMMARY: Noise-Based Machine Unlearning Methods")
print("="*80)
print(summary_df.to_string(index=False))
print("="*80)

# Calculate unlearning metrics
print("\nUnlearning Effectiveness Metrics:")
print("-" * 80)
original_forget_acc = summary_df.iloc[0]['Forget Acc']

for idx, row in summary_df.iterrows():
    if idx == 0:
        continue
    
    forget_drop = original_forget_acc - row['Forget Acc']
    retain_drop = summary_df.iloc[0]['Retain Acc'] - row['Retain Acc']
    test_drop = summary_df.iloc[0]['Test Acc'] - row['Test Acc']
    
    # Unlearning quality score: high forget drop, low retain drop
    quality_score = forget_drop - retain_drop
    
    print(f"{row['Method']:25s} | Forget Drop: {forget_drop:6.2f}% | Retain Drop: {retain_drop:6.2f}% | Quality: {quality_score:6.2f}")
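
To make the quality score concrete, here is a small worked example in plain Python. The accuracies are hypothetical numbers chosen for easy arithmetic, not results from this notebook, and `quality_score` is a helper introduced only for this illustration.

```python
def quality_score(original, unlearned):
    """Forget-accuracy drop minus retain-accuracy drop, in percentage points."""
    forget_drop = original["forget"] - unlearned["forget"]
    retain_drop = original["retain"] - unlearned["retain"]
    return forget_drop - retain_drop

# Hypothetical accuracies (percent), not results from this notebook.
original = {"retain": 92.0, "forget": 90.0}
selective = {"retain": 88.0, "forget": 80.0}  # forget falls faster than retain
uniform = {"retain": 82.0, "forget": 80.0}    # both fall by the same amount

print(quality_score(original, selective))  # 6.0
print(quality_score(original, uniform))    # 0.0
```

A score near zero means the method lowered accuracy on both sets about equally, which is what indiscriminate noise would be expected to do.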


## 8. Retrain from Scratch (Gold Standard Baseline)


In [None]:
# Train a new model from scratch on only the retain set
# This is the gold standard for machine unlearning
retrain_model = TabularClassifier(
    input_size=n_features,
    hidden_sizes=[64, 32, 16],
    num_classes=2,
    dropout=0.3
).to(device)

print("Training model from scratch on retain set only...")
retrain_losses = train_model(retrain_model, X_retain_tensor, y_retain_tensor, 
                             epochs=100, lr=0.001, verbose=False)

# Evaluate retrained model
retrain_retain_acc = evaluate_model(retrain_model, X_retain_tensor, y_retain_tensor)
retrain_forget_acc = evaluate_model(retrain_model, X_forget_tensor, y_forget_tensor)
retrain_test_acc = evaluate_model(retrain_model, X_test_tensor, y_test_tensor)

print("\n=== Retrained Model (Gold Standard) ===")
print(f"Retain Set Accuracy: {retrain_retain_acc:.2f}%")
print(f"Forget Set Accuracy: {retrain_forget_acc:.2f}%")
print(f"Test Set Accuracy: {retrain_test_acc:.2f}%")

# Add to summary
print("\n" + "="*80)
print("Comparison with Gold Standard (Retrain from Scratch):")
print("="*80)
print(f"{'Method':<25s} | {'Retain Acc':>10s} | {'Forget Acc':>10s} | {'Test Acc':>10s}")
print("-" * 80)
print(f"{'Retrain (Gold Standard)':<25s} | {retrain_retain_acc:>9.2f}% | {retrain_forget_acc:>9.2f}% | {retrain_test_acc:>9.2f}%")
for idx, row in summary_df.iterrows():
    if idx == 0:
        continue
    print(f"{row['Method']:<25s} | {row['Retain Acc']:>9.2f}% | {row['Forget Acc']:>9.2f}% | {row['Test Acc']:>9.2f}%")
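
One possible way to condense the comparison with the retrained model into a single number is the mean absolute accuracy gap across the three sets. This is a sketch under that assumption, not a standard metric. The helper `gap_to_retrain`, the method names, and all accuracies below are hypothetical.

```python
def gap_to_retrain(method_acc, retrain_acc, keys=("retain", "forget", "test")):
    """Mean absolute accuracy difference (percentage points) to the retrained model."""
    return sum(abs(method_acc[k] - retrain_acc[k]) for k in keys) / len(keys)

# Hypothetical accuracies (percent), not results from this notebook.
retrain = {"retain": 91.0, "forget": 84.0, "test": 85.0}
candidates = {
    "method_a": {"retain": 90.0, "forget": 88.0, "test": 84.0},
    "method_b": {"retain": 80.0, "forget": 78.0, "test": 75.0},
}

# Rank methods by how closely they match the retrained model (smaller gap first).
ranked = sorted(candidates, key=lambda name: gap_to_retrain(candidates[name], retrain))
for name in ranked:
    print(f"{name}: gap={gap_to_retrain(candidates[name], retrain):.2f}")
```

Unlike the quality score, this measure penalizes a method whose forget set accuracy falls well below the retrained model's, since overshooting is also a deviation from the baseline.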


## 9. Unlearning Effectiveness Visualization


In [None]:
# Create a comprehensive visualization of unlearning effectiveness
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle('Unlearning Effectiveness Analysis', fontsize=16)

# 1. Forget vs Retain Accuracy Trade-off
ax = axes[0, 0]
for idx, row in summary_df.iterrows():
    if idx == 0:
        ax.scatter(row['Retain Acc'], row['Forget Acc'], s=200, marker='*', 
                  c='red', label='Original', zorder=5, edgecolors='black', linewidth=2)
    else:
        ax.scatter(row['Retain Acc'], row['Forget Acc'], s=100, 
                  label=row['Method'], alpha=0.7, edgecolors='black', linewidth=1)

ax.scatter(retrain_retain_acc, retrain_forget_acc, s=200, marker='D', 
          c='green', label='Retrain (Gold)', zorder=5, edgecolors='black', linewidth=2)
ax.set_xlabel('Retain Set Accuracy (%)', fontsize=11)
ax.set_ylabel('Forget Set Accuracy (%)', fontsize=11)
ax.set_title('Forget vs Retain Accuracy Trade-off\n(Lower-Right is Better)', fontsize=12)
ax.legend(fontsize=8, loc='best')
ax.grid(True, alpha=0.3)

# 2. Unlearning Quality Score
ax = axes[0, 1]
methods_list = []
quality_scores = []
colors_list = []

for idx, row in summary_df.iterrows():
    if idx == 0:
        continue
    forget_drop = summary_df.iloc[0]['Forget Acc'] - row['Forget Acc']
    retain_drop = summary_df.iloc[0]['Retain Acc'] - row['Retain Acc']
    quality = forget_drop - retain_drop
    methods_list.append(row['Method'])
    quality_scores.append(quality)
    colors_list.append('#2ca02c' if quality > 0 else '#d62728')

bars = ax.barh(methods_list, quality_scores, color=colors_list, alpha=0.7, edgecolor='black')
ax.axvline(x=0, color='black', linestyle='--', linewidth=1)
ax.set_xlabel('Quality Score (Forget Drop - Retain Drop)', fontsize=11)
ax.set_title('Unlearning Quality Score\n(Higher is Better)', fontsize=12)
ax.grid(True, alpha=0.3, axis='x')

# 3. Accuracy Changes from Original
ax = axes[1, 0]
x_pos = np.arange(len(methods_list))
width = 0.35

forget_drops = [summary_df.iloc[0]['Forget Acc'] - summary_df.iloc[i+1]['Forget Acc'] for i in range(len(methods_list))]
retain_drops = [summary_df.iloc[0]['Retain Acc'] - summary_df.iloc[i+1]['Retain Acc'] for i in range(len(methods_list))]

ax.bar(x_pos - width/2, forget_drops, width, label='Forget Acc Drop', color='#ff7f0e', alpha=0.7, edgecolor='black')
ax.bar(x_pos + width/2, retain_drops, width, label='Retain Acc Drop', color='#1f77b4', alpha=0.7, edgecolor='black')

ax.set_ylabel('Accuracy Drop (%)', fontsize=11)
ax.set_title('Accuracy Changes from Original Model\n(Higher Forget Drop is Better)', fontsize=12)
ax.set_xticks(x_pos)
ax.set_xticklabels(methods_list, rotation=45, ha='right', fontsize=9)
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
ax.axhline(y=0, color='black', linestyle='-', linewidth=0.5)

# 4. Parameter Distance vs Unlearning Effectiveness
ax = axes[1, 1]
param_dists = [summary_df.iloc[i+1]['Param Dist'] for i in range(len(methods_list))]
forget_accs = [summary_df.iloc[i+1]['Forget Acc'] for i in range(len(methods_list))]

scatter = ax.scatter(param_dists, forget_accs, s=100, c=quality_scores, 
                    cmap='RdYlGn', alpha=0.7, edgecolors='black', linewidth=1)

# Add method labels
for i, method in enumerate(methods_list):
    ax.annotate(method.split()[0], (param_dists[i], forget_accs[i]), 
               fontsize=8, ha='center', va='bottom')

ax.set_xlabel('Parameter Distance from Original', fontsize=11)
ax.set_ylabel('Forget Set Accuracy (%)', fontsize=11)
ax.set_title('Parameter Distance vs Forget Accuracy\n(Lower Forget Acc is Better)', fontsize=12)
ax.grid(True, alpha=0.3)
plt.colorbar(scatter, ax=ax, label='Quality Score')

plt.tight_layout()
plt.show()


## 10. Key Findings and Recommendations

### Understanding the Results:

1. **Forget Set Accuracy**: Should decrease after unlearning, ideally toward the forget set accuracy of the retrained model from Section 8
2. **Retain Set Accuracy**: Should remain high (model maintains knowledge of retained data)
3. **Test Set Accuracy**: Overall model performance indicator
4. **Parameter Distance**: How much the model changed from original

### Method Characteristics:

- **Gaussian Noise**: Simple perturbation with the same standard deviation for every parameter. Easy to implement and tune, but the noise does not depend on the forget set.
- **Laplacian Noise**: Same idea with a heavier-tailed distribution. As implemented here, the scale is not calibrated to a sensitivity bound, so it provides no formal differential privacy guarantee.
- **Adaptive Noise**: Adds more noise to parameter tensors with a larger mean gradient magnitude on the forget set. Intended to improve the trade-off between forgetting and retaining; whether it does should be checked against the results above.
- **Layer-wise Noise**: Allows fine-grained control per layer. Can focus noise on output layers that directly influence predictions.
- **Gradient-based Noise**: Scales noise element-wise by gradient magnitude on the forget set, so parameters with larger forget-set gradients are perturbed more.

### Trade-offs:

- **More noise** → Lower forget set accuracy, but also lower retain/test accuracy
- **Less noise** → Retain/test accuracy is preserved, but the forget set is barely affected
- **Adaptive methods** → Aim for a better balance by concentrating noise where forget-set gradients are large
- **Computational cost**: Gradient-based and adaptive methods require forward/backward pass on forget set

### Best Practices:

1. **Start small**: Begin with small noise levels and increase gradually
2. **Monitor both sets**: Track both forget and retain set performance
3. **Try adaptive methods**: Gradient-based or adaptive noise may give better trade-offs than uniform noise
4. **Compare to baseline**: Always compare against retrain-from-scratch baseline
5. **Consider privacy**: If differential privacy guarantees are needed, the noise must be calibrated to a sensitivity bound; choosing Laplacian noise alone is not enough
6. **Layer targeting**: Focus noise on later layers (closer to output) for more targeted forgetting
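
The first two practices (start small, monitor both sets) can be sketched as a sweep that tries noise levels in increasing order and stops once retain accuracy drops past a tolerance. The example below uses a toy linear classifier in NumPy as a stand-in for a trained model. The function names, the `max_retain_drop` threshold, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def accuracy(weights, X, y):
    """Accuracy (percent) of a linear classifier that predicts 1 when X @ weights > 0."""
    return 100.0 * float(np.mean((X @ weights > 0).astype(int) == y))

def sweep_noise(weights, X_retain, y_retain, X_forget, y_forget,
                sigmas, max_retain_drop=2.0, seed=0):
    """Try noise levels in increasing order; stop once retain accuracy drops too far."""
    rng = np.random.default_rng(seed)
    base_retain = accuracy(weights, X_retain, y_retain)
    history = []
    for sigma in sorted(sigmas):
        noisy = weights + rng.normal(0.0, sigma, size=weights.shape)
        retain = accuracy(noisy, X_retain, y_retain)
        forget = accuracy(noisy, X_forget, y_forget)
        if base_retain - retain > max_retain_drop:
            break  # this noise level costs too much on the retain set
        history.append((sigma, retain, forget))
    return history

# Toy stand-in for a trained model: a linear classifier on synthetic data.
rng = np.random.default_rng(42)
true_w = rng.normal(size=10)
X = rng.normal(size=(500, 10))
y = (X @ true_w > 0).astype(int)
X_retain, y_retain = X[:450], y[:450]
X_forget, y_forget = X[450:], y[450:]

history = sweep_noise(true_w, X_retain, y_retain, X_forget, y_forget,
                      sigmas=[0.001, 0.01, 0.1, 1.0, 10.0])
for sigma, retain, forget in history:
    print(f"sigma={sigma:<6} retain={retain:.1f}% forget={forget:.1f}%")
```

Each noise level here is a single random draw, so in practice it may be worth repeating the sweep over several seeds before settling on a value.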

### When to Use Each Method:

- **Gaussian Noise**: Quick experiments, baseline comparisons
- **Laplacian Noise**: When a heavier-tailed perturbation is wanted; differential privacy would additionally require calibrated noise
- **Adaptive Noise**: When you want to minimize impact on retained data
- **Layer-wise Noise**: When you know which layers are most important
- **Gradient-based Noise**: When you want targeted, efficient unlearning

### Limitations:

1. Noise-based methods are **approximate** - they don't guarantee complete removal of information
2. May require careful tuning of noise levels
3. Can degrade overall model performance
4. No formal guarantees about what information is removed
5. May not work well for very small forget sets or highly correlated data
