# SPINN - Structured Pruning Workflow

This notebook implements true structured pruning, which removes whole neurons so that layers physically shrink, with the goal of a 2-3x GPU inference speedup.

**Key difference from before:**
- ❌ Old: Unstructured pruning (zeros in weights) → 0.09x speedup (FAILED)
- ✅ New: Structured pruning (remove neurons) → 2-3x speedup (EXPECTED)
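
To see why zeroed weights alone do not help, note that a dense layer's cost depends on the shape of its weight matrix, not on how many entries are zero. The sketch below counts one multiply-accumulate (MAC) per weight per sample. It assumes the 18 → 512 → 512 → 512 → 256 → 2 architecture used later in this notebook, and the halved widths are only an illustrative guess at the pruned shape. `dense_macs` is a hypothetical helper, not part of the SPINN code.

```python
def dense_macs(widths):
    """Multiply-accumulates per sample for an MLP with the given layer widths."""
    return sum(n_in * n_out for n_in, n_out in zip(widths, widths[1:]))

original = [18, 512, 512, 512, 256, 2]

# Unstructured pruning zeroes individual weights but keeps every matrix the
# same shape, so a dense GPU kernel still performs all of the original MACs.
unstructured = original

# Structured pruning removes whole neurons, so the matrices really shrink.
# Halved hidden widths are an illustrative assumption.
structured = [18, 256, 256, 256, 128, 2]

macs_before = dense_macs(original)
macs_unstructured = dense_macs(unstructured)
macs_structured = dense_macs(structured)

print(f"Dense:        {macs_before:,} MACs per sample")
print(f"Unstructured: {macs_unstructured:,} MACs per sample")
print(f"Structured:   {macs_structured:,} MACs per sample")
```

The MAC ratio is only an upper bound on what to expect: kernel launch overhead and memory traffic usually keep the measured speedup below it.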

**Timeline:** 
- Cells 1-3: Setup & data loading (5 min)
- Cell 4: Train dense baseline (30-40 min) - OR load existing
- Cell 5: Structured pruning (60-90 min)
- Cells 6-7: Convert & benchmark (5 min)

**IMPORTANT:** Run `git pull` in a Jupyter terminal first to get the new files!

## Cell 1: Environment Setup

In [None]:
# Navigate to SPINN directory and pull latest code
import os
os.chdir('/home/jupyter-ksenthilkumar/SPINN')

# Pull latest changes (includes structured_pruning.py)
!git pull origin main

# Verify new file exists
!ls -la models/structured_pruning.py

print("\n✅ Ready to proceed!")

## Cell 2: Import Libraries

In [None]:
import sys
sys.path.append('/home/jupyter-ksenthilkumar/SPINN')

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score, mean_squared_error
from torch.utils.data import DataLoader, TensorDataset

from models.dense_pinn import DensePINN
from models.structured_pruning import structured_prune_and_finetune
from models.sparse_pinn import convert_dense_to_sparse

# Device setup
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Device: {device}")
if device == 'cuda':
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")

## Cell 3: Load and Preprocess Data
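
The cell below splits the data in two stages, so the second `test_size` applies to what is left after the first split. A quick sketch of the resulting proportions (approximate, since `train_test_split` rounds to whole samples):

```python
test_frac = 0.133            # first split: held out for testing
val_frac_of_rest = 0.164     # second split: taken from the remaining data

remaining = 1.0 - test_frac
val_frac = remaining * val_frac_of_rest
train_frac = remaining * (1.0 - val_frac_of_rest)

print(f"train={train_frac:.3f}, val={val_frac:.3f}, test={test_frac:.3f}")
```

This works out to roughly 72.5% train, 14.2% validation, and 13.3% test.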

In [None]:
# Load NASA milling dataset
data_path = '/home/jupyter-ksenthilkumar/SPINN/data/mill.csv'
df = pd.read_csv(data_path)

print(f"Dataset shape: {df.shape}")
print(f"Columns: {df.columns.tolist()}")

# Select features and targets
feature_cols = ['X1_ActualPosition', 'X1_ActualVelocity', 'X1_ActualAcceleration',
                'X1_CommandPosition', 'X1_CommandVelocity', 'X1_CommandAcceleration',
                'X1_CurrentFeedback', 'X1_DCBusVoltage', 'X1_OutputCurrent',
                'Y1_ActualPosition', 'Y1_ActualVelocity', 'Y1_ActualAcceleration',
                'Y1_CommandPosition', 'Y1_CommandVelocity', 'Y1_CommandAcceleration',
                'Y1_CurrentFeedback', 'Y1_DCBusVoltage', 'Y1_OutputCurrent']

target_cols = ['Z1_ActualPosition', 'Z1_CurrentFeedback']

X = df[feature_cols].values
y = df[target_cols].values

# Train/val/test split
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.133, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.164, random_state=42)

# Normalize
scaler_X = StandardScaler()
scaler_y = StandardScaler()

X_train = scaler_X.fit_transform(X_train)
X_val = scaler_X.transform(X_val)
X_test = scaler_X.transform(X_test)

y_train = scaler_y.fit_transform(y_train)
y_val = scaler_y.transform(y_val)
y_test = scaler_y.transform(y_test)

# Convert to tensors
X_train_tensor = torch.FloatTensor(X_train).to(device)
y_train_tensor = torch.FloatTensor(y_train).to(device)
X_val_tensor = torch.FloatTensor(X_val).to(device)
y_val_tensor = torch.FloatTensor(y_val).to(device)
X_test_tensor = torch.FloatTensor(X_test).to(device)
y_test_tensor = torch.FloatTensor(y_test).to(device)

# Create data loaders
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
val_dataset = TensorDataset(X_val_tensor, y_val_tensor)

train_loader = DataLoader(train_dataset, batch_size=256, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=256, shuffle=False)

print(f"\n✅ Data loaded successfully")
print(f"Train: {X_train.shape[0]} samples")
print(f"Val: {X_val.shape[0]} samples")
print(f"Test: {X_test.shape[0]} samples")

## Cell 4: Load Dense Baseline Model

**Option A:** Load an existing trained model (RECOMMENDED - saves 30-40 min)  
**Option B:** Train from scratch (if you don't have a saved model)

In [None]:
# Option A: Load existing model (RECOMMENDED)
dense_model_path = '/home/jupyter-ksenthilkumar/SPINN/models/saved/dense_pinn.pth'

try:
    # The checkpoint stores the whole model object (saved with torch.save(model)),
    # so recent PyTorch releases may also need weights_only=False here.
    dense_model = torch.load(dense_model_path, map_location=device)
    dense_model = dense_model.to(device)
    print("✅ Loaded existing dense model")
    
    # Verify
    dense_model.eval()
    with torch.no_grad():
        val_pred = dense_model(X_val_tensor)
        val_r2 = r2_score(y_val_tensor.cpu().numpy(), val_pred.cpu().numpy())
    print(f"Dense model R²: {val_r2:.4f}")
    
except FileNotFoundError:
    print("⚠️ Model not found. Use Option B below to train from scratch.")

# Option B: Train from scratch (UNCOMMENT IF NEEDED)
# dense_model = DensePINN(input_dim=18, hidden_dims=[512, 512, 512, 256], output_dim=2).to(device)
# optimizer = optim.Adam(dense_model.parameters(), lr=0.001)
# loss_fn = nn.MSELoss()
# 
# print("Training dense baseline (30-40 min)...")
# for epoch in range(100):
#     dense_model.train()
#     for batch_X, batch_y in train_loader:
#         optimizer.zero_grad()
#         pred = dense_model(batch_X)
#         loss = loss_fn(pred, batch_y)
#         loss.backward()
#         optimizer.step()
#     
#     if (epoch + 1) % 10 == 0:
#         dense_model.eval()
#         with torch.no_grad():
#             val_pred = dense_model(X_val_tensor)
#             val_loss = loss_fn(val_pred, y_val_tensor)
#             val_r2 = r2_score(y_val_tensor.cpu().numpy(), val_pred.cpu().numpy())
#         print(f"Epoch {epoch+1}: Val Loss={val_loss:.6f}, R²={val_r2:.4f}")
# 
# torch.save(dense_model, dense_model_path)
# print(f"✅ Saved to {dense_model_path}")

## Cell 5: Structured Pruning Training

**⏱️ Time: 60-90 minutes**

This will:
1. Calculate neuron importance (L1 norm)
2. Remove least important neurons (physically shrink layers)
3. Fine-tune for 10 epochs
4. Repeat for 3 rounds to reach the 68.5% parameter-reduction target
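
Steps 1 and 2 can be sketched in plain Python for a single pair of consecutive layers. This is a minimal illustration, assuming importance is the L1 norm of each neuron's incoming weights (one row of the weight matrix) and that the `keep` highest-scoring neurons survive. `prune_hidden_layer` is a hypothetical helper and may differ from what `structured_prune_and_finetune` actually does.

```python
def prune_hidden_layer(w_in, b_in, w_out, keep):
    """Drop the least important hidden neurons between two linear layers.

    w_in:  rows = hidden neurons, columns = inputs   (shape [hidden, n_in])
    b_in:  one bias per hidden neuron                (length hidden)
    w_out: rows = outputs, columns = hidden neurons  (shape [n_out, hidden])
    """
    # Step 1: importance of each hidden neuron = L1 norm of its incoming weights.
    importance = [sum(abs(w) for w in row) for row in w_in]
    # Step 2: keep the `keep` most important neurons, preserving original order.
    ranked = sorted(range(len(w_in)), key=lambda i: importance[i], reverse=True)
    kept = sorted(ranked[:keep])
    # Remove rows of the first layer and the matching columns of the next layer.
    new_w_in = [w_in[i] for i in kept]
    new_b_in = [b_in[i] for i in kept]
    new_w_out = [[row[i] for i in kept] for row in w_out]
    return new_w_in, new_b_in, new_w_out, kept

# Toy example (made-up weights): 2 inputs -> 4 hidden neurons -> 1 output.
w_in = [[0.9, -0.8], [0.01, 0.02], [-0.5, 0.4], [0.0, 0.03]]
b_in = [0.1, 0.2, 0.3, 0.4]
w_out = [[1.0, 2.0, 3.0, 4.0]]

new_w_in, new_b_in, new_w_out, kept = prune_hidden_layer(w_in, b_in, w_out, keep=2)
print("Surviving neurons:", kept)
print("Next layer weights:", new_w_out)
```

In PyTorch the same row and column slicing would be applied to the `weight` and `bias` tensors of the two `nn.Linear` layers, whose weights are stored as `[out_features, in_features]`. Fine-tuning (step 3) then recovers the accuracy lost by removing neurons.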

Expected result: `[18 → 512 → 512 → 512 → 256 → 2]` becomes `[18 → ~256 → ~256 → ~256 → ~128 → 2]`
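
As a sanity check on the expected architecture, the parameter counts can be worked out by hand. The sketch below assumes plain fully connected layers with biases and no other trainable parameters; `mlp_params` is a hypothetical helper. Exactly halving every hidden width gives roughly a 74.6% reduction, so hitting the 68.5% target would leave the layers somewhat wider than the rounded values above.

```python
def mlp_params(widths):
    """Weights plus biases for a fully connected network with these widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))

dense = mlp_params([18, 512, 512, 512, 256, 2])
halved = mlp_params([18, 256, 256, 256, 128, 2])
reduction = 1 - halved / dense

print(f"Dense:     {dense:,} parameters")
print(f"Halved:    {halved:,} parameters")
print(f"Reduction: {reduction:.1%}")
```

Compare these figures with the `Dense parameters`, `SPINN parameters`, and `Reduction` values printed by the cell below; a large mismatch suggests the model has extra parameters or the layers were not resized.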

In [None]:
print("\n" + "="*60)
print("STRUCTURED PRUNING - TRUE GPU SPEEDUP")
print("="*60)

# Configuration
TARGET_SPARSITY = 0.685  # 68.5% parameter reduction
N_PRUNE_ROUNDS = 3       # Gradual pruning
FINETUNE_EPOCHS = 10     # Fine-tune after each prune

# Dense baseline stats
dense_params = sum(p.numel() for p in dense_model.parameters())
print(f"\n📊 Dense Baseline:")
print(f"   Parameters: {dense_params:,}")

# Define loss and optimizer factory
def pinn_loss(predictions, targets):
    return nn.MSELoss()(predictions, targets)

def optimizer_factory(model):
    return optim.Adam(model.parameters(), lr=0.001)

print(f"\n🔪 Pruning Configuration:")
print(f"   Target sparsity: {TARGET_SPARSITY*100:.1f}%")
print(f"   Prune rounds: {N_PRUNE_ROUNDS}")
print(f"   Fine-tune epochs: {FINETUNE_EPOCHS}")
print(f"\n⏱️ Estimated time: 60-90 minutes")
print(f"\n{'='*60}\n")

# Run structured pruning
spinn_model = structured_prune_and_finetune(
    model=dense_model,
    train_loader=train_loader,
    val_loader=val_loader,
    optimizer_fn=optimizer_factory,
    loss_fn=pinn_loss,
    device=device,
    target_sparsity=TARGET_SPARSITY,
    n_prune_rounds=N_PRUNE_ROUNDS,
    finetune_epochs=FINETUNE_EPOCHS
)

# Final statistics
pruned_params = sum(p.numel() for p in spinn_model.parameters())
actual_sparsity = (1 - pruned_params / dense_params) * 100

print(f"\n{'='*60}")
print(f"✅ STRUCTURED PRUNING COMPLETE")
print(f"{'='*60}")
print(f"\n📊 Results:")
print(f"   Dense parameters:  {dense_params:,}")
print(f"   SPINN parameters:  {pruned_params:,}")
print(f"   Reduction:         {actual_sparsity:.2f}%")

# Show new architecture
print(f"\n🏗️ Network Architecture:")
linear_layers = [m for m in spinn_model.modules() if isinstance(m, nn.Linear)]
dims = [layer.in_features for layer in linear_layers] + [linear_layers[-1].out_features]
print(f"   {' → '.join(map(str, dims))}")

print(f"\nLayer-wise:")
for i, layer in enumerate(linear_layers):
    params = layer.weight.numel() + (layer.bias.numel() if layer.bias is not None else 0)
    print(f"   Layer {i}: [{layer.in_features:>3} → {layer.out_features:>3}] = {params:,} params")

# Evaluate accuracy
spinn_model.eval()
with torch.no_grad():
    val_pred = spinn_model(X_val_tensor)
    val_loss = pinn_loss(val_pred, y_val_tensor)
    val_r2 = r2_score(y_val_tensor.cpu().numpy(), val_pred.cpu().numpy())

print(f"\n📈 Validation Performance:")
print(f"   Loss: {val_loss.item():.6f}")
print(f"   R² Score: {val_r2:.4f}")

# Save model
save_path = '/home/jupyter-ksenthilkumar/SPINN/models/saved/spinn_structured.pth'
torch.save(spinn_model, save_path)
print(f"\n💾 Model saved: {save_path}")
print(f"\n{'='*60}")

## Cell 6: Convert to SparsePINN Format

In [None]:
# Convert to SparsePINN wrapper
sparse_spinn = convert_dense_to_sparse(spinn_model).to(device)

# Enable torch.compile() for extra optimization
if hasattr(torch, 'compile'):
    compiled = sparse_spinn.enable_compile(mode='reduce-overhead')
    if compiled:
        print("✅ torch.compile() enabled (PyTorch 2.0+)")
    else:
        print("⚠️ torch.compile() is available but was not enabled; running uncompiled")
else:
    print("⚠️ torch.compile() not available (PyTorch < 2.0)")

# Statistics
total, nnz, sparsity = sparse_spinn.count_parameters()

print(f"\n{'='*60}")
print(f"SPARSE SPINN MODEL")
print(f"{'='*60}")
print(f"\n📊 Parameters:")
print(f"   Total: {total:,}")
print(f"   Non-zero: {nnz:,}")
print(f"   Sparsity: {sparsity:.2f}%")

print(f"\n🏗️ Architecture:")
for info in sparse_spinn.get_sparsity_info():
    print(f"   Layer {info['layer']}: {info['shape']} ({info['non_zero_params']:,} params)")

print(f"\n{'='*60}")
print(f"✅ Ready for benchmarking!")
print(f"{'='*60}")

## Cell 7: GPU Benchmark - The Moment of Truth! 🚀

**Expected results:**
- Dense PINN: ~0.36 ms
- Structured SPINN: ~0.12-0.15 ms
- **Speedup: 2.4-3.0x** ✅

If you see less than a 2x speedup, check the troubleshooting section in the guide.
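
The speedup reported by the cell below is the ratio of mean latencies. The sketch below shows that calculation on toy timings, which are chosen only for illustration and are not measurements. It also reports the median, which is less sensitive to an occasional slow run. `summarize` is a hypothetical helper.

```python
import statistics

def summarize(times_ms):
    """Return (mean, population std, median) of a list of latencies in ms."""
    return (statistics.fmean(times_ms),
            statistics.pstdev(times_ms),
            statistics.median(times_ms))

# Toy numbers for illustration only, not measurements from this notebook.
dense_ms = [10.0, 11.0, 9.0, 10.0, 30.0]   # last value mimics a one-off stall
sparse_ms = [5.0, 5.0, 4.0, 6.0, 5.0]

dense_mean, dense_std, dense_median = summarize(dense_ms)
sparse_mean, sparse_std, sparse_median = summarize(sparse_ms)

mean_speedup = dense_mean / sparse_mean
median_speedup = dense_median / sparse_median

print(f"Mean-based speedup:   {mean_speedup:.2f}x")
print(f"Median-based speedup: {median_speedup:.2f}x")
```

`statistics.pstdev` matches the default behaviour of `np.std` (population standard deviation). If the mean-based and median-based speedups differ a lot on real data, a few outlier runs are probably skewing the mean.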

In [None]:
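# Benchmark configuration (this cell times with CUDA events, so it needs a GPU)
assert torch.cuda.is_available(), "GPU benchmark requires CUDA; use Cell 8 for CPU timing"
n_trials = 100
warmup = 20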

print(f"{'='*60}")
print(f"GPU INFERENCE BENCHMARK")
print(f"{'='*60}")
print(f"\nConfiguration:")
print(f"   Device: {device}")
print(f"   GPU: {torch.cuda.get_device_name(0)}")
print(f"   Trials: {n_trials}")
print(f"   Warmup: {warmup}")
print(f"   Batch size: {X_val_tensor.shape[0]}")

# ============================================================
# DENSE PINN BENCHMARK
# ============================================================
print(f"\n🔵 Benchmarking Dense PINN...")

dense_model.eval()

# Warmup
for _ in range(warmup):
    with torch.no_grad():
        _ = dense_model(X_val_tensor)
torch.cuda.synchronize()

# Benchmark
dense_times = []
for _ in range(n_trials):
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    
    start.record()
    with torch.no_grad():
        _ = dense_model(X_val_tensor)
    end.record()
    
    torch.cuda.synchronize()
    dense_times.append(start.elapsed_time(end))

dense_mean = np.mean(dense_times)
dense_std = np.std(dense_times)

print(f"   ✓ {dense_mean:.2f} ± {dense_std:.2f} ms")

# ============================================================
# STRUCTURED SPINN BENCHMARK
# ============================================================
print(f"\n🟢 Benchmarking Structured SPINN...")

sparse_spinn.eval()

# Warmup (important for compiled models)
for _ in range(warmup):
    with torch.no_grad():
        _ = sparse_spinn(X_val_tensor)
torch.cuda.synchronize()

# Benchmark
sparse_times = []
for _ in range(n_trials):
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    
    start.record()
    with torch.no_grad():
        _ = sparse_spinn(X_val_tensor)
    end.record()
    
    torch.cuda.synchronize()
    sparse_times.append(start.elapsed_time(end))

sparse_mean = np.mean(sparse_times)
sparse_std = np.std(sparse_times)

print(f"   ✓ {sparse_mean:.2f} ± {sparse_std:.2f} ms")

# ============================================================
# RESULTS
# ============================================================
speedup = dense_mean / sparse_mean

print(f"\n{'='*60}")
print(f"📊 BENCHMARK RESULTS")
print(f"{'='*60}")
print(f"\nDense PINN:         {dense_mean:.2f} ± {dense_std:.2f} ms")
print(f"Structured SPINN:   {sparse_mean:.2f} ± {sparse_std:.2f} ms")
print(f"\n{'='*60}")
print(f"⚡ GPU SPEEDUP:      {speedup:.2f}x")
print(f"{'='*60}")

# Theoretical analysis
dense_params = sum(p.numel() for p in dense_model.parameters())
sparse_params = sum(p.numel() for p in sparse_spinn.parameters())
param_ratio = dense_params / sparse_params

print(f"\n📐 Theoretical Analysis:")
print(f"   Dense parameters:    {dense_params:,}")
print(f"   Sparse parameters:   {sparse_params:,}")
print(f"   Parameter ratio:     {param_ratio:.2f}x")
print(f"   Measured speedup:    {speedup:.2f}x")
print(f"   Efficiency:          {(speedup/param_ratio)*100:.1f}%")

# Success assessment
print(f"\n{'='*60}")
if speedup >= 2.0:
    print(f"✅ SUCCESS! Achieved {speedup:.2f}x speedup")
    print(f"   Target was 2-3x - YOU DID IT! 🎉")
    print(f"\n   Next steps:")
    print(f"   1. Run CPU benchmark (Cell 8)")
    print(f"   2. Generate figures")
    print(f"   3. Update paper")
elif speedup >= 1.5:
    print(f"⚠️ PARTIAL SUCCESS: {speedup:.2f}x speedup")
    print(f"   Close to target (2-3x)")
    print(f"\n   Try:")
    print(f"   - sparse_spinn.enable_compile(mode='max-autotune')")
    print(f"   - Check layer dimensions actually changed")
else:
    print(f"❌ UNEXPECTED: Only {speedup:.2f}x speedup")
    print(f"\n   Troubleshooting:")
    print(f"   1. Check layer dimensions:")
    print(f"      for layer in spinn_model.modules():")
    print(f"          if isinstance(layer, nn.Linear):")
    print(f"              print(f'[{{layer.in_features}} → {{layer.out_features}}]')")
    print(f"\n   2. Should see smaller dimensions (e.g., 256 not 512)")
    print(f"\n   3. If dimensions same, structured pruning didn't work")

print(f"{'='*60}")

## Cell 8: CPU Benchmark (Optional)

Test on CPU to show speedup across different hardware.
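
The warmup-then-measure pattern used in the cell below can be factored into a small helper. A minimal sketch, with a hypothetical `time_callable` function and a stand-in workload in place of the model:

```python
import time

def time_callable(fn, n_trials=100, warmup=10):
    """Call fn() `warmup` times untimed, then return n_trials latencies in ms."""
    for _ in range(warmup):
        fn()
    times_ms = []
    for _ in range(n_trials):
        start = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - start) * 1000)
    return times_ms

# Stand-in workload; in the notebook this would be a model forward pass
# executed under torch.no_grad().
def workload():
    return sum(i * i for i in range(10_000))

times = time_callable(workload, n_trials=20, warmup=3)
print(f"{sum(times) / len(times):.3f} ms over {len(times)} trials")
```

Using the same helper for both models keeps the warmup count and trial count identical, so the two timings stay comparable.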

In [None]:
import time

# Move models to CPU
dense_cpu = dense_model.cpu()
sparse_cpu = sparse_spinn.cpu()
X_val_cpu = X_val_tensor.cpu()

n_trials = 100
warmup = 10

print(f"{'='*60}")
print(f"CPU INFERENCE BENCHMARK")
print(f"{'='*60}")

# Dense benchmark
print(f"\n🔵 Dense PINN...")
dense_cpu.eval()

# Warmup without gradient tracking, matching the timed loop below
with torch.no_grad():
    for _ in range(warmup):
        _ = dense_cpu(X_val_cpu)

dense_times = []
for _ in range(n_trials):
    start = time.perf_counter()
    with torch.no_grad():
        _ = dense_cpu(X_val_cpu)
    end = time.perf_counter()
    dense_times.append((end - start) * 1000)

dense_cpu_mean = np.mean(dense_times)
print(f"   {dense_cpu_mean:.2f} ms")

# Sparse benchmark
print(f"\n🟢 Structured SPINN...")
sparse_cpu.eval()

# Warmup without gradient tracking, matching the timed loop below
with torch.no_grad():
    for _ in range(warmup):
        _ = sparse_cpu(X_val_cpu)

sparse_times = []
for _ in range(n_trials):
    start = time.perf_counter()
    with torch.no_grad():
        _ = sparse_cpu(X_val_cpu)
    end = time.perf_counter()
    sparse_times.append((end - start) * 1000)

sparse_cpu_mean = np.mean(sparse_times)
print(f"   {sparse_cpu_mean:.2f} ms")

cpu_speedup = dense_cpu_mean / sparse_cpu_mean

print(f"\n{'='*60}")
print(f"⚡ CPU SPEEDUP: {cpu_speedup:.2f}x")
print(f"{'='*60}")

# Move back to GPU
dense_model = dense_cpu.to(device)
sparse_spinn = sparse_cpu.to(device)

## Cell 9: Test Set Evaluation

In [None]:
# Evaluate both models on test set
dense_model.eval()
sparse_spinn.eval()

with torch.no_grad():
    # Dense predictions
    dense_pred = dense_model(X_test_tensor)
    dense_test_r2 = r2_score(y_test_tensor.cpu().numpy(), dense_pred.cpu().numpy())
    dense_test_mse = mean_squared_error(y_test_tensor.cpu().numpy(), dense_pred.cpu().numpy())
    
    # Sparse predictions
    sparse_pred = sparse_spinn(X_test_tensor)
    sparse_test_r2 = r2_score(y_test_tensor.cpu().numpy(), sparse_pred.cpu().numpy())
    sparse_test_mse = mean_squared_error(y_test_tensor.cpu().numpy(), sparse_pred.cpu().numpy())

print(f"{'='*60}")
print(f"TEST SET EVALUATION")
print(f"{'='*60}")
print(f"\nDense PINN:")
print(f"   R² Score: {dense_test_r2:.4f}")
print(f"   MSE: {dense_test_mse:.6f}")
print(f"\nStructured SPINN:")
print(f"   R² Score: {sparse_test_r2:.4f}")
print(f"   MSE: {sparse_test_mse:.6f}")
print(f"\nDifference (SPINN - Dense):")
print(f"   ΔR²: {sparse_test_r2 - dense_test_r2:+.4f}")
print(f"{'='*60}")

## Summary

Run this cell to see the complete results table for your paper.

In [None]:
# Create results summary
results = {
    'Model': ['Dense PINN', 'SPINN (Structured)'],
    'Parameters': [dense_params, sparse_params],
    'GPU Time (ms)': [f"{dense_mean:.2f}", f"{sparse_mean:.2f}"],
    'GPU Speedup': ["1.0x", f"{speedup:.2f}x"],
    'Test R²': [f"{dense_test_r2:.4f}", f"{sparse_test_r2:.4f}"]
}

results_df = pd.DataFrame(results)

print(f"\n{'='*80}")
print(f"FINAL RESULTS - COPY THIS TO YOUR PAPER")
print(f"{'='*80}\n")
print(results_df.to_string(index=False))
print(f"\n{'='*80}")

print(f"\n✅ Key Achievements:")
print(f"   • Parameter reduction: {(1-sparse_params/dense_params)*100:.1f}%")
print(f"   • GPU speedup: {speedup:.2f}x")
print(f"   • Accuracy change: {sparse_test_r2 - dense_test_r2:+.4f} R²")
print(f"\n🎉 Done! Check these numbers against the claims in your abstract.")