# Effective GaitGL Model Compression with GETA

This notebook demonstrates how to properly apply GETA (General and Efficient Training framework that Automates joint structured pruning and quantization) to compress the GaitGL model for gait recognition.

## Key Insights

- The previous implementation reported 0% parameter reduction, which means the compression was never actually applied
- GETA needs more than initialization and a target sparsity: the optimizer has to run training iterations before any groups are pruned
- We'll fix the process so that compression takes effect; because this notebook trains on dummy data, accuracy has to be re-validated on real data afterwards

In [None]:
import os
import sys
import torch
import numpy as np
import matplotlib.pyplot as plt
from collections import OrderedDict

# Add necessary paths to import OpenGait and GETA modules
# Adjust these paths based on your actual directory structure
sys.path.append('OpenGait')
sys.path.append('.')

# Import OTO for GETA compression
try:
    from opengait.only_train_once import OTO
    from opengait.modeling import models
    from opengait.only_train_once.optimizer.geta import GETA
    from opengait.modeling.models import gaitgl_geta
    print("Successfully imported OpenGait and GETA modules")
except ImportError as e:
    print(f"Import error: {e}")
    print("Please check your paths and make sure OpenGait is correctly installed")

# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Set random seed for reproducibility
torch.manual_seed(42)
if torch.cuda.is_available():
    torch.cuda.manual_seed(42)

## The Problem: Compression Not Being Applied

In the current implementation, we're seeing 0% parameter reduction and even an increase in file size:

```
Original model file size: 11.82 MB
Compressed model file size: 13.22 MB
File size reduction: -11.83%

Original model parameters: 3,096,673
Compressed model parameters: 3,096,673
Compression ratio: 1.0000
Parameter reduction: 0.00%
```
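The percentages in this log are plain ratios. The sketch below recomputes them from the rounded values shown above; `reduction_percent` is a helper name introduced here for illustration, and because the inputs are rounded to two decimals, the file-size figure can differ from the logged one in the last digit.

```python
def reduction_percent(original: float, compressed: float) -> float:
    """Percentage by which `compressed` is smaller than `original` (negative if it grew)."""
    return (1.0 - compressed / original) * 100.0

# Values copied from the log above
params_original, params_compressed = 3_096_673, 3_096_673
size_original_mb, size_compressed_mb = 11.82, 13.22

compression_ratio = params_compressed / params_original
param_reduction = reduction_percent(params_original, params_compressed)
size_reduction = reduction_percent(size_original_mb, size_compressed_mb)

print(f"Compression ratio: {compression_ratio:.4f}")
print(f"Parameter reduction: {param_reduction:.2f}%")
print(f"File size reduction: {size_reduction:.2f}%")
```

A negative reduction means the "compressed" file is larger than the original, which is the symptom this notebook sets out to fix.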

This happens because:

1. GETA requires actual training iterations to apply sparsity
2. Simply initializing and setting target sparsity is not enough
3. The `random_set_zero_groups` function marks groups as redundant, but doesn't remove them
4. The model needs optimizer steps to actually apply the compression
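The difference between zeroing a group and removing it can be shown with plain PyTorch. This is a minimal sketch that does not use the OTO or GETA API; the layer sizes are arbitrary and the slicing step only stands in for what subnet construction does conceptually.

```python
import torch
import torch.nn as nn

def num_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

conv = nn.Conv2d(8, 16, kernel_size=3, bias=False)
before = num_params(conv)

# "Mark" half of the output channels as redundant by zeroing them
with torch.no_grad():
    conv.weight[8:] = 0.0
zeroed = num_params(conv)  # the tensor shape is unchanged, so the count is too

# Physically remove the zeroed channels by building a smaller layer
keep = conv.weight.detach().abs().sum(dim=(1, 2, 3)) > 0
slim = nn.Conv2d(8, int(keep.sum()), kernel_size=3, bias=False)
with torch.no_grad():
    slim.weight.copy_(conv.weight[keep])
after = num_params(slim)

print(before, zeroed, after)
```

Zeroed weights still occupy storage, so the parameter count and file size only drop once the zeroed groups are sliced out of the tensors.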

Let's fix this by:
1. Loading a pre-trained model
2. Creating an OTO instance properly
3. Applying GETA compression with training iterations
4. Constructing the compressed subnet

## Loading the Model and Setting Up Configurations

First, we need to load the GaitGL model from a checkpoint. We'll also set up the configurations for the GETA compression.

In [None]:
# Define paths
checkpoint_path = "OpenGait/output/CASIA-B/GaitGLGeta/GaitGL_GETA/checkpoints/GaitGL_GETA-80000.pt"  # Update with your actual path
output_dir = "./compressed_models"
os.makedirs(output_dir, exist_ok=True)

# Load model configuration (hardcoded for this example)
# In a real scenario, you might want to load this from a yaml file
model_cfg = {
    'model': 'GaitGLGeta',
    'channels': [32, 64, 128, 256],
    'class_num': 74,  # CASIA-B has 74 subjects
}

# Create dummy configurations needed for model initialization
cfgs = {
    'data_cfg': {
        'dataset_name': 'CASIA-B',
    },
    'model_cfg': model_cfg,
    'trainer_cfg': {
        'log_iter': 100,
        'save_name': 'GaitGL_GETA',
        'fix_BN': False,
        'sync_BN': False,
        'restore_hint': 0,
        'with_test': False,
        'enable_float16': False,
        'find_unused_parameters': False,
        'transform': ['NoTransform']
    },
    'evaluator_cfg': {},
    'geta_optimizer_cfg': {
        'variant': 'adam',
        'lr': 1.0e-4,
        'lr_quant': 1.0e-3,
        'first_momentum': 0.9,
        'weight_decay': 5.0e-4,
        'target_group_sparsity': 0.5,  # Set desired sparsity level (50% in this case)
        'start_pruning_step': 100,
        'pruning_steps': 1000,
        'pruning_periods': 10
    }
}

print("Configuration loaded")
print(f"Target group sparsity: {cfgs['geta_optimizer_cfg']['target_group_sparsity']}")
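To get a feel for how `start_pruning_step`, `pruning_steps`, and `pruning_periods` interact, the sketch below lays out one plausible schedule. It assumes the pruning window is split into equal periods and that the target sparsity rises in equal increments at the start of each period; GETA's internal schedule may differ, and `pruning_schedule` is a hypothetical helper, not part of the library.

```python
def pruning_schedule(start_pruning_step, pruning_steps, pruning_periods, target_group_sparsity):
    """Return (step, cumulative_target_sparsity) pairs, assuming equal-sized periods."""
    period_len = pruning_steps // pruning_periods
    return [
        (start_pruning_step + k * period_len, target_group_sparsity * (k + 1) / pruning_periods)
        for k in range(pruning_periods)
    ]

# Values from geta_optimizer_cfg above
schedule = pruning_schedule(100, 1000, 10, 0.5)
min_iterations = 100 + 1000  # training should run at least to the end of the pruning window

for step, sparsity in schedule:
    print(f"step {step:>4}: cumulative target sparsity {sparsity:.2f}")
print(f"Run at least {min_iterations} iterations")
```

Under these assumptions, stopping before iteration 1100 would leave the model short of the 50% target.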

In [None]:
# Create model instance
try:
    print("Creating GaitGLGeta instance")
    Model = getattr(models, model_cfg['model'])
    model = Model(cfgs, training=False)
    model = model.to(device)
    
    # Load checkpoint
    print(f"Loading checkpoint from {checkpoint_path}")
    checkpoint = torch.load(checkpoint_path, map_location=device)
    if 'model' in checkpoint:
        model.load_state_dict(checkpoint['model'])
    else:
        model.load_state_dict(checkpoint)
    
    print("Model loaded successfully")
    
    # Count original parameters
    original_params = sum(p.numel() for p in model.parameters())
    print(f"Original model parameters: {original_params:,}")
    
except Exception as e:
    print(f"Error loading model: {e}")
    import traceback
    traceback.print_exc()

# Function to count trainable parameters
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f"Trainable parameters: {count_parameters(model):,}")

## Creating Dummy Inputs

OTO traces the model with a dummy input before compression, so the dummy input must match the input format the GaitGL model expects.

In [None]:
# Create dummy input for the model
print("Creating dummy input")
batch_size = 4
seq_len = 30  # Frames per sequence
height = 64   # Height of silhouette 
width = 44    # Width of silhouette

# Silhouette images in batch form
torch.manual_seed(42)  # For reproducibility
sils = torch.rand(batch_size, seq_len, 1, height, width).to(device)
labs = torch.zeros(batch_size).long().to(device)
typs = torch.zeros(batch_size).long().to(device)
vies = torch.zeros(batch_size).long().to(device)
seqL = torch.full((batch_size,), seq_len).long().to(device)

dummy_input = [sils, labs, typs, vies, seqL]
print(f"Dummy input created with shape: {sils.shape}")

## Properly Applying GETA Compression

Now, let's apply GETA compression correctly. The key difference is that we need to perform actual training iterations to allow the GETA optimizer to prune the model properly.

In [None]:
# Initialize OTO and GETA properly
print("Initializing OTO and setting up GETA optimizer")
model.eval()  # Important: Set model to eval mode before tracing

# Create OTO instance
oto = OTO(model=model, dummy_input=dummy_input)

# Setup GETA optimizer with our configuration
optimizer_cfg = cfgs['geta_optimizer_cfg']
geta_optimizer = oto.geta(
    variant=optimizer_cfg.get('variant', 'adam'),
    lr=optimizer_cfg.get('lr', 1.0e-4),
    lr_quant=optimizer_cfg.get('lr_quant', 1.0e-3),
    first_momentum=optimizer_cfg.get('first_momentum', 0.9),
    weight_decay=optimizer_cfg.get('weight_decay', 5.0e-4),
    target_group_sparsity=optimizer_cfg.get('target_group_sparsity', 0.5),
    start_pruning_step=optimizer_cfg.get('start_pruning_step', 100),  # Using smaller values for notebook demo
    pruning_steps=optimizer_cfg.get('pruning_steps', 1000),  # Using smaller values for notebook demo
    pruning_periods=optimizer_cfg.get('pruning_periods', 10)
)

# Print GETA optimizer configuration
print("GETA optimizer configured with:")
for k, v in optimizer_cfg.items():
    print(f"  {k}: {v}")

# Check if graph was properly built
if hasattr(oto, '_graph'):
    print(f"OTO graph built successfully with {len(oto._graph.nodes)} nodes and {len(oto._graph.edges)} edges")
else:
    print("Warning: OTO graph was not properly built")

### Training Iterations to Apply Compression

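The key insight is that GETA only applies compression as the optimizer steps through its pruning schedule. Let's run enough iterations to pass the end of that schedule, using the dummy input and a dummy loss so GETA can reach the target sparsity. Training on random labels will degrade the pretrained weights, so for a model you intend to use, substitute your real data loader and loss here.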

In [None]:
# Create dummy criterion for training
criterion = torch.nn.CrossEntropyLoss()

# Track sparsity and metrics
sparsity_history = []
loss_history = []

# Simulate a number of training iterations
model.train()  # Set model to training mode
# Run past the end of the pruning window (assumed to end at start_pruning_step + pruning_steps)
num_iterations = max(optimizer_cfg['start_pruning_step'] + optimizer_cfg['pruning_steps'] + 200, 1500)
print(f"Running {num_iterations} training iterations to apply compression")

# Training loop
for i in range(num_iterations):
    # Forward pass with dummy input
    outputs = model(dummy_input)
    
    # Extract logits - assuming the model returns a dict with training_feat -> softmax -> logits
    logits = outputs['training_feat']['softmax']['logits']
    
    # Generate random target labels for the dummy loss
    random_labels = torch.randint(0, model_cfg['class_num'], (batch_size,)).to(device)
    
    # Calculate loss
    loss = criterion(logits, random_labels)
    
    # Backward and optimize
    geta_optimizer.zero_grad()
    loss.backward()
    geta_optimizer.step()
    
    # Track metrics
    if i % 100 == 0 or i == num_iterations - 1:
        # Get metrics from optimizer
        metrics = geta_optimizer.compute_metrics()
        current_sparsity = metrics.group_sparsity
        sparsity_history.append(current_sparsity)
        loss_history.append(loss.item())
        
        print(f"Iteration {i}/{num_iterations}, "
              f"Loss: {loss.item():.4f}, "
              f"Group Sparsity: {current_sparsity:.4f}, "
              f"Important Groups: {metrics.num_important_groups}, "
              f"Redundant Groups: {metrics.num_redundant_groups}")

print("Training iterations complete")
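To see in isolation why sparsity only appears as optimizer steps accumulate, here is a toy loop in plain PyTorch. It uses simple magnitude-based row pruning on a single linear layer, which is not GETA's algorithm; the period length, layer size, and loss are arbitrary choices for illustration.

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(16, 8, bias=False)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)

target, periods, period_len = 0.5, 4, 5
pruned = torch.zeros(8, dtype=torch.bool)  # one flag per output row (group)
history = []

for step in range(periods * period_len):
    x = torch.randn(4, 16)
    loss = layer(x).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

    # At the end of each period, raise the number of pruned rows toward the target
    if (step + 1) % period_len == 0:
        k = round(target * 8 * ((step + 1) // period_len) / periods)
        norms = layer.weight.detach().norm(dim=1)
        norms[pruned] = -1.0  # keep already-pruned rows at the front of the ranking
        pruned[norms.argsort()[:k]] = True

    # Keep pruned rows at exactly zero after every update
    with torch.no_grad():
        layer.weight[pruned] = 0.0
    history.append(pruned.float().mean().item())

print(history[0], history[-1])
```

The recorded sparsity stays at zero until the first period boundary and reaches the target only after the final one, which mirrors why skipping the training loop leaves the parameter count unchanged.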

### Visualizing the Sparsity Progress

Let's visualize how the sparsity changed over training iterations:

In [None]:
# Iterations at which metrics were logged: every 100th step plus the final step.
# Using a set avoids a length mismatch when the final step is itself a multiple of 100.
logged_steps = sorted(set(range(0, num_iterations, 100)) | {num_iterations - 1})

# Plot sparsity history
plt.figure(figsize=(10, 5))
plt.plot(logged_steps, sparsity_history, marker='o', linestyle='-')
plt.plot(logged_steps[-1], sparsity_history[-1], marker='o', color='red')  # Highlight the final value
plt.axhline(y=optimizer_cfg['target_group_sparsity'], color='r', linestyle='--', 
           label=f"Target Sparsity: {optimizer_cfg['target_group_sparsity']}")
plt.title('Group Sparsity vs. Training Iterations')
plt.xlabel('Iterations')
plt.ylabel('Group Sparsity')
plt.grid(True)
plt.legend()
plt.tight_layout()
plt.show()

# Also plot the loss for reference
plt.figure(figsize=(10, 5))
plt.plot(logged_steps, loss_history, marker='o', linestyle='-')
plt.plot(logged_steps[-1], loss_history[-1], marker='o', color='red')  # Highlight the final value
plt.title('Loss vs. Training Iterations')
plt.xlabel('Iterations')
plt.ylabel('Loss')
plt.grid(True)
plt.tight_layout()
plt.show()

## Constructing and Analyzing the Compressed Model

Now that we've properly applied the compression through training iterations, let's construct the compressed model and analyze it:

In [None]:
# Set model to evaluation mode before construction
model.eval()

# Construct the compressed model
print("Constructing compressed model")
oto.construct_subnet(out_dir=output_dir)

# Get paths to the full and compressed models
full_model_path = oto.full_group_sparse_model_path
compressed_model_path = oto.compressed_model_path

print(f"Full model saved to: {full_model_path}")
print(f"Compressed model saved to: {compressed_model_path}")

# Load the compressed model
try:
    print("Loading compressed model for analysis...")
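    # These files contain whole pickled modules, so weights_only=False is needed on
    # recent PyTorch releases where weights_only defaults to True. Only load files you trust.
    compressed_model = torch.load(compressed_model_path, map_location=device, weights_only=False)
    print("Successfully loaded compressed model")
    
    # Load the full model with sparsity
    full_model = torch.load(full_model_path, map_location=device, weights_only=False)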
    print("Successfully loaded full model with sparsity")
    
    # Analyze file sizes
    original_size_mb = os.path.getsize(checkpoint_path) / (1024*1024)
    full_size_mb = os.path.getsize(full_model_path) / (1024*1024)
    compressed_size_mb = os.path.getsize(compressed_model_path) / (1024*1024)
    
    print(f"\nOriginal model file size: {original_size_mb:.2f} MB")
    print(f"Full model with sparsity file size: {full_size_mb:.2f} MB")
    print(f"Compressed model file size: {compressed_size_mb:.2f} MB")
    print(f"File size reduction (original → compressed): {(1-compressed_size_mb/original_size_mb)*100:.2f}%")
    print(f"File size reduction (full → compressed): {(1-compressed_size_mb/full_size_mb)*100:.2f}%")
    
    # Count parameters
    original_params = sum(p.numel() for p in model.parameters())
    full_params = sum(p.numel() for p in full_model.parameters())
    compressed_params = sum(p.numel() for p in compressed_model.parameters())
    
    print(f"\nOriginal model parameters: {original_params:,}")
    print(f"Full model with sparsity parameters: {full_params:,}")
    print(f"Compressed model parameters: {compressed_params:,}")
    print(f"Compression ratio: {compressed_params/original_params:.4f}")
    print(f"Parameter reduction: {(1-compressed_params/original_params)*100:.2f}%")
    
except Exception as e:
    print(f"Error analyzing compressed model: {e}")
    import traceback
    traceback.print_exc()

## Comparing Model Performance

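The in-memory `model` was updated in place by the training loop, so this comparison checks that the compressed subnet reproduces the outputs of the trained, group-sparse model (not those of the original checkpoint):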

In [None]:
# Create a new set of test inputs (different from training)
torch.manual_seed(100)  # Different seed
test_sils = torch.rand(2, seq_len, 1, height, width).to(device)
test_labs = torch.zeros(2).long().to(device)
test_typs = torch.zeros(2).long().to(device)
test_vies = torch.zeros(2).long().to(device)
test_seqL = torch.full((2,), seq_len).long().to(device)

test_input = [test_sils, test_labs, test_typs, test_vies, test_seqL]

# Set all models to evaluation mode
model.eval()
full_model.eval()
compressed_model.eval()

# Run inference on all models
with torch.no_grad():
    # Original model
    original_output = model(test_input)
    original_embeddings = original_output['inference_feat']['embeddings']
    
    # Full model with sparsity
    full_output = full_model(test_input)
    full_embeddings = full_output['inference_feat']['embeddings']
    
    # Compressed model
    compressed_output = compressed_model(test_input)
    compressed_embeddings = compressed_output['inference_feat']['embeddings']
    
    # Compare outputs
    print("\nComparing model outputs...")
    orig_vs_full_diff = torch.mean((original_embeddings - full_embeddings).abs()).item()
    orig_vs_comp_diff = torch.mean((original_embeddings - compressed_embeddings).abs()).item()
    full_vs_comp_diff = torch.mean((full_embeddings - compressed_embeddings).abs()).item()
    
    print(f"Average difference between original and full model: {orig_vs_full_diff:.6f}")
    print(f"Average difference between original and compressed model: {orig_vs_comp_diff:.6f}")
    print(f"Average difference between full and compressed model: {full_vs_comp_diff:.6f}")
    
    # If difference is very small, they're essentially the same
    threshold = 1e-5
    if orig_vs_comp_diff < threshold and full_vs_comp_diff < threshold:
        print("✓ Compressed model maintains the same behavior as the original model")
    else:
        print("⚠ There may be some behavioral differences between models")
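A mean absolute difference can hide a few large deviations, so it may help to look at the maximum difference and a cosine similarity as well. The sketch below runs on stand-in tensors; `compare_embeddings` and the `[batch, channels, parts]` shape are assumptions for illustration, not something the GaitGL code defines.

```python
import torch
import torch.nn.functional as F

def compare_embeddings(a: torch.Tensor, b: torch.Tensor) -> dict:
    """Summarize how closely two embedding tensors of equal shape agree."""
    diff = (a - b).abs()
    # Flatten everything except the batch dimension, then compare per sample
    cos = F.cosine_similarity(a.flatten(1), b.flatten(1), dim=1)
    return {
        "mean_abs_diff": diff.mean().item(),
        "max_abs_diff": diff.max().item(),
        "min_cosine": cos.min().item(),
    }

torch.manual_seed(0)
reference = torch.randn(2, 128, 64)  # stand-in for [batch, channels, parts]
perturbed = reference.clone()
perturbed[0, 0, 0] += 1.0            # a single large deviation

stats = compare_embeddings(reference, perturbed)
for name, value in stats.items():
    print(f"{name}: {value:.6f}")
```

Here the mean difference stays small while the maximum difference exposes the outlier. The same function could be applied to `full_embeddings` and `compressed_embeddings` from the cell above.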

## Layer-by-Layer Analysis of Compression

Let's examine how the compression has affected different layers of the model:

In [None]:
# Function to analyze layer-wise parameter counts
def count_layer_parameters(model):
    layer_params = {}
    for name, module in model.named_modules():
        if hasattr(module, 'weight') and isinstance(module.weight, torch.Tensor):
            layer_params[name] = module.weight.numel()
            if hasattr(module, 'bias') and isinstance(module.bias, torch.Tensor):
                layer_params[name] += module.bias.numel()
    return layer_params

# Get layer-wise parameter counts
original_layer_params = count_layer_parameters(model)
compressed_layer_params = count_layer_parameters(compressed_model)

# Create comparison table
print("\nLayer-by-Layer Parameter Comparison:")
print("-" * 80)
print(f"{'Layer Name':<40} {'Original':>12} {'Compressed':>12} {'Reduction %':>12}")
print("-" * 80)

total_orig = 0
total_comp = 0
for name in sorted(set(original_layer_params.keys()) | set(compressed_layer_params.keys())):
    orig = original_layer_params.get(name, 0)
    comp = compressed_layer_params.get(name, 0)
    
    total_orig += orig
    total_comp += comp
    
    if orig > 0:
        reduction = (1 - comp/orig) * 100
    else:
        reduction = 0
        
    print(f"{name[:38]:<40} {orig:>12,} {comp:>12,} {reduction:>12.2f}")

print("-" * 80)
print(f"{'TOTAL':<40} {total_orig:>12,} {total_comp:>12,} {(1-total_comp/total_orig)*100:>12.2f}")
print("-" * 80)

## Memory and Computational Efficiency Analysis

Let's analyze the memory and computational efficiency improvements:

In [None]:
# Estimate memory usage
def estimate_memory_usage(model):
    memory_bytes = 0
    for param in model.parameters():
        memory_bytes += param.numel() * param.element_size()
    return memory_bytes / (1024 * 1024)  # Convert to MB

original_memory = estimate_memory_usage(model)
compressed_memory = estimate_memory_usage(compressed_model)

print(f"Estimated memory usage (Original): {original_memory:.2f} MB")
print(f"Estimated memory usage (Compressed): {compressed_memory:.2f} MB")
print(f"Memory reduction: {(1 - compressed_memory/original_memory) * 100:.2f}%")

# Try to measure inference time if we can
try:
    import time
    
    def measure_inference_time(model, inputs, num_runs=10):
        model.eval()
        # Warmup
        with torch.no_grad():
            for _ in range(3):
                _ = model(inputs)

        # CUDA kernels run asynchronously, so synchronize before reading the clock
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        start_time = time.perf_counter()
        with torch.no_grad():
            for _ in range(num_runs):
                _ = model(inputs)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        end_time = time.perf_counter()

        # Average time per forward pass over the whole input batch
        return (end_time - start_time) / num_runs

    original_time = measure_inference_time(model, test_input)
    compressed_time = measure_inference_time(compressed_model, test_input)

    print(f"\nInference time per batch (Original): {original_time*1000:.2f} ms")
    print(f"Inference time per batch (Compressed): {compressed_time*1000:.2f} ms")
    print(f"Speedup: {original_time / compressed_time:.2f}x")
    print(f"Time reduction: {(1 - compressed_time/original_time) * 100:.2f}%")
    
except Exception as e:
    print(f"Could not measure inference time: {e}")
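As a sanity check on the memory estimate, the parameter count reported earlier can be converted to megabytes by hand. The sketch below assumes 32-bit storage for the original weights; the 8-bit row is a hypothetical comparison, since the bit widths GETA actually learns may differ and the estimate above only reflects the dtype in which tensors are stored.

```python
def param_memory_mb(num_params: int, bits_per_param: int) -> float:
    """Memory needed to store `num_params` values at the given bit width, in MB."""
    return num_params * bits_per_param / 8 / (1024 * 1024)

num_params = 3_096_673  # parameter count reported for the original model
fp32_mb = param_memory_mb(num_params, 32)
int8_mb = param_memory_mb(num_params, 8)

print(f"fp32: {fp32_mb:.2f} MB")
print(f"int8 (hypothetical): {int8_mb:.2f} MB")
```

The fp32 figure comes out close to the 11.82 MB checkpoint size reported earlier. Note that `model.parameters()` excludes buffers such as BatchNorm running statistics, so the estimate is a lower bound on what the model holds in memory.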

## Summary of GETA Compression Results

Let's summarize our findings and the key insights for effectively applying GETA compression:

In [None]:
# Create a summary table
summary = {
    "Original Parameters": f"{original_params:,}",
    "Compressed Parameters": f"{compressed_params:,}",
    "Parameter Reduction": f"{(1-compressed_params/original_params)*100:.2f}%",
    "Original File Size": f"{original_size_mb:.2f} MB",
    "Compressed File Size": f"{compressed_size_mb:.2f} MB",
    "File Size Reduction": f"{(1-compressed_size_mb/original_size_mb)*100:.2f}%",
    "Target Group Sparsity": f"{optimizer_cfg['target_group_sparsity']:.2f}",
    "Achieved Group Sparsity": f"{sparsity_history[-1]:.2f}",
}

# Print summary table
print("=" * 60)
print(" " * 15 + "GETA COMPRESSION SUMMARY")
print("=" * 60)
for key, value in summary.items():
    print(f"{key:<25}: {value:>25}")
print("=" * 60)

# Key insights
print("\nKey Insights for Effective GETA Compression:")
print("1. GETA requires actual training iterations to apply sparsity")
print("2. Simply initializing and setting target sparsity is not enough")
print("3. The model needs optimizer steps to effectively apply compression")
print("4. Proper parameter reduction is achieved only after the training process")
print("5. The compressed subnet should reproduce the full sparse model's outputs; confirm this with the comparison above and re-check accuracy on real data")

# Save the compressed model
torch.save(compressed_model, os.path.join(output_dir, "final_compressed_model.pt"))
print(f"\nFinal compressed model saved to: {os.path.join(output_dir, 'final_compressed_model.pt')}")