# 🔬 Expansion Rate Documentation - GLU Width Pruning

**Research Paper:** [Width Pruning in GLU-MLP Layers](https://doi.org/10.31219/osf.io/qgxea)

[![Paper](https://img.shields.io/badge/OSF-Paper-blue?logo=osf&logoColor=white)](https://doi.org/10.31219/osf.io/qgxea)
[![GitHub](https://img.shields.io/badge/⭐_Star-OptiPFair-orange?logo=github&logoColor=white)](https://github.com/peremartra/optipfair)
[![PyPI](https://img.shields.io/pypi/v/optipfair?logo=python&logoColor=white&label=v)](https://pypi.org/project/optipfair/)

**Repository:** [github.com/peremartra/llama-glu-expansion-pruning](https://github.com/peremartra/llama-glu-expansion-pruning)

---

## 📋 Notebook Objective

This notebook documents the **actual expansion rates** achieved after applying width pruning to GLU-MLP layers across all model configurations. It serves as a reference for:

1. **Verifying pruning accuracy:** Confirm that each `pruning_pct` produces the expected expansion rate
2. **Architecture documentation:** Record detailed layer dimensions for reproducibility
3. **Parameter reduction analysis:** Calculate exact parameter savings per model
4. **Cross-model comparison:** Compare expansion rates across 1B, 3B, and Instruct variants
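
To make the first objective concrete: the expansion rate is `intermediate_size / hidden_size`, so removing a given percentage of GLU neurons should shrink it by roughly the same percentage. The sketch below computes that idealized expectation. The function name and the example dimensions are illustrative assumptions, not values read from `utils.py`, and the achieved width may differ slightly if the pruning library rounds the number of kept neurons.

```python
def expected_expansion_pct(hidden_size: int, intermediate_size: int, pruning_pct: float) -> float:
    """Idealized expansion rate (in %) after removing pruning_pct% of GLU neurons."""
    kept_neurons = intermediate_size * (1 - pruning_pct / 100)
    return kept_neurons / hidden_size * 100


# Hypothetical dimensions, chosen only for illustration.
for pct in (0, 25, 50):
    print(f"{pct:2d}% pruning -> {expected_expansion_pct(2048, 8192, pct):.1f}% expansion")
```

Comparing this expectation with the measured `expansion_rate_percentage` of a pruned model is one way to spot a configuration that was pruned more or less than intended.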

### Key Features:
- ✅ **Automated calculation:** Uses OptiPFair to recreate all pruned models on-the-fly
- ✅ **Complete documentation:** Records all architecture details in structured JSON
- ✅ **No pre-pruned checkpoints needed:** Base models are downloaded and pruned at run time, so no pre-existing model files are required
- ✅ **Reproducibility:** Establishes a baseline for all future experiments

### Output:
- `expansion_rates.json`: Complete architecture details for all 18 model configurations
- Summary table with expansion rates and parameter reductions

---

**Colab Environment:** A CPU runtime is sufficient (no GPU is needed for architecture inspection)

**Estimated Runtime:** ~30-45 minutes (depends on download speeds)

---

# 1. Setup & Installation

In [None]:
# Install the OptiPFair library for structured GLU pruning
!pip install -q optipfair

print("✅ OptiPFair installed successfully")

In [None]:
# Download utils.py from GitHub repository
!wget -q https://raw.githubusercontent.com/peremartra/llama-glu-expansion-pruning/main/utils.py

# Verify download
import os
if os.path.exists('utils.py'):
    print("✅ utils.py downloaded successfully")
else:
    print("❌ Failed to download utils.py")

In [None]:
# Standard imports
import json
import torch
from datetime import datetime
from transformers import AutoModelForCausalLM
from optipfair import prune_model
import pandas as pd
from tqdm.auto import tqdm

# Import experiment configuration
from utils import EXPERIMENT_CONFIG

print(f"✅ Loaded {len(EXPERIMENT_CONFIG)} model configurations")
print(f"📅 Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
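
The later cells read only two keys from each configuration entry, `base_model` and `pruning_pct`. A quick sanity check before starting the long download loop can catch a malformed entry early. This is a minimal sketch: it assumes each entry is a dictionary, and the helper name and sample entries are hypothetical rather than part of `utils.py`.

```python
def validate_experiment_config(config_entries):
    """Return a list of human-readable problems; an empty list means no issues were found."""
    problems = []
    for index, entry in enumerate(config_entries):
        # These are the only two keys the processing loop relies on.
        for key in ("base_model", "pruning_pct"):
            if key not in entry:
                problems.append(f"entry {index}: missing key '{key}'")
        pct = entry.get("pruning_pct")
        if pct is not None and not (0 <= pct < 100):
            problems.append(f"entry {index}: pruning_pct {pct} is outside [0, 100)")
    return problems


# Hypothetical entries; the real ones come from utils.EXPERIMENT_CONFIG.
sample_config = [
    {"base_model": "org/model-a", "pruning_pct": 20},
    {"base_model": "org/model-b"},
]
print(validate_experiment_config(sample_config))
```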

# 2. Configuration & Constants

In [None]:
# Output configuration
OUTPUT_FILE = "expansion_rates.json"

# OptiPFair pruning parameters (matching paper methodology)
PRUNING_CONFIG = {
    "pruning_type": "MLP_GLU",
    "neuron_selection_method": "MAW",  # Maximum Absolute Weight (optimal for GLU)
    "return_stats": True
}

# Model loading configuration
MODEL_LOAD_CONFIG = {
    "torch_dtype": torch.bfloat16,
    "device_map": "auto",
    "low_cpu_mem_usage": True
}

print("⚙️ Configuration:")
print(f"   Output file: {OUTPUT_FILE}")
print(f"   Pruning method: {PRUNING_CONFIG['neuron_selection_method']}")
print(f"   Pruning type: {PRUNING_CONFIG['pruning_type']}")
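
For intuition about what a "Maximum Absolute Weight" selection does, the toy sketch below scores each GLU neuron by the largest absolute weight in its gate row plus the largest absolute weight in its up row, then keeps the highest-scoring neurons. This is an assumed, simplified reading of the method name, written in plain Python for illustration. It is not OptiPFair's implementation, and the function name and weights are made up.

```python
def maw_keep_indices(gate_rows, up_rows, pruning_pct):
    """Pick the neurons to keep, ranked by a maximum-absolute-weight score.

    gate_rows[i] and up_rows[i] hold the incoming weights of neuron i in the
    gate and up projections. The scoring rule is an illustrative assumption.
    """
    scores = [
        max(abs(w) for w in gate) + max(abs(w) for w in up)
        for gate, up in zip(gate_rows, up_rows)
    ]
    n_keep = max(1, round(len(scores) * (1 - pruning_pct / 100)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


# Toy layer: 4 neurons with 2 inputs each (made-up weights).
gate = [[0.1, -0.2], [0.9, 0.0], [0.05, 0.05], [-0.7, 0.3]]
up = [[0.2, 0.1], [0.1, -0.8], [0.0, 0.1], [0.4, -0.4]]
print(maw_keep_indices(gate, up, pruning_pct=50))
```

Whatever the scoring rule, the same neuron indices have to be removed from the gate and up projections (rows) and from the down projection (columns), which is why the intermediate size shrinks uniformly.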

# 3. Core Functions

In [None]:
def get_model_architecture_info(model):
    """
    Extract architecture information from a Llama model.
    
    Args:
        model: HuggingFace model instance
        
    Returns:
        dict: Architecture details including dimensions and expansion rate
    """
    config = model.config
    
    # Get dimensions from model config
    hidden_size = config.hidden_size
    intermediate_size = config.intermediate_size
    num_hidden_layers = config.num_hidden_layers
    
    # Calculate expansion rate
    expansion_rate = intermediate_size / hidden_size
    
    # Count total parameters
    total_params = sum(p.numel() for p in model.parameters())
    
    return {
        "hidden_size": hidden_size,
        "intermediate_size": intermediate_size,
        "num_hidden_layers": num_hidden_layers,
        "expansion_rate": round(expansion_rate, 2),
        "expansion_rate_percentage": round(expansion_rate * 100, 1),
        "total_parameters": total_params,
        "total_parameters_millions": round(total_params / 1e6, 2)
    }


def calculate_expansion_rate_for_config(config_entry):
    """
    Load model, apply pruning, and extract expansion rate information.
    
    Args:
        config_entry: Dictionary from EXPERIMENT_CONFIG
        
    Returns:
        dict: Complete model information including original and pruned architectures
    """
    base_model_name = config_entry["base_model"]
    pruning_pct = config_entry["pruning_pct"]
    
    print(f"\n{'='*80}")
    print(f"Processing: {base_model_name} @ {pruning_pct}% pruning")
    print(f"{'='*80}")
    
    try:
        # Load base model
        print(f"📥 Loading base model: {base_model_name}...")
        model = AutoModelForCausalLM.from_pretrained(
            base_model_name,
            **MODEL_LOAD_CONFIG
        )
        
        # Get original architecture info
        print("📊 Analyzing original architecture...")
        original_arch = get_model_architecture_info(model)
        print(f"   Original expansion rate: {original_arch['expansion_rate_percentage']}%")
        print(f"   Original parameters: {original_arch['total_parameters_millions']}M")
        
        # Apply pruning
        print(f"✂️ Applying {pruning_pct}% pruning with MAW method...")
        pruned_model, stats = prune_model(
            model=model,
            pruning_percentage=pruning_pct,
            **PRUNING_CONFIG
        )
        
        # Get pruned architecture info
        print("📊 Analyzing pruned architecture...")
        pruned_arch = get_model_architecture_info(pruned_model)
        print(f"   Pruned expansion rate: {pruned_arch['expansion_rate_percentage']}%")
        print(f"   Pruned parameters: {pruned_arch['total_parameters_millions']}M")
        
        # Calculate reductions
        param_reduction_pct = (
            (original_arch['total_parameters'] - pruned_arch['total_parameters']) 
            / original_arch['total_parameters'] * 100
        )
        
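        # hidden_size is untouched by width pruning, so the relative change in
        # intermediate_size equals the relative change in expansion rate,
        # without the rounding error of the stored 2-decimal expansion_rate
        expansion_reduction_pct = (
            (original_arch['intermediate_size'] - pruned_arch['intermediate_size'])
            / original_arch['intermediate_size'] * 100
        )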
        
        print(f"📉 Parameter reduction: {param_reduction_pct:.2f}%")
        print(f"📉 Expansion reduction: {expansion_reduction_pct:.2f}%")
        
        # Clean up to free memory
        del model
        del pruned_model
        torch.cuda.empty_cache()
        
        # Return complete info
        return {
            "base_model": base_model_name,
            "pruning_pct": pruning_pct,
            "original_architecture": original_arch,
            "pruned_architecture": pruned_arch,
            "reductions": {
                "parameter_reduction_pct": round(param_reduction_pct, 2),
                "expansion_reduction_pct": round(expansion_reduction_pct, 2),
                "parameters_saved_millions": round(
                    (original_arch['total_parameters'] - pruned_arch['total_parameters']) / 1e6, 2
                )
            },
            "optipfair_stats": stats,
            "status": "success"
        }
        
    except Exception as e:
        print(f"❌ Error processing {base_model_name} @ {pruning_pct}%: {str(e)}")
        return {
            "base_model": base_model_name,
            "pruning_pct": pruning_pct,
            "status": "failed",
            "error": str(e)
        }

print("✅ Functions defined")
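
The functions above measure parameter savings by counting tensors before and after pruning. The same number can be cross-checked analytically: in a GLU MLP with gate, up, and down projections, each removed neuron deletes one row of the gate projection, one row of the up projection, and one column of the down projection. The sketch below assumes bias-free projections and that only the MLP width changes. The function name and example dimensions are hypothetical.

```python
def glu_mlp_params_saved(hidden_size, original_intermediate, pruned_intermediate, num_layers):
    """Parameters removed from bias-free GLU MLP blocks by width pruning."""
    removed_per_layer = original_intermediate - pruned_intermediate
    # Each removed neuron drops one gate row, one up row, and one down column,
    # and each of those vectors has hidden_size entries.
    return 3 * hidden_size * removed_per_layer * num_layers


# Hypothetical dimensions, for illustration only.
saved = glu_mlp_params_saved(hidden_size=2048, original_intermediate=8192,
                             pruned_intermediate=6144, num_layers=16)
print(f"Parameters saved: {saved / 1e6:.2f}M")
```

If the counted `parameters_saved_millions` for a model disagrees noticeably with this estimate, that suggests something other than the MLP width changed, or that one of the assumptions does not hold for that architecture.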

# 4. Process All Models

In [None]:
# Initialize results storage
all_results = []

print(f"\n🚀 Starting expansion rate calculation for {len(EXPERIMENT_CONFIG)} models...\n")

# Process each configuration
for i, config in enumerate(tqdm(EXPERIMENT_CONFIG, desc="Processing models")):
    print(f"\n[{i+1}/{len(EXPERIMENT_CONFIG)}] Processing configuration...")
    
    result = calculate_expansion_rate_for_config(config)
    all_results.append(result)
    
    # Brief status update
    if result['status'] == 'success':
        print(f"✅ Success: {result['pruned_architecture']['expansion_rate_percentage']}% expansion rate")
    else:
        print(f"❌ Failed: {result.get('error', 'Unknown error')}")

print(f"\n\n{'='*80}")
print(f"✅ Completed processing all {len(EXPERIMENT_CONFIG)} models")
print(f"{'='*80}\n")

# 5. Save Results to JSON

In [None]:
# Prepare final JSON structure
output_data = {
    "metadata": {
        "generated_at": datetime.now().isoformat(),
        "optipfair_version": "latest",  # Could get actual version if needed
        "total_models": len(all_results),
        "successful": sum(1 for r in all_results if r['status'] == 'success'),
        "failed": sum(1 for r in all_results if r['status'] == 'failed'),
        "pruning_method": PRUNING_CONFIG['neuron_selection_method'],
        "pruning_type": PRUNING_CONFIG['pruning_type']
    },
    "models": all_results
}

# Save to JSON file
with open(OUTPUT_FILE, 'w') as f:
    json.dump(output_data, f, indent=2)

print(f"\n💾 Results saved to: {OUTPUT_FILE}")
print(f"   Total models: {output_data['metadata']['total_models']}")
print(f"   Successful: {output_data['metadata']['successful']}")
print(f"   Failed: {output_data['metadata']['failed']}")
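
The metadata above stores `optipfair_version` as the literal string `"latest"`, which does not identify the release that produced the numbers. One way to record the installed version is the standard-library `importlib.metadata` module. The helper name below is hypothetical, and the fallback value is an assumption for environments where the package is not installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name: str, default: str = "unknown") -> str:
    """Return the installed version of a package, or `default` if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return default


print(installed_version("optipfair"))
```

The returned string could be used in place of `"latest"` when building the metadata dictionary.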

# 6. Summary Analysis

In [None]:
# Create summary dataframe for successful models
successful_results = [r for r in all_results if r['status'] == 'success']

summary_data = []
for result in successful_results:
    # Extract model family (1B, 3B, 1B-I, 3B-I)
    base_name = result['base_model'].split('/')[-1]
    if '1B-Instruct' in base_name:
        family = '1B-Instruct'
    elif '3B-Instruct' in base_name:
        family = '3B-Instruct'
    elif '1B' in base_name:
        family = '1B'
    else:
        family = '3B'
    
    summary_data.append({
        'Model Family': family,
        'Pruning %': result['pruning_pct'],
        'Original Expansion %': result['original_architecture']['expansion_rate_percentage'],
        'Pruned Expansion %': result['pruned_architecture']['expansion_rate_percentage'],
        'Expansion Reduction %': result['reductions']['expansion_reduction_pct'],
        'Original Params (M)': result['original_architecture']['total_parameters_millions'],
        'Pruned Params (M)': result['pruned_architecture']['total_parameters_millions'],
        'Param Reduction %': result['reductions']['parameter_reduction_pct'],
        'Params Saved (M)': result['reductions']['parameters_saved_millions']
    })

summary_df = pd.DataFrame(summary_data)

# Sort by model family and pruning percentage
summary_df = summary_df.sort_values(['Model Family', 'Pruning %'])

print("\n" + "="*100)
print("📊 EXPANSION RATE SUMMARY")
print("="*100)
print(summary_df.to_string(index=False))
print("="*100)

# 7. Model Family Comparison

In [None]:
# Group by model family for comparison
print("\n" + "="*100)
print("📊 COMPARISON BY MODEL FAMILY")
print("="*100)

for family in sorted(summary_df['Model Family'].unique()):
    family_df = summary_df[summary_df['Model Family'] == family]
    
    print(f"\n🔹 {family} Model Family:")
    print(f"   Base expansion rate: {family_df.iloc[0]['Original Expansion %']}%")
    print(f"   Base parameters: {family_df.iloc[0]['Original Params (M)']}M")
    print(f"\n   Pruned variants:")
    
    for _, row in family_df.iterrows():
        print(f"      {row['Pruning %']:2d}% pruning → {row['Pruned Expansion %']:5.1f}% expansion "
              f"({row['Pruned Params (M)']:5.0f}M params, {row['Param Reduction %']:.1f}% reduction)")

print("\n" + "="*100)

# 8. Target Expansion Rate Analysis

In [None]:
# Identify which models achieve the target 140% expansion rate
TARGET_EXPANSION = 140.0

print(f"\n{'='*100}")
print(f"🎯 TARGET EXPANSION RATE: {TARGET_EXPANSION}%")
print(f"{'='*100}\n")

# Find models closest to target
summary_df['Distance from Target'] = abs(summary_df['Pruned Expansion %'] - TARGET_EXPANSION)

for family in sorted(summary_df['Model Family'].unique()):
    family_df = summary_df[summary_df['Model Family'] == family]
    closest = family_df.loc[family_df['Distance from Target'].idxmin()]
    
    print(f"🔹 {family}:")
    print(f"   Closest model: {closest['Pruning %']}% pruning")
    print(f"   Achieved expansion: {closest['Pruned Expansion %']}%")
    print(f"   Distance from target: {closest['Distance from Target']:.1f}%")
    print()

print(f"{'='*100}\n")
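
The search above picks the closest configuration among those that were actually pruned. The idealized pruning percentage that would land exactly on a target can also be computed directly, since the expansion rate scales linearly with the fraction of neurons kept. This is a minimal sketch under that linearity assumption. The function name and the starting rates are hypothetical, not measurements from this notebook.

```python
def pruning_pct_for_target(original_expansion_pct: float, target_expansion_pct: float) -> float:
    """Pruning percentage that would, ideally, bring the expansion rate down to the target."""
    if target_expansion_pct > original_expansion_pct:
        raise ValueError("target exceeds the original expansion rate; pruning cannot increase it")
    return (1 - target_expansion_pct / original_expansion_pct) * 100


# Hypothetical starting rates, for illustration only.
for original in (400.0, 300.0):
    needed = pruning_pct_for_target(original, 140.0)
    print(f"{original:.0f}% -> 140%: prune about {needed:.1f}%")
```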

# 9. Export Summary to CSV

In [None]:
# Save summary table to CSV for easy reference
csv_filename = "expansion_rates_summary.csv"
# Remove helper column; errors='ignore' keeps the cell safe to re-run
summary_df = summary_df.drop(columns='Distance from Target', errors='ignore')
summary_df.to_csv(csv_filename, index=False)

print(f"📊 Summary table exported to: {csv_filename}")
print(f"\n✅ All analysis complete!")
print(f"\n📁 Output files:")
print(f"   - {OUTPUT_FILE} (complete architecture details)")
print(f"   - {csv_filename} (summary table)")
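
Downstream experiments can read the JSON file back instead of re-running the pruning. The sketch below relies only on keys that this notebook writes (`models`, `status`, `base_model`, `pruning_pct`, `pruned_architecture`, `expansion_rate_percentage`). The helper name is hypothetical.

```python
import json


def pruned_expansion_by_model(json_path):
    """Map (base_model, pruning_pct) to the pruned expansion rate for successful runs."""
    with open(json_path) as f:
        data = json.load(f)
    return {
        (m["base_model"], m["pruning_pct"]): m["pruned_architecture"]["expansion_rate_percentage"]
        for m in data["models"]
        if m.get("status") == "success"  # failed entries carry no architecture details
    }


# Usage, once the notebook has written the file:
# rates = pruned_expansion_by_model("expansion_rates.json")
```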