<a href="https://colab.research.google.com/github/peremartra/optipfair/blob/main/examples/depth_pruning_example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# OptiPFair Notebook Series – Example: Depth Pruning

![optiPfair Logo](https://github.com/peremartra/optipfair/blob/main/images/optiPfair.png?raw=true)


This notebook demonstrates how to use [OptiPFair](https://github.com/peremartra/optipfair) for depth pruning of transformer models by removing entire layers.  
Removing whole layers is more aggressive than neuron-level pruning, but it can lead to significant efficiency gains.

## Recommended Environment

- **Platform**: [Google Colab](https://colab.research.google.com)  
- **Hardware**: GPU runtime (recommended: T4 or better for 1B–3B models)  
- **Dependencies**: Installed automatically in the first cell (optipfair, transformers, torch)

## By Pere Martra

- [LinkedIn](https://www.linkedin.com/in/pere-martra)  
- [GitHub](https://github.com/peremartra)  
- [X / Twitter](https://x.com/peremartra)

---

> If you find this useful, please ⭐ the [repository](https://github.com/peremartra/optipfair) and share it!
---
If you want your favorite LLM to write code with OptiPFair, give it the file [**optipfair_llm_reference_manual.txt**](https://github.com/peremartra/optipfair/blob/main/optipfair_llm_reference_manual.txt), which contains everything the LLM needs to use the library proficiently.

# Depth Pruning Example

This notebook demonstrates how to use OptiPFair for depth pruning of language models.
Depth pruning removes entire transformer layers, which is more aggressive than neuron-level pruning
but can lead to significant efficiency gains with proper fine-tuning.

Author: Pere Martra

Designed for Google Colab - GPU runtime recommended

---
## Installation and Setup

In [None]:
!pip install -q transformers optipfair torch


## Import Libraries and Check GPU

In [None]:
import torch
import os
import gc
from transformers import AutoModelForCausalLM, AutoTokenizer
from optipfair.pruning.depth import prune_model_depth

# Check device availability
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")

## Configuration

In [None]:
# List of models to test - you can add more models here
# Note: For Colab, stick to smaller models due to memory constraints
MODELS_TO_TEST = [
    "meta-llama/Llama-3.2-1B",
    # "google/gemma-2-2b",  # Uncomment if you have enough GPU memory
    # Add more models here as needed
]

# Depth pruning configuration - modify these values as needed
NUM_LAYERS_TO_REMOVE = 4  # Number of layers to remove
DEPTH_PRUNING_PERCENTAGE = 25  # Alternative: percentage of layers to remove (0-100)
CUSTOM_LAYER_INDICES = [12, 13, 14, 15]  # Alternative: specific layers to remove

# Test prompts for evaluation
TEST_PROMPTS = [
    "Paris is the capital of",
    "The theory of relativity states that",
    "Machine learning is a field of",
]

print("Configuration set successfully!")
print(f"Models to test: {MODELS_TO_TEST}")
print(f"Layers to remove: {NUM_LAYERS_TO_REMOVE}")
print(f"Depth pruning percentage: {DEPTH_PRUNING_PERCENTAGE}%")
print(f"Custom layer indices: {CUSTOM_LAYER_INDICES}")

## Introduction to Depth Pruning

---
Depth pruning removes entire transformer layers. This is more aggressive than neuron-level pruning, but it can lead to significant efficiency gains. The remaining layers keep their original shape, so the model architecture stays the same and only the total number of layers changes.
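
The idea can be illustrated without any model at all. The sketch below is a toy, not OptiPFair's implementation: `make_block`, `run_stack`, and `remove_last_blocks` are hypothetical names, and each "block" is a plain Python function standing in for a transformer layer. It shows that dropping the last blocks shortens the stack while every remaining block still accepts and returns data of the same shape.

```python
def make_block(weight):
    """Toy stand-in for a transformer block with a residual connection."""
    def block(hidden):
        # The output has the same length as the input, as in a real block.
        return [h + weight * h for h in hidden]
    return block


def run_stack(blocks, hidden):
    """Pass the hidden state through every block in order."""
    for block in blocks:
        hidden = block(hidden)
    return hidden


def remove_last_blocks(blocks, num_to_remove):
    """Return a new stack without the last `num_to_remove` blocks."""
    if not 0 < num_to_remove < len(blocks):
        raise ValueError("Remove at least one block and keep at least one")
    return blocks[:-num_to_remove]


blocks = [make_block(0.1) for _ in range(16)]
pruned = remove_last_blocks(blocks, 4)

hidden = [1.0, 2.0, 3.0]
full_out = run_stack(blocks, hidden)
pruned_out = run_stack(pruned, hidden)

print(f"Blocks: {len(blocks)} -> {len(pruned)}")
print(f"Output length unchanged: {len(full_out) == len(pruned_out)}")
```

The pruned stack produces a different output from the full one, which is why a depth-pruned model usually needs evaluation, and often fine-tuning, before use.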

## Utility Functions

In [None]:
def count_parameters(model):
    """Count total parameters in model"""
    return sum(p.numel() for p in model.parameters())

def count_layers(model):
    """Count the number of transformer layers in the model"""
    from optipfair.pruning.utils import get_model_layers
    layers = get_model_layers(model)
    return len(layers) if layers else 0

def test_model_generation(model, tokenizer, prompt, max_length=50):
    """Test text generation with the model"""
    inputs = tokenizer(prompt, return_tensors='pt').to(device)

    with torch.no_grad():
        outputs = model.generate(
            inputs['input_ids'],
            attention_mask=inputs['attention_mask'],
            max_length=max_length,
            num_return_sequences=1,
            pad_token_id=tokenizer.pad_token_id,
            do_sample=False,
            num_beams=3,
            early_stopping=True,
            no_repeat_ngram_size=2
        )

    return tokenizer.decode(outputs[0], skip_special_tokens=True)

def print_model_info(model, model_name, stage=""):
    """Print basic model information"""
    param_count = count_parameters(model)
    layer_count = count_layers(model)
    print(f"{stage} Model: {model_name}")
    print(f"Parameters: {param_count:,}")
    print(f"Layers: {layer_count}")
    return param_count, layer_count

def cleanup_memory():
    """Clean up GPU memory - important for Colab"""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

print("Utility functions defined successfully!")


## Depth Pruning Parameters Explanation
• **model**: The model to be pruned

• **num_layers_to_remove**: Number of layers to remove (mutually exclusive with layer_indices and depth_pruning_percentage)

• **layer_indices**: Specific layer indices to remove (mutually exclusive with num_layers_to_remove and depth_pruning_percentage)

• **depth_pruning_percentage**: Percentage of layers to remove (0-100) (mutually exclusive with num_layers_to_remove and layer_indices)

• **layer_selection_method**: Method for selecting layers:
  - 'last': Remove the last N layers (recommended for maintaining performance)
  - 'custom': Remove specific layers (requires layer_indices)
  
• **show_progress**: Display progress bar during pruning
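
To make the "mutually exclusive" rule concrete, here is a minimal sketch of how the three options could be resolved into a list of layer indices. `resolve_layers_to_remove` is a hypothetical helper written for this notebook, not part of OptiPFair, and two of its behaviors are assumptions: percentages are rounded down to a whole number of layers, and the 'last' selection method is used whenever explicit indices are not given.

```python
def resolve_layers_to_remove(total_layers, num_layers_to_remove=None,
                             layer_indices=None, depth_pruning_percentage=None):
    """Hypothetical helper: turn exactly one option into sorted layer indices."""
    options = (num_layers_to_remove, layer_indices, depth_pruning_percentage)
    if sum(opt is not None for opt in options) != 1:
        raise ValueError("Specify exactly one of the three options")

    if layer_indices is not None:
        # Custom selection: de-duplicate and check the range.
        indices = sorted(set(layer_indices))
        if any(i < 0 or i >= total_layers for i in indices):
            raise ValueError(f"Indices must be in [0, {total_layers - 1}]")
    else:
        if depth_pruning_percentage is not None:
            # Assumption: round down to a whole number of layers.
            count = int(total_layers * depth_pruning_percentage / 100)
        else:
            count = num_layers_to_remove
        # 'last' selection: take the final `count` layers.
        indices = list(range(total_layers - count, total_layers))

    if not 0 < len(indices) < total_layers:
        raise ValueError("Remove at least one layer and keep at least one")
    return indices


print(resolve_layers_to_remove(16, num_layers_to_remove=4))
print(resolve_layers_to_remove(16, depth_pruning_percentage=25))
print(resolve_layers_to_remove(16, layer_indices=[3, 1, 3]))
```

For a 16-layer model, removing 4 layers and removing 25% of the layers select the same indices, which is why Examples 1 and 2 below are expected to report similar reductions with the default configuration.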

## Example 1 - Depth Pruning by Number of Layers

In [None]:
def example_depth_pruning_by_count(model, tokenizer, model_name):
    """Example of depth pruning by removing a specific number of layers"""
    print(f"=== Example 1: Removing {NUM_LAYERS_TO_REMOVE} layers from {model_name} ===")

    # Get original model info
    original_params, original_layers = print_model_info(model, model_name, "Original")

    # Test original model
    print("\n--- Original Model Generation ---")
    for prompt in TEST_PROMPTS[:2]:  # Test first 2 prompts
        generated = test_model_generation(model, tokenizer, prompt)
        print(f"Prompt: '{prompt}'")
        print(f"Generated: {generated}")
        print()

    # Apply depth pruning by number of layers
    pruned_model = prune_model_depth(
        model=model,
        num_layers_to_remove=NUM_LAYERS_TO_REMOVE,
        layer_selection_method="last",  # Remove the last layers
        show_progress=True
    )

    # Get pruned model info
    pruned_params, pruned_layers = print_model_info(pruned_model, model_name, "\n--- Pruned")

    # Calculate reduction
    param_reduction = original_params - pruned_params
    param_reduction_pct = (param_reduction / original_params) * 100
    layer_reduction = original_layers - pruned_layers
    
    print(f"\n--- Pruning Results ---")
    print(f"Parameter reduction: {param_reduction:,} ({param_reduction_pct:.2f}%)")
    print(f"Layer reduction: {layer_reduction} layers ({layer_reduction / original_layers * 100:.2f}%)")

    # Test pruned model
    print("\n--- Pruned Model Generation ---")
    for prompt in TEST_PROMPTS[:2]:
        generated = test_model_generation(pruned_model, tokenizer, prompt)
        print(f"Prompt: '{prompt}'")
        print(f"Generated: {generated}")
        print()

    return pruned_model, {
        'original_parameters': original_params,
        'pruned_parameters': pruned_params,
        'parameter_reduction': param_reduction,
        'parameter_reduction_pct': param_reduction_pct,
        'original_layers': original_layers,
        'pruned_layers': pruned_layers,
        'layer_reduction': layer_reduction
    }

print("Example 1 function defined!")


## Example 2 - Depth Pruning by Percentage

In [None]:
def example_depth_pruning_by_percentage(model, tokenizer, model_name):
    """Example of depth pruning by removing a percentage of layers"""
    print(f"=== Example 2: Removing {DEPTH_PRUNING_PERCENTAGE}% of layers from {model_name} ===")

    # Get original model info
    original_params, original_layers = print_model_info(model, model_name, "Original")

    # Apply depth pruning by percentage
    pruned_model = prune_model_depth(
        model=model,
        depth_pruning_percentage=DEPTH_PRUNING_PERCENTAGE,
        layer_selection_method="last",  # Remove the last layers
        show_progress=True
    )

    # Get pruned model info
    pruned_params, pruned_layers = print_model_info(pruned_model, model_name, "\n--- Pruned")

    # Calculate reduction
    param_reduction = original_params - pruned_params
    param_reduction_pct = (param_reduction / original_params) * 100
    layer_reduction = original_layers - pruned_layers
    
    print(f"\n--- Pruning Results ---")
    print(f"Target layer reduction: {DEPTH_PRUNING_PERCENTAGE}%")
    print(f"Actual layer reduction: {layer_reduction} layers ({layer_reduction / original_layers * 100:.2f}%)")
    print(f"Parameter reduction: {param_reduction:,} ({param_reduction_pct:.2f}%)")

    # Test pruned model with one prompt
    print("\n--- Pruned Model Generation ---")
    prompt = TEST_PROMPTS[0]
    generated = test_model_generation(pruned_model, tokenizer, prompt)
    print(f"Prompt: '{prompt}'")
    print(f"Generated: {generated}")
    print()

    return pruned_model, {
        'original_parameters': original_params,
        'pruned_parameters': pruned_params,
        'parameter_reduction': param_reduction,
        'parameter_reduction_pct': param_reduction_pct,
        'original_layers': original_layers,
        'pruned_layers': pruned_layers,
        'layer_reduction': layer_reduction
    }

print("Example 2 function defined!")


## Example 3 - Custom Layer Selection

In [None]:
def example_depth_pruning_custom_layers(model, tokenizer, model_name):
    """Example of depth pruning by removing specific layers"""
    print(f"=== Example 3: Removing custom layers {CUSTOM_LAYER_INDICES} from {model_name} ===")

    # Get original model info
    original_params, original_layers = print_model_info(model, model_name, "Original")
    
    # Validate custom indices
    valid_indices = [i for i in CUSTOM_LAYER_INDICES if 0 <= i < original_layers]
    if len(valid_indices) != len(CUSTOM_LAYER_INDICES):
        print(f"Warning: Some custom indices are invalid. Using valid indices: {valid_indices}")
    
    if not valid_indices:
        print("Error: No valid layer indices to remove")
        return None, None

    # Apply depth pruning with custom layer indices
    pruned_model = prune_model_depth(
        model=model,
        layer_indices=valid_indices,
        show_progress=True
    )

    # Get pruned model info
    pruned_params, pruned_layers = print_model_info(pruned_model, model_name, "\n--- Pruned")

    # Calculate reduction
    param_reduction = original_params - pruned_params
    param_reduction_pct = (param_reduction / original_params) * 100
    layer_reduction = original_layers - pruned_layers  # Measured, as in Examples 1 and 2
    
    print(f"\n--- Pruning Results ---")
    print(f"Removed layers: {valid_indices}")
    print(f"Layer reduction: {layer_reduction} layers ({layer_reduction / original_layers * 100:.2f}%)")
    print(f"Parameter reduction: {param_reduction:,} ({param_reduction_pct:.2f}%)")

    # Test pruned model with one prompt
    print("\n--- Pruned Model Generation ---")
    prompt = TEST_PROMPTS[0]
    generated = test_model_generation(pruned_model, tokenizer, prompt)
    print(f"Prompt: '{prompt}'")
    print(f"Generated: {generated}")
    print()

    return pruned_model, {
        'original_parameters': original_params,
        'pruned_parameters': pruned_params,
        'parameter_reduction': param_reduction,
        'parameter_reduction_pct': param_reduction_pct,
        'original_layers': original_layers,
        'pruned_layers': pruned_layers,
        'layer_reduction': layer_reduction,
        'removed_indices': valid_indices
    }

print("Example 3 function defined!")

## Run Example 1 - Depth Pruning by Layer Count

In [None]:
print("Starting Example 1: Depth Pruning by Layer Count")
print("=" * 50)

# Process the first model in the list
model_name = MODELS_TO_TEST[0]
print(f"Loading model: {model_name}")

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16 if device.type == 'cuda' else torch.float32,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Set pad token if not present
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Run Example 1
pruned_model_1, stats_1 = example_depth_pruning_by_count(model, tokenizer, model_name)

# Store stats for summary
results = [{
    'model': model_name,
    'method': 'Layer Count',
    'param_reduction': stats_1['parameter_reduction_pct'],
    'layer_reduction': stats_1['layer_reduction'],
    'layers_removed': f"{stats_1['layer_reduction']}/{stats_1['original_layers']}"
}]

print(f"\nExample 1 completed! Parameter reduction: {stats_1['parameter_reduction_pct']:.2f}%")

## Run Example 2 - Depth Pruning by Percentage

In [None]:
print("Starting Example 2: Depth Pruning by Percentage")
print("=" * 50)

# Drop references to the previous model so cleanup can actually free its memory
model = pruned_model_1 = None
cleanup_memory()

# Reload model for second example (since first one was modified)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16 if device.type == 'cuda' else torch.float32,
    device_map="auto"
)

# Run Example 2
pruned_model_2, stats_2 = example_depth_pruning_by_percentage(model, tokenizer, model_name)

# Add to results
results.append({
    'model': model_name,
    'method': 'Percentage',
    'param_reduction': stats_2['parameter_reduction_pct'],
    'layer_reduction': stats_2['layer_reduction'],
    'layers_removed': f"{stats_2['layer_reduction']}/{stats_2['original_layers']}"
})

print(f"\nExample 2 completed! Parameter reduction: {stats_2['parameter_reduction_pct']:.2f}%")

## Run Example 3 - Custom Layer Selection

In [None]:
print("Starting Example 3: Custom Layer Selection")
print("=" * 50)

# Drop references to the previous model so cleanup can actually free its memory
model = pruned_model_2 = None
cleanup_memory()

# Reload model for third example (since previous ones were modified)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16 if device.type == 'cuda' else torch.float32,
    device_map="auto"
)

# Run Example 3
pruned_model_3, stats_3 = example_depth_pruning_custom_layers(model, tokenizer, model_name)

if stats_3 is not None:
    # Add to results
    results.append({
        'model': model_name,
        'method': 'Custom Layers',
        'param_reduction': stats_3['parameter_reduction_pct'],
        'layer_reduction': stats_3['layer_reduction'],
        'layers_removed': f"{stats_3['layer_reduction']}/{stats_3['original_layers']}",
        'custom_indices': stats_3['removed_indices']
    })
    
    print(f"\nExample 3 completed! Parameter reduction: {stats_3['parameter_reduction_pct']:.2f}%")
else:
    print("\nExample 3 skipped due to invalid layer indices.")

## Results Summary

In [None]:
print("\n" + "="*70)
print("DEPTH PRUNING RESULTS SUMMARY")
print("="*70)
print(f"{'Model':<30} {'Method':<15} {'Param Reduction':<15} {'Layers Removed':<15}")
print("-" * 70)

for result in results:
    param_str = f"{result['param_reduction']:.2f}%"  # Keep the % sign attached to the number
    print(f"{result['model']:<30} {result['method']:<15} {param_str:<15} {result['layers_removed']:<15}")
    if 'custom_indices' in result:
        print(f"{'':>30} {'':>15} Custom indices: {result['custom_indices']}")

print(f"\nTotal examples tested: {len(results)}")
print("Depth pruning examples completed successfully!")

---
## ✅ Success! What's Next?

Congratulations! You've successfully performed depth pruning on a transformer model and seen how OptiPFair makes layer removal simple and configurable.

**Key takeaways from depth pruning:**
- Depth pruning removes entire layers, leading to significant parameter reduction
- The "last" layer selection method is generally recommended for maintaining performance
- Custom layer selection gives you fine-grained control over which layers to remove
- Percentage-based pruning allows for consistent reduction ratios across different models
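
The first takeaway can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a Llama-style decoder block without bias terms (query, key, value, and output projections, a gated MLP with three matrices, and two normalization layers) and tied input/output embeddings. The helper names are hypothetical, and the dimensions are illustrative assumptions for a small model; read the real values from `model.config` before relying on the numbers.

```python
def decoder_layer_params(hidden, intermediate, num_heads, num_kv_heads):
    """Approximate parameter count of one Llama-style decoder block (no biases)."""
    head_dim = hidden // num_heads
    kv_dim = head_dim * num_kv_heads
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q, o + k, v projections
    mlp = 3 * hidden * intermediate                        # gate, up, down projections
    norms = 2 * hidden                                     # two norm layers per block
    return attention + mlp + norms


def estimate_param_reduction(num_layers, layers_removed, layer_params,
                             vocab_size, hidden, tied_embeddings=True):
    """Fraction of all parameters removed when dropping whole blocks."""
    embeddings = vocab_size * hidden * (1 if tied_embeddings else 2)
    total = num_layers * layer_params + embeddings + hidden  # + final norm
    return layers_removed * layer_params / total


# Illustrative dimensions (assumptions, not read from a real checkpoint).
layer_params = decoder_layer_params(
    hidden=2048, intermediate=8192, num_heads=32, num_kv_heads=8
)
fraction = estimate_param_reduction(
    num_layers=16, layers_removed=4, layer_params=layer_params,
    vocab_size=128256, hidden=2048,
)

print(f"Parameters per layer: {layer_params:,}")
print(f"Estimated parameter reduction: {fraction:.2%}")
```

Because the embedding matrix is not touched by depth pruning, removing 25% of the layers removes less than 25% of the parameters. The gap is larger for small models with large vocabularies.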

If you found this notebook useful, the best way to support the OptiPFair project is by **starring it on GitHub**. Your support is a huge help in boosting the project's visibility and reaching more developers and researchers.

### ➡️ [**Star OptiPFair on GitHub**](https://github.com/peremartra/optipfair)

---
You can also follow my work and new projects on:

* **[LinkedIn](https://www.linkedin.com/in/pere-martra/)**
* **[X / Twitter](https://twitter.com/PereMartra)**