# 🔧 Adapter Tuning Basics

This notebook introduces the basics of Adapter Tuning, a parameter-efficient fine-tuning (PEFT) method.

## What is Adapter Tuning?

Adapter Tuning inserts small neural network modules (adapters) into a pre-trained model and then:
- **Freezes** the original model parameters
- **Trains** only the adapter parameters (plus, typically, the task-specific output head)
- **Achieves** performance comparable to full fine-tuning while training far fewer parameters
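
The freeze/train split above can be sketched with plain PyTorch. The toy module and the name-based freezing rule (a parameter is trainable only if `"adapter"` appears in its name) are hypothetical stand-ins for illustration, not how this project's `AdapterModel` is built:

```python
import torch
from torch import nn


class ToyLayerWithAdapter(nn.Module):
    """Hypothetical toy layer: a 'pre-trained' sublayer followed by a bottleneck adapter."""

    def __init__(self, hidden: int = 16, bottleneck: int = 4):
        super().__init__()
        self.sublayer = nn.Linear(hidden, hidden)          # stands in for pre-trained weights
        self.adapter_down = nn.Linear(hidden, bottleneck)  # adapter down-projection
        self.adapter_up = nn.Linear(bottleneck, hidden)    # adapter up-projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.sublayer(x)
        # Residual adapter: h + up(relu(down(h)))
        return h + self.adapter_up(torch.relu(self.adapter_down(h)))


def freeze_all_but_adapters(model: nn.Module) -> None:
    """Freeze every parameter whose name does not contain 'adapter' (assumed naming rule)."""
    for name, param in model.named_parameters():
        param.requires_grad = "adapter" in name


model = ToyLayerWithAdapter()
freeze_all_but_adapters(model)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable} / {total}")  # Trainable: 148 / 420

# Only the trainable (adapter) parameters are handed to the optimizer
optimizer = torch.optim.AdamW([p for p in model.parameters() if p.requires_grad], lr=1e-3)
```

After `backward()`, the frozen sublayer receives no gradients, so no optimizer state is kept for it either.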

## Architecture

```
Original layer: Input → Sublayer (attention or feed-forward) → Output
With adapter:   Input → Sublayer → Adapter → Output
```

Each adapter is a bottleneck module with a residual connection:
```
Input → Down-projection → Activation → Up-projection → (+ Input) → Output
```
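
The bottleneck above can be sketched in a few lines of NumPy. This is a minimal illustration assuming two dense projections with biases, a ReLU, and a residual add; it is not the `BottleneckAdapter` class used below, which may differ (for example, by adding a layer norm or using another initialization). Zero-initializing the up-projection, in the spirit of the near-identity initialization from the original adapter paper, makes the adapter start out as the identity:

```python
import numpy as np


def bottleneck_adapter(h, w_down, b_down, w_up, b_up):
    """Minimal bottleneck adapter: h + up(relu(down(h)))."""
    z = np.maximum(h @ w_down + b_down, 0.0)  # down-projection + ReLU
    return h + (z @ w_up + b_up)              # up-projection + residual connection


rng = np.random.default_rng(0)
d, m = 768, 64                       # hidden size and bottleneck size
h = rng.standard_normal((2, 10, d))  # (batch, seq_len, hidden)

w_down = rng.standard_normal((d, m)) * 0.01
b_down = np.zeros(m)
w_up = np.zeros((m, d))  # zero-initialized up-projection: adapter starts as the identity
b_up = np.zeros(d)

out = bottleneck_adapter(h, w_down, b_down, w_up, b_up)
print(out.shape)            # (2, 10, 768)
print(np.allclose(out, h))  # True
```

Because of the residual path, training only has to learn a small correction on top of the frozen layer's output.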

## 1. Setup Environment

In [None]:
# Install required packages (run once)
# !pip install -r ../requirements.txt

import sys
sys.path.append('..')

import torch
import numpy as np
from datasets import load_dataset

# Import our adapter tuning modules
from config import ModelConfig, AdapterConfig, TrainingConfig
from adapters import AdapterModel, BottleneckAdapter
from data import TextClassificationPreprocessor
from training import AdapterTrainer
from inference import AdapterInferencePipeline

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

## 2. Understanding Adapter Architecture

In [None]:
# Create a simple adapter to understand the architecture
input_size = 768  # BERT hidden size
adapter_size = 64  # Bottleneck size

adapter = BottleneckAdapter(
    input_size=input_size,
    adapter_size=adapter_size,
    dropout=0.1,
    activation="relu"
)

print("Adapter Architecture:")
print(adapter)

# Count parameters
total_params = sum(p.numel() for p in adapter.parameters())
print(f"\nAdapter parameters: {total_params:,}")
print(f"Bottleneck reduction factor: {input_size / adapter_size:.1f}x")

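# Test forward pass (eval mode disables dropout)
adapter.eval()
batch_size, seq_len = 2, 10
dummy_input = torch.randn(batch_size, seq_len, input_size)

with torch.no_grad():
    output = adapter(dummy_input)
print(f"\nInput shape: {dummy_input.shape}")
print(f"Output shape: {output.shape}")

# With the residual connection, output = input + adapter_delta. If the
# up-projection is initialized near zero, the adapter starts close to the identity.
max_delta = (output - dummy_input).abs().max().item()
print(f"Max |output - input|: {max_delta:.4f}")
print(f"Near-identity at initialization: {torch.allclose(dummy_input, output, atol=1e-3)}")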
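
The parameter count printed above can be checked by hand. Assuming `BottleneckAdapter` consists only of a down-projection and an up-projection, each with a bias (the project's class may add components such as a layer norm), a d → m → d adapter has 2·d·m + m + d parameters:

```python
def bottleneck_adapter_params(d: int, m: int) -> int:
    """Parameters of a d -> m -> d bottleneck with biases (assumes no layer norm)."""
    down = d * m + m  # down-projection weight + bias
    up = m * d + d    # up-projection weight + bias
    return down + up


print(bottleneck_adapter_params(768, 64))  # 99136
```

If the notebook prints a different number, the class likely includes extra parts; a standard LayerNorm over the hidden size, for instance, adds 2·d = 1,536 parameters.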

## 3. Create Adapter Model

In [None]:
# Configuration for a small model (for demo)
model_config = ModelConfig(
    model_name_or_path="distilbert-base-uncased",
    num_labels=2,
    max_length=128,
    task_type="classification"
)

adapter_config = AdapterConfig(
    adapter_size=32,  # Small adapter for demo
    adapter_dropout=0.1,
    adapter_activation="relu",
    adapter_location="both",  # Add to both the attention and feed-forward sublayers
    freeze_base_model=True
)

# Create adapter model
print("Creating adapter model...")
adapter_model = AdapterModel(model_config, adapter_config)

# Print model information
adapter_model.print_adapter_info()
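
A rough estimate of the adapter budget for this configuration follows from DistilBERT-base's 6 transformer layers and hidden size 768. The sketch assumes that `adapter_location="both"` means two adapters per layer and that each adapter is the plain two-projection bottleneck described earlier; the count from `print_adapter_info()` may differ if the project also counts other trainable modules (such as the classification head):

```python
def bottleneck_adapter_params(d: int, m: int) -> int:
    # d -> m -> d bottleneck with biases (assumes no layer norm)
    return (d * m + m) + (m * d + d)


def total_adapter_params(num_layers: int, adapters_per_layer: int, d: int, m: int) -> int:
    return num_layers * adapters_per_layer * bottleneck_adapter_params(d, m)


# DistilBERT-base: 6 layers, hidden size 768; "both" -> 2 adapters per layer (assumed)
estimate = total_adapter_params(num_layers=6, adapters_per_layer=2, d=768, m=32)
print(f"{estimate:,}")  # 599,424
```

Against the roughly 66 million parameters of DistilBERT-base, that is under 1% of the model.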

## 4. Load and Prepare Data

In [None]:
# Load a small dataset for demo
print("Loading dataset...")
dataset = load_dataset("imdb")

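# Take small, shuffled subsets for a quick demo.
# The IMDB splits are ordered by label, so the first N rows would all be negative.
train_dataset = dataset["train"].shuffle(seed=42).select(range(100))
eval_dataset = dataset["test"].shuffle(seed=42).select(range(50))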

print(f"Train examples: {len(train_dataset)}")
print(f"Eval examples: {len(eval_dataset)}")

# Show example
example = train_dataset[0]
print("\nExample:")
print(f"Text: {example['text'][:200]}...")
print(f"Label: {example['label']} ({'Positive' if example['label'] == 1 else 'Negative'})")
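
Small demo subsets are easy to get wrong: if a split is ordered by label, taking the first N rows yields a single class. Checking the label distribution catches this. The helper `label_distribution` is a hypothetical name for illustration; it accepts any list of labels, such as `train_dataset["label"]`:

```python
from collections import Counter


def label_distribution(labels):
    """Return {label: fraction} for a list of class labels, sorted by label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in sorted(counts.items())}


print(label_distribution([0, 1, 1, 0, 1]))  # {0: 0.4, 1: 0.6}
# Usage in this notebook: label_distribution(train_dataset["label"])
```

A result like `{0: 1.0}` means the subset contains only one class and the classifier cannot learn anything useful from it.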

## 5. Quick Training Demo

In [None]:
# Quick training configuration
training_config = TrainingConfig(
    output_dir="./demo_results",
    num_train_epochs=1,  # Just 1 epoch for demo
    per_device_train_batch_size=8,
    learning_rate=2e-3,
    evaluation_strategy="epoch",
    logging_steps=10,
    save_strategy="no",  # Don't save for demo
    freeze_base_model=True,
    train_adapters_only=True
)

# Create trainer
trainer = AdapterTrainer(
    model_config=model_config,
    adapter_config=adapter_config,
    training_config=training_config,
    task_type="classification"
)

# Build the adapter model via the trainer, then set up the tokenizer and preprocessor
adapter_model = trainer.setup_model()
tokenizer = trainer.tokenizer

preprocessor = TextClassificationPreprocessor(
    tokenizer=tokenizer,
    text_column="text",
    label_column="label",
    max_length=128
)

print("Starting quick training demo...")
train_result = trainer.train(
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    preprocessor=preprocessor
)

print("Training completed!")
print(f"Final loss: {train_result.training_loss:.4f}")

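## 6. Test Inference

Note: the training configuration above uses `save_strategy="no"`, so `./demo_results` may not contain the trained adapter weights. Make sure the adapters are saved to that directory (or point `model_path` at a saved run) before running this cell; otherwise loading may fail or the predictions will not reflect the training above.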

In [None]:
# Test the trained adapter
test_texts = [
    "This movie is absolutely fantastic!",
    "Terrible movie, complete waste of time.",
    "The movie was okay, nothing special.",
    "Amazing acting and great storyline!"
]

# Create inference pipeline
inference_pipeline = AdapterInferencePipeline(
    model_path="./demo_results",
    model_config=model_config,
    adapter_config=adapter_config
)

# Get predictions
predictions = inference_pipeline.classify_text(
    test_texts,
    return_all_scores=True
)

print("Inference Results:")
for text, pred in zip(test_texts, predictions):
    sentiment = "Positive" if pred[0]["label"] == "LABEL_1" else "Negative"
    confidence = pred[0]["score"]
    
    print(f"Text: {text}")
    print(f"Prediction: {sentiment} (confidence: {confidence:.3f})")
    print("-" * 50)
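
The confidence values above are class probabilities. For a two-label classifier they are typically obtained by applying a softmax to the model's logits; the Hugging Face text-classification pipeline does this by default for multi-class models, and it is an assumption that `AdapterInferencePipeline` behaves the same way. A minimal NumPy sketch with made-up logits:

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


logits = np.array([[-1.2, 2.3], [0.0, 0.0]])  # made-up logits for two texts
probs = softmax(logits)
pred_ids = probs.argmax(axis=-1)  # index of the winning label (1 corresponds to LABEL_1)
confidence = probs.max(axis=-1)   # probability of the winning label
print(pred_ids)  # [1 0]
```

Equal logits give a confidence of 0.5, so a score near 0.5 indicates the model is undecided between the two labels.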

## 7. Adapter Efficiency Analysis

In [None]:
# Analyze adapter efficiency
adapter_info = adapter_model.adapter_info

print("Adapter Efficiency Analysis:")
print(f"Total model parameters: {adapter_info['total_params']:,}")
print(f"Base model parameters: {adapter_info['base_params']:,}")
print(f"Adapter parameters: {adapter_info['total_adapter_params']:,}")
print(f"Adapter percentage: {adapter_info['adapter_percentage']:.2f}%")

# Compare with full fine-tuning of the base model (no adapters)
full_finetuning_params = adapter_info['base_params']
adapter_params = adapter_info['total_adapter_params']
reduction_factor = full_finetuning_params / adapter_params

print(f"\nParameter Reduction:")
print(f"Full fine-tuning would train: {full_finetuning_params:,} parameters")
print(f"Adapter tuning trains only: {adapter_params:,} parameters")
print(f"Reduction factor: {reduction_factor:.1f}x fewer parameters")

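# Storage estimate: float32 size of the weights each approach must save per task.
# This is not total training memory: adapter tuning still loads the frozen base
# model, but it needs no gradients or optimizer states for those frozen weights.
bytes_per_param = 4  # float32
full_memory_mb = (full_finetuning_params * bytes_per_param) / (1024 * 1024)
adapter_memory_mb = (adapter_params * bytes_per_param) / (1024 * 1024)

print("\nPer-task checkpoint size (float32, approximate):")
print(f"Full fine-tuning: {full_memory_mb:.1f} MB")
print(f"Adapter tuning (adapter weights only): {adapter_memory_mb:.1f} MB")
print(f"Storage savings per task: {full_memory_mb - adapter_memory_mb:.1f} MB")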
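
A more realistic picture of training memory counts gradients and optimizer state, which only trainable parameters need. The sketch below assumes float32 values and an Adam-style optimizer with two state buffers per trainable parameter, and it ignores activations, which can be substantial at long sequence lengths; `training_state_bytes` is a hypothetical helper name:

```python
def training_state_bytes(total_params: int, trainable_params: int,
                         bytes_per_value: int = 4, optimizer_states: int = 2) -> int:
    """Weights for all params + gradients and optimizer states for trainable params only.

    Assumes float32 and an Adam-style optimizer (two moment buffers); ignores activations.
    """
    weights = total_params * bytes_per_value
    grads = trainable_params * bytes_per_value
    optim = trainable_params * optimizer_states * bytes_per_value
    return weights + grads + optim


# Toy numbers (not from this notebook): 1,000 parameters, 10 of them trainable
print(training_state_bytes(1000, 1000))  # 16000 -> full fine-tuning
print(training_state_bytes(1000, 10))    # 4120  -> adapters only
```

With the values above, `training_state_bytes(adapter_info['base_params'], adapter_info['base_params'])` approximates full fine-tuning and `training_state_bytes(adapter_info['total_params'], adapter_params)` approximates adapter tuning.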

## 8. Adapter Visualization

In [None]:
import matplotlib.pyplot as plt

# Visualize adapter placement in the model
def visualize_adapter_placement(adapter_model):
    """Visualize where adapters are placed in the model"""
    
    # Count adapters in each layer
    adapter_counts = []
    layer_names = []
    
    # Get encoder layers
    if hasattr(adapter_model.base_model, 'distilbert'):
        encoder = adapter_model.base_model.distilbert.transformer
    elif hasattr(adapter_model.base_model, 'bert'):
        encoder = adapter_model.base_model.bert.encoder
    else:
        print("Model architecture not supported for visualization")
        return
    
    for i, layer in enumerate(encoder.layer):
        count = 0
        if hasattr(layer, 'attention_adapter'):
            count += 1
        if hasattr(layer, 'feedforward_adapter'):
            count += 1
        
        adapter_counts.append(count)
        layer_names.append(f"Layer {i}")
    
    # Create visualization
    plt.figure(figsize=(12, 6))
    
    # Bar plot of adapter counts
    plt.subplot(1, 2, 1)
    bars = plt.bar(layer_names, adapter_counts, color='skyblue', alpha=0.7)
    plt.title('Adapters per Layer')
    plt.xlabel('Transformer Layers')
    plt.ylabel('Number of Adapters')
    plt.xticks(rotation=45)
    
    # Add value labels on bars
    for bar, count in zip(bars, adapter_counts):
        if count > 0:
            plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.05,
                    str(count), ha='center', va='bottom')
    
    # Pie chart of parameter distribution (read from the model rather than a global)
    plt.subplot(1, 2, 2)
    info = adapter_model.adapter_info
    sizes = [info['base_params'], info['total_adapter_params']]
    labels = ['Base Model', 'Adapters']
    colors = ['lightcoral', 'skyblue']
    
    plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%', startangle=90)
    plt.title('Parameter Distribution')
    
    plt.tight_layout()
    plt.show()
    
    print(f"Total layers: {len(layer_names)}")
    print(f"Layers with adapters: {sum(1 for count in adapter_counts if count > 0)}")
    print(f"Total adapters: {sum(adapter_counts)}")

# Create visualization
visualize_adapter_placement(adapter_model)
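
The `hasattr` checks above depend on exactly how `AdapterModel` attaches adapters to each layer. A more tolerant alternative is to count adapter modules by name, as listed by PyTorch's `named_modules()` (assuming `AdapterModel` is an `nn.Module`). The naming pattern below (`layer.<i>.` followed by a module whose own name contains `adapter`) is a hypothetical assumption; adjust the regular expression and keyword to the names your model actually uses:

```python
import re
from collections import Counter


def adapters_per_layer(module_names, keyword="adapter"):
    """Count modules whose last name component contains `keyword`, grouped by layer index.

    Assumes names like 'transformer.layer.3.attention_adapter' (hypothetical pattern).
    Children of an adapter (e.g. '...attention_adapter.down') are not counted, unless
    their own names also contain `keyword`.
    """
    pattern = re.compile(r"\blayer\.(\d+)\.")
    counts = Counter()
    for name in module_names:
        leaf = name.rsplit(".", 1)[-1]
        match = pattern.search(name)
        if match and keyword in leaf:
            counts[int(match.group(1))] += 1
    return dict(sorted(counts.items()))


names = [
    "transformer.layer.0.attention",
    "transformer.layer.0.attention_adapter",
    "transformer.layer.0.attention_adapter.down",
    "transformer.layer.0.feedforward_adapter",
    "transformer.layer.1.feedforward_adapter",
]
print(adapters_per_layer(names))  # {0: 2, 1: 1}
# Usage: adapters_per_layer(name for name, _ in adapter_model.named_modules())
```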

## 9. Key Takeaways

From this notebook, you learned:

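1. **Adapter Architecture**: Small bottleneck modules with residual connections, inserted into each transformer layer
2. **Parameter Efficiency**: Adapters typically add only a few percent of the base model's parameters (the original paper reports roughly 0.5-8% per task, depending on adapter size)
3. **Training Process**: Freeze the base model and train only the adapters
4. **Performance**: Often close to full fine-tuning; the original paper reports results within 0.4% of full fine-tuning on GLUE (a 1-epoch, 100-example demo like this one will not reach that)
5. **Memory Benefits**: No gradients or optimizer states for the frozen weights during training, and much smaller per-task checkpoints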

## Next Steps

- Try different adapter sizes and configurations
- Experiment with different tasks (NER, QA, etc.)
- Explore multi-task learning with multiple adapters
- Compare adapters with other parameter-efficient methods such as LoRA

## Resources

- [Original Adapter Paper](https://arxiv.org/abs/1902.00751)
- [AdapterHub](https://adapterhub.ml/)
- [Parameter-Efficient Transfer Learning Survey](https://arxiv.org/abs/2106.04647)