# LayerWise-QAT: Sensitivity-Ordered Quantization

This notebook runs LayerWise-QAT, an extension of EfficientQAT that orders block-wise training by layer sensitivity, and compares it against the original EfficientQAT baseline.

## Key Features:
- Sensitivity-ordered block training
- Adaptive learning rate scaling
- Multiple sensitivity metrics (Fisher, Gradient, Hessian)
- Memory-optimized for A100-40GB
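
The first two features can be illustrated with a minimal, framework-free sketch. It is not the repository's implementation: the function names, the choice to train the most sensitive blocks first, and the linear learning-rate scaling range (`min_scale`, `max_scale`) are all assumptions made for illustration.

```python
from typing import Dict, List


def order_blocks(sensitivity: Dict[int, float], descending: bool = True) -> List[int]:
    """Return block indices sorted by sensitivity score.

    Assumption: the most sensitive blocks are trained first (descending=True).
    """
    return sorted(sensitivity, key=lambda idx: sensitivity[idx], reverse=descending)


def scale_learning_rates(
    sensitivity: Dict[int, float],
    base_lr: float,
    min_scale: float = 0.5,  # hypothetical lower bound on the LR multiplier
    max_scale: float = 2.0,  # hypothetical upper bound on the LR multiplier
) -> Dict[int, float]:
    """Map each block to a learning rate that grows linearly with its sensitivity."""
    lo, hi = min(sensitivity.values()), max(sensitivity.values())
    span = hi - lo
    lrs = {}
    for idx, score in sensitivity.items():
        # Normalize the score to [0, 1]; fall back to the midpoint if all scores are equal.
        t = (score - lo) / span if span > 0 else 0.5
        lrs[idx] = base_lr * (min_scale + t * (max_scale - min_scale))
    return lrs


# Placeholder scores for four blocks (illustrative values, not measurements).
scores = {0: 0.9, 1: 0.1, 2: 0.4, 3: 0.7}
order = order_blocks(scores)
lrs = scale_learning_rates(scores, base_lr=1e-4)
print("Training order:", order)
print("Per-block learning rates:", lrs)
```

With these placeholder scores, block 0 is trained first and receives the largest learning rate, while block 1 is trained last with the smallest.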

In [None]:
# Check GPU and setup environment
import torch
print(f'CUDA Available: {torch.cuda.is_available()}')
if torch.cuda.is_available():
    print(f'GPU: {torch.cuda.get_device_name(0)}')
    print(f'GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GiB')
else:
    print('No GPU available: the training cells below require CUDA.')

# Install required packages
!pip install -q transformers accelerate datasets lm_eval triton sentencepiece protobuf

In [None]:
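# Clone the repository into /content (if needed) and work from there.
# Note: the --layer_ordering / --sensitivity_* flags used below assume a
# checkout that includes the LayerWise-QAT changes.
import os
if not os.path.exists('/content/EfficientQAT'):
    !git clone https://github.com/OpenGVLab/EfficientQAT.git /content/EfficientQAT
%cd /content/EfficientQAT

# Install remaining requirements
!pip install -r requirements.txt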

## Step 1: Test Original EfficientQAT Baseline

First, let's validate that the original EfficientQAT works in this environment.

In [None]:
# Quick baseline test with small dataset
!python main_block_ap.py \
    --model meta-llama/Llama-2-7b-hf \
    --wbits 3 \
    --group_size 128 \
    --calib_dataset redpajama \
    --train_size 64 \
    --val_size 16 \
    --epochs 1 \
    --output_dir ./test_baseline \
    --max_memory "35GiB" \
    --eval_ppl

## Step 2: Test LayerWise-QAT with Sensitivity Ordering

Now let's test our new sensitivity-ordered training.
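
The cells below select the sensitivity metric with `--sensitivity_metric`. As a rough sketch of how the gradient and Fisher metrics can differ, the snippet below scores one block from per-sample gradients given as plain lists. These are common textbook definitions assumed for illustration, not necessarily the ones the training script uses, and the function names are hypothetical.

```python
import math
from typing import Sequence


def gradient_sensitivity(per_sample_grads: Sequence[Sequence[float]]) -> float:
    """First-order proxy: mean L2 norm of the per-sample gradients."""
    norms = [math.sqrt(sum(g * g for g in grad)) for grad in per_sample_grads]
    return sum(norms) / len(norms)


def fisher_sensitivity(per_sample_grads: Sequence[Sequence[float]]) -> float:
    """Trace of the empirical Fisher diagonal: squared gradients summed over
    parameters, then averaged over samples."""
    squared = [sum(g * g for g in grad) for grad in per_sample_grads]
    return sum(squared) / len(squared)


# Two toy samples with two parameters each (illustrative values only).
grads = [[3.0, 4.0], [0.0, 0.0]]
grad_score = gradient_sensitivity(grads)
fisher_score = fisher_sensitivity(grads)
print(f"gradient: {grad_score}, fisher: {fisher_score}")
```

Because the Fisher variant squares the gradients before averaging, it weights samples with large gradients more heavily than the plain gradient-norm variant does.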

In [None]:
# Test LayerWise-QAT with gradient sensitivity (fastest)
!python main_block_ap.py \
    --model meta-llama/Llama-2-7b-hf \
    --wbits 3 \
    --group_size 128 \
    --calib_dataset redpajama \
    --train_size 64 \
    --val_size 16 \
    --epochs 1 \
    --layer_ordering sensitivity \
    --sensitivity_metric gradient \
    --sensitivity_samples 16 \
    --output_dir ./test_layerwise_gradient \
    --max_memory "35GiB" \
    --eval_ppl

In [None]:
# Test LayerWise-QAT with Fisher sensitivity
!python main_block_ap.py \
    --model meta-llama/Llama-2-7b-hf \
    --wbits 3 \
    --group_size 128 \
    --calib_dataset redpajama \
    --train_size 64 \
    --val_size 16 \
    --epochs 1 \
    --layer_ordering sensitivity \
    --sensitivity_metric fisher \
    --sensitivity_samples 16 \
    --output_dir ./test_layerwise_fisher \
    --max_memory "35GiB" \
    --eval_ppl

In [None]:
# Test adaptive learning rate scaling at 2-bit, group size 64, with a larger calibration set
!python main_block_ap.py \
    --model meta-llama/Llama-2-7b-hf \
    --wbits 2 \
    --group_size 64 \
    --calib_dataset redpajama \
    --train_size 128 \
    --val_size 32 \
    --epochs 1 \
    --layer_ordering sensitivity \
    --sensitivity_metric gradient \
    --sensitivity_samples 32 \
    --adaptive_lr_scaling \
    --output_dir ./test_adaptive_lr \
    --max_memory "35GiB" \
    --eval_ppl

## Step 3: Full Experiment Comparison

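Run the comparison script to evaluate the baseline and the LayerWise-QAT variants on larger datasets. The analysis in Step 4 reads its results from `./comparison_results/comparison_results.json`.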
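
The internals of `compare_methods.py` are not shown in this notebook. As a hedged sketch of what such a driver might assemble, the snippet below builds one command line per method from the flags used in the cells above. The method names, the `run_<i>` output directories, and the overall structure are assumptions; the snippet only prints the commands and does not launch any training.

```python
import shlex
from typing import Dict, List

# Flags shared by every run, taken from the test cells above.
BASE_CMD: List[str] = [
    "python", "main_block_ap.py",
    "--model", "meta-llama/Llama-2-7b-hf",
    "--wbits", "3",
    "--group_size", "128",
    "--calib_dataset", "redpajama",
    "--epochs", "1",
    "--max_memory", "35GiB",
    "--eval_ppl",
]

# Hypothetical method names mapped to their extra flags.
METHODS: Dict[str, List[str]] = {
    "Baseline": [],
    "LayerWise-QAT (gradient)": [
        "--layer_ordering", "sensitivity", "--sensitivity_metric", "gradient",
    ],
    "LayerWise-QAT (fisher)": [
        "--layer_ordering", "sensitivity", "--sensitivity_metric", "fisher",
    ],
}


def build_command(extra_args: List[str], output_dir: str) -> List[str]:
    """Combine the shared flags, method-specific flags, and an output directory."""
    return BASE_CMD + list(extra_args) + ["--output_dir", output_dir]


commands = {
    name: build_command(extra, f"./comparison_results/run_{i}")  # assumed layout
    for i, (name, extra) in enumerate(METHODS.items())
}
for name, cmd in commands.items():
    print(f"{name}: {shlex.join(cmd)}")
```

Keeping the word "Baseline" in the reference run's method name matters here, since the analysis cell in Step 4 identifies the baseline by that substring.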

In [None]:
# Run comparison script
!python compare_methods.py --model meta-llama/Llama-2-7b-hf

## Step 4: Analyze Results

Load the comparison results, tabulate each successful run, and report the training speedup of each method over the baseline.
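
Speed is only one axis of comparison. A small helper like the one below could report the relative change in WikiText2 perplexity against the baseline as well. It is a sketch with a hypothetical function name, and it returns `None` whenever a value is missing, such as the `'N/A'` default used for results without a perplexity entry.

```python
from typing import Optional, Union

Number = Union[int, float]


def relative_ppl_change(baseline_ppl: object, method_ppl: object) -> Optional[float]:
    """Percent change in perplexity relative to the baseline.

    Negative values mean the method has lower (better) perplexity.
    Returns None if either value is non-numeric or the baseline is not positive.
    """
    if not isinstance(baseline_ppl, (int, float)) or not isinstance(method_ppl, (int, float)):
        return None
    if baseline_ppl <= 0:
        return None
    return 100.0 * (method_ppl - baseline_ppl) / baseline_ppl


# Placeholder inputs to show the behavior (not measured results).
change = relative_ppl_change(10.0, 9.0)
missing = relative_ppl_change(10.0, "N/A")
print(change, missing)
```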

In [None]:
import json
import pandas as pd

# Load and display results
try:
    with open('./comparison_results/comparison_results.json', 'r') as f:
        results = json.load(f)
    
    # Create results dataframe
    df_data = []
    for result in results:
        if result.get('success', False):
            row = {
                'Method': result.get('method', 'Unknown'),
                'Duration (s)': result.get('duration', 0),
                'WikiText2 PPL': result.get('wikitext2_ppl', 'N/A'),
                'Avg Accuracy (%)': result.get('avg_accuracy', 'N/A')
            }
            df_data.append(row)
    
    if df_data:
        df = pd.DataFrame(df_data)
        print("=== LayerWise-QAT Results Comparison ===")
        print(df.to_string(index=False))
        
        # Calculate speedup relative to the baseline run
        baseline = next((r for r in df_data if 'Baseline' in r['Method']), None)
        if baseline:
            print("\n=== Improvements over Baseline ===")
            base_time = baseline['Duration (s)']
            for row in df_data:
                if 'Baseline' in row['Method']:
                    continue
                run_time = row['Duration (s)']
                # Skip missing or zero durations to avoid a ZeroDivisionError
                if (isinstance(run_time, (int, float))
                        and isinstance(base_time, (int, float))
                        and run_time > 0):
                    speedup = base_time / run_time
                    print(f"{row['Method']}: {speedup:.2f}x speedup")
    else:
        print("No successful results found")
        
except FileNotFoundError:
    print("Results file not found. Run the comparison experiments first.")
except json.JSONDecodeError as err:
    print(f"Results file is not valid JSON: {err}")