# 🎯 Model Compression Practice: From Theory to Implementation

## Table of Contents
1. [Knowledge Distillation](#practice-1-knowledge-distillation)
2. [Temperature Scaling](#practice-2-temperature-scaling)
3. [INT8 Quantization](#practice-3-int8-quantization)
4. [Magnitude Pruning](#practice-4-magnitude-pruning)
5. [Complete Pipeline: Compress a Medical Image Classifier](#practice-5-complete-pipeline)

## Installing and Importing Essential Libraries

In [None]:
# Import essential libraries
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as transforms
from torchvision import datasets, models
import warnings
warnings.filterwarnings('ignore')

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"✅ Using device: {device}")
print(f"✅ PyTorch version: {torch.__version__}")

---
## Practice 1: Knowledge Distillation

### 🎯 Learning Objectives
- Understand the Teacher-Student framework
- Implement soft target training
- Learn how knowledge transfers from large to small models

### 📖 Key Concepts
**Knowledge Distillation:** Transfer knowledge from a large "Teacher" model to a smaller "Student" model
- **Teacher Model:** Large, accurate model (pre-trained)
- **Student Model:** Small, efficient model (to be trained)
- **Soft Targets:** Probability distributions from teacher (richer than hard labels)
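
To make "richer than hard labels" concrete, here is a minimal sketch comparing a one-hot label with a softened teacher distribution. The class names and teacher logits are hypothetical values chosen for illustration, not outputs of the models in this notebook.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical teacher logits for one image over three classes
classes = ["cat", "dog", "truck"]
teacher_logits = [4.0, 3.0, -2.0]

hard_label = [1.0, 0.0, 0.0]                   # one-hot: only says "cat"
soft_target = softmax(teacher_logits, T=3.0)   # keeps the ranking of the wrong classes

for name, h, s in zip(classes, hard_label, soft_target):
    print(f"{name:>5}: hard={h:.2f}  soft={s:.3f}")
```

The hard label treats "dog" and "truck" as equally wrong, while the soft target tells the student that "dog" is a much more plausible confusion than "truck".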

In [None]:
# 1.1 Prepare a simple dataset (CIFAR-10 subset for speed)
def prepare_data(num_samples=1000):
    """Prepare a small CIFAR-10 dataset for quick practice"""
    
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])
    
    # Load CIFAR-10
    trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
    testset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
    
    # Use subset for quick training
    train_subset = torch.utils.data.Subset(trainset, range(num_samples))
    test_subset = torch.utils.data.Subset(testset, range(num_samples // 5))
    
    train_loader = DataLoader(train_subset, batch_size=64, shuffle=True)
    test_loader = DataLoader(test_subset, batch_size=64, shuffle=False)
    
    print(f"✅ Training samples: {len(train_subset)}")
    print(f"✅ Test samples: {len(test_subset)}")
    
    return train_loader, test_loader

train_loader, test_loader = prepare_data()

In [None]:
# 1.2 Define Teacher and Student models
class TeacherModel(nn.Module):
    """Large teacher model (three conv layers, two fully connected layers)"""
    def __init__(self):
        super(TeacherModel, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, 3, padding=1)
        self.conv2 = nn.Conv2d(64, 128, 3, padding=1)
        self.conv3 = nn.Conv2d(128, 128, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(128 * 4 * 4, 256)
        self.fc2 = nn.Linear(256, 10)
        self.dropout = nn.Dropout(0.2)
        
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = x.view(-1, 128 * 4 * 4)
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.fc2(x)
        return x

class StudentModel(nn.Module):
    """Small student model (two narrow conv layers, two fully connected layers)"""
    def __init__(self):
        super(StudentModel, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32 * 8 * 8, 64)
        self.fc2 = nn.Linear(64, 10)
        
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 32 * 8 * 8)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Create models and show size comparison
teacher = TeacherModel().to(device)
student = StudentModel().to(device)

teacher_params = sum(p.numel() for p in teacher.parameters())
student_params = sum(p.numel() for p in student.parameters())

print(f"\n📊 Model Size Comparison:")
print(f"Teacher model: {teacher_params:,} parameters")
print(f"Student model: {student_params:,} parameters")
print(f"Compression ratio: {teacher_params / student_params:.1f}x smaller")

---
## Practice 2: Temperature Scaling

### 🎯 Learning Objectives
- Understand temperature parameter T in softmax
- Visualize how temperature affects probability distributions
- Implement distillation loss with temperature

### 📖 Key Concepts
**Temperature Scaling:** $p_i = \frac{\exp(z_i/T)}{\sum_j \exp(z_j/T)}$
- **T = 1:** Standard softmax (sharp distribution)
- **T = 3~5:** Commonly used range for distillation (soft distribution)
- **T → ∞:** Approaches the uniform distribution
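
A quick numeric check of these three regimes, as a sketch: it uses the same sample logits as cell 2.1 and measures "softness" with Shannon entropy. The entropy measure and the helper names `softmax_T` and `entropy` are illustrative choices, not part of the original notebook.

```python
import math

def softmax_T(logits, T):
    """p_i = exp(z_i / T) / sum_j exp(z_j / T)"""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; larger means closer to uniform."""
    return -sum(p * math.log(p) for p in probs if p > 0)

logits = [2.0, 1.0, 0.5, 0.1]          # same sample logits as cell 2.1
max_entropy = math.log(len(logits))    # entropy of the uniform distribution

entropies = []
for T in [1, 3, 5, 10, 100]:
    probs = softmax_T(logits, T)
    H = entropy(probs)
    entropies.append(H)
    print(f"T={T:>3}: max prob={max(probs):.3f}, entropy={H:.4f} (uniform = {max_entropy:.4f})")
```

Entropy rises with T and approaches `log(4)`, the value for a uniform distribution over four classes.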

In [None]:
# 2.1 Visualize temperature effects
def visualize_temperature_effect():
    """Show how temperature affects softmax output"""
    
    # Sample logits
    logits = torch.tensor([2.0, 1.0, 0.5, 0.1])
    temperatures = [1, 3, 5, 10]
    
    fig, axes = plt.subplots(1, 4, figsize=(16, 4))
    
    for idx, T in enumerate(temperatures):
        # Apply temperature scaling
        scaled_logits = logits / T
        probs = F.softmax(scaled_logits, dim=0).numpy()
        
        # Plot
        axes[idx].bar(range(4), probs, color=['#1E64C8', '#51cf66', '#ffa500', '#ff6b6b'])
        axes[idx].set_title(f'T = {T}', fontsize=14, fontweight='bold')
        axes[idx].set_ylim([0, 1])
        axes[idx].set_ylabel('Probability')
        axes[idx].set_xlabel('Class')
        
        # Add value labels
        for i, v in enumerate(probs):
            axes[idx].text(i, v + 0.02, f'{v:.3f}', ha='center', fontsize=10)
    
    plt.tight_layout()
    plt.savefig('/mnt/user-data/outputs/temperature_effect.png', dpi=150, bbox_inches='tight')
    plt.show()
    
    print("\n📊 Observation:")
    print("  • T=1: Sharp distribution (one class dominates)")
    print("  • T=3~5: Softer distribution (preserves class relationships)")
    print("  • T=10: Nearly uniform (loses discriminative information)")

visualize_temperature_effect()

In [None]:
# 2.2 Implement distillation loss
def distillation_loss(student_logits, teacher_logits, labels, T=3, alpha=0.7):
    """
    Calculate distillation loss combining soft and hard targets
    
    Args:
        student_logits: Student model outputs
        teacher_logits: Teacher model outputs
        labels: True labels
        T: Temperature for softening distributions
        alpha: Weight for soft loss (1-alpha for hard loss)
    """
    # Soft targets loss (KL divergence with temperature)
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    soft_student = F.log_softmax(student_logits / T, dim=1)
    soft_loss = F.kl_div(soft_student, soft_targets, reduction='batchmean') * (T * T)
    
    # Hard targets loss (standard cross-entropy)
    hard_loss = F.cross_entropy(student_logits, labels)
    
    # Combined loss
    total_loss = alpha * soft_loss + (1 - alpha) * hard_loss
    
    return total_loss, soft_loss, hard_loss

# Test the loss function
dummy_student = torch.randn(4, 10)
dummy_teacher = torch.randn(4, 10)
dummy_labels = torch.tensor([1, 3, 5, 7])

total, soft, hard = distillation_loss(dummy_student, dummy_teacher, dummy_labels)
print(f"\n📊 Distillation Loss Components:")
print(f"  Soft loss (from teacher): {soft.item():.4f}")
print(f"  Hard loss (from labels): {hard.item():.4f}")
print(f"  Total loss: {total.item():.4f}")
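
Why `soft_loss` is multiplied by `T * T`: the following is a sketch of the standard argument, not a derivation taken from this notebook. The notation is introduced here for illustration: $z_i$ are student logits, $v_i$ are teacher logits, $p_i$ and $q_i$ are their temperature-softened probabilities, and $N$ is the number of classes.

$$
\frac{\partial \mathcal{L}_{\text{soft}}}{\partial z_i} = \frac{1}{T}\,(p_i - q_i),
\qquad
p_i = \frac{\exp(z_i/T)}{\sum_j \exp(z_j/T)},
\quad
q_i = \frac{\exp(v_i/T)}{\sum_j \exp(v_j/T)}
$$

When $T$ is large compared with the logits, and assuming both sets of logits are zero-mean, $\exp(x/T) \approx 1 + x/T$ gives

$$
\frac{\partial \mathcal{L}_{\text{soft}}}{\partial z_i} \approx \frac{1}{N\,T^2}\,(z_i - v_i)
$$

So the soft-target gradient shrinks roughly as $1/T^2$. Scaling the soft loss by $T^2$ keeps its contribution comparable to the hard loss when `T` is changed, which is what the `* (T * T)` factor does.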

---
## Practice 3: INT8 Quantization

### 🎯 Learning Objectives
- Understand quantization from FP32 to INT8
- Implement post-training quantization
- Measure model size reduction

### 📖 Key Concepts
**Quantization:** Convert floating-point to integers
- **FP32:** 32 bits (4 bytes) - Default training precision
- **INT8:** 8 bits (1 byte) - 75% memory reduction
- **Benefits:** Faster inference, lower memory, better battery life
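
The FP32-to-8-bit mapping can be sketched with NumPy as an affine (min/max) quantization round trip. This is an illustrative scheme under simple assumptions: per-tensor min/max range and unsigned 8-bit codes (a signed INT8 layout differs only by an offset of 128). It is not the exact scheme PyTorch uses internally, and the helper names are hypothetical.

```python
import numpy as np

def quantize_uint8(x):
    """Map a float array onto 256 evenly spaced levels between its min and max."""
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / 255.0                      # width of one quantization step
    q = np.round((x - x_min) / scale).astype(np.uint8)   # integer codes in [0, 255]
    return q, scale, x_min

def dequantize(q, scale, x_min):
    """Recover approximate float values from the 8-bit codes."""
    return q.astype(np.float32) * scale + x_min

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=1000).astype(np.float32)  # stand-in for layer weights

q, scale, x_min = quantize_uint8(weights)
restored = dequantize(q, scale, x_min)
max_err = float(np.abs(weights - restored).max())

print(f"FP32 storage:  {weights.nbytes} bytes")
print(f"8-bit storage: {q.nbytes} bytes")
print(f"Step size (scale): {scale:.6f}")
print(f"Max round-trip error: {max_err:.6f} (bounded by about scale / 2)")
```

The 8-bit codes take one quarter of the storage, and the round-trip error per weight stays within about half a quantization step.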

In [None]:
# 3.1 Implement simple quantization
def quantize_model_simple(model):
    """
    Apply PyTorch dynamic quantization
    (Simple post-training quantization without calibration)
    """
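    # quantize_dynamic returns a quantized copy (inplace=False by default).
    # Dynamic quantization converts nn.Linear (and recurrent) layers only;
    # nn.Conv2d is not supported by this mode and stays in FP32.
    quantized_model = torch.quantization.quantize_dynamic(
        model,
        {nn.Linear},  # Layers to quantize
        dtype=torch.qint8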
    )
    
    return quantized_model

# Apply quantization to student model
student_fp32 = StudentModel().to('cpu')  # Quantization requires CPU
student_int8 = quantize_model_simple(student_fp32)

# Compare sizes
import os

def get_model_size(model):
    """Calculate the serialized model size in MB"""
    torch.save(model.state_dict(), "/tmp/temp_model.pth")
    size_mb = os.path.getsize("/tmp/temp_model.pth") / (1024 * 1024)
    return size_mb

size_fp32 = get_model_size(student_fp32)
size_int8 = get_model_size(student_int8)

print(f"\n📊 Quantization Results:")
print(f"  FP32 model size: {size_fp32:.2f} MB")
print(f"  INT8 model size: {size_int8:.2f} MB")
print(f"  Size reduction: {(1 - size_int8/size_fp32)*100:.1f}%")
print(f"  Compression ratio: {size_fp32/size_int8:.1f}x smaller")

In [None]:
# 3.2 Visualize weight distributions before and after quantization
def visualize_quantization():
    """Compare weight distributions"""
    
    # Get weights from first conv layer
    fp32_weights = student_fp32.conv1.weight.data.flatten().numpy()
    
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    
    # FP32 distribution
    axes[0].hist(fp32_weights, bins=50, color='#1E64C8', alpha=0.7, edgecolor='black')
    axes[0].set_title('FP32 Weights Distribution', fontsize=14, fontweight='bold')
    axes[0].set_xlabel('Weight Value')
    axes[0].set_ylabel('Frequency')
    axes[0].grid(alpha=0.3)
    
    # Simulated INT8 (quantized to 256 levels)
    min_val, max_val = fp32_weights.min(), fp32_weights.max()
    scale = (max_val - min_val) / 255
    int8_weights = np.round((fp32_weights - min_val) / scale)
    int8_weights = int8_weights * scale + min_val
    
    axes[1].hist(int8_weights, bins=50, color='#51cf66', alpha=0.7, edgecolor='black')
    axes[1].set_title('INT8 Weights Distribution', fontsize=14, fontweight='bold')
    axes[1].set_xlabel('Weight Value')
    axes[1].set_ylabel('Frequency')
    axes[1].grid(alpha=0.3)
    
    plt.tight_layout()
    plt.savefig('/mnt/user-data/outputs/quantization_comparison.png', dpi=150, bbox_inches='tight')
    plt.show()
    
    print(f"\n📊 Weight Statistics:")
    print(f"  FP32 - Min: {fp32_weights.min():.6f}, Max: {fp32_weights.max():.6f}")
    print(f"  INT8 - Unique values: {len(np.unique(int8_weights))} (vs 256 possible levels)")

visualize_quantization()

---
## Practice 4: Magnitude Pruning

### 🎯 Learning Objectives
- Understand magnitude-based pruning
- Implement weight pruning with different sparsity levels
- Visualize sparsity patterns

### 📖 Key Concepts
**Magnitude Pruning:** Remove weights with small absolute values
- Set small weights to zero → sparse network
- **Sparsity:** Percentage of zero weights (e.g., 50% = half of the weights are zero)
- **Threshold:** Weights whose magnitude falls below the threshold are set to 0
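
PyTorch also ships a pruning utility that applies the same idea with a mask. The sketch below uses `torch.nn.utils.prune.l1_unstructured` on a toy `nn.Linear` layer; the layer size and seed are arbitrary choices for illustration, and this is an alternative to the manual implementation in this practice, not the method it uses.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
layer = nn.Linear(8, 4)  # toy layer with 8 * 4 = 32 weights

# Zero out the 50% of weights with the smallest absolute value.
# This registers `weight_orig` (parameter) and `weight_mask` (buffer) on the layer.
prune.l1_unstructured(layer, name="weight", amount=0.5)

zeros = int((layer.weight == 0).sum().item())
sparsity = zeros / layer.weight.numel()
print(f"Zeros: {zeros}/{layer.weight.numel()} (sparsity {sparsity:.0%})")

# Make the pruning permanent: drop the mask and keep the zeroed weights.
prune.remove(layer, "weight")
param_names = sorted(name for name, _ in layer.named_parameters())
print(param_names)
```

Keeping the mask (that is, not calling `prune.remove`) is useful when fine-tuning after pruning, because pruned weights then stay at zero during training.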

In [None]:
# 4.1 Implement magnitude pruning
def magnitude_pruning(model, sparsity=0.5):
    """
    Apply magnitude-based pruning to all Conv2d and Linear layers
    
    Args:
        model: PyTorch model
        sparsity: Fraction of weights to prune (0 to 1)
    """
    print(f"\n🔪 Applying {sparsity*100:.0f}% magnitude pruning...\n")
    
    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            # Get weights
            weights = module.weight.data
            
            # Calculate threshold
            threshold = torch.quantile(torch.abs(weights), sparsity)
            
            # Create mask
            mask = torch.abs(weights) > threshold
            
            # Apply mask
            weights *= mask.float()
            
            # Statistics
            total_params = weights.numel()
            zero_params = (weights == 0).sum().item()
            actual_sparsity = zero_params / total_params
            
            print(f"  {name:20s} | Sparsity: {actual_sparsity*100:5.1f}% | "
                  f"Params: {total_params:7,} | Zeros: {zero_params:7,}")
    
    return model

# Test pruning
student_pruned = StudentModel()
student_pruned = magnitude_pruning(student_pruned, sparsity=0.5)

In [None]:
# 4.2 Visualize pruning effect on different sparsity levels
def compare_sparsity_levels():
    """Compare different pruning levels"""
    
    sparsity_levels = [0.0, 0.3, 0.5, 0.7, 0.9]
    
    fig, axes = plt.subplots(1, 5, figsize=(20, 4))
    
    for idx, sparsity in enumerate(sparsity_levels):
        # Create fresh model and prune
        model = StudentModel()
        if sparsity > 0:
            model = magnitude_pruning(model, sparsity=sparsity)
        
        # Get first conv layer weights
        weights = model.conv1.weight.data[0, 0].numpy()  # First filter, first channel
        
        # Plot
        im = axes[idx].imshow(weights, cmap='RdBu_r', vmin=-0.5, vmax=0.5)
        axes[idx].set_title(f'Sparsity: {sparsity*100:.0f}%', fontsize=12, fontweight='bold')
        axes[idx].axis('off')
        
        # Count zeros
        zeros = (weights == 0).sum()
        total = weights.size
        axes[idx].text(0.5, -0.1, f'{zeros}/{total} zeros', 
                      ha='center', transform=axes[idx].transAxes, fontsize=10)
    
    plt.colorbar(im, ax=axes, fraction=0.046, pad=0.04)
    plt.tight_layout()
    plt.savefig('/mnt/user-data/outputs/pruning_visualization.png', dpi=150, bbox_inches='tight')
    plt.show()
    
    print("\n📊 Observation:")
    print("  • White pixels = zero weights (pruned)")
    print("  • Colored pixels = remaining weights")
    print("  • Higher sparsity = more white (more compression, potential accuracy loss)")

compare_sparsity_levels()

---
## Practice 5: Complete Pipeline - Compress a Medical Image Classifier

### 🎯 Learning Objectives
- Combine distillation + quantization + pruning
- Train a complete compressed model
- Evaluate the compression-accuracy tradeoff

### 📖 Key Concepts
**Complete Compression Pipeline:**
1. Train large teacher model (or use pre-trained)
2. Distill to smaller student
3. Apply pruning
4. Apply quantization
5. Measure final model size and accuracy

In [None]:
# 5.1 Quick training function
def train_one_epoch(model, train_loader, optimizer, criterion, device, use_distillation=False, teacher=None, T=3):
    """Train for one epoch"""
    model.train()
    if teacher is not None:
        teacher.eval()
    
    total_loss = 0
    correct = 0
    total = 0
    
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        
        optimizer.zero_grad()
        outputs = model(inputs)
        
        if use_distillation and teacher is not None:
            with torch.no_grad():
                teacher_outputs = teacher(inputs)
            loss, _, _ = distillation_loss(outputs, teacher_outputs, labels, T=T)
        else:
            loss = criterion(outputs, labels)
        
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item()
        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()
    
    return total_loss / len(train_loader), 100. * correct / total

def evaluate(model, test_loader, device):
    """Evaluate model accuracy"""
    model.eval()
    correct = 0
    total = 0
    
    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, predicted = outputs.max(1)
            total += labels.size(0)
            correct += predicted.eq(labels).sum().item()
    
    return 100. * correct / total

In [None]:
# 5.2 Complete compression pipeline
def complete_compression_pipeline():
    """Run the full model compression pipeline"""
    
    print("="*80)
    print("🚀 COMPLETE MODEL COMPRESSION PIPELINE")
    print("="*80)
    
    results = {}
    
    # Step 1: Train baseline teacher
    print("\n📚 Step 1: Training Teacher Model (3 epochs)...")
    teacher = TeacherModel().to(device)
    optimizer = torch.optim.Adam(teacher.parameters(), lr=0.001)
    criterion = nn.CrossEntropyLoss()
    
    for epoch in range(3):
        loss, acc = train_one_epoch(teacher, train_loader, optimizer, criterion, device)
        print(f"  Epoch {epoch+1}/3 - Loss: {loss:.4f}, Acc: {acc:.2f}%")
    
    teacher_acc = evaluate(teacher, test_loader, device)
    teacher_size = sum(p.numel() for p in teacher.parameters())
    results['teacher'] = {'accuracy': teacher_acc, 'params': teacher_size}
    print(f"  ✅ Teacher - Accuracy: {teacher_acc:.2f}%, Params: {teacher_size:,}")
    
    # Step 2: Train student with distillation
    print("\n🎓 Step 2: Training Student with Distillation (3 epochs)...")
    student = StudentModel().to(device)
    optimizer = torch.optim.Adam(student.parameters(), lr=0.001)
    
    for epoch in range(3):
        loss, acc = train_one_epoch(student, train_loader, optimizer, criterion, device,
                                    use_distillation=True, teacher=teacher, T=3)
        print(f"  Epoch {epoch+1}/3 - Loss: {loss:.4f}, Acc: {acc:.2f}%")
    
    student_acc = evaluate(student, test_loader, device)
    student_size = sum(p.numel() for p in student.parameters())
    results['student'] = {'accuracy': student_acc, 'params': student_size}
    print(f"  ✅ Student - Accuracy: {student_acc:.2f}%, Params: {student_size:,}")
    
    # Step 3: Apply pruning (on a copy, so the distilled student returned below stays dense)
    print("\n🔪 Step 3: Applying 50% Magnitude Pruning...")
    import copy
    student_pruned = magnitude_pruning(copy.deepcopy(student), sparsity=0.5)
    pruned_acc = evaluate(student_pruned, test_loader, device)
    pruned_size = sum((p != 0).sum().item() for p in student_pruned.parameters())
    results['pruned'] = {'accuracy': pruned_acc, 'params': pruned_size}
    print(f"  ✅ Pruned - Accuracy: {pruned_acc:.2f}%, Non-zero params: {pruned_size:,}")
    
    # Step 4: Apply quantization
    print("\n⚙️  Step 4: Applying INT8 Quantization...")
    student_pruned_cpu = student_pruned.to('cpu')
    student_quantized = quantize_model_simple(student_pruned_cpu)
    
    # Evaluate quantized model (on CPU)
    test_loader_cpu = DataLoader(
        torch.utils.data.Subset(datasets.CIFAR10(root='./data', train=False, download=True, 
                                                 transform=transforms.Compose([
                                                     transforms.ToTensor(),
                                                     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
                                                 ])), range(200)),
        batch_size=64, shuffle=False)
    
    quantized_acc = evaluate(student_quantized, test_loader_cpu, 'cpu')
    results['quantized'] = {'accuracy': quantized_acc, 'params': pruned_size}  # Same params, but INT8
    print(f"  ✅ Quantized - Accuracy: {quantized_acc:.2f}%, Params: {pruned_size:,} (INT8)")
    
    return results, teacher, student, student_pruned, student_quantized

# Run the pipeline
results, teacher_final, student_final, pruned_final, quantized_final = complete_compression_pipeline()

In [None]:
# 5.3 Visualize final results
def visualize_compression_results(results):
    """Create comprehensive visualization of compression results"""
    
    models = ['Teacher', 'Student\n(Distilled)', 'Pruned\n(50%)', 'Quantized\n(INT8)']
    accuracies = [results['teacher']['accuracy'], results['student']['accuracy'],
                 results['pruned']['accuracy'], results['quantized']['accuracy']]
    params = [results['teacher']['params'], results['student']['params'],
             results['pruned']['params'], results['quantized']['params']]
    
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Accuracy comparison
    colors = ['#ff6b6b', '#ffa500', '#51cf66', '#1E64C8']
    bars1 = axes[0].bar(models, accuracies, color=colors, alpha=0.7, edgecolor='black', linewidth=2)
    axes[0].set_ylabel('Accuracy (%)', fontsize=12, fontweight='bold')
    axes[0].set_title('Model Accuracy Comparison', fontsize=14, fontweight='bold')
    axes[0].set_ylim([0, 100])
    axes[0].grid(axis='y', alpha=0.3)
    
    # Add value labels
    for bar, acc in zip(bars1, accuracies):
        height = bar.get_height()
        axes[0].text(bar.get_x() + bar.get_width()/2., height,
                    f'{acc:.1f}%', ha='center', va='bottom', fontsize=11, fontweight='bold')
    
    # Parameter count comparison (log scale)
    bars2 = axes[1].bar(models, params, color=colors, alpha=0.7, edgecolor='black', linewidth=2)
    axes[1].set_ylabel('Parameters (count)', fontsize=12, fontweight='bold')
    axes[1].set_title('Model Size Comparison', fontsize=14, fontweight='bold')
    axes[1].set_yscale('log')
    axes[1].grid(axis='y', alpha=0.3)
    
    # Add value labels
    for bar, param in zip(bars2, params):
        height = bar.get_height()
        axes[1].text(bar.get_x() + bar.get_width()/2., height,
                    f'{param:,}', ha='center', va='bottom', fontsize=10, fontweight='bold')
    
    plt.tight_layout()
    plt.savefig('/mnt/user-data/outputs/compression_results.png', dpi=150, bbox_inches='tight')
    plt.show()
    
    # Print summary table
    print("\n" + "="*80)
    print("📊 FINAL COMPRESSION SUMMARY")
    print("="*80)
    print(f"{'Model':<20} {'Accuracy':<15} {'Parameters':<15} {'Compression Ratio':<20}")
    print("-"*80)
    
    baseline = results['teacher']['params']
    for name, key in zip(models, ['teacher', 'student', 'pruned', 'quantized']):
        acc = results[key]['accuracy']
        param = results[key]['params']
        ratio = baseline / param
        name_clean = name.replace('\n', ' ')
        print(f"{name_clean:<20} {acc:>6.2f}%        {param:>10,}      {ratio:>6.1f}x smaller")
    
    print("="*80)
    
    # Calculate metrics
    acc_drop = results['teacher']['accuracy'] - results['quantized']['accuracy']
    size_reduction = (1 - results['quantized']['params'] / results['teacher']['params']) * 100
    
    print(f"\n🎯 Key Metrics:")
    print(f"  • Accuracy drop: {acc_drop:.2f} percentage points")
    print(f"  • Size reduction: {size_reduction:.1f}%")
    print(f"  • Final compression ratio: {baseline / results['quantized']['params']:.1f}x")
    print(f"\n✅ Pipeline complete. Compare the accuracy drop above with the size reduction.")

visualize_compression_results(results)

---
## 🎯 Practice Complete!

### Summary of What We Learned:

1. **Knowledge Distillation**: Transferring knowledge from teacher to student
   - Teacher-Student framework
   - Soft targets with temperature scaling
   
2. **Temperature Scaling**: Controlling the "softness" of probability distributions
   - T = 1: Sharp (standard softmax)
   - T = 3~5: Commonly used range for distillation
   - T → ∞: Uniform

3. **INT8 Quantization**: Converting FP32 → INT8
   - 75% memory reduction
   - Minimal accuracy loss
   - Faster inference

4. **Magnitude Pruning**: Removing small weights
   - Setting weights to zero based on magnitude
   - Creating sparse networks
   - Balancing sparsity and accuracy

5. **Complete Pipeline**: Combining all techniques
   - Distillation → Pruning → Quantization
   - Achieving 10-100x compression
   - Minimal accuracy degradation

### Key Insights:
- Model compression is essential for deploying AI on edge devices
- Different techniques address different aspects (knowledge, precision, sparsity)
- Combining techniques leads to the best results
- Always measure the accuracy-size tradeoff

### Real-World Applications:
- 📱 **Mobile Health Apps**: Skin lesion classification on smartphones
- ⌚ **Wearable Devices**: Heart rate anomaly detection on smartwatches
- 🏥 **Point-of-Care Systems**: Real-time diagnosis in clinics
- 🌍 **Resource-Constrained Settings**: Medical AI in developing countries

### Next Steps:
- Experiment with different compression ratios
- Try on real medical imaging datasets
- Deploy models to mobile devices (TensorFlow Lite, Core ML)
- Measure actual inference time and battery consumption
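
As a starting point for the inference-time measurement, here is a minimal CPU timing sketch. The helper `measure_latency_ms` and the stand-in model are hypothetical; swap in any of the models from this notebook. On a GPU, a `torch.cuda.synchronize()` call would be needed before reading the clock, which this sketch does not cover.

```python
import time
import torch
import torch.nn as nn

def measure_latency_ms(model, example_input, warmup=5, runs=20):
    """Average forward-pass time on CPU, in milliseconds."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):           # warm-up passes are not timed
            model(example_input)
        start = time.perf_counter()
        for _ in range(runs):
            model(example_input)
        elapsed = time.perf_counter() - start
    return elapsed / runs * 1000.0

# Hypothetical stand-in model with CIFAR-10-shaped input
tiny_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)
batch = torch.randn(1, 3, 32, 32)

latency = measure_latency_ms(tiny_model, batch)
print(f"Average latency: {latency:.3f} ms per forward pass")
```

Timings vary between machines and runs, so compare models on the same hardware and report an average over many passes.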