# 🚀 AST Competition Starter - Train Faster, Save GPU Hours

**Adaptive Sparse Training** automatically selects the most important training samples, achieving:
- ⚡ **60-70% energy savings** (train faster, use less GPU time)
- 🎯 **Same or better accuracy** (curriculum learning effect)
- 💰 **Lower costs** (less compute = lower Kaggle/Colab costs)

This notebook is a **drop-in replacement** for standard training loops.

---

## Installation

In [None]:
!pip install adaptive-sparse-training -q
!pip install timm -q  # For advanced models

## 1️⃣ Setup: Load Your Data

Replace this with your competition dataset

In [None]:
import torch
import torchvision
from torch.utils.data import DataLoader
import timm
from adaptive_sparse_training import AdaptiveSparseTrainer, ASTConfig

# Example: Image classification dataset
# TODO: Replace with your competition data
transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize((224, 224)),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# Load your training data
train_dataset = torchvision.datasets.ImageFolder('/kaggle/input/your-data/train', transform=transform)
val_dataset = torchvision.datasets.ImageFolder('/kaggle/input/your-data/val', transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=2)
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False, num_workers=2)

num_classes = len(train_dataset.classes)
print(f"📊 Dataset: {len(train_dataset)} train, {len(val_dataset)} val, {num_classes} classes")

## 2️⃣ Model Selection

Choose any model from [timm](https://github.com/huggingface/pytorch-image-models)

In [None]:
# Popular choices for competitions:
# - 'efficientnet_b0' (fast, accurate)
# - 'resnet50' (reliable baseline)
# - 'convnext_base' (state-of-the-art)
# - 'swin_base_patch4_window7_224' (transformer)

model = timm.create_model(
    'efficientnet_b0',
    pretrained=True,
    num_classes=num_classes
)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
print(f"🔧 Model: {model.__class__.__name__} on {device}")

## 3️⃣ AST Configuration

**Key parameter:** `target_activation_rate`
- `0.40` = 60% energy savings (recommended start)
- `0.25` = 75% energy savings (aggressive)
- `0.50` = 50% energy savings (conservative)
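
To make the parameter concrete, here is a minimal sketch of what an activation rate could mean for a single batch. It assumes samples are ranked by a per-sample score (for example, loss) and only the top fraction is kept. `select_active` and its ranking rule are illustrative names, not the package's API or its actual selection logic.

```python
import math


def select_active(scores, target_activation_rate, min_active_samples=8):
    """Return the (sorted) indices of the highest-scoring samples in a batch.

    Hypothetical helper: keeps roughly target_activation_rate of the batch,
    but never fewer than min_active_samples (capped at the batch size).
    """
    n = len(scores)
    k = max(min_active_samples, math.ceil(target_activation_rate * n))
    k = min(k, n)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Made-up per-sample scores for a batch of 10
scores = [0.1, 2.0, 0.5, 1.5, 0.05, 0.9, 0.3, 1.1, 0.2, 0.7]
active = select_active(scores, target_activation_rate=0.40, min_active_samples=2)
print(active)                          # -> [1, 3, 5, 7]
print(1 - len(active) / len(scores))   # fraction of the batch skipped
```

Under this assumption, the "savings" figure is simply the share of samples skipped, which is why `0.40` is listed as 60% savings.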

In [None]:
config = ASTConfig(
    target_activation_rate=0.35,  # 65% energy savings
    entropy_weight=1.0,           # Balance loss + entropy
    kp=0.1,                       # PI controller proportional gain
    ki=0.01,                      # PI controller integral gain
    use_mixed_precision=True,     # AMP for extra speedup
    min_active_samples=8,         # Safety: never skip entire batch
)

print("⚙️  AST Config:")
print(f"   Target Activation: {config.target_activation_rate:.0%}")
print(f"   Expected Savings: ~{(1-config.target_activation_rate)*100:.0f}%")
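
The `kp` and `ki` comments in the config refer to a PI controller. As a rough illustration of the idea, and not the package's actual implementation, the simulation below steers a selection threshold so that the fraction of active samples tracks the target. The Gaussian scores are a stand-in for whatever per-sample importance signal the library uses, and `simulate_pi_controller` is a hypothetical name.

```python
import random


def simulate_pi_controller(target_rate=0.35, kp=0.1, ki=0.01,
                           steps=3000, batch_size=64, seed=0):
    """Track target_rate by adjusting a score threshold with a PI controller."""
    rng = random.Random(seed)
    threshold = 0.0
    integral = 0.0
    history = []
    for _ in range(steps):
        # Stand-in importance scores (assumption: one scalar per sample)
        scores = [rng.gauss(0.0, 1.0) for _ in range(batch_size)]
        rate = sum(s > threshold for s in scores) / batch_size
        # Positive error means too many samples are active, so raise the threshold
        error = rate - target_rate
        integral += error
        threshold = kp * error + ki * integral
        history.append(rate)
    return history


history = simulate_pi_controller()
late = history[-1000:]
print(f"mean activation over the last 1000 steps: {sum(late) / len(late):.3f}")
```

The integral term is what removes the steady-state offset. With a small `ki` the controller takes a few hundred steps to settle in this toy setup, which is one reason a warmup period can help.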

## 4️⃣ Training with Live Dashboard

AST automatically tracks energy, cost, and CO2 savings

In [None]:
# Optional: Custom optimizer
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)

trainer = AdaptiveSparseTrainer(
    model=model,
    train_loader=train_loader,
    val_loader=val_loader,
    config=config,
    optimizer=optimizer,  # Optional: uses Adam by default
)

# Train with warmup (recommended for stability)
results = trainer.train(
    epochs=50,
    warmup_epochs=5  # First 5 epochs: train on 100% samples
)

print("\n" + "="*60)
print("🏁 TRAINING COMPLETE")
print("="*60)
print(f"🎯 Best Accuracy: {results['best_accuracy']:.2%}")
print(f"⚡ Energy Savings: {results['energy_savings']:.1%}")
print(f"⏱️  Total Time: {results['training_time_hours']:.1f}h")
print("="*60)
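
Warmup epochs process every sample, so they reduce the savings over the whole run. The back-of-the-envelope calculation below assumes that compute scales linearly with the number of samples processed and that the activation rate sits exactly at the target once warmup ends. Both are simplifications, and `overall_sample_fraction` is a hypothetical helper.

```python
def overall_sample_fraction(epochs, warmup_epochs, target_activation_rate):
    """Fraction of all samples processed across a run with full-data warmup."""
    if epochs <= 0 or not 0 <= warmup_epochs <= epochs:
        raise ValueError("need epochs > 0 and 0 <= warmup_epochs <= epochs")
    sparse_epochs = epochs - warmup_epochs
    processed = warmup_epochs * 1.0 + sparse_epochs * target_activation_rate
    return processed / epochs


fraction = overall_sample_fraction(epochs=50, warmup_epochs=5,
                                   target_activation_rate=0.35)
print(f"samples processed: {fraction:.1%}, skipped: {1 - fraction:.1%}")
```

For the settings used in this notebook, that works out to about 41.5% of samples processed, or roughly 58.5% skipped over the full 50 epochs.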

## 5️⃣ Save Model & Submit

Save the trained model for inference

In [None]:
# Save checkpoint
torch.save({
    'model_state_dict': model.state_dict(),
    'results': results,
    'config': config,
}, 'ast_model.pth')

print("💾 Model saved to ast_model.pth")
print(f"📈 Trained with {results['energy_savings']:.0%} less energy than standard training!")

## 6️⃣ Inference (Competition Submission)

Standard PyTorch inference - no AST needed

In [None]:
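# Load model for inference
# The checkpoint also pickles the ASTConfig object, which PyTorch 2.6+ rejects
# under its default weights_only=True; only load checkpoint files you created yourself
checkpoint = torch.load('ast_model.pth', map_location=device, weights_only=False)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Make predictions
# TODO: Replace with your competition test set (placeholder path, like the train/val paths above)
test_dataset = torchvision.datasets.ImageFolder('/kaggle/input/your-data/test', transform=transform)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)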

predictions = []
with torch.no_grad():
    for images, _ in test_loader:
        images = images.to(device)
        outputs = model(images)
        preds = outputs.argmax(dim=1)
        predictions.extend(preds.cpu().numpy())

# Create submission
import pandas as pd
submission = pd.DataFrame({
    'id': range(len(predictions)),
    'label': predictions
})
submission.to_csv('submission.csv', index=False)
print("📤 Submission ready: submission.csv")
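
The `id` column in the cell above is just a running index, which may not match what a given competition expects. The sketch below assumes the expected id is the image file name without its extension, and that paths are available from something like `test_dataset.samples` (the `ImageFolder` attribute holding `(path, class_index)` pairs). The example paths and helper names are made up for illustration, so check your competition's sample submission first.

```python
import csv
import io
from pathlib import PurePosixPath


def build_submission_rows(image_paths, predictions):
    """Pair each prediction with an id taken from the image file name."""
    if len(image_paths) != len(predictions):
        raise ValueError("paths and predictions must have the same length")
    return [(PurePosixPath(path).stem, int(pred))
            for path, pred in zip(image_paths, predictions)]


def write_submission(rows, handle):
    """Write rows with the same 'id' and 'label' columns used above."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["id", "label"])
    writer.writerows(rows)


# Hypothetical paths and predictions, for illustration only
paths = ["/data/test/unknown/img_0007.jpg", "/data/test/unknown/img_0012.jpg"]
rows = build_submission_rows(paths, [3, 1])

buffer = io.StringIO()  # swap for open('submission.csv', 'w', newline='') in practice
write_submission(rows, buffer)
print(buffer.getvalue())
```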

---

## 🎓 Tips for Competition Success

### Hyperparameter Tuning
- **Start at the recommended rate**: `target_activation_rate=0.40` (60% savings), and go lower only once accuracy holds
- **Monitor activation rate**: Should stabilize near target after warmup
- **If accuracy drops**: Increase `target_activation_rate` to 0.50 or add more warmup epochs
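
One way to act on the "monitor activation rate" tip is a small check over per-epoch activation rates. This sketch assumes you have collected those rates in a plain list yourself. How the trainer exposes them is not shown in this notebook, so `epoch_rates`, the tolerance, and the helper name are all assumptions.

```python
def activation_rate_is_stable(epoch_rates, target, warmup_epochs, tolerance=0.05):
    """Check whether the mean post-warmup activation rate is close to target.

    Returns (is_stable, mean_rate). epoch_rates is a hypothetical list with one
    activation rate per epoch, warmup epochs included.
    """
    post_warmup = epoch_rates[warmup_epochs:]
    if not post_warmup:
        raise ValueError("no epochs recorded after warmup")
    mean_rate = sum(post_warmup) / len(post_warmup)
    return abs(mean_rate - target) <= tolerance, mean_rate


# Made-up log: 5 warmup epochs at 100%, then the rate settling toward 0.40
rates = [1.0] * 5 + [0.52, 0.44, 0.41, 0.39, 0.40, 0.41]
stable, mean_rate = activation_rate_is_stable(rates, target=0.40, warmup_epochs=5)
print(stable, round(mean_rate, 3))
```

If the check fails for many epochs in a row, that is a hint to revisit the controller gains or the amount of warmup before changing the target itself.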

### When to Use AST
✅ **Good for:**
- Large datasets (>50k samples)
- Limited GPU time (Kaggle 30h/week limit)
- Image classification, object detection
- When you need to try multiple models quickly

❌ **Skip AST if:**
- Tiny datasets (<5k samples)
- Already at hardware limits (reduce batch size instead)

### Advanced: Architecture Selection
AST works best with:
- **ResNets, EfficientNets**: Stable, predictable
- **ConvNeXt**: Excellent with AST (tested)
- **Vision Transformers**: May need higher `target_activation_rate` (0.50+)

---

## 📚 Resources

- 📦 [PyPI Package](https://pypi.org/project/adaptive-sparse-training/)
- 🐙 [GitHub Repo](https://github.com/oluwafemidiakhoa/adaptive-sparse-training)
- 📖 [Documentation](https://github.com/oluwafemidiakhoa/adaptive-sparse-training#readme)

**Questions?** Open an issue on GitHub!

---

*This notebook uses Adaptive Sparse Training to reduce energy consumption by ~65% while maintaining accuracy. Good for the planet 🌍 and your GPU budget 💰*