# üåø Energy-Efficient Malaria Detection with Adaptive Sparse Training

**Train a 95%+ accuracy malaria classifier with 60-90% energy savings!**

This notebook demonstrates:
- ‚ö° Adaptive Sparse Training (AST) with Sundew algorithm
- üéØ 95-97% diagnostic accuracy on NIH malaria dataset
- üí∞ 60-90% energy savings vs traditional training
- üìä Publication-ready visualizations
- üî¨ Interpretable AI with Grad-CAM

---

**‚öôÔ∏è Setup**: Runtime ‚Üí Change runtime type ‚Üí GPU (T4, P100, or V100)

**‚è±Ô∏è Time**: ~25-40 minutes end-to-end (with GPU)

**üìä Dataset**: NIH Malaria Cell Images (27,558 images from Kaggle)

## üì¶ Step 1: Clone Repository and Setup

In [None]:
# Clone the repository (or upload files manually)
!git clone https://github.com/oluwafemidiakhoa/Malaria.git
%cd Malaria/malaria_ast_starter

# If you uploaded files manually instead, uncomment:
# %cd malaria_ast_starter

## üîë Step 2: Setup Kaggle API

**Instructions**:
1. Go to https://www.kaggle.com/settings
2. Scroll to "API" section
3. Click "Create New API Token"
4. Upload the downloaded `kaggle.json` when prompted below

In [None]:
from google.colab import files
import os

print("üìÅ Please upload your kaggle.json file:")
uploaded = files.upload()

# Setup Kaggle credentials
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

print("‚úÖ Kaggle API configured!")

## üöÄ Step 3: Automated Setup (Downloads Dataset + Installs Dependencies)

This will:
- Download NIH malaria dataset from Kaggle (~350 MB)
- Organize into train/val splits (80/20)
- Install all dependencies
- Create optimized config for your GPU
- Mount Google Drive for saving outputs

**‚è±Ô∏è Expected time: 3-5 minutes**

In [None]:
# Run automated setup
!python colab_setup.py

## üî• Step 4: Train with Adaptive Sparse Training

**Default config**: 40% activation rate = 60% energy savings

**Training time estimates**:
- T4 GPU: ~25-30 minutes (30 epochs)
- P100 GPU: ~20-25 minutes
- V100/A100 GPU: ~15-20 minutes

**What to expect**:
- Real-time progress bars
- Activation rate tracking
- Energy savings percentage
- Validation accuracy updates

In [None]:
# Train with AST (60% energy savings)
!python train_ast.py --config configs/config_colab.yaml

### üéõÔ∏è Optional: Try Different Energy Savings Levels

Uncomment one of the blocks below to try different configurations:

In [None]:
# # üöÄ MAXIMUM BUZZ: 90% energy savings (for headlines!)
# !python -c "
# import yaml
# with open('configs/config_colab.yaml', 'r') as f:
#     cfg = yaml.safe_load(f)
# cfg['ast_target_activation_rate'] = 0.10
# cfg['ast_warmup_epochs'] = 5
# with open('configs/config_max_buzz.yaml', 'w') as f:
#     yaml.dump(cfg, f)
# "
# !python train_ast.py --config configs/config_max_buzz.yaml

In [None]:
# # üéØ CONSERVATIVE: 30% energy savings (minimal accuracy impact)
# !python -c "
# import yaml
# with open('configs/config_colab.yaml', 'r') as f:
#     cfg = yaml.safe_load(f)
# cfg['ast_target_activation_rate'] = 0.70
# cfg['ast_warmup_epochs'] = 0
# with open('configs/config_conservative.yaml', 'w') as f:
#     yaml.dump(cfg, f)
# "
# !python train_ast.py --config configs/config_conservative.yaml

## üìä Step 5: Generate Visualizations

Creates publication-ready graphics:
- 4-panel comprehensive analysis
- Social media headline graphic
- Summary statistics

In [None]:
!python visualize_ast.py --metrics checkpoints_ast/metrics_ast.jsonl --output-dir visualizations

# Display the visualizations
from IPython.display import Image, display

print("\nüìä 4-Panel Comprehensive Analysis:")
display(Image('visualizations/ast_results.png'))

print("\nüì∞ Social Media / Press Release Graphic:")
display(Image('visualizations/ast_headline.png'))

## üéØ Step 6: Evaluate Model Performance

Generates:
- Classification report (precision, recall, F1)
- Confusion matrix
- Per-class metrics

In [None]:
!python eval.py --weights checkpoints_ast/best.pt

# Display confusion matrix
from IPython.display import Image, display
import json

print("\nüìä Confusion Matrix:")
display(Image('checkpoints/cm.png'))

# Print classification report
with open('checkpoints/report.json', 'r') as f:
    report = json.load(f)

print("\nüìã Classification Report:")
print(json.dumps(report, indent=2))

## üî¨ Step 7: Generate Grad-CAM Visualization

See where the model is looking to make its decision!

In [None]:
import os
from pathlib import Path
from IPython.display import Image, display

# Pick a random parasitized cell image
parasitized_dir = Path('data/val/Parasitized')
sample_image = list(parasitized_dir.glob('*.png'))[0]

print(f"üî¨ Generating Grad-CAM for: {sample_image.name}")

!python gradcam_snapshot.py \
    --weights checkpoints_ast/best.pt \
    --image {sample_image} \
    --out gradcam_parasitized.png

print("\nüì∏ Grad-CAM Visualization (Parasitized Cell):")
display(Image('gradcam_parasitized.png'))

# Also try an uninfected cell
uninfected_dir = Path('data/val/Uninfected')
sample_image = list(uninfected_dir.glob('*.png'))[0]

!python gradcam_snapshot.py \
    --weights checkpoints_ast/best.pt \
    --image {sample_image} \
    --out gradcam_uninfected.png

print("\nüì∏ Grad-CAM Visualization (Uninfected Cell):")
display(Image('gradcam_uninfected.png'))

## üíæ Step 8: Save Results to Google Drive

Copy all outputs to your Drive for permanent storage

In [None]:
# Create project folder in Drive
!mkdir -p /content/drive/MyDrive/malaria_ast_results

# Copy checkpoints
!cp -r checkpoints_ast /content/drive/MyDrive/malaria_ast_results/

# Copy visualizations
!cp -r visualizations /content/drive/MyDrive/malaria_ast_results/

# Copy evaluation results
!cp checkpoints/report.json /content/drive/MyDrive/malaria_ast_results/
!cp checkpoints/cm.png /content/drive/MyDrive/malaria_ast_results/

# Copy Grad-CAM samples
!cp gradcam_*.png /content/drive/MyDrive/malaria_ast_results/

print("‚úÖ All results saved to: /content/drive/MyDrive/malaria_ast_results/")
print("\nüìÅ Saved files:")
!ls -lh /content/drive/MyDrive/malaria_ast_results/

## üìà Step 9: View Final Results Summary

In [None]:
import json
import pandas as pd

# Load metrics
metrics = []
with open('checkpoints_ast/metrics_ast.jsonl', 'r') as f:
    for line in f:
        metrics.append(json.loads(line))

df = pd.DataFrame(metrics)

print("="*80)
print("üéâ TRAINING COMPLETE - FINAL RESULTS")
print("="*80)

# Best accuracy
best_acc = df['val_acc'].max() * 100
best_epoch = df.loc[df['val_acc'].idxmax(), 'epoch']
print(f"\nüéØ Best Validation Accuracy: {best_acc:.2f}% (Epoch {best_epoch})")

# Average energy savings (excluding warmup)
non_warmup = df[df['energy_savings'] > 0]
if len(non_warmup) > 0:
    avg_savings = non_warmup['energy_savings'].mean()
    avg_activation = non_warmup['activation_rate'].mean()
    print(f"\n‚ö° Energy Efficiency:")
    print(f"   Average Energy Savings: {avg_savings:.1f}%")
    print(f"   Average Activation Rate: {avg_activation*100:.1f}%")
    
    total_samples_saved = (avg_savings / 100) * df['total_samples'].iloc[0] * len(non_warmup)
    print(f"   Total Samples Saved: {total_samples_saved:,.0f}")

# Final metrics
final = df.iloc[-1]
print(f"\nüìä Final Epoch ({final['epoch']}):")
print(f"   Train Loss: {final['train_loss']:.4f}")
print(f"   Val Accuracy: {final['val_acc']*100:.2f}%")
print(f"   Activation Rate: {final['activation_rate']*100:.1f}%")

# Show metrics table
print("\nüìã Training History (last 10 epochs):")
display_cols = ['epoch', 'train_loss', 'val_acc', 'activation_rate', 'energy_savings']
display(df[display_cols].tail(10).round(4))

print("\n" + "="*80)
print("üíö SUCCESS! Your energy-efficient malaria detector is ready!")
print("="*80)

print("\nüìÇ Next steps:")
print("   1. Download results from Google Drive")
print("   2. Use visualizations for presentations/papers")
print("   3. Share your results on social media!")
print("   4. Consider deploying the model (export_onnx.py)")

print("\nüé§ Ready-to-use pitch:")
print(f'   "I trained AI that detects malaria with {best_acc:.1f}% accuracy')
print(f'    using {avg_savings:.0f}% less energy than traditional methods."')

## üöÄ Optional: Export Model for Deployment

In [None]:
# Export to ONNX for production deployment
!python export_onnx.py \
    --weights checkpoints_ast/best.pt \
    --precision fp16 \
    --out malaria_ast_detector_fp16.onnx

# Copy to Drive
!cp malaria_ast_detector_fp16.onnx /content/drive/MyDrive/malaria_ast_results/

print("\n‚úÖ ONNX model exported and saved to Drive!")
print("   Ready for deployment on edge devices, mobile apps, or web services")

## üìö Additional Resources

### Documentation
- **CLAUDE.md**: Technical architecture deep dive
- **README_AST.md**: Project overview and features
- **PRESS_KIT.md**: Media resources and headlines
- **GETTING_STARTED.md**: Setup tutorial

### Headline Ideas (from Press Kit)

**Tech Media**:
- "90% Energy Savings: New Sparse Training Method Makes Medical AI Accessible"
- "How Adaptive Sparse Training is Democratizing Medical AI in Africa"

**Health Media**:
- "AI-Powered Malaria Detection System Designed for Clinics with Limited Power"
- "Sustainable AI: New Method Reduces Carbon Footprint While Fighting Malaria"

**Academic**:
- "Adaptive Sparse Training Achieves 60% Energy Savings in Medical Image Classification"
- "Case Study: Sundew Algorithm for Resource-Constrained Diagnostic AI"

### Dataset
- NIH Malaria Cell Images: https://lhncbc.nlm.nih.gov/LHC-downloads/downloads.html#malaria-datasets
- Kaggle Mirror: https://www.kaggle.com/datasets/iarunava/cell-images-for-detecting-malaria

### Citation
```bibtex
@software{malaria_ast_2025,
  title={Energy-Efficient Malaria Diagnostic AI with Adaptive Sparse Training},
  author={Your Name},
  year={2025},
  url={https://github.com/yourusername/malaria-ast}
}
```

---

**Built with ‚ù§Ô∏è for accessible, sustainable AI in global health** üåçüíö