# LLM-Enhanced MARL IDS for IoT - Kaggle Training

**Research Project:** Multi-Agent Reinforcement Learning Intrusion Detection System

**GitHub:** https://github.com/khalil0401/LLM-Enhanced-MARL-Based-IDS-for-IoT

**Session Goal:** Train MAPPO model on IoT-23 dataset

---

## Setup Instructions:
1. **Enable GPU:** Settings (right sidebar) → Accelerator → GPU P100 or T4
2. **Enable Internet:** Settings → Internet → ON
3. **Add Dataset:** Add Data → Search for "iot23-processed" (your uploaded dataset)
4. **Session Time:** Monitor remaining time (top-right corner)

**Important:** Save checkpoints frequently! Sessions disconnect after 12 hours.

## 📦 Cell 1: Install Dependencies

In [None]:
%%time
# Install required packages
print("Installing dependencies...")
!pip install -q "ray[rllib]==2.8.0"  # Quoted so the shell does not treat [rllib] as a glob pattern
!pip install -q sentence-transformers==2.2.2
!pip install -q openai==1.3.0
!pip install -q pyyaml
!pip install -q wandb  # Optional: for experiment tracking
!pip install -q h5py

print("✅ Dependencies installed!")
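
Because `pip install -q` hides most output, it can help to confirm that the pinned versions were actually installed. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of the repository, and the expected versions are simply the pins from the install cell.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


# Pins copied from the install cell; adjust if the pins change.
expected = {"ray": "2.8.0", "sentence-transformers": "2.2.2", "openai": "1.3.0"}
for name, found in installed_versions(expected).items():
    status = "ok" if found == expected[name] else "MISMATCH"
    print(f"{name}: expected {expected[name]}, found {found} [{status}]")
```

A `MISMATCH` line usually means another preinstalled package pulled in a different version, in which case restarting the kernel after installation may be needed.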

## 🔧 Cell 2: Clone GitHub Repository

In [None]:
%%time
import os

# Use an absolute path so that re-running this cell after the %cd below
# does not clone a second, nested copy of the repository
repo_dir = '/kaggle/working/LLM-Enhanced-MARL-Based-IDS-for-IoT'

# Clone repository
if not os.path.exists(repo_dir):
    !git clone https://github.com/khalil0401/LLM-Enhanced-MARL-Based-IDS-for-IoT.git {repo_dir}
    print("✅ Repository cloned!")
else:
    print("✅ Repository already exists")

# Change directory
%cd {repo_dir}

# Verify structure
!ls -la

## ⚙️ Cell 3: Configure for Kaggle

In [None]:
import yaml
import os

# Kaggle-optimized configuration
kaggle_config = {
    'env_config': {
        'num_agents': 10,
        'observation_dim': 5000,
        'max_episode_steps': 1000,
        'dataset_path': '/kaggle/input/iot23-processed/iot23_processed.h5',  # Update this path!
        'self_play': False
    },
    'training': {
        'lr': 3e-4,
        'gamma': 0.99,
        'lambda': 0.95,
        'clip_param': 0.2,
        'train_batch_size': 2048,      # Reduced for Kaggle
        'sgd_minibatch_size': 64,       # Reduced for Kaggle
        'num_sgd_iter': 10,
        'num_workers': 4,                # Reduced for Kaggle
        'num_gpus': 1,
        'framework': 'torch'
    },
    'experiment': {
        'total_iterations': 500,         # Adjust based on time
        'checkpoint_freq': 10,           # Save every 10 iterations
        'evaluation_interval': 10,
        'checkpoint_dir': '/kaggle/working/checkpoints'
    },
    'reward_weights': {
        'detect': 1.0,
        'fp': -0.5,
        'latency': -0.2,
        'resource': -0.1
    }
}

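# Create checkpoint directory
os.makedirs('/kaggle/working/checkpoints', exist_ok=True)

# Save config (create the config directory first in case the repository does not ship one)
os.makedirs('config', exist_ok=True)
with open('config/kaggle_config.yaml', 'w') as f:
    yaml.dump(kaggle_config, f)

print("✅ Configuration saved!")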
print("\nConfig summary:")
print(f"  - Agents: {kaggle_config['env_config']['num_agents']}")
print(f"  - Batch size: {kaggle_config['training']['train_batch_size']}")
print(f"  - Workers: {kaggle_config['training']['num_workers']}")
print(f"  - Iterations: {kaggle_config['experiment']['total_iterations']}")
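
To see what the batch settings imply, the sketch below works out how many minibatch gradient updates one training iteration performs and how many environment steps the whole run collects. It assumes the usual PPO scheme in which each of the `num_sgd_iter` epochs passes once over the `train_batch_size` samples in chunks of `sgd_minibatch_size`; the helper name is hypothetical, and the exact counting in a multi-agent setup may differ from this estimate.

```python
def sgd_updates_per_iteration(train_batch_size, sgd_minibatch_size, num_sgd_iter):
    """Approximate number of minibatch gradient updates in one training iteration."""
    if sgd_minibatch_size > train_batch_size:
        raise ValueError("sgd_minibatch_size must not exceed train_batch_size")
    minibatches_per_epoch = train_batch_size // sgd_minibatch_size
    return minibatches_per_epoch * num_sgd_iter


# Values copied from the 'training' and 'experiment' sections of the config above.
training = {"train_batch_size": 2048, "sgd_minibatch_size": 64, "num_sgd_iter": 10}
total_iterations = 500

updates = sgd_updates_per_iteration(**training)
total_env_steps = training["train_batch_size"] * total_iterations

print(f"Gradient updates per iteration: {updates}")       # 320
print(f"Environment steps over the run: {total_env_steps}")  # 1024000
```

With these settings each iteration performs 32 minibatches over 10 epochs, so halving `sgd_minibatch_size` would double the update count per iteration without collecting more data.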

## 📊 Cell 4: Check GPU and Resources

In [None]:
import torch
import psutil

# Check GPU
if torch.cuda.is_available():
    print("✅ GPU Available!")
    print(f"   GPU Name: {torch.cuda.get_device_name(0)}")
    print(f"   GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("❌ No GPU! Enable GPU in Settings → Accelerator")

# Check RAM
ram = psutil.virtual_memory()
print(f"\n✅ RAM: {ram.total / 1e9:.1f} GB")
print(f"   Available: {ram.available / 1e9:.1f} GB")

# Check dataset
import os
dataset_path = '/kaggle/input/iot23-processed/iot23_processed.h5'  # Update this!
if os.path.exists(dataset_path):
    print(f"\n✅ Dataset found: {dataset_path}")
    print(f"   Size: {os.path.getsize(dataset_path) / 1e9:.2f} GB")
else:
    print("\n❌ Dataset not found! Add dataset in 'Add Data' section")
    print("   Expected path:", dataset_path)
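
The resource check above covers GPU, RAM, and the dataset, but `num_workers` also depends on the number of CPU cores. The sketch below compares the two, assuming each rollout worker and the driver process need one core each (the helper name `worker_oversubscription` is hypothetical, and the one-core-per-process assumption should be checked against the actual RLlib resource settings used by the repository).

```python
import os


def worker_oversubscription(num_workers, cpu_count=None):
    """Return how many more processes than CPU cores would run (0 if they all fit)."""
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    processes = num_workers + 1  # rollout workers plus the driver process
    return max(0, processes - cores)


num_workers = 4  # value of 'num_workers' in the config above
extra = worker_oversubscription(num_workers)
if extra:
    print(f"{num_workers} workers + driver exceed available cores by {extra}; "
          "consider lowering num_workers")
else:
    print(f"{num_workers} workers + driver fit within {os.cpu_count()} cores")
```

If the check reports an excess, lowering `num_workers` by that amount is a reasonable first adjustment.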

## 🚀 Cell 5: Train MAPPO Model

In [None]:
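import sys
import time
import traceback

# Make the repository's training module importable (avoid duplicate entries on re-run)
fog_dir = '/kaggle/working/LLM-Enhanced-MARL-Based-IDS-for-IoT/src/fog'
if fog_dir not in sys.path:
    sys.path.append(fog_dir)

from train_mappo import train_mappo

# Start training
print("="*60)
print("Starting MAPPO Training on Kaggle")
print("="*60)

start_time = time.time()
checkpoint = None  # Stays None if training fails, so later cells can detect it

try:
    checkpoint = train_mappo(
        config=kaggle_config,
        experiment_name='kaggle_mappo_iot_ids'
    )

    print("\n✅ Training complete!")
    print(f"   Total time: {(time.time() - start_time) / 3600:.2f} hours")
    print(f"   Final checkpoint: {checkpoint}")

except Exception as e:
    print(f"\n❌ Training error: {e}")
    traceback.print_exc()  # Show the full stack trace, not just the message
    print("   Check the traceback and logs above for details")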

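## 📈 Cell 6: Monitor Training (Optional)

In [None]:
# Print RAM and GPU memory usage every 30 seconds for 30 minutes.
# Notes:
#   - Cells in one kernel run sequentially, so this loop will not start until
#     Cell 5 has finished. To sample during training, start the monitoring in a
#     background thread before launching Cell 5.
#   - torch.cuda.memory_allocated/memory_reserved report only memory held by
#     PyTorch in this notebook process, so GPU memory used by separate worker
#     processes will not appear here.
import time
import psutil
import torch

for i in range(60):  # 60 samples x 30 seconds = 30 minutes
    # RAM
    ram = psutil.virtual_memory()
    print(f"[{i+1:02d}] RAM: {ram.percent:.1f}% | ", end="")

    # GPU
    if torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated(0) / 1e9
        reserved = torch.cuda.memory_reserved(0) / 1e9
        print(f"GPU Memory: {allocated:.1f}/{reserved:.1f} GB")
    else:
        print("GPU: not available")  # Also terminates the line started above

    time.sleep(30)  # Check every 30 seconds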
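
One way to collect samples while a long-running cell is executing is to start the sampling on a background thread first. The sketch below is a generic, standard-library version of that idea; `start_monitor` is a hypothetical helper, and whether the thread gets scheduled regularly during training depends on how often the training call releases Python's global interpreter lock.

```python
import threading
import time


def start_monitor(sample_fn, interval_s=30.0):
    """Call sample_fn() every interval_s seconds on a daemon thread.

    Returns (stop_event, thread); call stop_event.set() to end the loop.
    """
    stop_event = threading.Event()

    def loop():
        while not stop_event.is_set():
            sample_fn()
            stop_event.wait(interval_s)  # Returns early once stop_event is set

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return stop_event, thread


# Demonstration with a short interval; in the notebook, sample_fn could instead
# print psutil.virtual_memory().percent and interval_s could stay at 30.
samples = []
stop_event, thread = start_monitor(lambda: samples.append(time.time()), interval_s=0.01)
time.sleep(0.05)
stop_event.set()
thread.join()
print(f"Collected {len(samples)} samples")
```

In the notebook, the call to `start_monitor` would go in a cell run before training, and `stop_event.set()` in a cell run after it.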

## 🎯 Cell 7: Evaluate Trained Model

In [None]:
from train_mappo import evaluate_model

# Evaluate model
print("Evaluating trained model...\n")

evaluate_model(
    checkpoint_path=checkpoint,
    num_episodes=10
)

print("\n✅ Evaluation complete!")

## 💾 Cell 8: Save Results for Download

In [None]:
import json
import shutil
import os

# Create results directory
os.makedirs('/kaggle/working/results', exist_ok=True)

# Copy checkpoints
if os.path.exists('/kaggle/working/checkpoints'):
    print("Copying checkpoints to output...")
    shutil.copytree(
        '/kaggle/working/checkpoints',
        '/kaggle/working/results/checkpoints',
        dirs_exist_ok=True
    )
    print("✅ Checkpoints copied")

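# Save training summary
summary = {
    'experiment_name': 'kaggle_mappo_iot_ids',
    # Convert to str so the summary stays JSON-serializable whatever type train_mappo returns
    'final_checkpoint': str(checkpoint) if checkpoint is not None else None,
    'config': kaggle_config,
    # Wall-clock hours since training started; includes any cells run after training
    'training_time_hours': (time.time() - start_time) / 3600,
    'total_iterations': kaggle_config['experiment']['total_iterations']
}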

with open('/kaggle/working/results/training_summary.json', 'w') as f:
    json.dump(summary, f, indent=2)

print("\n✅ Results saved to /kaggle/working/results/")
print("   Download from Output tab (top-right)")
print("\nFiles:")
!ls -lh /kaggle/working/results/
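
Downloading a folder with many checkpoint files can be slow, so packing the results into a single archive may be more convenient. This is a minimal sketch using `shutil.make_archive`; the helper name and the archive name `results_archive` are assumptions, and the demonstration runs against a temporary directory rather than `/kaggle/working/results`.

```python
import os
import shutil
import tempfile


def archive_results(results_dir, archive_base):
    """Zip the contents of results_dir into archive_base + '.zip' and return its path."""
    return shutil.make_archive(archive_base, "zip", root_dir=results_dir)


# Demonstration on a throwaway directory. In the notebook this could be
# archive_results('/kaggle/working/results', '/kaggle/working/results_archive'),
# keeping the archive outside the directory being archived.
with tempfile.TemporaryDirectory() as tmp:
    results_dir = os.path.join(tmp, "results")
    os.makedirs(results_dir)
    with open(os.path.join(results_dir, "training_summary.json"), "w") as f:
        f.write("{}")
    archive_path = archive_results(results_dir, os.path.join(tmp, "results_archive"))
    archive_name = os.path.basename(archive_path)
    archive_exists = os.path.exists(archive_path)

print(f"Created {archive_name}")
```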

## 📊 Cell 9: Generate Training Plots (Optional)

In [None]:
import matplotlib.pyplot as plt
import json

# Load training logs (if available)
# Note: This is a placeholder - actual implementation depends on logging format

# Example plot
plt.figure(figsize=(12, 4))

# Subplot 1: Reward
plt.subplot(1, 3, 1)
plt.title('Episode Reward')
plt.xlabel('Iteration')
plt.ylabel('Reward')
# plt.plot(rewards)  # Add actual data

# Subplot 2: Episode Length
plt.subplot(1, 3, 2)
plt.title('Episode Length')
plt.xlabel('Iteration')
plt.ylabel('Steps')
# plt.plot(lengths)  # Add actual data

# Subplot 3: F1 Score
plt.subplot(1, 3, 3)
plt.title('Detection F1 Score')
plt.xlabel('Iteration')
plt.ylabel('F1 Score')
# plt.plot(f1_scores)  # Add actual data

plt.tight_layout()
plt.savefig('/kaggle/working/results/training_curves.png', dpi=150)
print("✅ Training curves saved")
plt.show()
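
The plots above stay empty until real data is loaded. If the training run writes a per-iteration CSV log, a column can be read into a list and passed to `plt.plot`. The sketch below assumes a file named `progress.csv` with an `episode_reward_mean` column, which is how RLlib commonly logs results; the actual file location and column names depend on how `train_mappo` configures logging and have not been verified against the repository.

```python
import csv
import math
import os
import tempfile


def load_metric(csv_path, column):
    """Read one numeric column from a CSV log, skipping blank, non-numeric, and NaN cells."""
    values = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            raw = row.get(column)
            if not raw:
                continue
            try:
                value = float(raw)
            except ValueError:
                continue
            if not math.isnan(value):
                values.append(value)
    return values


# Demonstration with a small synthetic log (not real training results).
with tempfile.TemporaryDirectory() as tmp:
    demo_csv = os.path.join(tmp, "progress.csv")
    with open(demo_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["training_iteration", "episode_reward_mean"])
        writer.writerows([[1, 1.5], [2, "nan"], [3, 2.0]])
    rewards = load_metric(demo_csv, "episode_reward_mean")

print(rewards)  # [1.5, 2.0]
```

The resulting list could replace the commented-out `plt.plot(rewards)` placeholder once the real log path is known.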

## 🔄 Cell 10: Prepare for Next Session (If Resuming)

In [None]:
print("📝 Instructions for Next Session:")
print("\n1. Download checkpoint:")
print("   - Go to Output tab (top-right)")
print("   - Download 'results' folder")
print("\n2. Upload as Kaggle Dataset:")
print("   - Go to Datasets → New Dataset")
print("   - Upload 'checkpoints' folder")
print("   - Name: 'mappo-checkpoint-iter500'")
print("\n3. Resume Training:")
print("   - Create new notebook")
print("   - Add dataset: 'mappo-checkpoint-iter500'")
print("   - Update config:")
print("     resume_from = '/kaggle/input/mappo-checkpoint-iter500/checkpoint_000500'")
print("\n4. Continue from iteration 500 to 1000")

print(f"\n✅ Current checkpoint: {checkpoint}")
print(f"   Iterations completed: {kaggle_config['experiment']['total_iterations']}")
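
When resuming, the checkpoint path has to point at the most recent checkpoint directory. The sketch below picks the directory with the highest iteration number, assuming directories follow the `checkpoint_000500` naming pattern shown in the instructions above; the helper name `latest_checkpoint` is hypothetical, and the demonstration uses a temporary directory instead of a real Kaggle dataset path.

```python
import os
import re
import tempfile

CHECKPOINT_PATTERN = re.compile(r"^checkpoint_(\d+)$")


def latest_checkpoint(root):
    """Return (iteration, path) for the highest-numbered checkpoint_* directory, or None."""
    best = None
    for name in os.listdir(root):
        match = CHECKPOINT_PATTERN.match(name)
        path = os.path.join(root, name)
        if match and os.path.isdir(path):
            iteration = int(match.group(1))
            if best is None or iteration > best[0]:
                best = (iteration, path)
    return best


# Demonstration with empty placeholder directories.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("checkpoint_000010", "checkpoint_000500", "checkpoint_000250"):
        os.makedirs(os.path.join(tmp, name))
    open(os.path.join(tmp, "notes.txt"), "w").close()  # Non-checkpoint entries are ignored
    found_iteration, found_path = latest_checkpoint(tmp)
    found_name = os.path.basename(found_path)

print(f"Resume from {found_name} (iteration {found_iteration})")
```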

---

## 📋 Session Checklist

Before closing this session:

- [ ] Training completed successfully
- [ ] Checkpoints saved to `/kaggle/working/results/`
- [ ] Downloaded results folder from Output tab
- [ ] (If resuming) Uploaded checkpoint as Kaggle dataset
- [ ] Noted final iteration number
- [ ] Saved any important metrics or logs

**Session Time Remaining:** Check top-right corner!

---

## 🎓 Citation

If you use this code, please cite:

```bibtex
@misc{khalil2026llmmarl,
  author = {Khalil},
  title = {LLM-Enhanced MARL-Based IDS for IoT},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/khalil0401/LLM-Enhanced-MARL-Based-IDS-for-IoT}
}
```