# 🎯 PIDRL: Competitive Pursuit-Evasion with Deep RL

Train a Deep RL agent to track agile targets in a 3D egocentric environment.

**Features:**
- 3D pursuit-evasion with depth perception
- Focus-based reward system
- Competitive MARL-ready (agent vs. target)
- GPU-accelerated training

**⚠️ IMPORTANT:** Enable GPU in Kaggle:
- Settings → Accelerator → GPU T4 x2
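
If you want to confirm the accelerator is attached before cloning anything, a minimal sketch like the one below may help. It assumes `nvidia-smi` is on the `PATH` when a GPU is enabled. The helper name `gpu_visible` is made up for illustration and is not part of the PIDRL repository.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi is on PATH and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],  # -L prints one line per detected GPU
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

If this prints `False`, recheck the accelerator setting above before starting a long training run.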

## 1. Setup Environment

In [None]:
# Clone repository
!git clone https://github.com/nurullahayv/PIDRL.git
%cd PIDRL

In [None]:
# Run setup script (installs dependencies and checks GPU)
!python setup_kaggle.py

## 2. Quick Test (Optional)

In [None]:
# Quick test run (~1 minute)
!python quick_train.py --test

## 3. Full Training

In [None]:
# Full training (500k steps, ~2 hours with GPU)
!python quick_train.py --full
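
Once training finishes, it can be worth checking that the best checkpoint was actually written before moving on. The sketch below only inspects the file path used later in this notebook. `describe_checkpoint` is a hypothetical helper, not something the repository provides.

```python
from pathlib import Path


def describe_checkpoint(path: str) -> str:
    """Report whether a checkpoint file exists and, if so, its size."""
    p = Path(path)
    if not p.is_file():
        return f"missing: {p}"
    size_mb = p.stat().st_size / 1e6
    return f"found: {p} ({size_mb:.1f} MB)"


# Path taken from the test and analysis cells in this notebook
print(describe_checkpoint("models/sac_full/best_model/best_model.zip"))
```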

## 4. Monitor Training (Run in Parallel)

In [None]:
# Load TensorBoard
%load_ext tensorboard
%tensorboard --logdir logs/sac_full

## 5. Test Trained Model

In [None]:
# Test the best model (no rendering in Kaggle)
!python test_trained_model.py --model models/sac_full/best_model/best_model.zip --episodes 10 --no-render

## 6. Download Trained Model

In [None]:
# Zip model for download
!zip -r trained_model.zip models/sac_full/best_model

# You can download trained_model.zip from the Output section
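
If the `zip` command is not available in your environment, the standard library can build an equivalent archive. This is a sketch under that assumption. `zip_directory` is an illustrative name, and the `root_dir` default assumes the cell runs from the repository root.

```python
import shutil
from pathlib import Path


def zip_directory(src_dir, archive_stem, root_dir="."):
    """Zip src_dir (relative to root_dir) into archive_stem + '.zip'.

    Returns the archive path, or None if the source directory is missing.
    """
    if not (Path(root_dir) / src_dir).is_dir():
        return None
    # base_dir keeps the same relative paths inside the archive as `zip -r`
    return shutil.make_archive(
        archive_stem, "zip", root_dir=root_dir, base_dir=src_dir
    )


archive = zip_directory("models/sac_full/best_model", "trained_model")
print("archive:", archive)
```

A result of `None` means the model directory does not exist yet, so training has probably not produced a best model.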

## 7. Analyze Results

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from stable_baselines3 import SAC
import yaml
from environments import make_env
import numpy as np

# Load config
with open('configs/config.yaml', 'r') as f:
    config = yaml.safe_load(f)

# Load model
model = SAC.load("models/sac_full/best_model/best_model.zip")

# Test episodes
env = make_env(config, use_3d=True)

episode_rewards = []
focus_times = []

for _ in range(50):
    obs, info = env.reset()
    episode_reward = 0
    total_focus_time = 0
    steps = 0
    done = False
    
    while not done:
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated
        
        episode_reward += reward
        steps += 1
        if info.get('in_focus', False):
            total_focus_time += 1
    
    episode_rewards.append(episode_reward)
    focus_times.append(total_focus_time / steps * 100)

# Plot results
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))

ax1.hist(episode_rewards, bins=20, edgecolor='black')
ax1.set_title('Episode Rewards Distribution')
ax1.set_xlabel('Total Reward')
ax1.set_ylabel('Frequency')
ax1.axvline(np.mean(episode_rewards), color='red', linestyle='--', label=f'Mean: {np.mean(episode_rewards):.2f}')
ax1.legend()

ax2.hist(focus_times, bins=20, edgecolor='black')
ax2.set_title('Time in Focus Distribution')
ax2.set_xlabel('Focus Time (%)')
ax2.set_ylabel('Frequency')
ax2.axvline(np.mean(focus_times), color='red', linestyle='--', label=f'Mean: {np.mean(focus_times):.1f}%')
ax2.legend()

plt.tight_layout()
plt.savefig('training_results.png', dpi=150, bbox_inches='tight')
plt.show()

print(f"\nResults (50 episodes):")
print(f"  Average Reward: {np.mean(episode_rewards):.2f} ± {np.std(episode_rewards):.2f}")
print(f"  Average Focus Time: {np.mean(focus_times):.1f}% ± {np.std(focus_times):.1f}%")

env.close()
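
To keep the per-episode numbers alongside the plot, you could also write them to a CSV file. The sketch below uses only the standard library. The helper names, the column names, and the toy values are assumptions for illustration. The population standard deviation is used so the summary matches NumPy's default `np.std`.

```python
import csv
import statistics


def summarize(values):
    """Return (mean, population std), matching np.mean and np.std defaults."""
    return statistics.fmean(values), statistics.pstdev(values)


def save_episode_metrics(path, rewards, focus_pcts):
    """Write one row per episode: index, total reward, focus time in percent."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["episode", "reward", "focus_pct"])
        for i, (r, fp) in enumerate(zip(rewards, focus_pcts), start=1):
            # float() also converts NumPy scalars to plain Python floats
            writer.writerow([i, float(r), float(fp)])


# Toy values standing in for episode_rewards and focus_times
demo_rewards = [10.0, 12.0, 14.0]
demo_focus = [50.0, 60.0, 70.0]
mean_r, std_r = summarize(demo_rewards)
mean_f, std_f = summarize(demo_focus)
print(f"reward: {mean_r:.2f} ± {std_r:.2f}, focus: {mean_f:.1f}% ± {std_f:.1f}%")
```

In the notebook, calling `save_episode_metrics("episode_metrics.csv", episode_rewards, focus_times)` after the evaluation loop would store the real results.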

## 🎉 Training Complete!

Your trained model is saved in `models/sac_full/best_model/`

Download `trained_model.zip` from the Output section to use locally.