AEAP is a multi-actor deterministic policy gradient algorithm that addresses the tension between exploration diversity and computational efficiency in ensemble-based reinforcement learning. It builds on TD3 and introduces two key components:
- Dual-Randomized Actor Selection — Randomly selects different actors for environment interaction and policy updates, maintaining behavioral diversity without explicit regularization.
- Adaptive Dual-Criterion Pruning — Progressively removes underperforming or redundant actors based on critic-estimated Q-values and pairwise action-space similarity.
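
The two components above can be sketched in a few lines of NumPy. This is only an illustrative reading of the descriptions, not the repository's implementation: the function names `select_actors` and `prune_once`, the parameters `sim_threshold`, `q_margin`, and `min_actors`, and the choice of mean L2 distance as the similarity measure are all assumptions made for this example.

```python
import numpy as np


def select_actors(num_actors, rng):
    """Draw two independent actor indices (hypothetical helper).

    One index picks the actor that interacts with the environment,
    the other picks the actor that receives the policy update.
    """
    behavior_idx = int(rng.integers(num_actors))
    update_idx = int(rng.integers(num_actors))
    return behavior_idx, update_idx


def prune_once(q_values, actions, sim_threshold=0.1, q_margin=0.5, min_actors=1):
    """Return the index of one actor to remove, or None (hypothetical helper).

    q_values: shape (K,), critic-estimated mean Q-value per actor.
    actions:  shape (K, B, A), each actor's actions on a shared batch of states.
    """
    q_values = np.asarray(q_values, dtype=float)
    actions = np.asarray(actions, dtype=float)
    k = len(q_values)
    if k <= min_actors:
        return None

    # Pairwise mean L2 distance between actors' actions (assumed similarity measure).
    diffs = actions[:, None] - actions[None, :]           # (K, K, B, A)
    dist = np.linalg.norm(diffs, axis=-1).mean(axis=-1)   # (K, K)
    np.fill_diagonal(dist, np.inf)                        # ignore self-distance

    # Redundancy criterion: of the closest pair, drop the lower-valued actor.
    i, j = np.unravel_index(np.argmin(dist), dist.shape)
    if dist[i, j] < sim_threshold:
        return int(i if q_values[i] < q_values[j] else j)

    # Performance criterion: drop the worst actor if it trails the best by a margin.
    worst = int(np.argmin(q_values))
    if q_values.max() - q_values[worst] > q_margin:
        return worst
    return None


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(select_actors(5, rng))

    acts = np.zeros((3, 4, 2))
    acts[1] += 1.0    # actor 1 behaves differently
    acts[2] += 0.01   # actor 2 nearly duplicates actor 0
    print(prune_once([1.0, 0.5, 0.9], acts))
```

In this sketch, calling `prune_once` periodically during training and deleting the returned index would shrink the ensemble step by step; how often AEAP actually prunes, and with which thresholds, is not specified here.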

Requirements:

- Python 3.8+
- PyTorch (with CUDA support recommended)
- NumPy
- Gymnasium
- gymnasium-robotics (for Fetch tasks)
- MuJoCo

pip install torch numpy gymnasium gymnasium-robotics mujoco

# Train AEAP on HalfCheetah
python main.py --policy AEAP --env HalfCheetah-v5 --seed 1
# Train TD3 baseline
python main.py --policy TD3 --env HalfCheetah-v5 --seed 1