This codebase accompanies the paper "Difference Advantage Estimation for Multi-Agent Policy Gradients" (link). The implementation is based on the MAPPO codebase.
Please follow the instructions in the MAPPO codebase.
Here we use train_mpe.sh as an example:
cd onpolicy/scripts
chmod +x ./train_mpe.sh
./train_mpe.sh
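
If you want to keep the console output of a run, the same three steps can be wrapped in a small shell function. This is only a sketch: the function name `run_training_script` and the default log file name are hypothetical and not part of this repository or of MAPPO. It assumes nothing about the script beyond it being an executable shell script.

```shell
# Hypothetical helper (not part of this repository): run a script from its
# own directory, as in the steps above, and copy its output to a log file.
run_training_script() {
    script_path="$1"              # e.g. onpolicy/scripts/train_mpe.sh
    log_file="${2:-train.log}"    # hypothetical default log file name

    if [ ! -f "$script_path" ]; then
        echo "script not found: $script_path" >&2
        return 1
    fi

    script_dir=$(dirname "$script_path")
    script_name=$(basename "$script_path")

    # Same as the manual steps: make the script executable, cd, run it.
    chmod +x "$script_path"

    # The subshell keeps the caller's working directory unchanged.
    # tee prints the output and also writes it to the log file, which is
    # resolved relative to the caller's directory.
    ( cd "$script_dir" && "./$script_name" ) 2>&1 | tee "$log_file"
}
```

For example, `run_training_script onpolicy/scripts/train_mpe.sh train_mpe.log` would run the script as above and save its output to `train_mpe.log`. Note that in plain POSIX shell the return status of the pipeline is that of `tee`, not of the training script.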