diff --git a/tests/rl/log/Ant_1.png b/tests/rl/log/Ant_1.png
index d18195f88..e9bb76687 100644
Binary files a/tests/rl/log/Ant_1.png and b/tests/rl/log/Ant_1.png differ
diff --git a/tests/rl/log/Ant_11.png b/tests/rl/log/Ant_11.png
index 2f6dc3235..d19411e77 100644
Binary files a/tests/rl/log/Ant_11.png and b/tests/rl/log/Ant_11.png differ
diff --git a/tests/rl/log/HalfCheetah_1.png b/tests/rl/log/HalfCheetah_1.png
index faba7a07d..a20382a69 100644
Binary files a/tests/rl/log/HalfCheetah_1.png and b/tests/rl/log/HalfCheetah_1.png differ
diff --git a/tests/rl/log/HalfCheetah_11.png b/tests/rl/log/HalfCheetah_11.png
index 19844ceb2..4d2e031e3 100644
Binary files a/tests/rl/log/HalfCheetah_11.png and b/tests/rl/log/HalfCheetah_11.png differ
diff --git a/tests/rl/log/Hopper_1.png b/tests/rl/log/Hopper_1.png
index c54a9ffe4..d7944ac90 100644
Binary files a/tests/rl/log/Hopper_1.png and b/tests/rl/log/Hopper_1.png differ
diff --git a/tests/rl/log/Hopper_11.png b/tests/rl/log/Hopper_11.png
index 8197dcac9..693ed96d4 100644
Binary files a/tests/rl/log/Hopper_11.png and b/tests/rl/log/Hopper_11.png differ
diff --git a/tests/rl/log/Swimmer_1.png b/tests/rl/log/Swimmer_1.png
index 5408aafde..895e5599e 100644
Binary files a/tests/rl/log/Swimmer_1.png and b/tests/rl/log/Swimmer_1.png differ
diff --git a/tests/rl/log/Swimmer_11.png b/tests/rl/log/Swimmer_11.png
index 483a54806..e259bb707 100644
Binary files a/tests/rl/log/Swimmer_11.png and b/tests/rl/log/Swimmer_11.png differ
diff --git a/tests/rl/log/Walker2d_1.png b/tests/rl/log/Walker2d_1.png
index 20560bc77..caa9f1336 100644
Binary files a/tests/rl/log/Walker2d_1.png and b/tests/rl/log/Walker2d_1.png differ
diff --git a/tests/rl/log/Walker2d_11.png b/tests/rl/log/Walker2d_11.png
index 822aa9394..d9e841101 100644
Binary files a/tests/rl/log/Walker2d_11.png and b/tests/rl/log/Walker2d_11.png differ
diff --git a/tests/rl/performance.md b/tests/rl/performance.md
index 9b7afdd8c..75442035c 100644
--- a/tests/rl/performance.md
+++ b/tests/rl/performance.md
@@ -1,13 +1,14 @@
 # Performance for Gym Task Suite
 
-We benchmarked the MARO RL Toolkit implementation in Gym task suite.
-Some are compared to the benchmarks in [OpenAI Spinning Up](https://spinningup.openai.com/en/latest/spinningup/bench.html#).
-Limited by the environment version difference,
-there may be some gaps between the performance here and that in Spinning Up benchmarks.
+We benchmarked the MARO RL Toolkit implementation in Gym task suite. Some are compared to the benchmarks in
+[OpenAI Spinning Up](https://spinningup.openai.com/en/latest/spinningup/bench.html#). We've tried to align the
+hyper-parameters for these benchmarks , but limited by the environment version difference, there may be some gaps
+between the performance here and that in Spinning Up benchmarks. Generally speaking, the performance is comparable.
 
 ## Experimental Setting
 
-The hyper-parameters are set to align with those used in [Spinning Up](https://spinningup.openai.com/en/latest/spinningup/bench.html#experiment-details):
+The hyper-parameters are set to align with those used in
+[Spinning Up](https://spinningup.openai.com/en/latest/spinningup/bench.html#experiment-details):
 
 **Batch Size**:
 
@@ -24,13 +25,25 @@ The hyper-parameters are set to align with those used in [Spinning Up](https://s
 - For on-policy algorithms: measured as the average trajectory return across the batch collected at each epoch;
 - For off-policy algorithms: measured once every 10,000 steps by running the deterministic policy (or, in the case of SAC, the mean policy) without action noise for ten trajectories, and reporting the average return over those test trajectories;
 
-**Total timesteps**: set to 3M for all task suites and algorithms.
+**Total timesteps**: set to 4M for all task suites and algorithms.
 
-Other parameters are set to the values in *tests/rl/tasks/*.
+More details about the parameters can be found in *tests/rl/tasks/*.
 
-## Performance Comparison
+## Performance
 
-Five environments from the MuJoCo Gym task suite are reported in Spinning Up, they are: HalfCheetah, Hopper, Walker2d, Swimmer, and Ant.
+Five environments from the MuJoCo Gym task suite are reported in Spinning Up, they are: HalfCheetah, Hopper, Walker2d,
+Swimmer, and Ant. The commit id of the code used to conduct the experiments for MARO RL benchmarks is ee25ce1e97.
+The commands used are:
+
+```sh
+# Step 1: Set up the MuJoCo Environment in file tests/rl/gym_wrapper/common.py
+
+# Step 2: Use the command below to run experiment with ALGORITHM (ddpg, ppo, sac) and random seed SEED.
+python tests/rl/run.py tests/rl/tasks/ALGORITHM/config.yml --seed SEED
+
+# Step 3: Plot performance curves by environment with specific smooth window size WINDOWSIZE.
+python tests/rl/plot.py --smooth WINDOWSIZE
+```
 
 | **Env** | **Spinning Up** | **MARO RL w/o Smooth** | **MARO RL w/ Smooth** |
 |:---------------:|:---------------:|:----------------------:|:---------------------:|