
PPO2 on Swimmer-v2. Avg reward plot matches the one in the paper but video is not meaningful. #1013

Open
hiteshisharma opened this issue Oct 2, 2019 · 6 comments

Comments

@hiteshisharma

hiteshisharma commented Oct 2, 2019

I ran PPO2 on the MuJoCo Swimmer-v2 environment with the hyperparameters from the paper for 1 million timesteps. I was able to generate an average-reward plot similar to the one in the paper, but when I ran the trained model to visualize it, the agent did not seem to be working.
[Attached: average-reward plot for PPO2 on Swimmer-v2, and a video of the trained swimmer]

Has anyone faced the same issue?
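For reference, training and playback with baselines are usually invoked along these lines (the save path and exact flag values here are illustrative assumptions, not necessarily the commands used above):

```shell
# Train PPO2 on Swimmer-v2 for 1M timesteps and save the parameters
# (the save path is a hypothetical example).
python -m baselines.run --alg=ppo2 --env=Swimmer-v2 \
    --num_timesteps=1e6 --save_path=~/models/swimmer_1M_ppo2

# Reload the saved parameters and render the agent without further training.
python -m baselines.run --alg=ppo2 --env=Swimmer-v2 \
    --num_timesteps=0 --load_path=~/models/swimmer_1M_ppo2 --play
```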

@christopherhesse
Contributor

Assuming you're not seeing the correct average reward when visualizing the agent, are you failing to load the parameters or using different observation normalization?
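One common source of such a mismatch is observation normalization: baselines' MuJoCo setup wraps the environment in a running-mean/std normalizer (VecNormalize), and if the statistics accumulated during training are not restored at playback, the policy sees differently scaled observations. A minimal sketch of the idea, using a simplified toy normalizer (not baselines' actual class):

```python
class RunningNormalizer:
    """Running mean/std normalizer (Welford's algorithm) -- a toy
    stand-in for the statistics a wrapper like VecNormalize maintains."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def normalize(self, x):
        var = self.m2 / self.count if self.count > 0 else 1.0
        return (x - self.mean) / (var ** 0.5 + 1e-8)


# Statistics accumulated during training...
train_norm = RunningNormalizer()
for obs in [0.5, 1.5, 2.5, 3.5]:
    train_norm.update(obs)

# ...versus a fresh normalizer at playback time: the same raw
# observation maps to a wildly different input for the policy.
fresh_norm = RunningNormalizer()
fresh_norm.update(2.0)
print(train_norm.normalize(3.5))  # scaled with the trained statistics
print(fresh_norm.normalize(3.5))  # blows up: fresh stats have zero variance
```

If the playback script rebuilds the normalization wrapper from scratch instead of restoring the saved statistics, the policy can act on garbage inputs even though the network parameters loaded correctly.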

@hiteshisharma
Author

It looks like, in order to have a working agent in the visualization, the Swimmer-v2 environment should reach an average reward of ~300, but mine is stuck around 100-110. So while the average reward during visualization matches the training curve, the swimmer does not seem to be moving.

Please let me know if I am not understanding it fully.

@christopherhesse
Contributor

Oh, so you're saying the learning curve is consistent with the video, but that the learning curve isn't very good because it should get to a reward of 300, but only gets to like 100, right?

@christopherhesse
Contributor

It looks like baselines expects to get about 100 for swimmer: http://htmlpreview.github.io/?https://github.com/openai/baselines/blob/master/benchmarks_mujoco1M.htm

Where did you get the 300 value from?

@hiteshisharma
Author

This paper: https://arxiv.org/pdf/1906.08649.pdf

What I meant was that we need more than 100-120 to see the swimmer working in the video.

@christopherhesse
Contributor

Looking at that paper, it seems to say that PPO gets a score of 155 after 200k timesteps, right?
