PPO2 on Swimmer-v2. Avg reward plot matches the one in the paper but video is not meaningful. #1013
Comments
Assuming you're not seeing the correct average reward when visualizing the agent, are you failing to load the parameters or using different observation normalization?
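The observation-normalization point above is a common pitfall: PPO2 in baselines normalizes observations with running mean/variance statistics collected during training, and if those statistics aren't restored at visualization time the policy sees very different inputs. A minimal sketch of the idea (a hypothetical `RunningObsNormalizer`, not baselines' actual `VecNormalize` class, though the mechanism is the same):

```python
import numpy as np

class RunningObsNormalizer:
    """Running mean/variance normalizer for observations.

    Hypothetical minimal sketch of what VecNormalize-style wrappers do;
    the real implementation lives inside the baselines repo.
    """

    def __init__(self, shape, eps=1e-8):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = eps  # avoids division by zero before the first update

    def update(self, obs_batch):
        # Welford-style merge of batch statistics into the running ones.
        batch_mean = obs_batch.mean(axis=0)
        batch_var = obs_batch.var(axis=0)
        batch_count = obs_batch.shape[0]

        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        m2 = m_a + m_b + delta ** 2 * self.count * batch_count / total

        self.mean, self.var, self.count = new_mean, m2 / total, total

    def normalize(self, obs):
        return (obs - self.mean) / np.sqrt(self.var + 1e-8)


# Demo: a normalizer fitted during "training" vs. a fresh one at eval time
# produce very different policy inputs for the same raw observation.
rng = np.random.default_rng(0)
trained = RunningObsNormalizer(shape=(3,))
trained.update(rng.normal(loc=5.0, scale=2.0, size=(1000, 3)))

fresh = RunningObsNormalizer(shape=(3,))  # stats never loaded -- the bug
raw_obs = np.array([5.0, 5.0, 5.0])

print(trained.normalize(raw_obs))  # roughly zero-centered
print(fresh.normalize(raw_obs))    # far from what the policy was trained on
```

If the trained statistics are not saved and reloaded alongside the network weights, the second case is what the policy receives, which can make a well-trained agent look broken in the video even though the logged training curve was fine.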
It looks like, in order to have a working agent in the visualization, the Swimmer-v2 environment should reach an average reward of ~300, but it is stuck around 100-110. So, while the average reward is the same during visualization, the agent does not seem to be moving. Please let me know if I am misunderstanding this.
Oh, so you're saying the learning curve is consistent with the video, but that the learning curve isn't very good because it should get to a reward of 300, but only gets to like 100, right?
It looks like baselines expects to get about 100 for Swimmer: http://htmlpreview.github.io/?https://github.com/openai/baselines/blob/master/benchmarks_mujoco1M.htm Where did you get the 300 value from?
This paper: https://arxiv.org/pdf/1906.08649.pdf What I meant was that we need more than 100-120 to see the swimmer working in the video.
Looking at that paper, it seems to say that PPO gets a score of 155 after 200k timesteps, right? |
I ran PPO2 on MuJoCo Swimmer-v2 with the hyperparameters mentioned in the paper for 1 million timesteps. I was able to generate a plot of average rewards similar to the one in the paper, but when I ran the trained model to visualize it, the agent did not seem to be working.
Has anyone faced the same issue?
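For reference, the "run the trained model to visualize" step described above usually comes down to a rollout loop like the following. This is a hedged, generic sketch rather than baselines' own code: `policy` is any callable mapping an observation to an action, and `normalizer` (if used) must carry the same statistics as during training, per the normalization discussion in this thread.

```python
def rollout(env, policy, normalizer=None, render=False, max_steps=1000):
    """Roll out one episode and return the total reward.

    Generic sketch of an evaluation loop (gym's classic 4-tuple step API):
    - env: an environment with reset()/step()/render()
    - policy: callable obs -> action
    - normalizer: optional object with .normalize(obs); must hold the
      training-time statistics, or the policy sees mismatched inputs
    """
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        if normalizer is not None:
            obs = normalizer.normalize(obs)
        action = policy(obs)
        obs, reward, done, _info = env.step(action)
        total_reward += reward
        if render:
            env.render()  # opens the MuJoCo viewer for Swimmer-v2
        if done:
            break
    return total_reward


# Tiny deterministic stub environment, just to show the loop's shape.
class StubEnv:
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        return float(self.t), 1.0, self.t >= 5, {}


print(rollout(StubEnv(), policy=lambda obs: 0))  # 5 steps, reward 1.0 each
```

A low but stable evaluation return from a loop like this, matching the training curve, is consistent with the symptom reported here: the agent is genuinely stuck at a low score rather than being loaded incorrectly.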