Benchmarking for PPO and TRPO #61

Open
miriaford opened this issue Jul 21, 2017 · 5 comments

Comments

@miriaford

Thanks to the OpenAI team for the latest release!

Are there any benchmark results (like Atari scores) for PPO and TRPO? DQN has a report here: https://github.com/openai/baselines-results. It's super useful. Thanks again!

@Twinko56X

I did not see any in the repo, but as a general indication, the PPO paper has benchmarks on page 11: https://openai-public.s3-us-west-2.amazonaws.com/blog/2017-07/ppo/ppo-arxiv.pdf#page=11

@miriaford
Author

@Twinko56X thanks for the link! It's actually on arxiv now: https://arxiv.org/pdf/1707.06347.pdf

I wonder if this repo is the same code used to produce those plots.

@ViktorM

ViktorM commented Jul 23, 2017

The DQN baselines results (https://github.com/openai/baselines-results) look great; I had missed them. It would be nice to have a similar IPython notebook at some point for PPO vs. TRPO vs. DDPG vs. IPG on continuous control problems, and PPO vs. DQN on Atari.

@joschu
Contributor

joschu commented Aug 28, 2017

I'll add an IPython notebook with the Atari and MuJoCo benchmarks soon.

pzhokhov pushed a commit that referenced this issue Aug 30, 2018
@doviettung96

Hi @joschu,
I am currently trying to replicate the PPO paper's result on RoboschoolHumanoidFlagrunHarder-v1. Did you use the PPO algorithm from these OpenAI baselines? I have modified it to use an adaptive learning rate based on the KL divergence. The other hyperparameters are set as in the paper, except that the logstd of the action distribution is fixed at zeros (not LinearAnneal(-0.7, -1.6)). I use (512, 256, 128) policy and value networks with relu activations. However, I cannot get the mean episode reward above 2000. Any suggestions? Thanks.
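For reference, here is a minimal sketch of one common adaptive-KL learning-rate rule of the kind described above. This is not the baselines implementation; the target KL, scaling factor, and bounds are illustrative assumptions, and the helpers in the commented-out loop (`collect_rollouts`, `mean_kl_old_new`, `run_ppo_epochs`) are hypothetical names.

```python
# Sketch only: adapt the learning rate from the measured KL divergence
# between the old and new policy after each update.
def adapt_learning_rate(lr, kl, kl_target=0.01, factor=1.5,
                        lr_min=1e-6, lr_max=1e-2):
    """Shrink the step size when the policy moved well past the target KL,
    grow it when the policy barely moved, otherwise leave it unchanged."""
    if kl > 2.0 * kl_target:      # policy changed too much -> take smaller steps
        lr = max(lr / factor, lr_min)
    elif kl < 0.5 * kl_target:    # policy barely changed -> take larger steps
        lr = min(lr * factor, lr_max)
    return lr


# Illustrative use inside a training loop (names below are assumptions):
# for update in range(num_updates):
#     batch = collect_rollouts(env, policy)
#     kl = mean_kl_old_new(batch)        # KL between pre- and post-update policy
#     lr = adapt_learning_rate(lr, kl)
#     run_ppo_epochs(batch, lr)
```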

huiwenn pushed a commit to huiwenn/baselines that referenced this issue Mar 20, 2019
kkonen pushed a commit to kkonen/baselines-1 that referenced this issue Sep 26, 2019