
DDPG has no function of plotting? #13

Closed
Alex-zhai opened this issue Jun 4, 2016 · 11 comments
Comments

@Alex-zhai

When I use the DDPG algorithm, I set plot=True, but the evaluation run after each iteration didn't appear.
So what's the problem?

@dementrock
Member

Seems like I forgot to implement it. Fixed in 4362ad2. Also added a sample script:

https://github.com/rllab/rllab/blob/master/examples/ddpg_cartpole_stub.py
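For reference, here's a minimal sketch of what such a stub script might look like with plotting enabled. The module paths and constructor arguments below are assumptions based on rllab's general layout, not a verbatim copy of the linked script:

```python
# Hypothetical sketch of a DDPG cartpole stub with live plotting enabled.
# Module paths and argument names are assumptions, not copied from the repo.
from rllab.algos.ddpg import DDPG
from rllab.envs.box2d.cartpole_env import CartpoleEnv
from rllab.envs.normalized_env import normalize
from rllab.exploration_strategies.ou_strategy import OUStrategy
from rllab.policies.deterministic_mlp_policy import DeterministicMLPPolicy
from rllab.q_functions.continuous_mlp_q_function import ContinuousMLPQFunction

env = normalize(CartpoleEnv())

policy = DeterministicMLPPolicy(env_spec=env.spec, hidden_sizes=(32, 32))
es = OUStrategy(env_spec=env.spec)
qf = ContinuousMLPQFunction(env_spec=env.spec, hidden_sizes=(32, 32))

algo = DDPG(
    env=env,
    policy=policy,
    es=es,
    qf=qf,
    plot=True,  # show the evaluation rollout after each iteration
)
algo.train()
```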

@Alex-zhai
Author

So are the hyper-parameters of the DDPG algorithm the same across all tasks? I remember the hidden sizes of the networks are (300, 400); however, you set (32, 32).

@dementrock
Member

I set it to a small network so that it runs faster, and gives sufficiently good results on cartpole balancing. If you'd like to reproduce the results in the paper, you should use larger networks and keep the exact same settings.
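Concretely, switching between the two regimes only changes the sizes passed to the policy and Q-function constructors. This continues the sketch above, and the hidden_sizes keyword name is likewise an assumption:

```python
# Continuing from the cartpole sketch above (same imports and env);
# hidden_sizes is an assumed keyword name.
from rllab.policies.deterministic_mlp_policy import DeterministicMLPPolicy
from rllab.q_functions.continuous_mlp_q_function import ContinuousMLPQFunction

# Small networks: fast, and good enough for cartpole balancing.
policy = DeterministicMLPPolicy(env_spec=env.spec, hidden_sizes=(32, 32))
qf = ContinuousMLPQFunction(env_spec=env.spec, hidden_sizes=(32, 32))

# Paper-sized networks: needed to reproduce the published results, but much slower.
policy = DeterministicMLPPolicy(env_spec=env.spec, hidden_sizes=(400, 300))
qf = ContinuousMLPQFunction(env_spec=env.spec, hidden_sizes=(400, 300))
```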

@dementrock
Member

Also, I'd recommend sticking with smaller networks, at least while you are, e.g., tweaking algorithms. The larger networks run much, much slower.

@Alex-zhai
Author

I used the larger networks and kept the exact same settings as the original paper in order to solve the Half-Cheetah task, but the agent performed poorly. These are my settings:
n_epochs=200, epoch_length=1000, batch_size=32, min_pool_size=10000, replay_pool_size=1000000, eval_samples=10000, hidden_sizes=(400, 300)

@Alex-zhai
Author

Could you share your settings for the DDPG algorithm on the Half-Cheetah task? Thank you!

@dementrock
Member

Almost the same configuration as in the sample script, except:

  • Scale reward by 0.1 instead of 0.01
  • max_path_length 500
  • min_pool_size 10000
  • epoch_length 10000
  • n_epochs 2500
  • hidden sizes (400, 300) for both the policy and the Q function

Also, I used n_parallel=4 for my experiments, although this parallelization is only used when sampling trajectories for evaluation.

The whole experiment runs really slowly since it actually uses 25x more samples than the original DDPG paper, in order to match the settings of the other algorithms evaluated in the benchmark paper. You should be able to get pretty good results with just 100 epochs. You can also get more intermediate progress by setting n_epochs to 1000 and epoch_length to 1000 (the total number of samples = n_epochs * epoch_length).
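Putting those numbers together, here's a rough sketch of the whole Half-Cheetah configuration. As above, the module paths and argument names (scale_reward, epoch_length, etc.) are assumptions based on the thread and rllab's general layout, so double-check them against the actual sample script:

```python
# Hypothetical Half-Cheetah DDPG configuration assembled from the settings above.
# Argument names are assumptions, not verified against the rllab source.
from rllab.algos.ddpg import DDPG
from rllab.envs.mujoco.half_cheetah_env import HalfCheetahEnv
from rllab.envs.normalized_env import normalize
from rllab.exploration_strategies.ou_strategy import OUStrategy
from rllab.policies.deterministic_mlp_policy import DeterministicMLPPolicy
from rllab.q_functions.continuous_mlp_q_function import ContinuousMLPQFunction

env = normalize(HalfCheetahEnv())

policy = DeterministicMLPPolicy(env_spec=env.spec, hidden_sizes=(400, 300))
es = OUStrategy(env_spec=env.spec)
qf = ContinuousMLPQFunction(env_spec=env.spec, hidden_sizes=(400, 300))

algo = DDPG(
    env=env,
    policy=policy,
    es=es,
    qf=qf,
    scale_reward=0.1,    # 0.1 instead of 0.01; this matters a lot
    max_path_length=500,
    min_pool_size=10000,
    epoch_length=10000,  # or 1000, with n_epochs=1000, for more frequent progress
    n_epochs=2500,       # ~100 epochs already give decent results
    plot=True,           # as in the original question
)
algo.train()
```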

@dementrock
Member

The reward scaling is really important. Make sure you have that.

@Alex-zhai
Author

Perfect, thank you for sharing!

@dementrock
Member

No problem. Let me know if you have any further issues getting it to work.

@Alex-zhai
Author

OK, I will. Thank you!
