
PandaPush-v2 does not learn with SB3 #21

Closed
shukla-yash opened this issue Mar 3, 2022 · 5 comments
Labels
question Further information is requested

Comments

@shukla-yash

Hi,

I am trying to reproduce your results from the paper 'panda-gym: Open-source goal-conditioned environments for robotic learning', and the code given in train_push.py does not seem to work with the default parameters.
Can you point me to the RL code you used to get those results? Also, are the learning curves in the paper from the sparse or the dense reward setting?
Thanks!

@shukla-yash shukla-yash added the question Further information is requested label Mar 3, 2022
@qgallouedec
Owner

qgallouedec commented Mar 3, 2022

Hi,

What do you mean by "does not work"? Is an error raised during execution, or do the results you obtain not match the curves in the paper?

The code I used for the paper is provided in this openai/baselines fork. It will allow you to strictly reproduce the results of the paper.

Nevertheless, since OpenAI has stopped maintaining that repo, I strongly advise you to use maintained RL code like stable-baselines3, even if you will probably not be able to reproduce the paper's results exactly.

If you still want to use openai/baselines to strictly reproduce my results, please note that I used the v0 version of panda-gym (not the v2 version I released in the meantime). I don't think the changes between these two versions will change the curves much, but I can't guarantee it.

@shukla-yash
Author

Thanks for your reply. By "does not work", I meant that the learning curves did not match (no error during execution). I trained for almost 3×10^6 timesteps, but the success rate for PandaPush-v2 was stuck at 0.15 (the learning curves in the paper converge to a success rate of ~1).

Thanks for your suggestions, I will try them in the meantime. Did you use sparse reward for the curves?

@qgallouedec
Owner

> Did you use sparse reward for the curves?

I did.

You can also check the baselines results on the rl-baselines3-zoo repo. For Push, convergence occurs well before 1e6 timesteps.
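For concreteness, here is the usual distinction between the two reward settings in goal-conditioned tasks like these — a minimal sketch in plain Python, assuming the common convention (0 on success / -1 otherwise for sparse, negative goal distance for dense; the threshold value is an assumption, not panda-gym's exact constant):

```python
import math

GOAL_THRESHOLD = 0.05  # assumed success radius, for illustration only


def sparse_reward(achieved_goal, desired_goal, threshold=GOAL_THRESHOLD):
    """Sparse setting: 0 when the goal is reached, -1 otherwise."""
    distance = math.dist(achieved_goal, desired_goal)
    return 0.0 if distance < threshold else -1.0


def dense_reward(achieved_goal, desired_goal):
    """Dense setting: negative Euclidean distance to the goal."""
    return -math.dist(achieved_goal, desired_goal)
```

The sparse setting gives no gradient of reward toward the goal, which is why techniques like HER matter so much for Push and PickAndPlace.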

@shukla-yash
Author

shukla-yash commented Mar 16, 2022

Can you please post a snippet for PandaPickAndPlace-v2 that learns using DDPG from SB3, to reproduce the results in the paper? I realize it might not be exactly equivalent to the results from the paper, but anything that learns should work for me.

I've tried this, but it does not work:

```python
import gym
import panda_gym  # registers the Panda environments
from stable_baselines3 import DDPG, HerReplayBuffer
from stable_baselines3.common.env_util import make_vec_env

env = gym.make("PandaPickAndPlace-v2")
env = make_vec_env(lambda: env, n_envs=4)

model = DDPG(
    policy="MultiInputPolicy",
    env=env,
    replay_buffer_class=HerReplayBuffer,
    verbose=1,
    batch_size=2048,
    buffer_size=1000000,
)

model.learn(total_timesteps=4000000)
```
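One pitfall worth checking in the snippet above (an observation, not necessarily the whole story): `make_vec_env` calls its factory once per worker, so `lambda: env` hands the *same* env instance to all four workers, and — if I recall correctly — SB3's `HerReplayBuffer` only supported a single environment at that time anyway. The aliasing can be illustrated with plain Python, no SB3 required:

```python
class DummyEnv:
    """Stand-in for a gym environment (hypothetical, for illustration only)."""
    pass


env = DummyEnv()

# A factory that closes over an existing instance returns the SAME object
# every time it is called -- four "workers" would all share one env.
aliased = [(lambda: env)() for _ in range(4)]
assert all(e is env for e in aliased)

# A factory that constructs a fresh instance per call is what a vectorized
# wrapper expects: four independent environments.
independent = [(lambda: DummyEnv())() for _ in range(4)]
assert len({id(e) for e in independent}) == 4
```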

Thanks!

@qgallouedec
Owner

qgallouedec commented May 23, 2022

You can use rl-baselines3-zoo to train PandaPush-v2. You just need to paste these hyperparameters in hyperparams/ddpg.yml:

```yaml
PandaPush-v2:
  env_wrapper: sb3_contrib.common.wrappers.TimeFeatureWrapper
  n_timesteps: !!float 1e6
  policy: 'MultiInputPolicy'
  buffer_size: 1000000
  batch_size: 2048
  gamma: 0.95
  learning_rate: !!float 1e-3
  noise_type: 'normal'
  noise_std: 0.1
  replay_buffer_class: HerReplayBuffer
  replay_buffer_kwargs: "dict(
    online_sampling=True,
    goal_selection_strategy='future',
    n_sampled_goal=4,
  )"
  policy_kwargs: "dict(net_arch=[512, 512, 512], n_critics=2)"
```

then run

```shell
python train.py --algo ddpg --env PandaPush-v2
```

Here is the result you will get:

[Screenshot: training result, 2022-05-23]

It should also converge with PandaPickAndPlace-v2. Feel free to open a PR in the zoo like this one to share your results.
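For intuition, the `goal_selection_strategy='future'` / `n_sampled_goal=4` settings above relabel each stored transition with goals actually achieved later in the same episode, turning failed episodes into useful sparse-reward signal. A simplified pure-Python sketch (the function name is hypothetical and this is not SB3's implementation; the reward check is deliberately simplified):

```python
import random


def relabel_future(episode, n_sampled_goal=4, seed=0):
    """Sketch of HER's 'future' strategy: for each transition, sample up to
    n_sampled_goal goals that were actually achieved later in the same
    episode, relabel the transition with them, and recompute the sparse
    reward (0 on success, -1 otherwise)."""
    rng = random.Random(seed)
    relabeled = []
    for t, (obs, achieved, _desired) in enumerate(episode):
        future_goals = [ag for (_, ag, _) in episode[t + 1:]]
        for _ in range(min(n_sampled_goal, len(future_goals))):
            new_goal = rng.choice(future_goals)
            reward = 0.0 if achieved == new_goal else -1.0
            relabeled.append((obs, achieved, new_goal, reward))
    return relabeled
```

Transitions relabeled with a goal the agent actually reached receive a success reward, which is what makes sparse-reward Push learnable at all.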
