PandaPush-v2 does not learn with SB3 #21
Comments
Hi, what do you mean by "does not work"? Is an error raised during code execution? Or do you mean that the results you obtain do not match the curves in the paper? The code I used for the paper is provided in this openai/baselines fork; it will let you reproduce the paper's results exactly. Nevertheless, since OpenAI has stopped maintaining that repo, I strongly advise you to use maintained RL code like stable-baselines3, even if you will probably not be able to reproduce the paper's results exactly. If you still want to use openai/baselines to strictly reproduce my results, please note that I used the v0 version of panda-gym (and not the v2 version I released in the meantime). I don't think the changes between these two versions affect the curves much, but I can't guarantee it.
Thanks for your reply. By "does not work", I meant that the learning curves did not match (no error in execution). I trained for almost 3×10^6 timesteps, but the success rate for PandaPush-v2 was stuck at 0.15 (the learning curves in the paper converge to a success rate of ~1). Thanks for your suggestions, I will try them in the meantime. Did you use a sparse reward for the curves?
I did. You can also check the baseline results on the rl-baselines3-zoo repo. For Push, convergence occurs well before 1e6 timesteps.
Can you please post a snippet for PandaPickAndPlace-v2 that learns using DDPG from SB3, to reproduce the results in the paper? I realize it might not be exactly equivalent to the results from the paper, but anything that learns would work for me. I've tried this, but it does not work:

```python
import gym
import panda_gym  # registers the Panda environments
from stable_baselines3 import DDPG, HerReplayBuffer

env = gym.make("PandaPickAndPlace-v2")
model = DDPG(policy="MultiInputPolicy", env=env, replay_buffer_class=HerReplayBuffer,
             verbose=1, batch_size=2048, buffer_size=1000000)
model.learn(total_timesteps=4000000)
```

Thanks!
You can use rl-baselines3-zoo to train PandaPush-v2. You just need to paste these hyperparameters under the PandaPush-v2 entry:

```yaml
env_wrapper: sb3_contrib.common.wrappers.TimeFeatureWrapper
n_timesteps: !!float 1e6
policy: 'MultiInputPolicy'
buffer_size: 1000000
batch_size: 2048
gamma: 0.95
learning_rate: !!float 1e-3
noise_type: 'normal'
noise_std: 0.1
replay_buffer_class: HerReplayBuffer
replay_buffer_kwargs: "dict(
  online_sampling=True,
  goal_selection_strategy='future',
  n_sampled_goal=4,
)"
policy_kwargs: "dict(net_arch=[512, 512, 512], n_critics=2)"
```

then run:

```
python train.py --algo ddpg --env PandaPush-v2
```

Here is the result you will get: [learning-curve plot]
Hi,
I am trying to recreate your results from the paper 'panda-gym: Open-source goal-conditioned environments for robotic learning', and the code given in train_push.py does not seem to work with the default parameters.
Can you point me to the RL code you used to get those results? Also, are the learning curves in the paper from the sparse or dense reward setting?
Thanks!