This PR makes a number of changes to enable training of a SOTA diffusion policy model for the PushT environment.
About the trained model weights
The training losses for a model trained on lerobot and an equivalent model trained on the original repository look about the same:
I then ported their weights over to lerobot and ran 500 eval experiments for each model (I chose the best weights from each, at around 200k training steps).
For theirs we got:
For ours we got:
The "pc_success" measures the proportion (as a %) of rollouts that results in a >= 95% overlap between the T and the target being reached.
The "avg_max_reward" metric is the one they use in the paper, for which they report 0.91/0.84 (picking the best checkpoint / picking the average of the last 10 checkpoints). It measures max(clip(overlap / success_threshold, 0, 1)) such that if there is success, the reward is 1.
Because the gap in "avg_max_reward" is relatively small (0.1) while the gap in "pc_success" is larger (~0.2), I speculate that the issue has to do with fine-grained control when the T is near 95% overlap. For example, consider this rollout where our policy quickly achieves a near-optimal placement but isn't able to close the final gap:
Contrast this with the model trained on the official DP repo, where the initial approach is arguably worse but it applies the finishing touches much faster:
Also note that eval on their repo (with the same model weights) gives a higher "avg_max_reward" of 0.97 ~ 0.98, although the rollouts look qualitatively the same. Clearly there are some other differences in eval/data that we need to hunt down.