
Important components #12

Closed

hbishop1 opened this issue Jul 6, 2023 · 2 comments

Comments

hbishop1 commented Jul 6, 2023

Hi, I am currently looking at implementing a diffusion model for policy learning and was very impressed by your work! I was wondering which components of your approach you found to be particularly important for good results. Three things I was specifically curious about:

  • I see you use EMA; did you find that the model predictions were particularly unimodal/overfit to recent training data without it?
  • Was the causal attention masking used in the transformer variant crucial to getting this architecture to work, or do you think simply decoding waypoints from a more BERT-style encoder architecture would work?
  • In the appendix it seems you used a particularly large model for the CNN variant, and you say that you always found a larger CNN -> better performance. Was the performance of much smaller CNNs (e.g. ~10M parameters) much worse?
cheng-chi (Collaborator) commented

Hi @hbishop1:

  1. I empirically found EMA to accelerate training (eval performance increases faster) and improve performance (by <5%), but the policy should "work" even without it. (A sketch of EMA weight averaging follows this list.)
  2. I found the causal attention masking to be critical for getting the transformer variant of diffusion policy to work. My suspicion is that without it, the model "cheats" by looking ahead at future end-effector poses, which are almost identical to the action at the current timestep. (A sketch of the masking follows below.)
  3. I think the model capacity needed depends on task complexity (a more complex task requires a larger CNN). Reducing the number of training diffusion steps also reduces the CNN capacity requirement, at the expense of reduced action quality. A ~10M-parameter CNN should still work, with less than a 10% performance penalty on the benchmarks we have tested.
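
For point 1, a minimal sketch of EMA weight averaging, assuming a PyTorch model; the class name, decay value, and update cadence here are illustrative, not the repo's exact implementation:

```python
import copy
import torch

class EMA:
    """Keeps an exponential moving average of a model's weights."""

    def __init__(self, model: torch.nn.Module, decay: float = 0.999):
        self.decay = decay
        # The shadow copy holds the averaged weights used at eval time.
        self.shadow = copy.deepcopy(model).eval()
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model: torch.nn.Module):
        # shadow <- decay * shadow + (1 - decay) * online weights
        for s, p in zip(self.shadow.parameters(), model.parameters()):
            s.lerp_(p, 1.0 - self.decay)

# Usage: call ema.update(model) after each optimizer step, then
# evaluate the policy with ema.shadow instead of the online model.
# (A full implementation would also copy buffers, e.g. norm statistics.)
```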

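And for point 2, a minimal sketch of causal self-attention masking over the predicted action horizon, using PyTorch's built-in transformer layers; the horizon length and model dimensions are illustrative:

```python
import torch
import torch.nn as nn

T = 16  # prediction horizon (sequence length), illustrative
# True above the diagonal: position i may not attend to positions j > i,
# so the model cannot peek at future end-effector poses.
causal_mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)

layer = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=4)

x = torch.randn(2, T, 256)         # (batch, horizon, features)
y = encoder(x, mask=causal_mask)   # masked self-attention over the horizon
```
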
hbishop1 (Author) commented Jul 6, 2023

Great, thanks for the quick and detailed response, that will really help!

hbishop1 closed this as completed Jul 6, 2023