Multimodal data as input to the model #10

Closed
SergioArnaud opened this issue Oct 29, 2022 · 1 comment


@SergioArnaud

Hi, congratulations on the amazing work!

I wanted to ask a question: the paper mentions that multimodal data [RGB + proprioception] can be used as input to the model.

In the code, the observations are sent to an encoder that processes them in different ways depending on whether the input is pixels or another modality; however, I'm not sure any of those options apply to multimodal data containing both pixels and state information. Given the experiments you made in the paper, how would you recommend processing such data in the encoder?

@nicklashansen
Owner

Hi, thank you for your interest. We recently open-sourced an extension to the TD-MPC algorithm that takes multimodal (pixels + state) inputs by default. It is available here: https://github.com/facebookresearch/modem/blob/6cee94def92b910a3fe122a10dcec1330c3519c3/algorithm/tdmpc.py#L37
Modalities are fused by projecting features from each modality into a low-dimensional space and summing them. Feel free to re-open if you have additional questions!
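For readers landing here later, a minimal sketch of the fusion idea described above (this is not the MoDem code itself; the class name, layer sizes, and image resolution are illustrative assumptions): encode pixels with a small CNN and proprioceptive state with an MLP, project both into the same low-dimensional latent space, and sum the two feature vectors.

```python
import torch
import torch.nn as nn


class MultimodalEncoder(nn.Module):
    """Illustrative encoder: fuse pixels + state by projecting each
    modality into a shared low-dimensional space and summing."""

    def __init__(self, state_dim: int, latent_dim: int = 50):
        super().__init__()
        # Pixel branch: small conv stack over 84x84 RGB frames (sizes assumed).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():
            n_flat = self.cnn(torch.zeros(1, 3, 84, 84)).shape[1]
        self.pixel_proj = nn.Linear(n_flat, latent_dim)

        # State (proprioception) branch: small MLP projection.
        self.state_proj = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ELU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, pixels: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # Project each modality into the shared latent space, then sum.
        z_pixels = self.pixel_proj(self.cnn(pixels / 255.0))
        z_state = self.state_proj(state)
        return z_pixels + z_state


# Usage: batch of 8 RGB frames (84x84) plus a 24-dim proprioceptive state.
enc = MultimodalEncoder(state_dim=24, latent_dim=50)
z = enc(torch.randint(0, 256, (8, 3, 84, 84), dtype=torch.float32),
        torch.randn(8, 24))
print(z.shape)  # torch.Size([8, 50])
```

The summed latent then plays the role of the single-modality latent in the rest of the pipeline, so downstream components do not need to know how many modalities were fused.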
