Multimodal data as input to the model #10

Closed
SergioArnaud opened this issue Oct 29, 2022 · 1 comment


@SergioArnaud

Hi, congratulations on the amazing work!

I wanted to ask a question: the paper mentions that multimodal data [RGB + proprioception] can be used as input to the model.

In the code, the observations are sent to an encoder that processes them in different ways depending on whether the input is pixels or another modality; however, I'm not sure any of those options apply to multimodal data containing both pixels and state information. Given the experiments you made in the paper, how would you recommend processing such data in the encoder?

@nicklashansen
Owner

Hi, thank you for your interest. We recently open-sourced an extension to the TD-MPC algorithm that takes multimodal (pixels + state) inputs by default. It is available here: https://github.com/facebookresearch/modem/blob/6cee94def92b910a3fe122a10dcec1330c3519c3/algorithm/tdmpc.py#L37
Modalities are fused by projecting features from each modality into a low-dimensional space and summing them. Feel free to re-open if you have additional questions!
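For readers landing here later, a minimal sketch of the fusion idea described above (this is not the MoDem code itself; the class name, layer sizes, and image resolution are illustrative assumptions): encode pixels with a small CNN and proprioceptive state with an MLP, project both into the same low-dimensional latent space, and sum the two feature vectors.

```python
import torch
import torch.nn as nn


class MultimodalEncoder(nn.Module):
    """Illustrative encoder: fuse pixels + state by projecting each
    modality into a shared low-dimensional space and summing."""

    def __init__(self, state_dim: int, latent_dim: int = 50):
        super().__init__()
        # Pixel branch: small conv stack over 84x84 RGB frames (sizes assumed).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():
            n_flat = self.cnn(torch.zeros(1, 3, 84, 84)).shape[1]
        self.pixel_proj = nn.Linear(n_flat, latent_dim)

        # State (proprioception) branch: small MLP projection.
        self.state_proj = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ELU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, pixels: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # Project each modality into the shared latent space, then sum.
        z_pixels = self.pixel_proj(self.cnn(pixels / 255.0))
        z_state = self.state_proj(state)
        return z_pixels + z_state


# Usage: batch of 8 RGB frames (84x84) plus a 24-dim proprioceptive state.
enc = MultimodalEncoder(state_dim=24, latent_dim=50)
z = enc(torch.randint(0, 256, (8, 3, 84, 84), dtype=torch.float32),
        torch.randn(8, 24))
print(z.shape)  # torch.Size([8, 50])
```

The summed latent then plays the role of the single-modality latent in the rest of the pipeline, so downstream components do not need to know how many modalities were fused.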
