Hi, thanks for your implementation. I am confused about how you implement Audio + Video. In the paper, I see that the audio and video modalities are fused at the input, and that "We achieve this by concatenating a learned, modality-specific encoding to each input."
Could you give an example of using your "learned, modality-specific encoding" to concatenate these two modalities? What should the input look like so that I can feed the data into your Perceiver model?
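To make the question concrete, here is a minimal NumPy sketch of my current understanding. All shapes, names, and the padding step are my assumptions, not taken from your code: I assume each modality is flattened to `(num_tokens, feature_dim)`, features are zero-padded to a common width, a per-modality learned vector is concatenated to every token, and the two token sequences are then concatenated along the token axis before going into the Perceiver.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: each modality flattened to (num_tokens, feature_dim).
video = rng.standard_normal((16, 64))  # e.g. 16 video tokens, 64 features
audio = rng.standard_normal((50, 32))  # e.g. 50 audio tokens, 32 features

# Assumed size of the learned modality-specific encoding. In a real model
# these would be learned parameters (e.g. nn.Parameter); fixed arrays here.
mod_dim = 8
video_enc = rng.standard_normal(mod_dim)
audio_enc = rng.standard_normal(mod_dim)

def tag(tokens, enc):
    # Concatenate the modality encoding onto every token's feature vector.
    expanded = np.broadcast_to(enc, (tokens.shape[0], enc.shape[0]))
    return np.concatenate([tokens, expanded], axis=-1)

# Zero-pad features to a common width so both modalities share one
# channel dimension before tagging (my assumption about how this is handled).
max_dim = max(video.shape[1], audio.shape[1])
def pad(tokens):
    return np.pad(tokens, ((0, 0), (0, max_dim - tokens.shape[1])))

# Fuse along the token axis: the Perceiver then attends over all 66 tokens.
fused = np.concatenate(
    [tag(pad(video), video_enc), tag(pad(audio), audio_enc)], axis=0
)
print(fused.shape)  # → (66, 72): 16+50 tokens, 64+8 channels each
```

Is this roughly what the fused input should look like, or does your implementation tag and combine the modalities differently?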