Double Multi-Head Attention Multimodal System for Odyssey 2024 SER Challenge. Federico Costa, Miquel India, Javier Hernando (Odyssey 2024)
- Model:
- Pre-trained self-supervised models were used to extract acoustic and text features.
- An early fusion strategy was adopted: a Multi-Head Attention layer transforms the mixed acoustic-text features into complementary contextualized representations.
- A second attention mechanism then pools these representations into an utterance-level vector.
- Results: the proposed system achieved third place out of 31 teams in the categorical task ranking, with a 34.41% Macro-F1 score.
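The two attention stages described above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the feature dimensions, number of heads, and random projection matrices are all hypothetical, and the pre-trained feature extractors are replaced by random arrays standing in for frame-level acoustic and text features.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, W_q, W_k, W_v, n_heads):
    # Stage 1: contextualize the fused frame-level features.
    # X: (T, d) sequence of early-fused acoustic+text features.
    T, d = X.shape
    dh = d // n_heads
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    out = np.zeros_like(X)
    for h in range(n_heads):
        s = slice(h * dh, (h + 1) * dh)
        A = softmax(Q[:, s] @ K[:, s].T / np.sqrt(dh))  # (T, T) per head
        out[:, s] = A @ V[:, s]
    return out

def attention_pool(H, w):
    # Stage 2: score each frame and take the weighted sum,
    # yielding a single utterance-level vector.
    alpha = softmax(H @ w)  # (T,) attention weights over frames
    return alpha @ H        # (d,)

rng = np.random.default_rng(0)
T, d, n_heads = 50, 64, 4                      # hypothetical sizes
acoustic = rng.standard_normal((T, 32))        # stand-in for SSL acoustic features
text = rng.standard_normal((T, 32))            # stand-in for SSL text features
X = np.concatenate([acoustic, text], axis=1)   # early fusion by concatenation
W_q, W_k, W_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
H = multi_head_self_attention(X, W_q, W_k, W_v, n_heads)
utt = attention_pool(H, rng.standard_normal(d))
print(utt.shape)  # one fixed-size utterance embedding
```

The utterance-level vector would then feed a classifier head for the categorical emotion labels; that final layer is omitted here.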