Recognizing Everything from All Modalities at Once: Grounded Multimodal Universal Information Extraction
ACL (Findings) 2024
Meishan Zhang, Hao Fei*, Bin Wang, Shengqiong Wu, Yixin Cao, Fei Li, Min Zhang (*Correspondence)
Note: The data for this project has not been released yet, as we are working on significantly expanding the dataset in both quantity and annotation content.
To evaluate the performance of our grounded MUIE system, we build a benchmark testing set. We select 9 existing datasets covering IE/MIE tasks across different modalities (or combinations thereof); the following table summarizes these raw source datasets. We then convert these datasets across modalities (e.g., Text↔Speech) to create 6 new datasets covering new multimodal (combination) scenarios. Before annotation, we carefully select 200 instances from the corresponding testing sets, ensuring that each instance contains as much IE information as possible.
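The exact conversion pipeline is not detailed here, but a minimal sketch like the one below illustrates how a Text↔Speech conversion could be implemented with off-the-shelf ASR/TTS models. The model choices (OpenAI Whisper, Coqui TTS) and helper functions are our own illustrative assumptions, not necessarily what was used to build the benchmark.

```python
# Sketch of a Text<->Speech conversion step for building new multimodal
# (combination) instances. Model names and helpers are assumptions.
import whisper              # OpenAI Whisper for Speech -> Text
from TTS.api import TTS     # Coqui TTS for Text -> Speech

asr_model = whisper.load_model("base")
tts_model = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")

def text_to_speech(text: str, out_path: str) -> str:
    """Synthesize a spoken version of a text-IE instance (Text -> Speech)."""
    tts_model.tts_to_file(text=text, file_path=out_path)
    return out_path

def speech_to_text(audio_path: str) -> str:
    """Transcribe a speech-IE instance into text (Speech -> Text)."""
    result = asr_model.transcribe(audio_path)
    return result["text"]

if __name__ == "__main__":
    # Turn a text instance into a speech instance, e.g., to place an
    # existing text-IE dataset into a new speech-based scenario.
    wav = text_to_speech("Barack Obama was born in Honolulu.", "instance_0001.wav")
    print(speech_to_text(wav))
```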
