To address the lack of context utilization in sign language translation (SLT), we construct a large-scale question-driven sign language (QSL) dataset. We select QA pairs from daily-life scenarios, such as hotels, restaurants, and tourist attractions, as the corpus for the QSL dataset, and the answers are represented as sign language videos.
The QSL dataset contains a total of 30,040 available QA pairs with a spatial resolution of 1280×720 and a frame rate of 30 FPS. We split the dataset into Train, Dev, and Test sets with a ratio of 3:1:1 (i.e., 18,024 pairs for training, 6,008 for development, and 6,008 for testing).
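The sketch below shows one way to reproduce a deterministic 3:1:1 split over the 30,040 QA pairs; the pair identifiers, random seed, and file layout are assumptions for illustration only, and the released dataset may ship with its own official split lists.

```python
# Minimal sketch of a deterministic 3:1:1 train/dev/test split.
# Pair IDs and the seed are hypothetical; use the official split lists if provided.
import random

def split_qsl(pair_ids, seed=0):
    """Shuffle QA-pair IDs and split them 3:1:1 into train/dev/test."""
    ids = sorted(pair_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train = (n * 3) // 5          # 30,040 * 3/5 = 18,024
    n_dev = n // 5                  # 30,040 * 1/5 =  6,008
    return {
        "train": ids[:n_train],
        "dev": ids[n_train:n_train + n_dev],
        "test": ids[n_train + n_dev:],  # remaining 6,008 pairs
    }

if __name__ == "__main__":
    splits = split_qsl(range(30040))
    print({k: len(v) for k, v in splits.items()})
    # -> {'train': 18024, 'dev': 6008, 'test': 6008}
```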
The QSL dataset is released to universities and research institutes for research purposes only. To request access to the data resources, please use the link below:
Baidu Netdisk: https://pan.baidu.com/s/1Soe0fcnN2ByFr1Yionagzg [password: ffei]
To implement question-driven sign language translation, we propose a Gloss-Bridged Translator (GBT) model. The code is available at: https://github.com/glq-1992/GBT-code.