Skip to content

patrick-tssn/NLPCC-2022-Shared-Task-4

Repository files navigation

NLPCC-2022-Shared-Task-4

Multimodal Dialogue Understanding and Generation

Please refer to the latest shared task: NLPCC2023 Shared Task 10

◇ Task 4 - Multimodal Dialogue Understanding and Generation

The multimodal dialogue understanding and generation task can be divided into two phases: multimodal context understanding and response generation. Specifically, the former includes dialogue session identification (i.e., determining whether the dialogue content has changed) and dialogue scene identification (i.e., determining whether the video context has changed). The ultimate goal is to generate a response that is coherent to the dialogue context and relevant to the video context. This task includes three tracks:

● Track 1: Dialogue scene identification: predict the boundaries of different dialogue scenes given a set of continuous dialogue utterances and a related video.

● Track 2: Dialogue session identification: predict the boundaries of different dialogue sessions given a set of continuous dialogue utterances and a related video (which is identical to Track 1).

● Track 3: Dialogue response generation: generate a response based on scene and session predictions, while coherently catching up with the conversation.

Organizer: Wangxuan Institute of Computer Technology, Peking University

Data

Statistics

clips utterances scenes sessions utter/clip scene/clip session/clip en_word/clip en_word/utter ch_word/clip ch_word/utter
train 40,006 1,000,079 56,535 106,078 25 1.41 2.65 166.46 6.66 267.74 10.71
valid 1,955 50,032 3,202 6,331 25.6 1.64 3.24 174.49 6.82 283.7 11.09
test 1,934 50,131 3,284 6,949 25.92 1.7 3.59 178.65 6.89 286.42 11.05

LeaderBoard

Track 1

Team ID Accuracy F1
LingJing 93.88 18.15

Track 2

Team ID Accuracy F1
LingJing 87.79 28.90
Eastwood 72.29 39.76

Track 3

Team ID Bleu1 Bleu2 Bleu3 Bleu4 Rouge-L METEOR CIDEr
LingJing 0.139 0.100 0.083 0.072 0.226 0.117 1.290

UPDATE

2022.05.23 We released the evaluation scripts

evaluation/evaluate.py

2022.05.12 We released the test set, you can download from BaiduNetDisk(closed)

SUBMISSION FORMAT:

#
# task 1: TEAMID_track1.json
{
       'Friends_S01E01_clip_000':[1, 0, ...], # your prediction in list format(you are not asked to predict the label of the last utterance.)
       ...
}
# task 2: TEAMID_track2.json
{
       'Friends_S01E01_clip_000':[1, 0, ...], # your prediction in list format(you are not asked to predict the label of the last utterance.)
       ...
}
# task 3: TEAMID_track3.json
{
       'Friends_S01E01_clip_000':'__ANSWER__', # replace __ANSWER__ with your answer(English only)
       ...
}

SUBMISSION Deadline: 2022.05.20

SUBMISSION ENTRY(closed)

Details

Download [train|valid].jsonl and MDUG_rgb_[train|valid]_n.zip BaiduNetDisk(closed) | GoogleDrive(closed)

  • TEXT
# .jsonl
{'Friends_S01E01': // episode id: [name of TV show]_S[season]_E[episode]
       [{
            'vid':'Friends_S01E01_clip_000', // clip id
            'subs':[
                {
                    'turn_id':0, // turn id of current clip
                    'en_text':'', // English
                    'ch_text':'', // Chinese
                    'start':0.00, // start time of the utterance in the clip
                    'end':0.00, // end time of the utterance in the clip
                    'scene':1, // 1: start of new dialog scene; 0: otherwise
                    'session':1 // 1: start of new dialog session; 0: otherwise
                }, 
                ...]
        }, 
        ...]
 }
 ...
  • VIDEO

NOTE: We only provide frames(3fps) of the provided utterances

// [name of TV show]_S[season]_E[episode]_clip_[clip id]_dia_[turn id]_[image id].jpg
├── Friends_S01E01
     └── clip_000
              └── Friends_S01E01_clip_000_dia_00_01.jpg 
              └── Friends_S01E01_clip_000_dia_00_02.jpg
              └── ...
     └── ...
├── ...
// MD5
{
       "MDUG_rgb_train_1.zip": "746ce538d0568f190659084c908d7de0", 
       "MDUG_rgb_train_2.zip": "9413810d1d1c9332376b08cf4fd7c037", 
       "MDUG_rgb_train_3.zip": "5247a69664dd9089f6801bcba32bc036", 
       "MDUG_rgb_train_4.zip": "934265252eaba3df98806fd9755b8a1e", 
       "MDUG_rgb_train_5.zip": "6c35c52733ff50fa868523abb237771b", 
       "MDUG_rgb_train_6.zip": "1d0c508569d3f56cac2d263ba7f419f6", 
       "MDUG_rgb_train_7.zip": "5208ac23a7f674ff5640466878b70669", 
       "MDUG_rgb_train_8.zip": "bc988d145bc6f940a7e0f64b5da69685", 
       "MDUG_rgb_train_9.zip": "d451cfe63f1db914bc9f6d4041c3f897", 
       "MDUG_rgb_train_10.zip": "0bd60f108da36e301882f7b30ad77233", 
       "MDUG_rgb_train_11.zip": "abedefadcda2998e4cd1c6563c62e182", 
       "MDUG_rgb_train_12.zip": "e114d158b456339bbb9fc6a0a32be56a", 
       "MDUG_rgb_train_13.zip": "2faddea5401d7b191e0155bb20350280", 
       "MDUG_rgb_train_14.zip": "c68ce06a7a3c6c8466f4484b04682d16", 
       "MDUG_rgb_train_15.zip": "96909cf7a7aa6b3780dae90e6a703f9c", 
       "MDUG_rgb_train_16.zip": "0ffd572468c8f8255aa16dcf25d20b7d"
}
{
       "MDUG_rgb_valid.zip": "1e00d0ba50dc7fdb41f12936500a7de7"
}

About

Multimodal Dialogue Understanding and Generation

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published