2018 OKBQA-7 Task3. Multimodal Character Identification on Videos
conll-2012/scorer/v8.01
data
src
.gitignore
README.md
cache_elmo.py
character_evaluate.py
character_evaluate_bigbang.py
character_evaluate_friendsnew.py
conll.py
coref_kernels.cc
coref_model.py
coref_ops.py
demo.py
evaluate.py
experiments.conf
filter_embeddings.py
get_char_vocab.py
link_bigbang_gold_character.py
link_character.py
link_character_bigbang.py
link_character_friendsnew.py
link_gold_character.py
metrics.py
minimize.py
predict.py
requirements.txt
setup_all.sh
setup_training.sh
train.py
util.py


OKBQA-7 Task3 : Multimodal Character Identification on Videos

Task Definition

This task aims to link each mention in a dialogue to the character it refers to, given the dialogue text and the corresponding video. A mention is a nominal referring to a person (e.g., she, mom, Judy), and an entity is a character in the dialogue.

Example

Introduction

Character identification on text has been studied on the Friends dataset and has shown practical performance for identifying main characters (Chen et al., 2017; Choi & Chen, 2018). However, these studies framed the problem as entity linking against pre-defined characters, so the resulting modules cannot be applied to scripts other than Friends unless they are re-trained on newly constructed data. To handle arbitrary dialogue or video scripts, the task should instead be approached as coreference resolution. One study introduces a coreference-resolution-based approach for this task (Chen et al., 2017), but coreference resolution is a difficult problem in NLP, and the reported performance is not yet practical (F1: 57.46% for 9 main characters).

Therefore, if we extend the task to take not only dialogue text but also video as input, we expect the richer features to raise performance to a practical level. This task is an extension of SemEval 2018 Task 4, and it differs in two main ways. First, it adds multimodality by using video as an input. Second, the final module of this task should be applicable to arbitrary dialogue or video scripts.

Task Organizers

Datasets

The first two seasons of the TV show Friends are annotated for this task. Each season consists of episodes, each episode comprises scenes, and each scene is segmented into sentences. The following describes the distributed datasets:

No dedicated development set was distributed for this task; feel free to make your own development set for training or perform cross-validation on the training sets.

Format

All datasets follow the CoNLL 2012 Shared Task data format. Documents are delimited by the comments in the following format:

#begin document (<Document ID>)[; part ###]
...
#end document
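
The delimiters above can be used to split a file into documents. The following is a minimal sketch, not code from this repository: the function name `split_documents` and the regular expression are illustrative assumptions, and the part number is treated as an optional run of digits.

```python
import re

# Matches a header such as "#begin document (/friends-s01e01); part 000".
# The "; part ###" suffix is optional, as in the format description.
BEGIN_RE = re.compile(r"#begin document \((.*?)\)(?:; part (\d+))?")


def split_documents(lines):
    """Yield (doc_id, part, body_lines) for each #begin/#end document block.

    `part` is None when the header carries no part number.
    """
    doc_id, part, body = None, None, []
    for line in lines:
        line = line.rstrip("\n")
        match = BEGIN_RE.match(line)
        if match:
            doc_id, part, body = match.group(1), match.group(2), []
        elif line.startswith("#end document"):
            yield doc_id, part, body
            doc_id, part, body = None, None, []
        elif doc_id is not None:
            # Keep token lines and blank lines so sentences can be split later.
            body.append(line)
```

Passing an open file object works because the function only iterates over lines, for example `split_documents(open("friends.train.episode_delim.conll"))`.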

Each sentence is delimited by a new line ("\n") and each column indicates the following:

  1. Document ID: /<name of the show>-<season ID><episode ID> (e.g., /friends-s01e01).
  2. Scene ID: the ID of the scene within the episode.
  3. Token ID: the ID of the token within the sentence.
  4. Word form: the tokenized word.
  5. Part-of-speech tag: the part-of-speech tag of the word (auto generated).
  6. Constituency tag: the Penn Treebank style constituency tag (auto generated).
  7. Lemma: the lemma of the word (auto generated).
  8. Frameset ID: not provided (always -, as in the samples below).
  9. Word sense: not provided (always -, as in the samples below).
  10. Speaker: the speaker of this sentence.
  11. Named entity tag: the named entity tag of the word (auto generated).
  12. Start time: the start time of the sentence in the video, in milliseconds.
  13. End time: the end time of the sentence in the video, in milliseconds.
  14. Video file: the file name of the pickle object holding the pre-processed sequence of images from the video segment corresponding to the sentence (the pickle objects will be released on 08/01).
  15. Entity ID: the entity ID of the mention, which is consistent across all documents.
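
As a rough illustration of the column layout, a token line can be read into a named record. This is a sketch under assumptions: the `Token` type and `parse_token_line` are hypothetical names, and the code assumes the 15 columns are separated by whitespace and that no field contains whitespace (speaker names in the samples use underscores).

```python
from typing import NamedTuple


class Token(NamedTuple):
    doc_id: str
    scene_id: int
    token_id: int
    word: str
    pos: str
    constituency: str
    lemma: str
    frameset: str
    word_sense: str
    speaker: str
    ner: str
    start_ms: int
    end_ms: int
    video_file: str
    entity: str  # raw entity column, e.g. "(284)", "(380", "380)" or "-"


def parse_token_line(line):
    """Split one token line into its 15 columns and convert the numeric ones."""
    cols = line.split()
    if len(cols) != 15:
        raise ValueError(f"expected 15 columns, got {len(cols)}: {line!r}")
    return Token(
        cols[0], int(cols[1]), int(cols[2]),
        *cols[3:11],                      # word form through named entity tag
        int(cols[11]), int(cols[12]),     # start and end time in milliseconds
        cols[13], cols[14],
    )
```

With the timestamps parsed as integers, the duration of a sentence is simply `token.end_ms - token.start_ms`.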

Here is a sample from the training dataset:

/friends-s01e01  0  0  He     PRP   (TOP(S(NP*)    he     -  -  Monica_Geller   *  55422 59256 00005.pickle (284)
/friends-s01e01  0  1  's     VBZ          (VP*    be     -  -  Monica_Geller   *  55422 59256 00005.pickle -
/friends-s01e01  0  2  just   RB        (ADVP*)    just   -  -  Monica_Geller   *  55422 59256 00005.pickle -
/friends-s01e01  0  3  some   DT        (NP(NP*    some   -  -  Monica_Geller   *  55422 59256 00005.pickle -
/friends-s01e01  0  4  guy    NN             *)    guy    -  -  Monica_Geller   *  55422 59256 00005.pickle (284)
/friends-s01e01  0  5  I      PRP  (SBAR(S(NP*)    I      -  -  Monica_Geller   *  55422 59256 00005.pickle (248)
/friends-s01e01  0  6  work   VBP          (VP*    work   -  -  Monica_Geller   *  55422 59256 00005.pickle -
/friends-s01e01  0  7  with   IN     (PP*))))))    with   -  -  Monica_Geller   *  55422 59256 00005.pickle -
/friends-s01e01  0  8  !      .             *))    !      -  -  Monica_Geller   *  55422 59256 00005.pickle -
/friends-s01e01  0  0  C'mon  VB   (TOP(S(S(VP*))  c'mon  -  -  Joey_Tribbiani  *  59459 61586 00006.pickle -
/friends-s01e01  0  1  ,      ,                 *  ,      -  -  Joey_Tribbiani  *  59459 61586 00006.pickle -
/friends-s01e01  0  2  you    PRP           (NP*)  you    -  -  Joey_Tribbiani  *  59459 61586 00006.pickle (248)
/friends-s01e01  0  3  're    VBP            (VP*  be     -  -  Joey_Tribbiani  *  59459 61586 00006.pickle -
/friends-s01e01  0  4  going  VBG            (VP*  go     -  -  Joey_Tribbiani  *  59459 61586 00006.pickle -
/friends-s01e01  0  5  out    RP           (PRT*)  out    -  -  Joey_Tribbiani  *  59459 61586 00006.pickle -
/friends-s01e01  0  6  with   IN             (PP*  with   -  -  Joey_Tribbiani  *  59459 61586 00006.pickle -
/friends-s01e01  0  7  the    DT             (NP*  the    -  -  Joey_Tribbiani  *  59459 61586 00006.pickle -
/friends-s01e01  0  8  guy    NN            *))))  guy    -  -  Joey_Tribbiani  *  59459 61586 00006.pickle (284)
/friends-s01e01  0  9  !      .               *))  !      -  -  Joey_Tribbiani  *  59459 61586 00006.pickle -

A mention may include more than one word:

/friends-s01e02  0  0  Ugly         JJ   (TOP(S(NP(ADJP*  ugly         -  -  Chandler_Bing  *  332158 334460 00038.pickle (380
/friends-s01e02  0  1  Naked        JJ                *)  naked        -  -  Chandler_Bing  *  332158 334460 00038.pickle -
/friends-s01e02  0  2  Guy          NNP               *)  Guy          -  -  Chandler_Bing  *  332158 334460 00038.pickle 380)
/friends-s01e02  0  3  got          VBD             (VP*  get          -  -  Chandler_Bing  *  332158 334460 00038.pickle -
/friends-s01e02  0  4  a            DT              (NP*  a            -  -  Chandler_Bing  *  332158 334460 00038.pickle -
/friends-s01e02  0  5  Thighmaster  NN               *))  thighmaster  -  -  Chandler_Bing  *  332158 334460 00038.pickle -
/friends-s01e02  0  6  !            .                *))  !            -  -  Chandler_Bing  *  332158 334460 00038.pickle -
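
Recovering mention spans from the entity column, including multi-word mentions like the one above, can be sketched as follows. `extract_mentions` is a hypothetical helper, not part of this repository. It assumes the usual CoNLL-2012 bracket notation, where `(380` opens a mention, `380)` closes it, `(284)` is a single-token mention, and `|` separates several brackets on one token; whether `|` occurs in this dataset is an assumption.

```python
from collections import defaultdict


def extract_mentions(entity_tags):
    """Return sorted (start, end, entity_id) spans for one sentence.

    `entity_tags` is the entity column of each token, in order.
    `start` and `end` are token indices, with `end` inclusive.
    """
    open_starts = defaultdict(list)  # entity ID -> stack of open start indices
    mentions = []
    for i, tag in enumerate(entity_tags):
        if tag == "-":
            continue
        for part in tag.split("|"):
            if part.startswith("(") and part.endswith(")"):
                # Single-token mention such as "(284)".
                mentions.append((i, i, int(part[1:-1])))
            elif part.startswith("("):
                # Opening bracket of a multi-token mention such as "(380".
                open_starts[int(part[1:])].append(i)
            elif part.endswith(")"):
                # Closing bracket such as "380)": pair it with the latest opener.
                entity_id = int(part[:-1])
                mentions.append((open_starts[entity_id].pop(), i, entity_id))
    return sorted(mentions)
```

For the sentence above, the entity column `["(380", "-", "380)", "-", "-", "-", "-"]` yields a single span covering tokens 0 to 2 with entity ID 380. The spans are sorted by start index here; the ordering the official scripts use for nested mentions is not stated in this document.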

The mapping between the entity ID and the actual character can be found in friends_entity_map.txt.

Input

You can use friends.train.episode_delim.conll as a training input, and friends.test.episode_delim.conll as a test input.

Output and Evaluation

Your output must consist of the entity ID of each mention, one per line, in sequential order. The two samples above contain 6 mentions in total, which yields the following output:

284
284
248
248
284
380
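
Writing predictions in this one-ID-per-line layout could look like the sketch below. `write_predictions` is a hypothetical helper, and ending the last line with a newline is an assumption, since the exact requirement is not stated here.

```python
import io


def write_predictions(entity_ids, stream):
    """Write one entity ID per line, preserving the order of the mentions."""
    for entity_id in entity_ids:
        stream.write(f"{entity_id}\n")


# Example with the six mentions from the samples, written to an in-memory buffer.
buffer = io.StringIO()
write_predictions([284, 284, 248, 248, 284, 380], buffer)
```

Any writable text stream works in place of the buffer, such as a file opened with `open(path, "w")`.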

Given this output, the evaluation script measures:

  1. The label accuracy over only 7 entities: the 6 main characters (Chandler, Joey, Monica, Phoebe, Rachel, and Ross) plus all other characters grouped as one entity.
  2. The macro average between the F1 scores of the 7 entities.
  3. The label accuracy considering all entities, where characters not appearing in the training data are grouped as one entity, others.
  4. The macro average between the F1 scores of all entities.
  5. The F1 scores for 7 entities.
  6. The F1 scores for all entities.
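
The measures listed above can be approximated with a short sketch. This is not the repository's evaluation script: the helper names, the `other` label of -1, and the choice to average F1 over the entities present in the gold labels are all assumptions made for illustration. The set of kept entity IDs (for example, the six main characters) must come from friends_entity_map.txt and is left as a parameter.

```python
def group_labels(labels, kept, other=-1):
    """Replace every entity ID outside `kept` with a single `other` label."""
    return [label if label in kept else other for label in labels]


def accuracy(gold, pred):
    """Fraction of mentions whose predicted entity ID equals the gold one."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1_per_entity(gold, pred):
    """Return {entity_id: F1} for every entity that occurs in the gold labels."""
    scores = {}
    for entity in sorted(set(gold)):
        tp = sum(g == entity and p == entity for g, p in zip(gold, pred))
        fp = sum(g != entity and p == entity for g, p in zip(gold, pred))
        fn = sum(g == entity and p != entity for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[entity] = 2 * precision * recall / total if total else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of the per-entity F1 scores."""
    scores = f1_per_entity(gold, pred)
    return sum(scores.values()) / len(scores)
```

For the grouped measures, both label sequences would first pass through `group_labels` with the same `kept` set, and the accuracy and macro F1 would then be computed on the grouped labels.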