adrianb82 edited this page Mar 14, 2016 · 2 revisions

Welcome to the videocorpus wiki!

This GitHub repository provides resources for evaluating Named Entity Linking (NEL) performance on a set of video transcripts.

Currently we provide a single corpus.

RBB150

The corpus contains 150 documents collected from RBB transcripts.

It is organized into the following folders:

  • tutorial - a short GATE tutorial used by the annotators
  • guideline - the annotation guideline
  • subs - raw subtitles / transcripts
  • ontology - the original ontology and an enriched version (created by automatically adding subtypes to the original ontology)
  • gold - gold standards in various formats (CSV, NIF, etc.)

Depending on the evaluation task and the evaluation method you use (per entity type, GERBIL, TAC, etc.), the gold standard may or may not include unlinked (NIL) entities.
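If your evaluation setup expects linked entities only, you need to drop NIL annotations from the gold standard first. The sketch below shows one way to do that for a CSV file. It is a minimal sketch under stated assumptions: the column names (`doc_id`, `mention`, `link`), the NIL markers (an empty link or the string `NIL`), and the helper names `read_gold` and `drop_nil` are all hypothetical and not taken from the actual files in the gold folder, so check those files and adjust before use.

```python
import csv
import io
from typing import Dict, Iterable, List

# ASSUMPTION: unlinked mentions are marked with an empty link or the string "NIL".
# Check the actual files in the gold folder before relying on this.
NIL_MARKERS = {"", "NIL"}


def read_gold(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse CSV text (header row first) into one dict per annotation."""
    return list(csv.DictReader(lines))


def drop_nil(annotations: List[Dict[str, str]], link_column: str = "link") -> List[Dict[str, str]]:
    """Return only the annotations linked to a knowledge-base entry (hypothetical helper)."""
    return [
        a for a in annotations
        # Missing cells may come back as None, so normalize before comparing.
        if (a.get(link_column) or "").strip().upper() not in NIL_MARKERS
    ]


# Hypothetical sample with made-up column names (doc_id, mention, link).
SAMPLE = """doc_id,mention,link
doc1,Berlin,http://dbpedia.org/resource/Berlin
doc1,Unknown Person,NIL
doc2,Potsdam,http://dbpedia.org/resource/Potsdam
doc2,Corner Shop,
"""

if __name__ == "__main__":
    gold = read_gold(io.StringIO(SAMPLE))
    linked = drop_nil(gold)
    print(len(gold), len(linked))  # all annotations vs. linked-only
```

For evaluations that do score NIL entities (for example, NIL detection or clustering), skip the filtering step and use the full gold standard instead.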
