
rafehr/COLF-VID

COLF-VID

COLF-VID is a COrpus of Literal and Figurative readings of German Verbal IDioms in context. It consists of 34 files containing annotated instances of 34 different German verbal idiom (VID) types, together with the sentences they occur in. The annotation scheme uses four labels: LIT (literal), FIG (idiomatic/figurative), UND (undecidable), and BOTH (both). A more detailed description of the corpus can be found in the paper "Supervised Disambiguation of German Verbal Idioms with a BiLSTM Architecture". There are currently three versions of COLF-VID:

  • COLF-VID_1.0: The version of the corpus that was used during the experiments described in the paper. It was lemmatized with GermaLemma and POS tagged with the TreeTagger.
  • COLF-VID_1.1: The cleaned version of COLF-VID_1.0, with the duplicates it contained removed. It does not currently contain lemmas or POS tags; we plan to add them, along with dependency information, using UDPipe in the near future.
  • COLF-VID_2.0: Work in progress. We aim to add annotations for VID instances in the corpus that were not part of the pre-chosen set of the 34 VID types and thus were not annotated in the first run.
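
The four-label scheme described above can be sketched in Python as follows. This is only an illustration, not part of the corpus tooling: the function names `normalize_label` and `label_distribution` are hypothetical, and the sketch assumes the labels have already been read from the corpus files into plain strings, since the file format is not described here.

```python
from collections import Counter

# Long label names and their abbreviations, as listed in this README.
LABEL_ABBREVIATIONS = {
    "LITERAL": "LIT",
    "IDIOMATIC": "FIG",
    "FIGURATIVE": "FIG",
    "UNDECIDABLE": "UND",
    "BOTH": "BOTH",
}
VALID_LABELS = frozenset(LABEL_ABBREVIATIONS.values())


def normalize_label(raw):
    """Map a long or abbreviated label string to LIT, FIG, UND or BOTH.

    Hypothetical helper: raises ValueError for anything outside the scheme.
    """
    label = raw.strip().upper()
    label = LABEL_ABBREVIATIONS.get(label, label)
    if label not in VALID_LABELS:
        raise ValueError(f"unknown label: {raw!r}")
    return label


def label_distribution(labels):
    """Count how often each normalized label occurs in an iterable of strings."""
    return Counter(normalize_label(label) for label in labels)
```

For example, `label_distribution(["LIT", "figurative", "FIG"])` counts the two figurative readings together under `FIG`, which is useful for checking how skewed the readings of a given VID type are.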

LICENSE

Available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License. (The paper erroneously states that the corpus is available under the Creative Commons Attribution-ShareAlike 4.0 International license.)
