Corpora used in WSD annotated with senses and converted to NAF (XML based) format
scripts
semcor1.6
semcor3.0
semeval2007_task17_allwords
semeval2010_task17
semeval2013_task12
sense_freq
senseval2
senseval3
wordnet_gloss_corpus
LICENSE
README.md

#WSD CORPORA#

Corpora used in word sense disambiguation (WSD), annotated with senses and converted to NAF format.

##Corpora##

The following corpora are available:

  • SemCor (with WordNet senses 1.6 and 3.0)
  • SensEval2: traditional all-words task
  • SensEval3: traditional all-words task
  • SemEval-2010 task 17: WSD on a specific domain
  • SemEval-2007 task 17: all-words task
  • SemEval-2013 task 12: Multilingual Word Sense Disambiguation (languages: en, es, fr, it, de)
  • Princeton WordNet Gloss Corpus (original files are also included in the folder itself)

##scripts##

Scripts available:

  • mfs_dict.py: generates sense frequency dictionaries for SemCor 1.6 and SemCor 3.0. Run python scripts/mfs_dict.py -h for more information. To use it, you will need to install the lxml library (which provides lxml.etree) for Python 2.7.
  • wn_gloss_corpus_to_naf.py: converts the Princeton WordNet Gloss Corpus to NAF. Run python scripts/wn_gloss_corpus_to_naf.py -h for more information on how to use it.
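
To give a rough idea of what a sense frequency dictionary is, here is a minimal sketch that counts sense annotations per lemma in a NAF document and picks the most frequent sense. It is not the code of mfs_dict.py. It assumes that senses are stored as externalRef elements (with a reference attribute) inside term elements, which is the usual NAF convention but may differ from the exact layout of the files in this repository. The sample document, the function names and the sense keys are made up for illustration, and the standard library xml.etree.ElementTree is used instead of lxml.etree so that the example has no dependencies.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical, hand-written NAF fragment (not taken from the corpora).
SAMPLE_NAF = """<NAF version="v3">
  <terms>
    <term id="t1" lemma="bank" pos="N">
      <externalReferences>
        <externalRef resource="WordNet-3.0" reference="bank%1:14:00::"/>
      </externalReferences>
    </term>
    <term id="t2" lemma="bank" pos="N">
      <externalReferences>
        <externalRef resource="WordNet-3.0" reference="bank%1:17:01::"/>
      </externalReferences>
    </term>
    <term id="t3" lemma="bank" pos="N">
      <externalReferences>
        <externalRef resource="WordNet-3.0" reference="bank%1:14:00::"/>
      </externalReferences>
    </term>
    <term id="t4" lemma="run" pos="V">
      <externalReferences>
        <externalRef resource="WordNet-3.0" reference="run%2:38:00::"/>
      </externalReferences>
    </term>
  </terms>
</NAF>"""


def count_senses(naf_xml):
    """Return {lemma: Counter({sense: count})} for one NAF document string."""
    root = ET.fromstring(naf_xml)
    counts = defaultdict(Counter)
    for term in root.iter("term"):
        lemma = term.get("lemma")
        # Assumption: each sense annotation is an externalRef below the term.
        for ref in term.iter("externalRef"):
            sense = ref.get("reference")
            if lemma and sense:
                counts[lemma][sense] += 1
    return counts


def most_frequent_senses(counts):
    """Map each lemma to the sense that was annotated most often."""
    return {lemma: c.most_common(1)[0][0] for lemma, c in counts.items()}


if __name__ == "__main__":
    sense_counts = count_senses(SAMPLE_NAF)
    for lemma, sense in sorted(most_frequent_senses(sense_counts).items()):
        print(lemma, sense, sense_counts[lemma][sense])
```

To apply the same idea to a whole corpus folder, the counters returned for each file would be summed before selecting the most frequent sense.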

##Contact##