A toy word sense disambiguation (WSD) system with basic functions for training and testing a simple SVM classifier.



  • --> for training the system with a new instance
  • Usage: python "sense_id" "text"

  • Example: python 'caballo.1' "resultado de su salto, el caballo..."

  • Output: none

  • --> for classifying a new instance

  • Usage: python "text"

  • Example: python "este caballo es un animal muy noble"

  • Output: the guessed sense, or unknown if there is no classifier for the target word

  • --> for removing all the classifiers so that the system "forgets" everything

  • python

*** Important ***

  • The sense identifiers should follow the format: target_word.sensenumber:

  • caballo.1 paard.2 muis.200 ...
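As an illustration of the format above, a sense identifier can be split into its target word and sense number as sketched below. The helper name `parse_sense_id` is hypothetical and is not taken from the repository; it only shows one way to validate the `target_word.sensenumber` convention.

```python
def parse_sense_id(sense_id):
    """Split an identifier such as 'caballo.1' into ('caballo', 1).

    Hypothetical helper for illustration; not part of the repository.
    """
    # Split on the LAST dot, so the part after it is the sense number.
    target_word, separator, sense_number = sense_id.rpartition('.')
    if not separator or not target_word or not sense_number.isdecimal():
        raise ValueError(
            "expected target_word.sensenumber, got %r" % (sense_id,))
    return target_word, int(sense_number)


if __name__ == '__main__':
    for example in ('caballo.1', 'paard.2', 'muis.200'):
        print(parse_sense_id(example))
```

Splitting on the last dot keeps the sketch tolerant of target words that themselves contain a dot, which is an assumption rather than documented behaviour.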

  • The scripts produce some debug output. To remove it, edit the file and comment out the following line (line number 15):

  • logging.basicConfig(stream=sys.stderr,format='%(asctime)s - %(levelname)s - %(message)s',level=logging.DEBUG)

  • The training script obtains the target word directly from the sense identifier. The classification script only receives the text, so it tries to "guess" the target word: it compares each word in a predefined list of possible target words with each token in the text using the Levenshtein distance, and selects the most likely one. This predefined list is stored in the file "target_words", so to add or remove a target word just modify that file (one target word per line, in lowercase).
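The guessing step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's code: the function names `levenshtein` and `guess_target_word` are hypothetical, tokens are obtained by lowercasing and splitting on whitespace, and ties are resolved in favour of the word listed first.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def guess_target_word(text, target_words):
    """Return the target word closest to any token in text, or None.

    Hypothetical helper: whitespace tokenization is an assumption.
    """
    tokens = text.lower().split()
    best_word, best_distance = None, None
    for word in target_words:
        for token in tokens:
            distance = levenshtein(word, token)
            # Strict '<' keeps the first word seen when distances tie.
            if best_distance is None or distance < best_distance:
                best_word, best_distance = word, distance
    return best_word


if __name__ == '__main__':
    words = ['caballo', 'paard', 'muis']
    print(guess_target_word('este caballo es un animal muy noble', words))
```

Comparing against every token, rather than requiring an exact match, lets an inflected form such as "caballos" still map to "caballo".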

  • There are shell scripts that simulate the training and classification of some instances for the Spanish word "caballo" (horse), which is also polysemous in Spanish. You can run them to see how the system works:

  • (will train a classifier for "caballo" with 4 examples: 2 for sense 1 and 2 for sense 2)

  • (will classify 3 examples with the "caballo" classifier)