Tagger Tester

latika-bhurani edited this page Jul 20, 2015 · 40 revisions

The tagger tester is a program that can be used after deploying the tagger, to test it independently of the other modules.

It requires the tagger stand-alone application to be running, and the aidr-tagger-api, aidr-trainer-api, aidr-task-manager, and aidr-db-manager EE applications to be deployed.

Command line

The tagger tester is run through the following command:

mvn test -DTaggerTesterTest PARAMETERS

These parameters are optional:

-Dconfig=FILE
-Dnitems-train=NUMBER (default 200)
-Dnitems-test=NUMBER (default 1000)
-Dquiet=TRUE/FALSE (default false)

The config is the name of the tagger configuration (or of a centralized configuration) from which the tester reads the properties it needs to perform the testing.

The nitems-train is the number of training (labelled) items to give to the tagger. It must be strictly larger than sampleCountThreshold, which is the minimum number of items required to create a model.

The nitems-test is the number of testing (unlabelled) items to give to the tagger. It can be any number of 1000 or more. It cannot be a small number because a large sample is needed to obtain reliable statistics on the number of items classified correctly and incorrectly.

The quiet option suppresses the printing of the tweets. All other messages are printed even in quiet mode.
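
As an illustration, the parameters and their defaults could be read as follows. This is a minimal Java sketch, assuming the -D flags reach the test JVM as system properties; the TesterOptions class and its field names are hypothetical and are not taken from the AIDR code.

```java
import java.util.Properties;

/** Hypothetical holder for the tester's command-line parameters. */
final class TesterOptions {
    final String config;      // null when -Dconfig is not given
    final int nItemsTrain;    // -Dnitems-train, default 200
    final int nItemsTest;     // -Dnitems-test, default 1000
    final boolean quiet;      // -Dquiet, default false

    TesterOptions(Properties props) {
        config = props.getProperty("config");
        nItemsTrain = Integer.parseInt(props.getProperty("nitems-train", "200"));
        nItemsTest = Integer.parseInt(props.getProperty("nitems-test", "1000"));
        // Boolean.parseBoolean ignores case, so TRUE and true are both accepted.
        quiet = Boolean.parseBoolean(props.getProperty("quiet", "false"));
    }

    /** Reads the values passed as -D flags on the JVM command line. */
    static TesterOptions fromSystemProperties() {
        return new TesterOptions(System.getProperties());
    }
}
```

Taking a Properties object in the constructor keeps the parsing testable without touching the real system properties.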

Text of training/testing tweets

All tweets generated by this tester are synthetic and randomly generated, but conform to a specific format.

Half of the training tweets have the "WHITE" attribute value as a human-provided tag, and are made of random 30-word sequences of the words "light", "clear", "snow", "clouds", "neutral", and "wNN". Whenever the word is wNN, it is written as a letter "w" followed by 2 random digits.

The other half of the training tweets have the "BLACK" attribute value as a human-provided tag, and are made of random 30-word sequences of the words "coal", "night", "coffee", "ink", "neutral", and "wNN". Whenever the word is wNN, it is written as a letter "w" followed by 2 random digits.

Example training tweets:

light clear w09 w19 snow ... neutral clear light w87 light -> WHITE
clear clouds snow light neutral ... w91 clear clear light light -> WHITE
neutral neutral coffee w58 night ... coal night w10 ink night -> BLACK
w82 coffee night coal w11 ... ink night coffee neutral w31 -> BLACK

The testing tweets are generated in the same way: half of them correspond to "WHITE" tweets and half of them to "BLACK" tweets. Note that the testing items have no label associated with them, i.e. they are unlabelled.

The purpose of the "neutral" word is to have overlap, i.e. a word that appears in both the WHITE and BLACK sets, which avoids generating a trivial classification problem. The purpose of the "wNN" random words is to bypass the de-duplication check done by the tagger, ensuring every tweet is different enough from the others.
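
The generation scheme described in this section can be sketched in Java as follows. The SyntheticTweets class and its members are hypothetical names, and drawing each word uniformly at random from the vocabulary is an assumption; the page only states that the sequences are random.

```java
import java.util.Locale;
import java.util.Random;

/** Hypothetical generator for the synthetic tweets described in this section. */
final class SyntheticTweets {
    static final String[] WHITE = {"light", "clear", "snow", "clouds", "neutral", "wNN"};
    static final String[] BLACK = {"coal", "night", "coffee", "ink", "neutral", "wNN"};
    static final int WORDS_PER_TWEET = 30;

    /** Builds one tweet of 30 words drawn at random from the given vocabulary. */
    static String generate(String[] vocabulary, Random rng) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < WORDS_PER_TWEET; i++) {
            String word = vocabulary[rng.nextInt(vocabulary.length)];
            if (word.equals("wNN")) {
                // Replace the placeholder with "w" plus two random digits, e.g. w09.
                word = String.format(Locale.ROOT, "w%02d", rng.nextInt(100));
            }
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(word);
        }
        return sb.toString();
    }
}
```

A call such as SyntheticTweets.generate(SyntheticTweets.WHITE, new Random()) yields one WHITE tweet; passing a seeded Random makes a run reproducible.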

Execution

The tagger tester should perform the following steps:

  1. Make sure there is no data with code="tagger_tester" in the aidr-predict database, in case the tagger tester died abnormally in a previous run. If there is data, write a warning message, run the CLEANUP routine, and FAIL (forcing the user to run the tagger tester again).
  2. Create a test user Tagger Tester User using the addUser service of the UserResource in the Tagger-API module. Check that the user exists after creating it. FAIL if this does not succeed.
  3. Create a collection (name="Tagger Tester Crisis", code="tagger_tester") using the addCrisis service in the CrisisResource of the Tagger-API module. Check that the collection exists after creating it. FAIL if this does not succeed.
  4. Create a classifier using the following steps:
    1. Create an attribute (name="tagger_tester_classifier") using the NominalAttributeResource in the Tagger-API module. Check that the attribute exists after creating it. FAIL if this does not succeed.
    2. Create three labels using the NominalLabelResource in the Tagger-API module (use attribute_id generated during the previous step). Check that all labels exist after creating them. FAIL if this does not succeed.
      1. name="White", code="white"
      2. name="Black", code="black"
      3. name="Does not apply", code="null"
  5. Create a ModelFamily using the addCrisisAttribute service of the ModelFamilyResource in the Tagger-API module (use crisis_id, nominal_attribute_id and nominal_label_id generated in the previous steps). Check that the model family exists after creating it. FAIL if this does not succeed.
  6. Subscribe to the Redis queue where the tagger writes its output, otherwise FAIL
  7. Generate random items (defined above) and push them to Redis on channel FetcherChannel.tagger_tester at the rate of 5 items/second. A valid AIDR item is a JSON document with minimum required fields as defined here. You can use the tweetid field, for example, to keep track of which item belongs to which label (i.e. White, Black). Keep pushing items until the document table in the aidr-predict database receives at least 200 items waiting to be labeled. (TO-DO: need an API to check the total number of unlabeled items for a crisis)
  8. Get a task to label by using the getOneTaskBufferToAssign service of the DocumentController of the Trainer-API module
  9. Assign the correct label to that item (using its tweetid) and save it using the save service of the TaskAnswerController of the Trainer-API module
  10. After about 100 white items and 100 black items have been tagged, check if the Tagger module has created a model using getModelsByModelFamilyID service of ModelResource of Tagger-API module. If not, wait 10 seconds and keep tagging more items, 50 at a time.
  11. For testing, generate WHITE testing items and push them to the tagger
  12. Subscribe to aidr_predict.tagger_tester
  13. Verify (reading from the aidr_predict.tagger_tester) that at least 80% of them are tagged WHITE, otherwise FAIL
  14. Generate BLACK testing items and push them to the tagger
  15. Verify (reading from the aidr_predict.tagger_tester) that at least 80% of them are tagged BLACK, otherwise FAIL
  16. Run a CLEANUP routine
  17. If this point is reached, exit with a successful return code
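
The 80% checks in steps 13 and 15 could be expressed as in the following Java sketch. The AccuracyCheck class is a hypothetical name, and the sketch assumes the predicted label codes ("white", "black") have already been extracted from the messages read on aidr_predict.tagger_tester; parsing those messages is not shown.

```java
import java.util.List;

/** Hypothetical helper for the pass/fail decision in steps 13 and 15. */
final class AccuracyCheck {
    static final int MIN_PERCENT = 80;

    /** Counts how many predicted label codes equal the expected one. */
    static long countCorrect(List<String> predicted, String expected) {
        return predicted.stream().filter(expected::equals).count();
    }

    /** Returns true when at least 80% of the predictions match the expected label. */
    static boolean passes(List<String> predicted, String expected) {
        if (predicted.isEmpty()) {
            return false; // nothing was tagged, so the check cannot pass
        }
        long correct = countCorrect(predicted, expected);
        // Integer arithmetic avoids floating-point rounding at the threshold.
        return correct * 100 >= (long) MIN_PERCENT * predicted.size();
    }
}
```

When passes returns false, the tester would follow the FAIL procedure.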

FAIL means executing the CLEANUP routine, printing a clear and informative message describing the condition, and exiting with code 1 (non success).

CLEANUP means removing all data associated with code="tagger_tester".
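
One way to structure FAIL and CLEANUP is sketched below in Java. TesterRun and TesterFailure are hypothetical names; the actual removal of the "tagger_tester" data is passed in as a Runnable and is not shown.

```java
/** Hypothetical driver that maps the FAIL and CLEANUP rules to an exit code. */
final class TesterRun {
    /** Raised by a step to signal a FAIL condition with a descriptive message. */
    static final class TesterFailure extends RuntimeException {
        TesterFailure(String message) {
            super(message);
        }
    }

    /**
     * Runs the steps and returns the process exit code:
     * 0 after a successful run, 1 when a step raised TesterFailure.
     * CLEANUP is executed exactly once in both cases.
     */
    static int run(Runnable steps, Runnable cleanup) {
        try {
            steps.run();
        } catch (TesterFailure e) {
            cleanup.run();
            System.err.println("FAIL: " + e.getMessage());
            return 1;
        }
        cleanup.run(); // the CLEANUP of step 16
        return 0;
    }
}
```

A main method could then end with System.exit(TesterRun.run(steps, cleanup)), keeping the exit call in a single place.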

On interrupt by the user, the tagger tester should attempt to clean up any state created in the classifier.
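
Handling a user interrupt could look like the following Java sketch, which uses a JVM shutdown hook. CleanupOnce is a hypothetical name. Shutdown hooks run on Ctrl+C and on normal exit but not when the process is killed forcibly, which is why step 1 still checks for leftover data.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical wrapper that runs the CLEANUP routine at most once. */
final class CleanupOnce implements Runnable {
    private final Runnable cleanup;
    private final AtomicBoolean done = new AtomicBoolean(false);

    CleanupOnce(Runnable cleanup) {
        this.cleanup = cleanup;
    }

    @Override
    public void run() {
        // compareAndSet lets only the first caller through, so an explicit
        // CLEANUP followed by the shutdown hook does not clean up twice.
        if (done.compareAndSet(false, true)) {
            cleanup.run();
        }
    }

    /** Registers this cleanup to run when the JVM shuts down, e.g. on Ctrl+C. */
    void registerShutdownHook() {
        Runtime.getRuntime().addShutdownHook(new Thread(this));
    }
}
```

The tester would create one CleanupOnce at startup, call registerShutdownHook(), and use the same instance wherever the steps call for CLEANUP.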