The tagger tester is a program that can be used after deploying the tagger, to test it independently of the other modules.
It requires the tagger stand-alone application to be running, and the `aidr-tagger-api`, `aidr-trainer-api`, `aidr-task-manager`, and `aidr-db-manager` EE applications to be deployed.
The tagger tester is run through the following command:

```
mvn test -Dtest=TaggerTesterTest PARAMETERS
```
These parameters are optional:

- `-Dconfig=FILE`
- `-Dnitems-train=NUMBER` (default 200)
- `-Dnitems-test=NUMBER` (default 1000)
- `-Dquiet=TRUE/FALSE` (default false)
The config is the name of the tagger configuration (or of a centralized configuration) from which the tester reads the properties it needs to perform the testing.
The nitems-train is the number of training (labelled) items to give to the tagger. It must be strictly larger than `sampleCountThreshold`, which is the minimum number of items required to create a model.
The nitems-test is the number of testing (unlabelled) items to give to the tagger. It can be any number greater than or equal to 1000. It cannot be smaller, so that there are enough cases to compute reliable statistics about the number of items classified correctly and incorrectly.
The quiet option suppresses the printing of the tweets. All other messages are printed even in quiet mode.
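As a minimal sketch, these optional parameters could be read through standard Java system properties with defaults. The class and field names below are illustrative, not the tester's actual code:

```java
// Sketch (illustrative names): reading the tester's optional -D parameters
// with the documented defaults via standard Java system properties.
public class TesterConfig {
    final String config;     // -Dconfig has no default; null means "not given"
    final int nitemsTrain;   // default 200
    final int nitemsTest;    // default 1000
    final boolean quiet;     // default false

    TesterConfig() {
        config = System.getProperty("config");
        nitemsTrain = Integer.parseInt(System.getProperty("nitems-train", "200"));
        nitemsTest = Integer.parseInt(System.getProperty("nitems-test", "1000"));
        quiet = Boolean.parseBoolean(System.getProperty("quiet", "false"));
    }

    public static void main(String[] args) {
        TesterConfig c = new TesterConfig();
        System.out.println(c.nitemsTrain + " " + c.nitemsTest + " " + c.quiet);
    }
}
```

Run without any `-D` flags, this resolves to the documented defaults (200, 1000, false).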
Text of training/testing tweets
All tweets generated by this tester are synthetic and randomly generated, but conform to a specific format.
Half of the training tweets have the "WHITE" attribute value as a human-provided tag, and consist of random 30-word sequences of the words "light", "clear", "snow", "clouds", "neutral", and "wNN". Whenever the word is wNN, it is written as the letter "w" followed by two random digits.
Half of the training tweets have the "BLACK" attribute value as a human-provided tag, and consist of random 30-word sequences of the words "coal", "night", "coffee", "ink", "neutral", and "wNN". Whenever the word is wNN, it is written as the letter "w" followed by two random digits.
Example training tweets:

```
light clear w09 w19 snow ... neutral clear light w87 light -> WHITE
clear clouds snow light neutral ... w91 clear clear light light -> WHITE
neutral neutral coffee w58 night ... coal night w10 ink night -> BLACK
w82 coffee night coal w11 ... ink night coffee neutral w31 -> BLACK
```
The testing tweets are generated in the same way: half of them correspond to "WHITE" tweets, half of them to "BLACK" tweets. Note that the testing items have no label associated with them, i.e. they are unlabelled.
The purpose of the "neutral" word is to create overlap, i.e. a word that appears in both the WHITE and BLACK sets, which avoids generating a trivial classification problem. The purpose of the "wNN" random words is to bypass the de-duplication check done by the tagger, ensuring every tweet is different enough from the others.
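The generator described above can be sketched as follows. The class and method names, and the uniform choice among the six word options, are assumptions for illustration:

```java
import java.util.Random;

// Sketch of the synthetic tweet generator described above.
// Names and the uniform word distribution are illustrative assumptions.
public class TweetGenerator {
    private static final String[] WHITE_WORDS =
        { "light", "clear", "snow", "clouds", "neutral" };
    private static final String[] BLACK_WORDS =
        { "coal", "night", "coffee", "ink", "neutral" };
    private static final Random RNG = new Random();

    /** Builds one 30-word tweet from the given vocabulary plus random wNN words. */
    static String makeTweet(String[] vocabulary) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 30; i++) {
            if (i > 0) sb.append(' ');
            // One extra slot in the draw stands for the "wNN" random word,
            // which defeats the tagger's de-duplication check.
            int pick = RNG.nextInt(vocabulary.length + 1);
            if (pick == vocabulary.length) {
                sb.append(String.format("w%02d", RNG.nextInt(100)));
            } else {
                sb.append(vocabulary[pick]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(makeTweet(WHITE_WORDS) + " -> WHITE");
        System.out.println(makeTweet(BLACK_WORDS) + " -> BLACK");
    }
}
```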
The tagger tester should perform the following steps:
- Make sure there is no leftover data in the `aidr-predict` database, in case the tagger tester died abnormally in a previous run. If there is data, print a warning message, run the CLEANUP routine, and FAIL (forcing the user to run the tagger tester again)
- Create a test user `Tagger Tester User` using the `addUser` service of the Tagger-API module. Check that the user exists after creating it. FAIL if this does not succeed.
- Create a collection (name="Tagger Tester Crisis", code="tagger_tester") using the `addCrisis` service in the `CrisisResource` of the Tagger-API module. Check that the collection exists after creating it. FAIL if this does not succeed.
- Create a classifier using the following steps:
  - Create an attribute (name="tagger_tester_classifier") using the `NominalAttributeResource` in the Tagger-API module. Check that the attribute exists after creating it. FAIL if this does not succeed.
  - Create three labels using the `NominalLabelResource` in the Tagger-API module (use the `attribute_id` generated during the previous step). Check that all labels exist after creating them. FAIL if this does not succeed.
    - name="Does not apply" code="null"
  - Create a ModelFamily using the `addCrisisAttribute` service of the `ModelFamilyResource` in the Tagger-API module (use the `nominal_label_id` generated in the previous steps). Check that the model family exists after creating it. FAIL if this does not succeed.
- Subscribe to the Redis queue where the tagger writes its output; FAIL if the subscription cannot be established
- Generate random items (defined above) and push them to Redis on the channel `FetcherChannel.tagger_tester` at a rate of 5 items/second. A valid AIDR item is a JSON document with the minimum required fields as defined here. You can use the `tweetid` field, for example, to keep track of which item belongs to which label (i.e. WHITE or BLACK). Keep pushing items until the document table in the `aidr-predict` database contains at least 200 items waiting to be labeled. (TO-DO: need an API to check the total number of unlabeled items for a crisis)
- Get a task to label by using the `getOneTaskBufferToAssign` service
- Assign the correct label to that item (using its `tweetid`) and save it using the `save` service
- After about 100 white items and 100 black items have been tagged, check whether the Tagger module has created a model, using the Tagger-API module. If not, wait 10 seconds and keep tagging more items, 50 at a time.
- For testing, generate WHITE testing items and push them to the tagger
- Subscribe to the tagger's output channel
- Verify (reading from `aidr_predict.tagger_tester`) that at least 80% of them are tagged WHITE, otherwise FAIL
- Generate BLACK testing items and push them to the tagger
- Verify (reading from `aidr_predict.tagger_tester`) that at least 80% of them are tagged BLACK, otherwise FAIL
- Run a CLEANUP routine
- If this point is reached, exit with a successful return code
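The items pushed to Redis in the steps above are JSON documents. A minimal sketch of building one in Java follows; the exact set of required fields is defined in the AIDR documentation, so the `text` and `crisis_code` field names used here are assumptions for illustration. Only `tweetid` is named above; encoding the expected label into it is one way to match tagger output back to WHITE or BLACK:

```java
// Sketch: building a minimal AIDR item as a JSON string.
// Field names other than "tweetid" are illustrative assumptions;
// consult the AIDR documentation for the actual required fields.
public class ItemBuilder {
    static long counter = 0;

    static String makeItem(String text, String expectedLabel) {
        // e.g. tweetid "WHITE-000042" records the expected label for this item
        String tweetid = String.format("%s-%06d", expectedLabel, counter++);
        return String.format(
            "{\"tweetid\":\"%s\",\"text\":\"%s\",\"crisis_code\":\"tagger_tester\"}",
            tweetid, text);
    }

    public static void main(String[] args) {
        System.out.println(makeItem("light clear snow w42", "WHITE"));
    }
}
```

When the tagger's output arrives on the subscribed channel, the prefix of `tweetid` tells the tester which label the item should have received.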
FAIL means executing the CLEANUP routine, printing a clear and informative message describing the failure condition, and exiting with code 1 (non-success).
CLEANUP means removing all data associated with the tagger tester (the test user, the collection, and the classifier data created above).
On interrupt by the user, the tagger tester should attempt to clean up any state created in the classifier.
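One common way to honor this requirement in Java is a JVM shutdown hook, which also runs when the process is interrupted with Ctrl-C. This is a sketch, not the tester's actual code; `cleanup()` stands in for the real CLEANUP routine:

```java
// Sketch: ensuring CLEANUP runs even when the user interrupts the tester.
// cleanup() is a placeholder for the real CLEANUP routine.
public class InterruptCleanup {
    static int cleaned = 0;

    static void cleanup() {
        cleaned++;
        System.out.println("cleanup ran");
    }

    public static void main(String[] args) {
        // The hook fires on normal exit and on SIGINT (Ctrl-C).
        Runtime.getRuntime().addShutdownHook(new Thread(InterruptCleanup::cleanup));
        // ... run the tester steps here ...
    }
}
```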