Tagger Concepts
If you haven't worked with automatic classification of documents, read this short explanation first.
A nominal attribute is an attribute that takes one of a fixed set of discrete, unordered values, such as "color".
A nominal label is a possible value of a nominal attribute. For instance, if the nominal attribute is "color", possible nominal labels are "red", "green", and "blue".
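As a minimal illustration of the two definitions above, a nominal attribute can be modelled as a code plus the finite set of labels it allows. The class `NominalAttribute` and its `validate` method are hypothetical names for this sketch, not part of aidr-tagger.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class NominalAttribute:
    """Hypothetical representation: an attribute code plus its allowed labels."""
    code: str
    labels: FrozenSet[str]

    def validate(self, label: str) -> str:
        # A nominal label is only valid if it is one of the attribute's values.
        if label not in self.labels:
            raise ValueError(f"{label!r} is not a label of attribute {self.code!r}")
        return label


color = NominalAttribute("color", frozenset({"red", "green", "blue"}))
```

Here `color.validate("red")` returns `"red"`, while `color.validate("purple")` raises `ValueError`.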
A model is an automatic classifier, associated with a collection and a nominal attribute, that has been trained on a set of human-labelled items to automatically assign a nominal label to an item.
A model family is a set of models for the same collection and nominal attribute. At any given time, only one model in a model family is active. The active model is usually the one trained with the largest number of human-labelled items, or the one with the greatest area under the ROC curve (AUC) (see http://www.dataschool.io/roc-curves-and-auc-explained/).
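The sketch below shows how AUC can be computed and how an active model might be picked within a family. It assumes a binary (one label versus the rest) evaluation and computes AUC as the probability that a randomly chosen positive item is scored above a randomly chosen negative one. `Model`, `pick_active`, and the selection policy (highest AUC, ties broken by training-set size) are hypothetical and are not taken from aidr-tagger.

```python
from dataclasses import dataclass
from typing import List


def auc(scores: List[float], labels: List[int]) -> float:
    """AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative item")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


@dataclass
class Model:
    model_id: int
    training_items: int  # number of human-labelled items used for training
    auc: float           # AUC measured on held-out labelled items


def pick_active(family: List[Model]) -> Model:
    # Hypothetical policy: highest AUC first, then the larger training set.
    return max(family, key=lambda m: (m.auc, m.training_items))
```

For example, `auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])` is 0.75, because three of the four positive/negative pairs are ranked correctly.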
A feature is a characteristic of a message (e.g., "it includes the word 'house'"). In aidr-tagger, each item is converted into a set of features containing all of its unigrams (single words) and bigrams (sequences of two consecutive words). For instance, "the house is red" is converted into { "the", "house", "is", "red", "the house", "house is", "is red" }.
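The unigram-plus-bigram conversion described above can be sketched in a few lines. The lower-casing and the `\w+` tokenizer are assumptions of this sketch, because the page does not state how aidr-tagger splits text into words. `extract_features` is a hypothetical name.

```python
import re
from typing import Set


def extract_features(text: str) -> Set[str]:
    """Return all unigrams and bigrams of consecutive words (hypothetical helper)."""
    # Assumed tokenization: lower-case, then take runs of word characters.
    words = re.findall(r"\w+", text.lower())
    unigrams = set(words)
    bigrams = {f"{a} {b}" for a, b in zip(words, words[1:])}
    return unigrams | bigrams
```

Applied to "the house is red", this yields the same seven features listed in the example.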
A feature selection method is a way of selecting the features of interest. In aidr-tagger, a feature selection algorithm is run over the data, keeping the 500 most discriminative features for a given classifier.
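The page does not say which feature selection algorithm aidr-tagger runs. As one common choice, the sketch below scores each feature with the chi-square statistic of a feature-present/absent versus label contingency table and keeps the top `k` (500 by default, matching the number given above). `chi2_score` and `select_features` are hypothetical names. Items are represented as sets of features.

```python
from collections import Counter
from typing import List, Set


def chi2_score(docs: List[Set[str]], labels: List[str], feature: str) -> float:
    """Chi-square statistic of the 2 x C table (feature present/absent vs. label)."""
    n = len(docs)
    class_totals = Counter(labels)
    present = Counter(y for d, y in zip(docs, labels) if feature in d)
    n_present = sum(present.values())
    n_absent = n - n_present
    score = 0.0
    for cls, cls_total in class_totals.items():
        cells = ((present[cls], n_present), (cls_total - present[cls], n_absent))
        for observed, row_total in cells:
            expected = row_total * cls_total / n
            if expected > 0:
                score += (observed - expected) ** 2 / expected
    return score


def select_features(docs: List[Set[str]], labels: List[str], k: int = 500) -> List[str]:
    """Keep the k features with the highest chi-square score (ties broken alphabetically)."""
    vocabulary = set().union(*docs)
    ranked = sorted(vocabulary, key=lambda f: (-chi2_score(docs, labels, f), f))
    return ranked[:k]
```

A feature that appears evenly across all labels scores 0. A feature that occurs only with one label scores high, so it survives the cut.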
A learning scheme is the machine learning algorithm used to train a model. In aidr-tagger, this is a random forest of decision trees.
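To make the learning scheme concrete, here is a small pure-Python random forest over items represented as sets of features, as in the feature definition above. It is an illustrative sketch, not aidr-tagger's implementation. The number of trees, the depth limit, the Gini split criterion, and trying about sqrt(#features) random features per split are all assumptions chosen as common defaults. `RandomForest`, `build_tree`, and `predict_tree` are hypothetical names.

```python
import random
from collections import Counter
from typing import List, Sequence, Set


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels: Sequence[str]) -> str:
    return Counter(labels).most_common(1)[0][0]


class Node:
    """Internal node (tests one feature) or leaf (holds a label)."""

    def __init__(self, label=None, feature=None, yes=None, no=None):
        self.label, self.feature, self.yes, self.no = label, feature, yes, no


def build_tree(items, labels, features, depth, max_depth, n_try, rng) -> Node:
    if depth >= max_depth or len(set(labels)) == 1:
        return Node(label=majority(labels))
    best, best_impurity = None, gini(labels)
    # Random-forest twist: only a random subset of features is tried per split.
    for f in rng.sample(features, min(n_try, len(features))):
        yes = [y for x, y in zip(items, labels) if f in x]
        no = [y for x, y in zip(items, labels) if f not in x]
        if not yes or not no:
            continue  # this feature does not split the node
        impurity = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
        if impurity < best_impurity:
            best, best_impurity = f, impurity
    if best is None:
        return Node(label=majority(labels))

    def child(has_feature: bool) -> Node:
        idx = [i for i, x in enumerate(items) if (best in x) == has_feature]
        return build_tree([items[i] for i in idx], [labels[i] for i in idx],
                          features, depth + 1, max_depth, n_try, rng)

    return Node(feature=best, yes=child(True), no=child(False))


def predict_tree(node: Node, item: Set[str]) -> str:
    while node.label is None:
        node = node.yes if node.feature in item else node.no
    return node.label


class RandomForest:
    def __init__(self, n_trees: int = 25, max_depth: int = 10, seed: int = 0):
        self.n_trees, self.max_depth, self.seed = n_trees, max_depth, seed
        self.trees: List[Node] = []

    def fit(self, items: List[Set[str]], labels: List[str]) -> "RandomForest":
        if not items:
            raise ValueError("need at least one labelled item")
        rng = random.Random(self.seed)
        features = sorted(set().union(*items))
        n_try = max(1, int(len(features) ** 0.5))  # assumed default: sqrt(#features)
        self.trees = []
        for _ in range(self.n_trees):
            # Bootstrap sample: draw len(items) items with replacement.
            idx = [rng.randrange(len(items)) for _ in items]
            self.trees.append(build_tree([items[i] for i in idx], [labels[i] for i in idx],
                                         features, 0, self.max_depth, n_try, rng))
        return self

    def predict(self, item: Set[str]) -> str:
        # Each tree votes; the most common label wins.
        return majority([predict_tree(t, item) for t in self.trees])
```

Bagging (bootstrap samples) and per-split feature sampling make the trees differ from one another, and the majority vote averages out their individual errors.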