Tagger Concepts

Carlos Castillo edited this page Jul 1, 2015 · 1 revision

If you haven't worked with automatic classification of documents, read this short explanation first.

A nominal attribute is an attribute that takes one value from a finite set of unordered categories, such as "color".

A nominal label is a possible value of a nominal attribute. For instance, if the nominal attribute is "color", possible nominal labels are "red", "green", and "blue".

A model is an automatic classifier, associated with a collection and a nominal attribute, that has been trained on a set of human-labelled items to automatically assign a nominal label to an item.

A model family is a set of models for the same collection and nominal attribute. At any given moment, only one model is active within a model family. The active model is usually the model that has been trained with the largest number of human-labelled items, or the one that has the greatest AUC (area under the ROC curve; see http://www.dataschool.io/roc-curves-and-auc-explained/).
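
As an illustration only, choosing the active model within a family could be sketched as below. The `Model` class, its field names, and the `pick_active` helper are hypothetical and are not taken from the aidr-tagger code base; the sketch simply assumes each model records how many human-labelled items it was trained on and its AUC (area under the ROC curve).

```python
from dataclasses import dataclass


@dataclass
class Model:
    # Hypothetical record for one trained model; field names are assumptions.
    model_id: int
    training_count: int  # number of human-labelled items used for training
    auc: float           # area under the ROC curve measured for this model


def pick_active(family, by="training_count"):
    """Return the model with the highest value of the chosen criterion."""
    return max(family, key=lambda m: getattr(m, by))


family = [
    Model(model_id=1, training_count=100, auc=0.80),
    Model(model_id=2, training_count=250, auc=0.78),
    Model(model_id=3, training_count=180, auc=0.85),
]

active_by_count = pick_active(family)          # model 2 has the most training items
active_by_auc = pick_active(family, by="auc")  # model 3 has the greatest AUC
```

The two criteria can disagree, as they do here, which is why the sketch takes the criterion as a parameter instead of fixing one.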

A feature is a characteristic of a message (e.g. "it includes the word 'house'"). In aidr-tagger, each item is converted to a set of features containing all of its unigrams (words) and bigrams (consecutive two-word sequences). For instance, "the house is red" is converted into { "the", "house", "is", "red", "the house", "house is", "is red" }.
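
The conversion to unigrams and bigrams can be sketched in a few lines. This is a minimal illustration, not the aidr-tagger implementation: the function name `extract_features` is hypothetical, and lowercasing plus whitespace tokenization are assumptions made for the example.

```python
def extract_features(text):
    """Return the set of unigrams and bigrams found in `text`."""
    # Assumption: lowercase the text and split on whitespace.
    tokens = text.lower().split()
    unigrams = set(tokens)
    # Pair each token with its successor to form two-word sequences.
    bigrams = {" ".join(pair) for pair in zip(tokens, tokens[1:])}
    return unigrams | bigrams


features = extract_features("the house is red")
```

Because the result is a set, a word or bigram that occurs several times in a message still counts as a single feature.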

A feature selection method is a way of selecting the features that are of interest. In aidr-tagger, a feature selection algorithm is run over the data, keeping the 500 most discriminative features for a given classifier.
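
The page does not say which feature selection algorithm is used, so the sketch below assumes information gain, one common way to rank features by how well their presence separates the labels. The function names are hypothetical; each item is assumed to be a set of features and each label a nominal label string.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(items, labels, feature):
    """Reduction in label entropy from knowing whether `feature` is present."""
    base = entropy(Counter(labels).values())
    present = [lab for feats, lab in zip(items, labels) if feature in feats]
    absent = [lab for feats, lab in zip(items, labels) if feature not in feats]
    remainder = 0.0
    for subset in (present, absent):
        if subset:
            weight = len(subset) / len(labels)
            remainder += weight * entropy(Counter(subset).values())
    return base - remainder


def select_features(items, labels, k=500):
    """Return the k features with the highest information gain."""
    vocabulary = set().union(*items)
    # Sort by descending gain; break ties alphabetically for a stable result.
    ranked = sorted(vocabulary, key=lambda f: (-information_gain(items, labels, f), f))
    return ranked[:k]


items = [{"red", "house"}, {"red", "car"}, {"blue", "house"}, {"blue", "car"}]
labels = ["warm", "warm", "cold", "cold"]
top = select_features(items, labels, k=2)
```

In this toy data, "red" and "blue" each split the labels perfectly, while "house" and "car" carry no information about the label, so the two color words are the ones kept.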

A learning scheme is a machine learning algorithm. In aidr-tagger this is a random forest of decision trees.