
Part of Speech Tagging

josephrocca edited this page Aug 14, 2018 · 11 revisions

A Part-of-Speech (POS) describes how a term is being used: as a Verb, Noun, Adjective, and so on.

Knowing how a word is being used tells you which methods and transformations are appropriate for it. In compromise, a term may have any number of POS tags, like:

nlp("John's family").terms(0).data() /* =>
[{
  text: 'John\'s',
  tags: [
    'Person',
    'Noun',
    'Possessive'
  ]
}] */
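Because each term carries a list of tags, picking the right transformation usually comes down to checking whether a tag is present. Here is a minimal sketch of that check. It runs on hand-written data in the same shape as the output above rather than calling the library; `hasTag` is a hypothetical helper, and the tags on 'family' are assumed for illustration.

```javascript
// Hand-written sample in the shape shown above (not real library output).
const terms = [
  { text: "John's", tags: ['Person', 'Noun', 'Possessive'] },
  { text: 'family', tags: ['Noun'] } // assumed tags, for illustration only
];

// Hypothetical helper: true if the term carries the given tag.
function hasTag(term, tag) {
  return term.tags.includes(tag);
}

// Collect the text of every term that carries a given tag.
const nouns = terms.filter(t => hasTag(t, 'Noun')).map(t => t.text);
const possessives = terms.filter(t => hasTag(t, 'Possessive')).map(t => t.text);

console.log(nouns);       // [ "John's", 'family' ]
console.log(possessives); // [ "John's" ]
```

A term can match several checks at once, which is why "John's" shows up in both lists.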

The Brill-based POS tagger used in compromise does not use a standard tagset, and avoids three-letter naming schemes in its API. It maps fairly closely to the Penn Tagset, and you can find an up-to-date version of the tagset here.

Tagging Issues:

The easiest way to debug a tagging problem is to run .verbose() and view the output:

var nlp = require('compromise');
nlp.verbose('tagger'); // show logging for the tagger only
nlp('it is supercalifragilisticexpialidocious');
// 'it'          ->   Pronoun   (lexicon-match)
// 'is'          ->   Copula    (lexicon-match)
//               ->   Verb      (parent-tag)
// 'supercal...' ->   Adjective (suffix-lookup)

If you want to change the tagging behaviour, grep the source for the reason shown in parentheses in the log (for example 'suffix-lookup') to find the logic that applied the tag.

Tagging is done in a few dozen separate steps, in a somewhat delicate order. Changing tagging behaviour in one place may have unexpected effects elsewhere, so be sure to run npm test after any change.
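To see why the order of steps matters, here is a toy sketch of an ordered tagging pipeline. It is not compromise's actual implementation: the lexicon, the suffix table, the parent-tag map, and the 'noun-fallback' step are all made up for illustration, and only the step names 'lexicon-match' and 'suffix-lookup' are borrowed from the log output above.

```javascript
// Toy data, invented for this sketch (not the library's real tables).
const lexicon = new Map([['it', 'Pronoun'], ['is', 'Copula']]);
const suffixes = [['ous', 'Adjective'], ['ly', 'Adverb']];
const parents = new Map([['Copula', 'Verb']]);

// Each step returns a tag for the word, or undefined if it has no opinion.
const steps = {
  'lexicon-match': word => lexicon.get(word),
  'suffix-lookup': word => {
    const hit = suffixes.find(([suffix]) => word.endsWith(suffix));
    return hit ? hit[1] : undefined;
  },
  'noun-fallback': () => 'Noun' // hypothetical catch-all step
};

// Run the steps in the given order; the first step that answers wins.
function tagWord(word, order) {
  for (const name of order) {
    const tag = steps[name](word);
    if (tag) {
      const tags = [tag];
      if (parents.has(tag)) tags.push(parents.get(tag)); // add the parent tag
      return { word, tags, reason: name };
    }
  }
  return { word, tags: [], reason: null };
}

const normal = ['lexicon-match', 'suffix-lookup', 'noun-fallback'];
const swapped = ['lexicon-match', 'noun-fallback', 'suffix-lookup'];

const word = 'supercalifragilisticexpialidocious';
const a = tagWord(word, normal);
const b = tagWord(word, swapped);
console.log(a.tags, a.reason); // [ 'Adjective' ] suffix-lookup
console.log(b.tags, b.reason); // [ 'Noun' ] noun-fallback
```

Swapping two steps changes the result for the same word, which is the kind of side effect a full test run is meant to catch.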
