Named Entity Tagger
To use NamedEntityTagger, you need to download and build a few Java libraries and place the resulting jar files in the deps directory. You also need OpenNLP model data, which must be placed under models.
NamedEntityTagger exposes a minimal API. The main class is EntityTagger, and it provides a single method, #tag(text). This method takes the text to be tagged and returns a new string in which the words identified as named entities are highlighted. The encoding of the output is defined by a formatter object. Currently the only formatter is CSSClassAnnotationFormatter, which wraps each named entity in a span tag. The span tag has a class corresponding to the model that matched the entity.
    require 'lib/entity_tagger'
    require 'lib/css_class_annotation_formatter'

    tagger = EntityTagger.new(CSSClassAnnotationFormatter.new)
    tagger.tag("Mrs. Smith flew to Berlin")
    # => "Mrs. <span class=\"person\">Smith</span> flew to <span class=\"location\">Berlin</span>"
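To illustrate the span-wrapping idea behind the formatter, here is a minimal, self-contained sketch. It is not the project's actual formatter code: the Entity struct, the SketchSpanFormatter class, and the annotate method are hypothetical names chosen for this example, and it assumes entities arrive as non-overlapping character offsets together with the name of the model that matched them.

```ruby
# Hypothetical sketch; names and interface are assumptions, not the real API.
# start/stop are character offsets into the text (stop is exclusive);
# type is the name of the model that matched, e.g. "person".
Entity = Struct.new(:start, :stop, :type)

class SketchSpanFormatter
  # Returns a copy of text with each entity wrapped in a span tag whose
  # class is the entity type. Assumes the entities do not overlap.
  def annotate(text, entities)
    out = String.new
    pos = 0
    entities.sort_by(&:start).each do |entity|
      out << text[pos...entity.start]   # plain text before the entity
      out << %(<span class="#{entity.type}">)
      out << text[entity.start...entity.stop]
      out << "</span>"
      pos = entity.stop
    end
    out << text[pos..-1]                # remaining text after the last entity
  end
end

text = "Mrs. Smith flew to Berlin"
entities = [Entity.new(5, 10, "person"), Entity.new(19, 25, "location")]
puts SketchSpanFormatter.new.annotate(text, entities)
```

The sketch does not escape HTML in the input text; a formatter meant for untrusted input would need to do so.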