Named Entity Tagger

A Ruby library for tagging named entities in text, based on OpenNLP. Because OpenNLP is written in Java, NamedEntityTagger must be run on JRuby.

Dependencies

To use NamedEntityTagger, you need to download and build a few Java libraries and place the resulting jar files in the deps directory. You also need OpenNLP model data, which must be placed under models.
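As a rough illustration of how jars in deps could end up on the classpath, the sketch below relies on the JRuby feature that requiring a .jar file adds it to the classpath. The helper names jar_files and load_jars are hypothetical and not part of NamedEntityTagger; the library may well load its dependencies differently.

```ruby
# Hypothetical helper (not part of NamedEntityTagger): list every .jar
# directly under a directory, sorted so the load order is stable.
def jar_files(dir)
  Dir.glob(File.join(dir, '*.jar')).sort
end

# Hypothetical helper: require each jar so JRuby adds it to the classpath.
# On any other Ruby implementation this does nothing and returns [].
def load_jars(dir)
  return [] unless RUBY_PLATFORM == 'java'
  jar_files(dir).each { |jar| require File.expand_path(jar) }
end

# Assumes the current directory is the project root that contains deps.
load_jars('deps')
```

If deps is empty or missing, jar_files returns an empty list, so nothing is loaded and no error is raised.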

Libraries

Model Data

Usage

NamedEntityTagger exposes a minimal API. The main class is EntityTagger, which provides the method #tag(text). This method takes the text to be tagged and returns a new string in which the words identified as named entities are highlighted. The output markup is determined by a formatter object passed to the tagger. Currently the only formatter is CSSClassAnnotationFormatter, which wraps each named entity in a span tag whose class corresponds to the model that matched the entity.

For example:

require 'lib/entity_tagger'
require 'lib/css_class_annotation_formatter' 
tagger = EntityTagger.new(CSSClassAnnotationFormatter.new)
tagger.tag("Mrs. Smith flew to Berlin")

=> "Mrs. <span class=\"person\">Smith</span> flew to <span class=\"location\">Berlin</span>"
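To make the span-wrapping step concrete, here is a minimal standalone sketch of how annotated output like the above could be assembled from entity positions. It does not use OpenNLP or the library's real formatter interface, which this README does not show; the Entity struct and the annotate method are hypothetical names introduced only for illustration.

```ruby
# Hypothetical value object: a character range in the text plus the name
# of the model that matched it. `stop` is exclusive.
Entity = Struct.new(:start, :stop, :type)

# Wrap each entity in a span whose class is the entity type.
# Assumes the entities do not overlap.
def annotate(text, entities)
  out = String.new
  cursor = 0
  entities.sort_by(&:start).each do |e|
    out << text[cursor...e.start]   # untouched text before the entity
    out << %(<span class="#{e.type}">#{text[e.start...e.stop]}</span>)
    cursor = e.stop
  end
  out << text[cursor..-1]           # remaining text after the last entity
end

text = "Mrs. Smith flew to Berlin"
entities = [
  Entity.new(5, 10, 'person'),      # "Smith"
  Entity.new(19, 25, 'location')    # "Berlin"
]
puts annotate(text, entities)
```

In this sketch the entity positions are hard-coded; in the library they would come from the OpenNLP models.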