NAME

Text::NLP::Stanford::EntityExtract - Talks to a stanford-ner socket server to get named entities back

Quick Start:

Grab the Stanford Named Entity recogniser from http://nlp.stanford.edu/ner/index.shtml.

Run the server with a command along these lines:

java -server -mx400m -cp stanford-ner.jar edu.stanford.nlp.ie.NERServer -loadClassifier classifiers/ner-eng-ie.crf-4-conll-distsim.ser.gz 1234

Write a script to extract the named entities from the text, like the following:

#!/usr/bin/env perl
use strict;
use warnings;
use Text::NLP::Stanford::EntityExtract;
my $ner = Text::NLP::Stanford::EntityExtract->new;
my $server = $ner->server;
my @txt = ("Some text\n\n", "Treated as \\n\\n delimited paragraphs");
my @tagged_text = $ner->get_entities(@txt);
# entities_list is documented below as taking a tagged line
my $entities = $ner->entities_list($tagged_text[0]); # rather complicated
                                                     # @AOA based data
                                                     # structure for further
                                                     # processing

METHODS

new ( host => '127.0.0.1', port => '1234', debug => 0|1|2);

If debug is set to 1, the module warns with the length of the text sent to the server. If it is set to a value greater than 1, it shows the actual text as well as the length.

server

Gets the socket connection. I think that the NER server will only handle one line per connection, so you want a new connection for every line of text.
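The one-connection-per-line pattern can be sketched with the core IO::Socket::INET module. This is only an illustration, not the module's own code: tag_one_line is a hypothetical helper, and it assumes the server replies once and then closes the connection, so the reply can be read until end of file.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper (not part of Text::NLP::Stanford::EntityExtract):
# open a fresh connection, send exactly one line, read the whole reply.
sub tag_one_line {
    my ($line, %opt) = @_;

    my $sock = IO::Socket::INET->new(
        PeerAddr => $opt{host} // '127.0.0.1',
        PeerPort => $opt{port} // 1234,
        Proto    => 'tcp',
    ) or die "Cannot connect to NER server: $@";

    # Collapse all whitespace, including embedded newlines, so that
    # exactly one line goes over the wire.
    $line =~ s/\s+/ /g;
    $line =~ s/^ | $//g;

    print {$sock} "$line\n";

    # Assumption: the server closes the socket after answering,
    # so slurping until EOF returns the complete reply.
    my $tagged = do { local $/; <$sock> };
    close $sock;
    return $tagged;
}
```

Calling tag_one_line("Some text", port => 1234) once per line of input keeps each request on its own connection, at the cost of one TCP handshake per line.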

get_entities(@txt)

Fetches the tagged text for an arbitrary number of paragraphs of text, and returns it as NER-tagged text.

_process_line ($line)

Processes a single line of text into tagged text.

entities_list($tagged_line)

Returns a rather arcane data structure of the entities from the text. The position of each word in the line is recorded, as is its entity type, so that the line of text can be recovered in full from the data structure.
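The exact layout of that structure is not documented here. Purely to illustrate the idea of recording position and entity type so that the line can be rebuilt, here is a small sketch. The [position, word, tag] triples, the helper names, and the slash-delimited input format (word/TAG) are all assumptions made for the example and are not the module's real return value.

```perl
use strict;
use warnings;

# Hypothetical layout, NOT what entities_list actually returns:
# one [position, word, tag] triple per token of a word/TAG line.
sub tokens_with_positions {
    my ($tagged_line) = @_;
    my $pos = 0;
    return [
        map  { [ $pos++, @$_ ] }          # prepend the word position
        grep { @$_ == 2 }                 # skip tokens without a tag
        map  { [ m{^(.*)/([^/]+)$} ] }    # split on the last slash
        split ' ', $tagged_line
    ];
}

# Rebuild the untagged line from the triples, in position order.
sub recover_line {
    my ($tokens) = @_;
    return join ' ',
        map  { $_->[1] }
        sort { $a->[0] <=> $b->[0] } @$tokens;
}

my $tokens = tokens_with_positions('Lord/PERSON Lucan/PERSON may/O');
print recover_line($tokens), "\n";
```

Because every token keeps its position, the triples can be filtered or regrouped by tag and the original word order is still recoverable.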

TODO: This needs some utility subs around it to make it more useful.

list_entities ($self->entities_list($line))

Lists the entities contained within a line, based on the data structure provided by entities_list($line).

If passed an existing list of entities, it adds to that list, including counts of the number of each entity already found.

The data structure returned looks like this:

$list_data = {
   'LOCATION' => {
       'Outer Mongolia' => 1,
       'Location Location Location' => 1,
       'Chinese Mainland' => 1,
       'Britney' => 1
   },
   'O' => {
       'may have returned from the' => 1,
       'said from his home in' => 1,
       '. Test a three word entity' => 1,
       'faith that she follows . Now she is attempting , for a second time , to persuade' => 1,
       '. There is a question that' => 1,
       'blah blah' => 1,
       'to the controversial' => 1,
       '.' => 1,
       'to follow suit , reports said .' => 1
   },
   'PERSON' => {
       'Bruce Lee' => 1,
       'Gwyneth Paltrow' => 1,
       'Lord Lucan' => 1
   },
   'MISC' => {
       'Jewish-based' => 1
   }
};
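A hash of this shape can be built by grouping runs of consecutive words that share a tag and counting each resulting phrase. The sketch below shows that grouping on its own. count_entities is a hypothetical helper rather than the module's implementation, and it assumes slash-delimited tagged text such as Bruce/PERSON, which may differ from what your server is configured to emit.

```perl
use strict;
use warnings;

# Hypothetical helper: group consecutive tokens with the same tag into
# phrases and count each phrase under its tag. An existing hash can be
# passed as the second argument to keep accumulating counts.
sub count_entities {
    my ($tagged_line, $counts) = @_;
    $counts ||= {};
    my ($current_tag, @words);

    # Store the phrase collected so far under the tag it belongs to.
    my $flush = sub {
        return unless @words;
        $counts->{$current_tag}{ join ' ', @words }++;
        @words = ();
    };

    for my $token (split ' ', $tagged_line) {
        # Split on the last slash so words containing '/' survive.
        my ($word, $tag) = $token =~ m{^(.*)/([^/]+)$} or next;
        $flush->() if defined $current_tag && $tag ne $current_tag;
        $current_tag = $tag;
        push @words, $word;
    }
    $flush->();
    return $counts;
}

my $counts = count_entities(
    'Bruce/PERSON Lee/PERSON said/O from/O Outer/LOCATION Mongolia/LOCATION ./O'
);
print join(', ', sort keys %$counts), "\n";
```

Passing the returned hash back in as the second argument on later lines increments the counts for phrases that have already been seen.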

AUTHOR

Kieren Diment, <zarquon at cpan.org>

BUGS

Please report any bugs or feature requests to bug-text-nlp-stanford-entityextract at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Text-NLP-Stanford-EntityExtract. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

The git repository for this code is available from git://github.com/singingfish/text-nlp-stanford-entityextract.git

You can find documentation for this module with the perldoc command.

perldoc Text::NLP::Stanford::EntityExtract

You can also look for information at:

RT: CPAN's request tracker

http://rt.cpan.org/NoAuth/Bugs.html?Dist=Text-NLP-Stanford-EntityExtract

AnnoCPAN: Annotated CPAN documentation

http://annocpan.org/dist/Text-NLP-Stanford-EntityExtract

CPAN Ratings

http://cpanratings.perl.org/d/Text-NLP-Stanford-EntityExtract

Search CPAN

http://search.cpan.org/dist/Text-NLP-Stanford-EntityExtract/

ACKNOWLEDGEMENTS

COPYRIGHT & LICENSE

This program is released under the following license: GPL
