PHP wrapper for the Stanford Natural Language Processing library. Supports POSTagger and CRFClassifier.
Latest commit 3040d8c Feb 6, 2017 @patrickschur Removed debug code

stanford-nlp-tagger

A PHP wrapper for the Stanford Natural Language Processing library. Supports POSTagger and CRFClassifier. Automatically loads the right packages and detects the language of the given text.

Requirements

  • Java 1.8 or higher must be installed.
  • Download the required packages and extract them into your directory. (The script finds and loads the right packages automatically, wherever they are.)

Installation with Composer

$ composer require patrickschur/stanford-nlp-tagger

Example

  • Download the required packages for the POSTagger here for English only or here for Arabic, Chinese, French, Spanish, and German.
  • Extract the (.zip) package into your directory. (Please do not rename the packages unless you want to add them manually.)

$pos = new \StanfordTagger\POSTagger();

$pos->tag('My dog also likes eating sausage.');

Results in

My_PRP$ dog_NN also_RB likes_VBZ eating_JJ sausage_NN ._.
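
The tagged string above can be split back into word/tag pairs with plain PHP. The following is a minimal sketch, assuming the `word_TAG` layout shown above; `parseSlashTags()` is a hypothetical helper and not part of this library.

```php
<?php

// Hypothetical helper (not part of the library): turns "word_TAG word_TAG ..."
// into a list of [word, tag] pairs.
function parseSlashTags(string $tagged): array
{
    $pairs = [];

    foreach (preg_split('/\s+/', trim($tagged), -1, PREG_SPLIT_NO_EMPTY) as $item) {
        // Split on the LAST underscore so that words containing "_" stay intact.
        $pos = strrpos($item, '_');

        if ($pos === false) {
            continue; // not a "word_TAG" item, skip it
        }

        $pairs[] = [substr($item, 0, $pos), substr($item, $pos + 1)];
    }

    return $pairs;
}

// Single quotes keep PHP from interpolating the "$" in "PRP$".
$pairs = parseSlashTags('My_PRP$ dog_NN also_RB likes_VBZ eating_JJ sausage_NN ._.');
```

Splitting on the last underscore rather than the first is a design choice: it assumes tags never contain an underscore, while words might.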

setOutputFormat()

Three output formats are available: xml, slashTags and tsv.

$pos = new \StanfordTagger\POSTagger();

$pos->setOutputFormat(StanfordTagger::OUTPUT_FORMAT_XML);

$pos->tag('My dog also likes eating sausage.');

Result as XML:

<?xml version="1.0" encoding="UTF-8"?>
<pos>
<sentence id="0">
  <word wid="0" pos="PRP$">My</word>
  <word wid="1" pos="NN">dog</word>
  <word wid="2" pos="RB">also</word>
  <word wid="3" pos="VBZ">likes</word>
  <word wid="4" pos="JJ">eating</word>
  <word wid="5" pos="NN">sausage</word>
  <word wid="6" pos=".">.</word>
</sentence>
</pos>
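
The XML result can be read with PHP's SimpleXML extension. This sketch assumes the extension is enabled and that the output has exactly the structure shown above; it collects every word tagged `NN`. The sample document is hard-coded here, in practice it would be the string returned by `tag()`.

```php
<?php

// Sample output copied from above; in practice this would come from $pos->tag(...).
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<pos>
<sentence id="0">
  <word wid="0" pos="PRP$">My</word>
  <word wid="1" pos="NN">dog</word>
  <word wid="2" pos="RB">also</word>
  <word wid="3" pos="VBZ">likes</word>
  <word wid="4" pos="JJ">eating</word>
  <word wid="5" pos="NN">sausage</word>
  <word wid="6" pos=".">.</word>
</sentence>
</pos>
XML;

$doc = simplexml_load_string($xml);

// Collect the text of every <word> whose "pos" attribute is NN.
$nouns = [];

foreach ($doc->sentence as $sentence) {
    foreach ($sentence->word as $word) {
        if ((string) $word['pos'] === 'NN') {
            $nouns[] = (string) $word;
        }
    }
}
```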

or use

$pos->setOutputFormat(StanfordTagger::OUTPUT_FORMAT_TSV);

for

My  PRP$
dog NN
also    RB
likes   VBZ
eating  JJ
sausage NN
.   .
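
The TSV result has one token per line. The sketch below assumes the two columns are separated by a single tab character (the listing above shows them as spaces); `parseTsv()` is a hypothetical helper, not part of this library.

```php
<?php

// Hypothetical helper (not part of the library): reads "token<TAB>tag" lines.
function parseTsv(string $tsv): array
{
    $rows = [];

    foreach (preg_split('/\R/', trim($tsv)) as $line) {
        $cols = explode("\t", $line);

        // Keep only well-formed two-column lines.
        if (count($cols) === 2) {
            $rows[] = ['token' => $cols[0], 'tag' => $cols[1]];
        }
    }

    return $rows;
}

// "\$" keeps the dollar sign literal inside a double-quoted string.
$rows = parseTsv("My\tPRP\$\ndog\tNN\nalso\tRB\n.\t.");
```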

setModel(), setJarArchive() and setClassifier()

All packages are loaded automatically, but you can set them manually if you prefer.

$pos = new \StanfordTagger\POSTagger();

$pos->setModel(__DIR__ . '/stanford-postagger-full-2016-10-31/models/english-bidirectional-distsim.tagger');

$pos->setJarArchive(__DIR__ . '/stanford-postagger-full-2016-10-31/stanford-postagger.jar');

CRFClassifier

  • For English only, download the required packages for the CRFClassifier here.
  • You have to download the language models separately:
  • If you downloaded a language model, extract the (.jar) files and add them to your directory.

Example

$ner = new \StanfordTagger\CRFClassifier();

$ner->tag('Albert Einstein was a theoretical physicist born in Germany.');

Results in

Albert/PERSON Einstein/PERSON was/O theoretical/O physicist/O born/O in/O Germany/LOCATION./O
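
To turn the classifier output into a list of named entities, consecutive tokens with the same label can be merged. This is a minimal sketch, assuming the `token/LABEL` layout shown above, labels made of uppercase letters only, and `O` meaning "no entity"; `extractEntities()` is a hypothetical helper, not part of this library.

```php
<?php

// Hypothetical helper (not part of the library): groups consecutive tokens
// that share the same non-"O" label into one entity.
function extractEntities(string $tagged): array
{
    // A regex is used instead of splitting on spaces because punctuation
    // may follow a label directly, as in "Germany/LOCATION./O".
    preg_match_all('~(\S+?)/([A-Z]+)~', $tagged, $matches, PREG_SET_ORDER);

    $entities = [];
    $current  = null;

    foreach ($matches as $m) {
        $token = $m[1];
        $label = $m[2];

        if ($current !== null && $label === $current['label']) {
            // Same label as the previous token: extend the current entity.
            $current['text'] .= ' ' . $token;
            continue;
        }

        // The label changed: close the entity that was open, if any.
        if ($current !== null) {
            $entities[] = $current;
            $current    = null;
        }

        if ($label !== 'O') {
            $current = ['text' => $token, 'label' => $label];
        }
    }

    if ($current !== null) {
        $entities[] = $current;
    }

    return $entities;
}

$entities = extractEntities(
    'Albert/PERSON Einstein/PERSON was/O theoretical/O physicist/O born/O in/O Germany/LOCATION./O'
);
```

Note that two different entities with the same label that directly follow each other would be merged into one by this approach.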

Contribute

Feel free to contribute to this repository. Any help is welcome.