Practical Natural Language Processing Tools for Humans is built on top of SENNA's Natural Language Processing (NLP) predictions: part-of-speech (POS) tagging, chunking (CHK), named entity recognition (NER), semantic role labeling (SRL) and syntactic parsing (PSG), together with skip-gram, all in Python, with more features still to be added.

practNLPTools-lite

Warning

The CLI is only for example purposes; don't use it for long-running jobs.

Creating practNLPTools in lite mode. [The old code is available on the dev branch; the oldest stable code is on the proper branch.]


Build Status: clicking the build badge might take you to a build of practNLPTools, which is the testing ground for this repository, so don't worry.

Practical Natural Language Processing Tools for Humans. practNLPTools is a Pythonic library over SENNA and the Stanford Dependency Extractor.


QuickStart

Downloading the Stanford Parser JAR

To download the stanford-parser from GitHub automatically and place it inside the installation directory:

pntl -I true
# downloads the required files from GitHub.

Running Predefined Example Sentences

To run the predefined examples in batch mode (which takes a list with more than one example):

pntl -SE home/user/senna -B true

Example

Batch mode means a list of sentences.

# Example structure for the predefined
# sentences in the code.

sentences = [
    "This is line 1",
    "This is line 2",
]

To run the predefined example in non-batch mode:

pntl -SE home/user/senna

Running a user-given sentence

To run a user-given sentence, use the -S option:

pntl -SE home/user/senna -S 'I am gonna make him an offer he can not refuse.'

Functionality

  • Semantic Role Labeling (see the usage sketch after this list).
  • Syntactic Parsing.
  • Part-of-Speech Tagging (POS Tagging).
  • Named Entity Recognition (NER).
  • Dependency Parsing.
  • Shallow Chunking.
  • Skip-gram (in some cases).
  • Finds the SENNA path if it is installed on the system.
  • Places the Stanford parser and depParser files into the installation directory.
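
A minimal Python usage sketch of these functions is given below. It assumes the Annotator class from pntl.tools, a senna_dir keyword argument pointing at a local SENNA installation, and a get_annoations accessor; the exact names and arguments may differ between versions, so treat this as an illustration rather than the definitive API.

# Illustrative sketch only: the import path, the senna_dir keyword and the
# get_annoations accessor are assumptions and may differ in your installed
# version of pntl.
from pntl.tools import Annotator

# Point the annotator at a local SENNA installation (hypothetical path).
annotator = Annotator(senna_dir="/home/user/senna")

sentence = "He created the robot and broke it after making it."

# Request the annotations for a single sentence; the result is expected to
# be a mapping with keys such as 'pos', 'ner', 'chunk' and 'srl'.
annotations = annotator.get_annoations(sentence, dep_parse=True)

print(annotations["pos"])  # part-of-speech tags
print(annotations["srl"])  # semantic role labels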

Future work

  • tag2file (new)
  • Creating depParser for the corresponding OS environment
  • Custom input format for the Stanford parser instead of the tree format

Features

  1. Fast: SENNA is written in C, so it is fast.
  2. We use only the dependency extractor component of the Stanford Parser, which takes the syntactic parse from SENNA and applies dependency extraction. So there is no need to load the parsing models of the Stanford Parser, which takes time.
  3. Easy to use.
  4. Platforms supported: Windows, Linux and Mac.
  5. Automatically finds the Stanford parsing JAR if it is present in the install path [pntl].

Note

The SENNA pipeline has a fixed maximum size for the sentences it can read. By default it is 1024 tokens per sentence. If you have larger sentences, consider changing the MAX_SENTENCE_SIZE value in SENNA_main.c and rebuilding your system-specific binary. Otherwise this could introduce misalignment errors.
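
If rebuilding SENNA is not an option, a simple guard on the Python side can catch over-long inputs before they reach SENNA. The helper below is only an illustration and not part of the pntl API; it uses whitespace tokenization as a rough proxy for SENNA's token count.

# Rough guard against sentences longer than SENNA's default limit of
# 1024 tokens per sentence (the MAX_SENTENCE_SIZE mentioned above).
# Not part of pntl; whitespace splitting only approximates SENNA's tokenizer.
MAX_SENTENCE_SIZE = 1024

def check_sentence_length(sentence, limit=MAX_SENTENCE_SIZE):
    """Raise ValueError if the whitespace-token count exceeds `limit`."""
    token_count = len(sentence.split())
    if token_count > limit:
        raise ValueError(
            f"Sentence has {token_count} tokens; SENNA's default limit is {limit}. "
            "Rebuild SENNA with a larger MAX_SENTENCE_SIZE or split the sentence."
        )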

Installation

Requires:

A computer with 500 MB of memory, a Java Runtime Environment (preferably 1.7; it works with 1.6 too, but this has not been tested) and Python installed.

Linux:

run:

sudo python setup.py install

Windows:

run this command as administrator:

python setup.py install

Benchmark comparison

Using the time command on Ubuntu to run testsrl.py from this link along with tools.py from pntl:

               pntl             NLTK-senna
at first run
  real         0m1.674s         0m2.484s
  user         0m1.564s         0m1.868s
  sys          0m0.228s         0m0.524s
at second run
  real         0m1.245s         0m3.359s
  user         0m1.560s         0m2.016s
  sys          0m0.152s         0m1.168s
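
A comparison like the one above can also be scripted instead of being timed by hand with the shell's time command. The sketch below simply times two subprocesses; the script paths are placeholders, and the numbers it prints are wall-clock times, not a reproduction of the table above.

# Time each example script in a subprocess and report wall-clock time.
# The script paths are placeholders; substitute the real locations.
import subprocess
import time

SCRIPTS = {
    "pntl": ["python", "tools.py"],          # pntl example script (placeholder)
    "NLTK-senna": ["python", "testsrl.py"],  # NLTK SENNA example script (placeholder)
}

for name, cmd in SCRIPTS.items():
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    elapsed = time.perf_counter() - start
    print(f"{name}: {elapsed:.3f}s wall time")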

Note

This benchmark may differ depending on the system's workload, and the results presented here are exactly what I got on my system (Ubuntu, 4 GB RAM, i3 processor). If I find another good benchmarking technique, I will switch to it.

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.