PyNLPl - Python Natural Language Processing Library

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
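
The basic tasks mentioned above (n-gram extraction, frequency lists and a simple language model) can be illustrated with a minimal, standalone sketch in plain Python 3. It does not use PyNLPl's own API: the helper names `pad`, `ngrams`, `frequency_list` and `bigram_prob`, and the `<s>`/`</s>` boundary markers, are hypothetical choices made for this illustration only. The language model here is an unsmoothed maximum-likelihood bigram model.

```python
from collections import Counter


def pad(tokens, n):
    """Add n-1 hypothetical boundary markers on each side of a sentence."""
    return ["<s>"] * (n - 1) + list(tokens) + ["</s>"] * (n - 1)


def ngrams(tokens, n):
    """Return all n-grams (as tuples) of the padded token sequence."""
    padded = pad(tokens, n)
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


def frequency_list(items):
    """Count how often each item (token or n-gram) occurs."""
    return Counter(items)


def bigram_prob(bigram_counts, unigram_counts, prev, word):
    """Unsmoothed maximum-likelihood estimate P(word | prev) = c(prev, word) / c(prev)."""
    if unigram_counts[prev] == 0:
        return 0.0  # unseen history: no estimate without smoothing
    return bigram_counts[(prev, word)] / unigram_counts[prev]


tokens = "the cat sat on the mat".split()
bigram_counts = frequency_list(ngrams(tokens, 2))
unigram_counts = frequency_list(pad(tokens, 2))

print(unigram_counts.most_common(1))  # [('the', 2)]
print(bigram_prob(bigram_counts, unigram_counts, "the", "cat"))  # 0.5
```

A real model would add smoothing or backoff for unseen n-grams; this sketch only shows how counts turn into conditional probabilities.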

The library is divided into several packages and modules. It works on Python 2.7 as well as Python 3.

The following modules are available:

  • pynlpl.datatypes - Extra datatypes (priority queues, patterns, tries)
  • pynlpl.evaluation - Evaluation & experiment classes (parameter search, wrapped progressive sampling, class evaluation (precision/recall/F-score/AUC), sampler, confusion matrix, multithreaded experiment pool)
  • pynlpl.formats.cgn - Module for parsing CGN (Corpus Gesproken Nederlands) part-of-speech tags
  • pynlpl.formats.folia - Extensive library for reading and manipulating documents in the FoLiA format (Format for Linguistic Annotation).
  • pynlpl.formats.fql - Extensive library for the FoLiA Query Language (FQL), built on top of pynlpl.formats.folia. FQL is currently documented here.
  • pynlpl.formats.cql - Parser for the Corpus Query Language (CQL), as also used by Corpus Workbench and Sketch Engine. Contains a converter to FQL.
  • pynlpl.formats.giza - Module for reading GIZA++ word alignment data
  • pynlpl.formats.moses - Module for reading Moses phrase-translation tables.
  • pynlpl.formats.sonar - Largely obsolete module for pre-releases of the SoNaR corpus; use pynlpl.formats.folia instead.
  • pynlpl.formats.timbl - Module for reading Timbl output (though consider using python-timbl instead)
  • pynlpl.lm.lm - Module for simple language models, and a reader for language model data in the ARPA format (as used by SRILM).
  • pynlpl.search - Various search algorithms (breadth-first, depth-first, beam search, hill climbing, A*, and several variants of each)
  • pynlpl.statistics - Frequency lists, Levenshtein, common statistics and information theory functions
  • pynlpl.textprocessors - Simple tokeniser, n-gram extraction
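
As an illustration of one of the functions listed for pynlpl.statistics, the following is a standalone sketch of Levenshtein (edit) distance using the classic Wagner-Fischer dynamic programme with unit costs. It is not PyNLPl's own implementation and makes no claim about the signature of PyNLPl's Levenshtein function; the name `levenshtein` here is just an illustrative choice.

```python
def levenshtein(source, target):
    """Edit distance with unit costs for insertion, deletion and substitution.

    Works on any two sequences (strings, or lists of tokens). Only two rows
    of the dynamic-programming table are kept in memory.
    """
    if len(source) < len(target):
        # Distance is symmetric with unit costs, so iterate over the longer
        # sequence and keep the rows as short as possible.
        source, target = target, source
    previous = list(range(len(target) + 1))  # distance from "" to target[:j]
    for i, s_item in enumerate(source, start=1):
        current = [i]  # distance from source[:i] to ""
        for j, t_item in enumerate(target, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (s_item != t_item)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("the cat sat".split(), "the cat sat down".split()))  # 1
```

Because it compares items with `!=`, the same function gives character-level distances for strings and word-level distances for token lists.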

API Documentation can be found here.