PyNLPl - Python Natural Language Processing Library


PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build a simple language model. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
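To make the basic tasks mentioned above concrete, here is a minimal sketch of n-gram extraction and a frequency list. It deliberately uses only the Python standard library and does not call PyNLPl; the `ngrams` helper and the sample sentence are assumptions made for illustration, not part of the library's API.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield every run of n consecutive tokens as a tuple.

    Hypothetical helper for illustration only; not PyNLPl's API.
    """
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


# Whitespace splitting stands in for a real tokeniser here.
tokens = "to be or not to be".split()

# Extract all bigrams, then count how often each one occurs.
bigrams = list(ngrams(tokens, 2))
freqlist = Counter(bigrams)

# List the bigrams from most to least frequent.
for bigram, count in freqlist.most_common():
    print(" ".join(bigram), count)
```

A frequency list is essentially a mapping from items to counts, which is why `collections.Counter` is enough for this sketch.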

The library is divided into several packages and modules. It works on Python 2.7, as well as Python 3.

The following modules are available:

  • pynlpl.datatypes - Extra datatypes (priority queues, patterns, tries)
  • pynlpl.evaluation - Evaluation & experiment classes (parameter search, wrapped progressive sampling, class evaluation (precision/recall/f-score/auc), sampler, confusion matrix, multithreaded experiment pool)
  • pynlpl.formats.cgn - Module for parsing CGN (Corpus Gesproken Nederlands) part-of-speech tags
  • pynlpl.formats.folia - Extensive library for reading and manipulating documents in FoLiA format (Format for Linguistic Annotation).
  • pynlpl.formats.fql - Extensive library for the FoLiA Query Language (FQL), built on top of pynlpl.formats.folia. FQL is currently documented here.
  • pynlpl.formats.cql - Parser for the Corpus Query Language (CQL), as also used by Corpus Workbench and Sketch Engine. Contains a convertor to FQL.
  • pynlpl.formats.giza - Module for reading GIZA++ word alignment data
  • pynlpl.formats.moses - Module for reading Moses phrase-translation tables.
  • pynlpl.formats.sonar - Largely obsolete module for pre-releases of the SoNaR corpus, use pynlpl.formats.folia instead.
  • pynlpl.formats.timbl - Module for reading Timbl output (consider using python-timbl instead though)
  • pynlpl.lm.lm - Module for a simple language model, plus a reader for ARPA language model data (as used by SRILM).
  • pynlpl.search - Various search algorithms (breadth-first, depth-first, beam search, hill climbing, A*, and variants of each)
  • pynlpl.statistics - Frequency lists, Levenshtein, common statistics and information theory functions
  • pynlpl.textprocessors - Simple tokeniser, n-gram extraction
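
As an illustration of one of the measures listed above, the Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below is a plain standard-library version of the usual dynamic-programming approach; the function name `edit_distance` is hypothetical, and this is not the implementation found in pynlpl.statistics.

```python
def edit_distance(a, b):
    """Return the Levenshtein distance between sequences a and b.

    Illustrative sketch only; not PyNLPl's own implementation.
    """
    # previous[j] holds the distance between the prefix of a processed
    # so far and the first j items of b.
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into an empty sequence takes i deletions
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))
```

Keeping only two rows of the table at a time holds memory use proportional to the length of the second sequence rather than to the product of both lengths.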

API Documentation can be found here.