Python Natural Language Processing Library (pynlpl is pronounced "pineapple"). In addition to generic algorithms and data structures for NLP, PyNLPl contains modules for a wide variety of NLP tasks, including parsers for file formats common in Dutch computational linguistics, such as FoLiA, D-Coi, SoNaR, Tadpole/Frog and Timbl.


PyNLPl - Python Natural Language Processing Library

PyNLPl, pronounced "pineapple", is a Python library for Natural Language Processing. It is a collection of independent or loosely interdependent modules useful for common, and less common, NLP tasks. PyNLPl can be used, for example, to compute n-grams, frequency lists and distributions, and language models. It also provides more complex data types, such as priority queues, and search algorithms, such as beam search.
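To make the n-gram and frequency-list tasks mentioned above concrete, here is a minimal sketch in plain Python 3 using only the standard library. It does not use PyNLPl's own API; the helper names `ngrams` and `frequency_list` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield every contiguous n-gram in `tokens` as a tuple (hypothetical helper)."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def frequency_list(items):
    """Return (absolute counts, relative frequencies) for `items` (hypothetical helper)."""
    counts = Counter(items)
    total = sum(counts.values())
    # Relative frequency of each item; empty input gives an empty distribution.
    distribution = {item: count / total for item, count in counts.items()}
    return counts, distribution


tokens = "to be or not to be".split()
bigram_counts, bigram_dist = frequency_list(ngrams(tokens, 2))
print(bigram_counts.most_common(1))
```

The library's own classes for these tasks live in pynlpl.textprocessors and pynlpl.statistics; consult the API documentation for their actual names and signatures.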

The library is divided into several packages and modules. It works on Python 2.7, as well as Python 3.

The following modules are available:

  • pynlpl.datatypes - Extra datatypes (priority queues, patterns, tries)
  • pynlpl.evaluation - Evaluation & experiment classes (parameter search, wrapped progressive sampling, class evaluation (precision/recall/f-score/auc), sampler, confusion matrix, multithreaded experiment pool)
  • pynlpl.formats.cgn - Module for parsing CGN (Corpus Gesproken Nederlands) part-of-speech tags
  • pynlpl.formats.folia - Extensive library for reading and manipulating the documents in FoLiA format (Format for Linguistic Annotation).
  • pynlpl.formats.fql - Extensive library for the FoLiA Query Language (FQL), built on top of pynlpl.formats.folia. FQL is currently documented here.
  • pynlpl.formats.cql - Parser for the Corpus Query Language (CQL), as also used by Corpus Workbench and Sketch Engine. Contains a convertor to FQL.
  • pynlpl.formats.giza - Module for reading GIZA++ word alignment data
  • pynlpl.formats.moses - Module for reading Moses phrase-translation tables.
  • pynlpl.formats.sonar - Largely obsolete module for pre-releases of the SoNaR corpus, use pynlpl.formats.folia instead.
  • pynlpl.formats.timbl - Module for reading Timbl output (consider using python-timbl instead though)
  • pynlpl.lm.lm - Simple language model, plus a reader for ARPA language model data (the format used by SRILM).
  • - Various search algorithms (Breadth-first, depth-first, beam-search, hill climbing, A star, various variants of each)
  • pynlpl.statistics - Frequency lists, Levenshtein, common statistics and information theory functions
  • pynlpl.textprocessors - Simple tokeniser, n-gram extraction
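
As an illustration of the kind of function the statistics module covers, the following is a minimal, standard-library-only sketch of Levenshtein (edit) distance in Python 3. It is not PyNLPl's implementation, and the name `edit_distance` is hypothetical; it shows the classic dynamic-programming formulation, keeping only two rows of the table in memory.

```python
def edit_distance(a, b):
    """Return the minimum number of insertions, deletions and substitutions
    needed to turn sequence `a` into sequence `b` (hypothetical helper)."""
    # previous[j] holds the distance between the first i-1 items of a
    # and the first j items of b.
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]  # turning i items into an empty sequence costs i deletions
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))
```

Because the function only compares items for equality, it works on token lists as well as strings, for example `edit_distance("the cat sat".split(), "the dog sat".split())`.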

API Documentation can be found here.
