PyNLPl - Python Natural Language Processing Library
PyNLPl, pronounced "pineapple", is a Python library for Natural Language Processing. It is a collection of independent or loosely interdependent modules for common, and less common, NLP tasks. PyNLPl can be used, for example, to extract n-grams, compute frequency lists and distributions, and build simple language models. It also provides more complex data types, such as priority queues, and search algorithms, such as beam search.
The library is divided into several packages and modules. It works on Python 2.7 as well as Python 3.
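To make the n-gram and frequency-list tasks concrete, here is a minimal sketch using only the Python standard library. It does not use PyNLPl's own API: the `ngrams` helper and the sample sentence are hypothetical, chosen purely for illustration, and PyNLPl's modules may expose these operations differently.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield each run of n consecutive tokens as a tuple (hypothetical helper)."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


# Naive whitespace tokenisation, good enough for a toy example.
tokens = "to be or not to be".split()

# Extract all bigrams and count how often each one occurs.
bigrams = list(ngrams(tokens, 2))
freq = Counter(bigrams)

# Print the bigrams from most to least frequent.
for gram, count in freq.most_common():
    print(" ".join(gram), count)
```

A `Counter` keyed on tuples behaves like a simple frequency list: here the bigram `("to", "be")` is counted twice and every other bigram once.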
The following modules are available:
pynlpl.datatypes - Extra datatypes (priority queues, patterns, tries)
pynlpl.evaluation - Evaluation and experiment classes (parameter search, wrapped progressive sampling, class evaluation (precision/recall/F-score/AUC), sampler, confusion matrix, multithreaded experiment pool)
pynlpl.formats.cgn - Module for parsing CGN (Corpus Gesproken Nederlands) part-of-speech tags
pynlpl.formats.folia - Extensive library for reading and manipulating documents in FoLiA format (Format for Linguistic Annotation)
pynlpl.formats.fql - Extensive library for the FoLiA Query Language (FQL), built on top of pynlpl.formats.folia. FQL is currently documented here.
pynlpl.formats.cql - Parser for the Corpus Query Language (CQL), as also used by Corpus Workbench and Sketch Engine. Contains a converter to FQL.
pynlpl.formats.giza - Module for reading GIZA++ word alignment data
pynlpl.formats.moses - Module for reading Moses phrase-translation tables
pynlpl.formats.sonar - Largely obsolete module for pre-releases of the SoNaR corpus, use
pynlpl.formats.timbl - Module for reading Timbl output (consider using python-timbl instead, though)
pynlpl.lm.lm - Simple language model, plus a reader for ARPA language model data (the format used by SRILM)
pynlpl.search - Various search algorithms (breadth-first, depth-first, beam search, hill climbing, A*, and variants of each)
pynlpl.statistics - Frequency lists, Levenshtein distance, common statistics and information theory functions
pynlpl.textprocessors - Simple tokeniser, n-gram extraction
API Documentation can be found here.
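As an illustration of the kind of function listed under pynlpl.statistics, the following is a minimal sketch of Levenshtein (edit) distance in plain Python. It is not PyNLPl's implementation, and the function name and signature are assumptions made for this example; consult the API documentation for the library's actual interface.

```python
def levenshtein(a, b):
    """Return the minimum number of single-item insertions, deletions and
    substitutions needed to turn sequence a into sequence b.

    Hypothetical stand-alone helper, not the PyNLPl function.
    """
    # previous[j] holds the distance between the first i-1 items of a
    # and the first j items of b; the initial row covers an empty prefix of a.
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]  # distance from the first i items of a to an empty b
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```

Because it only indexes and compares items, the same function works on lists of tokens as well as on strings, which is useful for word-level comparisons.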