Python Natural Language Processing Library (PyNLPl is pronounced "pineapple"). In addition to generic algorithms and data structures for NLP, PyNLPl contains modules for a wide variety of NLP tasks, including parsers for file formats common in Dutch computational linguistics, such as FoLiA, D-Coi, SoNaR, Tadpole/Frog and Timbl.

latest commit 91025f2643
@proycon authored

README.rst

PyNLPl - Python Natural Language Processing Library

PyNLPl, pronounced "pineapple", is a Python library for Natural Language Processing. It is a collection of independent or loosely interdependent modules for common, and less common, NLP tasks. PyNLPl can be used, for example, to compute n-grams, frequency lists and distributions, and language models. It also provides more complex data types, such as priority queues, and search algorithms, such as beam search.
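To make the n-gram and frequency-list terminology concrete, here is a minimal sketch using only the Python standard library. It does not use PyNLPl's own API; the `ngrams` helper and the sample sentence are hypothetical and serve only to show what such a computation produces.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield every contiguous window of n tokens as a tuple (hypothetical helper)."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


# Whitespace tokenisation is an assumption made to keep the sketch short.
tokens = "to be or not to be".split()

# Frequency list: how often each bigram occurs.
bigram_counts = Counter(ngrams(tokens, 2))

# Distribution: relative frequency of each bigram.
total = sum(bigram_counts.values())
distribution = {gram: count / float(total) for gram, count in bigram_counts.items()}
```

The library's own modules for these tasks are `pynlpl.textprocessors` and `pynlpl.statistics`; consult the API documentation for their actual interfaces.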

The library is divided into several packages and modules. It works on Python 2.7 as well as Python 3.

The following modules are available:

  • pynlpl.datatypes - Extra datatypes (priority queues, patterns, tries)
  • pynlpl.evaluation - Evaluation & experiment classes (parameter search, wrapped progressive sampling, class evaluation (precision/recall/f-score/auc), sampler, confusion matrix, multithreaded experiment pool)
  • pynlpl.formats.cgn - Module for parsing CGN (Corpus Gesproken Nederlands) part-of-speech tags
  • pynlpl.formats.folia - Extensive library for reading and manipulating documents in the FoLiA format (Format for Linguistic Annotation).
  • pynlpl.formats.fql - Extensive library for the FoLiA Query Language (FQL), built on top of pynlpl.formats.folia. FQL is currently documented here.
  • pynlpl.formats.cql - Parser for the Corpus Query Language (CQL), as also used by Corpus Workbench and Sketch Engine. Contains a convertor to FQL.
  • pynlpl.formats.giza - Module for reading GIZA++ word alignment data
  • pynlpl.formats.moses - Module for reading Moses phrase-translation tables.
  • pynlpl.formats.sonar - Largely obsolete module for pre-releases of the SoNaR corpus, use pynlpl.formats.folia instead.
  • pynlpl.formats.timbl - Module for reading Timbl output (consider using python-timbl instead though)
  • pynlpl.lm.lm - Simple language model, plus a reader for ARPA language model data (as used by SRILM).
  • pynlpl.search - Various search algorithms (Breadth-first, depth-first, beam-search, hill climbing, A star, various variants of each)
  • pynlpl.statistics - Frequency lists, Levenshtein, common statistics and information theory functions
  • pynlpl.textprocessors - Simple tokeniser, n-gram extraction
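
As an illustration of the Levenshtein distance mentioned under pynlpl.statistics, the following is a minimal standard-library sketch of the usual dynamic-programming formulation. It is not PyNLPl's implementation; the function name `edit_distance` is hypothetical.

```python
def edit_distance(a, b):
    """Return the minimum number of insertions, deletions and substitutions
    needed to turn sequence a into sequence b (hypothetical helper)."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, 1):
        current = [i]  # turning a[:i] into an empty sequence takes i deletions
        for j, item_b in enumerate(b, 1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


char_distance = edit_distance("kitten", "sitting")
token_distance = edit_distance("the cat sat".split(), "the dog sat".split())
```

Because the function only compares elements for equality, it works on strings (character level) and on token lists (word level) alike.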

API Documentation can be found here.
