The Pimlico Processing Toolkit

The Pimlico Processing Toolkit (PIpelined Modular LInguistic COrpus processing) is a toolkit for building pipelines made up of linguistic processing tasks to run on large datasets (corpora).

It provides wrappers around many existing, widely used Natural Language Processing (NLP) tools, and makes it easy to write potentially complex pipelines and apply them to large datasets.
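
To make the idea of a pipeline of processing tasks concrete, here is a minimal sketch in plain Python. It does not use Pimlico's actual API: the stage functions (`tokenize`, `lowercase`, `drop_short`) and the `run_pipeline` helper are hypothetical names chosen for illustration only. It assumes each stage consumes and yields documents lazily, so a large corpus never needs to be held in memory at once.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical types for this sketch: a document is a list of tokens,
# and a stage maps one stream of items to another.
Document = List[str]
Stage = Callable[[Iterable], Iterator]


def tokenize(lines: Iterable[str]) -> Iterator[Document]:
    """Split each raw line on whitespace."""
    for line in lines:
        yield line.split()


def lowercase(docs: Iterable[Document]) -> Iterator[Document]:
    """Lowercase every token in every document."""
    for doc in docs:
        yield [tok.lower() for tok in doc]


def drop_short(docs: Iterable[Document], min_len: int = 2) -> Iterator[Document]:
    """Remove tokens shorter than min_len characters."""
    for doc in docs:
        yield [tok for tok in doc if len(tok) >= min_len]


def run_pipeline(corpus: Iterable, stages: List[Stage]) -> Iterator:
    """Chain the stages so each consumes the previous stage's output."""
    stream = corpus
    for stage in stages:
        stream = stage(stream)
    return stream


corpus = ["The cat sat", "A dog barked loudly"]
result = list(run_pipeline(corpus, [tokenize, lowercase, drop_short]))
print(result)
```

Because every stage is a generator, nothing is computed until the final `list(...)` call pulls documents through the chain. Pimlico's own way of defining and running pipelines is described in its documentation.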

Pimlico aims:

  • to provide clear documentation of what has been done;
  • to make it easy to run standard NLP tasks on your data;
  • to make it easy to implement your own non-standard tasks, specific to a pipeline;
  • to support simple distribution of code for reproduction, for example, on other datasets.

Full documentation, including a guide to getting started with Pimlico, is available at http://pimlico.readthedocs.io.

Pimlico is hosted on GitHub.