markgw/pimlico

The Pimlico Processing Toolkit

The Pimlico Processing Toolkit (PIpelined Modular LInguistic COrpus processing) is a toolkit for building pipelines made up of linguistic processing tasks to run on large datasets (corpora).

It provides wrappers around many existing, widely used Natural Language Processing (NLP) tools, making it easy to write potentially complex pipelines and apply them to large datasets.

Pimlico aims:

  • to provide clear documentation of what has been done;
  • to make it easy to run standard NLP tasks on your data;
  • to make it easy to implement your own non-standard tasks, specific to a pipeline;
  • to support simple distribution of code for reproduction, for example, on other datasets.

Full documentation, including a guide to getting started with Pimlico, is available at http://pimlico.readthedocs.io.

Pimlico is hosted on GitHub.
