A highly extensible platform for converting and manipulating linguistic data between an unbounded set of formats. Pepper can be used stand-alone through a command line interface, or be integrated as an API into other software products.
gh-site/img
misc
pepper-doc
pepper-framework
pepper-lib
pepper-newModule
pepper-parentModule
scripts
src
.gitignore
.travis.yml
CHANGELOG.txt
LICENSE
NOTICE
README.md
nb-configuration.xml
pom.xml

About Pepper

If you need to convert corpora from one linguistic format into another, Pepper is your Swiss Army knife. When your annotation tool produces a data format that your analysis tool cannot read, Pepper comes to the rescue.

  • Pepper can convert documents in a variety of linguistic formats, such as EXMARaLDA, Tiger XML, MMAX2, RST, TCF, TreeTagger format, TEI (subset), ANNIS format, PAULA and many more.
  • Pepper comes with a plug-in mechanism which makes it easy to extend it for further formats and data manipulations.
  • Pepper is module-based: each mapping is done by a separate module. This allows each module to be combined with every other module in a single workflow.
  • Pepper uses the intermediate model Salt, which reduces the number of mappings needed to convert between n source formats and m target formats from n × m direct converters to n + m (one importer or exporter per format).
  • Pepper modules, such as the MergingModule, allow you to merge data from different annotation tools and create multilayer corpora.
  • Pepper can be used as an interactive command line tool, as a command to be included in scripts, or as an API to be integrated in other software products.
  • Pepper is written in Java and runs on any operating system that can run Java (Windows, Mac, Linux, Unix, ...).
  • Pepper is free and open source software. It is distributed under the Apache License, Version 2.0.
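
To illustrate why an intermediate model reduces the number of mappings, the following minimal sketch compares the two approaches by simple counting. It does not use the Pepper or Salt API: the class `MappingCount` and its methods are hypothetical names chosen for this example, and the split of formats into sources and targets is purely illustrative.

```java
import java.util.List;

// Hypothetical helper for illustration only; not part of Pepper or Salt.
public class MappingCount {

    // Direct conversion: one converter per (source, target) pair.
    static int directMappings(int sources, int targets) {
        return sources * targets;
    }

    // Conversion via an intermediate model: one importer per source format
    // plus one exporter per target format.
    static int mappingsViaIntermediateModel(int sources, int targets) {
        return sources + targets;
    }

    public static void main(String[] args) {
        // Illustrative split; the format names are taken from the list above.
        List<String> sources = List.of("EXMARaLDA", "Tiger XML", "MMAX2", "TCF");
        List<String> targets = List.of("ANNIS format", "PAULA", "TreeTagger format");

        System.out.println("direct converters: "
                + directMappings(sources.size(), targets.size()));
        System.out.println("importers + exporters: "
                + mappingsViaIntermediateModel(sources.size(), targets.size()));
    }
}
```

With four source formats and three target formats, direct conversion would need 12 converters, while the intermediate-model approach needs 7 modules. The gap widens as more formats are added.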

Pepper is your weapon to fight the format monster

Want to know more?