Natural language processing framework for Ruby.

New in v2.0.5: OpenNLP integration and Yomu support

Treat is a toolkit for natural language processing and computational linguistics in Ruby. The Treat project aims to build a language- and algorithm-agnostic NLP framework for Ruby with support for tasks such as document retrieval, text chunking, segmentation and tokenization, natural language parsing, part-of-speech tagging, keyword extraction and named entity recognition. Learn more by taking a quick tour or by reading the manual.

Features

  • Text extractors for PDF, HTML, XML, Word, AbiWord, OpenOffice and image formats (Ocropus).
  • Text chunkers, sentence segmenters, tokenizers, and parsers (Stanford & Enju).
  • Lexical resources (WordNet interface, several POS taggers for English).
  • Language, date/time, topic words (LDA) and keyword (TF*IDF) extraction.
  • Word inflectors, including stemmers, conjugators, declensors, and number inflection.
  • Serialization of annotated entities to YAML, XML or to MongoDB.
  • Visualization in ASCII tree, directed graph (DOT) and tag-bracketed (standoff) formats.
  • Linguistic resources, including language detection and tag alignments for several treebanks.
  • Machine learning (decision tree, multilayer perceptron, LIBLINEAR, LIBSVM).
  • Text retrieval with indexation and full-text search (Ferret).
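
As a rough illustration of the TF*IDF keyword extraction listed above, here is a minimal plain-Ruby sketch. It does not use Treat's API and is not how Treat implements the feature; the method name `tfidf_keywords`, the letters-only tokenizer and the sample sentences are all assumptions made for this example.

```ruby
# Hypothetical helper, not part of Treat: ranks the words of one document
# by TF*IDF against a small corpus of documents given as plain strings.
def tfidf_keywords(docs, doc_index, top_n = 3)
  # Naive tokenizer (an assumption): lowercase, keep runs of ASCII letters.
  tokenized = docs.map { |d| d.downcase.scan(/[a-z]+/) }
  target = tokenized[doc_index]
  return [] if target.empty?

  # Document frequency: number of documents that contain each word.
  df = Hash.new(0)
  tokenized.each { |words| words.uniq.each { |w| df[w] += 1 } }

  # Raw term counts within the target document.
  tf = Hash.new(0)
  target.each { |w| tf[w] += 1 }

  # Score = relative term frequency * log(N / document frequency).
  scores = tf.map do |word, count|
    idf = Math.log(docs.size.to_f / df[word])
    [word, (count.to_f / target.size) * idf]
  end

  # Highest score first; ties are broken alphabetically for stable output.
  scores.sort_by { |word, score| [-score, word] }.first(top_n).map(&:first)
end

# Made-up sample corpus for illustration only.
docs = [
  "the parser builds a tree for the sentence",
  "the tagger labels each word in the sentence",
  "the stemmer reduces each word to a stem"
]
keywords = tfidf_keywords(docs, 0, 3)
```

Words that occur in every document (such as "the" here) get an inverse document frequency of zero, so they drop to the bottom of the ranking without needing a stop-word list.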

Contributing

I am actively seeking developers who can help maintain and expand this project. You can find a list of ideas for contributing to the project here.

Authors

Lead developer: @louismullie [Twitter]

Contributors:

  • @bdigital
  • @automatedtendencies
  • @LeFnord
  • @darkphantum
  • @whistlerbrk
  • @smileart
  • @erol

License

This software is released under the GPL and includes software released under the GPL, Ruby, Apache 2.0 and MIT licenses.
