# Project Specification #
Proof should provide a framework for readability analysis and, possibly, other forms of text analysis.
## Features ##
### Inputs ###
Proof currently accepts only plain-text files. It might be useful to provide features that extract plain text from documents written in common markup formats.
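As a rough illustration of what such an extraction step might look like, here is a minimal sketch for one markup format, HTML. The method name `html_to_plain_text` is hypothetical and not part of Proof; the regex-based approach is an assumption chosen for brevity, and a real implementation would more likely use a proper parser.

```ruby
# Hypothetical helper (not part of Proof): a naive HTML-to-text conversion.
def html_to_plain_text(html)
  # Minimal entity table; a real converter would cover far more.
  entities = { '&amp;' => '&', '&lt;' => '<', '&gt;' => '>', '&quot;' => '"' }

  # Drop script and style elements together with their contents.
  text = html.gsub(%r{<(script|style)\b.*?</\1>}mi, '')
  # Turn common block-level tags into line breaks so paragraphs survive.
  text = text.gsub(%r{</?(p|div|br|h[1-6]|li)\b[^>]*>}i, "\n")
  # Remove any remaining (inline) tags.
  text = text.gsub(/<[^>]+>/, '')
  # Decode the few entities listed above in a single pass.
  text = text.gsub(/&(?:amp|lt|gt|quot);/) { |e| entities[e] }
  # Collapse runs of spaces and blank lines.
  text.gsub(/[ \t]+/, ' ').gsub(/\n\s*\n+/, "\n\n").strip
end

puts html_to_plain_text('<h1>Title</h1><p>Fish &amp; <em>chips</em>.</p>')
```

Keeping paragraph breaks intact matters here because the paragraph and sentence counts feed the readability calculations.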
### Outputs ###
Proof currently outputs results in text formats using ERb templates. Ideally it should either serialize results as YAML and JSON itself, or at least document a defined mechanism for doing so.
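One possible shape for the serialization step, sketched with Ruby's standard `json` and `yaml` libraries. The `results` hash, its keys and values, and the `serialize_results` method are all illustrative assumptions; Proof's actual result structure is not specified here.

```ruby
require 'json'
require 'yaml'

# Hypothetical result structure; keys and values are made up for illustration.
results = {
  'file' => 'chapter-one.txt',
  'sentences' => 120,
  'words' => 2400,
  'fog' => 11.3
}

# Hypothetical helper: serialize a results hash in the requested format.
def serialize_results(results, format)
  case format
  when :json then JSON.pretty_generate(results)
  when :yaml then results.to_yaml
  else raise ArgumentError, "unsupported format: #{format}"
  end
end

puts serialize_results(results, :json)
puts serialize_results(results, :yaml)
```

Keeping results in a plain hash until the last step would let the existing ERb templates and any new serializers share the same data.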
It should probably also have a pass/fail mode, in which the user specifies a test and an acceptable level, and the utility simply reports pass or fail according to whether the source meets that level of readability:

    $ proof --pass-fail-test=fog --pass-fail-level=12 my-book/
    FAIL

This is probably not a good idea in most cases, but could be useful in situations where documents are required to meet a particular standard for publication.
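A minimal sketch of how the proposed mode could be wired up, using Ruby's standard `optparse`. The flag names come from the example above; the method names, and the rule that a score at or below the level counts as a pass (on the assumption that a higher score means harder text, as with the Fog index), are assumptions rather than decided behavior.

```ruby
require 'optparse'

# Hypothetical helper: parse the two proposed flags and return
# [options, remaining_paths].
def parse_pass_fail_options(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.on('--pass-fail-test=NAME') { |v| options[:test] = v }
    opts.on('--pass-fail-level=LEVEL') { |v| options[:level] = Float(v) }
  end
  paths = parser.parse(argv) # returns the non-option arguments
  [options, paths]
end

# Assumption: a higher score means harder text, so a pass is a score
# at or below the requested level.
def pass_fail(score, level)
  score <= level ? 'PASS' : 'FAIL'
end

options, paths = parse_pass_fail_options(
  ['--pass-fail-test=fog', '--pass-fail-level=12', 'my-book/']
)
score = 14.2 # placeholder; a real run would compute this from the files in paths
puts pass_fail(score, options[:level])
```

For use in a publication pipeline, it might also make sense to signal the result through the process exit status, so that scripts need not parse the output.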
## Architecture ##
Proof currently uses the Lingua library for the following services:
* Splitting text into its component sentences (Lingua::EN::Sentence)
* Splitting text into its component syllables (Lingua::EN::Syllable)
* Splitting text into its component paragraphs (Lingua::EN::Paragraph)
* Calculating readability results from the counts produced by the previous three (Lingua::EN::Readability)
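The division of labor above can be sketched in plain Ruby. This is not Lingua's code or API: the method names are hypothetical and the splitting rules are deliberately naive stand-ins. It only shows how a readability score (here the Gunning Fog index, with "complex" words taken to be those of three or more syllables) falls out of sentence, word, and syllable counts.

```ruby
# Naive stand-ins for the splitting services; not Lingua's algorithms.
def paragraphs(text)
  text.split(/\n\s*\n/).map(&:strip).reject(&:empty?)
end

def sentences(text)
  # Split after ., ! or ? followed by whitespace.
  text.split(/(?<=[.!?])\s+/).map(&:strip).reject(&:empty?)
end

def words(text)
  text.scan(/[A-Za-z']+/)
end

# Rough syllable estimate: count groups of consecutive vowels, minimum one.
def syllables(word)
  [word.downcase.scan(/[aeiouy]+/).length, 1].max
end

# Gunning Fog index:
#   0.4 * (words per sentence + 100 * complex words / words)
def fog_index(text)
  w = words(text)
  s = sentences(text)
  return 0.0 if w.empty? || s.empty?

  complex = w.count { |word| syllables(word) >= 3 }
  0.4 * (w.length.to_f / s.length + 100.0 * complex / w.length)
end

sample = "The cat sat. The dog ran."
puts fog_index(sample).round(2)
```

The vowel-group heuristic miscounts many English words (silent "e", for example), which is one reason a dedicated syllable service is worth having.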
## Related Projects and Resources ##
This Stack Overflow discussion provides code for implementing syllable counts in Ruby:
http://stackoverflow.com/questions/1271918/ruby-count-syllables
The punkt-segmenter offers a more sophisticated system for deriving sentences:
http://github.com/lfcipriani/punkt-segmenter
It is a port of some NLTK functionality. NLTK is a large open-source project for natural language processing, written in Python:
http://www.nltk.org/
The Ruby Linguistics library has some sentence handling facilities, but is not directly applicable:
http://github.com/ged/linguistics
The OTS library provides text summarization:
http://libots.sourceforge.net/
It is used by the TLDR Web application:
http://tldr.r10.railsrumble.com/