Port of arc90's readability project to ruby
Fetching latest commit…
Cannot retrieve the latest commit at this time
Ruby Readability Command line: (sudo) gem install ruby-readability Bundler: gem "ruby-readability", :require => 'readability' Example: require 'rubygems' require 'readability' require 'open-uri' source = open('http://lab.arc90.com/experiments/readability/').read puts Readability::Document.new(source).content Options: You may provide additions options to Readability::Document.new, including: :tags - the base whitelist of tags to sanitize, defaults to %w[div p] :remove_empty_nodes - remove <p> tags that have no text content; this will also remove p tags that contain only images :attributes - whitelist of allowed attributes :debug - provide debugging output, defaults false :encoding - if this page is of a known encoding, you can specify it; if left unspecified, the encoding will be guessed (only in Ruby 1.9.x) :html_headers - in Ruby 1.9.x these will be passed to the guess_html_encoding gem to aid with guessing the HTML encoding Readability comes with a command-line tool for experimentation in bin/readability. Usage: readability [options] URL -d, --debug Show debug output -i, --images Keep images and links -h, --help Show this message Potential issues: * If you're on a Mac and are getting segmentation faults, see this discussion https://github.com/tenderlove/nokogiri/issues/404 and consider updating your version of libxml2. Version 2.7.8 of libxml2 with the following worked for me: gem install nokogiri -- --with-xml2-include=/usr/local/Cellar/libxml2/2.7.8/include/libxml2 --with-xml2-lib=/usr/local/Cellar/libxml2/2.7.8/lib --with-xslt-dir=/usr/local/Cellar/libxslt/1.1.26 === This code is under the Apache License 2.0. http://www.apache.org/licenses/LICENSE-2.0 This is a ruby port of arc90's readability project http://lab.arc90.com/experiments/readability/ Given a html document, it pulls out the main body text and cleans it up. Ruby port by starrhorne, libc, and iterationlabs. Original gemification by fizx.