rdbl.erl - Erlang readability library

This is an Erlang library that extracts the meaningful content from HTML pages and strips out the junk. It is inspired by readability.js from arc90.

The source is primarily hosted at [...]. My Readabilizer service uses this library to extract content from web pages.

Licensing and author

This library is distributed under the GNU General Public License version 3 and is also available under alternative licenses negotiated directly with the rdbl author, Ivan Koshkin.

The GPL (version 3) is included in this source tree in the file COPYING.


cd src/ && make


1> rdbl:simplify_url("") -> simplified page text as string()

2> rdbl:simplify_url("", "out.html") -> ok

3> rdbl:simplify_file("input.html", "out.html") -> ok

4> rdbl:simplify_page(HtmlPageText) -> PageTextSimplified

See other examples in rdbl.erl.
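The calls above can be combined into a small batch helper. The following is only a sketch: the module name rdbl_batch and the ".simplified.html" naming scheme are hypothetical, and it assumes rdbl is compiled and on the code path. It relies only on rdbl:simplify_file/2 returning ok on success, as listed above; how rdbl reports failures is not covered here.

```erlang
-module(rdbl_batch).
-export([out_name/1, simplify_files/1]).

%% Hypothetical naming scheme: "page.html" -> "page.simplified.html".
out_name(InFile) ->
    filename:rootname(InFile) ++ ".simplified.html".

%% Run rdbl:simplify_file/2 over each input file and pair every input
%% with whatever the call returned (ok on success, per the list above).
%% Exceptions are not caught, so a crash on one file stops the batch.
simplify_files(InFiles) ->
    [{In, rdbl:simplify_file(In, out_name(In))} || In <- InFiles].
```

For example, rdbl_batch:simplify_files(["a.html", "b.html"]) would write a.simplified.html and b.simplified.html next to the inputs.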


The library uses the mochiweb HTML parser (included) to parse HTML content. Only the following mochiweb files are needed: mochinum.erl, mochiutf8.erl, mochiweb_charref.erl, mochiweb_html.erl.
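To give a feel for the data the parser produces, here is a minimal sketch of walking a mochiweb_html-style tree, where an element is a {Tag, Attrs, Children} tuple with binary tags and text nodes are binaries. The module name tree_text is hypothetical, and this is not rdbl's own extraction logic: it only collects text and skips script and style elements.

```erlang
-module(tree_text).
-export([text/1]).

%% Return the text of a mochiweb_html-style node as a single binary.
-spec text(term()) -> binary().
text(Node) ->
    iolist_to_binary(collect(Node)).

%% Text nodes are plain binaries.
collect(Text) when is_binary(Text) ->
    Text;
%% Script and style bodies are not readable content, so drop them.
collect({Tag, _Attrs, _Children}) when Tag =:= <<"script">>; Tag =:= <<"style">> ->
    [];
%% Ordinary elements: recurse into the children, keeping document order.
collect({Tag, _Attrs, Children}) when is_binary(Tag), is_list(Children) ->
    [collect(Child) || Child <- Children];
%% Comments, doctypes and other marker tuples contribute no text.
collect(_Other) ->
    [].
```

With mochiweb_html.erl compiled, tree_text:text(mochiweb_html:parse(Html)) would give the raw text of a page, which is a useful baseline to compare against the output of rdbl:simplify_page/1.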