Fast Python port of arc90's readability tool, updated to match the latest readability.js.
This code is under the Apache License 2.0.

This is a Python port of a Ruby port of arc90's readability project.

In short: given an HTML document, it pulls out the main body text and cleans it up.
It can also clean up the title, based on the latest readability.js code.
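To give a feel for what "pulling out the main body text" can mean, here is a toy sketch that uses only the standard library. It is not the algorithm this package implements: the container list, the scoring formula, and the names `ToyScorer` and `best_container` are all assumptions made for illustration. The real port is built on lxml and weighs more signals.

```python
from html.parser import HTMLParser

# Tags treated as candidate containers in this toy sketch (an assumption).
CONTAINERS = {"div", "article", "section", "td", "body"}


class ToyScorer(HTMLParser):
    """Credit each container with a score for the <p> text it directly holds."""

    def __init__(self):
        super().__init__()
        self.stack = []    # labels of the currently open containers
        self.scores = {}   # label -> accumulated score
        self.in_p = False
        self.buf = []

    def handle_starttag(self, tag, attrs):
        if tag in CONTAINERS:
            # Label a container by its id if it has one, else by its tag name.
            label = dict(attrs).get("id") or tag
            self.stack.append(label)
            self.scores.setdefault(label, 0.0)
        elif tag == "p":
            self.in_p = True
            self.buf = []

    def handle_data(self, data):
        if self.in_p:
            self.buf.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.in_p = False
            text = "".join(self.buf).strip()
            if self.stack and text:
                # Hypothetical heuristic: longer, comma-rich paragraphs
                # look more like article prose than navigation.
                score = 1 + text.count(",") + min(len(text) // 100, 3)
                self.scores[self.stack[-1]] += score
        elif tag in CONTAINERS and self.stack:
            self.stack.pop()


def best_container(html):
    """Return the label of the highest-scoring container, or None."""
    scorer = ToyScorer()
    scorer.feed(html)
    return max(scorer.scores, key=scorer.scores.get, default=None)


sample = (
    '<body><div id="nav"><p>Home</p></div>'
    '<div id="post"><p>Readability keeps the article, drops the menus, '
    'and tidies the markup.</p></div></body>'
)
print(best_container(sample))  # prints: post
```

The navigation block scores 1 for its single short paragraph, while the article block scores 3 thanks to its two commas, so `post` wins.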

Based on:
 - Latest readability.js
 - Ruby port by starrhorne and iterationlabs
 - Python port by gfxmonk (based on BeautifulSoup)
 - Decruft effort to move to lxml
 - "BR to P" fix from readability.js which improves quality for smaller texts.
 - Contributions from GitHub users.
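The "BR to P" idea in the list above can be sketched in a few lines: runs of two or more `<br>` tags are treated as paragraph breaks. This is a minimal illustration, not the code used by this package. The regular expression and the helper name `br_to_p` are assumptions.

```python
import re

# Two or more consecutive <br> tags, optionally separated by whitespace.
# This pattern is an assumption that models the idea; it is not copied
# from the port.
DOUBLE_BR = re.compile(r"(?:<br\b[^>]*>\s*){2,}", re.IGNORECASE)


def br_to_p(fragment):
    """Turn runs of <br> tags into paragraph boundaries."""
    return DOUBLE_BR.sub("</p><p>", fragment)


print(br_to_p("<p>First thought.<br><br/>Second thought.</p>"))
# prints: <p>First thought.</p><p>Second thought.</p>
```

A single `<br>` is left alone, so ordinary line breaks inside a paragraph survive. Only doubled breaks, which pages often use in place of real paragraphs, are converted.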


Installation (either command works)::

    easy_install readability-lxml
    pip install readability-lxml


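Usage::

    from urllib.request import urlopen  # Python 3; on Python 2 use urllib.urlopen
    from readability.readability import Document

    url = "http://example.com/article.html"  # placeholder; use your own URL
    html = urlopen(url).read()
    doc = Document(html)  # parse once and reuse for both calls
    readable_article = doc.summary()
    readable_title = doc.short_title()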

Command-line usage::

    python -m readability.readability -u
