HTML main body extractor
HTML Python
test
.gitignore
LICENSE
Makefile
README.rst
dev-requirements.txt
readability.py
requirements.txt
setup.py
unsolved.txt

Readability

Another algorithm and implementation of the widely known readability concept: extracting the main body of an article from an HTML page.

Usage:

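import requests
from readability import Readability

# Fetch the raw page bytes and decode them before parsing.
html = requests.get('http://blog.hucheng.com/articles/482.html').content
parser = Readability(html.decode('utf8'))

parser.title               # title of the page
parser.article             # extracted main-body element
parser.article.get_text()  # plain text of the main body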
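
For readers new to the readability idea, the sketch below shows one very simple version of it: credit each container element with the paragraph text it directly holds, then return the text of the highest-scoring container. This is an illustration only and is not the algorithm used in readability.py. The names `DensityScorer`, `guess_main_text` and `CONTAINERS` are hypothetical, and the scoring rule (total paragraph characters) is an assumption chosen for brevity.

```python
from html.parser import HTMLParser

# Assumption: these tags are treated as candidate article containers.
CONTAINERS = {"div", "article", "section", "td", "body"}


class DensityScorer(HTMLParser):
    """Toy scorer: credit each container with the <p> text nearest to it."""

    def __init__(self):
        super().__init__()
        self.stack = []        # indices into self.blocks for open containers
        self.blocks = []       # one (tag, attrs, text_parts) entry per container
        self.in_paragraph = 0  # number of currently open <p> tags

    def handle_starttag(self, tag, attrs):
        if tag in CONTAINERS:
            self.blocks.append((tag, dict(attrs), []))
            self.stack.append(len(self.blocks) - 1)
        elif tag == "p":
            self.in_paragraph += 1

    def handle_endtag(self, tag):
        if tag in CONTAINERS and self.stack:
            self.stack.pop()
        elif tag == "p" and self.in_paragraph:
            self.in_paragraph -= 1

    def handle_data(self, data):
        text = data.strip()
        # Only paragraph text counts, and only for the innermost container.
        if text and self.in_paragraph and self.stack:
            self.blocks[self.stack[-1]][2].append(text)


def guess_main_text(html):
    """Return the paragraph text of the container holding the most of it."""
    scorer = DensityScorer()
    scorer.feed(html)
    scorer.close()
    if not scorer.blocks:
        return ""
    best = max(scorer.blocks, key=lambda block: sum(len(t) for t in block[2]))
    return " ".join(best[2])
```

Called on a page with a short navigation block and a longer post block, `guess_main_text` returns the post's paragraphs joined by spaces. A real extractor would also need to handle malformed markup, link-heavy blocks and nested containers, which this sketch ignores.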