HTML main body extractor

Readability

Another algorithm and implementation of the widely known readability concept.
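
For readers new to the readability idea, here is a minimal sketch of the general concept: score each container element by how much text it holds outside of links, then pick the highest scorer as the main body. It is an illustration only and is not the algorithm in readability.py; the names `DensityScorer`, `best_container_id`, `CONTAINERS`, and `SKIPPED` are hypothetical, and it uses only the standard library.

```python
from html.parser import HTMLParser

# Hypothetical names throughout; this is not the code in readability.py.
CONTAINERS = {"div", "article", "section", "td"}
SKIPPED = {"script", "style"}


class DensityScorer(HTMLParser):
    """Score each container by the length of its text outside of links."""

    def __init__(self):
        super().__init__()
        self.stack = []      # open containers, innermost last: [tag, id, score]
        self.scored = []     # closed containers as (score, id)
        self.link_depth = 0  # > 0 while inside <a>
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in CONTAINERS:
            self.stack.append([tag, dict(attrs).get("id"), 0])
        elif tag == "a":
            self.link_depth += 1
        elif tag in SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in CONTAINERS and self.stack and self.stack[-1][0] == tag:
            _, ident, score = self.stack.pop()
            self.scored.append((score, ident))
        elif tag == "a" and self.link_depth:
            self.link_depth -= 1
        elif tag in SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Credit text only to the innermost open container, so an outer
        # wrapper does not win just because it encloses everything.
        if self.stack and not self.link_depth and not self.skip_depth:
            self.stack[-1][2] += len(data.strip())


def best_container_id(html):
    """Return the id of the highest-scoring container, or None if none exist."""
    scorer = DensityScorer()
    scorer.feed(html)
    scorer.close()
    if not scorer.scored:
        return None
    return max(scorer.scored, key=lambda item: item[0])[1]
```

A real extractor would also weigh class names, punctuation, and sibling blocks; this sketch only shows why navigation bars, which are mostly link text, tend to score low.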

Usage:

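import requests
from readability import Readability

# Fetch the raw page bytes, then decode them before parsing.
html = requests.get('http://blog.hucheng.com/articles/482.html').content
parser = Readability(html.decode('utf8'))

parser.title               # extracted page title
parser.article             # extracted main-body element
parser.article.get_text()  # main body as plain text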