Skip to content
This repository has been archived by the owner on May 24, 2018. It is now read-only.

phensley/python-readable

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

readable - A port of the Arc90 readability Javascript rules to Python:

http://lab.arc90.com/experiments/readability

License: Apache 2.0.

Depends on the Python lxml module.

Currently only the article body extraction is implemented, and there is a slight delta between the output produced by the JS version due to the differences in how DOM is represented and manipulated between the browser DOM and Python's lxml. For instance, lxml represents text as attributes on nodes rather than nodes themselves. These differences complicated the translation of the algorithm a bit.

About

Python port of Arc90's Readability content extraction rules

Resources

License

Stars

Watchers

Forks

Packages