A Python script to scrape the content from web pages and leave the garbage behind.

Content.py is a Python script for scraping the content from a web page. Most pages are covered in ads, navigation, and other crap. Content.py tries to remove the crap and return a clean, well-formatted div of content.


Call getContentFromURL and pass it the URL (as a string) of any web page.

It returns a Unicode string containing the HTML of the content.
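
As a rough illustration of how the returned string might be used, here is a minimal sketch that saves the extracted HTML to a file. The helper name save_content, its parameters, and the example URL and file name are hypothetical and not part of Content.py. The sketch also assumes that Content.py is on your import path and exposes getContentFromURL as a module-level function; the function is passed in as an argument so the helper itself does not depend on that assumption.

```python
import io


def save_content(url, out_path, fetch):
    """Fetch the cleaned HTML for `url` and write it to `out_path` as UTF-8.

    `fetch` is any callable that takes a URL string and returns a Unicode
    string of HTML (for example getContentFromURL). This helper is a
    hypothetical wrapper, not part of Content.py.
    """
    html = fetch(url)
    # io.open handles Unicode text the same way on Python 2 and Python 3.
    with io.open(out_path, "w", encoding="utf-8") as f:
        f.write(html)
    # Return the number of characters written, handy for a quick sanity check.
    return len(html)


# Example usage (assumes Content.py is importable as a module named Content):
#
#     from Content import getContentFromURL
#     save_content("http://example.com/some-article", "article.html",
#                  getContentFromURL)
```

Writing the file with an explicit UTF-8 encoding avoids encoding errors when the page contains non-ASCII characters.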