A Python script to scrape the content from web pages and leave the garbage.

==Introduction==

This is a Python script for scraping the content from a web page. Most pages are covered in ads, navigation, and other crap. The script tries to remove the crap and return a clean, well-formatted div of content.


Call getContentFromURL and pass it the URL (as a string) of any web page.

It returns a unicode string containing the HTML of the content.
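
A rough sketch of how the call might be used, for example to save the cleaned HTML to a file. The module that provides getContentFromURL is not named here, so the extractor is passed in as a parameter. `save_content` and `fake_extractor` are hypothetical names made up for this example and are not part of the script. Swap `fake_extractor` for the real getContentFromURL once it is imported.

```python
import io
import os
import tempfile


def save_content(url, extractor, path):
    """Call extractor(url) and write the returned HTML to path as UTF-8.

    extractor is any callable that takes a URL string and returns a
    unicode string of HTML, which is how getContentFromURL is described.
    """
    html = extractor(url)
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return html


def fake_extractor(url):
    # Hypothetical stand-in so the sketch runs without the real script.
    # It does no scraping; it only returns a unicode HTML string.
    return u"<div><p>Content scraped from %s</p></div>" % url


if __name__ == "__main__":
    out_path = os.path.join(tempfile.gettempdir(), "content.html")
    html = save_content("http://example.com/article", fake_extractor, out_path)
    print(html)
```

Writing the file with an explicit UTF-8 encoding avoids encoding errors when the returned unicode string contains non-ASCII characters.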
