Skip to content
Screenscrapers to collect press releases (for
Branch: master
Clone or download



Scrapers to collect press releases. In Python.

This project is a stand-alone sandbox for easy development of press release scrapers for incorporation into It's designed to require minimal setup...


  • lxml
  • BeautifulSoup (for UnicodeDammit)


Just copy an existing scraper (eg and start hacking about! is a mockup of the scraper interface. Rather than working against a database it just dumps scraped press releases out to stdout. It provides the BaseScraper interface to derive scrapers from. It also installs a caching handler for urllib2 which creates a ".cache" directory to stores downloaded files. This makes repeated test runs during development a lot quicker. Just delete the ".cache" dir to clear the cache.

To try out your scraper, add something like this:

if __name__ == "__main__":
    scraper = Scraper()

Then you can just run it directly, eg:

$ python <your_scraper>
You can’t perform that action at this time.