Python Web Scraping Utilities (PWSU)

Fork me on GitHub: http://github.com/pariser/pwsu

Written by Andrew Pariser -- http://pariser.me / @pariser

Provides the following tools:

HTMLCache

About HTMLCache

HTMLCache caches the HTML from a live URL on local disk, so that while you are writing a web scraping tool you make fewer live calls to the server.

Usage

The goal is to make the library as easy to use as possible. To download the HTML for a given URL and cache the result (or read the cached version, if it exists), run:

html = HTMLCache.read_url(url, redownload=False)
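The read-or-download behavior described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `read_url_sketch`, the `fetch` callable, and the temporary cache directory are hypothetical names introduced here. Only the `redownload` flag and the slash replacement come from this README.

```python
import os
import tempfile


def read_url_sketch(url, fetch, cache_dir, redownload=False, slash_character="~"):
    """Return the HTML for url, using a file in cache_dir as a read-through cache.

    fetch is a hypothetical callable (url -> str) standing in for the live request.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, url.replace("/", slash_character))

    # Serve the cached copy unless the caller explicitly asks for a fresh one.
    if os.path.exists(path) and not redownload:
        with open(path, encoding="utf-8") as f:
            return f.read()

    # Cache miss (or forced refresh): fetch once and write the result to disk.
    html = fetch(url)
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return html


# Demo with a fake fetcher so that no network access is needed.
calls = []


def fake_fetch(url):
    calls.append(url)
    return "<html>example</html>"


demo_dir = tempfile.mkdtemp()
first = read_url_sketch("http://example.com/page", fake_fetch, demo_dir)
second = read_url_sketch("http://example.com/page", fake_fetch, demo_dir)
```

In this sketch the second call is served from disk, so `fake_fetch` runs only once. Passing `redownload=True` forces another fetch and overwrites the cached file.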

Configuration

Use the following methods to set options that affect library behavior:

HTMLCache.set_logger( logger )
HTMLCache.set_cache_dir( cache_dir )
HTMLCache.set_slash_character( slash_character )

logger overrides the default Python logger, which prints DEBUG messages about the operation of HTMLCache.
cache_dir defines the folder into which HTML files are cached. Its default value is "~/data/html_cache".
slash_character is used to replace slashes in the output file name. Its default value is "~", so that the URL http://github.com/pariser/pwsu is written to disk at cache_dir + "/http:~~github.com~pariser~pwsu".
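The file naming rule can be illustrated with a short sketch. `cache_path` is a hypothetical helper written for this example, not a function of the library, and the `/tmp/html_cache` directory is an arbitrary choice. It only reproduces the mapping described above.

```python
def cache_path(url, cache_dir, slash_character="~"):
    """Build the on-disk path for url by replacing every slash in it.

    Hypothetical helper: it mirrors the documented naming rule only.
    """
    return cache_dir + "/" + url.replace("/", slash_character)


default_name = cache_path("http://github.com/pariser/pwsu", "/tmp/html_cache")
custom_name = cache_path("http://github.com/pariser/pwsu", "/tmp/html_cache", slash_character="_")
```

With the default character, `default_name` ends in `http:~~github.com~pariser~pwsu`, matching the example above. A different `slash_character` changes only the separator used inside the file name.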

A convenience method allows setting all options at once:

HTMLCache.set_opts( opts={} )

where opts is a dictionary with optional keys logger, cache_dir, and slash_character.
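As a rough illustration of how a dictionary with optional keys can update only the options it names, here is a stand-in class. `HTMLCacheOptionsSketch` is hypothetical and does not reproduce the library's code. It borrows the three option names and their defaults from this README, and the logger name is an assumption.

```python
import logging


class HTMLCacheOptionsSketch:
    """Hypothetical stand-in that mirrors the documented option names."""

    logger = logging.getLogger("html_cache_sketch")  # assumed logger name
    cache_dir = "~/data/html_cache"
    slash_character = "~"

    @classmethod
    def set_opts(cls, opts=None):
        # Use None rather than {} as the default to avoid a shared mutable default.
        opts = opts or {}
        for key in ("logger", "cache_dir", "slash_character"):
            # Keys that are absent leave the current value untouched.
            if key in opts:
                setattr(cls, key, opts[key])


HTMLCacheOptionsSketch.set_opts({"cache_dir": "/tmp/html_cache"})
```

After this call only `cache_dir` has changed. `slash_character` and `logger` keep their previous values because their keys were omitted.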
