This is a Python library of web-related functions, such as:
- remove comments or tags from HTML snippets
- extract base url from HTML snippets
- translate entities in HTML strings
- convert raw HTTP headers to dicts and vice-versa
- construct HTTP auth header
- convert HTML pages to unicode
- RFC-compliant url joining
- sanitize urls (like browsers do)
- extract arguments from urls
The w3lib package consists of four modules:
- w3lib.url - functions for working with URLs
- w3lib.html - functions for working with HTML
- w3lib.http - functions for working with HTTP
- w3lib.encoding - functions for working with character encoding
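To give a feel for the kind of operations listed above, here is a minimal sketch written with only the Python standard library. It is not w3lib's code or API: the helper names (raw_headers_to_dict, basic_auth_value, query_argument) are hypothetical and exist only for this illustration. The real functions, with their actual names and signatures, are described in the docstrings of the modules above.

```python
# Standard-library sketch of three operations from the feature list.
# All helper names here are hypothetical; they are NOT w3lib's API.
import base64
from urllib.parse import parse_qs, urlsplit


def raw_headers_to_dict(raw):
    """Parse raw HTTP header lines into a dict of name -> list of values."""
    headers = {}
    for line in raw.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        name, value = line.split(":", 1)
        headers.setdefault(name.strip(), []).append(value.strip())
    return headers


def basic_auth_value(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    credentials = "{}:{}".format(username, password).encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def query_argument(url, name, default=None):
    """Return the first value of a query-string argument, or a default."""
    values = parse_qs(urlsplit(url).query).get(name)
    return values[0] if values else default


print(raw_headers_to_dict("Content-Type: text/html\r\nAccept: gzip"))
print(basic_auth_value("user", "pass"))
print(query_argument("http://example.com/?id=200&foo=bar", "id"))
```

The sketch keeps every header value in a list so that repeated headers are not lost; whether w3lib makes the same choice should be checked against its docstrings.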
See the NEWS file for the release history.
For more information, see the code and tests. The functions are all documented with docstrings.
nose is the preferred way to run tests. Run nosetests from the root directory to execute the tests with the default Python interpreter.
tox can be used to run the tests for all supported Python versions. Install it (using 'pip install tox') and then run tox from the root directory; the tests will be executed for all available Python interpreters.
The code of w3lib was originally part of the Scrapy framework but was later split out of Scrapy, with the aim of making it more reusable and providing a useful library of web functions that does not depend on Scrapy.