w3lib

Overview

This is a Python library of web-related functions, such as:

  • remove comments or tags from HTML snippets
  • extract the base URL from HTML snippets
  • translate entities in HTML strings
  • encode multipart/form-data
  • convert raw HTTP headers to dicts and vice versa
  • construct HTTP auth headers
  • convert HTML pages to unicode
  • join URLs in an RFC-compliant way
  • sanitize URLs (like browsers do)
  • extract arguments from URLs
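
To make a few of these operations concrete, here is a minimal sketch written against the Python 3 standard library only. It is not w3lib's implementation and does not use w3lib's API. The helper names (make_basic_auth_header, parse_raw_headers, get_query_argument) and the example URLs are hypothetical and chosen for illustration. See the docstrings in the w3lib modules for the library's real functions.

```python
import base64
from urllib.parse import parse_qs, urljoin, urlsplit


# Hypothetical helper, not part of w3lib.
def make_basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    credentials = "{}:{}".format(username, password).encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Hypothetical helper, not part of w3lib.
def parse_raw_headers(raw):
    """Parse a raw header block into a dict mapping each name to a list of values."""
    headers = {}
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, value = line.partition(":")
        headers.setdefault(name.strip(), []).append(value.strip())
    return headers


# Hypothetical helper, not part of w3lib.
def get_query_argument(url, name, default=None):
    """Return the first value of query argument `name`, or `default` if absent."""
    values = parse_qs(urlsplit(url).query).get(name)
    return values[0] if values else default


if __name__ == "__main__":
    print(make_basic_auth_header("user", "pass"))
    print(parse_raw_headers("Content-Type: text/html\r\nAccept: gzip\r\n"))
    print(get_query_argument("http://www.example.com/?id=200&foo=bar", "id"))
    # URL joining is covered by the standard library's urljoin.
    print(urljoin("http://www.example.com/a/b.html", "../c.html"))
```

Each header name maps to a list of values because a header can legitimately appear more than once in a raw block.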

Modules

The w3lib package consists of five modules:

  • w3lib.url - functions for working with URLs
  • w3lib.html - functions for working with HTML
  • w3lib.http - functions for working with HTTP
  • w3lib.encoding - functions for working with character encoding
  • w3lib.form - functions for working with web forms

Requirements

  • Python 2.5, 2.6 or 2.7

Install

pip install w3lib

Release notes

See the NEWS file.

Documentation

For more information, see the code and tests. The functions are all documented with docstrings.

Tests

nose is the preferred way to run the tests. Run nosetests from the root directory.

License

The w3lib library is licensed under the BSD license.

History

The w3lib code was originally part of the Scrapy framework. It was later split out of Scrapy to make it more reusable and to provide a useful library of web functions that does not depend on Scrapy.
