Some functions to parse and normalize URLs.



The main focus of this library is to make it possible to work on all segments of a URL. Thus a core feature (which is not provided by the stdlib) is to split a domain name correctly by using the Public Suffix List (see below).



>>> urltools.normalize("Http://")

Rules that are applied to normalize a URL:

  • tolower scheme
  • tolower host (also works with IDNs)
  • remove default port
  • remove ':' without port
  • remove DNS root label
  • unquote path, query, fragment
  • collapse path (remove '//', '/./', '/../')
  • sort query params and remove params without value
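
Most of these rules can be approximated with the standard library alone. The following is a rough stdlib-only sketch of how the rules above combine -- it is not urltools itself, and it skips IDNs and several edge cases:

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode, unquote

DEFAULT_PORTS = {"http": "80", "https": "443", "ftp": "21"}

def normalize_sketch(url):
    """Apply a subset of the normalization rules using only the stdlib."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()                          # tolower scheme
    host = (parts.hostname or "").rstrip(".")              # hostname is already
    netloc = host                                          # lowercased; drop root label
    if parts.port and str(parts.port) != DEFAULT_PORTS.get(scheme):
        netloc += ":%d" % parts.port                       # keep only non-default ports
    path = posixpath.normpath(unquote(parts.path) or "/")  # unquote + collapse path
    if path.startswith("//"):                              # normpath keeps a leading '//'
        path = "/" + path.lstrip("/")
    # parse_qsl drops params without a value by default; sorting orders the rest
    query = urlencode(sorted(parse_qsl(parts.query)))
    return urlunsplit((scheme, netloc, path, query, unquote(parts.fragment)))
```

For example, `normalize_sketch("HTTP://Example.COM.:80/a/./b/../c?y=2&x=1&z=")` yields `"http://example.com/a/c?x=1&y=2"`.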

normalize uses the splitting and normalization functions described below. The hostname is not lowercased by normalize_host; this is already done in the preceding split_host step to make splitting of malformed netlocs easier.


The result of parse and extract is a URL named tuple that contains scheme, username, password, subdomain, domain, tld, port, path, query, fragment and the original url itself.

>>> urltools.parse("")
URL(scheme='http', username='', password='', subdomain='', domain='example',
tld='', port='', path='/foo/bar', query='x=1', fragment='abc',
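
For illustration, the field layout of that tuple can be mirrored with a plain namedtuple. This is only a sketch of the documented fields, not urltools' actual class, and the values filled in below are hypothetical:

```python
from collections import namedtuple

# Mirrors the documented field order; urltools defines its own URL type.
URL = namedtuple("URL", ["scheme", "username", "password", "subdomain",
                         "domain", "tld", "port", "path", "query",
                         "fragment", "url"])

# Hypothetical example values for an http URL
u = URL("http", "", "", "", "example", "com", "", "/foo/bar", "x=1", "abc",
        "http://example.com/foo/bar?x=1#abc")
```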

If the scheme is missing, parse interprets the URL as relative.

>>> urltools.parse("")
URL(scheme='', username='', password='', subdomain='', domain='', tld='',
port='', path='', query='', fragment='',


extract does not care about relative URLs and always tries to extract as much information as possible.

>>> urltools.extract("")
URL(scheme='', username='', password='', subdomain='www', domain='example',
tld='', port='', path='/abc', query='', fragment='',

Additional functions

Besides the main functions described above, urltools provides some more functions to manipulate segments of a URL or to create new URLs.

  • construct a new URL from parts

    >>> construct(URL('http', '', '', '', 'example', 'com', '/abc', 'x=1',
    ... 'foo', None))
  • compare two URLs to check if they are the same

    >>> compare("",
    ... "")
  • encode (IDNA, see RFC 3490)

    >>> urltools.encode("http://mü")
  • normalize_host decodes IDNA encoded segments of a DNS name

    >>> normalize_host('')
    >>> normalize_host('xn--e1afmkfd.xn--p1ai')
  • normalize_path

    >>> normalize_path("/a/b/../../c")
  • normalize_query

    >>> normalize_query("x=1&y=&z=3")
  • normalize_fragment unquotes fragments except for the characters +, # and space

  • unquote a string. Optionally, a list of characters that should not be unquoted can be specified

      >>> unquote('foo%23bar')
      >>> unquote('foo%23bar', ['#'])
  • split is basically the same as urlparse.urlparse in Python 2.7 or urllib.parse.urlparse in Python 3.4. In Python 2.7 it handles some malformed URLs better than urlparse; differences to urlparse in Python 3.4 have not been analyzed.

      >>> split("")
      SplitResult(scheme='http', netloc='', path='/abc',
      query='x=1&y=2', fragment='foo')
  • split_netloc splits a network location (netloc) to username, password, host and port

      >>> split_netloc("")
      ('foo', 'bar', '', '8080')
  • split_host uses the Public Suffix List to split a domain name correctly

    >>> split_host("")
    ('www', 'example', '')
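
The netloc splitting in the second-to-last bullet is mostly plain string work. A minimal sketch of that behaviour (the host example.com below is a hypothetical value, and bracketed IPv6 hosts are not handled):

```python
def split_netloc_sketch(netloc):
    """Split 'username:password@host:port' into its four parts."""
    username = password = port = ""
    if "@" in netloc:                        # userinfo present
        userinfo, netloc = netloc.rsplit("@", 1)
        if ":" in userinfo:
            username, password = userinfo.split(":", 1)
        else:
            username = userinfo
    if ":" in netloc:                        # explicit port (no IPv6 literals here)
        netloc, port = netloc.rsplit(":", 1)
    return username, password, netloc, port
```

For example, `split_netloc_sketch("foo:bar@example.com:8080")` returns `('foo', 'bar', 'example.com', '8080')`.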

Public Suffix List

urltools uses the Public Suffix List (PSL) to split domain names correctly. E.g. for a domain like example.co.uk the TLD is .co.uk and not .uk. It is not possible to decide "how big" the TLD is without a lookup in this list.

A local copy of the PSL is recommended; otherwise it is downloaded on each import of urltools. The path of the local copy has to be set in the environment variable PUBLIC_SUFFIX_LIST:

export PUBLIC_SUFFIX_LIST=/path/to/effective_tld_names.dat
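
The lookup itself amounts to longest-suffix matching against the list. A toy sketch with a hand-picked three-rule subset (the real PSL has thousands of rules, plus wildcard and exception rules that are not handled here):

```python
# Tiny hand-picked subset of the PSL; the real list has thousands of rules.
PSL_SUBSET = {"uk", "co.uk", "com"}

def split_host_sketch(host):
    """Split a host into (subdomain, domain, tld) via longest-suffix match."""
    labels = host.split(".")
    for i in range(len(labels)):             # longest candidate suffix first
        tld = ".".join(labels[i:])
        if tld in PSL_SUBSET:
            domain = labels[i - 1] if i > 0 else ""
            subdomain = ".".join(labels[:i - 1]) if i > 1 else ""
            return subdomain, domain, tld
    return ".".join(labels[:-1]), labels[-1], ""
```

Because longer candidates are checked first, `split_host_sketch("www.example.co.uk")` returns `('www', 'example', 'co.uk')` -- the suffix co.uk wins over uk.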

For more information about how PSL works see


You can install urltools from the Python Package Index (PyPI):

pip install urltools

... or get the latest version directly from GitHub:

pip install -e git://

The second option is not recommended because some features might be in an experimental state.

There is (or should be) a git tag for each version that was released on PyPI.


tox and pytest are used for testing. Simply install tox and run it:

pip install tox
tox