stricaud edited this page Jun 3, 2013 · 7 revisions

FAUP: Finally An URL Parser!

This wiki is not for installation instructions, getting started guides, API documentation, etc. If that is what you are looking for, please have a look at the README.md in the source tree.

This wiki simply collects things to know, questions, and doubts that come up whenever we need to normalize URLs.

Normalization of URLs (within faup? or somewhere else) (URI and IRI)

This is not an issue as such, but rather a set of notes about the normalization of URLs.

Following our discussion, here are some notes regarding URL, URI and IRI normalization:

https://github.com/mitsuhiko/werkzeug/blob/master/werkzeug/urls.py (URI->IRI conversion/normalization implementation)

It seems that not everyone has the same definition of URL normalization:

https://github.com/redguardtoo/url-normalization-in-c/blob/master/src/cleanurl.c

The definitions differ especially in how they treat a "default page/index".

The best definition I found was in the documentation of a Perl module (URI):

Returns a normalized version of the URI. The rules for normalization are scheme-dependent. They usually involve lowercasing the scheme and Internet host name components, removing the explicit port specification if it matches the default port, uppercasing all escape sequences, and unescaping octets that can be better represented as plain characters.
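To make the rules in that definition concrete, here is a minimal Python sketch that applies them to http-like URLs. It is not faup code and not a port of the Perl module: the names `normalize_url`, `DEFAULT_PORTS` and `UNRESERVED` are made up for illustration, the default-port table is a small assumed subset, and "octets that can be better represented as plain characters" is interpreted here as the RFC 3986 unreserved set. It deliberately does nothing about a "default page/index", since that is exactly where definitions disagree.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed subset of scheme default ports, for illustration only.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

# RFC 3986 "unreserved" characters: safe to store unescaped.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def _fix_escape(match):
    """Unescape unreserved octets, uppercase every other escape."""
    char = chr(int(match.group(1), 16))
    if char in UNRESERVED:
        return char
    return "%" + match.group(1).upper()


def normalize_url(url):
    """Hypothetical helper: apply the four rules quoted above."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()

    # Keep userinfo untouched; only the host is lowercased.
    userinfo, at, _ = parts.netloc.rpartition("@")
    host = parts.hostname or ""  # urlsplit lowercases the hostname
    if ":" in host:              # IPv6 literal: restore the brackets
        host = "[" + host + "]"

    netloc = userinfo + at + host
    port = parts.port
    # Drop the port only when it matches the scheme's default.
    if port is not None and port != DEFAULT_PORTS.get(scheme):
        netloc += ":" + str(port)

    path = _ESCAPE.sub(_fix_escape, parts.path)
    query = _ESCAPE.sub(_fix_escape, parts.query)
    fragment = _ESCAPE.sub(_fix_escape, parts.fragment)
    return urlunsplit((scheme, netloc, path, query, fragment))


print(normalize_url("HTTP://Example.COM:80/%7euser/a%2fb?q=%c3%a9"))
```

With this sketch, `%7e` becomes `~` because it is unreserved, while `%2f` stays escaped (as `%2F`) because unescaping it would change the path structure. Whether such a step belongs within faup or somewhere else is the open question of this page.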