You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
These are two different URLs; one has 2 path components, the other has 3. Most web app routing code would treat them differently - quite important when you're doing routes for search queries for example :)
In terms of the spec, only 'unreserved' characters can be percent-encoded or decoded without changing the semantics of the URL. If one of the reserved chars (gen-delims or sub-delims) has been encoded, you have to assume that it might have been encoded for a reason, to distinguish it from the use of that character as a separator.
So I think 'normalize' should only normalize the encoding of unreserved chars, and leave reserved ones untouched.
There were some docs saying it was designed to be user-friendly rather than strictly spec-compliant, so I guess that might be why, although even then - applications rely on encoding distinctions for separator characters in URI paths, so if the user copy/pastes a URL with these distinctions, it needs to get through to the web app untouched.
It also seems like .normalize is used under the hood in a lot of places in the codebase (eg in URI#==), so I'd prefer the default implementation of it to be spec-compliant, with a "user_friendly_normalize" as an alternative if needs be...
The text was updated successfully, but these errors were encountered:
Bleh, so it turns out that this is actually a very difficult problem when you try to solve it in conjunction with multi-byte characters and unicode NFKC normalization. Might take me awhile to resolve unfortunately.
The problem is that 'normalize' is un-encoding percent-encoded separator characters in the URI path. So eg
These are two different URLs; one has 2 path components, the other has 3. Most web app routing code would treat them differently - quite important when you're doing routes for search queries for example :)
In terms of the spec, only 'unreserved' characters can be percent-encoded or decoded without changing the semantics of the URL. If one of the reserved chars (gen-delims or sub-delims) has been encoded, you have to assume that it might have been encoded for a reason, to distinguish it from the use of that character as a separator.
So I think 'normalize' should only normalize the encoding of unreserved chars, and leave reserved ones untouched.
There were some docs saying it was designed to be user-friendly rather than strictly spec-compliant, so I guess that might be why, although even then - applications rely on encoding distinctions for separator characters in URI paths, so if the user copy/pastes a URL with these distinctions, it needs to get through to the web app untouched.
It also seems like .normalize is used under the hood in a lot of places in the codebase (eg in URI#==), so I'd prefer the default implementation of it to be spec-compliant, with a "user_friendly_normalize" as an alternative if needs be...
The text was updated successfully, but these errors were encountered: