Normalization of path segments should probably happen before normalization of percent escaping #8

Open
sporkmonger opened this Issue · 5 comments

Bob Aman
Owner
Addressable::URI.parse("/%2E/").normalize.to_str.should == "/%2E/"
Bob Aman
Owner

This issue probably requires a check-in with the IETF URI mailing list before deciding one way or the other.
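To make the ordering question concrete, here is a minimal Ruby sketch. It is not Addressable's actual code: the helpers `decode_unreserved` and `remove_dot_segments` are hypothetical and simplified, and they only illustrate that applying the two normalization steps in a different order gives a different result for "/%2E/".

```ruby
# Hypothetical helpers for illustration only; not Addressable's implementation.
UNRESERVED = /[A-Za-z0-9\-._~]/

# Decode a percent-escape only when it encodes an unreserved character;
# otherwise keep the escape and uppercase its hex digits.
def decode_unreserved(path)
  path.gsub(/%([0-9A-Fa-f]{2})/) do |escape|
    char = $1.hex.chr
    char.match?(UNRESERVED) ? char : escape.upcase
  end
end

# Simplified dot-segment removal for absolute paths.
def remove_dot_segments(path)
  segments = path.split("/", -1)
  output = []
  segments.each_with_index do |segment, index|
    last = index == segments.length - 1
    case segment
    when "."
      output << "" if last           # a trailing "." leaves a trailing slash
    when ".."
      output.pop if output.length > 1 # never pop the leading empty segment
      output << "" if last
    else
      output << segment
    end
  end
  output.join("/")
end

path = "/%2E/"

# Order A: percent-escape normalization first, then path segments.
escapes_first = remove_dot_segments(decode_unreserved(path))
puts escapes_first   # prints "/"

# Order B: path segments first, then percent-escape normalization.
segments_first = decode_unreserved(remove_dot_segments(path))
puts segments_first  # prints "/./"
```

In this sketch neither order leaves the path as "/%2E/", so the expectation in the test case would also require leaving the escaped dot undecoded.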

Oleksiy Kovyrin

I understand it's been a long time, but I wanted to check in and see what's up with this issue. We've hit this bug in a slightly different context and are not sure how to deal with it. Any chance this is going to be fixed?

Bob Aman
Owner

Could you elaborate on the issue you're hitting? A test case would be awesome.

Oleksiy Kovyrin

Actually, now I'm not sure if our issue is related to this one. Here is our problem:

irb(main):001:0> Addressable::URI.parse(PostRank::URI.unescape("http://foo.com/blah%ef%bc%9f"))
=> #<Addressable::URI:0x5648890 URI:http://foo.com/blah？>
irb(main):002:0> Addressable::URI.parse(PostRank::URI.unescape("http://foo.com/blah%ef%bc%9f")).normalize!
=> #<Addressable::URI:0x564ed08 URI:http://foo.com/blah%3F>

The normalize! call screws up a perfectly valid (AFAIU) Unicode symbol and replaces it with a Latin-1 question mark.

Bob Aman
Owner

It's actually doing the right thing. IRIs (Unicode-friendly URIs) use Unicode Normalization Form KC (NFKC) to limit phishing. NFKC tends to do perceptual code point conversions, like converting '？' (U+FF1F, fullwidth question mark) to '?' (U+003F). If this is causing a problem, the solution is either not to normalize the URI at all or to normalize its components piecemeal. "http://foo.com/blah%ef%bc%9f" and "http://foo.com/blah%3F" are considered equivalent.
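The conversion described above can be reproduced with Ruby's standard library alone. This is a minimal sketch that uses `String#unicode_normalize` and `URI.decode_www_form_component` as stand-ins; it assumes nothing about how Addressable performs the normalization internally.

```ruby
require "uri"

# Decode the percent-escaped bytes from the example path (stdlib only).
decoded = URI.decode_www_form_component("blah%ef%bc%9f")

# The last character is U+FF1F FULLWIDTH QUESTION MARK, not an ASCII "?".
puts format("U+%04X", decoded[-1].ord)      # prints "U+FF1F"

# NFKC folds the fullwidth form into its ASCII compatibility equivalent.
normalized = decoded.unicode_normalize(:nfkc)
puts format("U+%04X", normalized[-1].ord)   # prints "U+003F"

# A literal "?" would start the query component, so inside a path it
# has to be percent-escaped, which is where "%3F" comes from.
escaped_path = normalized.gsub("?", "%3F")
puts escaped_path                           # prints "blah%3F"
```

To keep the original character, skip whole-URI normalization for such inputs or normalize only the components that need it.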
