Make normalize ignore %2B in query strings #99

Merged on Dec 28, 2012 (2 commits)

tps12
Contributor

@tps12 tps12 commented Dec 28, 2012

In a query string, '+' is reserved as a shorthand for space, so "real"
pluses encoded as %2b should be preserved during normalization:

http://example.com/one%2btwo/calc?q=1%2b2+2%2b3

is normalized as:

http://example.com/one+two/calc?q=1%2B2+2%2B3

Previously this would have been normalized to:

http://example.com/one+two/calc?q=1+2+2+3

making '+' ambiguous.

This fixes #50.
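
To make the ambiguity concrete, here is a small illustration using Ruby's standard `URI.decode_www_form_component` rather than Addressable itself. It assumes the receiving side decodes the query form-style, where a literal '+' means a space; the variable names are only for illustration.

```ruby
require "uri"

# Form-style decoding treats a literal "+" as a space and "%2B" as a real plus.
preserved = URI.decode_www_form_component("1%2B2+2%2B3")
collapsed = URI.decode_www_form_component("1+2+2+3")

puts preserved  # prints: 1+2 2+3
puts collapsed  # prints: 1 2 2 3
```

Once %2B has been rewritten to '+', the two values can no longer be told apart by the receiver.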

@sporkmonger
Owner

I'm torn. This is a bug I've wanted fixed for ages and I haven't found a good way to fix it myself. But on the other hand, I really don't like the way you've used upper-case vs. lower-case as semantically meaningful, and I simply can't merge this as-is.

@tps12
Contributor Author

tps12 commented Dec 28, 2012

Hi, thanks for the response. Case should not be semantically meaningful:

...?q=1%2b2+2%2B3

is normalized to

...?q=1%2B2+2%2B3

for example. Percent encodings are upcased as part of normalization, which I believe is the current/expected behavior.
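
A minimal sketch of the upcasing step described above, assuming it amounts to upper-casing the hex digits of every percent sequence. `upcase_percent_encodings` is a hypothetical helper written for this illustration, not a method in Addressable.

```ruby
# Hypothetical helper for illustration; not part of Addressable's API.
def upcase_percent_encodings(str)
  # Percent sequences are case-insensitive, so only their spelling changes.
  str.gsub(/%[0-9a-fA-F]{2}/) { |seq| seq.upcase }
end

upcase_percent_encodings("q=1%2b2+2%2B3")  # => "q=1%2B2+2%2B3"
```

Both `%2b` and `%2B` end up spelled the same way, so case carries no meaning in the result.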

@sporkmonger
Owner

Oh man, total code-read fail on my part. I read `leave_encoded.include?(c) ? sequence.upcase : c` as something completely different. OK, in that case, I'm much happier with the commit, with one minor quibble. The unencode method should not perform any kind of normalization. So no upcasing. Just leave case as-is.

@sporkmonger sporkmonger reopened this Dec 28, 2012
@tps12
Contributor Author

tps12 commented Dec 28, 2012

Awesome, thanks, I'll fix that.

@sporkmonger
Owner

Also I'd like to see tests that include both % and %25 in the same query string as %2B. I like my edge cases. The example "?v=%7E&w=%&x=%25&y=%2B&z=C%CC%A7" should normalize to "?v=~&w=%25&x=%25&y=%2B&z=%C3%87". While "?v=%7E&w=%&x=%25&y=+&z=C%CC%A7" should still normalize to "?v=~&w=%25&x=%25&y=+&z=%C3%87".
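
The two expectations above can be reproduced with a rough, self-contained sketch. This is not Addressable's implementation: `NOT_QUERY_SAFE`, `normalize_segment` and `normalize_query_sketch` are hypothetical names, the set of characters left literal is an assumption, only '+' is treated as a leave-encoded character, and NFC is used as the Unicode normalization form for simplicity. It also assumes the decoded bytes are valid UTF-8.

```ruby
# Hypothetical sketch for illustration; not Addressable's implementation.
# Anything outside this (assumed) set is percent-encoded in the output.
NOT_QUERY_SAFE = /[^A-Za-z0-9\-._~!$&'()*+,;=:@\/?]/

# Decode, Unicode-normalize, and re-encode one stretch of query text.
def normalize_segment(segment)
  # Turn every percent sequence into its raw byte, then read the bytes as UTF-8.
  bytes = segment.b.gsub(/%([0-9a-fA-F]{2})/n) { $1.hex.chr }
  text = bytes.force_encoding(Encoding::UTF_8).unicode_normalize(:nfc)
  # Re-encode unsafe characters (including a bare "%") as upper-case hex.
  text.gsub(NOT_QUERY_SAFE) { |ch| ch.bytes.map { |b| format("%%%02X", b) }.join }
end

def normalize_query_sketch(query)
  # Split on encoded pluses; the capture group keeps them in the result.
  query.split(/(%2[Bb])/).map do |part|
    # Encoded pluses are only upcased; everything else is fully normalized.
    part.match?(/\A%2[Bb]\z/) ? part.upcase : normalize_segment(part)
  end.join
end

normalize_query_sketch("v=%7E&w=%&x=%25&y=%2B&z=C%CC%A7")
# => "v=~&w=%25&x=%25&y=%2B&z=%C3%87"
normalize_query_sketch("v=%7E&w=%&x=%25&y=+&z=C%CC%A7")
# => "v=~&w=%25&x=%25&y=+&z=%C3%87"
```

Splitting on the encoded plus first keeps it out of the decode and re-encode round trip, which is one simple way to stop it from collapsing into a literal '+'.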

@sporkmonger
Owner

There should probably be a test for any method that takes a leave_encoded parameter that ensures it's behaving correctly around characters that aren't on the list. Currently you're just testing strings that contain a percent-encoded "+" character, but it needs to verify all three character categories are encoded correctly in a single return value. So use something like "%%25~%7E+%2B" as an input. I'd like to see some unit tests of unencode directly and not just tests of methods that happen to call it downstream, since it's part of the public API (unlike normalize_component, which I don't expect anyone to ever call directly).
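
As a sketch of what such a direct check might exercise, the stand-in below mimics the requested behaviour: decode percent sequences, but leave any sequence whose character is in the leave-encoded list exactly as written, with no upcasing. `unencode_sketch` is a hypothetical function for illustration, not Addressable's `unencode`, and it only handles ASCII input.

```ruby
# Hypothetical stand-in for illustration; not Addressable's implementation.
# ASCII-only: multi-byte sequences would need extra encoding handling.
def unencode_sketch(str, leave_encoded = "")
  str.gsub(/%[0-9a-fA-F]{2}/) do |seq|
    char = seq[1, 2].hex.chr
    # Keep the sequence untouched (case included) if its character is listed.
    leave_encoded.include?(char) ? seq : char
  end
end

unencode_sketch("%%25~%7E+%2B", "+")  # => "%%~~+%2B"
unencode_sketch("%%25~%7E+%2b", "+")  # => "%%~~+%2b"
unencode_sketch("%%25~%7E+%2B")       # => "%%~~++"
```

The single input covers all three categories at once: a bare and an encoded '%', a literal and an encoded '~', and a literal and an encoded '+'.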

@tps12
Contributor Author

tps12 commented Dec 28, 2012

Awesome, will add those.

Instead of upcasing leave_encoded characters inside the unencode call,
leave them as they are and pass the list on to encode_component for
upcasing.

This encapsulation keeps unencode free of any normalization logic.

Also added some more test cases around leave_encoded handling.
@tps12
Contributor Author

tps12 commented Dec 28, 2012

I moved the upcasing out of unencode and added some test cases.

@sporkmonger
Owner

LGTM.

sporkmonger added a commit that referenced this pull request Dec 28, 2012
Make normalize ignore %2B in query strings
@sporkmonger sporkmonger merged commit 72bf6c0 into sporkmonger:master Dec 28, 2012

Successfully merging this pull request may close these issues.

Inconsistent normalization of % escaping