Make normalize ignore %2B in query strings #99

Merged
merged 2 commits into from Dec 28, 2012

Conversation

Projects
None yet
3 participants
@tps12
Contributor

tps12 commented Dec 28, 2012

In a query string, '+' is reserved as a shorthand for space, so "real"
pluses encoded as %2b should be preserved during normalization:

http://example.com/one%2btwo/calc?q=1%2b2+2%2b3

is normalized as:

http://example.com/one+two/calc?q=1%2B2+2%2B3

Previously this would have been normalized to:

http://example.com/one+two/calc?q=1+2+2+3

making '+' ambiguous.

This fixes #50.

Make normalize ignore %2B in query strings
In a query string, '+' is reserved as a shorthand for space, so "real"
pluses encoded as %2b should be preserved during normalization:

  http://example.com/one%2btwo/calc?q=1%2b2+2%2b3

is normalized as:

  http://example.com/one+two/calc?q=1%2B2+2%2B3

Previously this would have been normalized to:

  http://example.com/one+two/calc?q=1+2+2+3

making '+' ambiguous.
@sporkmonger

This comment has been minimized.

Show comment Hide comment
@sporkmonger

sporkmonger Dec 28, 2012

Owner

I'm torn. This is a bug I've wanted fixed for ages and I haven't found a good way to fix it myself. But on the other hand, I really don't like the way you've used upper-case vs. lower-case as semantically meaningful, and I simply can't merge this as-is.

Owner

sporkmonger commented Dec 28, 2012

I'm torn. This is a bug I've wanted fixed for ages and I haven't found a good way to fix it myself. But on the other hand, I really don't like the way you've used upper-case vs. lower-case as semantically meaningful, and I simply can't merge this as-is.

@tps12

This comment has been minimized.

Show comment Hide comment
@tps12

tps12 Dec 28, 2012

Contributor

Hi, thanks for the response. Case should not be semantically meaningful:

...?q=1%2b2+2%2B3

is normalized to

...?q=1%2B2+2%2B3

for example. Percent encodings are upcased as part of normalization, which I believe is the current/expected behavior.

Contributor

tps12 commented Dec 28, 2012

Hi, thanks for the response. Case should not be semantically meaningful:

...?q=1%2b2+2%2B3

is normalized to

...?q=1%2B2+2%2B3

for example. Percent encodings are upcased as part of normalization, which I believe is the current/expected behavior.

@sporkmonger

This comment has been minimized.

Show comment Hide comment
@sporkmonger

sporkmonger Dec 28, 2012

Owner

Oh man, total code-read fail on my part. I read leave_encoded.include?(c) ? sequence.upcase : c as something completely different. OK, in that case, I'm much happier with the commit, with one minor quibble. The unencode method should not perform any kind of normalization. So no upcasing. Just leave case as-is.

Owner

sporkmonger commented Dec 28, 2012

Oh man, total code-read fail on my part. I read leave_encoded.include?(c) ? sequence.upcase : c as something completely different. OK, in that case, I'm much happier with the commit, with one minor quibble. The unencode method should not perform any kind of normalization. So no upcasing. Just leave case as-is.

@sporkmonger sporkmonger reopened this Dec 28, 2012

@tps12

This comment has been minimized.

Show comment Hide comment
@tps12

tps12 Dec 28, 2012

Contributor

Awesome, thanks, I'll fix that.

Contributor

tps12 commented Dec 28, 2012

Awesome, thanks, I'll fix that.

@sporkmonger

This comment has been minimized.

Show comment Hide comment
@sporkmonger

sporkmonger Dec 28, 2012

Owner

Also I'd like to see tests that include both % and %25 in the same query string as %2B. I like my edge cases. The example "?v=%7E&w=%&x=%25&y=%2B&z=C%CC%A7" should normalize to "?v=~&w=%25&x=%25&y=%2B&z=%C3%87". While "?v=%7E&w=%&x=%25&y=+&z=C%CC%A7" should still normalize to "?v=~&w=%25&x=%25&y=+&z=%C3%87".

Owner

sporkmonger commented Dec 28, 2012

Also I'd like to see tests that include both % and %25 in the same query string as %2B. I like my edge cases. The example "?v=%7E&w=%&x=%25&y=%2B&z=C%CC%A7" should normalize to "?v=~&w=%25&x=%25&y=%2B&z=%C3%87". While "?v=%7E&w=%&x=%25&y=+&z=C%CC%A7" should still normalize to "?v=~&w=%25&x=%25&y=+&z=%C3%87".

@sporkmonger

This comment has been minimized.

Show comment Hide comment
@sporkmonger

sporkmonger Dec 28, 2012

Owner

There should probably be a test for any method that takes a leave_encoded parameter that ensures it's behaving correctly around characters that aren't on the list. Currently you're just testing strings that contain a percent-encoded "+" character, but it needs to verify all three character categories are encoded correctly in a single return value. So use something like "%%25~%7E+%2B" as an input. I'd like to see some unit tests of unencode directly and not just tests of methods that happen to call it downstream, since it's part of the public API (unlike normalize_component, which I don't expect anyone to ever call directly).

Owner

sporkmonger commented Dec 28, 2012

There should probably be a test for any method that takes a leave_encoded parameter that ensures it's behaving correctly around characters that aren't on the list. Currently you're just testing strings that contain a percent-encoded "+" character, but it needs to verify all three character categories are encoded correctly in a single return value. So use something like "%%25~%7E+%2B" as an input. I'd like to see some unit tests of unencode directly and not just tests of methods that happen to call it downstream, since it's part of the public API (unlike normalize_component, which I don't expect anyone to ever call directly).

@tps12

This comment has been minimized.

Show comment Hide comment
@tps12

tps12 Dec 28, 2012

Contributor

Awesome, will add those.

Contributor

tps12 commented Dec 28, 2012

Awesome, will add those.

Move normalization logic out of URI::unencode
Instead of upcasing leave_encoded characters inside the unencode call,
leave them as they are and pass the list on to encode_component for
upcasing.

This encapsulation keeps unencode free of any normalization logic.

Also added some more test cases around leave_encoded handling.
@tps12

This comment has been minimized.

Show comment Hide comment
@tps12

tps12 Dec 28, 2012

Contributor

I moved the upcasing out of unencode and added some test cases.

Contributor

tps12 commented Dec 28, 2012

I moved the upcasing out of unencode and added some test cases.

@sporkmonger

This comment has been minimized.

Show comment Hide comment
@sporkmonger

sporkmonger Dec 28, 2012

Owner

LGTM.

Owner

sporkmonger commented Dec 28, 2012

LGTM.

sporkmonger added a commit that referenced this pull request Dec 28, 2012

Merge pull request #99 from tps12/leave-%2b-in-query
Make normalize ignore %2B in query strings

@sporkmonger sporkmonger merged commit 72bf6c0 into sporkmonger:master Dec 28, 2012

1 check passed

default The Travis build passed
Details
@bblimke

This comment has been minimized.

Show comment Hide comment
@bblimke

bblimke Mar 4, 2013

Although this commit seems only to be refactoring, it changed the behaviour of encode_component method. encode_component before was returning a new object without modifying passed component. Now with two gsub replaced with gsub! it modifies passed object.

bblimke commented on 37131be Mar 4, 2013

Although this commit seems only to be refactoring, it changed the behaviour of encode_component method. encode_component before was returning a new object without modifying passed component. Now with two gsub replaced with gsub! it modifies passed object.

This comment has been minimized.

Show comment Hide comment
@bblimke

bblimke Mar 4, 2013

It's not a problem for Ruby >= 1.9 where string is cloned if it responds to :force_encoding but this caused me problem in Ruby < 1.9

It's not a problem for Ruby >= 1.9 where string is cloned if it responds to :force_encoding but this caused me problem in Ruby < 1.9

This comment has been minimized.

Show comment Hide comment
@sporkmonger

sporkmonger Mar 4, 2013

Owner

I'm in the process of removing all usage of gsub! from the library in favor of gsub. I haven't investigated this bug yet, but most of the tests you've submitted look right. Please split the tests out into their own commit and rewrite the fix somehow without using gsub!.

Owner

sporkmonger replied Mar 4, 2013

I'm in the process of removing all usage of gsub! from the library in favor of gsub. I haven't investigated this bug yet, but most of the tests you've submitted look right. Please split the tests out into their own commit and rewrite the fix somehow without using gsub!.

This comment has been minimized.

Show comment Hide comment
@sporkmonger

sporkmonger Mar 4, 2013

Owner

Oh, nevermind, misread, thought this was a pull request. Hmm, yeah. I'll take a closer look and attempt to revise.

Owner

sporkmonger replied Mar 4, 2013

Oh, nevermind, misread, thought this was a pull request. Hmm, yeah. I'll take a closer look and attempt to revise.

This comment has been minimized.

Show comment Hide comment
@tps12

tps12 Mar 5, 2013

Contributor

Agh, good catch, sorry about that. You're right, no reason to be messing with the input string there at all.

Contributor

tps12 replied Mar 5, 2013

Agh, good catch, sorry about that. You're right, no reason to be messing with the input string there at all.

This comment has been minimized.

Show comment Hide comment
@sporkmonger

sporkmonger Mar 31, 2013

Owner

This should be resolved by 64f82f2.

Owner

sporkmonger replied Mar 31, 2013

This should be resolved by 64f82f2.

@sporkmonger

This comment has been minimized.

Show comment Hide comment
@sporkmonger

sporkmonger Mar 4, 2013

Owner

I'm not sure I understand these two tests.

I'm not sure I understand these two tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment