Skip to content

Loading…

I believe babe1e84230c57ac2b998324521509a0853131d3 is a regression and should be reverted #638

Closed
myronmarston opened this Issue · 18 comments

9 participants

@myronmarston

I commented directly on the commit and @rkh is aware of it but I wanted to make sure the issue isn't lost since it's been 6 days since that comment. Here it is in full:

I think that the old behavior was correct, and this is a regression, actually (I got a test failure from upgrading sinatra that I traced back to this commit).

From what I understand, the encoding of a space as a plus is an aspect of application/x-www-form-urlencoded encoding, and not of general URI encoding. It is common to treat a query string as being encoded according to the rules of application/x-www-form-urlencoded since web browsers send the values of a GET form as the query parameters -- so in the query string, it makes a certain amount of sense to treat + as a space. In the path, I don't think it makes sense at all -- there's no reason to treat the path of a URI as application/x-www-form-urlencoded encoded data.

I tried to find an RFC that clearly states this but haven't had any luck in my few minutes of looking. However, I did find several places where others have said the same things I said above...but it's obviously not as authoritative as an RFC:

http://stackoverflow.com/a/1006074/29262
http://stackoverflow.com/a/2678602/29262
http://blogwaffe.com/2010/02/07/an-open-letter-to-web-developers-on-percent-encoding/

Given that, would you be willing to revert this change, @rkh?

@rkh
Sinatra member

Yes, but I wanna look into this more closely and am super jammed right now.

@rkh rkh was assigned
@myronmarston

Sorry about the noise. Not trying to rush you on this. I realized after I posted the comment on the commit that this is really better off as an issue so it's not lost in the shuffle. Was just trying to correct that.

Carry on with being awesome :).

@TrevorBramble
Sinatra member

I failed to find anything definitive as well. I think mostly because the language in RFCs is borderline legalese.

It's hinted in RFC 2396 that + is not given special treatment in URIs in the "G.2." modification text:

 The plus "+", dollar "$", and comma "," characters have been added to
   those in the "reserved" set, since they are treated as reserved
   within the query component.

Likewise the HTML 5 draft asserts "+" handling in form data:

For each byte in the entry's name and value, apply the appropriate subsubsteps from the following list:

The byte is 0x20 (U+0020 SPACE if interpreted as ASCII)
Replace the byte with a single 0x2B byte ("+" (U+002B) character if interpreted as ASCII).

But for what it's worth, the difference between Ruby's CGI.escape() and URI.escape() has always been correct in my experience. (CGI turns spaces into pluses, URI turns spaces into %20 sequences.)

@rkh
Sinatra member

None of the above links seem to be referring to Sinatra explicitly. I have to say I like the current behaviour. I don't buy the "my value contains a +", as you could just properly escape the +.

@JonRowe

I've actually run into this and had to manually hack round it, in my case I was using timestamps for id's (which contain + for timezones) so this threw me, so for asthetics I'd also like to see it reverted ;)

(e.g. http://myservice.com/thing/2013-02-25T18:28:55+1100)

@rkh
Sinatra member

OK, cool, this is like the first real world feedback I'm getting on this.

@davidlumley

Similar to @JonRowe I've previously used unix timestamps in URLs, but being able to use UTC timestamps would be much more readable.

@rkh
Sinatra member

@boucher you might wanna comment.

@yb66

After reading as little as possible (and far too much) about URI's I've changed my mind. I'm for reverting this.

I'm also for outputting "DON'T BE FOOLISH"` (and possibly "DON'T MAKE ME HAVE TO READ ANY MORE RFC's") to the console of anyone using pluses in a path fragment.

@myronmarston

None of the above links seem to be referring to Sinatra explicitly.

No, but they refer to how spaces and pluses should generally be treated in different parts of a URL, and it seems like Sinatra's behavior should be in line with this.

I don't buy the "my value contains a +", as you could just properly escape the +.

I agree that it's probably wisest to percent-escape "+" in a URL, but the problem is that my API has been accepting + to mean + for a while -- so to upgrade Sinatra, I have to force all my clients to start escaping the +, or I have to put in hacky work arounds. In my specific use case, I've got a route like /social/:campaign_id/:channel/overview where :channel can be Twitter, Facebook or Google+. If I upgrade sinatra, Google+ comes through as Google.

@boucher

This is an interesting question, and I plan to read some more and comment later today. My preliminary thoughts are influenced by the actual way values are used in Sinatra -- query string parameters and parameters taken from the URL are mixed together, and having the params hash be the result of multiple different encodings is possibly weird.

@isaacsanders

Looking at the semantics of the name, I would agree with @myronmarston. The change to use decode_www_form_component seems to be out of place when decoding the URL. :+1: for @myronmarston

@timcraft

It's a regression, replacing space characters by plus characters is a feature of application/x-www-form-urlencoded.

RFC 3986 defines URI syntax. Space characters should be percent encoded (as they are not in the unreserved or sub-delims character sets). Plus characters don't need to be encoded (they are in the sub-delims character set), but they shouldn't be translated to space characters. Please revert.

@myronmarston

My preliminary thoughts are influenced by the actual way values are used in Sinatra -- query string parameters and parameters taken from the URL are mixed together, and having the params hash be the result of multiple different encodings is possibly weird.

FWIW, rails has the same issue (named dynamic route segments in the path are presented in the params hash), and it does not treat + as a space:

Action Controller_ Exception caught

(I created this by creating a new rails 3.2.12 app, generating a posts resource and defining the show action as raise params.inspect)

@boucher

I'm inclined to agree: the spec says that a '+' appearing in the path component of a URL is meant to be a '+'.

The original issue (#463 -- also filed by me) stemmed from the fact that this was a regression from earlier versions of Sinatra. In the interim, though, lots of people seem to have come to rely on the new behavior anyway. I'm ok with deciding that '+' characters in URLs should be treated as '+'. This should probably be documented somewhere, at least in the change log, as there are definitely people relying on the current behavior.

@rkh rkh added a commit that closed this issue
@rkh rkh Revert "Plus symbols in the URL should be converted to spaces when co…
…nsidered as param values. Closes #463."

This reverts commit babe1e8.

Conflicts:
	lib/sinatra/base.rb

Fixes #638
b4d7d4b
@rkh rkh closed this in b4d7d4b
@JonRowe

Thanks @rkh :)

@myronmarston

Thanks :).

@rkh
Sinatra member

I added a test in cd82a57.

@Rican7 Rican7 referenced this issue in chriso/klein.php
Closed

Named params should be urldecode'd #117

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.