New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Escape spaces in path segments properly #1347

Merged
merged 2 commits into from Jul 19, 2013

Conversation

Projects
None yet
4 participants
@richdougherty
Member

richdougherty commented Jul 18, 2013

Play currently uses java.net.URLEncoder.encode to encode path segments. But this uses query string encoding, which encodes spaces as +. Spaces in path segments should be encoded as %20.

https://groups.google.com/d/msg/play-framework/esgvgNcJfl8/RhKNF0qLkH4J

@ghost ghost assigned richdougherty Jul 17, 2013

@cloudbees-pull-request-builder

This comment has been minimized.

Show comment
Hide comment
@cloudbees-pull-request-builder

cloudbees-pull-request-builder commented Jul 18, 2013

play2-master-PRs #385 SUCCESS
This pull request looks good

@cloudbees-pull-request-builder

This comment has been minimized.

Show comment
Hide comment
@cloudbees-pull-request-builder

cloudbees-pull-request-builder commented Jul 18, 2013

play2-master-PRs #386 SUCCESS
This pull request looks good

jroper added a commit that referenced this pull request Jul 19, 2013

Merge pull request #1347 from richdougherty/url-encoding
Escape spaces in path segments properly

@jroper jroper merged commit fe32a48 into playframework:master Jul 19, 2013

val in = s.getBytes(encoding)
val out = new ByteArrayOutputStream()
for (b <- in) {
val allowed = segmentChars.get(b & 0xFF)

This comment has been minimized.

@huntc

huntc Jul 19, 2013

Collaborator

awesome

@huntc

huntc Jul 19, 2013

Collaborator

awesome

This comment has been minimized.

@jroper

jroper Jul 26, 2013

Member

This always confuses me... the old unsigned byte trick.

@jroper

jroper Jul 26, 2013

Member

This always confuses me... the old unsigned byte trick.

This comment has been minimized.

@richdougherty

richdougherty Jul 27, 2013

Member

Like I said, bit arithmetic makes an interesting change from working with iteratees!

@richdougherty

richdougherty Jul 27, 2013

Member

Like I said, bit arithmetic makes an interesting change from working with iteratees!

@huntc

This comment has been minimized.

Show comment
Hide comment
@huntc

huntc Jul 19, 2013

Collaborator

Beautifully written, and the tests look awesome too.

Collaborator

huntc commented Jul 19, 2013

Beautifully written, and the tests look awesome too.

@richdougherty richdougherty deleted the richdougherty:url-encoding branch Jul 19, 2013

* The string `s` will be converted to octets (bytes) using this character encoding.
* @return An encoded string in the US-ASCII character set.
*/
def encodePathSegment(s: String, encoding: String): String = {

This comment has been minimized.

@jroper

jroper Jul 26, 2013

Member

It seems that according to RFC-3986, new URI schemes must be UTF-8 encoded. http is not a new URI scheme, and it doesn't make this assertion about it because that would make many browsers non spec compliant. But it seems to push you strongly to use UTF-8 if you're the one doing the encoding. So I'm not sure if we need the encoding parameter.

@jroper

jroper Jul 26, 2013

Member

It seems that according to RFC-3986, new URI schemes must be UTF-8 encoded. http is not a new URI scheme, and it doesn't make this assertion about it because that would make many browsers non spec compliant. But it seems to push you strongly to use UTF-8 if you're the one doing the encoding. So I'm not sure if we need the encoding parameter.

This comment has been minimized.

@richdougherty

richdougherty Jul 27, 2013

Member

I wasn't confident enough about how URL encoding happens around the world to go UTF-8 only. I did read the bit that you highlight, but since it didn't mandate UTF-8 encoding for http URIs I thought I'd keep it more general. Possibly if we're only interested in producing URIs to be consumed by Play itself then we could say UTF-8.

@richdougherty

richdougherty Jul 27, 2013

Member

I wasn't confident enough about how URL encoding happens around the world to go UTF-8 only. I did read the bit that you highlight, but since it didn't mandate UTF-8 encoding for http URIs I thought I'd keep it more general. Possibly if we're only interested in producing URIs to be consumed by Play itself then we could say UTF-8.

@richdougherty

This comment has been minimized.

Show comment
Hide comment
@richdougherty

richdougherty Jul 27, 2013

Member

I made a list of the links I used when researching this issue: http://notes.richdougherty.com/2013/07/url-path-segment-encoding.html

Member

richdougherty commented Jul 27, 2013

I made a list of the links I used when researching this issue: http://notes.richdougherty.com/2013/07/url-path-segment-encoding.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment