Make `URI.encode/1` to conform to RFC 3986. Introduce `URI.encode/2` function. #2392

lexmag · 2014-06-10T14:07:50Z

According to RFC 3986, the space character should be escaped as %20, not + (escaping it as + is the behaviour of a very early version of the URI percent-encoding rules).
URI.encode/2
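
For illustration, here is what the change looks like from the caller's side. This is only a sketch: it assumes a current Elixir release in which this behaviour has shipped, and the input strings are made up for the example.

```elixir
# Assumption: run on a current Elixir release (for example in IEx).
# The input strings are hypothetical.

# A space is percent-encoded as %20, as RFC 3986 requires.
IO.inspect(URI.encode("hello world"))
# prints "hello%20world"

# "+" is a reserved character, so by default it is left untouched
# rather than being treated as an encoded space.
IO.inspect(URI.encode("a+b c"))
# prints "a+b%20c"
```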

josevalim · 2014-06-10T14:11:16Z

Thanks @lexmag! Just to be sure we are moving in the right direction, how do other languages handle this? Is space always escaped as %20? How can you configure the encoding rules in those languages? Maybe there are two encode versions?

lexmag · 2014-06-10T14:17:00Z

I've checked Ruby, NodeJS, Haskell, C# - all do %20 encoding.

josevalim · 2014-06-10T14:20:47Z

Thanks. I see that reserved and unreserved do not take all possible characters (they are mutually exclusive). What happens when a character is not in any of those groups?

lexmag · 2014-06-10T14:21:05Z

Haskell applies the transformation depending on a "predicate" function (like 448947abbc90d70c5ab76251133e8c6ec72ccd6c).
[source: http://hackage.haskell.org/package/network-2.2.1.7/docs/src/Network-URI.html]
It has a bunch of predefined predicate functions (Do we need them all?).

lexmag · 2014-06-10T14:25:23Z

The most common character transformations are:

- leave reserved + unreserved characters as is (NodeJS has encodeURI)
- leave only unreserved characters as is (NodeJS has encodeURIComponent)
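
The two transformations can be sketched with the `URI.encode/2` introduced here. This assumes a current Elixir release where `URI.char_unescaped?/1` and `URI.char_unreserved?/1` are public; the URL is a made-up example, and the NodeJS comparison is only approximate.

```elixir
# Assumption: current Elixir release; the URL is hypothetical.
url = "http://example.com/a b?q=1"

# Reserved + unreserved characters stay as is (roughly like encodeURI).
IO.inspect(URI.encode(url, &URI.char_unescaped?/1))
# prints "http://example.com/a%20b?q=1"

# Only unreserved characters stay as is (roughly like encodeURIComponent).
IO.inspect(URI.encode(url, &URI.char_unreserved?/1))
# prints "http%3A%2F%2Fexample.com%2Fa%20b%3Fq%3D1"
```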

lexmag · 2014-06-10T14:27:48Z

> What happens when a character is not in any of those groups?

@josevalim the character will be percent-encoded.
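
To make that rule concrete, a predicate-driven encoder could look roughly like the sketch below. `EncodeSketch` and its `encode/2` are hypothetical names used only for illustration; this is not the code from this pull request.

```elixir
defmodule EncodeSketch do
  # Hypothetical helper, for illustration only.
  # Keeps every byte the predicate accepts and percent-encodes all others.
  def encode(string, predicate) when is_binary(string) and is_function(predicate, 1) do
    for <<byte <- string>>, into: "" do
      if predicate.(byte) do
        <<byte>>
      else
        # Two uppercase hex digits per escaped byte.
        "%" <> Base.encode16(<<byte>>)
      end
    end
  end
end

# "<" is neither reserved nor unreserved, so it is escaped;
# "/" is reserved, so this predicate keeps it.
IO.inspect(EncodeSketch.encode("a<b/c", &URI.char_unescaped?/1))
# prints "a%3Cb/c"
```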

josevalim · 2014-06-10T14:29:35Z

Yeah, I would go with something similar. I don't think our URI.encode should encode : or / by default.

alco · 2014-06-10T14:35:29Z

Does URI support the older convention for encoding spaces using + in the query string?

For instance, where can I find an example of the query string like in this test?

http://en.wikipedia.org/wiki/Percent-encoding#The_application.2Fx-www-form-urlencoded_type

lexmag · 2014-06-10T14:35:31Z

Here are the questions that came to mind:

1. What is the default behaviour?
2. If we go with URI.encode/2, do we need more "predicate" functions?
3. Should I make the "predicate" functions accept lists of one char?

I'd go with:

1. Leave reserved + unreserved characters as is (like Ruby and NodeJS do).
2. reserved, unreserved, unescaped.
3. I see no need.

@josevalim @alco WDYT?

lexmag · 2014-06-10T14:39:48Z

@alco Ruby has encode_www_form for that conversion.

josevalim · 2014-06-10T14:46:54Z

@lexmag I agree with the answers to all the questions. One other question is whether we want to provide URI.encode and URI.encode!, and I would say yes. But we should provide that as part of another pull request.

`URI.encode/1` escapes character if it is not satisfied `URI.char_unescaped?/1`

lexmag · 2014-06-10T16:05:49Z

Updated.

lexmag · 2014-06-10T19:16:32Z

I've checked the Twitter API.
With the www-form-urlencoded header, they interpret track=world+cup and track=world%20cup identically.

edgurgel · 2014-06-10T21:01:36Z

lib/elixir/lib/uri.ex

+    c in ?0..?9 or
+    c in ?a..?z or
+    c in ?A..?Z or
+    c in '~_-.'


It's crying 😢

lexmag · 2014-06-10

😁 it's intentional

josevalim · 2014-06-11T09:28:11Z

Thank you @lexmag!

@alco and @lexmag what do you think about adding encode_www_form to handle + and spaces as per www-form-urlencoded? I believe we need to add this functionality anyway, otherwise our query_encode and query_decode functionality will be broken. It is already broken today, as this line:

https://github.com/elixir-lang/elixir/blob/master/lib/elixir/lib/uri.ex#L145

Must now be encode(query, &char_unreserved?/1).

Also, should we change decode to leave + intact and add decode_www_form?
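
For reference, the split being proposed here could behave as in the sketch below. It assumes a current Elixir release in which `URI.encode_www_form/1` and `URI.decode_www_form/1` exist and `URI.decode/1` leaves + intact; the `track` parameter is borrowed from the Twitter example above.

```elixir
# Assumption: current Elixir release, where the functions discussed
# in this thread are available.

# Form encoding: spaces become "+", and "+" decodes back to a space.
IO.inspect(URI.encode_www_form("world cup"))
# prints "world+cup"
IO.inspect(URI.decode_www_form("world+cup"))
# prints "world cup"

# Plain RFC 3986 decoding leaves "+" intact and only decodes %XX.
IO.inspect(URI.decode("world+cup"))
# prints "world+cup"
IO.inspect(URI.decode("world%20cup"))
# prints "world cup"

# The pair-based query helpers use form encoding by default.
IO.inspect(URI.encode_query(%{"track" => "world cup"}))
# prints "track=world+cup"
IO.inspect(URI.decode_query("track=world%20cup"))
# prints %{"track" => "world cup"}
```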

lexmag · 2014-06-11T10:33:29Z

I think we need functionality to handle + as space.
In Ruby, encode_www_form is the analog of our query_encode.
Haskell has importList from Data.URLEncoded.

lexmag · 2014-06-11T10:34:53Z

And we need a pair for it :)

josevalim · 2014-06-11T10:53:39Z

Our encode_query is meant to work with pairs, so I would rather leave it as is and add encode_www_form and decode_www_form to work on strings. Can you please submit a new pull request? :D It would be much appreciated.

lexmag · 2014-06-11T10:55:05Z

Sure.

alco · 2014-06-11T11:27:28Z

Can we call it encode_web_form? Much easier to pronounce.

josevalim · 2014-06-11T11:47:08Z

I would keep it encode_www_form just because the header is application/x-www-form-urlencoded.

lexmag added 2 commits June 10, 2014 16:20

Improve function naming in URI

7f4437e

Make URI.encode/1 to conform to RFC 3986

14722ba

Introduce URI.encode/2 function

ffa2abd

`URI.encode/1` escapes character if it is not satisfied `URI.char_unescaped?/1`

edgurgel reviewed Jun 10, 2014
josevalim pushed a commit that referenced this pull request Jun 11, 2014

Merge pull request #2392 from lexmag/improve-uri

8cb4c71

Make `URI.encode/1` to conform to RFC 3986. Introduce `URI.encode/2` function.

josevalim merged commit 8cb4c71 into elixir-lang:master Jun 11, 2014

lexmag deleted the improve-uri branch June 11, 2014 11:57

lexmag mentioned this pull request Jun 13, 2014

Introduce URI.encode(decode)_www_form/2 functions #2402

Merged
