Skip to content

Conversation

lexmag
Copy link
Member

@lexmag lexmag commented Jun 10, 2014

  • According to RFC 3986 the space character should be escaped as %20, not + (it is behaviour of very early version of the URI percent-encoding rules).
  • URI.encode/2

@josevalim
Copy link
Member

Thanks @lexmag! Just to be sure we are moving in the right direction, how other languages handles this? Is space always escaped as %20? How can you configure the encoding rules in those languages? Maybe there are two encode versions?

@lexmag
Copy link
Member Author

lexmag commented Jun 10, 2014

I've checked Ruby, NodeJS, Haskell, C# - all do %20 encoding.

@josevalim
Copy link
Member

Thanks. I see that reserved and unreserved do not take all possible characters (they are mutually exclusive). What happens when a characters is not in any of those groups?

@lexmag
Copy link
Member Author

lexmag commented Jun 10, 2014

Haskell applies transformation depending of "predicate" function (like 448947abbc90d70c5ab76251133e8c6ec72ccd6c).
[source: http://hackage.haskell.org/package/network-2.2.1.7/docs/src/Network-URI.html]
It has a bunch of predefined predicate functions (Do we need them all?).

@lexmag
Copy link
Member Author

lexmag commented Jun 10, 2014

Most common character transformations are:

  • leave it as is for reserved + unreserved (NodeJS has encodeURI)
  • leave it as is for only unreserved (NodeJS has encodeURIComponent)

@lexmag
Copy link
Member Author

lexmag commented Jun 10, 2014

What happens when a characters is not in any of those groups?

@josevalim character will be percent-encoded.

@josevalim
Copy link
Member

Yeah, I would go something similar. I don't think our URI.encode should encode : or / by default.

@alco
Copy link
Member

alco commented Jun 10, 2014

Does URI support the older convention for encoding spaces using + in the query string?

For instance, where can I find an example of the query string like in this test?

http://en.wikipedia.org/wiki/Percent-encoding#The_application.2Fx-www-form-urlencoded_type

@lexmag
Copy link
Member Author

lexmag commented Jun 10, 2014

Here are question came to mind:

  • what is the default behaviour?
  • if we go with URI.encode/2 do we need more "predicate" functions?
  • should I make "predicate" functions to accept lists of one char?

I'd go with:

  • leave it as is reserved + unreserved (like Ruby, NodeJS do)
  • reserved, unreserved, unescaped
  • I see no need

@josevalim @alco WDYT?

@lexmag
Copy link
Member Author

lexmag commented Jun 10, 2014

@alco Ruby has encode_www_form for that conversion.

@josevalim
Copy link
Member

@lexmag I agree with the answer to all questions. One other question is if we want to provide URI.encode and URI.encode! and I would say that yes. But we should provide that as part of another pull request.

`URI.encode/1` escapes character if it
is not satisfied `URI.char_unescaped?/1`
@lexmag
Copy link
Member Author

lexmag commented Jun 10, 2014

Updated.

@lexmag
Copy link
Member Author

lexmag commented Jun 10, 2014

I've checked Twitter API.
They interpret track=world+cup and track=world%20cup with www-form-urlencoded header equally.

c in ?0..?9 or
c in ?a..?z or
c in ?A..?Z or
c in '~_-.'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's crying 😢

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😁 it's intentional

josevalim pushed a commit that referenced this pull request Jun 11, 2014
Make `URI.encode/1` to conform to RFC 3986. Introduce `URI.encode/2` function.
@josevalim josevalim merged commit 8cb4c71 into elixir-lang:master Jun 11, 2014
@josevalim
Copy link
Member

Thank you @lexmag!

@alco and @lexmag what do you think about adding encode_www_form to handle + and spaces as per www-form-urlencoded? I believe we need to add this functionary anyway, otherwise our query_encode and query_decode functionality will be broken. It is already broken today as this line:

https://github.com/elixir-lang/elixir/blob/master/lib/elixir/lib/uri.ex#L145

Must now be encode(query, &char_unreserved?/1).

Also, should we change decode to leave + intact and add decode_www_form?

@lexmag
Copy link
Member Author

lexmag commented Jun 11, 2014

I think we need functionary to handle + as space.
In Ruby encode_www_form is analog of our query_encode.
Haskell has importList from Data.URLEncoded.

@lexmag
Copy link
Member Author

lexmag commented Jun 11, 2014

And we need a pair for it :)

@josevalim
Copy link
Member

Our encode_query is meant to work with pairs, so I would rather leave it as is and added encode_www_form and decode_www_form to work on strings. Can you please submit a new pull request? :D It would be much appreciated.

@lexmag
Copy link
Member Author

lexmag commented Jun 11, 2014

Sure.

@alco
Copy link
Member

alco commented Jun 11, 2014

Can we call it encode_web_form? Much easier to pronounce.

@josevalim
Copy link
Member

I would keep it encode_www_form just because the header is application/x-www-form-urlencoded.

@lexmag lexmag deleted the improve-uri branch June 11, 2014 11:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants