New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URI.js expects users to do their own URI-encoding, thus defeating its own purpose #79

Closed
djcsdy opened this Issue Apr 17, 2013 · 14 comments

Comments

Projects
None yet
2 participants
@djcsdy
Contributor

djcsdy commented Apr 17, 2013

URI("").segment(["%"]);
// => URIError: URI malformed

URI("%25").segment();
// => ["%25"]

This makes me want to cry.

The whole point of having a library to handle URIs is that the user of the library shouldn’t have to think about how to encode or decode URIs.

For example, if I know that there is a resource called “Business Plan (90%)” in a directory called “My Extremely Dull Documents”, I ought be able to create a relative URI referencing that resource with syntax somewhat like this:

URI("").segment(["My Extremely Dull Documents"], ["Business Plan (90%)"]);

which would then produce a URI whose text representation is: My%20Extremely%20Dull%20Documents/Business%20Plan%20%2890%25%29.

Similarly when parsing the resulting URI I ought to be able to do something somewhat like:

URI("My%20Extremely%20Dull%20Documents/Business%20Plan%20%2890%25%29").
    segment();

And get back the original strings: ["My Extremely Dull Documents"], ["Business Plan (90%)"].

I know I can use URI.encode and URI.decode on the path segments, but that requires the user to remember to do so, and to know and remember which of the several encoding schemes are used for which parts of the URI, so that’s pretty error-prone :(.

@rodneyrehm

This comment has been minimized.

Show comment
Hide comment
@rodneyrehm

rodneyrehm Apr 17, 2013

Member

What do you suggest we do about this then? .path() has the same problem (on another level). Simply decoding everything could be very confusing as an encoded / might be in there somewhere… so /a%2Fb/c would be decoded to /a/b/c. I see how that limitation does not apply to .segment() - but I would really don't want methods to return different encodings (unless absolutely necessary / obvious).

Member

rodneyrehm commented Apr 17, 2013

What do you suggest we do about this then? .path() has the same problem (on another level). Simply decoding everything could be very confusing as an encoded / might be in there somewhere… so /a%2Fb/c would be decoded to /a/b/c. I see how that limitation does not apply to .segment() - but I would really don't want methods to return different encodings (unless absolutely necessary / obvious).

@djcsdy

This comment has been minimized.

Show comment
Hide comment
@djcsdy

djcsdy Apr 17, 2013

Contributor

Yeah, I deliberately stopped short of proposing a specific solution because I’m aware of the problem you point out, and I didn’t want to distract from the broader point.

In fact .segment() isn’t quite immune to this problem either because URI path segments may themselves have substructure. RFC 3986 kinda-sorta obsoletes the path segment parameters that were defined by RFC2396, so it’s somewhat safe to pretend they don’t exist. That said, RFC 3986 does briefly describe the syntax for path segment parameters and allows for other types of arbitrary substructure using the sub-delims characters, and that usage is relevant when deciding whether to encode a given character :-/.

One possible solution is to acknowledge that just as a URI is a data structure, so many URI components are themselves data structures and should be treated as such. So, for instance, .path() could return a URIPath object with its own accessors, and with a .toString() method that returns the serialization of the path. The .path(p) setter would accept a URIPath object or an encoded string. Digging far enough down the hierarchy would eventually yield an accessor that just returns a decoded string. That wouldn’t be a very backwards-compatible solution, although it might just work in cases where users expect existing accessors to return a string.

Another solution is to do something similar to what you’ve done for query parameters. The .query() accessor returns the query string as-is but there are convenience methods to get and set query parameters without having to think about encoding. It’s possible to imagine something similar for path handling, although I’m not sure how it would look just yet.

Contributor

djcsdy commented Apr 17, 2013

Yeah, I deliberately stopped short of proposing a specific solution because I’m aware of the problem you point out, and I didn’t want to distract from the broader point.

In fact .segment() isn’t quite immune to this problem either because URI path segments may themselves have substructure. RFC 3986 kinda-sorta obsoletes the path segment parameters that were defined by RFC2396, so it’s somewhat safe to pretend they don’t exist. That said, RFC 3986 does briefly describe the syntax for path segment parameters and allows for other types of arbitrary substructure using the sub-delims characters, and that usage is relevant when deciding whether to encode a given character :-/.

One possible solution is to acknowledge that just as a URI is a data structure, so many URI components are themselves data structures and should be treated as such. So, for instance, .path() could return a URIPath object with its own accessors, and with a .toString() method that returns the serialization of the path. The .path(p) setter would accept a URIPath object or an encoded string. Digging far enough down the hierarchy would eventually yield an accessor that just returns a decoded string. That wouldn’t be a very backwards-compatible solution, although it might just work in cases where users expect existing accessors to return a string.

Another solution is to do something similar to what you’ve done for query parameters. The .query() accessor returns the query string as-is but there are convenience methods to get and set query parameters without having to think about encoding. It’s possible to imagine something similar for path handling, although I’m not sure how it would look just yet.

@rodneyrehm

This comment has been minimized.

Show comment
Hide comment
@rodneyrehm

rodneyrehm Apr 17, 2013

Member

I think we can leave .path() as it is. .segment() on the other hand is accessing a subset of the path that can be decoded safely (analogous to .query() and its helpers) - like you correctly pointed out. So, I'm ready to retract my previous statement that they should return similar encoded data.

That just leaves the problem of backward-compatibility. I do not want to add another (boolean) parameter to the signature. So we'd probably be left with exposing a new method. Ideas for a name?

Member

rodneyrehm commented Apr 17, 2013

I think we can leave .path() as it is. .segment() on the other hand is accessing a subset of the path that can be decoded safely (analogous to .query() and its helpers) - like you correctly pointed out. So, I'm ready to retract my previous statement that they should return similar encoded data.

That just leaves the problem of backward-compatibility. I do not want to add another (boolean) parameter to the signature. So we'd probably be left with exposing a new method. Ideas for a name?

@djcsdy

This comment has been minimized.

Show comment
Hide comment
@djcsdy

djcsdy Apr 17, 2013

Contributor

.pathsegment()?

I can’t think of a name that adequately describes the difference without being excessively long, but if you marked .segment() as deprecated and gave the new method a different-but-still-appropriate name that should suffice.

Contributor

djcsdy commented Apr 17, 2013

.pathsegment()?

I can’t think of a name that adequately describes the difference without being excessively long, but if you marked .segment() as deprecated and gave the new method a different-but-still-appropriate name that should suffice.

@rodneyrehm

This comment has been minimized.

Show comment
Hide comment
@rodneyrehm

rodneyrehm Apr 17, 2013

Member

I don't want to deprecate .segment() as some people might actually need the encoded pieces. In fact, I simply forgot about duplicateQueryParameters. This is something we did to URI.buildQuery() in order to allow the developer to define how duplicate parameters must be handled.

So, I'd propose we do the same thing to .segment() and call the option decodedSegments. That way you'd simply have to add URI.decodedSegments = true; to your script and .segment() would return decoded values and encode any values passed in.

Member

rodneyrehm commented Apr 17, 2013

I don't want to deprecate .segment() as some people might actually need the encoded pieces. In fact, I simply forgot about duplicateQueryParameters. This is something we did to URI.buildQuery() in order to allow the developer to define how duplicate parameters must be handled.

So, I'd propose we do the same thing to .segment() and call the option decodedSegments. That way you'd simply have to add URI.decodedSegments = true; to your script and .segment() would return decoded values and encode any values passed in.

@djcsdy

This comment has been minimized.

Show comment
Hide comment
@djcsdy

djcsdy Apr 17, 2013

Contributor

That sounds like a sensible idea as long as it’s a property of the URI instance and not set globally for the whole library...

Contributor

djcsdy commented Apr 17, 2013

That sounds like a sensible idea as long as it’s a property of the URI instance and not set globally for the whole library...

@rodneyrehm

This comment has been minimized.

Show comment
Hide comment
@rodneyrehm

rodneyrehm Apr 17, 2013

Member

If we follow the example given, it is both. the global setting by default, but definable on instance-level. So this will be it then :)

Member

rodneyrehm commented Apr 17, 2013

If we follow the example given, it is both. the global setting by default, but definable on instance-level. So this will be it then :)

@djcsdy

This comment has been minimized.

Show comment
Hide comment
@djcsdy

djcsdy Apr 17, 2013

Contributor

I will just have to hope that nobody uses the global version and breaks unrelated code from a distance :).

Contributor

djcsdy commented Apr 17, 2013

I will just have to hope that nobody uses the global version and breaks unrelated code from a distance :).

@djcsdy

This comment has been minimized.

Show comment
Hide comment
@djcsdy

djcsdy Apr 18, 2013

Contributor

I’m happy to have a go at implementing this on Monday unless you’d rather do it? Let me know :).

Contributor

djcsdy commented Apr 18, 2013

I’m happy to have a go at implementing this on Monday unless you’d rather do it? Let me know :).

@rodneyrehm

This comment has been minimized.

Show comment
Hide comment
@rodneyrehm

rodneyrehm Apr 18, 2013

Member

sure, go ahead! I won't be doing any coding / merging until May, though.

Member

rodneyrehm commented Apr 18, 2013

sure, go ahead! I won't be doing any coding / merging until May, though.

@rodneyrehm

This comment has been minimized.

Show comment
Hide comment
@rodneyrehm

rodneyrehm Aug 3, 2013

Member

@djcsdy are you still on this?

Member

rodneyrehm commented Aug 3, 2013

@djcsdy are you still on this?

@djcsdy

This comment has been minimized.

Show comment
Hide comment
@djcsdy

djcsdy Aug 3, 2013

Contributor

Obviously I didn’t get to look at this when I said I would, sorry!

I have some free time today and next week, but perhaps I’d better finish up #78 and #82 first.

You’re welcome to take this one off my hands if you want, I didn’t make a start on it.

Contributor

djcsdy commented Aug 3, 2013

Obviously I didn’t get to look at this when I said I would, sorry!

I have some free time today and next week, but perhaps I’d better finish up #78 and #82 first.

You’re welcome to take this one off my hands if you want, I didn’t make a start on it.

@rodneyrehm

This comment has been minimized.

Show comment
Hide comment
@rodneyrehm

rodneyrehm Aug 3, 2013

Member

Ok, you go ahead and take care of #78 and #82 whenever you can. I'll try to look into #79 tomorrow :)

Member

rodneyrehm commented Aug 3, 2013

Ok, you go ahead and take care of #78 and #82 whenever you can. I'll try to look into #79 tomorrow :)

@rodneyrehm

This comment has been minimized.

Show comment
Hide comment
@rodneyrehm

rodneyrehm Aug 4, 2013

Member

I've fixed this in master - it will be included in the next release. thank you for your help!

the new method is called .segmentCoded() and behaves exactly the same as .segment() with the subtle difference of transparently en/decoding values

Member

rodneyrehm commented Aug 4, 2013

I've fixed this in master - it will be included in the next release. thank you for your help!

the new method is called .segmentCoded() and behaves exactly the same as .segment() with the subtle difference of transparently en/decoding values

@rodneyrehm rodneyrehm closed this Aug 4, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment