// => URIError: URI malformed
// => ["%25"]
This makes me want to cry.
The whole point of having a library to handle URIs is that the user of the library shouldn’t have to think about how to encode or decode URIs.
For example, if I know that there is a resource called “Business Plan (90%)” in a directory called “My Extremely Dull Documents”, I ought be able to create a relative URI referencing that resource with syntax somewhat like this:
URI("").segment(["My Extremely Dull Documents"], ["Business Plan (90%)"]);
which would then produce a URI whose text representation is: My%20Extremely%20Dull%20Documents/Business%20Plan%20%2890%25%29.
Similarly when parsing the resulting URI I ought to be able to do something somewhat like:
And get back the original strings: ["My Extremely Dull Documents"], ["Business Plan (90%)"].
["My Extremely Dull Documents"], ["Business Plan (90%)"]
I know I can use URI.encode and URI.decode on the path segments, but that requires the user to remember to do so, and to know and remember which of the several encoding schemes are used for which parts of the URI, so that’s pretty error-prone :(.
What do you suggest we do about this then? .path() has the same problem (on another level). Simply decoding everything could be very confusing as an encoded / might be in there somewhere… so /a%2Fb/c would be decoded to /a/b/c. I see how that limitation does not apply to .segment() - but I would really don't want methods to return different encodings (unless absolutely necessary / obvious).
Yeah, I deliberately stopped short of proposing a specific solution because I’m aware of the problem you point out, and I didn’t want to distract from the broader point.
In fact .segment() isn’t quite immune to this problem either because URI path segments may themselves have substructure. RFC 3986 kinda-sorta obsoletes the path segment parameters that were defined by RFC2396, so it’s somewhat safe to pretend they don’t exist. That said, RFC 3986 does briefly describe the syntax for path segment parameters and allows for other types of arbitrary substructure using the sub-delims characters, and that usage is relevant when deciding whether to encode a given character :-/.
One possible solution is to acknowledge that just as a URI is a data structure, so many URI components are themselves data structures and should be treated as such. So, for instance, .path() could return a URIPath object with its own accessors, and with a .toString() method that returns the serialization of the path. The .path(p) setter would accept a URIPath object or an encoded string. Digging far enough down the hierarchy would eventually yield an accessor that just returns a decoded string. That wouldn’t be a very backwards-compatible solution, although it might just work in cases where users expect existing accessors to return a string.
Another solution is to do something similar to what you’ve done for query parameters. The .query() accessor returns the query string as-is but there are convenience methods to get and set query parameters without having to think about encoding. It’s possible to imagine something similar for path handling, although I’m not sure how it would look just yet.
I think we can leave .path() as it is. .segment() on the other hand is accessing a subset of the path that can be decoded safely (analogous to .query() and its helpers) - like you correctly pointed out. So, I'm ready to retract my previous statement that they should return similar encoded data.
That just leaves the problem of backward-compatibility. I do not want to add another (boolean) parameter to the signature. So we'd probably be left with exposing a new method. Ideas for a name?
I can’t think of a name that adequately describes the difference without being excessively long, but if you marked .segment() as deprecated and gave the new method a different-but-still-appropriate name that should suffice.
I don't want to deprecate .segment() as some people might actually need the encoded pieces. In fact, I simply forgot about duplicateQueryParameters. This is something we did to URI.buildQuery() in order to allow the developer to define how duplicate parameters must be handled.
So, I'd propose we do the same thing to .segment() and call the option decodedSegments. That way you'd simply have to add URI.decodedSegments = true; to your script and .segment() would return decoded values and encode any values passed in.
URI.decodedSegments = true;
That sounds like a sensible idea as long as it’s a property of the URI instance and not set globally for the whole library...
If we follow the example given, it is both. the global setting by default, but definable on instance-level. So this will be it then :)
I will just have to hope that nobody uses the global version and breaks unrelated code from a distance :).
I’m happy to have a go at implementing this on Monday unless you’d rather do it? Let me know :).
sure, go ahead! I won't be doing any coding / merging until May, though.
@djcsdy are you still on this?
Obviously I didn’t get to look at this when I said I would, sorry!
I have some free time today and next week, but perhaps I’d better finish up #78 and #82 first.
You’re welcome to take this one off my hands if you want, I didn’t make a start on it.
Ok, you go ahead and take care of #78 and #82 whenever you can. I'll try to look into #79 tomorrow :)
adding .segmentCoded() to provide en/decoding interface to .segment()…
… - closes #79
I've fixed this in master - it will be included in the next release. thank you for your help!
the new method is called .segmentCoded() and behaves exactly the same as .segment() with the subtle difference of transparently en/decoding values