Added more supporting methods for URN paths. #201

Merged
merged 6 commits into from Mar 31, 2015

Contributor

hardlyknowem commented Mar 25, 2015

First, I apologize if this should really be an issue instead of a PR, so that the solution could be properly planned in advance.

I work on a product that is required to accept URIs which may or may not be properly encoded and reformat them as properly-encoded URIs. This is a hard problem to solve on our own, and so URI.js has been a godsend. Unfortunately, the most common type of URI we handle is a URN, and URI.js does not currently have anything like recodePath for URNs. This patch implements a recodeURNPath method for URNs and calls it while normalizing a URN path.

This PR might carry a risk with it, unfortunately. There is some equivocation in the documentation about the difference between URNs and URIs, which is reflected in the overall structure of URI.js:

URLs are used to address the individual resources of your website. URNs are usually used for hooking into other applications, as mailto:, magnet: or spotify: suggest. While RFC 3986 defines the structure of an URL in depth, URNs are not. The structure (and meaning) of URNs are up to their distinct specifications.

Technically, URN refers to a very specific thing: Uniform Resource Names as defined by RFC 2141 (the syntactic requirements of which are implemented in this patch). Though there is some degree of freedom to the structure of URNs in a specific namespace, they cannot break the rules laid out in RFC 2141 (such as which characters are valid in a URN). In addition, this means that mailto: and spotify: URIs are not actually URNs at all (and in fact, the RFC defining the mailto: scheme, RFC 2368, calls it a URL).

I think URI.js is drawing its inspiration from this paragraph in RFC 3986 (which is mentioned in the quoted documentation above):

A URI can be further classified as a locator, a name, or both. The term "Uniform Resource Locator" (URL) refers to the subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism (e.g., its network "location"). The term "Uniform Resource Name" (URN) has been used historically to refer to both URIs under the "urn" scheme [RFC2141], which are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable, and to any other URI with the properties of a name.

The problem here is that even this is still a purely semantic distinction, while the distinction between URNs and URLs as implemented in URI.js is syntactic: URLs are anything that looks like protocol://user:password@hostname/path/with/slashes?query=string and URNs are anything that looks like scheme:path:with:colons?query=string.

All this is to say: introducing the syntactic requirements to normalize real URNs as defined in RFC 2141 may or may not actually apply to the things URI.js treats as URNs.

That said, in practice I think the risk isn't very significant. For one, users of URN-ish URIs already could not rely on URI.js for proper normalization of paths, and so I see no harm in URI.js applying its best effort. If a consumer needs a particular kind of syntax for their custom URI scheme, they always have had to implement that themselves, and so nothing changes here.
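
For illustration, here is a minimal sketch of the kind of per-segment recoding described above. It is not the code from this patch: the names `URN_SAFE`, `recodeUrnSegment` and `recodeUrnPath` are hypothetical, and the character list is an assumption based on a reading of RFC 2141.

```javascript
// Hypothetical sketch, not the code from this patch.
// RFC 2141 allows these characters unescaped in a URN, but encodeURIComponent
// escapes them, so they are mapped back after encoding.
const URN_SAFE = {
  '%24': '$', '%2B': '+', '%2C': ',', '%3B': ';', '%3D': '=', '%40': '@'
};

function recodeUrnSegment(segment) {
  let decoded;
  try {
    decoded = decodeURIComponent(segment);
  } catch (e) {
    // Malformed escapes (for example a lone "%"): treat the input as raw text.
    decoded = segment;
  }
  // encodeURIComponent always emits uppercase hex digits, so the pattern is case-sensitive.
  return encodeURIComponent(decoded).replace(/%(24|2B|2C|3B|3D|40)/g, (m) => URN_SAFE[m]);
}

// The colon is treated as the segment delimiter, so it is never recoded itself.
function recodeUrnPath(path) {
  return path.split(':').map(recodeUrnSegment).join(':');
}

console.log(recodeUrnPath('example:a b:c%2Bd')); // example:a%20b:c+d
```

One known gap in this sketch: encodeURIComponent leaves "~" unescaped, which RFC 2141 does not appear to list as an allowed character, so a stricter version would probably escape it as well.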

Matthew Lefavor
Added more supporting methods for URN paths.
Use case: I work on some software that needs to accept URNs that may be
improperly formatted and encode them properly. This patch introduces a new
method, `recodeURNPath`, that will break the URN's path into segments and
make sure that each part of it is encoded properly.

Risks: URI.js treats "URN" as a catch-all category for everything that doesn't
look like an HTTP URL. Many of these schemes may or may not follow the URN
syntactical rules.

Member

rodneyrehm commented Mar 26, 2015

I think this is the single best PR I've been sent on any of my projects so far. Thank you.

I agree with your assessments, thank you for the explanation. I have indeed considered the distinction between URLs and URNs to be syntactic. If mailto: et al are not URNs (like I assumed) we may want to discuss what other parts of URI.js need changing. It's very likely I did not dig deep enough into what makes URNs tick; at least that's what I think when reading that : as path delimiter is semantics I imposed.

+ // for usage in a URN. RFC2141 also calls out "-", ".", and "_" as acceptable characters, but
+ // these aren't encoded by encodeURIComponent, so we don't have to call them out here. Also
+ // note that the colon character is not featured in the encoding map; this is because URI.js
+ // gives the colons in URNs semantic meaning as the delimiters of path segements, and so it

@rodneyrehm

rodneyrehm Mar 26, 2015

Member

If : is neither the URN path delimiter, nor the "industry default path delimiter", we need to talk about what is.

src/URI.js
@@ -366,6 +402,22 @@
return segments.join('/');
};
+ URI.recodeURNPath = function(string) {

@rodneyrehm

rodneyrehm Mar 26, 2015

Member

URI.recodeUrnPath

this method looks identical to recodePath() except for the delimiter and the recode callback. this should probably be refactored to a generator like generatePrefixAccessor() does for accessors

@hardlyknowem

hardlyknowem Mar 27, 2015

Contributor

I'll do that. Interestingly, both of those look like their decodePath counterparts, so I might be able to make a generator for those as well.
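
As a rough sketch of the refactoring being discussed, a generator could take the delimiter and the per-segment callback as parameters. The helper name `makeSegmentRecoder` and the simplified `recodeSegment` callback are assumptions for illustration, not the names or logic used in URI.js.

```javascript
// Hypothetical helper: builds a path recoder from a separator and a
// per-segment recode callback, so the "/" and ":" variants share one body.
function makeSegmentRecoder(separator, recodeSegment) {
  return function (path) {
    return String(path).split(separator).map(recodeSegment).join(separator);
  };
}

// Simplified stand-in for a real per-segment recode callback.
const recodeSegment = (segment) => encodeURIComponent(decodeURIComponent(segment));

const recodePathSketch = makeSegmentRecoder('/', recodeSegment);
const recodeUrnPathSketch = makeSegmentRecoder(':', recodeSegment);

console.log(recodePathSketch('/a b/c%20d'));  // /a%20b/c%20d
console.log(recodeUrnPathSketch('isbn:a b')); // isbn:a%20b
```

The same generator would cover the decode variants by passing a decoding callback instead.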

src/URI.js
@@ -389,6 +441,10 @@
URI[_part + 'PathSegment'] = generateAccessor('pathname', _parts[_part]);
}
+ for (_part in _parts) {

@rodneyrehm

rodneyrehm Mar 26, 2015

Member

is there a reason we need two loops here?

@hardlyknowem

hardlyknowem Mar 27, 2015

Contributor

Well, that's embarrassing. Fixed in upcoming commit.

src/URI.js
- _was_relative = true;
- _path = '/' + _path;
- }
+ if (this._parts.urn) {

@rodneyrehm

rodneyrehm Mar 26, 2015

Member

can this be structured in a way that avoids the "christmas tree" effect of indentation?

@hardlyknowem

hardlyknowem Mar 27, 2015

Contributor

It certainly can be; in order to do that, we'd have to duplicate the this.build(!build) line. That's probably not so bad, but I think I can just pull the parent-resolving part of the code into its own subroutine, and it won't be so hideously overindented then.

Contributor

hardlyknowem commented Mar 26, 2015

So before I address any specific comments, I think it might be better to have the discussion about how much more would need to change—just so we don't do any work that gets obsoleted soon.

Personally, I think the distinction URI.js makes right now between URNs and URLs is fine, so long as it is explained in the docs and perhaps given a different name. There are some URI schemes out there that aren't technically URNs in the full RFC 2141 sense, but they are used to name things rather than locate them (and so they would fall under the "historical usage" talked about in RFC 3986). More importantly, many of these look a whole lot like URNs: see the DOI scheme (e.g., doi:10.1000/182).

So, in practice, I think it's perfectly valid to have URI.js distinguish between two kinds of URIs: those that look like HTTP URLs (protocol:// with '/' as a path separator) and those that look like URNs (scheme:, with ':' as a path separator, more on that below), which URI.js already has. It's not technically right, but it can't be the responsibility of URI.js to accommodate every possible URI scheme. Most URIs are going to fall into those two camps.

The change I might suggest instead is to change the documentation so that the distinction is made between URN-ish and URL-ish URIs.

As for the URN syntax and the use of the colon: According to the RFC, URNs have the following syntax: urn:<nid>:<nss>, where <nid> is the "Namespace Identifier" and <nss> is the "Namespace-Specific String." And so, the colon does have special syntactic meaning for URNs at least once. The place you could get into trouble is in the namespace-specific string.

I just went and skimmed the RFCs of the first twenty formal URN namespaces registered here and it appears that every single one of them uses the colon as something like a path separator. So I think for all practical purposes URI.js is totally valid in giving the colon character special significance as a path separator. Again, there's no way that URI.js can accommodate all possible URI syntaxes (and all possible URN namespace-specific syntaxes), so I think practicality can win the day here.

So, in conclusion, I think that after cleaning this current PR up, the only thing that would really need to change is the documentation, to make it clearer what URI.js really means by "URN". If you agree with that assessment, I'll go ahead and start cleaning up this PR.
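
To make the urn:<nid>:<nss> shape concrete, here is a small sketch that splits a URN at its first two colons and then treats further colons in the namespace-specific string as segment separators. The function name `parseUrn` and the returned field names are hypothetical; the NID pattern follows a reading of RFC 2141 (a letter or digit followed by up to 31 letters, digits or hyphens).

```javascript
// Hypothetical sketch: split "urn:<nid>:<nss>" into its parts.
function parseUrn(urn) {
  const match = /^urn:([a-z0-9][a-z0-9-]{0,31}):(.+)$/i.exec(urn);
  if (!match) {
    return null; // not in urn:<nid>:<nss> form
  }
  return {
    nid: match[1],               // Namespace Identifier
    nss: match[2],               // Namespace-Specific String
    segments: match[2].split(':') // colon treated as a path separator within the NSS
  };
}

const parsed = parseUrn('urn:example:a:b:c');
console.log(parsed.nid);             // example
console.log(parsed.nss);             // a:b:c
console.log(parsed.segments.length); // 3
console.log(parseUrn('doi:10.1000/182')); // null
```

A URN-ish URI such as doi:10.1000/182 does not match here, which is the gap between "real" URNs and what URI.js treats as URNs.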

Member

rodneyrehm commented Mar 26, 2015

… it can't be the responsibility of URI.js to accommodate every possible URI scheme

well. You should be able to work with every scheme. But that doesn't mean that stuff like .segment() should work for every scheme (unless we make the separator a scheme-configurable thing, with : being the default).

If you agree with that assessment, I'll go ahead and start cleaning up this PR.

I do, for better or worse. Do you also want to take a shot at the docs (may be a second PR if you wish)?

Contributor

hardlyknowem commented Mar 27, 2015

I'll take a shot at the docs, but it may or may not be this PR. Let me figure out how hard it would be to do that.

Matthew Lefavor added some commits Mar 27, 2015

Matthew Lefavor
Added finally clause to iso8859 and unicode methods.
My local changes created a bug that caused an exception to be thrown during the `normalize` call of
these methods, which then caused later tests to fail because the wrong decoding/encoding function
was being used.

Contributor

hardlyknowem commented Mar 27, 2015

Alright, I've responded to the comments and made the changes to the docs. Let me know if you'd like other things changed.

rodneyrehm Mar 27, 2015

the `finally` here doesn't make much sense. If you want to mask an error (which I doubt you should), try:

try {
  doSomethingExplosive();
} catch(e) {}

but really, hiding errors like this is a debugging headache down the road. why/when exactly are exceptions thrown?

hardlyknowem Mar 27, 2015

I'm not trying to mask an error. Quite the opposite—I explicitly want the code higher up the stack to know about the exception. All finally means is "if something would cause us to leave the try block (either an exception, a return statement, or if the block simply finishes), execute this block before doing anything else."

In the case of an exception, if an exception is thrown in lower-level code (in this case this.normalize()), it would propagate up the stack to here. The interpreter would then execute the finally block, and then afterwards it would continue propagating the exception up the stack. The point isn't to handle errors as much as it is to clean up the mess we made in the beginning of this function before letting whoever called the function handle the error instead.

The issue I was having was that, during testing, this.normalize() was throwing an exception because of a silly mistake. I didn't mind that it threw (the exception was telling me the test was failing, which is useful information), but without the finally block the encode and decode functions are never restored to what they are supposed to be. This meant that other tests also mysteriously failed, yet passed when I re-ran them individually, because re-running reloads the page and thus restores the URI module to its original state.
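
A minimal sketch of the pattern described above, using a stand-in `codec` object and a hypothetical `withTemporaryCodec` helper rather than the actual URI.js methods: the functions are swapped, the work runs, and the finally block restores them even while an exception is propagating.

```javascript
// Hypothetical stand-in for a library's swappable encode/decode functions.
const codec = { encode: encodeURIComponent, decode: decodeURIComponent };

function withTemporaryCodec(encode, decode, work) {
  const saved = { encode: codec.encode, decode: codec.decode };
  codec.encode = encode;
  codec.decode = decode;
  try {
    return work();
  } finally {
    // Runs on normal completion, on return, and while an exception propagates.
    codec.encode = saved.encode;
    codec.decode = saved.decode;
  }
}

let caught = null;
try {
  withTemporaryCodec(escape, unescape, () => {
    throw new Error('normalize failed');
  });
} catch (e) {
  caught = e; // the error still reaches the caller; nothing is masked
}

console.log(caught.message);                      // normalize failed
console.log(codec.encode === encodeURIComponent); // true
```

Without the finally block, `codec` would still hold `escape` and `unescape` after the failure, which matches the cross-test contamination described in the comment.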

rodneyrehm Mar 27, 2015

understood, agreed.

rodneyrehm Mar 27, 2015

can we go with proper camelCased decodeUrnPath?

rodneyrehm Mar 27, 2015

<em> for emphasis?

hardlyknowem Mar 30, 2015

Ah, the fact that I'm not a front-end developer is showing through. Fixed in upcoming commit.

rodneyrehm Mar 27, 2015

In theory the distinction…
<strong>semantics</strong>

hardlyknowem Mar 30, 2015

Fixed in upcoming commit.

rodneyrehm Mar 27, 2015

URIs not URIS

hardlyknowem Mar 30, 2015

Fixed in upcoming commit.

rodneyrehm Mar 27, 2015

<strong>syntax</strong>

rodneyrehm Mar 27, 2015

should this be transformed to a <ol>?

hardlyknowem Mar 30, 2015

I don't think it needs to be. I wrote it this way not so much because it ought to be read this way, but rather to follow the convention in the surrounding text: one sentence per line, but if the sentence is too long, break it up into separate clauses on each line. I'll take the tabbing out to make it clearer.

Member

rodneyrehm commented Mar 27, 2015

great stuff!

What resources besides RFC 2141, the list of official URNs, RFC 2368 (other RFC for popular schemes?) do you think should go into the Readme.md?

Contributor

hardlyknowem commented Mar 27, 2015

Thanks! I'll think on that last question. I'll address the new comments (as well as compile a list of resources) over the weekend.

Matthew Lefavor
Responded to PR review comments (round 2).
1. Used more meaningful tags in the documentation.
2. Method names with "URN" in them have been camel-cased to use "Urn" instead.
3. A list of resources and a changelog have been added to the README.

Contributor

hardlyknowem commented Mar 30, 2015

Responded to all above comments. Do you want the above work to be rebased/squashed into a single commit?

README.md
@@ -243,6 +246,7 @@ URI.js is published under the [MIT license](http://www.opensource.org/licenses/m
### master (will become 1.15.0)
+* URNs are now normalized based on the syntax given by [RFC 2141](https://www.ietf.org/rfc/rfc2141.txt)

@rodneyrehm

rodneyrehm Mar 31, 2015

Member

proper format and content would be:

Member

rodneyrehm commented Mar 31, 2015

nice!

Responded to all above comments. Do you want the above work to be rebased/squashed into a single commit?

I can't say I have an opinion on that. But sure, you can squash the commits if you want to.

Contributor

hardlyknowem commented Mar 31, 2015

README file updated.

I won't bother with squashing, since there are some comments on commits. (Rebasing changes commit hashes, which means the comments become hard to locate.)

rodneyrehm added a commit that referenced this pull request Mar 31, 2015

Merge pull request #201 from RusticiSoftware/master
Added more supporting methods for URN paths.

@rodneyrehm rodneyrehm merged commit 55d5a98 into medialize:master Mar 31, 2015

Member

rodneyrehm commented Mar 31, 2015

thank you!

Member

rodneyrehm commented Mar 31, 2015

released v1.15.0, thank you for your support!

Contributor

hardlyknowem commented Apr 1, 2015

Thanks for including it!

Member

rodneyrehm commented Apr 10, 2015

Hey @mlefoster, follow-up question: @Munter has created a repository identifying various official IANA and unofficial schemes. By itself I don't see how it could be helpful to URI.js in providing better URN support. The question is what data would we want to add to scheme in order to accomplish anything significant? Or is there nothing left to be improved/simplified?

Munter commented Apr 10, 2015

I can see that you have an ad-hoc scheme-to-default-port mapping in URI.js. That part might be nice to add to the schemes data object, if there are official resources I can scrape for them.

Contributor

hardlyknowem commented Apr 11, 2015

For my own part, I mainly use URI.js as a syntactical helper (in particular, normalizing and sanitizing bad inputs, particularly with respect to percent-encoding). From that perspective, the two things I can think of would be 1) a better mapping of schemes to default ports for those URIs that have ports and 2) maybe a mapping of schemes to their general syntax: whether the scheme supports a query string, what the path separator is, etc.

The former is something URI.js already supports; it's just a matter of filling it out more if appropriate. As for 2), the question is whether that's practical. Are there URI schemes that fall outside the general HTTP-like and URN-like dichotomy? And if there are, are those schemes common enough that making those changes would be worth it? Are there enough handwritten/poorly-formatted URIs of those schemes floating around that people have a need to normalize them?
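
As a sketch of what such a scheme table might look like, the snippet below maps a few schemes to a default port and a path separator. The table name, the field names and the helpers `schemeInfo` and `pathSegments` are all hypothetical; the fallback to ":" for unknown schemes follows the "scheme-configurable separator" idea raised earlier in this thread.

```javascript
// Hypothetical scheme table; names and fields are assumptions for illustration.
const SCHEME_INFO = new Map([
  ['http',  { defaultPort: 80,   pathSeparator: '/' }],
  ['https', { defaultPort: 443,  pathSeparator: '/' }],
  ['ftp',   { defaultPort: 21,   pathSeparator: '/' }],
  ['urn',   { defaultPort: null, pathSeparator: ':' }]
]);

// Unknown schemes fall back to ":" as the separator and have no default port.
const FALLBACK = { defaultPort: null, pathSeparator: ':' };

function schemeInfo(scheme) {
  return SCHEME_INFO.get(String(scheme).toLowerCase()) || FALLBACK;
}

function pathSegments(scheme, path) {
  return path.split(schemeInfo(scheme).pathSeparator);
}

console.log(schemeInfo('https').defaultPort);             // 443
console.log(pathSegments('urn', 'isbn:0451450523'));      // [ 'isbn', '0451450523' ]
console.log(pathSegments('doi', '10.1000/182').length);   // 1
```

Further per-scheme fields, such as whether a query string is supported, could be added to the same entries if they turn out to be useful.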

Member

rodneyrehm commented Apr 11, 2015

Are there enough handwritten/poorly-formatted URIs of those schemes floating around that people have a need to normalize them?

I don't think normalizing is the main feature to go for. Could a proper scheme mapping (whatever that would look like) support the semantics of a particular scheme? Think about mailto:<address>?subject=&body&… being provided as a map { address: …, subject: …, body: … } and thus being simpler to handle. That said, I have no idea if people actually need it.
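
A rough sketch of that idea in plain JavaScript, without relying on any URI.js API: the hypothetical `parseMailto` function below flattens a mailto: URI into a map of the address plus its header fields. It assumes well-formed percent-escapes (decodeURIComponent throws otherwise) and lets a later field overwrite an earlier one with the same name.

```javascript
// Hypothetical sketch: turn "mailto:<address>?subject=...&body=..." into a flat map.
function parseMailto(uri) {
  const match = /^mailto:([^?]*)(?:\?(.*))?$/i.exec(uri);
  if (!match) {
    return null; // not a mailto: URI
  }
  const result = { address: decodeURIComponent(match[1]) };
  for (const pair of (match[2] || '').split('&')) {
    if (!pair) {
      continue; // skip empty pieces such as a trailing "&"
    }
    const eq = pair.indexOf('=');
    const key = decodeURIComponent(eq === -1 ? pair : pair.slice(0, eq));
    const value = eq === -1 ? '' : decodeURIComponent(pair.slice(eq + 1));
    result[key] = value;
  }
  return result;
}

const mail = parseMailto('mailto:jane@example.com?subject=Hello%20there&body=Hi');
console.log(mail.address); // jane@example.com
console.log(mail.subject); // Hello there
console.log(mail.body);    // Hi
```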
