Skip to content
This repository

URL.parse fails on valid query/path with single quote character #954

Closed
hueniverse opened this Issue · 1 comment

2 participants

Eran Hammer Isaac Z. Schlueter
Eran Hammer

Parsing of URIs with valid single-quote character in path or query fails. For example:

http://x/path?message=that's&x=4#frag

RFC 3986 allows the following characters:

  pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
  unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
  sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
  query       = *( pchar / "/" / "?" )

However the delims variable incorrectly includes single-quote which is allowed:

delims = ['<', '>', '"', '\'', '`', /\s/],

The delims variable is used here:

  // chop off any delim chars.
  if (!unsafeProtocol[lowerProto]) {
    var chop = rest.length;
    for (var i = 0, l = delims.length; i < l; i++) {
      var c = rest.indexOf(delims[i]);
      if (c !== -1) {
        chop = Math.min(c, chop);
      }
    }
    rest = rest.substr(0, chop);
    out.href += rest;
  }

Which is messing with the path, query, and fragment, even though that code was added to make the host parsing safe per issue 711.

I'm not sure what this code is actually for, but it is chopping off valid URI components if they include a legal single-quote.

Isaac Z. Schlueter
Collaborator

Got a patch that fixes this, and found a few other little nits while I was in there. Update forthcoming.

Isaac Z. Schlueter isaacs closed this issue from a commit
Isaac Z. Schlueter isaacs Close #954 URL parsing/formatting corrections
1. Allow single-quotes in urls, but escape them.
2. Add comments about which RFCs we're following for guidance.
3. Handle any invalid character in the hostname portion.
4. lcase protocol and hostname portions, since they are
case-insensitive.
90802d6
Isaac Z. Schlueter isaacs closed this in 90802d6
Isaac Z. Schlueter isaacs referenced this issue from a commit
Isaac Z. Schlueter isaacs Fix a url regression
The change for #954 introduced a regression that would cause
the url parser to fail on special chars found in the auth
segment.  Fix that, and also don't create invalid urls when
format() is called on an object containing an auth member
containing '@' characters or delimiters.
307f39c
Isaac Z. Schlueter isaacs referenced this issue from a commit in isaacs/node
Isaac Z. Schlueter isaacs Fix a url regression
The change for #954 introduced a regression that would cause
the url parser to fail on special chars found in the auth
segment.  Fix that, and also don't create invalid urls when
format() is called on an object containing an auth member
containing '@' characters or delimiters.
b1fa63d
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.