Reject routes containing an unescaped space character #2511

bajtos · 2015-01-23T06:50:52Z

At the moment, when the developer provides a route path containing raw characters that must escaped by HTTP clients, for example a space character, the route is registered by express even though it will never match any request (URL):

var app = express();
app.get('/foo bar', function(req, res) { /* ... */ });

// clients send `GET /foo%20bar`, server responds with 404

Such situation is difficult to troubleshoot, especially if the invalid character was included by mistake.

I am proposing to modify Express and/or path-to-regexp to print a warning when a string path argument contains characters that will be always sent in the encoded form by HTTP clients. RFC3986 allows some characters to be sent either encoded or unencoded. Such characters should be accepted as valid (no error/warning).

In other words, only the following characters may be present in the unencoded form:

Unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
Reserved characters: ":" / "/" / "?" / "#" / "[" / "]" / "@" / "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

@dougwilson thoughts? StrongLoop can contribute the change, I just want to check with you that such change will be accepted.

I was initially thinking that express should throw an error instead of printing a warning, but since such change may be considered as breaking backwards compatibility, it's probably better to stay with the warning.

/cc @PradnyaBaviskar

The text was updated successfully, but these errors were encountered:

dougwilson · 2015-01-23T07:06:00Z

That's fine, but RFC 3986 does not dictate what character may appear in the req.url string, which is what the path matches against, RFC 7230 does:

request-line   = method SP request-target SP HTTP-version CRLF
request-target = origin-form
               / absolute-form
               / authority-form
               / asterisk-form
origin-form    = absolute-path [ "?" query ]
absolute-form  = absolute-URI
authority-form = authority
asterisk-form  = "*"
absolute-path = 1*( "/" segment )
segment       = *pchar
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded   = "%" HEXDIG HEXDIG
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="
query         = *( pchar / "/" / "?" )
absolute-URI  = scheme ":" hier-part [ "?" query ]
scheme        = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
hier-part   = "//" authority path-abempty
            / path-absolute
            / path-rootless
            / path-empty
authority   = [ userinfo "@" ] host [ ":" port ]
userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )
host        = IP-literal / IPv4address / reg-name
IP-literal    = "[" ( IPv6address / IPvFuture  ) "]"
IPvFuture     = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
IPv6address   =                            6( h16 ":" ) ls32
              /                       "::" 5( h16 ":" ) ls32
              / [               h16 ] "::" 4( h16 ":" ) ls32
              / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32
              / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32
              / [ *3( h16 ":" ) h16 ] "::"    h16 ":"   ls32
              / [ *4( h16 ":" ) h16 ] "::"              ls32
              / [ *5( h16 ":" ) h16 ] "::"              h16
              / [ *6( h16 ":" ) h16 ] "::"
port          = *DIGIT
path-abempty  = *( "/" segment )
path-absolute = "/" [ segment-nz *( "/" segment ) ]
segment-nz    = 1*pchar
path-rootless = segment-nz *( "/" segment )
path-empty    = 0<pchar>

I'm OK with throwing on anything that cannot be request-target. The change will have to be in path-to-regexp, in the 0.1.x series and we can pull it into Express 4.x. A change to Express 3.x would be a direct change to this repository, though.

dougwilson · 2015-01-23T07:08:38Z

As far as throw vs warning, it would be probably be better to do warning in the existing majors, but even then, there is no proper warning framework for Node.js, so I would say we should deprecate it rather than warn on it (because then that opens us to to throw in 5.x).

bajtos · 2015-01-23T07:18:12Z

I'm OK with throwing on anything that cannot be request-target. The change will have to be in path-to-regexp

Cool!

Implementing a fully-compliant request-target parser seems to me like a too much work with low return. Do you have any objections if we simplify the implementation to only check for characters that are not allowed in any part of request-target?

That way we won't reject some invalid targets (e.g. a scheme containing an underscore character), but we will still accept all valid targets and the implementation will be vastly simpler.

The change will have to be in path-to-regexp, in the 0.1.x series and we can pull it into Express 4.x. A change to Express 3.x would be a direct change to this repository, though.

As far as I am concerned, it is enough to fix this problem in path-to-regexp and Express 4.x.

We should deprecate it rather than warn on it (because then that opens us to to throw in 5.x).

Makes sense, let's do that.

dougwilson · 2015-01-23T07:19:32Z

Do you have any objections if we simplify the implementation to only check for characters that are not allowed in any part of request-target?

Right, that's what I was getting at. I just gave the spec so you would know when making the blacklist exactly what is on the whitelist :)

dougwilson · 2015-01-23T07:24:19Z

And if it's, for some reason, unfeasible to get a change that works will into path-to-regexp 0.1.x itself, we could probably compromise on just running it in the Layer constructor.

bajtos · 2015-01-23T07:46:54Z

We have discussed the matter with @dougwilson:

Path parameter names don't follow the request-target spec, e.g. app.get(“/:šošovica”, fn) is a valid path. If it turns out that the implementation supporting this exception is too difficult, then we can simplify the solution and reject only whitespace characters, since they are not allowed anywhere in the path (including parameters).

dougwilson · 2015-02-18T06:32:14Z

3.20.0, the last 3.x that will ever have an enhancement like this, will be released 2/18. Without an implementation for 3.x before then, this certainly won't make it into the 3.x series.

dougwilson · 2015-02-18T19:54:40Z

I'm bumping this from the 3.x series.

bajtos · 2015-02-19T09:23:49Z

I'm bumping this from the 3.x series.

@dougwilson that's fine 👍

blakeembrey · 2015-02-26T15:08:09Z

Interestingly, this brings up a slightly different conversation that might be interesting to have. Should paths be automatically url encoded? For example, supporting unicode characters and accents in the url? Right now people would need to write out 我 as %E6%88%91 in the path string. We could easily allow unicode expansion and I think it'd be useful for international audiences. In that respect, would automatically expand to%20. Discussion can continue with pillarjs/path-to-regexp#42.

dougwilson added the discuss label Jan 23, 2015

dougwilson added 4.x 3.x enhancement labels Jan 23, 2015

dougwilson self-assigned this Jan 23, 2015

dougwilson added 5.x and removed 3.x labels Feb 18, 2015

PradnyaBaviskar mentioned this issue Feb 26, 2015

Deprecate usage of white-space character in path pillarjs/path-to-regexp#41

Closed

dougwilson mentioned this issue Feb 28, 2015

space in path regex not working #2572

Closed

dougwilson added the module:router label Feb 16, 2016

255kb mentioned this issue Nov 7, 2018

Escape Endpoint mockoon/mockoon#73

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reject routes containing an unescaped space character #2511

Reject routes containing an unescaped space character #2511

bajtos commented Jan 23, 2015

dougwilson commented Jan 23, 2015

dougwilson commented Jan 23, 2015

bajtos commented Jan 23, 2015

dougwilson commented Jan 23, 2015

dougwilson commented Jan 23, 2015

bajtos commented Jan 23, 2015

dougwilson commented Feb 18, 2015

dougwilson commented Feb 18, 2015

bajtos commented Feb 19, 2015

blakeembrey commented Feb 26, 2015

Reject routes containing an unescaped space character #2511

Reject routes containing an unescaped space character #2511

Comments

bajtos commented Jan 23, 2015

dougwilson commented Jan 23, 2015

dougwilson commented Jan 23, 2015

bajtos commented Jan 23, 2015

dougwilson commented Jan 23, 2015

dougwilson commented Jan 23, 2015

bajtos commented Jan 23, 2015

dougwilson commented Feb 18, 2015

dougwilson commented Feb 18, 2015

bajtos commented Feb 19, 2015

blakeembrey commented Feb 26, 2015