Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reject routes containing an unescaped space character #2511

Open
bajtos opened this issue Jan 23, 2015 · 10 comments
Open

Reject routes containing an unescaped space character #2511

bajtos opened this issue Jan 23, 2015 · 10 comments

Comments

@bajtos
Copy link

bajtos commented Jan 23, 2015

At the moment, when the developer provides a route path containing raw characters that must escaped by HTTP clients, for example a space character, the route is registered by express even though it will never match any request (URL):

var app = express();
app.get('/foo bar', function(req, res) { /* ... */ });

// clients send `GET /foo%20bar`, server responds with 404

Such situation is difficult to troubleshoot, especially if the invalid character was included by mistake.

I am proposing to modify Express and/or path-to-regexp to print a warning when a string path argument contains characters that will be always sent in the encoded form by HTTP clients. RFC3986 allows some characters to be sent either encoded or unencoded. Such characters should be accepted as valid (no error/warning).

In other words, only the following characters may be present in the unencoded form:

  • Unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
  • Reserved characters: ":" / "/" / "?" / "#" / "[" / "]" / "@" / "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

@dougwilson thoughts? StrongLoop can contribute the change, I just want to check with you that such change will be accepted.

I was initially thinking that express should throw an error instead of printing a warning, but since such change may be considered as breaking backwards compatibility, it's probably better to stay with the warning.

/cc @PradnyaBaviskar

@dougwilson
Copy link
Contributor

That's fine, but RFC 3986 does not dictate what character may appear in the req.url string, which is what the path matches against, RFC 7230 does:

request-line   = method SP request-target SP HTTP-version CRLF
request-target = origin-form
               / absolute-form
               / authority-form
               / asterisk-form
origin-form    = absolute-path [ "?" query ]
absolute-form  = absolute-URI
authority-form = authority
asterisk-form  = "*"
absolute-path = 1*( "/" segment )
segment       = *pchar
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded   = "%" HEXDIG HEXDIG
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="
query         = *( pchar / "/" / "?" )
absolute-URI  = scheme ":" hier-part [ "?" query ]
scheme        = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
hier-part   = "//" authority path-abempty
            / path-absolute
            / path-rootless
            / path-empty
authority   = [ userinfo "@" ] host [ ":" port ]
userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )
host        = IP-literal / IPv4address / reg-name
IP-literal    = "[" ( IPv6address / IPvFuture  ) "]"
IPvFuture     = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
IPv6address   =                            6( h16 ":" ) ls32
              /                       "::" 5( h16 ":" ) ls32
              / [               h16 ] "::" 4( h16 ":" ) ls32
              / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32
              / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32
              / [ *3( h16 ":" ) h16 ] "::"    h16 ":"   ls32
              / [ *4( h16 ":" ) h16 ] "::"              ls32
              / [ *5( h16 ":" ) h16 ] "::"              h16
              / [ *6( h16 ":" ) h16 ] "::"
port          = *DIGIT
path-abempty  = *( "/" segment )
path-absolute = "/" [ segment-nz *( "/" segment ) ]
segment-nz    = 1*pchar
path-rootless = segment-nz *( "/" segment )
path-empty    = 0<pchar>

I'm OK with throwing on anything that cannot be request-target. The change will have to be in path-to-regexp, in the 0.1.x series and we can pull it into Express 4.x. A change to Express 3.x would be a direct change to this repository, though.

@dougwilson
Copy link
Contributor

As far as throw vs warning, it would be probably be better to do warning in the existing majors, but even then, there is no proper warning framework for Node.js, so I would say we should deprecate it rather than warn on it (because then that opens us to to throw in 5.x).

@bajtos
Copy link
Author

bajtos commented Jan 23, 2015

I'm OK with throwing on anything that cannot be request-target. The change will have to be in path-to-regexp

Cool!

Implementing a fully-compliant request-target parser seems to me like a too much work with low return. Do you have any objections if we simplify the implementation to only check for characters that are not allowed in any part of request-target?

That way we won't reject some invalid targets (e.g. a scheme containing an underscore character), but we will still accept all valid targets and the implementation will be vastly simpler.

The change will have to be in path-to-regexp, in the 0.1.x series and we can pull it into Express 4.x. A change to Express 3.x would be a direct change to this repository, though.

As far as I am concerned, it is enough to fix this problem in path-to-regexp and Express 4.x.

We should deprecate it rather than warn on it (because then that opens us to to throw in 5.x).

Makes sense, let's do that.

@dougwilson
Copy link
Contributor

Do you have any objections if we simplify the implementation to only check for characters that are not allowed in any part of request-target?

Right, that's what I was getting at. I just gave the spec so you would know when making the blacklist exactly what is on the whitelist :)

@dougwilson
Copy link
Contributor

And if it's, for some reason, unfeasible to get a change that works will into path-to-regexp 0.1.x itself, we could probably compromise on just running it in the Layer constructor.

@bajtos
Copy link
Author

bajtos commented Jan 23, 2015

We have discussed the matter with @dougwilson:

Path parameter names don't follow the request-target spec, e.g. app.get(“/:šošovica”, fn) is a valid path. If it turns out that the implementation supporting this exception is too difficult, then we can simplify the solution and reject only whitespace characters, since they are not allowed anywhere in the path (including parameters).

@dougwilson
Copy link
Contributor

3.20.0, the last 3.x that will ever have an enhancement like this, will be released 2/18. Without an implementation for 3.x before then, this certainly won't make it into the 3.x series.

@dougwilson
Copy link
Contributor

I'm bumping this from the 3.x series.

@dougwilson dougwilson added 5.x and removed 3.x labels Feb 18, 2015
@bajtos
Copy link
Author

bajtos commented Feb 19, 2015

I'm bumping this from the 3.x series.

@dougwilson that's fine 👍

@blakeembrey
Copy link
Member

Interestingly, this brings up a slightly different conversation that might be interesting to have. Should paths be automatically url encoded? For example, supporting unicode characters and accents in the url? Right now people would need to write out 我 as %E6%88%91 in the path string. We could easily allow unicode expansion and I think it'd be useful for international audiences. In that respect, would automatically expand to%20. Discussion can continue with pillarjs/path-to-regexp#42.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants