400 on malformed URIs #1352

Closed
tj opened this Issue Oct 2, 2012 · 26 comments

7 participants

@tj
expressjs member
tj commented Oct 2, 2012

nicer than a 500

@JohnMcLear

Thanks

@chriso

An alternative would be to use the require('querystring').unescapeBuffer() in lib/router/route.js instead of decodeURIComponent.

@defunctzombie

@visionmedia do you want to expand on this a bit or provide a test case? Or is this no longer an issue you care about?

@rlidwka
expressjs member

GET /% HTTP/1.0

still an issue

@rlidwka
expressjs member

Hmm, in fact no, it's not. It's "fixed" by 288176b / #1734

And now it just silently ignores this error.

Shouldn't it throw 400 instead? It's kinda creepy that /%20foobar is parsed as / foobar, but /%20foobar% is parsed as is.

@jonathanong
expressjs member

oh, that's a weird case. how do you suggest we handle that? in my app, i always respond with a 400 if req.path !== decodeURIComponent(req.path) or if that throws, but that would probably be too strict for everyone.

@rlidwka
expressjs member

how do you suggest we handle that?

Obviously, two choices here:

  1. swallow the error and let people who want to be strict write their own middleware (or a config option)
  2. respond with 400 and and let people who don't want to be strict write their own middleware (or a config option)

After a bit of thought, I don't know what's better any more.

@chriso

There's a method in the querystring module which is a safe alternative to decodeURIComponent.

~ $ node -pe 'require("querystring").unescapeBuffer("/%20foobar%").toString()'
/ foobar%
~ $ node -pe 'decodeURIComponent("/%20foobar%/")'

[eval]:1
decodeURIComponent("/%20foobar%/")
^
URIError: URI malformed
    at decodeURIComponent (native)
    at [eval]:1:1
    at Object.<anonymous> ([eval]-wrapper:6:22)
    at Module._compile (module.js:456:26)
    at evalScript (node.js:532:25)
    at startup (node.js:80:7)
    at node.js:901:3
@jonathanong
expressjs member

looks private though. why not just use .unescape() then?

@chriso

Actually it looks like a vanilla unescape() does the same thing

~ $ node -pe 'unescape("/%20foobar%/")'
/ foobar%/
@chriso

Nginx seems to respond with a 400 - http://node.io/%20foobar%/

@jonathanong
expressjs member

i don't mind moving to that function since it removes the need for a try/catch. but does anyone know which one is "more correct"? test cases of how it should be handled would be great.

@dougwilson
expressjs member

Whatever things like Apache, Nginx, et. al. does is "more correct" i.m.o. If They are going with 400, then connect probably should too.

@rlidwka
expressjs member
  • unescape is less strict, it also accepts neat '%uXXXX' escapes for unicode characters, otherwise it seem to have bad unicode handling
  • querystring.unescapeBuffer is pretty simple, still better than ignoring throws
  • decodeURIComponent throws at everything

From my point of view, if you want to be strict, use decodeURIComponent and return 400. If you don't want to be strict, probably use querystring.unescapeBuffer.

Things to look for:

  • "+"
  • "%c" (unescapeBuffer swallows "c")
  • "%c0%80" (incorrect unicode)
  • "%d1%8b" (correct unicode)
  • "%u1234" (unescape extension)
@rlidwka
expressjs member

Here is a why unescape shouldn't be used:

> ['\u03b1', new Buffer('\u03b1')]
[ 'α', <Buffer ce b1> ]
> escape('\u03b1')
'%u03B1'
> unescape('%u03B1')
'α'
> unescape('%ce%b1')
'α'
> decodeURI('%ce%b1')
'α'
@dougwilson
expressjs member

unescape does not actually follow the schematics for URLs (like supporting an encoding like %uXXXX), so this shouldn't be used here. This is even discussed in https://en.wikipedia.org/wiki/Percent-encoding#Non-standard_implementations

@rlidwka
expressjs member

unescape does not actually follow the schematics for URLs (like supporting an encoding like %uXXXX), so this shouldn't be used here.

%uXXXX is not the issue. If we'll support this, nothing would break, and this is a lovely syntax by the way.

Bad unicode support, however, is the issue because it will break all existing browsers. This is the only reason why unescape shouldn't be used, not because some standards say so.

@dougwilson
expressjs member

The problem with suddenly accepting %uXXXX by the way is suddenly people will realize they can start sending URLs like this to connect servers.

Anyway, unescape can give you Unicode with the help of Buffer: (new Buffer(unescape('%ce%b1'), 'binary')).toString('utf8').

But like I said, accepting something weird now will just make people get used to it being there, even when it shouldn't.

@dougwilson
expressjs member

Also, if you are simply looking for a URI-compliment and lenient method, use require('querystring').unescape('%ce%b1') which is a part of node and a public function.

@rlidwka
expressjs member

The problem with suddenly accepting %uXXXX by the way is suddenly people will realize they can start sending URLs like this to connect servers.

Looks good to me. If more people start to use something, it will more likely be a standard. As far as I know, server can process URL path as it wants, and such feature is not violating anything.

Anyway, unescape can give you Unicode with the help of Buffer:

binary encoding is kinda deprecated, and I smell issues like %ce%b1%u03b1 being processed incorrectly.

require('querystring').unescape

http://nodejs.org/api/querystring.html#querystring_querystring_unescape

Yes, and internally it's just an alias for unescapeBuffer. I think it's the best solution we have here (unless you want to be strict).

@dougwilson
expressjs member

At the very least, we need to switch to require('querystring').unescape over the current behavior, because leaving valid escapes unescaped if there is a single error anywhere seems very counter-intuitive.

@jonathanong
expressjs member

@rlidwka @dougwilson did you guys decide on the best course of action? got kind of lost in this conversation, but i'd still like to see this issue closed

@rlidwka
expressjs member

Yep, if you want to ignore errors, querystring.unescape looks better than anything else mentioned.

@jonathanong
expressjs member

@rlidwka want to open a PR for it? otherwise I can do it, but probably not until v4 comes around.

@rlidwka
expressjs member

@jonathanong , suggesting to close this one, because there is no more bikeshedding to do around here. :(

@jonathanong
expressjs member

oh yeah! woo!

@dougwilson dougwilson locked and limited conversation to collaborators Feb 2, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.