-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Do not accept PUN/GEM methods as PUT/GET. #156
Conversation
* Encountering them returns an error, `HPE_INVALID_METHOD` * Tests have been added.
@chrisdickinson This does not at all solve the problem described at nodejs/node-v0.x-archive#6078 Node is doing HTTP parsing completely the wrong the way, which I guess is for performance but it's silly stupid and should be fixed. And not by adding more if-logic and goto statements. |
@flexd Ah, good point -- this does not add To your second statement, given certain constraints, yes, it could be argued that However, one bug -- or even one gnarly part -- does not a bad library make. From 0, it was easy to jump into http-parser and make the required fix. It presents a zero-allocation, zero-syscall, conditionally compliant handwritten streaming HTTP parser (NB: extension verbs are optional, as is the 501 response -- which is the purview of node itself, in this case). I'll see about adding LINK as a separate PR. |
@chrisdickinson LGTM, thanks. Can you sign the CLA? It's a different one from the node.js CLA. @flexd Constructive criticism is welcome but if 'it's silly stupid and should be fixed' is all you have to offer, then please stop now.
LINK and UNLINk are still drafts. I won't be accepting pull requests for that. |
@bnoordhuis My point was more that I do not think adding more complexity to this is the best way of solving the problem. As someone mentioned in the HN discussion (https://news.ycombinator.com/item?id=6240437), the whole thing could be replaced with a Ragel parser. What exactly is the overhead of checking just one character VS the whole verb? Has it really come to strncmp being the bottleneck here? Edit: English. |
My plan is to eventually phase out http_parser in favor of a pure JS parser (if only because then I can hand off maintenance to others) but that depends on either me finding the time for it (unlikely in the short/medium term) or someone stepping up.
I wrote a JS backend for Ragel a while ago and did some experiments but the results were inconclusive. (That is, the JS parser did pretty bad but then I didn't spend days on it.)
http_parser is a state machine. It has to process characters one at a time because the input may be split over multiple messages. |
It would be nice to treat all unrecognized method names the same, instead of simply checking the first three characters to rule out PUN and GEM but still misassign PURSE and UNSURPRISED. Edit: never mind, I see that those cases should be caught by
|
Ah, I've read the HN thread now (and feeling dumber for it). The Internet peanut gallery is out in full force and as clueless as ever, it seems... To make a long story short: very old versions of http_parser were based on a Ragel grammar but the handwritten version outperformed it by a significant margin, as in over 10% fewer branches taken. Ragel is great but it can't take shortcuts like a human programmer can and once you start jumping around in your grammar, things become complicated fast (and impossible to debug.) |
Does Joyent intend on maintaining http-parser independently of Node? Or are you basically saying this is going to be abandoned? |
What do you mean when you say "abandoned"? As it is, no one works on this full-time, and it gets only a few commits a year. If Node isn't using it, then we'll probably just call this project "done" (as it effectively already is), and stop worrying about it. It's a good piece of software, and some other projects find it useful. We won't stop taking pull requests that improve it, if that's what you mean. |
@bnoordhuis Signed! Thanks! |
Can someone explain why node is trying to internally compare the HTTP Methods against known values at all? 99% of all HTTP servers in node do compare the HTTP method to some other value in pure JS anyways. Something along the lines of: if(req.method === 'GET'){
// Handle GET
} So trying to squeeze a few nanoseconds by checking it in the http parser is not going to make much of a difference. It will be even less noticeable as node is already extracting dozens of strings from the request anyways (the url, 2 strings for each header...). |
Because it's faster, and trivial to make secure.
Because it'd have to add logic to limit the length of the method, as well as take many more code paths in the parser itself.
Have you benchmarked the difference? |
HPE_INVALID_METHOD
There's not a clear distinction between what erroneous methods should trigger
HPE_INVALID_METHOD
vs.HPE_UNKNOWN
(there are extant tests forHPE_UNKNOWN
against methods like "C*****"), advice would be appreciated. Tested withmake test
on OSX 10.7.5.This fixes nodejs/node-v0.x-archive#6078.