Skip to content
This repository has been archived by the owner on Nov 6, 2022. It is now read-only.

Do not accept PUN/GEM methods as PUT/GET. #156

Closed
wants to merge 1 commit into from

Conversation

chrisdickinson
Copy link
Contributor

  • Encountering them returns an error, HPE_INVALID_METHOD
  • Tests have been added.

There's not a clear distinction between what erroneous methods should trigger HPE_INVALID_METHOD vs. HPE_UNKNOWN (there are extant tests for HPE_UNKNOWN against methods like "C*****"), advice would be appreciated. Tested with make test on OSX 10.7.5.

This fixes nodejs/node-v0.x-archive#6078.

* Encountering them returns an error, `HPE_INVALID_METHOD`
* Tests have been added.
@flexd
Copy link

flexd commented Aug 20, 2013

@chrisdickinson This does not at all solve the problem described at nodejs/node-v0.x-archive#6078

Node is doing HTTP parsing completely the wrong the way, which I guess is for performance but it's silly stupid and should be fixed. And not by adding more if-logic and goto statements.

@chrisdickinson
Copy link
Contributor Author

@flexd Ah, good point -- this does not add LINK and UNLINK; rather it addresses the immediate problem of accepting unacceptable HTTP methods.

To your second statement, given certain constraints, yes, it could be argued that http_parser is wrong. It would be worthwhile to drop in another existing HTTP parser that "does it right" and do an apples-to-apples exhaustive comparison.

However, one bug -- or even one gnarly part -- does not a bad library make. From 0, it was easy to jump into http-parser and make the required fix. It presents a zero-allocation, zero-syscall, conditionally compliant handwritten streaming HTTP parser (NB: extension verbs are optional, as is the 501 response -- which is the purview of node itself, in this case).

I'll see about adding LINK as a separate PR.

@bnoordhuis
Copy link
Member

@chrisdickinson LGTM, thanks. Can you sign the CLA? It's a different one from the node.js CLA.

@flexd Constructive criticism is welcome but if 'it's silly stupid and should be fixed' is all you have to offer, then please stop now.

I'll see about adding LINK as a separate PR.

LINK and UNLINk are still drafts. I won't be accepting pull requests for that.

@flexd
Copy link

flexd commented Aug 20, 2013

@bnoordhuis My point was more that I do not think adding more complexity to this is the best way of solving the problem.
This seems to me at least to be a less than optimal approach, especially since it in the first place allowed this kind of thing to be possible. Adding if-else to be able to catch someone doing a PUN or whatever other unacceptable/non-existing HTTP method will just lead to more bits that could go wrong and more places where there could be logic error.

As someone mentioned in the HN discussion (https://news.ycombinator.com/item?id=6240437), the whole thing could be replaced with a Ragel parser.

What exactly is the overhead of checking just one character VS the whole verb? Has it really come to strncmp being the bottleneck here?

Edit: English.

@bnoordhuis
Copy link
Member

My plan is to eventually phase out http_parser in favor of a pure JS parser (if only because then I can hand off maintenance to others) but that depends on either me finding the time for it (unlikely in the short/medium term) or someone stepping up.

As someone mentioned in the HN discussion (https://news.ycombinator.com/item?id=6240437), the whole thing could be replaced with a Ragel parser.

I wrote a JS backend for Ragel a while ago and did some experiments but the results were inconclusive. (That is, the JS parser did pretty bad but then I didn't spend days on it.)

What exactly is the overhead of checking just one character VS the whole verb? Has it really come to strncmp being the bottleneck here?

http_parser is a state machine. It has to process characters one at a time because the input may be split over multiple messages.

@jjclark1982
Copy link

It would be nice to treat all unrecognized method names the same, instead of simply checking the first three characters to rule out PUN and GEM but still misassign PURSE and UNSURPRISED.

Edit: never mind, I see that those cases should be caught by

        } else if (ch == matcher[parser->index]) {
        ...
        } else {
          SET_ERRNO(HPE_INVALID_METHOD);
          goto error;
        }

@bnoordhuis
Copy link
Member

As someone mentioned in the HN discussion (https://news.ycombinator.com/item?id=6240437), the whole thing could be replaced with a Ragel parser.

Ah, I've read the HN thread now (and feeling dumber for it). The Internet peanut gallery is out in full force and as clueless as ever, it seems...

To make a long story short: very old versions of http_parser were based on a Ragel grammar but the handwritten version outperformed it by a significant margin, as in over 10% fewer branches taken.

Ragel is great but it can't take shortcuts like a human programmer can and once you start jumping around in your grammar, things become complicated fast (and impossible to debug.)

@jamesamcl
Copy link

My plan is to eventually phase out http_parser in favor of a pure JS parser (if only because then I can hand off maintenance to others)

Does Joyent intend on maintaining http-parser independently of Node? Or are you basically saying this is going to be abandoned?

@isaacs
Copy link

isaacs commented Aug 20, 2013

Does Joyent intend on maintaining http-parser independently of Node? Or are you basically saying this is going to be abandoned?

What do you mean when you say "abandoned"? As it is, no one works on this full-time, and it gets only a few commits a year.

If Node isn't using it, then we'll probably just call this project "done" (as it effectively already is), and stop worrying about it. It's a good piece of software, and some other projects find it useful. We won't stop taking pull requests that improve it, if that's what you mean.

@chrisdickinson
Copy link
Contributor Author

@bnoordhuis Signed! Thanks!

@laino
Copy link

laino commented Aug 20, 2013

Can someone explain why node is trying to internally compare the HTTP Methods against known values at all?
I mean, why not just extract the value and hand it to the user?

99% of all HTTP servers in node do compare the HTTP method to some other value in pure JS anyways. Something along the lines of:

if(req.method === 'GET'){
    // Handle GET
}

So trying to squeeze a few nanoseconds by checking it in the http parser is not going to make much of a difference. It will be even less noticeable as node is already extracting dozens of strings from the request anyways (the url, 2 strings for each header...).

@isaacs
Copy link

isaacs commented Aug 21, 2013

Can someone explain why node is trying to internally compare the HTTP Methods against known values at all?

Because it's faster, and trivial to make secure.

I mean, why not just extract the value and hand it to the user?

Because it'd have to add logic to limit the length of the method, as well as take many more code paths in the parser itself.

So trying to squeeze a few nanoseconds by checking it in the http parser is not going to make much of a difference.

Have you benchmarked the difference?

@bnoordhuis
Copy link
Member

Thanks Chris, landed in c6ee6ad. I added some more request method checks in 547553b. Thanks again!

@bnoordhuis bnoordhuis closed this Aug 21, 2013
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LINK and UNLINK HTTP Methods are not-supported
7 participants