Forcing a pause at the end of HTTP headers #97
Comments
Confirmed. It's annoying: while it's obviously a bug, fixing it breaks a number of assumptions that http-parser users may have made about the order of events or when it's safe to call the parser (evidenced by the number of tests the fix breaks). Rock, meet hard place.

EDIT: I've added a failing test in bnoordhuis/http-parser@e3028fe.
I've hacked a workaround in AndreLouisCaron/httpxx@608af01, but it's really ugly. I have to manually fix off-by-one errors because of this bug, and the count is unreliable until the If someone has a clear idea on how to fix this issue without breaking everything, I might be able to put in some time to fix it. |
Yeah, this is tricky to fix because a single byte can trigger multiple callbacks, and if a callback other than the last results in the parser being paused, we need a way to track that these additional callbacks need to be executed. We're doing this now by not consuming the byte until we're done with all of the callbacks that it should trigger. For callbacks where we do not re-execute the same byte, this isn't an issue, of course.

We could fix this to return that all bytes were consumed (e.g. by stashing the last byte in

Mercifully, these changes in execution behavior might be okay-ish because these assumptions about return value/byte consumption only break if you're using

Any other ideas @bnoordhuis, @ry or @clifffrey?
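To make the behavior described above concrete, here is a minimal toy model in C. It is not http-parser's code: the `toy_*` names, the struct layout, and the one-character grammar (where `!` stands in for the byte that ends the headers of a body-less message) are all assumptions made for illustration. It only shows how pausing in the first of two callbacks triggered by the same byte leaves that byte unconsumed.

```c
#include <stddef.h>

/* Toy model only: every name here is hypothetical, not http-parser's API. */
typedef struct toy_parser toy_parser;
typedef void (*toy_cb)(toy_parser *);

struct toy_parser {
    int paused;        /* set from inside a callback to request a pause */
    int headers_done;  /* headers callback already ran for the pending byte */
    int messages;      /* number of completed messages seen so far */
    toy_cb on_headers_complete;
    toy_cb on_message_complete;
};

void toy_pause(toy_parser *p, int paused) { p->paused = paused; }

/* Example callback: pause as soon as the headers are complete. */
void toy_pause_on_headers(toy_parser *p) { toy_pause(p, 1); }

/* In this toy grammar, '!' ends the headers of a body-less message, so that
 * one byte triggers both callbacks. Returns the number of bytes consumed. */
size_t toy_execute(toy_parser *p, const char *data, size_t len) {
    size_t i;
    if (p->paused) return 0;
    for (i = 0; i < len; i++) {
        if (data[i] != '!') continue;
        if (!p->headers_done) {
            p->headers_done = 1;
            if (p->on_headers_complete) p->on_headers_complete(p);
            /* Paused before the second callback: leave the byte unconsumed
             * so that the next call re-executes it. */
            if (p->paused) return i;
        }
        p->messages++;
        if (p->on_message_complete) p->on_message_complete(p);
        p->headers_done = 0;
    }
    return len;
}
```

With `on_headers_complete` set to `toy_pause_on_headers`, feeding the four bytes `abc!` reports only three bytes consumed. After unpausing, the caller has to feed the final byte again before the message-complete callback runs, which is the off-by-one that the thread is about.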
I'll go out on a limb here, but it seems to me that we're approaching the problem from the wrong angle. Rather than trying to fix the thorny issue of marking the last byte, wouldn't it be easier to play on the pausing semantics? I don't really care if more callbacks are invoked after I pause the parser (e.g. on_message_complete() for an empty message), so long as it pauses "soon".

Pausing is currently implemented by using a special error code, which happens to be temporary (the client can reset it by calling

Any ideas on this?
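Is node (or any other library) relying on http-parser not executing on_message_complete after pausing from inside on_headers_complete? If not, I think the easiest 'fix' would be to implement pause as "don't consume any more bytes (but call whatever callbacks are necessary)" instead of "stop doing anything".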
I don't really care if more callbacks are invoked after I pause the parser (e.g. on_message_complete() for an empty message), so long as it pauses "soon".

I think having an immediate, non-advisory pause is pretty useful, particularly as evidenced by the Node community's continued struggles to deal with its absence: the fact that That said, there's certainly a trade-off between implementation complexity and value, but I don't think we've crossed that threshold.

The real issue here is that the parser does not distinguish between HPE_PAUSED and real error codes. Is your expectation that calling
Is node (or any other library) relying on http-parser not executing on_message_complete after pausing from inside on_headers_complete?

Not that I know of. AFAIK there are very few users of

If not, I think the easiest 'fix' would be to implement pause as "don't consume any more bytes (but call whatever callbacks are necessary)" instead of "stop doing anything".

Yeah, that's certainly an option. I do think it's worth spending some amount of effort to get the API to work with the more rigid semantics, though, as this provides a more powerful API. I'll spend some time on that this weekend and see how far I get.
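For comparison, the "don't consume any more bytes (but call whatever callbacks are necessary)" semantics could look roughly like the sketch below. Again, this is a self-contained toy and not a patch against http-parser: the `soft_*` names and the grammar, in which `!` ends a body-less message, are hypothetical.

```c
#include <stddef.h>

/* Toy model of an "advisory" pause. All names are hypothetical. */
typedef struct soft_parser soft_parser;
typedef void (*soft_cb)(soft_parser *);

struct soft_parser {
    int paused;         /* pause requested by a callback or by the caller */
    int headers_seen;   /* how many times the headers callback point was hit */
    int messages_seen;  /* how many times the message callback point was hit */
    soft_cb on_headers_complete;
    soft_cb on_message_complete;
};

void soft_pause(soft_parser *p, int paused) { p->paused = paused; }

/* Example callback: request a pause from the headers callback. */
void soft_pause_cb(soft_parser *p) { soft_pause(p, 1); }

/* '!' ends a body-less message and triggers both callbacks. */
size_t soft_execute(soft_parser *p, const char *data, size_t len) {
    size_t i;
    if (p->paused) return 0;
    for (i = 0; i < len; i++) {
        if (data[i] == '!') {
            p->headers_seen++;
            if (p->on_headers_complete) p->on_headers_complete(p);
            /* No pause check here: finish every callback for this byte. */
            p->messages_seen++;
            if (p->on_message_complete) p->on_message_complete(p);
        }
        /* Honor the pause only once the current byte is fully handled,
         * so the byte counts as consumed. */
        if (p->paused) return i + 1;
    }
    return len;
}
```

Here the return value always points just past the byte that triggered the pause, at the cost that `on_message_complete` still fires after the pause was requested. That is the trade-off between the advisory and the absolute pause discussed in this thread.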
Couldn't we store the pause state somewhere explicitly? It would be more I think that pause needs to be absolute, it would be very surprising to a
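One way to read "store the pause state somewhere explicitly" is to keep a pause flag separate from the error code, so that pausing can never be confused with, or overwrite, a real parse error. The sketch below is only an illustration under that assumption: the `flag_*` names and the toy grammar, in which `?` is the only invalid byte, are hypothetical and do not reflect http-parser's actual layout.

```c
#include <stddef.h>

/* Hypothetical error codes for the toy parser. */
enum flag_errno { FLAG_OK = 0, FLAG_INVALID = 1 };

typedef struct {
    enum flag_errno err;  /* sticky: set by real parse errors only */
    int paused;           /* explicit pause flag, independent of err */
} flag_parser;

/* Pausing and resuming never touch the error code. */
void flag_pause(flag_parser *p, int paused) { p->paused = paused ? 1 : 0; }

/* Toy grammar: every byte is valid except '?'.
 * Returns the number of bytes consumed. */
size_t flag_execute(flag_parser *p, const char *data, size_t len) {
    size_t i;
    if (p->err != FLAG_OK || p->paused) return 0;
    for (i = 0; i < len; i++) {
        if (data[i] == '?') {
            p->err = FLAG_INVALID;  /* a real error stays set */
            return i;
        }
    }
    return len;
}
```

With this split, a caller can tell "paused" from "failed" by looking at two different fields, and unpausing cannot accidentally clear a genuine error.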
Sorry if I wasn't clear when I said I wanted the parser to stop "soon". This is what I had in mind.
I've worked around the issue to implement exactly this in AndreLouisCaron/httpxx@fbd0422. It only works for processing the last byte after the headers, but it fixes the only instance of this problem I've encountered so far (somehow, pausing in I would still prefer this was fixed in |
I think everyone agrees that it should be fixed on our side, the question is how?
I looked at bnoordhuis's new test case and it prevents pausing in I pushed a sample fix that just addresses the specific case of letting
First, we need to define the semantics of pausing. Should pausing be allowed in all callbacks, or should it be limited to specific cases? AFAICT, the only interesting places to call this are in

Also, in what (other) cases is this a problem? So far, the only problematic instance is for

Since it already works for the
I feel like I'm missing something.. why is it that "... empty requests
After further reading of the code, it seems to me that the only risky bits are parser states that use
FWIW, the latter is our problematic case and the former already has somewhat of a hack:

/* Mimic CALLBACK_DATA_NOADVANCE() but with one extra byte.
 *
 * The alternative to doing this is to wait for the next byte to
 * trigger the data callback, just as in every other case. The
 * problem with this is that this makes it difficult for the test
 * harness to distinguish between complete-on-EOF and
 * complete-on-length. It's not clear that this distinction is
 * important for applications, but let's keep it for now.
 */
CALLBACK_DATA_(body, p - body_mark + 1, p - data);
goto reexecute_byte;
Hadn't thought of introducing another state! However, I don't think this will work, since the byte will no longer be available on the next call to
Yeah, @AndreLouisCaron is right that the particular devil here is the extra byte that you need to stash: where to stash it, and how to kick off parsing with it. I don't think we need to add any more parser states to accomplish this, as there is already a state for every place where we might have paused (i.e. after every callback). We just need to tweak these states to look at the stashed byte. I'm starting to think that renaming |
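The stashing idea can be sketched with a toy parser that reports the triggering byte as consumed, remembers it, and replays it at the start of the next call. This is a hedged illustration, not the proposed http-parser change: the `stash_*` names, the single-byte stash field, and the grammar, in which `!` ends a body-less message, are assumptions.

```c
#include <stddef.h>

/* Toy model of "stash the last byte". All names are hypothetical. */
typedef struct stash_parser stash_parser;
typedef void (*stash_cb)(stash_parser *);

struct stash_parser {
    int paused;         /* pause requested */
    int headers_done;   /* headers callback already ran for the stashed byte */
    int messages_seen;  /* number of completed messages */
    int has_stash;      /* 1 if `stash` holds a byte that is not finished */
    char stash;         /* byte whose remaining callbacks are still pending */
    stash_cb on_headers_complete;
    stash_cb on_message_complete;
};

void stash_pause(stash_parser *p, int paused) { p->paused = paused; }

/* Example callback: pause as soon as the headers are complete. */
void stash_pause_cb(stash_parser *p) { stash_pause(p, 1); }

/* Runs the callbacks for one byte. Returns 0 if a pause interrupted the
 * byte before all of its callbacks ran, 1 otherwise. */
static int stash_step(stash_parser *p, char c) {
    if (c != '!') return 1;
    if (!p->headers_done) {
        p->headers_done = 1;
        if (p->on_headers_complete) p->on_headers_complete(p);
        if (p->paused) return 0;
    }
    p->messages_seen++;
    if (p->on_message_complete) p->on_message_complete(p);
    p->headers_done = 0;
    return 1;
}

size_t stash_execute(stash_parser *p, const char *data, size_t len) {
    size_t i;
    if (p->paused) return 0;
    if (p->has_stash) {
        /* Finish the interrupted byte first; it was already counted. */
        p->has_stash = 0;
        (void)stash_step(p, p->stash);
    }
    for (i = 0; i < len; i++) {
        if (p->paused) return i;
        if (!stash_step(p, data[i])) {
            p->stash = data[i];
            p->has_stash = 1;
            return i + 1;  /* report the byte as consumed */
        }
    }
    return len;
}
```

In this model, pausing in the headers callback on `abc!` returns 4, so the caller sees the true end of the headers, and the pending `on_message_complete` runs on the next call even if that call supplies zero new bytes.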
@indutny @bnoordhuis ... any reason to keep this open?
I'm about to open a new issue. I'd like my C++ wrapper to operate the parser in two stages, first to process the message header as a complete unit, and then to process the body. Returning 1 from I'm going to try using the pause feature to implement two stage parsing. Was the bug described fixed? Am I going to encounter a problem with "that last byte?" Is there another way to implement the functionality I desire? |
I've implemented a simple C++ wrapper for the http-parser library and ran into a problem when using the parser pause feature.

Basically, I have a Request object that wraps an http_parser instance. This request object has a feed() method which invokes http_parser_execute(). The request object also has a headers_complete() method, which reports the end of HTTP headers. To make this test reliable (mainly for proxying purposes), I force http_parser_execute() to return by pausing the parser, calling http_parser_pause() from the on_headers_complete() callback. All this seemingly works, except that it doesn't consume the last byte of a request without a body. To finish processing the request, I have to call http_parser_execute() again with the last byte of data.

Now, calling feed()/http_parser_execute() one extra time is not that much of a problem, except that it prevents clients from accurately finding the exact position of the end of HTTP headers. In a proxying scenario (HTTP proxy, CGI/SCGI/FastCGI or even WebSockets), where you want to forward the body byte-for-byte, clients end up prefixing the HTTP request body with an extra byte.

This was reported to me on the httpxx issue tracker, but I believe it's an issue in http-parser. Basically, the on_headers_complete() callback is called upon examining the last byte of the header data, and pausing in that callback prevents the library from marking the last byte (the one that triggered the callback) as consumed.

Visit the httpxx issue #5 for a detailed discussion. If it's unclear, I might be able to concoct an SSCCE that illustrates the problem.

Note: this issue seems related to pull request 89.