fix handling of long HTTP header lines #65

Merged
merged 1 commit into from Nov 1, 2011

Conversation

Projects
None yet
2 participants
@vinoski
Contributor

vinoski commented Oct 28, 2011

Due to Erlang bug OTP-9389, the handling of HTTP request header lines
that exceed the size of the gen_tcp receive buffer is broken when
using {packet,http} and {packet,httph} modes.

Modify mochiweb to work around this bug by first using {packet,line}
mode instead to collect request header lines and then using
erlang:decode_packet to parse them. Using {packet,line} allows us to
avoid reading past the headers into the request body (if present),
since doing that would require us to carry the body data from the
header handling functions over into the body handling functions, which
could be a pretty intrusive and complicated change.

Also add regression unit tests that verify that long HTTP request
lines and long HTTP header lines are handled correctly.

@etrepum

This comment has been minimized.

Show comment Hide comment
@etrepum

etrepum Oct 28, 2011

Owner

Tests pass, but I believe there is a small problem in collect_headers/5 - it will accumulate an infinite number of headers before moving on to parse_headers/6, and the ?MAX_HEADERS check will only work if it does this one at a time.

We could do better by doing the ?MAX_HEADERS check in collect_headers/5 before even parsing them. Since we've also changed the semantics of parsing a header to allow one of infinite length there's probably a different/better check on the amount of bytes we should allow of HTTP headers before it's considered to be invalid.

For Basho's use case, how big is that? Maybe this is something that needs to be parameterized rather than fixed at compile time.

Owner

etrepum commented Oct 28, 2011

Tests pass, but I believe there is a small problem in collect_headers/5 - it will accumulate an infinite number of headers before moving on to parse_headers/6, and the ?MAX_HEADERS check will only work if it does this one at a time.

We could do better by doing the ?MAX_HEADERS check in collect_headers/5 before even parsing them. Since we've also changed the semantics of parsing a header to allow one of infinite length there's probably a different/better check on the amount of bytes we should allow of HTTP headers before it's considered to be invalid.

For Basho's use case, how big is that? Maybe this is something that needs to be parameterized rather than fixed at compile time.

@vinoski

This comment has been minimized.

Show comment Hide comment
@vinoski

vinoski Oct 28, 2011

Contributor

You're right about the max_headers check of course; I'll fix that. But I didn't want to try to address max line length without getting your advice first. I don't know the innards of Mochiweb that well, so wasn't sure where the best place to configure it would be. All advice welcomed. At Basho we had a case the other day of a Link header exceeding 8k in length, but it's hard to put an upper limit on it, which is why it would be nice to make it configurable.

Contributor

vinoski commented Oct 28, 2011

You're right about the max_headers check of course; I'll fix that. But I didn't want to try to address max line length without getting your advice first. I don't know the innards of Mochiweb that well, so wasn't sure where the best place to configure it would be. All advice welcomed. At Basho we had a case the other day of a Link header exceeding 8k in length, but it's hard to put an upper limit on it, which is why it would be nice to make it configurable.

@vinoski

This comment has been minimized.

Show comment Hide comment
@vinoski

vinoski Oct 28, 2011

Contributor

BTW the primary reason I didn't make collect_headers count the number of headers and fail on ?MAX_HEADERS is because the original handle_invalid_request call takes the already-processed headers as an argument, and I wasn't sure you'd want to lose that.

Contributor

vinoski commented Oct 28, 2011

BTW the primary reason I didn't make collect_headers count the number of headers and fail on ?MAX_HEADERS is because the original handle_invalid_request call takes the already-processed headers as an argument, and I wasn't sure you'd want to lose that.

@vinoski

This comment has been minimized.

Show comment Hide comment
@vinoski

vinoski Oct 28, 2011

Contributor

I've amended my commit to move the ?MAX_HEADERS handling to where the headers are collected. But I haven't added anything regarding max line length, pending your advice. I suppose I could just make a hard-coded default of 16k for now, and configurability could be added later?

Contributor

vinoski commented Oct 28, 2011

I've amended my commit to move the ?MAX_HEADERS handling to where the headers are collected. But I haven't added anything regarding max line length, pending your advice. I suppose I could just make a hard-coded default of 16k for now, and configurability could be added later?

@etrepum

This comment has been minimized.

Show comment Hide comment
@etrepum

etrepum Oct 28, 2011

Owner

Instead of putting a limit on header lines, maybe an easier to configure setting would be to limit the total size of the headers, like let's say 256KB or 1MB or something like that by default.

Unfortunately there is currently no precedent for setting HTTP-specific options, all of the options that are currently supported are in mochiweb_socket_server. The way that I would probably implement this would be to modify ?DEFAULTS and parse_options/1 in mochiweb_http to look for the HTTP-specific options and export a loop/N that knows what do to with them.

Owner

etrepum commented Oct 28, 2011

Instead of putting a limit on header lines, maybe an easier to configure setting would be to limit the total size of the headers, like let's say 256KB or 1MB or something like that by default.

Unfortunately there is currently no precedent for setting HTTP-specific options, all of the options that are currently supported are in mochiweb_socket_server. The way that I would probably implement this would be to modify ?DEFAULTS and parse_options/1 in mochiweb_http to look for the HTTP-specific options and export a loop/N that knows what do to with them.

fix handling of long HTTP header lines
Due to Erlang bug OTP-9389, the handling of HTTP request header lines
that exceed the size of the gen_tcp receive buffer is broken when
using {packet,http} and {packet,httph} modes.

Modify mochiweb to work around this bug by first using {packet,line}
mode instead to collect request header lines and then using
erlang:decode_packet to parse them. Using {packet,line} allows us to
avoid reading past the headers into the request body (if present),
since doing that would require us to carry the body data from the
header handling functions over into the body handling functions, which
could be a pretty intrusive and complicated change.

Also add regression unit tests that verify that long HTTP request
lines and long HTTP header lines are handled correctly.
@vinoski

This comment has been minimized.

Show comment Hide comment
@vinoski

vinoski Oct 29, 2011

Contributor

OK, I've pushed an attempt to limit the overall header size as suggested. The default is 256k. I couldn't modify ?DEFAULTS in mochiweb_http without also changing mochiweb_socket_server, which didn't seem like a good idea, so I took a slightly different approach that I think is still in the spirit you suggested. I also added another unit test to check that an oversize header is caught. Please have a look and if it's still not right, just let me know.

Contributor

vinoski commented Oct 29, 2011

OK, I've pushed an attempt to limit the overall header size as suggested. The default is 256k. I couldn't modify ?DEFAULTS in mochiweb_http without also changing mochiweb_socket_server, which didn't seem like a good idea, so I took a slightly different approach that I think is still in the spirit you suggested. I also added another unit test to check that an oversize header is caught. Please have a look and if it's still not right, just let me know.

@etrepum

This comment has been minimized.

Show comment Hide comment
@etrepum

etrepum Oct 31, 2011

Owner

I think there's one more case here, an oversized request line. I'll go ahead and take it from here, I've got a few cycles today

Owner

etrepum commented Oct 31, 2011

I think there's one more case here, an oversized request line. I'll go ahead and take it from here, I've got a few cycles today

@etrepum

This comment has been minimized.

Show comment Hide comment
@etrepum

etrepum Oct 31, 2011

Owner

This is awfully peculiar! I was going to unify the header parsing and collection code but it looks like erlang:decode_packet(httph, …, []) is stupid, at least in R14B01.

> erlang:decode_packet(httph, <<"Content-Length: 123\r\n">>, []).
{more,undefined}

It will succeed if there's extra stuff there, but this is a very strange implementation.

Owner

etrepum commented Oct 31, 2011

This is awfully peculiar! I was going to unify the header parsing and collection code but it looks like erlang:decode_packet(httph, …, []) is stupid, at least in R14B01.

> erlang:decode_packet(httph, <<"Content-Length: 123\r\n">>, []).
{more,undefined}

It will succeed if there's extra stuff there, but this is a very strange implementation.

@etrepum

This comment has been minimized.

Show comment Hide comment
@etrepum

etrepum Oct 31, 2011

Owner

Must be related to this part of the RFC: http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.2

"HTTP/1.1 header field values can be folded onto multiple lines if the continuation line begins with a space or horizontal tab."

Owner

etrepum commented Oct 31, 2011

Must be related to this part of the RFC: http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.2

"HTTP/1.1 header field values can be folded onto multiple lines if the continuation line begins with a space or horizontal tab."

@vinoski

This comment has been minimized.

Show comment Hide comment
@vinoski

vinoski Oct 31, 2011

Contributor

Yep, I found that peculiar as well, which is why I separated those
functions. This decode_packet situation prevents us from reading a
single header line and parsing it immediately, which is why I collect
them all together and then parse them. I guess I should have made that
clear up front.

--steve

On Mon, Oct 31, 2011 at 7:46 PM, Bob Ippolito
reply@reply.github.com
wrote:

Must be related to this part of the RFC: http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.2

"HTTP/1.1 header field values can be folded onto multiple lines if the continuation line begins with a space or horizontal tab."

Reply to this email directly or view it on GitHub:
#65 (comment)

Contributor

vinoski commented Oct 31, 2011

Yep, I found that peculiar as well, which is why I separated those
functions. This decode_packet situation prevents us from reading a
single header line and parsing it immediately, which is why I collect
them all together and then parse them. I guess I should have made that
clear up front.

--steve

On Mon, Oct 31, 2011 at 7:46 PM, Bob Ippolito
reply@reply.github.com
wrote:

Must be related to this part of the RFC: http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.2

"HTTP/1.1 header field values can be folded onto multiple lines if the continuation line begins with a space or horizontal tab."

Reply to this email directly or view it on GitHub:
#65 (comment)

@etrepum

This comment has been minimized.

Show comment Hide comment
@etrepum

etrepum Nov 1, 2011

Owner

@vinoski - can you take a quick look at my branch for this? I refactored a couple things to make it shorter and added some more tests. master...long-http-headers-fix

Owner

etrepum commented Nov 1, 2011

@vinoski - can you take a quick look at my branch for this? I refactored a couple things to make it shorter and added some more tests. master...long-http-headers-fix

@vinoski

This comment has been minimized.

Show comment Hide comment
@vinoski

vinoski Nov 1, 2011

Contributor

@etrepum: I looked over all your commits and everything looks right to me.

Contributor

vinoski commented Nov 1, 2011

@etrepum: I looked over all your commits and everything looks right to me.

@etrepum etrepum merged commit 35a9035 into mochi:master Nov 1, 2011

@etrepum

This comment has been minimized.

Show comment Hide comment
@etrepum

etrepum Nov 1, 2011

Owner

I'm going to revert this for now, we started having some very strange problems in production after deploying it (GC pauses, maybe?). It will remain on the long-http-headers-fix branch for now until we can root cause.

Owner

etrepum commented Nov 1, 2011

I'm going to revert this for now, we started having some very strange problems in production after deploying it (GC pauses, maybe?). It will remain on the long-http-headers-fix branch for now until we can root cause.

@vinoski

This comment has been minimized.

Show comment Hide comment
@vinoski

vinoski Nov 1, 2011

Contributor

Understandable.

The real way to fix this is of course to fix the real problem down in the TCP driver. I've been talking to the OTP guys about that, and I think we know how to do it. Fixing it there means we can avoid workarounds in mochiweb, yaws, and other web servers.

Contributor

vinoski commented Nov 1, 2011

Understandable.

The real way to fix this is of course to fix the real problem down in the TCP driver. I've been talking to the OTP guys about that, and I think we know how to do it. Fixing it there means we can avoid workarounds in mochiweb, yaws, and other web servers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment