RFC 7230: server MUST reject messages with BWS after field-name #445
Conversation
Obey the RFC requirement to reject HTTP request messages with whitespace between the field-name and the colon delimiter. Also obey the requirement to always strip such whitespace from HTTP response messages. This removes the need to use the relaxed parser for this action. For now, non-HTTP protocols retain the old behaviour of removal only when using the relaxed parser.
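For context, RFC 7230 section 3.2.4 requires a server to reject, with a 400 (Bad Request), any received request message that contains whitespace between a header field-name and the colon, and requires a proxy to remove any such whitespace from a response message before forwarding it downstream. As an illustration (the request below is an invented example, not taken from the PR), a field line like the `Host` line here, with a space before the colon, is what this change now rejects instead of repairing:

```
GET /index.html HTTP/1.1
Host : example.com
```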
Please adjust this PR description to explain why we are now rejecting requests with spaces after fields names. Simply referring to an "RFC requirement" is not sufficient IMO because our relaxed parser should and does violate RFC requirements. We ought to have a more serious/compelling motivation for this change than just following RFC requirements.
I also have one API fix request. The rest is optional polishing.
src/HttpHeader.cc
Outdated
@@ -469,15 +469,12 @@ HttpHeader::parse(const char *header_start, size_t hdrLen, Http::ContentLengthIn
             break;      /* terminating blank line */
         }
 
-        HttpHeaderEntry *e;
-        if ((e = HttpHeaderEntry::parse(field_start, field_end)) == NULL) {
+        auto *e = HttpHeaderEntry::parse(field_start, field_end, owner);
If we do not reassign e, then please use const:
-        auto *e = HttpHeaderEntry::parse(field_start, field_end, owner);
+        const auto e = HttpHeaderEntry::parse(field_start, field_end, owner);
This begins a cascade of side effects via push_back inside addEntry() needing the type of HttpHeaderEntry::entries vector to match the const'ness of this auto. Fixing the const'ness is going to have to be left until later.
> This begins a cascade of side effects
It does not. You are probably thinking about const auto *e. I am asking for const auto e which is a constant e (the pointer) and not the constant entry (the thing e is pointing to).
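A minimal standalone sketch of the distinction being made here (illustrative types only, not Squid code): `const auto e` makes only the pointer itself constant, so the pointee can still be modified and the pointer still matches the element type of a plain vector of non-const pointers, whereas `const auto *e` would make the pointee const and trigger the cascade described above.

```cpp
#include <vector>

struct Entry { int value = 0; };    // stand-in for HttpHeaderEntry

int main()
{
    std::vector<Entry *> entries;   // element type stays a plain (non-const) pointer

    const auto e = new Entry;       // deduced as Entry * const: the pointer is constant
    e->value = 42;                  // OK: the pointed-to Entry is not const
    entries.push_back(e);           // OK: push_back() still receives an Entry *

    const auto *e2 = new Entry;     // deduced as const Entry *: the pointee is constant
    // e2->value = 42;              // would not compile: cannot modify a const Entry
    // entries.push_back(e2);       // would not compile: const Entry * does not convert to Entry *
    delete e2;

    for (const auto p : entries)    // p is Entry * const, like e above
        delete p;
}
```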
What are you looking for here? I think the fact that these are MUST requirements should be enough justification. The added code docs and RFC reference lead one to the specific details without having to look up the commit message. The relaxed parser should never be violating RFC requirements; that is what --enable-http-violations is for. Relaxed is for abnormal syntax where the spec leaves interpretation up to the implementation, or explicitly outlines a tolerant handling for it. If you can point out code using the relaxed parser to directly violate the spec in a --disable-http-violations build, that is a bug we need to fix.
* better const correctness.
* more debug info.
I do not. And I suspect that the admins who will start complaining about this change will agree with me. If you can make these violations conditional on --enable-http-violations…
Very well, since you insist:
This is currently the behaviour when the relaxed parser is enabled (on by default), but it can be disabled by turning the parser off. Placing it behind --enable-http-violations would leave this as the default behaviour of nearly all Squid installations but remove the config option to disable it, since you and others insist that build option remain enabled by default.
It feels like my basic concern is agitating you for some reason unknown to me, but I have to ask to understand: Assuming Squid recognizes and removes that extra space before interpreting and forwarding header fields, do those horrors require something else, like a second broken proxy or a broken origin server? I do not see how an extra space alone can do that damage if all agents are aware of the problem and avoid misinterpreting header fields with extra spaces.
* drop output to level 2; access.log will show a DENIED/4xx entry
* remove reject/ignore wording. That is an action by the caller, which is reported there anyway.
Did you get the squid-bugs report from regis with the details? All that is required is that Squid tolerates whitespace before the colon. Either by "fixing" it or by ignoring the header entirely.
> All that is required is that Squid tolerates whitespace before the colon
I reread the report you were referring to and it appears to directly support my suspicion that all these horrors that you are describing do require a second/broken actor. If all actors, including fixed Squid, remove or just ignore broken header fields, then nothing bad happens.
Thus, our justification for rejecting malformed messages cannot be simply that removing malformed fields would expose Squid to various horrors. Our justification would be that removing malformed fields would expose (Squid plus some other broken agent) combination to various horrors. This is still bad, of course, but the requirement to have a broken agent in the mix is important in our considerations because many Squid deployments do not have a broken agent behind them and, hence, are immune from all these horrors (and may not want to be responsible for other broken agents out there). We will be penalizing those deployments because we do not want to make this behavior conditional/configurable.
We can still penalize immune deployments if you think that is the best way forward. I suggest using the following first paragraph for this PR description:
Obey the RFC requirement to reject HTTP requests with whitespace between
field-name and the colon delimiter. Rejection is critical in the
presence of broken HTTP agents that mishandle malformed messages.
If you insist on your PR description wording, I can live with that as well (now that I confirmed what the trade-offs really are).
Please see inlined change requests for a new bug fix (with an easy-to-implement option) and a trivial polishing change that was mishandled during the previous round.
src/HttpHeader.cc
Outdated
@@ -1424,19 +1421,40 @@ HttpHeaderEntry::parse(const char *field_start, const char *field_end)
 
     if (name_len > 65534) {
         /* String must be LESS THAN 64K and it adds a terminating NULL */
-        debugs(55, DBG_IMPORTANT, "WARNING: ignoring header name of " << name_len << " bytes");
+        debugs(55, 2, "found header name of " << name_len << " bytes (" << Raw("value_start", value_start, name_len) << ")");
If you would rather not enhance this reporting in this PR, we can just use the old message (without Raw, at level 2) -- this debugging is not a big deal, of course. If you prefer to use Raw, then there is a PR bug to fix and a few improvements to make:
- The combination of value_start and name_len does not make sense. There are several ways to fix that, but if we only report one thing, then we should be reporting the offending name.
- Raw() already prints data length so you do not need to print `name_len << " bytes"` manually.
- It is best not to surround Raw output with your own decorations -- let Raw take care of that (and enhance Raw if you feel more decor is a good idea in some cases).
- It is wrong to print 65+K worth of name characters to cache.log, even at level 2. The first few characters should be more than enough to triage the problem in most cases, and printing huge volumes of information is likely to cause more problems than it will solve in this case. I would suggest printing the first 100 bytes or so.
This needs to be tested to get the minor details right, but we should be using something like this:
-        debugs(55, 2, "found header name of " << name_len << " bytes (" << Raw("value_start", value_start, name_len) << ")");
+        debugs(55, 2, "ignoring huge header field" << Raw("name", field_start, name_len).maxSize(100));
Please let me know if you need my help adding Raw::maxSize(). It would be generally useful, of course. We may even want to make a limit like that the default Raw behavior, but adding that default would require going through the (few) current Raw uses to disable that limit for exceptional cases like TLS Hello dumping, where the limit is harmful.
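Not part of this PR, but to make the maxSize() idea above concrete, here is a rough sketch of what such a helper could look like. Only maxSize() itself comes from the suggestion; the member names and print() logic are guesses, and Squid's real Raw class (in src/Debug.h) differs in detail.

```cpp
#include <cstddef>
#include <optional>
#include <ostream>

// Hypothetical sketch of a Debug.h-style Raw helper with a cap on printed bytes.
class Raw
{
public:
    Raw(const char *label, const char *data, const size_t size):
        label_(label), data_(data), size_(size) {}

    /// print at most this many bytes of data; longer data is elided with "..."
    Raw &maxSize(const size_t limit) { maxSize_ = limit; return *this; }

    std::ostream &print(std::ostream &os) const {
        const auto printed = (maxSize_ && *maxSize_ < size_) ? *maxSize_ : size_;
        os << label_ << '[' << size_ << "]=";
        os.write(data_, static_cast<std::streamsize>(printed));
        if (printed < size_)
            os << "...";    // mark that output was truncated
        return os;
    }

private:
    const char *label_;             ///< caller-supplied description of the data
    const char *data_;              ///< the raw bytes to report
    size_t size_;                   ///< total number of bytes in data_
    std::optional<size_t> maxSize_; ///< if set, cap on bytes actually printed
};

inline std::ostream &operator <<(std::ostream &os, const Raw &raw) { return raw.print(os); }
```

With something like this in place, the suggested `Raw("name", field_start, name_len).maxSize(100)` call would still report the full name_len but print only the first 100 bytes of the offending name.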
Obey the RFC requirement to reject HTTP requests with whitespace between field-name and the colon delimiter. Rejection is critical in the presence of broken HTTP agents that mishandle malformed messages. Also obey requirement to always strip such whitespace from HTTP response messages. The relaxed parser is no longer necessary for this response change. For now non-HTTP protocols retain the old behaviour of removal only when using the relaxed parser.
@yadij, this PR is on your side.