Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix infinite recursion when parsing HTTP chunks #1553

Conversation

rousskov
Copy link
Contributor

This change stops infinite HttpStateData recursion with at-max-capacity
inBuf. Such inBuf prevents progress in the following call chain:

  • processReply()
  • processReplyBody() and decodeAndWriteReplyBody()
  • maybeReadVirginBody()
  • maybeMakeSpaceAvailable() -- tries but fails to quit processing
  • processReply()

HttpStateData::maybeMakeSpaceAvailable() no longer calls processReply(),
preventing recursion.

maybeReadVirginBody() now aborts transactions that would otherwise get
stalled due to full read buffer at its maximum capacity. This change
requires that all maybeReadVirginBody() callers do actually need more
response data to make progress. AFAICT, that (natural) invariant holds.

We moved transaction stalling check from maybeMakeSpaceAvailable() into
its previous callers. Without that move, maybeMakeSpaceAvailable() would
have to handle both abortTransaction() and delayRead() cases. Besides
increased code complexity, that would trigger some premature delayRead()
calls (at maybeReadVirginBody() time). Deciding whether to delay socket
reads is complicated, the delay mechanism is expensive, and delaying may
become unnecessary by the time the socket becomes readable, so it is
best to continue to only delayRead() at readReply() time, when there is
no other choice left.

maybeReadVirginBody() mishandled cases where progress was possible, but
not immediately -- it did nothing in those cases, probably stalling
transactions when maybeMakeSpaceAvailable() returned false but did not
call processReply(). This is now fixed: maybeReadVirginBody() now starts
waiting for the socket to be ready for reading in those cases,
effectively passing control to readReply() that handles them.

maybeReadVirginBody() prematurely grew buffer for future socket reads.
As a (positive) side effect of the above refactoring, we now delay
buffer growth until the actual read(2) time, which is best for
performance. Most likely, this premature buffer growth was an accident:
maybeReadVirginBody() correctly called maybeMakeSpaceAvailable() with
doGrow set to false. However, maybeMakeSpaceAvailable() misinterpreted
doGrow as a "do not actually do it" parameter. That bug is now gone.

This recursion bug was discovered and detailed by Joshua Rogers at
https://megamansec.github.io/Squid-Security-Audit/
where it was filed as "Chunked Encoding Stack Overflow".

This change stops infinite HttpStateData recursion with at-max-capacity
inBuf. Such inBuf prevents progress in the following call chain:

* processReply()
* processReplyBody() and decodeAndWriteReplyBody()
* maybeReadVirginBody()
* maybeMakeSpaceAvailable() -- tries but fails to quit processing
* processReply()

To stop recursion, decodeAndWriteReplyBodyI() now predicts that
maybeMakeSpaceAvailable() will detect at-max-capacity inBuf and breaks
the recursion by declaring a fatal chunked parsing error.

XXX: This change addresses a few known infinite recursion cases, but it
is conceptually wrong for several (partially overlapping) reasons:

1. It assumes that at-max-capacity inBuf is a permanent state (i.e.
   things cannot improve with time). This assumption is false. For
   example, as more space frees up in the adaptation pipeline, we will
   clear responseBodyBuffer, and Client::calcBufferSpaceToReserve() may
   allow us to grow inBuf and read more chunked metadata.

2. New decodeAndWriteReplyBody() code assumption that inability (or lack
   of necessity) to provide usable buffer space implies that chunk
   metadata is huge is false: calcReadBufferSpaceToReserve() may become
   zero even with tiny buffered metadata.

3. The resulting recursion termination is arguably accidental. The call
   loop remains intact, and minor code changes (or unknown use cases)
   may resurrect the same infinite recursion problem.

----

This change is based on small portions of the stuck official PR squid-cache#736
(i.e. old Factory branch SQUID-533-Optimize-memalloc-server-reads). It
does not include many important fixes and improvements from that branch.
I am reverting previous commit b631d5a9c933fa849f6887c433020d0125bc08f4
to address the problems identified in that commit message.
This change stops infinite HttpStateData recursion with at-max-capacity
inBuf. Such inBuf prevents progress in the following call chain:

* processReply()
* processReplyBody() and decodeAndWriteReplyBody()
* maybeReadVirginBody()
* maybeMakeSpaceAvailable() -- tries but fails to quit processing
* processReply()

HttpStateData::maybeMakeSpaceAvailable() no longer calls processReply(),
preventing recursion.

maybeReadVirginBody() now aborts transactions that would otherwise get
stalled due to full read buffer at its maximum capacity. This change
requires that all maybeReadVirginBody() callers do actually need more
response data to make progress. AFAICT, that (natural) invariant holds.

We moved transaction stalling check from maybeMakeSpaceAvailable() into
its previous callers. Without that move, maybeMakeSpaceAvailable() would
have to handle both abortTransaction() and delayRead() cases. Besides
increased code complexity, that would trigger some premature delayRead()
calls (at maybeReadVirginBody() time). Deciding whether to delay socket
reads is complicated, the delay mechanism is expensive, and delaying may
become unnecessary by the time the socket becomes readable, so it is
best to continue to only delayRead() at readReply() time, when there is
no other choice left.

maybeReadVirginBody() mishandled cases where progress was possible, but
not _immediately_ -- it did nothing in those cases, probably stalling
transactions when maybeMakeSpaceAvailable() returned false but did not
call processReply(). This is now fixed: maybeReadVirginBody() now starts
waiting for the socket to be ready for reading in those cases,
effectively passing control to readReply() that handles them.

maybeReadVirginBody() prematurely grew buffer for future socket reads.
As a (positive) side effect of the above refactoring, we now delay
buffer growth until the actual read(2) time, which is best for
performance. Most likely, this premature buffer growth was an accident:
maybeReadVirginBody() correctly called maybeMakeSpaceAvailable() with
doGrow set to false. However, maybeMakeSpaceAvailable() misinterpreted
doGrow as a "do not actually do it" parameter. That bug is now gone.

This recursion bug was discovered and detailed by Joshua Rogers at
https://megamansec.github.io/Squid-Security-Audit/
where it was filed as "Chunked Encoding Stack Overflow".
These should probably wait until we resume official PR squid-cache#736 (i.e. old
Factory branch SQUID-533-Optimize-memalloc-server-reads) work or start
pushing tiny minimal PRs in preparation for that.
Copy link
Contributor Author

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes fix the problem in my tests. I am posting this PR as a Draft because I would like to run more tests and double check that there are no places where we should add the removed processReply() call. However, I think it is fairly safe to start reviewing this rushed fix when you get a chance, at least to make sure you are OK with various TODOs and XXXs being left as they are now.


const auto readSizeMax = maybeMakeSpaceAvailable(moreDataPermission.value());
// TODO: Move this logic inside maybeMakeSpaceAvailable():
const auto readSizeWanted = readSizeMax ? entry->bytesWanted(Range<size_t>(0, readSizeMax)) : 0;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is essentially unchanged by this PR, but if others think it should be moved inside maybeMakeSpaceAvailable() in this PR, I would be happy to do that.

src/http.cc Show resolved Hide resolved
src/http.cc Show resolved Hide resolved
// we may need to grow the buffer
inBuf.reserveSpace(read_size);
debugs(11, 8, (!flags.do_next_read ? "will not" : "may") <<
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed "will not" clause from this debugs() line because we only "make space available" when we are on a "true do_next_read" path. We may not actually read for various reasons, but not because flags.do_next_read is false.

@rousskov
Copy link
Contributor Author

Alex: I am posting this PR as a Draft because I would like to ... double check that there are no places where we should add the removed processReply() call.

Done, with the following conclusion: I do not think more processReply() calls are needed in this PR. We already call processReply() after we received more data from a peer and after we are done handling a 1xx control message. The only other condition/event I can think of that would require us to parse already buffered from-peer bytes is when the adaptation drains a previously overflowed BodyPipe buffer. In that Client::noteMoreBodySpaceAvailable() code, we call maybeReadVirginBody() instead of processReply() but that should work OK for now AFAICT, and, more importantly improving that fragile code (with some rather heavy implicit assumptions!) is outside this PR scope.

I am marking this PR as ready for review. We still plan on doing more testing in the background, and I do not intend to merge this PR (even if it is approved) until that testing is completed.

@rousskov rousskov marked this pull request as ready for review October 30, 2023 01:25
return;
}

const auto readSizeMax = maybeMakeSpaceAvailable(moreDataPermission.value());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, is there a similar problem in Server code, i.e., do we need some 'canBufferMoreRequestBytes()' check before calling Server::maybeMakeSpaceAvailable()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Server code probably suffers from various buffering problems (with a different set of constraints). There is a related check (with a misleading comment) there already, but I doubt it is enough:

// we can only read if there is more than 1 byte of space free
if (Config.maxRequestBufferSize - inBuf.length() < 2)
    return;

Somebody will need to study Server issues and address them separately. Hopefully, there is no infinite recursion there (the subject of this PR).

@eduard-bagdasaryan
Copy link
Contributor

maybeReadVirginBody() correctly called maybeMakeSpaceAvailable() with doGrow set to false. [...] maybeMakeSpaceAvailable() misinterpreted doGrow as a "do not actually do it" parameter.

This should also mean that the other maybeMakeSpaceAvailable(true) call in HttpStateData::readReply() worked incorrectly, i.e., it did not grow the buffer.

@yadij yadij added M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels S-could-use-an-approval An approval may speed this PR merger (but is not required) labels Oct 31, 2023
squid-anubis pushed a commit that referenced this pull request Oct 31, 2023
This change stops infinite HttpStateData recursion with at-max-capacity
inBuf. Such inBuf prevents progress in the following call chain:

* processReply()
* processReplyBody() and decodeAndWriteReplyBody()
* maybeReadVirginBody()
* maybeMakeSpaceAvailable() -- tries but fails to quit processing
* processReply()

HttpStateData::maybeMakeSpaceAvailable() no longer calls processReply(),
preventing recursion.

maybeReadVirginBody() now aborts transactions that would otherwise get
stalled due to full read buffer at its maximum capacity. This change
requires that all maybeReadVirginBody() callers do actually need more
response data to make progress. AFAICT, that (natural) invariant holds.

We moved transaction stalling check from maybeMakeSpaceAvailable() into
its previous callers. Without that move, maybeMakeSpaceAvailable() would
have to handle both abortTransaction() and delayRead() cases. Besides
increased code complexity, that would trigger some premature delayRead()
calls (at maybeReadVirginBody() time). Deciding whether to delay socket
reads is complicated, the delay mechanism is expensive, and delaying may
become unnecessary by the time the socket becomes readable, so it is
best to continue to only delayRead() at readReply() time, when there is
no other choice left.

maybeReadVirginBody() mishandled cases where progress was possible, but
not _immediately_ -- it did nothing in those cases, probably stalling
transactions when maybeMakeSpaceAvailable() returned false but did not
call processReply(). This is now fixed: maybeReadVirginBody() now starts
waiting for the socket to be ready for reading in those cases,
effectively passing control to readReply() that handles them.

maybeReadVirginBody() prematurely grew buffer for future socket reads.
As a (positive) side effect of the above refactoring, we now delay
buffer growth until the actual read(2) time, which is best for
performance. Most likely, this premature buffer growth was an accident:
maybeReadVirginBody() correctly called maybeMakeSpaceAvailable() with
doGrow set to false. However, maybeMakeSpaceAvailable() misinterpreted
doGrow as a "do not actually do it" parameter. That bug is now gone.

This recursion bug was discovered and detailed by Joshua Rogers at
https://megamansec.github.io/Squid-Security-Audit/
where it was filed as "Chunked Encoding Stack Overflow".
@squid-anubis squid-anubis added the M-waiting-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels label Oct 31, 2023
@squid-anubis squid-anubis added M-merged https://github.com/measurement-factory/anubis#pull-request-labels and removed M-waiting-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels labels Oct 31, 2023
@rousskov rousskov removed the S-could-use-an-approval An approval may speed this PR merger (but is not required) label Oct 31, 2023
Copy link
Contributor Author

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alex: We still plan on doing more testing in the background, and I do not intend to merge this PR (even if it is approved) until that testing is completed.

Others have merged this PR. We are still running those tests. Eduard or I will follow up if those tests expose any significant problems associated with this PR changes.

Eduard: This should also mean that the other maybeMakeSpaceAvailable(true) call in HttpStateData::readReply() worked incorrectly, i.e., it did not grow the buffer.

Yes, here is how #736 described that bug:

Also fixed commit 1ad6851 to delay buffer allocations until we are ready
to read from the server. That old commit made buffer allocations in
maybeMakeSpaceAvailable() optional so that they can be delayed until we
actually read from the peer socket. Delaying buffer allocation until
read(2) is the right thing to do. The callers (correctly) passed a
boolean flag to trigger the allocation, but the method interpreted the
flag as if it meant "do not allocate". That is, commit tried to delay
allocations but actually allocated early.

There is more in that PR description, including the explanation why the code "worked" despite doing the opposite of what was intended/expected.

return;
}

const auto readSizeMax = maybeMakeSpaceAvailable(moreDataPermission.value());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Server code probably suffers from various buffering problems (with a different set of constraints). There is a related check (with a misleading comment) there already, but I doubt it is enough:

// we can only read if there is more than 1 byte of space free
if (Config.maxRequestBufferSize - inBuf.length() < 2)
    return;

Somebody will need to study Server issues and address them separately. Hopefully, there is no infinite recursion there (the subject of this PR).

@yadij
Copy link
Contributor

yadij commented Dec 14, 2023

@rousskov preparing the advisory for announcement. Was this patch sufficient to fix the issue or are/were there additional changes necessary?

@rousskov
Copy link
Contributor Author

rousskov commented Dec 14, 2023

Was this patch sufficient to fix the issue or are/were there additional changes necessary?

AFAIK, commit 50c5af8 fixes infinite processReply-maybeMakeSpaceAvailable-processReply recursion when parsing HTTP response chunks. Additional changes are necessary to fix other known problems in response reading/buffering code, along the lines of #736, but none of those "other known problems" are infinite chunks parsing recursion IIRC. The Server (i.e. request handling) code is a separate can of worms that we have not studied from the chunk parsing recursion point of view, but the published vulnerability was specific to Client handling of responses.

@kinkie kinkie added the backport-to-v6 maintainer has approved these changes for v6 backporting label Feb 9, 2024
squidadm pushed a commit to squidadm/squid that referenced this pull request Feb 9, 2024
This change stops infinite HttpStateData recursion with at-max-capacity
inBuf. Such inBuf prevents progress in the following call chain:

* processReply()
* processReplyBody() and decodeAndWriteReplyBody()
* maybeReadVirginBody()
* maybeMakeSpaceAvailable() -- tries but fails to quit processing
* processReply()

HttpStateData::maybeMakeSpaceAvailable() no longer calls processReply(),
preventing recursion.

maybeReadVirginBody() now aborts transactions that would otherwise get
stalled due to full read buffer at its maximum capacity. This change
requires that all maybeReadVirginBody() callers do actually need more
response data to make progress. AFAICT, that (natural) invariant holds.

We moved transaction stalling check from maybeMakeSpaceAvailable() into
its previous callers. Without that move, maybeMakeSpaceAvailable() would
have to handle both abortTransaction() and delayRead() cases. Besides
increased code complexity, that would trigger some premature delayRead()
calls (at maybeReadVirginBody() time). Deciding whether to delay socket
reads is complicated, the delay mechanism is expensive, and delaying may
become unnecessary by the time the socket becomes readable, so it is
best to continue to only delayRead() at readReply() time, when there is
no other choice left.

maybeReadVirginBody() mishandled cases where progress was possible, but
not _immediately_ -- it did nothing in those cases, probably stalling
transactions when maybeMakeSpaceAvailable() returned false but did not
call processReply(). This is now fixed: maybeReadVirginBody() now starts
waiting for the socket to be ready for reading in those cases,
effectively passing control to readReply() that handles them.

maybeReadVirginBody() prematurely grew buffer for future socket reads.
As a (positive) side effect of the above refactoring, we now delay
buffer growth until the actual read(2) time, which is best for
performance. Most likely, this premature buffer growth was an accident:
maybeReadVirginBody() correctly called maybeMakeSpaceAvailable() with
doGrow set to false. However, maybeMakeSpaceAvailable() misinterpreted
doGrow as a "do not actually do it" parameter. That bug is now gone.

This recursion bug was discovered and detailed by Joshua Rogers at
https://megamansec.github.io/Squid-Security-Audit/
where it was filed as "Chunked Encoding Stack Overflow".
yadij pushed a commit that referenced this pull request Feb 9, 2024
This change stops infinite HttpStateData recursion with at-max-capacity
inBuf. Such inBuf prevents progress in the following call chain:

* processReply()
* processReplyBody() and decodeAndWriteReplyBody()
* maybeReadVirginBody()
* maybeMakeSpaceAvailable() -- tries but fails to quit processing
* processReply()

HttpStateData::maybeMakeSpaceAvailable() no longer calls processReply(),
preventing recursion.

maybeReadVirginBody() now aborts transactions that would otherwise get
stalled due to full read buffer at its maximum capacity. This change
requires that all maybeReadVirginBody() callers do actually need more
response data to make progress. AFAICT, that (natural) invariant holds.

We moved transaction stalling check from maybeMakeSpaceAvailable() into
its previous callers. Without that move, maybeMakeSpaceAvailable() would
have to handle both abortTransaction() and delayRead() cases. Besides
increased code complexity, that would trigger some premature delayRead()
calls (at maybeReadVirginBody() time). Deciding whether to delay socket
reads is complicated, the delay mechanism is expensive, and delaying may
become unnecessary by the time the socket becomes readable, so it is
best to continue to only delayRead() at readReply() time, when there is
no other choice left.

maybeReadVirginBody() mishandled cases where progress was possible, but
not _immediately_ -- it did nothing in those cases, probably stalling
transactions when maybeMakeSpaceAvailable() returned false but did not
call processReply(). This is now fixed: maybeReadVirginBody() now starts
waiting for the socket to be ready for reading in those cases,
effectively passing control to readReply() that handles them.

maybeReadVirginBody() prematurely grew buffer for future socket reads.
As a (positive) side effect of the above refactoring, we now delay
buffer growth until the actual read(2) time, which is best for
performance. Most likely, this premature buffer growth was an accident:
maybeReadVirginBody() correctly called maybeMakeSpaceAvailable() with
doGrow set to false. However, maybeMakeSpaceAvailable() misinterpreted
doGrow as a "do not actually do it" parameter. That bug is now gone.

This recursion bug was discovered and detailed by Joshua Rogers at
https://megamansec.github.io/Squid-Security-Audit/
where it was filed as "Chunked Encoding Stack Overflow".
@yadij yadij removed the backport-to-v6 maintainer has approved these changes for v6 backporting label Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
M-merged https://github.com/measurement-factory/anubis#pull-request-labels
Projects
None yet
5 participants