-
Notifications
You must be signed in to change notification settings - Fork 514
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 5274: Successful tunnels logged as TCP_TUNNEL/500 #1608
Bug 5274: Successful tunnels logged as TCP_TUNNEL/500 #1608
Conversation
Stop calling retryOrBail() when the tunneled Squid-server connection (that we have committed to use) closes. Our retryOrBail() is dedicated to handling errors. Most[^1] serverClosed() calls are _not_ related to errors because our tunneling code abuses asynchronous connection closure callbacks for TunnelStateData work termination. Depending on the transaction details (e.g., TLS interception vs. true CONNECT), calling retryOrBail() on these no-error code paths may result in retryOrBail() "catch all other errors" code creating bogus ERR_CANNOT_FORWARD errors. Most tunneling errors are already detailed, and retryOrBail() does not have enough information to correctly detail the remaining ones anyway. Removing this retryOrBail() call selects the arguably lesser evil. The client-Squid connection closure callback, clientClosed(), already uses the same logic. This change does not resurrect Bug 5132 fixed by commit 752fa20 that added the now-replaced retryOrBail() call to serverClosed(). That commit fixed the leak by calling deleteThis() (via retryOrBail()). Our finishWritingAndDelete() call preserves that logic. That commit also claimed to allow more retries, but that claim was a mistake: To-server closure callback registration (e.g. commitToServer()) bans retries. [^1]: The fact that severClosed() is called for both successful and problematic outcomes prevents TunnelStateData from properly handling certain (rare) errors. We tried to fix that as well, but the changes quickly snowballed, so we left a few XXXs instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR description: ... commit 752fa20 ... also claimed to allow more retries, but that claim was a mistake: To-server closure callback registration (e.g. commitToServer()) bans retries.
@eduard-bagdasaryan has spotted that false claim much earlier than I did, but the corresponding PR had already been merged (and I probably did not even realize that Eduard's comment was invalidating a PR description claim).
// serverClosed() and clientClosed() callbacks will continue to mishandle | ||
// those rare closures as regular ones, and access.log records will continue | ||
// to lack some tunneling error indicators/details. | ||
// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did try to address both XXXs above. This PR represents a third attempt to fix this bug. The two previous attempts snowballed too much. I abandoned the first attempt after realizing that fixing tunnel reading code would require switching internal buffers to SBuf, triggering lots of complex and out-of-scope changes. The second attempt was a lot more modest -- just fixing close() callers -- but that snowballed as well because there are many of them and some are rather problematic. There are just too many problem in this tunneling code... We will fix them, but doing that using minimal PRs requires attacking a few low-level problems first.
Even this small PR that only changes serverClosed() to call the last portion of the old retryOrBail() code path has to select the lesser of the two evils: After these changes, common successful outcomes should be fixed, but shutdown and similar rare "external" premature Squid-server connection closures may not be detailed at all (as opposed to being detailed incorrectly).
This comment does not request any changes.
@@ -324,12 +327,48 @@ void | |||
TunnelStateData::clientClosed() | |||
{ | |||
client.noteClosure(); | |||
finishWritingAndDelete(server); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More debugging aside, this PR does not change how clientClosed() works. We just moved the old code into finishWritingAndDelete() and rearranged it a bit to add debugging and XXXs.
This comment does not request any changes.
Stop calling retryOrBail() when the tunneled Squid-server connection (that we have committed to use) closes. Our retryOrBail() is dedicated to handling errors. Most[^1] serverClosed() calls are _not_ related to errors because our tunneling code abuses asynchronous connection closure callbacks for TunnelStateData work termination. Depending on the transaction details (e.g., TLS interception vs. true CONNECT), calling retryOrBail() on these no-error code paths may result in retryOrBail() "catch all other errors" code creating bogus ERR_CANNOT_FORWARD errors. Most tunneling errors are already detailed, and retryOrBail() does not have enough information to correctly detail the remaining ones anyway. Removing this retryOrBail() call selects the arguably lesser evil. The client-Squid connection closure callback, clientClosed(), already uses the same logic. This change does not resurrect Bug 5132 fixed by commit 752fa20 that added the now-replaced retryOrBail() call to serverClosed(). That commit fixed the leak by calling deleteThis() (via retryOrBail()). Our finishWritingAndDelete() call preserves that logic. That commit also claimed to allow more retries, but that claim was a mistake: To-server closure callback registration (e.g. commitToServer()) bans retries. [^1]: The fact that severClosed() is called for both successful and problematic outcomes prevents TunnelStateData from properly handling certain (rare) errors. We tried to fix that as well, but the changes quickly snowballed, so we left a few XXXs instead.
Hi Maintainers, When can I expect an official backport for this fix? |
let's see how clean the backport is |
Stop calling retryOrBail() when the tunneled Squid-server connection (that we have committed to use) closes. Our retryOrBail() is dedicated to handling errors. Most[^1] serverClosed() calls are _not_ related to errors because our tunneling code abuses asynchronous connection closure callbacks for TunnelStateData work termination. Depending on the transaction details (e.g., TLS interception vs. true CONNECT), calling retryOrBail() on these no-error code paths may result in retryOrBail() "catch all other errors" code creating bogus ERR_CANNOT_FORWARD errors. Most tunneling errors are already detailed, and retryOrBail() does not have enough information to correctly detail the remaining ones anyway. Removing this retryOrBail() call selects the arguably lesser evil. The client-Squid connection closure callback, clientClosed(), already uses the same logic. This change does not resurrect Bug 5132 fixed by commit 752fa20 that added the now-replaced retryOrBail() call to serverClosed(). That commit fixed the leak by calling deleteThis() (via retryOrBail()). Our finishWritingAndDelete() call preserves that logic. That commit also claimed to allow more retries, but that claim was a mistake: To-server closure callback registration (e.g. commitToServer()) bans retries. [^1]: The fact that severClosed() is called for both successful and problematic outcomes prevents TunnelStateData from properly handling certain (rare) errors. We tried to fix that as well, but the changes quickly snowballed, so we left a few XXXs instead.
Clarification: What Francesco means is "let's see whether cherry-picking this PR commit into v6 succeeds and the result passes CI tests". It probably does -- you can track #1641 progress that attempts to merge these changes automatically. If it fails, one can certainly backport these changes manually, but that (obviously) requires more work. |
queued for backport to v6 |
Stop calling retryOrBail() when the tunneled Squid-server connection (that we have committed to use) closes. Our retryOrBail() is dedicated to handling errors. Most[^1] serverClosed() calls are _not_ related to errors because our tunneling code abuses asynchronous connection closure callbacks for TunnelStateData work termination. Depending on the transaction details (e.g., TLS interception vs. true CONNECT), calling retryOrBail() on these no-error code paths may result in retryOrBail() "catch all other errors" code creating bogus ERR_CANNOT_FORWARD errors. Most tunneling errors are already detailed, and retryOrBail() does not have enough information to correctly detail the remaining ones anyway. Removing this retryOrBail() call selects the arguably lesser evil. The client-Squid connection closure callback, clientClosed(), already uses the same logic. This change does not resurrect Bug 5132 fixed by commit 752fa20 that added the now-replaced retryOrBail() call to serverClosed(). That commit fixed the leak by calling deleteThis() (via retryOrBail()). Our finishWritingAndDelete() call preserves that logic. That commit also claimed to allow more retries, but that claim was a mistake: To-server closure callback registration (e.g. commitToServer()) bans retries. [^1]: The fact that severClosed() is called for both successful and problematic outcomes prevents TunnelStateData from properly handling certain (rare) errors. We tried to fix that as well, but the changes quickly snowballed, so we left a few XXXs instead.
Team, this issue exists in v6.6 which is verified and now officially part of AWS images. Is there a work around for this issue for squid v6.6? |
Not that I know of (unless you accept an untested source code patch as a workaround). |
Stop calling retryOrBail() when the tunneled Squid-server connection
(that we have committed to use) closes. Our retryOrBail() is dedicated
to handling errors. Most1 serverClosed() calls are not related to
errors because our tunneling code abuses asynchronous connection closure
callbacks for TunnelStateData work termination. Depending on the
transaction details (e.g., TLS interception vs. true CONNECT), calling
retryOrBail() on these no-error code paths may result in retryOrBail()
"catch all other errors" code creating bogus ERR_CANNOT_FORWARD errors.
Most tunneling errors are already detailed, and retryOrBail() does not
have enough information to correctly detail the remaining ones anyway.
Removing this retryOrBail() call selects the arguably lesser evil.
The client-Squid connection closure callback, clientClosed(), already
uses the same logic.
This change does not resurrect Bug 5132 fixed by commit 752fa20 that
added the now-replaced retryOrBail() call to serverClosed(). That commit
fixed the leak by calling deleteThis() (via retryOrBail()). Our
finishWritingAndDelete() call preserves that logic. That commit also
claimed to allow more retries, but that claim was a mistake: To-server
closure callback registration (e.g. commitToServer()) bans retries.
Footnotes
The fact that severClosed() is called for both successful and
problematic outcomes prevents TunnelStateData from properly handling
certain (rare) errors. We tried to fix that as well, but the changes
quickly snowballed, so we left a few XXXs instead. ↩