Bug 5274: Successful tunnels logged as TCP_TUNNEL/500 #1608

rousskov · 2023-11-30T21:32:47Z

Stop calling retryOrBail() when the tunneled Squid-server connection
(that we have committed to use) closes. Our retryOrBail() is dedicated
to handling errors. Most¹ serverClosed() calls are not related to
errors because our tunneling code abuses asynchronous connection closure
callbacks for TunnelStateData work termination. Depending on the
transaction details (e.g., TLS interception vs. true CONNECT), calling
retryOrBail() on these no-error code paths may result in retryOrBail()
"catch all other errors" code creating bogus ERR_CANNOT_FORWARD errors.

Most tunneling errors are already detailed, and retryOrBail() does not
have enough information to correctly detail the remaining ones anyway.
Removing this retryOrBail() call selects the arguably lesser evil.

The client-Squid connection closure callback, clientClosed(), already
uses the same logic.

This change does not resurrect Bug 5132 fixed by commit 752fa20 that
added the now-replaced retryOrBail() call to serverClosed(). That commit
fixed the leak by calling deleteThis() (via retryOrBail()). Our
finishWritingAndDelete() call preserves that logic. That commit also
claimed to allow more retries, but that claim was a mistake: To-server
closure callback registration (e.g. commitToServer()) bans retries.

The fact that severClosed() is called for both successful and
problematic outcomes prevents TunnelStateData from properly handling
certain (rare) errors. We tried to fix that as well, but the changes
quickly snowballed, so we left a few XXXs instead. ↩

Stop calling retryOrBail() when the tunneled Squid-server connection (that we have committed to use) closes. Our retryOrBail() is dedicated to handling errors. Most[^1] serverClosed() calls are _not_ related to errors because our tunneling code abuses asynchronous connection closure callbacks for TunnelStateData work termination. Depending on the transaction details (e.g., TLS interception vs. true CONNECT), calling retryOrBail() on these no-error code paths may result in retryOrBail() "catch all other errors" code creating bogus ERR_CANNOT_FORWARD errors. Most tunneling errors are already detailed, and retryOrBail() does not have enough information to correctly detail the remaining ones anyway. Removing this retryOrBail() call selects the arguably lesser evil. The client-Squid connection closure callback, clientClosed(), already uses the same logic. This change does not resurrect Bug 5132 fixed by commit 752fa20 that added the now-replaced retryOrBail() call to serverClosed(). That commit fixed the leak by calling deleteThis() (via retryOrBail()). Our finishWritingAndDelete() call preserves that logic. That commit also claimed to allow more retries, but that claim was a mistake: To-server closure callback registration (e.g. commitToServer()) bans retries. [^1]: The fact that severClosed() is called for both successful and problematic outcomes prevents TunnelStateData from properly handling certain (rare) errors. We tried to fix that as well, but the changes quickly snowballed, so we left a few XXXs instead.

rousskov

PR description: ... commit 752fa20 ... also claimed to allow more retries, but that claim was a mistake: To-server closure callback registration (e.g. commitToServer()) bans retries.

@eduard-bagdasaryan has spotted that false claim much earlier than I did, but the corresponding PR had already been merged (and I probably did not even realize that Eduard's comment was invalidating a PR description claim).

rousskov · 2023-11-30T21:53:13Z

src/tunnel.cc

+    // serverClosed() and clientClosed() callbacks will continue to mishandle
+    // those rare closures as regular ones, and access.log records will continue
+    // to lack some tunneling error indicators/details.
+    //


I did try to address both XXXs above. This PR represents a third attempt to fix this bug. The two previous attempts snowballed too much. I abandoned the first attempt after realizing that fixing tunnel reading code would require switching internal buffers to SBuf, triggering lots of complex and out-of-scope changes. The second attempt was a lot more modest -- just fixing close() callers -- but that snowballed as well because there are many of them and some are rather problematic. There are just too many problem in this tunneling code... We will fix them, but doing that using minimal PRs requires attacking a few low-level problems first.

Even this small PR that only changes serverClosed() to call the last portion of the old retryOrBail() code path has to select the lesser of the two evils: After these changes, common successful outcomes should be fixed, but shutdown and similar rare "external" premature Squid-server connection closures may not be detailed at all (as opposed to being detailed incorrectly).

This comment does not request any changes.

rousskov · 2023-11-30T21:55:32Z

src/tunnel.cc

@@ -324,12 +327,48 @@ void
 TunnelStateData::clientClosed()
 {
    client.noteClosure();
+    finishWritingAndDelete(server);


More debugging aside, this PR does not change how clientClosed() works. We just moved the old code into finishWritingAndDelete() and rearranged it a bit to add debugging and XXXs.

This comment does not request any changes.

Stop calling retryOrBail() when the tunneled Squid-server connection (that we have committed to use) closes. Our retryOrBail() is dedicated to handling errors. Most[^1] serverClosed() calls are _not_ related to errors because our tunneling code abuses asynchronous connection closure callbacks for TunnelStateData work termination. Depending on the transaction details (e.g., TLS interception vs. true CONNECT), calling retryOrBail() on these no-error code paths may result in retryOrBail() "catch all other errors" code creating bogus ERR_CANNOT_FORWARD errors. Most tunneling errors are already detailed, and retryOrBail() does not have enough information to correctly detail the remaining ones anyway. Removing this retryOrBail() call selects the arguably lesser evil. The client-Squid connection closure callback, clientClosed(), already uses the same logic. This change does not resurrect Bug 5132 fixed by commit 752fa20 that added the now-replaced retryOrBail() call to serverClosed(). That commit fixed the leak by calling deleteThis() (via retryOrBail()). Our finishWritingAndDelete() call preserves that logic. That commit also claimed to allow more retries, but that claim was a mistake: To-server closure callback registration (e.g. commitToServer()) bans retries. [^1]: The fact that severClosed() is called for both successful and problematic outcomes prevents TunnelStateData from properly handling certain (rare) errors. We tried to fix that as well, but the changes quickly snowballed, so we left a few XXXs instead.

inventvenkat · 2024-01-17T10:32:51Z

Hi Maintainers, When can I expect an official backport for this fix?

kinkie · 2024-01-18T10:34:40Z

Hi Maintainers, When can I expect an official backport for this fix?

let's see how clean the backport is

Stop calling retryOrBail() when the tunneled Squid-server connection (that we have committed to use) closes. Our retryOrBail() is dedicated to handling errors. Most[^1] serverClosed() calls are _not_ related to errors because our tunneling code abuses asynchronous connection closure callbacks for TunnelStateData work termination. Depending on the transaction details (e.g., TLS interception vs. true CONNECT), calling retryOrBail() on these no-error code paths may result in retryOrBail() "catch all other errors" code creating bogus ERR_CANNOT_FORWARD errors. Most tunneling errors are already detailed, and retryOrBail() does not have enough information to correctly detail the remaining ones anyway. Removing this retryOrBail() call selects the arguably lesser evil. The client-Squid connection closure callback, clientClosed(), already uses the same logic. This change does not resurrect Bug 5132 fixed by commit 752fa20 that added the now-replaced retryOrBail() call to serverClosed(). That commit fixed the leak by calling deleteThis() (via retryOrBail()). Our finishWritingAndDelete() call preserves that logic. That commit also claimed to allow more retries, but that claim was a mistake: To-server closure callback registration (e.g. commitToServer()) bans retries. [^1]: The fact that severClosed() is called for both successful and problematic outcomes prevents TunnelStateData from properly handling certain (rare) errors. We tried to fix that as well, but the changes quickly snowballed, so we left a few XXXs instead.

rousskov · 2024-01-18T14:07:39Z

Hi Maintainers, When can I expect an official backport for this fix?

let's see how clean the backport is

Clarification: What Francesco means is "let's see whether cherry-picking this PR commit into v6 succeeds and the result passes CI tests". It probably does -- you can track #1641 progress that attempts to merge these changes automatically. If it fails, one can certainly backport these changes manually, but that (obviously) requires more work.

squidadm · 2024-01-18T18:15:35Z

queued for backport to v6

Stop calling retryOrBail() when the tunneled Squid-server connection (that we have committed to use) closes. Our retryOrBail() is dedicated to handling errors. Most[^1] serverClosed() calls are _not_ related to errors because our tunneling code abuses asynchronous connection closure callbacks for TunnelStateData work termination. Depending on the transaction details (e.g., TLS interception vs. true CONNECT), calling retryOrBail() on these no-error code paths may result in retryOrBail() "catch all other errors" code creating bogus ERR_CANNOT_FORWARD errors. Most tunneling errors are already detailed, and retryOrBail() does not have enough information to correctly detail the remaining ones anyway. Removing this retryOrBail() call selects the arguably lesser evil. The client-Squid connection closure callback, clientClosed(), already uses the same logic. This change does not resurrect Bug 5132 fixed by commit 752fa20 that added the now-replaced retryOrBail() call to serverClosed(). That commit fixed the leak by calling deleteThis() (via retryOrBail()). Our finishWritingAndDelete() call preserves that logic. That commit also claimed to allow more retries, but that claim was a mistake: To-server closure callback registration (e.g. commitToServer()) bans retries. [^1]: The fact that severClosed() is called for both successful and problematic outcomes prevents TunnelStateData from properly handling certain (rare) errors. We tried to fix that as well, but the changes quickly snowballed, so we left a few XXXs instead.

shehenshah14 · 2024-04-30T08:36:47Z

Team, this issue exists in v6.6 which is verified and now officially part of AWS images.
Fixed version 6.7+ is not yet been included in any of AWS images.

Is there a work around for this issue for squid v6.6?
That would be of great help. Thanks in advance

rousskov · 2024-05-02T19:08:53Z

Is there a work around for this issue for squid v6.6?

Not that I know of (unless you accept an untested source code patch as a workaround).

rousskov commented Nov 30, 2023

View reviewed changes

rousskov added the M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels label Dec 5, 2023

squid-anubis added the M-waiting-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels label Dec 10, 2023

squid-anubis closed this Dec 11, 2023

kinkie added the backport-to-v6 maintainer has approved these changes for v6 backporting label Jan 18, 2024

squidadm removed the backport-to-v6 maintainer has approved these changes for v6 backporting label Jan 18, 2024

dbr3200 mentioned this pull request Apr 29, 2024

[Package Update Request] - Upgrade Squid V6.6 to Higher Versions amazonlinux/amazon-linux-2023#708

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug 5274: Successful tunnels logged as TCP_TUNNEL/500 #1608

Bug 5274: Successful tunnels logged as TCP_TUNNEL/500 #1608

rousskov commented Nov 30, 2023

rousskov left a comment

rousskov Nov 30, 2023

rousskov Nov 30, 2023

inventvenkat commented Jan 17, 2024

kinkie commented Jan 18, 2024

rousskov commented Jan 18, 2024 •

edited

Loading

squidadm commented Jan 18, 2024

shehenshah14 commented Apr 30, 2024

rousskov commented May 2, 2024

Bug 5274: Successful tunnels logged as TCP_TUNNEL/500 #1608

Bug 5274: Successful tunnels logged as TCP_TUNNEL/500 #1608

Conversation

rousskov commented Nov 30, 2023

Footnotes

rousskov left a comment

Choose a reason for hiding this comment

rousskov Nov 30, 2023

Choose a reason for hiding this comment

rousskov Nov 30, 2023

Choose a reason for hiding this comment

inventvenkat commented Jan 17, 2024

kinkie commented Jan 18, 2024

rousskov commented Jan 18, 2024 • edited Loading

squidadm commented Jan 18, 2024

shehenshah14 commented Apr 30, 2024

rousskov commented May 2, 2024

rousskov commented Jan 18, 2024 •

edited

Loading