Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 5274: Successful tunnels logged as TCP_TUNNEL/500 #1608

Conversation

rousskov
Copy link
Contributor

Stop calling retryOrBail() when the tunneled Squid-server connection
(that we have committed to use) closes. Our retryOrBail() is dedicated
to handling errors. Most1 serverClosed() calls are not related to
errors because our tunneling code abuses asynchronous connection closure
callbacks for TunnelStateData work termination. Depending on the
transaction details (e.g., TLS interception vs. true CONNECT), calling
retryOrBail() on these no-error code paths may result in retryOrBail()
"catch all other errors" code creating bogus ERR_CANNOT_FORWARD errors.

Most tunneling errors are already detailed, and retryOrBail() does not
have enough information to correctly detail the remaining ones anyway.
Removing this retryOrBail() call selects the arguably lesser evil.

The client-Squid connection closure callback, clientClosed(), already
uses the same logic.

This change does not resurrect Bug 5132 fixed by commit 752fa20 that
added the now-replaced retryOrBail() call to serverClosed(). That commit
fixed the leak by calling deleteThis() (via retryOrBail()). Our
finishWritingAndDelete() call preserves that logic. That commit also
claimed to allow more retries, but that claim was a mistake: To-server
closure callback registration (e.g. commitToServer()) bans retries.

Footnotes

  1. The fact that severClosed() is called for both successful and
    problematic outcomes prevents TunnelStateData from properly handling
    certain (rare) errors. We tried to fix that as well, but the changes
    quickly snowballed, so we left a few XXXs instead.

Stop calling retryOrBail() when the tunneled Squid-server connection
(that we have committed to use) closes. Our retryOrBail() is dedicated
to handling errors. Most[^1] serverClosed() calls are _not_ related to
errors because our tunneling code abuses asynchronous connection closure
callbacks for TunnelStateData work termination. Depending on the
transaction details (e.g., TLS interception vs. true CONNECT), calling
retryOrBail() on these no-error code paths may result in retryOrBail()
"catch all other errors" code creating bogus ERR_CANNOT_FORWARD errors.

Most tunneling errors are already detailed, and retryOrBail() does not
have enough information to correctly detail the remaining ones anyway.
Removing this retryOrBail() call selects the arguably lesser evil.

The client-Squid connection closure callback, clientClosed(), already
uses the same logic.

This change does not resurrect Bug 5132 fixed by commit 752fa20 that
added the now-replaced retryOrBail() call to serverClosed(). That commit
fixed the leak by calling deleteThis() (via retryOrBail()). Our
finishWritingAndDelete() call preserves that logic. That commit also
claimed to allow more retries, but that claim was a mistake: To-server
closure callback registration (e.g. commitToServer()) bans retries.

[^1]: The fact that severClosed() is called for both successful and
problematic outcomes prevents TunnelStateData from properly handling
certain (rare) errors. We tried to fix that as well, but the changes
quickly snowballed, so we left a few XXXs instead.
Copy link
Contributor Author

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description: ... commit 752fa20 ... also claimed to allow more retries, but that claim was a mistake: To-server closure callback registration (e.g. commitToServer()) bans retries.

@eduard-bagdasaryan has spotted that false claim much earlier than I did, but the corresponding PR had already been merged (and I probably did not even realize that Eduard's comment was invalidating a PR description claim).

// serverClosed() and clientClosed() callbacks will continue to mishandle
// those rare closures as regular ones, and access.log records will continue
// to lack some tunneling error indicators/details.
//
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did try to address both XXXs above. This PR represents a third attempt to fix this bug. The two previous attempts snowballed too much. I abandoned the first attempt after realizing that fixing tunnel reading code would require switching internal buffers to SBuf, triggering lots of complex and out-of-scope changes. The second attempt was a lot more modest -- just fixing close() callers -- but that snowballed as well because there are many of them and some are rather problematic. There are just too many problem in this tunneling code... We will fix them, but doing that using minimal PRs requires attacking a few low-level problems first.

Even this small PR that only changes serverClosed() to call the last portion of the old retryOrBail() code path has to select the lesser of the two evils: After these changes, common successful outcomes should be fixed, but shutdown and similar rare "external" premature Squid-server connection closures may not be detailed at all (as opposed to being detailed incorrectly).

This comment does not request any changes.

@@ -324,12 +327,48 @@ void
TunnelStateData::clientClosed()
{
client.noteClosure();
finishWritingAndDelete(server);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More debugging aside, this PR does not change how clientClosed() works. We just moved the old code into finishWritingAndDelete() and rearranged it a bit to add debugging and XXXs.

This comment does not request any changes.

@rousskov rousskov added the M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels label Dec 5, 2023
squid-anubis pushed a commit that referenced this pull request Dec 10, 2023
Stop calling retryOrBail() when the tunneled Squid-server connection
(that we have committed to use) closes. Our retryOrBail() is dedicated
to handling errors. Most[^1] serverClosed() calls are _not_ related to
errors because our tunneling code abuses asynchronous connection closure
callbacks for TunnelStateData work termination. Depending on the
transaction details (e.g., TLS interception vs. true CONNECT), calling
retryOrBail() on these no-error code paths may result in retryOrBail()
"catch all other errors" code creating bogus ERR_CANNOT_FORWARD errors.

Most tunneling errors are already detailed, and retryOrBail() does not
have enough information to correctly detail the remaining ones anyway.
Removing this retryOrBail() call selects the arguably lesser evil.

The client-Squid connection closure callback, clientClosed(), already
uses the same logic.

This change does not resurrect Bug 5132 fixed by commit 752fa20 that
added the now-replaced retryOrBail() call to serverClosed(). That commit
fixed the leak by calling deleteThis() (via retryOrBail()). Our
finishWritingAndDelete() call preserves that logic. That commit also
claimed to allow more retries, but that claim was a mistake: To-server
closure callback registration (e.g. commitToServer()) bans retries.

[^1]: The fact that severClosed() is called for both successful and
problematic outcomes prevents TunnelStateData from properly handling
certain (rare) errors. We tried to fix that as well, but the changes
quickly snowballed, so we left a few XXXs instead.
@squid-anubis squid-anubis added the M-waiting-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels label Dec 10, 2023
@squid-anubis squid-anubis added M-merged https://github.com/measurement-factory/anubis#pull-request-labels and removed M-waiting-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels labels Dec 11, 2023
@inventvenkat
Copy link

Hi Maintainers, When can I expect an official backport for this fix?

@kinkie kinkie added the backport-to-v6 maintainer has approved these changes for v6 backporting label Jan 18, 2024
@kinkie
Copy link
Contributor

kinkie commented Jan 18, 2024

Hi Maintainers, When can I expect an official backport for this fix?

let's see how clean the backport is

squidadm pushed a commit to squidadm/squid that referenced this pull request Jan 18, 2024
Stop calling retryOrBail() when the tunneled Squid-server connection
(that we have committed to use) closes. Our retryOrBail() is dedicated
to handling errors. Most[^1] serverClosed() calls are _not_ related to
errors because our tunneling code abuses asynchronous connection closure
callbacks for TunnelStateData work termination. Depending on the
transaction details (e.g., TLS interception vs. true CONNECT), calling
retryOrBail() on these no-error code paths may result in retryOrBail()
"catch all other errors" code creating bogus ERR_CANNOT_FORWARD errors.

Most tunneling errors are already detailed, and retryOrBail() does not
have enough information to correctly detail the remaining ones anyway.
Removing this retryOrBail() call selects the arguably lesser evil.

The client-Squid connection closure callback, clientClosed(), already
uses the same logic.

This change does not resurrect Bug 5132 fixed by commit 752fa20 that
added the now-replaced retryOrBail() call to serverClosed(). That commit
fixed the leak by calling deleteThis() (via retryOrBail()). Our
finishWritingAndDelete() call preserves that logic. That commit also
claimed to allow more retries, but that claim was a mistake: To-server
closure callback registration (e.g. commitToServer()) bans retries.

[^1]: The fact that severClosed() is called for both successful and
problematic outcomes prevents TunnelStateData from properly handling
certain (rare) errors. We tried to fix that as well, but the changes
quickly snowballed, so we left a few XXXs instead.
@rousskov
Copy link
Contributor Author

rousskov commented Jan 18, 2024

Hi Maintainers, When can I expect an official backport for this fix?

let's see how clean the backport is

Clarification: What Francesco means is "let's see whether cherry-picking this PR commit into v6 succeeds and the result passes CI tests". It probably does -- you can track #1641 progress that attempts to merge these changes automatically. If it fails, one can certainly backport these changes manually, but that (obviously) requires more work.

@squidadm squidadm removed the backport-to-v6 maintainer has approved these changes for v6 backporting label Jan 18, 2024
@squidadm
Copy link
Collaborator

queued for backport to v6

yadij pushed a commit that referenced this pull request Jan 19, 2024
Stop calling retryOrBail() when the tunneled Squid-server connection
(that we have committed to use) closes. Our retryOrBail() is dedicated
to handling errors. Most[^1] serverClosed() calls are _not_ related to
errors because our tunneling code abuses asynchronous connection closure
callbacks for TunnelStateData work termination. Depending on the
transaction details (e.g., TLS interception vs. true CONNECT), calling
retryOrBail() on these no-error code paths may result in retryOrBail()
"catch all other errors" code creating bogus ERR_CANNOT_FORWARD errors.

Most tunneling errors are already detailed, and retryOrBail() does not
have enough information to correctly detail the remaining ones anyway.
Removing this retryOrBail() call selects the arguably lesser evil.

The client-Squid connection closure callback, clientClosed(), already
uses the same logic.

This change does not resurrect Bug 5132 fixed by commit 752fa20 that
added the now-replaced retryOrBail() call to serverClosed(). That commit
fixed the leak by calling deleteThis() (via retryOrBail()). Our
finishWritingAndDelete() call preserves that logic. That commit also
claimed to allow more retries, but that claim was a mistake: To-server
closure callback registration (e.g. commitToServer()) bans retries.

[^1]: The fact that severClosed() is called for both successful and
problematic outcomes prevents TunnelStateData from properly handling
certain (rare) errors. We tried to fix that as well, but the changes
quickly snowballed, so we left a few XXXs instead.
@shehenshah14
Copy link

Team, this issue exists in v6.6 which is verified and now officially part of AWS images.
Fixed version 6.7+ is not yet been included in any of AWS images.

Is there a work around for this issue for squid v6.6?
That would be of great help. Thanks in advance

@rousskov
Copy link
Contributor Author

rousskov commented May 2, 2024

Is there a work around for this issue for squid v6.6?

Not that I know of (unless you accept an untested source code patch as a workaround).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
M-merged https://github.com/measurement-factory/anubis#pull-request-labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants