-
Notifications
You must be signed in to change notification settings - Fork 509
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ignore ECONNABORTED in accept(2) #404
Conversation
The accept(2) documentation indicate that the syscall should be restarted after this error.
Can one of the admins verify this patch? |
Hi, I recently upgraded the squid cache at work to OpenBSD 6.5 (and thus oldAccept FD 17, [::] [ job1]: ignoring: (53) Software caused After digging a bit in the source code and reading about ECONNABORTED Does this look ok ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that Squid should ignore ECONNABORTED
accept(2) errors. Thank you for fixing this!
FWIW, a similar fix was done in v2.6.STABLE13 (squid-cache/squid2@096e9e6) but it looks like the fix has not made it to Squid v3.
According to squid-cache/squid2@65d191b, you need to #define
ECONNABORTED on Windows. You do not need to test on Windows, but please port that v2 change as a part of this PR unless you are sure it is not needed. See compat/os/mswindows.h
.
Please see the inlined change requests for another simple adjustment.
I recently upgraded ... to OpenBSD 6.5 (and thus squid 4.6)
What OS/Squid versions were you running before? I wonder if this problem surfaced because of the OS rather than Squid upgrade. Knowing this may be useful for other admins.
After this, squid started locking up (and not accepting new connections) every few hours
Yes. That is expected in the official code that does not treat ECONNABORTED
specially.
with the error below logged in cache.log:
oldAccept FD 17, [::] [ job1]: ignoring: (53) Software caused connection abort
Hmm... AFAICT, the above "ignoring" line is never printed by the official code. It is printed by your PR! Perhaps you copy/pasted it from a patched Squid log? Knowing the exact bug symptoms may be useful for other admins...
The cache was running OpenBSD 6.4, which packaged squid 3.5.28
Yes I copied the wrong line from my cache.log. |
v2: accept rousskov suggestion to make the patch better Co-Authored-By: Alex Rousskov <rousskov@measurement-factory.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you have now implicitly extended the scope to polish the existing errno/errcode checks it would be a good chance to fix the use of yoda style. IIRC the ancient compilers where it was a useful safety measure are no longer supported.
Also, the OpenBSD documentation and the Linux documentation differ a bit on how they describe accept() handling of errno. Linux has a fixed list of codes that may be relayed, whereas BSD manual specifically makes a point of saying that errno is passed through instead of being cleared. So these errors may be a case of pollution from somewhere else (eg inside the prior select(2) or poll(2) call). You may fine it interesting to test whether the issue disappears with errno=0 before the accept(2) syscall (where errcode is reset). |
ECONNABORTED can only be generated by accept(2) (at least on OpenBSD, but the Linux manual pages seem to agree). So it's not possible that the value is a leftovers from a previous system call. |
The way I've understood is that accept() triggers the SYN+ACK step for TCP. Squid logic separating the select() and accept() calls can lead to a (relatively) long delay between the client SYN and that SYN+ACK. At any point during that delay the client (or TCP stack itself) may use FIN or timeout the connection. |
Ok, so clients implementing the happy eyeballs algorithm and sending both v4 and v6 SYN paquets to the same squid process can cause this, especially since there are 2 different listening sockets for the 2 protocol versions there. |
This was the only area in this file where this style was used
Co-Authored-By: Amos Jeffries <yadij@users.noreply.github.com>
This was causing lockups on my squid cache at work. Upstreem PR: squid-cache/squid#404 ok sthen@, jca@
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for addressing one of my two concerns. To avoid misunderstanding, I am repeating the other concern here:
According to squid-cache/squid2@65d191b, you need to #define
ECONNABORTED on Windows. You do not need to test on Windows, but please port that v2 change as a part of this PR unless you are sure it is not needed. See compat/os/mswindows.h
.
Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for addressing my remaining concern.
OK to test |
@mherrb, I detailed the PR description (i.e. the future official commit message) a little. If you are OK with my changes, please let me know, and I will clear this PR for commit. Otherwise, please adjust to your liking and ping me. |
@rousskov OK by me. Thanks. |
An aborted connection attempt does not affect listening socket's ability to accept other connections. If the error is not ignored, Squid gets stuck after logging an oldAccept error like this one: oldAccept ...: (53) Software caused connection abort This bug fix was motivated by accept(2) changes in OpenBSD v6.5 that resulted in new ECONNABORTED errors under regular deployment conditions: openbsd/src@c255b5a
An aborted connection attempt does not affect listening socket's ability to accept other connections. If the error is not ignored, Squid gets stuck after logging an oldAccept error like this one: oldAccept ...: (53) Software caused connection abort This bug fix was motivated by accept(2) changes in OpenBSD v6.5 that resulted in new ECONNABORTED errors under regular deployment conditions: openbsd/src@c255b5a
An aborted connection attempt does not affect listening socket's ability to accept other connections. If the error is not ignored, Squid gets stuck after logging an oldAccept error like this one: oldAccept ...: (53) Software caused connection abort This bug fix was motivated by accept(2) changes in OpenBSD v6.5 that resulted in new ECONNABORTED errors under regular deployment conditions: openbsd/src@c255b5a
An aborted connection attempt does not affect listening socket's
ability to accept other connections. If the error is not ignored, Squid
gets stuck after logging an oldAccept error like this one:
This bug fix was motivated by accept(2) changes in OpenBSD v6.5 that
resulted in new ECONNABORTED errors under regular deployment conditions:
openbsd/src@c255b5a