HAproxy in front of postfix causing warning in postfix log after upgrade from 1.9.8 to 2.0.3 #187
Comments
Hmmm that's an interesting case. It's a side effect of this fix: fe4abe6 ("BUG/MEDIUM: connections: Don't call shutdown() if we want to disable linger.") which was merged between 2.0.2 and 2.0.3. It was also backported into 1.9.9. We don't have enough state to represent all desired conditions it seems. Or probably we'd need to be able to explicitly mention that certain checks must explicitly shutdown before closing, even if they did not send a message. The attached patch addresses this for me (based on the network capture), could you please confirm that it's OK with your postfix server as well ? 0001-WIP-BUG-MEDIUM-checks-make-sure-to-close-nicely-in-S.patch.txt |
hi, I applied your patch to a 2.0.3 source directory, rebuilt, installed, restarted HAProxy. Still get the warning messages in postfix. Happy to try out other things if you like. |
Hmm indeed, I can reproduce this here as well: Before the patch:
After the patch:
And indeed, we don't consume the pending data so we don't even see if the server responds. I'm digging. |
So it's in the case where the server doesn't respond fast enough that it fails, if it presents the banner immediately it's OK with the patch:
Are you sure your postfix doesn't have a delayed banner ? Otherwise it would be useful to take a capture between haproxy and the postfix server. I'm taking the patch above for 2.0.4 as it's needed anyway. This will eliminate some moving parts when you want to continue the troubleshooting. |
…speak In SMTP, MySQL and PgSQL checks, we're supposed to finish with a message to politely quit the server, otherwise some of them will log some errors. This is the case with Postfix as reported in GH issue #187. Since commit fe4abe6 ("BUG/MEDIUM: connections: Don't call shutdown() if we want to disable linger.") we are a bit more aggressive on outgoing connection closure and checks were not prepared for this. This patch makes the 3 checks above disable the linger_risk for these checks so that we close cleanly, with the side effect that it will leave some TIME_WAIT connections behind (hence why it should not be generalized to all checks). It's worth noting that in issue #187 it's mentioned that this patch doesn't seem to be sufficient for Postfix, however based only on local network activity this looks OK, so maybe this will need to be improved later. Given that the patch above was backported to 2.0 and 1.9, this one should as well.
I'm marking this as fixed as it's in 2.0.4 now. I'd appreciate it if you could try again. I'm fine with tagging it bogus again if it's not enough in your tests. |
Hi, sorry I am traveling and will have to look at it this weekend. We use postscreen in front of the main postfix smtp server process. Postscreen can indeed introduce a delay as described in this document: http://www.postfix.org/POSTSCREEN_README.html |
Is there a configuration option to support the delayed response on the check? Postscreen must go through its checks, which can take some time. |
You can use two methods for this: either increase the "inter" argument of the server line, which defines both the check interval and the default check timeout, or use "timeout check" to forcefully increase the check timeout without changing the interval. But given that in your case the delay will be long for each check, you'd rather just increase "inter". Please note that it's very possible that postscreen will reject haproxy's check, as it sends immediately without waiting for the banner. If this causes trouble, then I suggest that you switch to tcp-checks, which allow you to define sequences of connect/send/expect. In this case you'd basically just expect a 2xx, then send a QUIT and close. But if smtpchk works fine, no need to complicate your setup. |
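For reference, the tcp-check approach described above could look roughly like the sketch below. This is only an illustration, not a tested configuration: the backend name, server name, address and the 10s/5s values are assumptions (port 10024 is the postscreen port mentioned later in this thread), and the expect pattern simply encodes "a 2xx banner".

```
# Hypothetical backend: names, address and timings are placeholders.
backend smtp_servers
    mode tcp

    # Scripted check instead of "option smtpchk": wait for the banner first,
    # so a delayed postscreen greeting is not pre-empted by haproxy sending early.
    option tcp-check
    tcp-check connect
    tcp-check expect rstring ^2[0-9][0-9]
    tcp-check send QUIT\r\n

    # Optional: raise the check read timeout without changing the interval.
    timeout check 10s

    # "inter" sets the check interval (and the default check timeout).
    server postfix1 192.0.2.10:10024 check inter 5s
```

If smtpchk is kept instead, only the "inter" (or "timeout check") adjustment applies and the tcp-check lines are not needed.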
Thanks for that note! I’ll keep an eye out for that postscreen nuance you mentioned, though I would imagine it would have affected us by now. |
I installed 2.0.4 and updated the config to have inter 5000, but the warnings still appear in the postfix log, albeit less frequently. It still seems to happen on every check, though. I will run the packet trace and look deeper into it soon. |
On Fri, Aug 09, 2019 at 09:44:53PM -0700, amit777 wrote:
> I installed 2.0.4, updated config to have inter 5000, but the warnings still
> appear in postfix log, albeit less frequently.. But it seems like every
> check. I will run the packet trace and look deeper into it soon.

Thank you. Please also take a capture of the version that does not cause
these warnings so that we can compare.

Willy
|
Hi! I am Amit's (@amit777) colleague. I have attached two files. elk1-haproxy.dat.gz is the tcpdump capture for haproxy version 1.9.10 (working fine) elk1-haproxy.dat.gz I had a look at the packets and noticed that version 2.0.7 frequently sends RST,ACK packets, and that they correspond to the postscreen warning messages. The origin of those aborted connections is an AWS CloudWatch health check, which connects to TCP port 25 and then closes the connection without sending any data: AWS CloudWatch --> port:25 --> haproxy --> port:10024 (postscreen). Hope this helps. |
…speak In SMTP, MySQL and PgSQL checks, we're supposed to finish with a message to politely quit the server, otherwise some of them will log some errors. This is the case with Postfix as reported in GH issue haproxy#187. Since commit fe4abe6 ("BUG/MEDIUM: connections: Don't call shutdown() if we want to disable linger.") we are a bit more aggressive on outgoing connection closure and checks were not prepared for this. This patch makes the 3 checks above disable the linger_risk for these checks so that we close cleanly, with the side effect that it will leave some TIME_WAIT connections behind (hence why it should not be generalized to all checks). It's worth noting that in issue haproxy#187 it's mentioned that this patch doesn't seem to be sufficient for Postfix, however based only on local network activity this looks OK, so maybe this will need to be improved later. Given that the patch above was backported to 2.0 and 1.9, this one should as well. (cherry picked from commit 5488a62) Signed-off-by: Willy Tarreau <w@1wt.eu> (cherry picked from commit 5570678) Signed-off-by: Willy Tarreau <w@1wt.eu>
Any news on this issue ? |
We are currently using haproxy-2.3.6-7851701. I don't see the issue any longer. |
Thanks. So I'm closing the issue. |
I'm not sure if this is a bug. There seems to be a change in behavior that I'm having trouble tracking down. I get the following warning in my postfix maillog:
"warning: haproxy read: lost connection"
Is there a change in behavior between 1.9.8 and 2.0.3 regarding how TCP checks, and specifically smtpchk works?
haproxy -vvv
haproxy.cfg snippet