New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wait until s_client connects #5964
Wait until s_client connects #5964
Conversation
util/perl/TLSProxy/Proxy.pm
Outdated
kill(3, $self->{real_serverpid}); | ||
die "s_client didn't try to connect\n"; | ||
while (!$fdset->can_read(1)) { | ||
sleep(0.1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sleep(0.1) does exactly nothing, at least if it doesn't sleep for 0.1 seconds. Secondly, whole point of that condition is to avoid "infinite timeout" in case s_client doesn't connect. And suggestion is to honour "infinite timeout" with infinite loop? Original rationale was that if 1 second[!] is not enough for client to figure it out, then no amount of seconds would do. I mean, even 1 second is too much for a computer. But if 1 second[!] is not enough for some [antivirus-infested perhaps?], then increase the time in can_read...
As for "antivirus-infested". It's not typo, i.e. I'm not suggesting that computer is virus-infested, but rather that some antivirus programs are overly invasive and can act as real disruption to anything that is not "normal user activity" in vendor's eye.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, mcaffee. It eats >50% CPU. But I cannot uninstall that silliness by company rules ....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OTOH the wait has also an unlimited timeout. Anyway when the test stalls for too long
the user will press CTRL+C, there is certainly no fixed guarantee that a connect must
succeed in less than one second on a system under stress.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about removing this loop completely?
In 1.1.0 this was just accept:
# Wait for incoming connection from client
my $client_sock;
if(!($client_sock = $self->{proxy_sock}->accept())) {
warn "Failed accepting incoming connection: $!\n";
return 0;
}
print "Connection opened\n";
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about removing this loop completely?
What loop? There was no loop, there was just a wait for socket to become accept-able for one second. This is counter-measure against intermittent 10 minutes travis timeouts on osx. I mean it's about choice between 1 second delay and termination of hang test alone, or 10 minutes of pure waste of time with no tests executed past hang one. As already said, just replace can_read(1) with can_read(3) or 5...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How often does the test fail in this way on OSX.
We get ci for free and it's only appropriate to waste as little as possible. 12% is totally unnecessary waste in my view. If somebody reckons differently, they are free to approve. My approval wouldn't be decisive anyway :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've misinterpreted one of your remarks yesterday. At least now, when I revisit this, my mental picture is a little bit different. There are two "should-never-happen" timeouts on osx and only one is mitigated by can_read. Scale of impact is different. The one that is mitigated by can_read is 10 minutes timeout for whole build (replaced with 1 wasted second and single failed test). The second one is ~1 min timeout and single failed test. So it's not unreasonable to say that second case could as well be ~1 min. And one can as well argue that since these intermittent failures are not really our fault, but underlying OS, we don't have to feel bad about waste, because it's rather provider's responsibility than ours. Or one can wonder if there is anything one can do to turn that second minute to 1 second :-)
Yet I for one find 1 min timeout ridiculously high. Well, as implied, even 1 second is overly generous and actual timeout could and should be viewed as abnormal. This kind of means that I would consider antivirus as actual infestation :-)
Either way, since my approval wouldn't be decisive, I'm still inclined to let somebody else to decide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem, the test result is useless anyway when this happens too often.
I see most of the time a completely different timeout, in connecting from proxy->s_server:
ACCEPT [::1]:52821
Server responds on [::1]:52821
unable to connect: Operation timed out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's referred to as "second" in my remark. If it's actually so that one can no longer observe "first" one, then one can wonder if running select prior accept actually solves the problem. I mean s_server goes straight to accept so that it spends all the waiting time in kernel. And so did TLSProxy. Now proxy waits for socket to become accept-able and then calls accept...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one can wonder if running select prior accept actually solves the problem.
"work around" ought to be more appropriate term in the context.
d5921be
to
d1ad0da
Compare
Merged. Thanks! |
Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from #5964)
This seems to fix the test failures on my win32/cygwin machine.