Wait until s_client connects #5964

bernd-edlinger · 2018-04-16T09:57:07Z

This seems to fix the test failures on my win32/cygwin machine.

dot-asm · 2018-04-16T10:15:06Z

util/perl/TLSProxy/Proxy.pm

-        kill(3, $self->{real_serverpid});
-        die "s_client didn't try to connect\n";
+    while (!$fdset->can_read(1)) {
+        sleep(0.1);


sleep(0.1) does exactly nothing, at least if it doesn't sleep for 0.1 seconds. Secondly, whole point of that condition is to avoid "infinite timeout" in case s_client doesn't connect. And suggestion is to honour "infinite timeout" with infinite loop? Original rationale was that if 1 second[!] is not enough for client to figure it out, then no amount of seconds would do. I mean, even 1 second is too much for a computer. But if 1 second[!] is not enough for some [antivirus-infested perhaps?], then increase the time in can_read...

As for "antivirus-infested". It's not typo, i.e. I'm not suggesting that computer is virus-infested, but rather that some antivirus programs are overly invasive and can act as real disruption to anything that is not "normal user activity" in vendor's eye.

Yes, mcaffee. It eats >50% CPU. But I cannot uninstall that silliness by company rules ....

OTOH the wait has also an unlimited timeout. Anyway when the test stalls for too long
the user will press CTRL+C, there is certainly no fixed guarantee that a connect must
succeed in less than one second on a system under stress.

How about removing this loop completely?
In 1.1.0 this was just accept:

# Wait for incoming connection from client my $client_sock; if(!($client_sock = $self->{proxy_sock}->accept())) { warn "Failed accepting incoming connection: $!\n"; return 0; } print "Connection opened\n";

How about removing this loop completely?

What loop? There was no loop, there was just a wait for socket to become accept-able for one second. This is counter-measure against intermittent 10 minutes travis timeouts on osx. I mean it's about choice between 1 second delay and termination of hang test alone, or 10 minutes of pure waste of time with no tests executed past hang one. As already said, just replace can_read(1) with can_read(3) or 5...

How often does the test fail in this way on OSX.

We get ci for free and it's only appropriate to waste as little as possible. 12% is totally unnecessary waste in my view. If somebody reckons differently, they are free to approve. My approval wouldn't be decisive anyway :-)

I think I've misinterpreted one of your remarks yesterday. At least now, when I revisit this, my mental picture is a little bit different. There are two "should-never-happen" timeouts on osx and only one is mitigated by can_read. Scale of impact is different. The one that is mitigated by can_read is 10 minutes timeout for whole build (replaced with 1 wasted second and single failed test). The second one is ~1 min timeout and single failed test. So it's not unreasonable to say that second case could as well be ~1 min. And one can as well argue that since these intermittent failures are not really our fault, but underlying OS, we don't have to feel bad about waste, because it's rather provider's responsibility than ours. Or one can wonder if there is anything one can do to turn that second minute to 1 second :-)

Yet I for one find 1 min timeout ridiculously high. Well, as implied, even 1 second is overly generous and actual timeout could and should be viewed as abnormal. This kind of means that I would consider antivirus as actual infestation :-)

Either way, since my approval wouldn't be decisive, I'm still inclined to let somebody else to decide.

No problem, the test result is useless anyway when this happens too often.
I see most of the time a completely different timeout, in connecting from proxy->s_server:

ACCEPT [::1]:52821 Server responds on [::1]:52821 unable to connect: Operation timed out

Yes, it's referred to as "second" in my remark. If it's actually so that one can no longer observe "first" one, then one can wonder if running select prior accept actually solves the problem. I mean s_server goes straight to accept so that it spends all the waiting time in kernel. And so did TLSProxy. Now proxy waits for socket to become accept-able and then calls accept...

one can wonder if running select prior accept actually solves the problem.

"work around" ought to be more appropriate term in the context.

bernd-edlinger · 2018-04-26T16:36:01Z

Merged. Thanks!

Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from #5964)

dot-asm reviewed Apr 16, 2018

View reviewed changes

Wait max. 60 seconds for s_client to connect

d1ad0da

bernd-edlinger force-pushed the wait_until_client_connects branch from d5921be to d1ad0da Compare April 16, 2018 12:17

mattcaswell approved these changes Apr 25, 2018

View reviewed changes

mattcaswell added the approval: done This pull request has the required number of approvals label Apr 25, 2018

mattcaswell added this to the 1.1.1 milestone Apr 25, 2018

bernd-edlinger closed this Apr 26, 2018

levitte pushed a commit that referenced this pull request Apr 26, 2018

Wait max. 60 seconds for s_client to connect

bc66144

Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from #5964)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wait until s_client connects #5964

Wait until s_client connects #5964

bernd-edlinger commented Apr 16, 2018

dot-asm Apr 16, 2018 •

edited

bernd-edlinger Apr 16, 2018

bernd-edlinger Apr 16, 2018

bernd-edlinger Apr 16, 2018

dot-asm Apr 16, 2018

dot-asm Apr 16, 2018

dot-asm Apr 17, 2018

bernd-edlinger Apr 17, 2018

dot-asm Apr 17, 2018

dot-asm Apr 17, 2018

bernd-edlinger commented Apr 26, 2018

Wait until s_client connects #5964

Wait until s_client connects #5964

Conversation

bernd-edlinger commented Apr 16, 2018

dot-asm Apr 16, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bernd-edlinger commented Apr 26, 2018

dot-asm Apr 16, 2018 •

edited