Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wait until s_client connects #5964

Conversation

bernd-edlinger
Copy link
Member

This seems to fix the test failures on my win32/cygwin machine.

kill(3, $self->{real_serverpid});
die "s_client didn't try to connect\n";
while (!$fdset->can_read(1)) {
sleep(0.1);
Copy link
Contributor

@dot-asm dot-asm Apr 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sleep(0.1) does exactly nothing, at least if it doesn't sleep for 0.1 seconds. Secondly, whole point of that condition is to avoid "infinite timeout" in case s_client doesn't connect. And suggestion is to honour "infinite timeout" with infinite loop? Original rationale was that if 1 second[!] is not enough for client to figure it out, then no amount of seconds would do. I mean, even 1 second is too much for a computer. But if 1 second[!] is not enough for some [antivirus-infested perhaps?], then increase the time in can_read...

As for "antivirus-infested". It's not typo, i.e. I'm not suggesting that computer is virus-infested, but rather that some antivirus programs are overly invasive and can act as real disruption to anything that is not "normal user activity" in vendor's eye.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, mcaffee. It eats >50% CPU. But I cannot uninstall that silliness by company rules ....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH the wait has also an unlimited timeout. Anyway when the test stalls for too long
the user will press CTRL+C, there is certainly no fixed guarantee that a connect must
succeed in less than one second on a system under stress.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about removing this loop completely?
In 1.1.0 this was just accept:

    # Wait for incoming connection from client
    my $client_sock;
    if(!($client_sock = $self->{proxy_sock}->accept())) {
        warn "Failed accepting incoming connection: $!\n";
        return 0;
    }

    print "Connection opened\n";

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about removing this loop completely?

What loop? There was no loop, there was just a wait for socket to become accept-able for one second. This is counter-measure against intermittent 10 minutes travis timeouts on osx. I mean it's about choice between 1 second delay and termination of hang test alone, or 10 minutes of pure waste of time with no tests executed past hang one. As already said, just replace can_read(1) with can_read(3) or 5...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How often does the test fail in this way on OSX.

We get ci for free and it's only appropriate to waste as little as possible. 12% is totally unnecessary waste in my view. If somebody reckons differently, they are free to approve. My approval wouldn't be decisive anyway :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've misinterpreted one of your remarks yesterday. At least now, when I revisit this, my mental picture is a little bit different. There are two "should-never-happen" timeouts on osx and only one is mitigated by can_read. Scale of impact is different. The one that is mitigated by can_read is 10 minutes timeout for whole build (replaced with 1 wasted second and single failed test). The second one is ~1 min timeout and single failed test. So it's not unreasonable to say that second case could as well be ~1 min. And one can as well argue that since these intermittent failures are not really our fault, but underlying OS, we don't have to feel bad about waste, because it's rather provider's responsibility than ours. Or one can wonder if there is anything one can do to turn that second minute to 1 second :-)

Yet I for one find 1 min timeout ridiculously high. Well, as implied, even 1 second is overly generous and actual timeout could and should be viewed as abnormal. This kind of means that I would consider antivirus as actual infestation :-)

Either way, since my approval wouldn't be decisive, I'm still inclined to let somebody else to decide.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem, the test result is useless anyway when this happens too often.
I see most of the time a completely different timeout, in connecting from proxy->s_server:

ACCEPT [::1]:52821

Server responds on [::1]:52821

unable to connect: Operation timed out

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's referred to as "second" in my remark. If it's actually so that one can no longer observe "first" one, then one can wonder if running select prior accept actually solves the problem. I mean s_server goes straight to accept so that it spends all the waiting time in kernel. And so did TLSProxy. Now proxy waits for socket to become accept-able and then calls accept...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one can wonder if running select prior accept actually solves the problem.

"work around" ought to be more appropriate term in the context.

@mattcaswell mattcaswell added the approval: done This pull request has the required number of approvals label Apr 25, 2018
@mattcaswell mattcaswell added this to the 1.1.1 milestone Apr 25, 2018
@bernd-edlinger
Copy link
Member Author

Merged. Thanks!

levitte pushed a commit that referenced this pull request Apr 26, 2018
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from #5964)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approval: done This pull request has the required number of approvals
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants