Make BIO connect to allow multiple IP addresses #11971

DDvO · 2020-05-27T10:36:30Z

When doing CLI-based tests with the new CMP client and mock server I faced strange
Connection refused errors when using non-blocking I/O with timeout enabled.

It took me quite a while to find out that this only happened when connecting to, e.g., localhost (but not when using IPv4 addresses such as 127.0.0.1) and only when /etc/hosts contains both IPv4 and IPv6 addresses for localhost and when IPv6 is preferred.

It turns out that conn_state() in crypto/bio/bss_conn.c is simply not flexible enough to handle alternative IP addresses on connect (BIO_C_DO_STATE_MACHINE). This made BIO_connect_retry() fail on the first retry when the host is localhost and the timeout parameter is non-zero.

The patch given fixes this problem by making sure that all IP addresses (of any type) are tried in case a hostname (i.e., DNS name) is resolved to more than one IP address.
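
For illustration, the idea of trying every resolved address in turn can be sketched as follows. This is a minimal sketch and not the code of the patch: `struct candidate`, `connect_fn`, `connect_first_working()` and `refuse_ipv6()` are hypothetical names, and the callback stands in for a real connect attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one resolved address of a hostname. */
struct candidate {
    const char *text;              /* e.g. "::1" or "127.0.0.1" */
    const struct candidate *next;  /* next resolved address, or NULL */
};

/* Hypothetical connect callback: returns 0 on success, -1 on failure. */
typedef int (*connect_fn)(const struct candidate *addr);

/* Try each address in order; return the first that connects, else NULL. */
const struct candidate *connect_first_working(const struct candidate *list,
                                              connect_fn try_connect)
{
    assert(try_connect != NULL);
    for (const struct candidate *it = list; it != NULL; it = it->next) {
        if (try_connect(it) == 0)
            return it;
    }
    return NULL;
}

/* Demo callback imitating a server that only accepts IPv4 connections. */
int refuse_ipv6(const struct candidate *addr)
{
    return strchr(addr->text, ':') != NULL ? -1 : 0;
}
```

With a list holding ::1 followed by 127.0.0.1, the sketch skips the refused IPv6 entry and settles on the IPv4 one, which is the behavior the localhost case needs.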

A workaround, at least in my case, is to comment out the following line in /etc/hosts:

::1             localhost

levitte · 2020-05-27T11:08:26Z

I find it weird to do this in the blocked state... and I have a hard time understanding why you'd have a configuration where one IP version is blocked while the other isn't. But there's new stuff going on that I frankly don't always understand... back when I was doing network management, a block was a block was a block.

DDvO · 2020-05-27T11:33:48Z

I find it weird to do this in the blocked state...

For whatever reason, I got the issue only in non-blocking mode (when BIO_connect_retry() is given a timeout), but there might be situations where the fix is of interest also for blocking mode, and at least it should not hurt then.

and I have a hard time understanding why you'd have a configuration where one IP version is blocked while the other isn't.

That's not the point here.
I'm not really a network guy, but it appears to me that for non-blocking BIOs the very basic HTTP server in apps/lib/http_server.c accepts only IPv4 connections, or there must be some other reason why at least on some Linux systems IPv6 connections to the loopback device are not accepted while IPv4 works fine.

DDvO · 2020-05-27T11:34:01Z

But there's new stuff going on that I frankly don't always understand... back when I was doing network management, a block was a block was a block.

BTW, bss_conn.c and its siblings are not the most beautiful and readable piece of code.
And I have a hard time understanding why all that retry business is needed after all,
in particular why even a blocking connect sometimes needs to be retried, which required some ugly workarounds and led to an apparent bug in BIO_connect_retry() - see #11449.

crypto/bio/bss_conn.c

bernd-edlinger · 2020-05-27T13:57:48Z

I think this should be back-ported to 1.1.1

kroeckx · 2020-05-27T18:32:49Z

I think you're just using the API wrong. On failure, the application calling BIO_connect() should use the next address, and that is what, for instance, s_client does in init_client(). It iterates over the addresses.

kroeckx · 2020-05-27T18:35:11Z

This also most likely seems to cause errors, since the address family of the socket you've created with BIO_socket() doesn't actually match what BIO_connect() is trying to do now.

kroeckx · 2020-05-27T18:35:49Z

Or am I mixing up the APIs?

bernd-edlinger · 2020-05-27T18:38:08Z

I may be wrong as well....
I think the error handling in the blocking case seems to pick the next address in a list.
The same should happen in the non-blocking case.

kroeckx · 2020-05-27T18:48:13Z

Clearly I was mixing up APIs.

That code at least looks weird. I think it's missing a BIO_sock_should_retry() call. I think we can be in 3 states: success, failure, still waiting. And it seems now it's only success or failure.
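
To make the three outcomes concrete, here is a minimal sketch of classifying the result of a non-blocking POSIX connect() call. It is an assumption-laden illustration, not the logic of BIO_sock_should_retry(): the names `connect_outcome` and `classify_connect()` are hypothetical, and only the usual POSIX "in progress" error codes are treated as "still waiting".

```c
#include <assert.h>
#include <errno.h>

/* The three outcomes: success, still waiting, failure. */
enum connect_outcome { CONNECT_OK, CONNECT_PENDING, CONNECT_FAILED };

/*
 * Classify the return value of connect() together with the errno value
 * observed right after the call.
 */
enum connect_outcome classify_connect(int ret, int err)
{
    assert(ret == 0 || ret == -1);  /* connect() returns only these values */
    if (ret == 0)
        return CONNECT_OK;
    /* The attempt is still under way; the caller should wait and re-check. */
    if (err == EINPROGRESS || err == EALREADY || err == EINTR)
        return CONNECT_PENDING;
    /* Anything else (e.g. ECONNREFUSED) is a real failure for this address. */
    return CONNECT_FAILED;
}
```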

DDvO · 2020-05-27T18:50:24Z

@kroeckx, BIO_connect() seems to operate on a specific address only, while BIO_do_connect() should be more flexible AFAICS, but their documentation is pretty poor. Nothing is written there about the iterations potentially needed over several IP addresses, and it would be pretty strange to me if the application had to do the switch from one such address to the next. And even if so, there should be a properly documented API function for doing that.

DDvO · 2020-05-27T18:52:52Z

So it seems we agree the IP address iteration should be handled internally (as the fix now ensures)?

bernd-edlinger · 2020-05-27T18:52:57Z

See the case BIO_CONN_S_CONNECT:
that is where the BIO_connect happens.
In the case when BIO_sock_should_retry returns true,
it enters the BIO_CONN_S_BLOCKED_CONNECT state.
It is unclear why s_client does not use this API at all.

kroeckx · 2020-05-27T18:56:36Z

I think BIO_CONN_S_BLOCKED_CONNECT is at least a confusing name; I think it's waiting for the connect to succeed there. I was also expecting that if you ran out of addresses you'd go to BIO_CONN_S_CONNECT_ERROR.
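
The transitions discussed here can be sketched as a tiny state function. This is a simplified model under stated assumptions, not the state machine in bss_conn.c: the state and outcome names below are hypothetical and only loosely mirror BIO_CONN_S_CONNECT, BIO_CONN_S_BLOCKED_CONNECT and BIO_CONN_S_CONNECT_ERROR.

```c
#include <assert.h>

/* Simplified states: trying an address, waiting for it, done, or failed. */
enum conn_state { S_CONNECT, S_WAITING, S_OK, S_ERROR };

/* Outcome of the latest connect attempt or of checking a pending one. */
enum attempt { ATTEMPT_OK, ATTEMPT_PENDING, ATTEMPT_FAILED };

/*
 * Compute the next state. On failure, move on to the next address if one
 * remains; only when the address list is exhausted report an error.
 */
enum conn_state next_state(enum conn_state cur, enum attempt result,
                           int more_addresses)
{
    assert(cur == S_CONNECT || cur == S_WAITING);  /* non-terminal states only */
    switch (result) {
    case ATTEMPT_OK:
        return S_OK;
    case ATTEMPT_PENDING:
        return S_WAITING;
    case ATTEMPT_FAILED:
        return more_addresses ? S_CONNECT : S_ERROR;
    }
    return S_ERROR;
}
```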

bernd-edlinger · 2020-05-27T18:57:00Z

Yes, agree, maybe move the BIO_clear_retry_flags(b); before the
if ((c->addr_iter = BIO_ADDRINFO_next(c->addr_iter)) != NULL) {

bernd-edlinger · 2020-05-27T18:58:41Z

However, while a connection to localhost will always fail quickly,
so there is no noticeable delay when the wrong protocol is tried,
a connection to a remote node may take longer, usually 1 minute,
until it fails, and then you will get complaints...

DDvO · 2020-05-27T19:00:03Z

A quick look at s_client.c does not really give enlightenment.
But meanwhile I found the iteration of BIO_connect() that @kroeckx has been referring to: it's buried in apps/lib/s_socket.c and rather ugly, so certainly not something an ordinary app programmer should have to bother with.

bernd-edlinger · 2020-05-27T19:03:24Z

Yeah, s_client does not use this API, but it would be a good idea to use this there.
s_client just picks the first IPv4/IPv6 address, so it solves the same problem,
in a different way.

DDvO · 2020-05-27T19:06:53Z

s_client also does try alternative addresses, via init_client(), which is in s_socket.c as I just wrote.

DDvO · 2020-05-27T19:11:30Z

Yeah, s_client does not use this API, but it would be a good idea to use this there.

I also have the impression it would be better if s_client used BIO_do_connect(), with the improvement given here. Or even better, used the rather new BIO_connect_retry(), which I introduced along with the generalized HTTP client.

bernd-edlinger · 2020-05-28T07:38:12Z

I think the commit message should mention BIO_do_connect,
to avoid the impression that BIO_connect should try different addresses.

bernd-edlinger · 2020-05-28T07:57:14Z

Or even better, used the rather new BIO_connect_retry()

You should fix the BIO_socket_wait function before you do that:
This tv.tv_sec = (long)(max_time - now); /* might overflow */
cannot overflow, IMHO, and I don't understand why the type cast is necessary at all.
But select(fd + 1 can overflow,
and especially this openssl_fdset(fd, &confds);
will crash if fd > 1023, and why not check for fd == -1?

bernd-edlinger · 2020-05-28T08:02:01Z

Hmm, actually, I don't really see this function in libcrypto.
What is the point in doing non-blocking I/O when the function is designed to
only handle one socket at a time?

bernd-edlinger · 2020-05-28T08:16:41Z

Anyway, this PR is okay, also for 1.1.1,
please change the commit message, to mention BIO_do_connect and
move BIO_clear_retry_flags(b); to line 192.

kroeckx · 2020-05-28T08:17:15Z

I also don't see the point of BIO_connect_retry(). It seems to turn the non-blocking case into a blocking case. Maybe the behaviour should just change depending on BIO_set_nbio()?

DDvO · 2020-05-28T12:56:23Z

I think the commit message should mention BIO_do_connect,
to avoid the impression that BIO_connect should try different addresses.

Done.

DDvO · 2020-05-28T12:59:40Z

Or even better, used the rather new BIO_connect_retry()

You should fix the BIO_socket_wait function before you do that:
This tv.tv_sec = (long)(max_time - now); /* might overflow */
cannot overflow, IMHO, and I don't understand why the type cast is necessary at all.

As far as I recall, the type cast was needed for Windows builds (AppVeyor),
and depending on the size of long and time_t in general this might overflow.
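
One way to sidestep the question of whether the cast can overflow is to clamp the difference explicitly. The following is a minimal sketch under that assumption; `remaining_seconds()` is a hypothetical helper, not part of BIO_socket_wait().

```c
#include <assert.h>
#include <limits.h>
#include <time.h>

/*
 * Seconds left until max_time, clamped to the range [0, LONG_MAX] so the
 * result always fits a long such as tv.tv_sec on platforms where long is
 * narrower than time_t.
 */
long remaining_seconds(time_t max_time, time_t now)
{
    double diff;

    if (max_time <= now)
        return 0;                      /* deadline already reached */
    diff = difftime(max_time, now);    /* max_time - now, in seconds */
    assert(diff > 0.0);                /* holds because max_time > now */
    if (diff >= (double)LONG_MAX)
        return LONG_MAX;               /* too large for long: clamp */
    return (long)diff;
}
```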

But select(fd + 1 can overflow,
and especially this openssl_fdset(fd, &confds);
will crash if fd > 1023, and why not check for fd == -1?

So I've added the following input check to the function:

    if (fd < 0 || fd >= FD_SETSIZE - 1)
        return -1;

crypto/bio/b_sock.c

DDvO · 2020-05-28T13:21:18Z

I also don't see the point of BIO_connect_retry(). It seems to turn the non-blocking case into a blocking case. Maybe the behaviour should just change depending on BIO_set_nbio()?

The point of BIO_connect_retry() is two-fold:

- add a timeout feature (which internally needs non-blocking)
- deal with strange behavior of BIO_do_connect() (aka BIO_do_handshake()) as detailed in the comments included in its code

This is mentioned in its documentation:

BIO_connect_retry() connects via the given B<bio>, retrying BIO_do_connect()
until success or a timeout or error condition is reached.
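
The "retry until success, timeout, or error" pattern described in that documentation can be sketched generically as below. This is only an illustration under stated assumptions, not the BIO_connect_retry() code: `attempt_fn`, `retry_until_deadline()` and `succeed_after()` are hypothetical, and a real implementation would wait on the socket rather than sleep for a fixed interval.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical attempt callback: 1 = connected, 0 = try again, -1 = error. */
typedef int (*attempt_fn)(void *ctx);

/* Returns 1 on success, 0 if timeout_sec elapsed first, -1 on hard error. */
int retry_until_deadline(attempt_fn attempt, void *ctx, time_t timeout_sec)
{
    const struct timespec pause = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    time_t deadline;

    assert(attempt != NULL && timeout_sec >= 0);
    deadline = time(NULL) + timeout_sec;
    for (;;) {
        int rc = attempt(ctx);

        if (rc != 0)
            return rc > 0 ? 1 : -1;   /* finished: success or hard error */
        if (time(NULL) >= deadline)
            return 0;                 /* still not connected: give up */
        nanosleep(&pause, NULL);      /* avoid spinning between attempts */
    }
}

/* Demo callback: says "try again" until the counter in ctx reaches zero. */
int succeed_after(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return 0;
    }
    return 1;
}
```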

crypto/bio/bss_conn.c

bernd-edlinger

OK, for master and 1.1.1

Backport of openssl#11971

t8m

This makes sense to me.

openssl-machine · 2020-05-30T08:00:15Z

This pull request is ready to merge

Reviewed-by: Tomas Mraz <tmraz@fedoraproject.org> (Merged from #11971)

Backport of #11971 Reviewed-by: Tomas Mraz <tmraz@fedoraproject.org> Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de> (Merged from #11989)

DDvO · 2020-06-01T07:28:40Z

Merged - thanks all involved!

richsalz · 2020-06-01T15:28:30Z

This seems like a change that would be noticed. manpage and/or CHANGES entry?

DDvO · 2020-06-02T11:08:29Z

This seems like a change that would be noticed. manpage and/or CHANGES entry?

I've made a new issue since this PR is closed: #12017

bernd-edlinger · 2020-06-02T14:01:22Z

crypto/bio/bss_conn.c

+                    /*
+                     * if there are more addresses to try, do that first
+                     */
+                    BIO_closesocket(b->num);


This is copied from above.
b->num should be initialized to INVALID_SOCKET here,
since otherwise there is a risk that the socket is closed twice.
I think both places should clear that value since the
BIO_socket could fail, and the following BIO_reset
will close the socket a second time.
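
The guard against a double close can be sketched with plain POSIX file descriptors. This is a minimal sketch, not the eventual fix: `INVALID_FD` and `close_once()` are hypothetical names, with `INVALID_FD` playing the role of INVALID_SOCKET.

```c
#include <assert.h>
#include <unistd.h>

#define INVALID_FD (-1)  /* hypothetical sentinel, like INVALID_SOCKET */

/*
 * Close *fd at most once and mark it invalid afterwards.
 * Returns 1 if the descriptor was closed now, 0 if it was already invalid.
 */
int close_once(int *fd)
{
    assert(fd != NULL);
    if (*fd == INVALID_FD)
        return 0;          /* already closed earlier: nothing to do */
    close(*fd);
    *fd = INVALID_FD;      /* a later cleanup pass will now skip it */
    return 1;
}
```

Resetting the stored descriptor right after closing it means a later reset or cleanup path can call the same helper without closing an unrelated descriptor that happened to reuse the number.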

We need either a new PR, or to add it to #12017.

bernd-edlinger reviewed May 27, 2020

DDvO force-pushed the fix_connect_multiple_ipaddr branch from 2d0dfa8 to 7293586 Compare May 28, 2020 12:54

bernd-edlinger reviewed May 28, 2020

DDvO force-pushed the fix_connect_multiple_ipaddr branch from 7293586 to b7b3fc8 Compare May 28, 2020 13:11

DDvO force-pushed the fix_connect_multiple_ipaddr branch from b7b3fc8 to f8e8dad Compare May 28, 2020 13:14

DDvO mentioned this pull request May 28, 2020

Add range check for 'fd' argument of BIO_socket_wait() #11986

Closed

bernd-edlinger reviewed May 28, 2020

Make BIO_do_connect() and friends handle multiple IP addresses

33f4812

DDvO force-pushed the fix_connect_multiple_ipaddr branch from f8e8dad to 33f4812 Compare May 28, 2020 14:38

bernd-edlinger approved these changes May 28, 2020

DDvO added a commit to siemens/openssl that referenced this pull request May 28, 2020

Make BIO_do_connect() and friends handle multiple IP addresses

e10a0ea

Backport of openssl#11971

DDvO mentioned this pull request May 28, 2020

Make BIO_do_connect() and friends handle multiple IP addresses also in 1.1.1 #11989

Closed

DDvO requested a review from kroeckx May 28, 2020 17:08

DDvO force-pushed the fix_connect_multiple_ipaddr branch from c3f43a7 to 33f4812 Compare May 28, 2020 19:03

t8m approved these changes May 29, 2020

t8m added the "approval: done" and "branch: master" labels May 29, 2020

DDvO removed the request for review from kroeckx May 29, 2020 11:36

openssl-machine added the "approval: ready to merge" label and removed the "approval: done" label May 30, 2020

openssl-machine pushed a commit that referenced this pull request Jun 1, 2020

Make BIO_do_connect() and friends handle multiple IP addresses

dc18e4d

Reviewed-by: Tomas Mraz <tmraz@fedoraproject.org> (Merged from #11971)

DDvO closed this Jun 1, 2020

DDvO mentioned this pull request Jun 2, 2020

Consolidate doc of BIO_do_connect() and its alias BIO_do_handshake() #12017

Closed

bernd-edlinger reviewed Jun 2, 2020
