Work around linux DGRAM reconnect misbehaviour #5120

RaimoNiskanen (Contributor) commented Aug 13, 2021

In inet_drv.c I have added a simple connect() call with address family AF_UNSPEC. It clears the socket's current destination (dissolves the association) before connecting to the new one.
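The clear-then-connect sequence can be sketched in C as follows, assuming a POSIX socket API on Linux. The helpers `udp_reconnect` and `loopback_addr` are hypothetical names used only for illustration; they are not the code that was added to inet_drv.c. The dissolving `connect()` passes a zeroed `struct sockaddr` whose `sa_family` is `AF_UNSPEC`. Its result is deliberately ignored, on the assumption that platforms may differ in what they return for it.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: build a sockaddr_in for 127.0.0.1:port. */
struct sockaddr_in loopback_addr(unsigned short port)
{
    struct sockaddr_in sin;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = htons(port);
    return sin;
}

/* Hypothetical helper: dissolve any existing association of the
 * datagram socket fd, then connect it to new_peer.
 * Returns 0 on success, -1 on failure (errno set by connect()). */
int udp_reconnect(int fd, const struct sockaddr *new_peer, socklen_t len)
{
    struct sockaddr unspec;

    /* Step 1: clear the current destination with an AF_UNSPEC connect.
     * The result is ignored on purpose (assumption: platforms may
     * report this call differently). */
    memset(&unspec, 0, sizeof(unspec));
    unspec.sa_family = AF_UNSPEC;
    (void)connect(fd, &unspec, sizeof(unspec));

    /* Step 2: connect to the new destination. */
    return connect(fd, new_peer, len);
}
```

Calling `udp_reconnect()` twice on the same socket with two different loopback ports should leave `getpeername()` reporting the second peer.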

For socket.erl I have added support for "native" addresses, so that the same can be done from the compatibility module gen_udp_socket.erl. To be committed...

@RaimoNiskanen RaimoNiskanen linked an issue Aug 13, 2021 that may be closed by this pull request
@RaimoNiskanen RaimoNiskanen marked this pull request as draft August 13, 2021 14:41
@RaimoNiskanen RaimoNiskanen requested a review from bmk August 13, 2021 14:41
@RaimoNiskanen RaimoNiskanen self-assigned this Aug 13, 2021
@rickard-green rickard-green added the team:PS Assigned to OTP team PS label Aug 16, 2021
@RaimoNiskanen RaimoNiskanen force-pushed the raimo/work-around-linux-dgram-reconnect/GH-5092/OTP-17559 branch from 05d8d89 to 40d5209 Compare August 16, 2021 12:59
RaimoNiskanen (Contributor, Author)

I wrote a test case.

It turns out that Linux does more fishy stuff... If I bind to the wildcard address with port 0 (requesting an ephemeral port), check which port I got, and then connect to 127.0.0.0:53, first clearing the destination with an AF_UNSPEC connect of length 0, the Linux network stack can actually change the source port. But if I bind to the wildcard address with an explicit port number, the source port does not change during the destination-clearing reconnect...
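A small C program along the lines of the experiment above might look like this. It is a sketch, not the test case from this pull request: `observe_source_port` and `local_port` are hypothetical names, and the destination is 127.0.0.1:53 rather than the 127.0.0.0:53 used in the comment, an assumption made so that the sketch connects to an ordinary loopback host address.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: local (source) port of fd in host byte order,
 * or 0 if getsockname() fails. */
static unsigned short local_port(int fd)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);
    if (getsockname(fd, (struct sockaddr *)&sin, &len) != 0)
        return 0;
    return ntohs(sin.sin_port);
}

/* Hypothetical reproduction: bind a UDP socket to the wildcard address
 * and bind_port (0 = ephemeral), record the source port, clear the
 * destination with an AF_UNSPEC connect, connect to 127.0.0.1:53 and
 * record the source port again. Returns 0 on success, -1 on failure. */
int observe_source_port(unsigned short bind_port,
                        unsigned short *before, unsigned short *after)
{
    struct sockaddr_in any, dst;
    struct sockaddr unspec;
    int fd, rc = -1;

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&any, 0, sizeof(any));
    any.sin_family = AF_INET;
    any.sin_addr.s_addr = htonl(INADDR_ANY);
    any.sin_port = htons(bind_port);
    if (bind(fd, (struct sockaddr *)&any, sizeof(any)) != 0)
        goto out;
    *before = local_port(fd);

    /* Clear the destination; on Linux this may also release a port
     * that was chosen by the kernel (bind_port == 0). */
    memset(&unspec, 0, sizeof(unspec));
    unspec.sa_family = AF_UNSPEC;
    (void)connect(fd, &unspec, sizeof(unspec));

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    dst.sin_port = htons(53);
    if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) != 0)
        goto out;
    *after = local_port(fd);
    rc = 0;
out:
    close(fd);
    return rc;
}
```

With `bind_port` set to 0, Linux may report a different source port after the reconnect; with an explicit `bind_port` the two values are expected to match. The sketch only records the ports and does not assume that the ephemeral port changes, since a newly chosen port is not guaranteed to differ.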

@RaimoNiskanen RaimoNiskanen force-pushed the raimo/work-around-linux-dgram-reconnect/GH-5092/OTP-17559 branch from 40d5209 to ef4271f Compare August 16, 2021 14:05
RoadRunnr (Contributor)

I don't think this is a bug; the behavior is documented in man 2 connect:

Some protocol sockets (e.g., TCP sockets as well as datagram sockets in the UNIX and Internet domains) may dissolve the association by connecting to an address with the sa_family member of sockaddr set to AF_UNSPEC; thereafter, the socket can be connected to another address. (AF_UNSPEC is supported on Linux since kernel 2.2.)

There is also an old bug report about this in the kernel's Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=6646
It does have a final verdict from DaveM:

This behavior is intentional and will not change.

RaimoNiskanen (Contributor, Author) commented Aug 16, 2021

@RoadRunnr: Bummer!

I just spent a few hours registering on the appropriate Linux mailing list and writing a reasonably pretty C program that demonstrates the bug.

On my Ubuntu 18 the phrasing is:

Connectionless sockets may dissolve the association by connecting to an address with the sa_family member of sockaddr set to AF_UNSPEC (supported on Linux since kernel 2.2).

So there is nothing in it about "thereafter, the socket can be connected to another address".

It seems they have chosen to classify their queer behaviour as a documentation bug.
I am not surprised.
:-(

@RaimoNiskanen RaimoNiskanen force-pushed the raimo/work-around-linux-dgram-reconnect/GH-5092/OTP-17559 branch from ef4271f to d8d70be Compare August 18, 2021 11:54
@RaimoNiskanen RaimoNiskanen changed the base branch from master to maint August 18, 2021 11:55
@RaimoNiskanen RaimoNiskanen marked this pull request as ready for review August 18, 2021 12:47
@RaimoNiskanen RaimoNiskanen force-pushed the raimo/work-around-linux-dgram-reconnect/GH-5092/OTP-17559 branch 2 times, most recently from b259495 to b11c2de Compare August 27, 2021 13:18
Check that all fields in the sockaddr() value are part
of the address for the address family.

I have myself once too often used #{family => inet, address => loopback}
and then gotten #{family => inet, addr => any}, since the address
field is named 'addr', so 'address' is ignored and the default
for 'addr' is 'any' ({0,0,0,0}).
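The idea behind this check can be modeled in C by comparing each key of an address map against the fields allowed for its family. This is a hypothetical model, not the Erlang implementation in socket.erl: `check_sockaddr_keys` is an illustrative name, and the key sets are assumed from the `sockaddr_in()` map (`family`, `port`, `addr`) and the `sockaddr_in6()` map (which adds `flowinfo` and `scope_id`).

```c
#include <stddef.h>
#include <string.h>

/* Assumed field sets, modeled on Erlang socket's sockaddr_in() and
 * sockaddr_in6() maps. NULL-terminated. */
static const char *const inet_keys[]  = {"family", "port", "addr", NULL};
static const char *const inet6_keys[] = {"family", "port", "addr",
                                         "flowinfo", "scope_id", NULL};

static int key_allowed(const char *const *allowed, const char *key)
{
    for (; *allowed != NULL; allowed++)
        if (strcmp(*allowed, key) == 0)
            return 1;
    return 0;
}

/* Hypothetical check: returns 0 if every key in keys[0..n) is a field
 * of the given family, -1 if some key is not (and stores the first
 * offending key in *bad), or -2 if the family itself is unknown. */
int check_sockaddr_keys(const char *family, const char *const keys[],
                        size_t n, const char **bad)
{
    const char *const *allowed;
    size_t i;

    if (strcmp(family, "inet") == 0)
        allowed = inet_keys;
    else if (strcmp(family, "inet6") == 0)
        allowed = inet6_keys;
    else
        return -2;

    for (i = 0; i < n; i++) {
        if (!key_allowed(allowed, keys[i])) {
            *bad = keys[i];
            return -1;
        }
    }
    return 0;
}
```

In this model, the keys `family` and `address` for family `inet` are reported as invalid at `address`, rather than being accepted with `addr` silently falling back to its default.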
@RaimoNiskanen RaimoNiskanen force-pushed the raimo/work-around-linux-dgram-reconnect/GH-5092/OTP-17559 branch from b11c2de to aebf963 Compare August 30, 2021 12:28
Also, adjust some inconsistent error returns / exceptions.
@RaimoNiskanen RaimoNiskanen force-pushed the raimo/work-around-linux-dgram-reconnect/GH-5092/OTP-17559 branch from aebf963 to b8f6206 Compare August 31, 2021 13:20
@RaimoNiskanen RaimoNiskanen merged commit 10e1c57 into erlang:maint Sep 2, 2021
@RaimoNiskanen RaimoNiskanen deleted the raimo/work-around-linux-dgram-reconnect/GH-5092/OTP-17559 branch September 2, 2021 14:00
Labels: team:PS (Assigned to OTP team PS)
Development: successfully merging this pull request may close the linked issue "DNS resolution fails due to einval error from gen_udp:connect".
3 participants