
Implement socket option recvtos, recvtclass, recvttl and pktoptions #1950

Conversation

RaimoNiskanen
Contributor

Since this feature causes an API change, we would like community review of it.

There is a detailed explanation in the commit comment. The commit contains source code and documentation changes.

Test cases are still missing, but I will work on that.

Implement socket options recvtclass, recvtos, recvttl and pktoptions.

Document the implemented socket options, new types and message formats.

The options recvtclass, recvtos and recvttl are boolean options that,
when activated (true) for a socket, cause ancillary data to be
received through recvmsg().  This applies to packet-oriented sockets
(UDP and SCTP).

The required options for this feature were recvtclass and recvtos,
and recvttl was only added to test that the ancillary data parsing
handled multiple data items in one message correctly.

These options do not work on Windows, since ancillary data
is not handled by the Winsock2 API.

For stream sockets (TCP) there is no clear connection between
a received packet and what is returned when reading data from
the socket, so recvmsg() is not useful.  It is possible to get
the same ancillary data through a getsockopt() call with
the IPv6 socket option IPV6_PKTOPTIONS, on Linux named
IPV6_2292PKTOPTIONS after the now obsoleted RFC where it originated.
(Unfortunately, RFC 3542, which obsoletes that RFC, explicitly leaves
 this way of getting packet ancillary data from a stream socket undefined.)

Linux also has a way to get packet ancillary data for IPv4
TCP sockets through a getsockopt() call with IP_PKTOPTIONS,
which appears to be Linux-specific.

This implementation uses a flag field in the inet_drv.c internal
socket data that records whether any of recvtclass, recvtos or
recvttl (IPV6_RECVTCLASS, IP_RECVTOS or IP_RECVTTL) has been
activated through a setsockopt() call.  If so, recvmsg() is used
instead of recvfrom().

Ancillary data is delivered to the application by a new return
tuple format from gen_udp:recv/2,3 containing a list of
ancillary data tuples [{tclass,TCLASS} | {tos,TOS} | {ttl,TTL}],
as returned by recvmsg().  For a socket in active mode a new
message format, containing the ancillary data list, delivers
the data in the same way.
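
As a minimal usage sketch (the port number 5678, the option choices
and the variable names are arbitrary examples), the new formats could
be used from gen_udp like this:

    {ok, S} = gen_udp:open(5678, [binary, {active, false},
                                  {recvtos, true}, {recvttl, true}]),
    %% Passive mode: the ancillary data list is an extra tuple element,
    %% for example AncData = [{tos, 0}, {ttl, 64}]
    {ok, {PeerIP, PeerPort, AncData, Data}} = gen_udp:recv(S, 0),
    %% Active mode: the same list is carried in the message
    ok = inet:setopts(S, [{active, once}]),
    receive
        {udp, S, PeerIP1, PeerPort1, AncData1, Data1} ->
            {PeerIP1, PeerPort1, AncData1, Data1}
    end.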

For gen_sctp the ancillary data is delivered in the same way,
except that the gen_sctp return tuple format already contained
an ancillary data list, so there are just more possible elements
when using these socket options.  Note that the active mode
message format has an extra tuple level for the ancillary
data compared to what is now implemented for gen_udp.
The gen_sctp active mode format was considered to be the odd one
out - now all tuples containing ancillary data are flat,
except for gen_sctp active mode.
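
For illustration, a minimal sketch (assuming an established
association on Socket in active mode) of matching such a message
with its extra tuple level:

    receive
        {sctp, Socket, PeerIP, PeerPort, {AncData, Data}} ->
            %% AncData may now also contain {tclass,TCLASS}, {tos,TOS}
            %% or {ttl,TTL} in addition to its previous elements
            {PeerIP, PeerPort, AncData, Data}
    end.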

Note that testing has not shown that Linux SCTP sockets deliver
any ancillary data for these socket options, so it is probably
not implemented yet.  It remains to be seen what FreeBSD does...

For gen_tcp inet:getopts([pktoptions]) will deliver the latest
received ancillary data for any activated socket option recvtclass,
recvtos or recvttl, on platforms where IP_PKTOPTIONS is defined
for an IPv4 socket, or where IPV6_PKTOPTIONS or IPV6_2292PKTOPTIONS
is defined for an IPv6 socket.  It will be delivered as a
list of ancillary data items in the same way as for gen_udp
(and gen_sctp).
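
A minimal sketch (the peer address and port are arbitrary, and
assuming the usual inet:getopts/2 return shape of {Option, Value}
pairs):

    {ok, T} = gen_tcp:connect({127,0,0,1}, 5678,
                              [binary, {active, false}, {recvtos, true}]),
    {ok, _Data} = gen_tcp:recv(T, 0),
    %% The latest received ancillary data, for example [{tos, 0}]
    {ok, [{pktoptions, AncData}]} = inet:getopts(T, [pktoptions]).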

On some platforms, e.g. the BSDs, when you activate IP_RECVTOS
you get ancillary data tagged IP_RECVTOS with the TOS value,
but on Linux you get ancillary data tagged IP_TOS with the
TOS value.  Linux follows the style of RFC 2292, and the BSDs
use an older convention.  For RFC 2292, which defines the
IPV6_PKTOPTIONS socket option, it is more logical to tag each
item with the tag of the item itself than with the tag of the
option that requests the item.  Therefore this implementation
translates all BSD-style ancillary data tags to the corresponding
Linux-style tags, so the application will only see the tags
'tclass', 'tos' and 'ttl' on all platforms.
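
Since the tags are uniform across platforms, application code can
match on them directly; a small hypothetical helper (tos_of/1 is not
part of this change) could look like:

    %% Return the TOS value from an ancillary data list, or 'undefined'
    tos_of(AncData) ->
        case lists:keyfind(tos, 1, AncData) of
            {tos, TOS} -> TOS;
            false      -> undefined
        end.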
@IngelaAndin added and then removed the team:PS (Assigned to OTP team PS) label Sep 4, 2018
@ferd
Contributor

ferd commented Sep 4, 2018

Would it make sense to make the ancillary data take the 4th position in the tuple, so that the packet data could always be obtained through the element(3, T) call?

@RaimoNiskanen
Contributor Author

RaimoNiskanen commented Sep 5, 2018

@ferd: That would make perfect sense, but then UDP passive mode would differ from SCTP passive mode. The current suggestion uses the same style for UDP passive and active mode and for SCTP passive mode, making SCTP active mode the single odd bird, with arity one less and an extra 2-tuple but still the same field order.

We can not change the SCTP receive tuples for backwards compatibility reasons.

Changing to your sensible suggestion would swap two arguments with respect to SCTP. The question is how annoying this difference will be compared to how practical it would be to be able to use element(Constant, ReceiveTuple) instead of element(tuple_size(ReceiveTuple), ReceiveTuple).

Current suggestion:

         {ok, {PeerIP, PeerPort,  AncData, Data}} % UDP passive mode
{udp,  Socket, PeerIP, PeerPort,  AncData, Data}  % UDP active mode
         {ok, {PeerIP, PeerPort,  AncData, Data}} % SCTP passive mode
{sctp, Socket, PeerIP, PeerPort, {AncData, Data}} % SCTP active mode

Placing AncData last in the tuple for UDP:

         {ok, {PeerIP, PeerPort,  Data, AncData}} % UDP passive mode
{udp,  Socket, PeerIP, PeerPort,  Data, AncData}  % UDP active mode
         {ok, {PeerIP, PeerPort,  AncData, Data}} % SCTP passive mode
{sctp, Socket, PeerIP, PeerPort, {AncData, Data}} % SCTP active mode
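
With the current suggestion the packet data stays the last element of
the UDP receive tuples whether or not ancillary data is present, so a
minimal sketch of fetching it position-independently could be:

         {ok, ReceiveTuple} = gen_udp:recv(Socket, 0),
         %% ReceiveTuple is {PeerIP, PeerPort, Data} without the new
         %% options, or {PeerIP, PeerPort, AncData, Data} with them
         Data = element(tuple_size(ReceiveTuple), ReceiveTuple).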

@IngelaAndin
Contributor

We discussed the alternative of using {AncData, Data} in UDP active mode to make it look like SCTP, but
the argument was made that this might be surprising to UDP user code that could suddenly
match content different from what it used to be. In the SCTP case there will always be some
AncData, and it would be nice if SCTP active mode looked like the proposed UDP active mode, but that breaks backwards compatibility. Of course we could add a compatibility option and phase in such a return tuple. That is a little more work, but maybe the "clean" thing to do. I think the {AncData, Data} in SCTP was a mistake. As always, we have to live with legacy. The question is whether this legacy is worth
phasing out, or do we live with it?

@ferd
Contributor

ferd commented Sep 5, 2018

Yeah, I'm fine with the current form as well (I do not know if I'll ever use that feature); I'm fine with the rationale given here.

Contributor

@KennethL left a comment

Ok with me.

@vans163
Contributor

vans163 commented Sep 5, 2018

If netcode is being refactored, what about UDP recv from multiple processes on the same socket? Currently the implementation of an owner PID assigned to a socket breaks key load-balancing functionality on UDP sockets: it does not allow another pid to call recv on the same UDP socket, even though multiple recv calls on one UDP socket are allowed by the kernel to load balance UDP.

@RaimoNiskanen
Contributor Author

@vans163: Well, the inet driver is still the bottleneck since it uses port locking, so only one VM thread at a time executes in the driver. Factoring that out would be a huge rewrite of the inet driver.

This particular change is due to an internal customer that needs a specific low-level feature.

Besides this small feature addition, we are in the process of writing a new socket API that aims to be more like the Unix C API, and for that it would certainly be worth looking into having multiple processes in a receive call on the same socket...

@RaimoNiskanen RaimoNiskanen merged commit 6a556ff into erlang:maint Sep 17, 2018
@RaimoNiskanen
Contributor Author

Note that more commits were added to this branch and merged into 'maint' after this Pull Request was closed.

The whole branch (latest first):
9674ece Elaborate the disclaimer for 'pktoptions'
2ad4b8f Improve platform filter
abb972b Fix endianness bug for CMSG parsing
6053725 Write testcases for recvtos and friends
84e1631 Fix term buffer overflow bug
These are the two commits visible in this Pull Request:
6a556ff Fix documentation due to feedback
64853dc Implement socket option recvtos and friends
