Implement socket option recvtos, recvtclass, recvttl and pktoptions #1950
Conversation
Implement socket options recvtclass, recvtos, recvttl and pktoptions. Document the implemented socket options, new types and message formats.

The options recvtclass, recvtos and recvttl are boolean options that, when activated (true) for a socket, cause ancillary data to be received through recvmsg(). That applies to packet oriented sockets (UDP and SCTP). The required options for this feature were recvtclass and recvtos; recvttl was only added to test that the ancillary data parsing handles multiple data items in one message correctly. These options do not work on Windows, since ancillary data is not handled by the Winsock2 API.

For stream sockets (TCP) there is no clear connection between a received packet and what is returned when reading data from the socket, so recvmsg() is not useful. It is possible to get the same ancillary data through a getsockopt() call with the IPv6 socket option IPV6_PKTOPTIONS, on Linux named IPV6_2292PKTOPTIONS after the now obsolete RFC where it originated. (Unfortunately RFC 3542, which obsoletes it, explicitly undefines this way to get packet ancillary data from a stream socket.) Linux also has a way to get packet ancillary data for IPv4 TCP sockets through a getsockopt() call with IP_PKTOPTIONS, which appears to be Linux specific.

This implementation uses a flag field in the inet_drv.c socket internal data that records whether any setsockopt() call has activated recvtclass, recvtos or recvttl (IPV6_RECVTCLASS, IP_RECVTOS or IP_RECVTTL). If so, recvmsg() is used instead of recvfrom(). Ancillary data is delivered to the application by a new return tuple format from gen_udp:recv/2,3 containing a list of ancillary data tuples [{tclass,TCLASS} | {tos,TOS} | {ttl,TTL}], as returned by recvmsg(). For a socket in active mode a new message format, containing the ancillary data list, delivers the data in the same way.
For gen_sctp the ancillary data is delivered in the same way, except that the gen_sctp return tuple format already contained an ancillary data list, so there are just more possible elements when using these socket options. Note that the active mode message format has an extra tuple level for the ancillary data compared to what is now implemented for gen_udp. The gen_sctp active mode format was considered to be the odd one: now all tuples containing ancillary data are flat, except for gen_sctp active mode. Note that testing has not shown that Linux SCTP sockets deliver any ancillary data for these socket options, so it is probably not implemented yet. Remains to be seen what FreeBSD does...

For gen_tcp, inet:getopts([pktoptions]) will deliver the latest received ancillary data for any activated socket option recvtclass, recvtos or recvttl, on platforms where IP_PKTOPTIONS is defined for an IPv4 socket, or where IPV6_PKTOPTIONS or IPV6_2292PKTOPTIONS is defined for an IPv6 socket. It will be delivered as a list of ancillary data items in the same way as for gen_udp (and gen_sctp).

On some platforms, e.g. the BSDs, when you activate IP_RECVTOS you get ancillary data tagged IP_RECVTOS with the TOS value, but on Linux you get ancillary data tagged IP_TOS with the TOS value. Linux follows the style of RFC 2292, and the BSDs use an older notion. For RFC 2292, which defines the IP_PKTOPTIONS socket option, it is more logical to tag the items with the tag that is the item's own than with the tag that requests the item. Therefore this implementation translates all BSD style ancillary data tags to the corresponding Linux style tags, so the application will only see the tags 'tclass', 'tos' and 'ttl' on all platforms.
would it make sense to make the ancillary data take the 4th position in the tuple so that the packet data could always be obtained through the […]
@ferd: That would make perfect sense, but then UDP passive mode would differ from SCTP passive mode. The current suggestion has the same style for UDP passive and active mode, and SCTP active mode, making SCTP active mode a single odd bird, with arity one less and an extra 2-tuple but still the same field order. We can not change the SCTP receive tuples for backwards compatibility reasons. Changing to your sensible suggestion would swap two arguments with respect to SCTP. The question is how annoying this difference will be compared to how practical it would be to be able to use […]

Current suggestion:

```erlang
{ok, {PeerIP, PeerPort, AncData, Data}}           % UDP passive mode
{udp, Socket, PeerIP, PeerPort, AncData, Data}    % UDP active mode
{ok, {PeerIP, PeerPort, AncData, Data}}           % SCTP passive mode
{sctp, Socket, PeerIP, PeerPort, {AncData, Data}} % SCTP active mode
```

Placing the ancillary data last:

```erlang
{ok, {PeerIP, PeerPort, Data, AncData}}           % UDP passive mode
{udp, Socket, PeerIP, PeerPort, Data, AncData}    % UDP active mode
{ok, {PeerIP, PeerPort, AncData, Data}}           % SCTP passive mode
{sctp, Socket, PeerIP, PeerPort, {AncData, Data}} % SCTP active mode
```
We discussed the alternative to use {AncData, Data} in UDP active mode to make it look like SCTP, but […]
Yeah, I'm fine with the current form as well (I do not know if I'll ever use that feature), and I'm fine with the rationale given here.
Ok with me.
If netcode is being refactored, what about UDP recv from multiple processes on the same socket? Currently the implementation of an owner PID assigned to a socket breaks key load balancing functionality on UDP sockets. Basically it does not allow another pid to call recv on the same UDP socket, and this functionality of multiple recv calls on one UDP socket is allowed by the kernel to load balance UDP.
@vans163: Well, the inet driver is still the bottleneck since it uses port locking, so only one VM thread at a time executes in the driver. Factoring that out would be a huge rewrite of the inet driver. This particular change is due to an internal customer that needs a specific low level feature. Besides this small feature addition, we are in the process of writing a new socket API that is aimed towards being more like the Unix C API, and for that it would certainly be worth looking into having multiple processes in a receive call on the same socket...
Note that more commits were added to this branch and merged into 'maint' after this pull request was closed. The whole branch (latest first):
Since this feature causes an API change we would like community review of it.
There is a detailed explanation in the commit comment. The commit contains source code and documentation changes.
Test cases are still missing, but I will work on that.