
Implement socket option recvtos, recvtclass, recvttl and pktoptions #1950

Conversation

RaimoNiskanen
Contributor

Since this feature causes an API change, we would like community review of it.

There is a detailed explanation in the commit comment. The commit contains source code and documentation changes.

Test cases are still missing, but I will work on that.

Implement socket options recvtclass, recvtos, recvttl and pktoptions.

Document the implemented socket options, new types and message formats.

The options recvtclass, recvtos and recvttl are boolean options that,
when activated (true) for a socket, cause ancillary data to be
received through recvmsg().  This applies to packet-oriented sockets
(UDP and SCTP).

The required options for this feature were recvtclass and recvtos,
and recvttl was only added to test that the ancillary data parsing
handled multiple data items in one message correctly.

These options do not work on Windows, since ancillary data
is not handled by the Winsock2 API.

For stream sockets (TCP) there is no clear connection between
a received packet and what is returned when reading data from
the socket, so recvmsg() is not useful.  It is possible to get
the same ancillary data through a getsockopt() call with
the IPv6 socket option IPV6_PKTOPTIONS, on Linux named
IPV6_2292PKTOPTIONS after the now obsoleted RFC where it originated.
(Unfortunately, RFC 3542, which obsoletes that RFC, explicitly leaves
 this way of getting packet ancillary data from a stream socket undefined.)

Linux also has a way to get packet ancillary data for IPv4
TCP sockets through a getsockopt() call with IP_PKTOPTIONS,
which appears to be Linux-specific.

This implementation uses a flag field in the inet_drv.c internal
socket data that records whether any of recvtclass, recvtos or
recvttl (IPV6_RECVTCLASS, IP_RECVTOS or IP_RECVTTL) has been
activated through a setsockopt() call.  If so, recvmsg() is used
instead of recvfrom().

Ancillary data is delivered to the application by a new return
tuple format from gen_udp:recv/2,3 containing a list of
ancillary data tuples [{tclass,TCLASS} | {tos,TOS} | {ttl,TTL}],
as returned by recvmsg().  For a socket in active mode a new
message format, containing the ancillary data list, delivers
the data in the same way.
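
As a minimal usage sketch (the port number 5678, the option choices
and the variable names are arbitrary examples), the new formats could
be used from gen_udp like this:

    {ok, S} = gen_udp:open(5678, [binary, {active, false},
                                  {recvtos, true}, {recvttl, true}]),
    %% Passive mode: the ancillary data list is an extra tuple element,
    %% for example AncData = [{tos, 0}, {ttl, 64}]
    {ok, {PeerIP, PeerPort, AncData, Data}} = gen_udp:recv(S, 0),
    %% Active mode: the same list is carried in the message
    ok = inet:setopts(S, [{active, once}]),
    receive
        {udp, S, PeerIP1, PeerPort1, AncData1, Data1} ->
            {PeerIP1, PeerPort1, AncData1, Data1}
    end.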

For gen_sctp the ancillary data is delivered in the same way,
except that the gen_sctp return tuple format already contained
an ancillary data list, so there are just more possible elements
when using these socket options.  Note that the active mode
message format has an extra tuple level for the ancillary
data compared to what is now implemented for gen_udp.
The gen_sctp active mode format was considered to be the odd one
out - now all tuples containing ancillary data are flat,
except for gen_sctp active mode.
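
For illustration, a minimal sketch (assuming an established
association on Socket in active mode) of matching such a message
with its extra tuple level:

    receive
        {sctp, Socket, PeerIP, PeerPort, {AncData, Data}} ->
            %% AncData may now also contain {tclass,TCLASS}, {tos,TOS}
            %% or {ttl,TTL} in addition to its previous elements
            {PeerIP, PeerPort, AncData, Data}
    end.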

Note that testing has not shown that Linux SCTP sockets deliver
any ancillary data for these socket options, so it is probably
not implemented yet.  It remains to be seen what FreeBSD does...

For gen_tcp inet:getopts([pktoptions]) will deliver the latest
received ancillary data for any activated socket option recvtclass,
recvtos or recvttl, on platforms where IP_PKTOPTIONS is defined
for an IPv4 socket, or where IPV6_PKTOPTIONS or IPV6_2292PKTOPTIONS
is defined for an IPv6 socket.  It will be delivered as a
list of ancillary data items in the same way as for gen_udp
(and gen_sctp).
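
A minimal sketch (the peer address and port are arbitrary, and
assuming the usual inet:getopts/2 return shape of {Option, Value}
pairs):

    {ok, T} = gen_tcp:connect({127,0,0,1}, 5678,
                              [binary, {active, false}, {recvtos, true}]),
    {ok, _Data} = gen_tcp:recv(T, 0),
    %% The latest received ancillary data, for example [{tos, 0}]
    {ok, [{pktoptions, AncData}]} = inet:getopts(T, [pktoptions]).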

On some platforms, e.g. the BSDs, when you activate IP_RECVTOS
you get ancillary data tagged IP_RECVTOS with the TOS value,
but on Linux you get ancillary data tagged IP_TOS with the
TOS value.  Linux follows the style of RFC 2292, and the BSDs
use an older convention.  For RFC 2292, which defines the
IPV6_PKTOPTIONS socket option, it is more logical to tag each
item with the tag of the item itself than with the tag of the
option that requests the item.  Therefore this implementation
translates all BSD-style ancillary data tags to the corresponding
Linux-style tags, so the application will only see the tags
'tclass', 'tos' and 'ttl' on all platforms.
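
Since the tags are uniform across platforms, application code can
match on them directly; a small hypothetical helper (tos_of/1 is not
part of this change) could look like:

    %% Return the TOS value from an ancillary data list, or 'undefined'
    tos_of(AncData) ->
        case lists:keyfind(tos, 1, AncData) of
            {tos, TOS} -> TOS;
            false      -> undefined
        end.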
@IngelaAndin added and then removed the team:PS (Assigned to OTP team PS) label Sep 4, 2018
@ferd
Contributor

ferd commented Sep 4, 2018

Would it make sense to make the ancillary data take the 4th position in the tuple, so that the packet data could always be obtained through the element(3, T) call?

@RaimoNiskanen
Contributor Author

RaimoNiskanen commented Sep 5, 2018

@ferd: That would make perfect sense, but then UDP passive mode would differ from SCTP passive mode. The current suggestion uses the same style for UDP passive and active mode and for SCTP passive mode, making SCTP active mode the single odd bird, with arity one less and an extra 2-tuple but still the same field order.

We can not change the SCTP receive tuples for backwards compatibility reasons.

Changing to your sensible suggestion would swap two arguments with respect to SCTP. The question is how annoying this difference will be compared to how practical it would be to be able to use element(Constant, ReceiveTuple) instead of element(tuple_size(ReceiveTuple), ReceiveTuple).

Current suggestion:

         {ok, {PeerIP, PeerPort,  AncData, Data}} % UDP passive mode
{udp,  Socket, PeerIP, PeerPort,  AncData, Data}  % UDP active mode
         {ok, {PeerIP, PeerPort,  AncData, Data}} % SCTP passive mode
{sctp, Socket, PeerIP, PeerPort, {AncData, Data}} % SCTP active mode

Placing AncData last in the tuple for UDP:

         {ok, {PeerIP, PeerPort,  Data, AncData}} % UDP passive mode
{udp,  Socket, PeerIP, PeerPort,  Data, AncData}  % UDP active mode
         {ok, {PeerIP, PeerPort,  AncData, Data}} % SCTP passive mode
{sctp, Socket, PeerIP, PeerPort, {AncData, Data}} % SCTP active mode
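
With the current suggestion the packet data stays the last element of
the UDP receive tuples whether or not ancillary data is present, so a
minimal sketch of fetching it position-independently could be:

         {ok, ReceiveTuple} = gen_udp:recv(Socket, 0),
         %% ReceiveTuple is {PeerIP, PeerPort, Data} without the new
         %% options, or {PeerIP, PeerPort, AncData, Data} with them
         Data = element(tuple_size(ReceiveTuple), ReceiveTuple).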

@IngelaAndin
Contributor

We discussed the alternative of using {AncData, Data} in UDP active mode to make it look like SCTP, but
the argument was made that this might be surprising to UDP user code that could suddenly
match content different from what it used to be. In the SCTP case there will always be some
AncData, and it would be nice if SCTP active mode looked like the proposed UDP active mode, but that breaks backwards compatibility. Of course we could add a compatibility option and phase in such a return tuple. That is a little more work, but maybe the "clean" thing to do. I think the {AncData, Data} in SCTP was a mistake. As always, we have to live with legacy. The question is whether this legacy is worth
phasing out, or do we live with it?

@ferd
Contributor

ferd commented Sep 5, 2018

Yeah, I'm fine with the current form as well (I do not know if I'll ever use that feature); I'm fine with the rationale given here.

Contributor

@KennethL left a comment

Ok with me.

@vans163
Contributor

vans163 commented Sep 5, 2018

If netcode is being refactored, what about UDP recv from multiple processes on the same socket? Currently the implementation of an owner PID assigned to a socket breaks key load-balancing functionality on UDP sockets: it does not allow another pid to call recv on the same UDP socket, even though multiple recv calls on one UDP socket are allowed by the kernel to load balance UDP.

@RaimoNiskanen
Contributor Author

@vans163: Well, the inet driver is still the bottleneck since it uses port locking, so only one VM thread at a time executes in the driver. Factoring that out would be a huge rewrite of the inet driver.

This particular change is due to an internal customer that needs a specific low-level feature.

Besides this small feature addition, we are in the process of writing a new socket API that aims to be more like the Unix C API, and for that it would certainly be worth looking into having multiple processes in a receive call on the same socket...

@RaimoNiskanen RaimoNiskanen merged commit 6a556ff into erlang:maint Sep 17, 2018
@RaimoNiskanen
Contributor Author

Note that more commits were added to this branch and merged into 'maint' after this Pull Request was closed.

The whole branch (latest first):
9674ece Elaborate the disclaimer for 'pktoptions'
2ad4b8f Improve platform filter
abb972b Fix endianness bug for CMSG parsing
6053725 Write testcases for recvtos and friends
84e1631 Fix term buffer overflow bug
These are the two commits visible in this Pull Request:
6a556ff Fix documentation due to feedback
64853dc Implement socket option recvtos and friends
