Skip to content

Add support for Additional Ethernet Capabilities TLV (Frame Preemption) #91

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

vladimiroltean
Copy link
Contributor

The Linux kernel gained support for configuring the IEEE 802.3 MAC Merge layer via ethtool netlink:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=f3c6e128936e11e62d0af3c52f756194d79cf2e2
The intended usage for this feature is to advertise and negotiate the Frame Preemption feature with the link partner via the Additional Ethernet Capabilities (IEEE 802.3-2018 clause 79.3.7) TLV.
This pull request adds support to lldpad and lldptool for this new TLV, and for talking to the kernel via the ethtool genetlink family. To that end, the lldpad program now links with libnl-genl-3.0 in addition to the libnl-3.0 it used so far.

This is a straight copy-paste from
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
as of commit a7b87d2a31dc ("Merge branch
'mlxsw-add-support-of-latency-tlv'").

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Use these definitions in preference to toolchain-provided ones, since
the MAC Merge layer has a brand new UAPI which is not available anywhere.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Similar to other lldpad modules, the IEEE 802.3 one needs a remote
change notification method, to parse the TLVs that the link partner has
advertised.

This doesn't exist, so write one, which for the moment just parses the
IEEE 802.3 Organizationally Specific TLVs that are already supported in
one form or another by lldpad.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
In the Linux kernel, the state associated with what's communicated via
the Additional Ethernet Capabilities TLV is held in the interface's
MAC Merge (MM) layer. This is accessed through the ETHTOOL_MSG_MM_GET
and ETHTOOL_MSG_MM_SET netlink messages. We need the GET to query the
kernel state in order to figure out what to advertise, and the SET to
reconfigure ourselves after we've figured out what the link partner has
advertised.

The lldpad program currently links with libnl-3.0, but not with the
generic netlink extension of the library (libnl-genl-3.0) which is
required for talking to the ethtool netlink interface.

Create a getter function and 3 setters in lldp_ethtool.{c,h} which
correspond with the API that will be needed by future changes.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
IEEE 802.3-2018 clause 79.3.7 Additional Ethernet Capabilities TLV
defines a 2 octet payload with 4 fields:

- preemption capability support
- preemption capability status
- preemption capability active
- additional fragment size

The additional fragment size is the simplest - what the link partner
advertises is what the local device must use.

The purpose of this TLV is to enable TX preemption (i.o.w., to allow
preemptible packets to be sent). Each link partner decides when to
enable TX preemption for themself, based on what was advertised.
We do not control what the link partner does/enables.

This kind of handshake is necessary because preemptible packets use a
custom set of Start-of-Frame-Delimiter values (generically called SMD -
Start-of-mPacket-Delimiter). Link partners which lack a MAC Merge layer
will just see these packets as error frames.

My intention is to have a plug-and-play system which ends up enabling
the maximum set of capabilities, with no configuration. In fact, to the
best of my knowledge, this is kind of the only thing that's possible to
achieve anyway.

When we support the MAC Merge layer and the link partner supports it
too (confusingly called "preemption capability support" in IEEE 802.3),
we go ahead and try to enable preemption in the TX direction. The link
partner may do that too or may do something else.

The preemption verification is a feature through which the port, once TX
preemption was enabled, will generate some special SMD-V frames and wait
for SMD-R replies from the link partner. When an SMD-R frame comes back,
the verification state machine comples, and TX preemption is not only
enabled, but also active.

Using the verification feature allows us to ensure that, even though we
enabled TX preemption without explicit agreement from the LP, we have a
backpressure mechanism which prevents us from sending packets that the
LP is simply going to drop as invalid.

To support a link partner which does something similar to us, we need to
enable the RX side of the pMAC (the entity which processes preemptible
packets and SMD-V/SMD-R verification frames) as soon as we see that the
LP has advertised support for preemption. The LP might be transmitting
SMD-V frames already (it saw our TLV first). Luckily, IEEE defines the
verifyLimit constant of 3 SMD-V retries until a SMD-R is received.
The interval between retries is verifyTime (ranges from 1 to 128 ms).

I chose to max out the verifyTime on the local device, to give the link
partner the maximum possible amount of time - 384 ms - to respond.

To recap:

- When the "preemption capability support" bits from both our the LP
  device are true, we enable TX preemption + verification

- Doing that will set the "preemption capability status" bit in our
  subsequent TLVs. This is purely informational for the LP from our POV
  (we don't care what it does with it).

- If the verification succeeds (LP must respond outside of LLDP
  protocol), the "preemption capability active" bit will be set,
  otherwise it won't.

- It is perfectly valid to have TX preemption enabled/active in one
  direction but not the other

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Currently, lldpad advertises the addFragSize variable (minimum fragment
size for preemptible frames) based on the ETHTOOL_A_MM_RX_MIN_FRAG_SIZE
value reported by the kernel.

It may be useful to advertise to the link partner that we can only
receive a value larger than that. To that end, add getters and setters
for addFragSize, which are to be used with the addEthCaps TLV
(tlvid00120f07). The value of addFragSize is still compared to the
minimum requested by the kernel, and the larger of the two values is
used.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Usage model:

 ip link set eno0 up && ip link set swp0 up
 # Configure the interfaces to receive and transmit LLDPDUs
 lldptool -L -i eno0 adminStatus=rxtx
 lldptool -L -i swp0 adminStatus=rxtx
 # Enable the transmission of the TLVs
 lldptool -T -i eno0 -V addEthCap enableTx=yes
 lldptool -T -i swp0 -V addEthCap enableTx=yes
 # Query the received neighbor TLVs
 lldptool -i swp0 -t -n -V addEthCap
 Additional Ethernet Capabilities TLV
         Preemption capability supported
         Preemption capability not enabled
         Preemption capability not active
         Additional fragment size: 252 octets

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Document which options can be viewed and changed.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
case LLDP_8023_POWER_VIA_MDI:
case LLDP_8023_LINK_AGGREGATION:
case LLDP_8023_MAXIMUM_FRAME_SIZE:
return TLV_OK;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, we need to keep a reference to the TLV here and release it properly, or we need to introduce a new return code (something like TLV_OK_RELEASE) that will tell the rxProcessFrame to delete the TLV object. Otherwise, I think there could be a memory leak. Can you confirm?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm the memory leaks. I don't think there is a need to store the TLV. We act upon it right away, and then let go. Will add free_unpkd_tlv() calls.

struct nl_msg *nlm;
int err;

err = ethtool_genl_start(ETHTOOL_MSG_MM_GET, ETHTOOL_A_MM_HEADER,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to keep creating / deleting the netlink socket?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I didn't want an API that exposes a stateful struct ethtool_sock *, because that would be required with a reusable socket. I'll try and add it and see how it goes.

@@ -500,6 +699,9 @@ int ieee8023_rchange(struct port *port, struct lldp_agent *agent,
case LLDP_8023_LINK_AGGREGATION:
case LLDP_8023_MAXIMUM_FRAME_SIZE:
return TLV_OK;
case LLDP_8023_ADD_ETH_CAPS:
return ieee8023_rchange_add_eth_caps(bd, tlv->info + OUI_SUB_SIZE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as before (TLV_OK leak issue).

@apconole
Copy link
Contributor

Thanks for this - I didn't see a changelog entry for it. This is a user visible feature, please include a changelog update.

I didn't get a chance to test this code - on kernels which don't support the new API interface, I think it should work okay. I saw your upstream kernel patch series, and I hope to get some time this week to test.

@vladimiroltean
Copy link
Contributor Author

Closing this PR and opening another one with the requested changes. I'd like to continue having the code in this form on an immutable branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants