Skip to content

Commit

Permalink
ice: Add documentation for devlink-rate implementation
Browse files Browse the repository at this point in the history
Add documentation to a newly added devlink-rate feature. Provide some
examples on how to use the commands, which netlink attributes are
supported and descriptions of the attributes.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
  • Loading branch information
mwilczy authored and intel-lab-lkp committed Nov 15, 2022
1 parent c161509 commit aaa72a1
Showing 1 changed file with 115 additions and 0 deletions.
115 changes: 115 additions & 0 deletions Documentation/networking/devlink/ice.rst
Original file line number Diff line number Diff line change
Expand Up @@ -254,3 +254,118 @@ Users can request an immediate capture of a snapshot via the
0000000000000210 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
$ devlink region delete pci/0000:01:00.0/device-caps snapshot 1
Devlink Rate
==========

The ``ice`` driver implements devlink-rate API. It allows for offload of
the Hierarchical QoS to the hardware. It enables user to group Virtual
Functions in a tree structure and assign supported parameters: tx_share,
tx_max, tx_priority and tx_weight to each node in a tree. So effectively
user gains an ability to control how much bandwidth is allocated for each
VF group. This is later enforced by the HW.

It is assumed that this feature is mutually exclusive with DCB performed
in FW and ADQ, or any driver feature that would trigger changes in QoS,
for example creation of the new traffic class. The driver will prevent DCB
or ADQ configuration if user started making any changes to the nodes using
devlink-rate API. To configure those features a driver reload is necessary.
Correspondingly if ADQ or DCB will get configured the driver won't export
hierarchy at all, or will remove the untouched hierarchy if those
features are enabled after the hierarchy is exported, but before any
changes are made.

This feature is also dependent on switchdev being enabled in the system.
It's required bacause devlink-rate requires devlink-port objects to be
present, and those objects are only created in switchdev mode.

If the driver is set to the switchdev mode, it will export internal
hierarchy the moment VF's are created. Root of the tree is always
represented by the node_0. This node can't be deleted by the user. Leaf
nodes and nodes with children also can't be deleted.

.. list-table:: Attributes supported
:widths: 15 85

* - Name
- Description
* - ``tx_max``
- maximum bandwidth to be consumed by the tree Node. Rate Limit is
an absolute number specifying a maximum amount of bytes a Node may
consume during the course of one second. Rate limit guarantees
that a link will not oversaturate the receiver on the remote end
and also enforces an SLA between the subscriber and network
provider.
* - ``tx_share``
- minimum bandwidth allocated to a tree node when it is not blocked.
It specifies an absolute BW. While tx_max defines the maximum
bandwidth the node may consume, the tx_share marks committed BW
for the Node.
* - ``tx_priority``
- allows for usage of strict priority arbiter among siblings. This
arbitration scheme attempts to schedule nodes based on their
priority as long as the nodes remain within their bandwidth limit.
Range 0-7. Nodes with priority 7 have the highest priority and are
selected first, while nodes with priority 0 have the lowest
priority. Nodes that have the same priority are treated equally.
* - ``tx_weight``
- allows for usage of Weighted Fair Queuing arbitration scheme among
siblings. This arbitration scheme can be used simultaneously with
the strict priority. Range 1-200. Only relative values mater for
arbitration.

``tx_priority`` and ``tx_weight`` can be used simultaneously. In that case
nodes with the same priority form a WFQ subgroup in the sibling group
and arbitration among them is based on assigned weights.

.. code:: shell
# enable switchdev
$ devlink dev eswitch set pci/0000:4b:00.0 mode switchdev
# at this point driver should export internal hierarchy
$ echo 2 > /sys/class/net/ens785np0/device/sriov_numvfs
$ devlink port function rate show
pci/0000:4b:00.0/node_25: type node parent node_24
pci/0000:4b:00.0/node_24: type node parent node_0
pci/0000:4b:00.0/node_32: type node parent node_31
pci/0000:4b:00.0/node_31: type node parent node_30
pci/0000:4b:00.0/node_30: type node parent node_16
pci/0000:4b:00.0/node_19: type node parent node_18
pci/0000:4b:00.0/node_18: type node parent node_17
pci/0000:4b:00.0/node_17: type node parent node_16
pci/0000:4b:00.0/node_14: type node parent node_5
pci/0000:4b:00.0/node_5: type node parent node_3
pci/0000:4b:00.0/node_13: type node parent node_4
pci/0000:4b:00.0/node_12: type node parent node_4
pci/0000:4b:00.0/node_11: type node parent node_4
pci/0000:4b:00.0/node_10: type node parent node_4
pci/0000:4b:00.0/node_9: type node parent node_4
pci/0000:4b:00.0/node_8: type node parent node_4
pci/0000:4b:00.0/node_7: type node parent node_4
pci/0000:4b:00.0/node_6: type node parent node_4
pci/0000:4b:00.0/node_4: type node parent node_3
pci/0000:4b:00.0/node_3: type node parent node_16
pci/0000:4b:00.0/node_16: type node parent node_15
pci/0000:4b:00.0/node_15: type node parent node_0
pci/0000:4b:00.0/node_2: type node parent node_1
pci/0000:4b:00.0/node_1: type node parent node_0
pci/0000:4b:00.0/node_0: type node
pci/0000:4b:00.0/1: type leaf parent node_25
pci/0000:4b:00.0/2: type leaf parent node_25
# let's create some custom node
$ devlink port function rate add pci/0000:4b:00.0/node_custom parent node_0
# second custom node
$ devlink port function rate add pci/0000:4b:00.0/node_custom_1 parent node_custom
# reassign second VF to newly created branch
$ devlink port function rate set pci/0000:4b:00.0/2 parent node_custom_1
# assign tx_weight to the VF
$ devlink port function rate set pci/0000:4b:00.0/2 tx_weight 5
# assign tx_share to the VF
$ devlink port function rate set pci/0000:4b:00.0/2 tx_share 500Mbps

0 comments on commit aaa72a1

Please sign in to comment.