dpif-netdev: Assign ports to pmds on non-local numa node.
Previously, if there was no available (non-isolated) pmd on the numa node
for a port, the port was not polled at all. This could result in a
non-operational system until the nics were physically repositioned.
It is preferable to operate with a pmd on the 'wrong' numa node, albeit
with lower performance. Local pmds are still chosen when available.

Signed-off-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Co-authored-by: Ilya Maximets <i.maximets@samsung.com>
Tested-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2 people authored and blp committed Aug 2, 2017
1 parent e215018 commit c37813f
Showing 2 changed files with 57 additions and 6 deletions.
21 changes: 18 additions & 3 deletions Documentation/intro/install/dpdk.rst
@@ -449,7 +449,7 @@ affinitized accordingly.
 
   A poll mode driver (pmd) thread handles the I/O of all DPDK interfaces
   assigned to it. A pmd thread shall poll the ports for incoming packets,
-  switch the packets and send to tx port. pmd thread is CPU bound, and needs
+  switch the packets and send to tx port. A pmd thread is CPU bound, and needs
   to be affinitized to isolated cores for optimum performance.
 
   By setting a bit in the mask, a pmd thread is created and pinned to the
@@ -458,8 +458,23 @@ affinitized accordingly.
       $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x4
 
   .. note::
-    pmd thread on a NUMA node is only created if there is at least one DPDK
-    interface from that NUMA node added to OVS.
+    A pmd thread on a NUMA node is only created if there is at least one DPDK
+    interface from that NUMA node added to OVS. A pmd thread is created by
+    default on a core of a NUMA node or when a specified pmd-cpu-mask has
+    indicated so. Even though a PMD thread may exist, the thread only starts
+    consuming CPU cycles if there is at least one receive queue assigned to
+    the pmd.
+
+  .. note::
+    On NUMA systems PCI devices are also local to a NUMA node. Unbound rx
+    queues for a PCI device will be assigned to a pmd on its local NUMA node
+    if a non-isolated PMD exists on that NUMA node. If not, the queue will be
+    assigned to a non-isolated pmd on a remote NUMA node. This will result in
+    reduced maximum throughput on that device and possibly on other devices
+    assigned to that pmd thread. If such a queue assignment is made a warning
+    message will be logged: "There's no available (non-isolated) pmd thread on
+    numa node N. Queue Q on port P will be assigned to the pmd on core C
+    (numa node N'). Expect reduced performance."
 
 - QEMU vCPU thread Affinity

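The documentation hunk above sets `pmd-cpu-mask=0x4` to run a pmd thread on one core. As a rough illustration of how such a mask maps bits to core IDs, here is a minimal sketch in C. The helper `mask_to_cores` is hypothetical and is not part of OVS. It also assumes the mask fits in 64 bits, which is a simplification for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an OVS function): collect the core IDs whose bits
 * are set in a pmd-cpu-mask style bitmask.  Bit i set means "core i".
 * Stores at most 'max' core IDs in 'cores' and returns how many were
 * stored.  Assumes the mask fits in 64 bits. */
static size_t
mask_to_cores(uint64_t mask, unsigned int *cores, size_t max)
{
    size_t n = 0;

    for (unsigned int core = 0; core < 64 && n < max; core++) {
        if (mask & (UINT64_C(1) << core)) {
            cores[n++] = core;
        }
    }
    return n;
}
```

With this sketch, the mask 0x4 (binary 100) yields the single core ID 2, and 0x6 yields cores 1 and 2.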
42 changes: 39 additions & 3 deletions lib/dpif-netdev.c
@@ -3218,6 +3218,24 @@ rr_numa_list_lookup(struct rr_numa_list *rr, int numa_id)
     return NULL;
 }
 
+/* Returns the next node in numa list following 'numa' in round-robin fashion.
+ * Returns first node if 'numa' is a null pointer or the last node in 'rr'.
+ * Returns NULL if 'rr' numa list is empty. */
+static struct rr_numa *
+rr_numa_list_next(struct rr_numa_list *rr, const struct rr_numa *numa)
+{
+    struct hmap_node *node = NULL;
+
+    if (numa) {
+        node = hmap_next(&rr->numas, &numa->node);
+    }
+    if (!node) {
+        node = hmap_first(&rr->numas);
+    }
+
+    return (node) ? CONTAINER_OF(node, struct rr_numa, node) : NULL;
+}
+
 static void
 rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
 {
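The wrap-around behaviour documented for `rr_numa_list_next()` (first entry for a null cursor or the last entry, NULL for an empty list) can be illustrated without the OVS hmap. The sketch below is a standalone approximation that uses a plain array. The names `struct numa_entry` and `numa_next` are hypothetical and do not exist in OVS.

```c
#include <stddef.h>

/* Hypothetical stand-in for 'struct rr_numa'.  The real code stores these
 * nodes in an hmap; a plain array is assumed here for illustration. */
struct numa_entry {
    int numa_id;
};

/* Returns the entry after 'cur' in 'list', wrapping to the first entry when
 * 'cur' is NULL or is the last entry.  Returns NULL if the list is empty.
 * 'cur', when non-NULL, must point into 'list'. */
static struct numa_entry *
numa_next(struct numa_entry *list, size_t n, const struct numa_entry *cur)
{
    size_t idx;

    if (n == 0) {
        return NULL;
    }
    if (!cur) {
        return &list[0];
    }
    idx = (size_t) (cur - list);
    return &list[(idx + 1) % n];
}
```

Passing the previous result back in as `cur` on each call walks the list in a cycle, which is how the commit uses `non_local_numa` as a cursor across calls.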
@@ -3272,6 +3290,7 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
 {
     struct dp_netdev_port *port;
     struct rr_numa_list rr;
+    struct rr_numa *non_local_numa = NULL;
 
     rr_numa_list_populate(dp, &rr);
 
@@ -3304,11 +3323,28 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
                 }
             } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
                 if (!numa) {
-                    VLOG_WARN("There's no available (non isolated) pmd thread "
+                    /* There are no pmds on the queue's local NUMA node.
+                       Round-robin on the NUMA nodes that do have pmds. */
+                    non_local_numa = rr_numa_list_next(&rr, non_local_numa);
+                    if (!non_local_numa) {
+                        VLOG_ERR("There is no available (non-isolated) pmd "
+                                 "thread for port \'%s\' queue %d. This queue "
+                                 "will not be polled. Is pmd-cpu-mask set to "
+                                 "zero? Or are all PMDs isolated to other "
+                                 "queues?", netdev_get_name(port->netdev),
+                                 qid);
+                        continue;
+                    }
+                    q->pmd = rr_numa_get_pmd(non_local_numa);
+                    VLOG_WARN("There's no available (non-isolated) pmd thread "
                               "on numa node %d. Queue %d on port \'%s\' will "
-                              "not be polled.",
-                              numa_id, qid, netdev_get_name(port->netdev));
+                              "be assigned to the pmd on core %d "
+                              "(numa node %d). Expect reduced performance.",
+                              numa_id, qid, netdev_get_name(port->netdev),
+                              q->pmd->core_id, q->pmd->numa_id);
                 } else {
+                    /* Assign queue to the next (round-robin) PMD on its local
+                       NUMA node. */
                     q->pmd = rr_numa_get_pmd(numa);
                 }
             }
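The assignment policy in this hunk is: prefer a pmd on the queue's local NUMA node, otherwise round-robin across the NUMA nodes that do have pmds, and give up only when no pmd exists at all. It can be modelled in a few lines of standalone C. This is a simplified sketch under stated assumptions, not the OVS implementation. The types `struct numa_pmds` and `struct sched` and the functions `pick_on_node` and `assign_queue` are hypothetical, and each listed node is assumed to have at least one non-isolated pmd.

```c
#include <stddef.h>

#define MAX_PMDS 8

/* Hypothetical model: one entry per NUMA node that has at least one
 * non-isolated pmd ('n_cores' must be >= 1). */
struct numa_pmds {
    int numa_id;
    int cores[MAX_PMDS];    /* Core IDs of the pmds on this node. */
    size_t n_cores;
    size_t next;            /* Round-robin cursor within the node. */
};

struct sched {
    struct numa_pmds *numas;
    size_t n_numas;
    size_t next_remote;     /* Round-robin cursor across nodes. */
};

/* Returns the next core on 'numa' in round-robin order. */
static int
pick_on_node(struct numa_pmds *numa)
{
    int core = numa->cores[numa->next];

    numa->next = (numa->next + 1) % numa->n_cores;
    return core;
}

/* Picks a core for a queue whose device is local to 'queue_numa'.
 * Sets '*remote' to 1 when a non-local node had to be used (the case in
 * which the commit logs a warning).  Returns -1 if there is no pmd at all
 * (the case in which the commit logs an error and skips the queue). */
static int
assign_queue(struct sched *s, int queue_numa, int *remote)
{
    struct numa_pmds *numa;

    *remote = 0;
    for (size_t i = 0; i < s->n_numas; i++) {
        if (s->numas[i].numa_id == queue_numa) {
            return pick_on_node(&s->numas[i]);
        }
    }
    if (s->n_numas == 0) {
        return -1;
    }
    numa = &s->numas[s->next_remote];
    s->next_remote = (s->next_remote + 1) % s->n_numas;
    *remote = 1;
    return pick_on_node(numa);
}
```

Keeping a separate cursor for the remote case spreads fallback queues over all nodes that have pmds instead of piling them onto the first one, which mirrors the role of `non_local_numa` in the patch.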