From bc91eddfa266e255612df0873ccb170e2c917291 Mon Sep 17 00:00:00 2001
From: Kevin Traynor
Date: Mon, 15 Mar 2021 15:43:59 +0000
Subject: [PATCH] dpif-netdev: Allow PMD auto load balance with cross-numa.

Previously auto load balance did not trigger a reassignment when
there was any cross-numa polling as an rxq could be polled from a
different numa after reassign and it could impact estimates.

In the case where there is only one numa with pmds available, the
same numa will always poll before and after reassignment, so
estimates are valid. Allow PMD auto load balance to trigger a
reassignment in this case.

Signed-off-by: Kevin Traynor
Acked-by: Eelco Chaudron
Signed-off-by: 0-day Robot
---
 Documentation/topics/dpdk/pmd.rst |  9 ++++++---
 lib/dpif-netdev.c                 | 16 +++++++++++++---
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/Documentation/topics/dpdk/pmd.rst b/Documentation/topics/dpdk/pmd.rst
index caa7d97befb..1f61bddb6ec 100644
--- a/Documentation/topics/dpdk/pmd.rst
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -237,9 +237,12 @@ If not set, the default variance improvement threshold is 25%.
 
 .. note::
 
-    PMD Auto Load Balancing doesn't currently work if queues are assigned
-    cross NUMA as actual processing load could get worse after assignment
-    as compared to what dry run predicts.
+    PMD Auto Load Balancing doesn't request a reassignment if queues are
+    assigned cross NUMA and there are multiple NUMA nodes available for
+    reassignment. This is because reassignment to a different NUMA node could
+    lead to an unpredictable change in processing cycles required for a queue.
+    However, if there is only one cross NUMA node available then a dry run and
+    possible request to reassign may continue as normal.
 
 The minimum time between 2 consecutive PMD auto load balancing iterations can
 also be configured by::
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 816945375bc..29e74ee4341 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4887,6 +4887,12 @@ struct rr_numa {
     bool idx_inc;
 };
 
+static size_t
+rr_numa_list_count(struct rr_numa_list *rr)
+{
+    return hmap_count(&rr->numas);
+}
+
 static struct rr_numa *
 rr_numa_list_lookup(struct rr_numa_list *rr, int numa_id)
 {
@@ -5599,10 +5605,14 @@ get_dry_run_variance(struct dp_netdev *dp, uint32_t *core_list,
     for (int i = 0; i < n_rxqs; i++) {
         int numa_id = netdev_get_numa_id(rxqs[i]->port->netdev);
         numa = rr_numa_list_lookup(&rr, numa_id);
+        /* If there is no available pmd on the local numa but there is only one
+         * numa for cross-numa polling, we can estimate the dry run. */
+        if (!numa && rr_numa_list_count(&rr) == 1) {
+            numa = rr_numa_list_next(&rr, NULL);
+        }
         if (!numa) {
-            /* Abort if cross NUMA polling. */
-            VLOG_DBG("PMD auto lb dry run."
-                     " Aborting due to cross-numa polling.");
+            VLOG_DBG("PMD auto lb dry run. Aborting due to "
+                     "multiple numa nodes available for cross-numa polling.");
             goto cleanup;
         }
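
For illustration only, below is a minimal standalone sketch of the decision the
dry run now makes: an rxq whose local NUMA has no PMDs is still included in the
estimate, but only when exactly one NUMA node has PMDs, since that node would
poll it both before and after any reassignment. This is not OVS code; the
struct, function, and variable names here are made up for the example. In the
patch itself the same check is done with rr_numa_list_count() and a fallback to
rr_numa_list_next().

/*
 * Standalone sketch of the dry-run fallback logic (hypothetical names,
 * simplified data layout; not the OVS implementation).
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct pmd_numa_set {
    const int *numa_ids;   /* NUMA nodes that have usable PMDs. */
    size_t n;              /* Number of such nodes. */
};

/* Returns true and stores the NUMA node to poll from in '*out' if the rxq
 * can be part of a valid dry-run estimate.  Returns false when polling is
 * cross-NUMA and more than one candidate node exists, i.e. the case where
 * the dry run is still aborted. */
static bool
choose_polling_numa(const struct pmd_numa_set *set, int local_numa, int *out)
{
    for (size_t i = 0; i < set->n; i++) {
        if (set->numa_ids[i] == local_numa) {
            *out = local_numa;          /* Local polling: always fine. */
            return true;
        }
    }
    if (set->n == 1) {
        *out = set->numa_ids[0];        /* Single cross-NUMA candidate. */
        return true;
    }
    return false;                       /* Ambiguous cross-NUMA: abort. */
}

int
main(void)
{
    const int one_numa[] = { 1 };
    const int two_numas[] = { 0, 1 };
    struct pmd_numa_set single = { one_numa, 1 };
    struct pmd_numa_set multi = { two_numas, 2 };
    int numa;

    /* Port on NUMA 0, PMDs only on NUMA 1: estimated after this patch. */
    printf("single numa: %s\n",
           choose_polling_numa(&single, 0, &numa) ? "estimate" : "abort");

    /* Port on NUMA 2, PMDs on NUMA 0 and 1: dry run still aborts. */
    printf("multiple numas: %s\n",
           choose_polling_numa(&multi, 2, &numa) ? "estimate" : "abort");
    return 0;
}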