Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

resource-hwloc: do not walk topology by default? #716

Closed
grondo opened this Issue Jul 6, 2016 · 4 comments

Comments

Projects
None yet
4 participants
@grondo
Copy link
Contributor

grondo commented Jul 6, 2016

As shown in flux-framework/distribution#13, resource-hwloc startup has a major impact on large scale, full system runs. As a test I've disabled walk_topology in resource-hwloc (enabled by simplistic module option for now), and this shows a significant improvement in startup time. For now this might be a viable workaround if nothing (e.g. sched) needs the broken down topology inserted with separate kvs keys:

diff --git a/etc/rc1 b/etc/rc1
index ae65f5a..45010d1 100755
--- a/etc/rc1
+++ b/etc/rc1
@@ -8,7 +8,7 @@ flux module load -r 0 kvs
 flux module load -r all -x 0 kvs

 #flux module load -r all live --barrier-count=$(flux getattr size)
-flux module load -r all resource-hwloc & pids="$pids $!"
+flux module load -r all resource-hwloc ${FLUX_HWLOC_OPTS} & pids="$pids $!"
 flux module load -r all job
 flux module load -r all wrexec
 flux module load -r 0 cron sync=hb
diff --git a/src/modules/resource-hwloc/resource.c b/src/modules/resource-hwloc/
index 9f9f80b..9546194 100644
--- a/src/modules/resource-hwloc/resource.c
+++ b/src/modules/resource-hwloc/resource.c
@@ -44,6 +44,7 @@ typedef struct
     uint32_t rank;
     hwloc_topology_t topology;
     bool loaded;
+    bool walk_topology;
 } ctx_t;

 static int ctx_hwloc_init (flux_t h, ctx_t *ctx)
@@ -287,7 +288,7 @@ static int load_info_to_kvs (flux_t h, ctx_t *ctx)
         kvs_put_int (h, obj_path, nobj);
         free (obj_path);
     }
-    if (walk_topology (h,
+    if (ctx->walk_topology && walk_topology (h,
                        ctx->topology,
                        hwloc_get_root_obj (ctx->topology),
                        base_path) < 0) {
@@ -302,7 +303,7 @@ static int load_info_to_kvs (flux_t h, ctx_t *ctx)
         char *host_path = xasprintf ("resource.hwloc.by_host.%s", kvs_hostname)
         free (kvs_hostname);
         unlink_if_exists (h, host_path);
-        if (walk_topology (h,
+        if (ctx->walk_topology && walk_topology (h,
                            ctx->topology,
                            hwloc_get_root_obj (ctx->topology),
                            host_path) < 0) {
@@ -478,6 +479,9 @@ int mod_main (flux_t h, int argc, char **argv)
     if (!(ctx = getctx (h)))
         goto done;

+    if (argc > 0 && strcmp (argv[0], "walk-topology") == 0)
+        ctx->walk_topology = true;
+
     // Load hardware information immediately
     if (load_hwloc (h, ctx) < 0)
         goto done;

Quick results:

 grondo@hype264:~/git/f$ FLUX_HWLOC_OPTS=walk-topology time -f %E srun -n1024 src/cmd/flux start /bin/true
0:25.70
 grondo@hype264:~/git/f$ FLUX_HWLOC_OPTS= time -f %E srun -n1024 src/cmd/flux start /bin/true
0:11.05

Longer term, kvs improvements might help dramatically with this problem (e.g. submit all puts as a single request, or even async kvs put might resolve the issue). The real problem here seems to be the sheer number of synchronous kvs puts used in walk_topology.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 6, 2016

That's a dramatic enough speedup to warrant a PR. Any objections from sched guys? (I think they are only parsing the XML data not walking the KVS)

@lipari

This comment has been minimized.

Copy link
Contributor

lipari commented Jul 6, 2016

Indeed, we parse the xml data. No objections.

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented Jul 6, 2016

Yap. Only xml data.

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Jul 6, 2016

Ok, let me make the option parsing to enable topology walk a bit more robust and submit as a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.