New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
broker: drop broker.rundir and select ipc vs tcp using broker.mapping #3554
Conversation
ba5d922
to
1cd3424
Compare
Problem: overlay unit test calls ok() with missing printf arguments. Supply the missing arguments.
Problem: timer watchers are leaked in overlay unit test function recvmsg_timeout(). Destroy the watcher on exit from the function Thanks to grondo for pointing this out.
Problem: the creation of 'broker.rundir', a directory that is guaranteed unique for each broker, as opposed to 'rundir', which is not always unique, is no longer necessary. Change the local URI from local://<rundir>/<rank>/local to local://<rundir>/local-<rank>. Drop the init callback from overlay.c that was needed to create the broker.rundir between discovering the rank and creating an ipc:// tbon endoint in it when bootstrapping with PMI. Update flux-broker-attributes(7). Update overlay unit test.
Problem: create_rundir() has a local variable that is set to NULL then passed to free(). Drop unused variable.
Problem: if create_rundir() fails, in some cases a created directory may not be cleaned up. Change logic slightly so that cleanup registration occurs in the error path, not just the success path.
Problem: create_rundir() uses asprintf() to build path when a static buffer would be less complicated. Switch to static path buffer.
Problem: overlay_init() has a different, lesser purpose now that it no longer triggers the init callback, so its is named a bit too generically. Rename to overlay_set_geometry(). Update broker and unit test.
Problem: rundir generated by flux-start includes flux-start pid, while rundir generated by the broker does not. For consistency, use the same procedure to generate the rundir in flux-start as is used in the broker. Directories will be $TMPDIR/flux-XXXXXX, where XXXXXX is substituted with a unique string by mkdtemp(3).
Problem: if local-uri is malformed, module exits without setting errno. Set errno to EINVAL in this case.
Problem: a malformed local-uri is not caught early, allowing the instance to enter rc1 and time out. The connector-local module initialization happens in parallel with other broker startup, so is not easily caught before rc1 executes. Improve error handling by checking for obvious problems with the local-uri value during early broker startup.
Problem: errors in broker command line specification of rundir and local-uri are not covered well in test. Add some tests to t0001-basic.t.
1cd3424
to
c4ae955
Compare
Rebased on current master and added some more testing. |
Codecov Report
@@ Coverage Diff @@
## master #3554 +/- ##
==========================================
+ Coverage 82.38% 82.42% +0.04%
==========================================
Files 322 322
Lines 48829 48814 -15
==========================================
+ Hits 40226 40237 +11
+ Misses 8603 8577 -26
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! One minor comment/question inline.
if (attr_add (attrs, "local-uri", uri, FLUX_ATTRFLAG_IMMUTABLE) < 0) { | ||
log_err ("create_broker_rundir: attr_add (local-uri)"); | ||
goto cleanup; | ||
if (snprintf (path, sizeof (path), "%s", uri + 8) >= sizeof (path)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're writing to uri + 8
, should the bounds check be for sizeof (path) - 8
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading from uri + 8
so we check for overflow when writing to path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh duh! Oops, sorry for the think-o.
I'll go ahead and set MWP. Thanks! |
This is based on top of #3553.
With
PMI_process_mapping
available in the broker, use it rather than an explicit broker command line option fromflux start
to determine whenipc://
can be used in place oftcp://
for the overlay network.In addition, perform the following cleanup that falls out from these changes:
tbon.endpoint
to be set on the command line. We don't need this anymore now that we can bootstrap from config files, andflux-start
doesn't need to set it tell the broker to useipc://
broker.rundir
, the ranked subdirectory ofrundir
. Instead append the rank toipc://
paths when used, and to the local URI, in caserundir
happens to be shared by multiple brokers (e.g. standaloneflux start
).overlay_init()
. This used to createbroker.rundir
from within the PMI bootstrap code, after the rank was known, so that theipc://
paths could use it later on in the PMI bootstrap code.