-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
job-exec: support module stats to see current bulk-exec configuration #5943
Conversation
while ((impl = implementations[i]) && impl->name) { | ||
json_t *stats = NULL; | ||
if (impl->stats) { | ||
if ((*impl->stats) (&stats) == 0 && stats) { | ||
if (json_object_set_new (o, impl->name, stats) < 0) { | ||
errno = ENOMEM; | ||
goto error; | ||
} | ||
} | ||
} | ||
i++; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought only one implementation could be active at runtime? (Am I confused?)
When I suggested a callback I just meant to integrate any general stats provided in the mainline with those produced by the active plugin.
Also (minor): new code should probably break long parameter lists at one per line, except in the case of "pack" functions and then it should keep keys and values together on a line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought only one implementation could be active at runtime? (Am I confused?)
Implementations are selected per job, e.g. if attributes.system.exec.test.run_duration
is set then the test exec implementation is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought only one implementation could be active at runtime? (Am I confused?)
static int jobinfo_load_implementation (struct jobinfo *job)
{
int i = 0;
int rc = -1;
struct exec_implementation *impl;
while ((impl = implementations[i]) && impl->name) {
/*
* Immediately fail if any implementation init method returns < 0.
* If rc > 0, then select this implementation and skip others,
* O/w, continue with the list.
*/
if ((rc = (*impl->init) (job)) < 0)
return -1;
else if (rc > 0) {
job->impl = impl;
return 0;
}
i++;
}
return -1;
}
suggests to me that an implementation is chosen when the job starts. So it could be a different implementation depending on settings, such as when we use the "test exec" implementation. That's the user selecting to use the test exec. So we might probably want to output configs for all exec implementations? (granted, the testexec has no "stats" at the moment).
When I suggested a callback I just meant to integrate any general stats provided in the mainline with those produced by the active plugin.
By "mainline" you mean job-exec
module? job-exec
didn't produce any stats yet, so the exec implementation stats are all there is at the moment.
Also (minor): new code should probably break long parameter lists at one per line, except in the case of "pack" functions and then it should keep keys and values together on a line
Ahhh I think you're referring to the flux_respond_pack()
, yeah, that was a poor cut & paste & modify from somewhere else. Will tweak.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementations are selected per job,
sorry my bad!
By "mainline" you mean job-exec module?
Yes but my comment was misplaced due to an incorrect recollection of how stuff works here.
Ahhh I think you're referring to the flux_respond_pack(),
Also the message handler function prototype.
fb2d7d9
to
49998f5
Compare
Problem: Modern practive is to break long parameter lists into one parameter per line. Adjust a parameter line break in job-exec.
Problem: Modern practive is to break long parameter lists into one parameter per line. Adjust a parameter line break in job-list.
49998f5
to
3ec42da
Compare
re-pushed, fixing up those parameter lists and the parameter lists I found / cut & pasted from. Also fixed up a test that still relied on dmesg output from |
3ec42da
to
6069f67
Compare
Problem: In the near future we'd like the job-exec module to return some module information via a stats callback, but no callback exists. In addition, we'd like stats to be possible for each exec implementation. Add the initial infrastructure for a stats message handler and exec implementation callback. For the time being, no stats are actually reported.
6069f67
to
ff257ff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty good to me. Just a couple inline comments. I also wonder if the leading impl.
on the stats object returned from flux module stats
is really required, but leave that one up to you since it doesn't really matter too much.
src/modules/job-exec/exec_config.c
Outdated
save_errno = errno; | ||
json_decref (o); | ||
errno = save_errno; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could avoid the need for save_errno
by using ERRNO_SAFE_WRAP (json_decref, o);
here.
src/modules/job-exec/exec.c
Outdated
save_errno = errno; | ||
json_decref (o); | ||
json_decref (conf); | ||
errno = save_errno; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use ERRNO_SAFE_WRAP
?
src/modules/job-exec/exec_config.c
Outdated
if (exec_service) { | ||
if (config_add_stats_string (o, | ||
"exec_service", | ||
exec_service) < 0) | ||
goto error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These 4 duplicated blocks of code could perhaps be cleaned up if config_add_stats_string()
was enhanced to do nothing if value == NULL
. Then this all becomes:
if (config_add_stats_string (o, "default_cwd", default_cwd) < 0
|| config_add_stats_string (o, "default_job_shell", default_job_shell) < 0
|| config_add_stats_string (o, "flux_imp_path", flux_imp_path) < 0
|| config_add_stats_string (o, "exec_service", exec_service) < 0)
goto error;
ff257ff
to
faeffea
Compare
@grondo thanks, did those small tweaks, will set MWP |
Problem: No stats are generate for teh "bulk-exec" exec implementation. Add stats for the "bulk-exec" exec implementation. For the time being, the only "stats" are the config values that configured into the implementation.
Problem: The tests in t2403-job-exec-conf.t could be cleaned up by using the more modern --config-path broker option. Cleanup tests in t2403-job-exec-conf.t to use the --config-path broker option instead of starting a flux instance in a subshell.
faeffea
to
40dfb99
Compare
Problem: A number of tests in t2403-job-exec-conf.t and t2404-job-exec-multiuser.t grep for flux dmesg logs to determine if configuration has been loaded correctly. Update tests to use the configuration output via the job-exec module's new module stats.
Problem: Now that the configuration of exec implementations is available via module stats, some debug logs used for testing are not longer necessary. Remove now unnecessary debug logs.
40dfb99
to
ad52bd1
Compare
Problem: Coverage does not exist for a number of configurations in job-exec. Add the missing coverage to t2403-job-exec-conf.t.
ad52bd1
to
bbdecab
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #5943 +/- ##
==========================================
+ Coverage 83.31% 83.33% +0.01%
==========================================
Files 515 515
Lines 83271 83344 +73
==========================================
+ Hits 69381 69457 +76
+ Misses 13890 13887 -3
|
Per comments in #5913, did a bunch of cleanup/re-work in prep of job-exec config reload (#5900).