job-exec: support module stats to see current bulk-exec configuration #5943

Merged
merged 8 commits into flux-framework:master from issue5900_job_exec_stats
May 6, 2024

Conversation

chu11
Member

@chu11 chu11 commented May 5, 2024

Per comments in #5913, did a bunch of cleanup/re-work in prep of job-exec config reload (#5900).

  • job-exec module stats that export current config values (did it in an "impl.bulk-exec.config." object).
  • modernize tests in t2403-job-exec-conf.t
  • update to use module stats to determine conf
  • add missing coverage of other job-exec confs

Comment on lines +1454 to +1468
while ((impl = implementations[i]) && impl->name) {
    json_t *stats = NULL;
    if (impl->stats) {
        if ((*impl->stats) (&stats) == 0 && stats) {
            if (json_object_set_new (o, impl->name, stats) < 0) {
                errno = ENOMEM;
                goto error;
            }
        }
    }
    i++;
}
Member

I thought only one implementation could be active at runtime? (Am I confused?)

When I suggested a callback I just meant to integrate any general stats provided in the mainline with those produced by the active plugin.

Also (minor): new code should probably break long parameter lists at one per line, except in the case of "pack" functions and then it should keep keys and values together on a line

Contributor

> I thought only one implementation could be active at runtime? (Am I confused?)

Implementations are selected per job, e.g. if attributes.system.exec.test.run_duration is set then the test exec implementation is used.

Member Author

> I thought only one implementation could be active at runtime? (Am I confused?)

static int jobinfo_load_implementation (struct jobinfo *job)
{
    int i = 0;
    int rc = -1;
    struct exec_implementation *impl;

    while ((impl = implementations[i]) && impl->name) {
        /*
         *  Immediately fail if any implementation init method returns < 0.
         *  If rc > 0, then select this implementation and skip others,
         *  O/w, continue with the list.
         */
        if ((rc = (*impl->init) (job)) < 0)
            return -1;
        else if (rc > 0) {
            job->impl = impl;
            return 0;
        }
        i++;
    }
    return -1;
}

suggests to me that an implementation is chosen when the job starts. So it could be a different implementation depending on settings, such as when the user selects the "test exec" implementation. So we probably want to output configs for all exec implementations? (Granted, the testexec has no "stats" at the moment.)
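
A minimal, self-contained sketch of the per-job selection described above may help. The structs here are pared-down stand-ins, not the real job-exec types: `wants_testexec` is a hypothetical flag standing in for the jobspec attribute check, the `"testexec"` name is assumed from the source file name, and `select_implementation()` is an illustrative helper rather than a flux-core function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct exec_implementation;

/* Hypothetical, pared-down stand-in for the real struct jobinfo. */
struct jobinfo {
    bool wants_testexec;    /* stand-in for the test exec attribute being set */
    const struct exec_implementation *impl;
};

struct exec_implementation {
    const char *name;
    int (*init) (struct jobinfo *job);  /* <0 error, 0 decline, >0 claim job */
};

static int testexec_init (struct jobinfo *job)
{
    return job->wants_testexec ? 1 : 0;
}

static int bulk_exec_init (struct jobinfo *job)
{
    (void)job;
    return 1;   /* default implementation: claims any job that reaches it */
}

static const struct exec_implementation testexec = { "testexec", testexec_init };
static const struct exec_implementation bulkexec = { "bulk-exec", bulk_exec_init };

/* NULL-terminated, ordered list: the first implementation to claim wins. */
static const struct exec_implementation *implementations[] = {
    &testexec,
    &bulkexec,
    NULL,
};

/* Returns the name of the implementation chosen for job, or NULL on error. */
const char *select_implementation (struct jobinfo *job)
{
    assert (job != NULL);
    for (int i = 0; implementations[i] != NULL; i++) {
        int rc = implementations[i]->init (job);
        if (rc < 0)
            return NULL;
        if (rc > 0) {
            job->impl = implementations[i];
            return job->impl->name;
        }
    }
    return NULL;
}
```

Because two jobs in the same instance can end up with different implementations, a stats request that walks the whole list reports every implementation's configuration rather than guessing which one is "active".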

> When I suggested a callback I just meant to integrate any general stats provided in the mainline with those produced by the active plugin.

By "mainline" you mean job-exec module? job-exec didn't produce any stats yet, so the exec implementation stats are all there is at the moment.

> Also (minor): new code should probably break long parameter lists at one per line, except in the case of "pack" functions and then it should keep keys and values together on a line

Ahhh I think you're referring to the flux_respond_pack(), yeah, that was a poor cut & paste & modify from somewhere else. Will tweak.

Member

> Implementations are selected per job,

sorry my bad!

> By "mainline" you mean job-exec module?

Yes but my comment was misplaced due to an incorrect recollection of how stuff works here.

> Ahhh I think you're referring to the flux_respond_pack(),

Also the message handler function prototype.
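
For readers unfamiliar with the layout being asked for, here is a small sketch of both cases. Everything in it is hypothetical: `format_pairs()` is a toy variadic helper standing in for a "pack"-style function, and `describe_limits()`, `max_start`, and `max_kill` are made-up names used only to show the line breaks.

```c
#include <stdarg.h>
#include <stdio.h>

/* Toy "pack"-style helper: formats npairs (const char *key, int value)
 * pairs into buf as "key=value key=value ...".
 * Returns 0 on success, -1 if buf is too small.
 */
int format_pairs (char *buf,
                  size_t size,
                  int npairs,
                  ...)
{
    va_list ap;
    size_t used = 0;

    if (size == 0)
        return -1;
    buf[0] = '\0';
    va_start (ap, npairs);
    for (int i = 0; i < npairs; i++) {
        const char *key = va_arg (ap, const char *);
        int value = va_arg (ap, int);
        int n = snprintf (buf + used,
                          size - used,
                          "%s%s=%d",
                          i > 0 ? " " : "",
                          key,
                          value);
        if (n < 0 || (size_t)n >= size - used) {
            va_end (ap);
            return -1;
        }
        used += (size_t)n;
    }
    va_end (ap);
    return 0;
}

/* Ordinary prototype: one parameter per line. */
int describe_limits (char *buf,
                     size_t size,
                     int max_start,
                     int max_kill)
{
    /* Pack-style call: each key stays on the same line as its value. */
    return format_pairs (buf,
                         size,
                         2,
                         "max_start", max_start,
                         "max_kill", max_kill);
}
```

Keeping each key beside its value makes it easy to check by eye that the variadic arguments pair up, which is the reason for the exception.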

@chu11 chu11 force-pushed the issue5900_job_exec_stats branch 3 times, most recently from fb2d7d9 to 49998f5 Compare May 5, 2024 16:57
chu11 added 2 commits May 5, 2024 09:58
Problem: Modern practice is to break long parameter lists into one
parameter per line.

Adjust a parameter line break in job-exec.

Problem: Modern practice is to break long parameter lists into one
parameter per line.

Adjust a parameter line break in job-list.
@chu11 chu11 force-pushed the issue5900_job_exec_stats branch from 49998f5 to 3ec42da Compare May 5, 2024 16:59
@chu11
Member Author

chu11 commented May 5, 2024

re-pushed, fixing up those parameter lists and the parameter lists I found / cut & pasted from. Also fixed up a test that still relied on dmesg output from job-exec. (Tests seem to pass now, but forgot a && chain ... Will fix when I get to a computer later today)

@chu11 chu11 force-pushed the issue5900_job_exec_stats branch from 3ec42da to 6069f67 Compare May 6, 2024 04:30
Problem: In the near future we'd like the job-exec module to return
some module information via a stats callback, but no callback exists.
In addition, we'd like stats to be possible for each exec implementation.

Add the initial infrastructure for a stats message handler and exec
implementation callback.  For the time being, no stats are actually
reported.
@chu11 chu11 force-pushed the issue5900_job_exec_stats branch from 6069f67 to ff257ff Compare May 6, 2024 04:41
Contributor

@grondo grondo left a comment

This looks pretty good to me. Just a couple inline comments. I also wonder if the leading impl. on the stats object returned from flux module stats is really required, but leave that one up to you since it doesn't really matter too much.

Comment on lines 185 to 187
save_errno = errno;
json_decref (o);
errno = save_errno;
Contributor

You could avoid the need for save_errno by using ERRNO_SAFE_WRAP (json_decref, o); here.
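
As a rough illustration of the pattern, the sketch below defines a local macro that saves errno, calls a function, and restores errno. `SKETCH_ERRNO_SAFE_WRAP` is a stand-in written for this example and may differ in detail from the real ERRNO_SAFE_WRAP; `noisy_release()` and `fail_with_cleanup()` are hypothetical helpers.

```c
#include <errno.h>

/* Local stand-in for this sketch only: call fun (...) while preserving
 * the caller's errno.  The real macro may be defined differently.
 */
#define SKETCH_ERRNO_SAFE_WRAP(fun, ...) \
    do { \
        int saved_errno_ = errno; \
        fun (__VA_ARGS__); \
        errno = saved_errno_; \
    } while (0)

/* Hypothetical cleanup routine that clobbers errno as a side effect,
 * the way a destructor that makes library calls might.
 */
void noisy_release (int *refcount)
{
    if (refcount)
        (*refcount)--;
    errno = EINVAL;
}

/* Error path: the caller should still see ENOMEM after cleanup runs. */
int fail_with_cleanup (int *refcount)
{
    errno = ENOMEM;
    SKETCH_ERRNO_SAFE_WRAP (noisy_release, refcount);
    return -1;
}
```

The wrapper replaces the three-line save, call, restore sequence with a single statement, so the error path no longer needs its own save_errno variable.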

Comment on lines 642 to 645
save_errno = errno;
json_decref (o);
json_decref (conf);
errno = save_errno;
Contributor

use ERRNO_SAFE_WRAP?

Comment on lines 164 to 168
if (exec_service) {
    if (config_add_stats_string (o,
                                 "exec_service",
                                 exec_service) < 0)
        goto error;
Contributor

These 4 duplicated blocks of code could perhaps be cleaned up if config_add_stats_string() was enhanced to do nothing if value == NULL. Then this all becomes:

if (config_add_stats_string (o, "default_cwd", default_cwd) < 0
    || config_add_stats_string (o, "default_job_shell", default_job_shell) < 0
    || config_add_stats_string (o, "flux_imp_path", flux_imp_path) < 0
    || config_add_stats_string (o, "exec_service", exec_service) < 0)
    goto error;
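
The NULL-tolerant helper idea can be sketched without any JSON library. In the example below a small fixed-size table stands in for the stats object; `struct kv_table`, `table_add_string()`, and `export_config()` are hypothetical names for illustration, not the PR's actual code.

```c
#include <stddef.h>

#define TABLE_MAX 8

/* Toy stand-in for the JSON stats object: parallel key/value arrays. */
struct kv_table {
    const char *keys[TABLE_MAX];
    const char *values[TABLE_MAX];
    size_t count;
};

/* Add key/value to the table.  A NULL value means "not configured":
 * nothing is added and the call still succeeds.
 * Returns 0 on success, -1 if the table is full.
 */
int table_add_string (struct kv_table *t,
                      const char *key,
                      const char *value)
{
    if (!value)
        return 0;
    if (t->count == TABLE_MAX)
        return -1;
    t->keys[t->count] = key;
    t->values[t->count] = value;
    t->count++;
    return 0;
}

/* With the NULL check inside the helper, callers can chain every
 * setting in one condition instead of guarding each with its own if.
 */
int export_config (struct kv_table *t,
                   const char *default_cwd,
                   const char *default_job_shell,
                   const char *flux_imp_path,
                   const char *exec_service)
{
    if (table_add_string (t, "default_cwd", default_cwd) < 0
        || table_add_string (t, "default_job_shell", default_job_shell) < 0
        || table_add_string (t, "flux_imp_path", flux_imp_path) < 0
        || table_add_string (t, "exec_service", exec_service) < 0)
        return -1;
    return 0;
}
```

Unset values simply do not appear in the output, and a genuine failure in any one call still short-circuits to the error path.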

@chu11 chu11 force-pushed the issue5900_job_exec_stats branch from ff257ff to faeffea Compare May 6, 2024 17:20
@chu11
Member Author

chu11 commented May 6, 2024

@grondo thanks, did those small tweaks, will set MWP

chu11 added 2 commits May 6, 2024 11:06
Problem: No stats are generated for the "bulk-exec" exec implementation.

Add stats for the "bulk-exec" exec implementation.  For the time being,
the only "stats" are the config values configured into the
implementation.

Problem: The tests in t2403-job-exec-conf.t could be cleaned up
by using the more modern --config-path broker option.

Cleanup tests in t2403-job-exec-conf.t to use the --config-path
broker option instead of starting a flux instance in a subshell.
@chu11 chu11 force-pushed the issue5900_job_exec_stats branch from faeffea to 40dfb99 Compare May 6, 2024 18:07
chu11 added 2 commits May 6, 2024 11:29
Problem: A number of tests in t2403-job-exec-conf.t
and t2404-job-exec-multiuser.t grep for flux dmesg logs to
determine if configuration has been loaded correctly.

Update tests to use the configuration output via the job-exec
module's new module stats.

Problem: Now that the configuration of exec implementations is available
via module stats, some debug logs used for testing are no longer
necessary.

Remove now unnecessary debug logs.
@chu11 chu11 force-pushed the issue5900_job_exec_stats branch from 40dfb99 to ad52bd1 Compare May 6, 2024 18:29
Problem: Coverage does not exist for a number of configurations
in job-exec.

Add the missing coverage to t2403-job-exec-conf.t.
codecov bot commented May 6, 2024

Codecov Report

Attention: Patch coverage is 55.12821% with 35 lines in your changes are missing coverage. Please review.

Project coverage is 83.33%. Comparing base (c6b6e72) to head (bbdecab).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5943      +/-   ##
==========================================
+ Coverage   83.31%   83.33%   +0.01%     
==========================================
  Files         515      515              
  Lines       83271    83344      +73     
==========================================
+ Hits        69381    69457      +76     
+ Misses      13890    13887       -3     
Files                                 Coverage Δ
src/modules/job-exec/testexec.c       83.92% <ø> (ø)
src/modules/job-list/job-list.c       80.73% <100.00%> (ø)
src/modules/job-exec/exec.c           80.52% <50.00%> (-1.95%) ⬇️
src/modules/job-exec/job-exec.c       76.41% <59.09%> (-0.59%) ⬇️
src/modules/job-exec/exec_config.c    67.30% <53.84%> (-7.70%) ⬇️

... and 12 files with indirect coverage changes

@mergify mergify bot merged commit cc277a2 into flux-framework:master May 6, 2024
34 of 35 checks passed
@chu11 chu11 deleted the issue5900_job_exec_stats branch May 6, 2024 22:01
3 participants