Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

resource: support configuration of resources in TOML config #4566

Merged
merged 11 commits into from Sep 13, 2022

Conversation

grondo
Copy link
Contributor

@grondo grondo commented Sep 13, 2022

After lamenting that configuration of resources via a hand edited R or via use of flux R encode was confusing and error prone, @garlick and I had the idea that a simple configuration of resources in the [resource] TOML table might be a huge improvement.

This PR introduces support for a resource.config array which can be used to configure resources instead of resource.path pointing to the system R. The format of each entry in resource.config is a required hosts key pointing to a hostlist of node names to which the entry applies, then optional cores, gpus (in idset form), and properties (an array of properties to assign to hosts). It is not an error to specify a hostname more than once - in that case the host is updated not overwritten. E.g.:

[[resource.config]]
hosts = "corona[171-207,211-296]"
cores = "0-47"
gpus =  "0-7" 

[[resource.config]]
hosts = "corona[171-207,211-280]"
properties = ["batch"]

[[resource.config]]
hosts = "corona[281-296]"
properties = ["debug"]

The main benefit of this format is that the admin does not have to worry about ranks, which would be reassigned by the resource module based on the [bootstrap] hostlist anyway, and can focus on assigning resources and properties to hosts by name.

@grondo
Copy link
Contributor Author

grondo commented Sep 13, 2022

Since @ryanday36 has configured a few systems with the flux R encode method, it will be good to get his feedback on this new method to see if anything is missing or could be improved.

Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be noted that we still have a way to put JGF in the configured R for advanced scheduling use cases because the path config key still allows an external R object to be configured.

I really like this. I quickly converted my test cluster to use it and everything worked as expected on the first try (with properties for queues).

Big usability improvement IMHO.

@@ -197,7 +198,17 @@ int rnode_add (struct rnode *orig, struct rnode *n)
return -1;
c = zhashx_next (n->children);
}
return 0;
if (n->properties) {
zlistx_t *l = zhashx_keys (n->properties);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to check zhashx_keys() for NULL return here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I addressed this and force pushed the result.

Problem: The rnode_add() function, which handles aggregating
two rnode objects together into a single result, doesn't handle
properties. Therefore, properties could be lost when adding to a
struct rnode with this function.

Add any properties in the src rnode to the dest rnode in rnode_add().
Problem: The rnode_new() and rnode_add_child_idset() functions will
be useful in rlist.c, but these functions are not exported via rnode.h.

Export both these functions in the header file.
Problem: There is no user friendly configuration format that can be
used to generate Rv1 for resource configuration.

Add rlist_from_config() to librlist, which can read a simple format
TOML array from `resource.config` where each entry configures an idset
of cores, gpus, and/or properties for a set of hosts. E.g.:

 [[resource.config]]
 hosts = "foo[0-10]"
 cores = "0-3"
 gpus =  "0"

 [[resource.config]]
 hosts = "foo0"
 properties = [ "r0" ]

 [[resource.config]]
 hosts = "foo10"
 properties = [ "r10" ]

Hostnames can appear multiple times and each time the host is updated,
not overridden, which allows a natural configuration from generic to
specific. To avoid confusion, the configuration does not incorporate
ranks, instead ranks are meant to be assigned by the resource module
when it reranks the config.
Problem: There are no unit tests for rlist_from_config().

Save the more complex tests for a sharness test, but test some of
the simpler invalid configuration scenarios in the rlist unit tests.
Problem: The flux-R command has support for generating Rv1 from
arguments and other sources for testing etc., but does not include
a way to generate R from configuration as provided by the librlist
function rlist_from_config().

Add a from-config subcommand to test reading R from resource.config
array. This can be used to validate configuration or drive testing
of the underlying API.
Problem: The resource module leaks a struct rlist and an R object
when loading R from configuration.

Free the intermediate rlist in convert_R_conf(), and the "R_from_config"
after it is added to the inventory.
Problem: Generating a system R for resource configuration of a Flux
system instance is not user friendly, and can be confusing since the
ranks assigned in R are overridden by the ranks from the bootstrap
hostlist anyway.

Allow an alternate method for resource configuration via the
[[resource.config]] arrray using the rlist_from_config() function
from librlist.
Problem: t0026-flux-R.t uses 4 spaces to indent sharness tests, but
using TABs for these tests is de-facto standard and allows better
use of indented here-docs (`cat <<-EOF`).

Switch test_expect_success() indents to TAB instead of 4 spaces in
this test.

Additionally add support for emitting a logfile when FLUX_TESTS_LOGFILE
is set in the environment.
Problem: No tests of `flux R parse-config` exist in the testsuite.

Add a set of tests that exercise flux-R parse-config while also
testing various cases of the underlying rlist_from_config() function.
Problem: None of the resource module tests use the resource.config
array for testing.

Add a couple tests that ensure the Flux resource module can be configured
via the resource.config array instead of a path to an R.
Problem: The resource.config array is not documented.

Add documentation of resource.config in flux-config-resource(5).
@codecov
Copy link

codecov bot commented Sep 13, 2022

Codecov Report

Merging #4566 (d72dd8f) into master (24d65ac) will decrease coverage by 0.00%.
The diff coverage is 80.89%.

@@            Coverage Diff             @@
##           master    #4566      +/-   ##
==========================================
- Coverage   83.38%   83.38%   -0.01%     
==========================================
  Files         403      403              
  Lines       68052    68205     +153     
==========================================
+ Hits        56747    56870     +123     
- Misses      11305    11335      +30     
Impacted Files Coverage Δ
src/common/librlist/rlist.c 84.29% <77.96%> (-0.59%) ⬇️
src/modules/resource/resource.c 83.17% <83.33%> (+0.25%) ⬆️
src/common/librlist/rnode.c 84.68% <90.90%> (+0.11%) ⬆️
src/cmd/flux-R.c 82.57% <93.33%> (+0.39%) ⬆️
src/modules/resource/inventory.c 83.09% <100.00%> (+0.04%) ⬆️
src/modules/job-exec/bulk-exec.c 79.35% <0.00%> (-1.48%) ⬇️
src/common/libsdprocess/sdprocess.c 69.00% <0.00%> (-0.13%) ⬇️
src/modules/job-info/guest_watch.c 77.02% <0.00%> (+0.27%) ⬆️
src/modules/cron/cron.c 82.92% <0.00%> (+0.44%) ⬆️
src/common/libsubprocess/local.c 84.39% <0.00%> (+0.48%) ⬆️
... and 1 more

@ryanday36
Copy link

I like it! Much more human readable and looks like it would play nicely with templates in ansible. Do hosts need to be mapped to ranks somewhere, or are they implied by the order of the hosts?

@garlick
Copy link
Member

garlick commented Sep 13, 2022

Do hosts need to be mapped to ranks somewhere, or are they implied by the order of the hosts?

The hosts are mapped to ranks by the bootstrap.hosts array. That was true before - the ones in R were overridden by bootstrap.hosts if the order was different.

@grondo
Copy link
Contributor Author

grondo commented Sep 13, 2022

Ok given @ryanday36's approval I'll set MWP. Thanks all!

@mergify mergify bot merged commit 9f628ee into flux-framework:master Sep 13, 2022
@grondo grondo deleted the resource.config branch September 13, 2022 23:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants