New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
resource: support configuration of resources in TOML config #4566
Conversation
42b4750
to
02f7cd3
Compare
Since @ryanday36 has configured a few systems with the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be noted that we still have a way to put JGF in the configured R for advanced scheduling use cases because the path
config key still allows an external R object to be configured.
I really like this. I quickly converted my test cluster to use it and everything worked as expected on the first try (with properties for queues).
Big usability improvement IMHO.
@@ -197,7 +198,17 @@ int rnode_add (struct rnode *orig, struct rnode *n) | |||
return -1; | |||
c = zhashx_next (n->children); | |||
} | |||
return 0; | |||
if (n->properties) { | |||
zlistx_t *l = zhashx_keys (n->properties); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to check zhashx_keys()
for NULL return here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I addressed this and force pushed the result.
Problem: The rnode_add() function, which handles aggregating two rnode objects together into a single result, doesn't handle properties. Therefore, properties could be lost when adding to a struct rnode with this function. Add any properties in the src rnode to the dest rnode in rnode_add().
Problem: The rnode_new() and rnode_add_child_idset() functions will be useful in rlist.c, but these functions are not exported via rnode.h. Export both these functions in the header file.
Problem: There is no user friendly configuration format that can be used to generate Rv1 for resource configuration. Add rlist_from_config() to librlist, which can read a simple format TOML array from `resource.config` where each entry configures an idset of cores, gpus, and/or properties for a set of hosts. E.g.: [[resource.config]] hosts = "foo[0-10]" cores = "0-3" gpus = "0" [[resource.config]] hosts = "foo0" properties = [ "r0" ] [[resource.config]] hosts = "foo10" properties = [ "r10" ] Hostnames can appear multiple times and each time the host is updated, not overridden, which allows a natural configuration from generic to specific. To avoid confusion, the configuration does not incorporate ranks, instead ranks are meant to be assigned by the resource module when it reranks the config.
Problem: There are no unit tests for rlist_from_config(). Save the more complex tests for a sharness test, but test some of the simpler invalid configuration scenarios in the rlist unit tests.
Problem: The flux-R command has support for generating Rv1 from arguments and other sources for testing etc., but does not include a way to generate R from configuration as provided by the librlist function rlist_from_config(). Add a from-config subcommand to test reading R from resource.config array. This can be used to validate configuration or drive testing of the underlying API.
Problem: The resource module leaks a struct rlist and an R object when loading R from configuration. Free the intermediate rlist in convert_R_conf(), and the "R_from_config" after it is added to the inventory.
Problem: Generating a system R for resource configuration of a Flux system instance is not user friendly, and can be confusing since the ranks assigned in R are overridden by the ranks from the bootstrap hostlist anyway. Allow an alternate method for resource configuration via the [[resource.config]] arrray using the rlist_from_config() function from librlist.
Problem: t0026-flux-R.t uses 4 spaces to indent sharness tests, but using TABs for these tests is de-facto standard and allows better use of indented here-docs (`cat <<-EOF`). Switch test_expect_success() indents to TAB instead of 4 spaces in this test. Additionally add support for emitting a logfile when FLUX_TESTS_LOGFILE is set in the environment.
Problem: No tests of `flux R parse-config` exist in the testsuite. Add a set of tests that exercise flux-R parse-config while also testing various cases of the underlying rlist_from_config() function.
Problem: None of the resource module tests use the resource.config array for testing. Add a couple tests that ensure the Flux resource module can be configured via the resource.config array instead of a path to an R.
Problem: The resource.config array is not documented. Add documentation of resource.config in flux-config-resource(5).
02f7cd3
to
d72dd8f
Compare
Codecov Report
@@ Coverage Diff @@
## master #4566 +/- ##
==========================================
- Coverage 83.38% 83.38% -0.01%
==========================================
Files 403 403
Lines 68052 68205 +153
==========================================
+ Hits 56747 56870 +123
- Misses 11305 11335 +30
|
I like it! Much more human readable and looks like it would play nicely with templates in ansible. Do hosts need to be mapped to ranks somewhere, or are they implied by the order of the hosts? |
The hosts are mapped to ranks by the |
Ok given @ryanday36's approval I'll set MWP. Thanks all! |
After lamenting that configuration of resources via a hand edited R or via use of
flux R encode
was confusing and error prone, @garlick and I had the idea that a simple configuration of resources in the[resource]
TOML table might be a huge improvement.This PR introduces support for a
resource.config
array which can be used to configure resources instead ofresource.path
pointing to the system R. The format of each entry inresource.config
is a requiredhosts
key pointing to a hostlist of node names to which the entry applies, then optionalcores
,gpus
(in idset form), andproperties
(an array of properties to assign tohosts
). It is not an error to specify a hostname more than once - in that case the host is updated not overwritten. E.g.:The main benefit of this format is that the admin does not have to worry about ranks, which would be reassigned by the
resource
module based on the[bootstrap]
hostlist anyway, and can focus on assigning resources and properties to hosts by name.