idea: replace excluded resources with exclude property in RFC 28 #383

Closed
garlick opened this issue Jul 15, 2023 · 8 comments

Comments

@garlick
Member

garlick commented Jul 15, 2023

Problem: RFC 28 specifies that the initial resource acquisition response contains the full set of resources less any excluded resources. As a result, when the exclude set grows dynamically, all we can do is mark the newly excluded targets down, which doesn't remove them from feasibility checks; when the exclude set shrinks dynamically, nothing can be done. The new exclude set does take effect after the next restart, of course.

Now that resource properties can be added and removed dynamically through the acquisition protocol, we could amend RFC 28 to define a special exclude property and deprecate the special exclusion configuration syntax.

The scheduler would just need a built-in constraint that prevents excluded resources from being allocated, e.g.:

{ "not": [{ "properties": [ "exclude" ]}] }
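To make the idea concrete, here is a minimal sketch of how such a built-in constraint would filter resources once "exclude" is just another property. It assumes a resource (e.g. one broker rank) is modeled as a plain set of property names, and it handles only a small subset of RFC 31-style operators ("properties", "and", "or", "not") with semantics assumed for illustration; the function matches() and the variable names are hypothetical, not the scheduler's actual matching code. The point it shows is that adding or removing the property at runtime immediately changes feasibility, with no restart.

```python
from typing import Any, Dict, List, Set

Constraint = Dict[str, Any]

# The built-in constraint proposed in the issue.
EXCLUDE_CONSTRAINT: Constraint = {"not": [{"properties": ["exclude"]}]}


def matches(constraint: Constraint, props: Set[str]) -> bool:
    """Hypothetical evaluator: does a resource with `props` satisfy `constraint`?"""
    results: List[bool] = []
    for op, args in constraint.items():
        if op == "properties":
            # Every listed property must be present; "^name" requires it to be absent.
            ok = all(
                (p[1:] not in props) if p.startswith("^") else (p in props)
                for p in args
            )
        elif op == "and":
            ok = all(matches(c, props) for c in args)
        elif op == "or":
            ok = any(matches(c, props) for c in args)
        elif op == "not":
            # Assumption: "not" negates the conjunction of its list of terms.
            ok = not all(matches(c, props) for c in args)
        else:
            raise ValueError(f"unsupported operator: {op}")
        results.append(ok)
    # Assumption for this sketch: several keys in one object act as an implicit "and".
    return all(results)


# Usage: rank 1 starts out excluded; later rank 2 is excluded dynamically.
nodes: Dict[int, Set[str]] = {0: set(), 1: {"exclude"}, 2: {"gpu"}}
before = [r for r, p in sorted(nodes.items()) if matches(EXCLUDE_CONSTRAINT, p)]
nodes[2].add("exclude")  # property added at runtime, e.g. via the acquisition protocol
after = [r for r, p in sorted(nodes.items()) if matches(EXCLUDE_CONSTRAINT, p)]
print(before, after)
```

Removing the property (nodes[2].discard("exclude")) would make rank 2 eligible again, which covers the "exclude set reduced" case that the current scheme can't handle.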
@garlick changed the title from "idea: replace exlucded resources with exclude property in RFC 28" to "idea: replace excluded resources with exclude property in RFC 28" on Jul 15, 2023
@garlick
Member Author

garlick commented Jul 16, 2023

One additional thought: if the frobnicator constraints plugin could handle adding this constraint to all jobs, then no scheduler changes would be needed.
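A rough sketch of what "adding this constraint to all jobs" could look like at submission time. It assumes the jobspec is a plain dict with constraints stored under attributes.system.constraints, and that a user's existing constraint should be preserved by AND-ing it with the exclude constraint; the function name add_exclude_constraint() is hypothetical and is not the frobnicator plugin interface.

```python
import copy
from typing import Any, Dict

# The built-in constraint proposed in the issue.
EXCLUDE_CONSTRAINT: Dict[str, Any] = {"not": [{"properties": ["exclude"]}]}


def add_exclude_constraint(jobspec: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical frob: return a copy of `jobspec` that also rejects excluded resources."""
    js = copy.deepcopy(jobspec)  # leave the caller's jobspec untouched
    # Assumption: constraints live at attributes.system.constraints.
    system = js.setdefault("attributes", {}).setdefault("system", {})
    existing = system.get("constraints")
    if existing:
        # Keep whatever the user asked for and additionally require "not exclude".
        system["constraints"] = {"and": [existing, copy.deepcopy(EXCLUDE_CONSTRAINT)]}
    else:
        system["constraints"] = copy.deepcopy(EXCLUDE_CONSTRAINT)
    return js


# Usage: a job that already requires the "gpu" property.
user_jobspec = {"attributes": {"system": {"constraints": {"properties": ["gpu"]}}}}
frobbed = add_exclude_constraint(user_jobspec)
print(frobbed["attributes"]["system"]["constraints"])
```

Because the constraint is injected before the scheduler ever sees the job, the scheduler itself needs no special knowledge of the exclude property, which is the appeal of this variant.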

@vsoch
Member

vsoch commented Jul 16, 2023

I might have asked this before, but what's a frobnicator?

@garlick
Member Author

garlick commented Jul 16, 2023

It frobs jobspec at job submission time.

https://en.wiktionary.org/wiki/frob

@grondo
Contributor

grondo commented Jul 16, 2023

Probably obvious, but if this approach is taken then consumers of the resource.status RPC would need to be updated.

Another slight change: excluded resources are currently not contained in the resource set held by the scheduler, and thus are not returned in the response to the sched.resource-status RPC. flux resource list would have to be updated to apply the exclusions itself, or the scheduler would have to handle the exclude property specifically (which I guess would negate the benefit of the constraints solution).

Now that we can add configuration to jobs, is the ability to dynamically adjust excluded ranks as high a priority? (Not saying it wouldn't be useful, just wondering about the relative priority. I think being able to exclude individual cores might be even more useful at this point.)

@garlick
Member Author

garlick commented Jul 16, 2023

Oof, that's a big complication that I hadn't considered (apologies for being so nearsighted). You also make a good point about dynamic exclusion changes not being all that useful at this point.

So...probably a bad idea!

Withdrawing :-)

@garlick closed this as completed on Jul 16, 2023
@grondo
Copy link
Contributor

grondo commented Jul 16, 2023

I thought this was a very good idea!

@vsoch
Copy link
Member

vsoch commented Jul 16, 2023

+1. I think it's great to bring up ideas, even if they aren't perfect. Sometimes not knowing much about something is actually an advantage here, because one isn't aware of its limitations, and often those limitations are perceived rather than real (and things can change).

@garlick
Copy link
Member Author

garlick commented Jul 16, 2023

This is good, actually: if we can make the exclude set static, we can simplify some code in the resource module. Then the corner cases mentioned above will no longer linger in the back of my mind as a deficiency of the resource acquisition protocol.
