Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sched-simple: reject requests with unknown resource types #2425

Merged
merged 4 commits into from Sep 30, 2019

Conversation

@grondo
Copy link
Contributor

commented Sep 30, 2019

The sched-simple implementation only supports resource types of node slot and core. If a request with more than one resource at a level includes a non-supported type, then that type was being ignored and the job run just on the nodes/cores requested.

This PR fixes that oversight by requiring the scheduler to look at every resource at any given level in the jobspec request.

Note: Instead of canceling these jobs with the generic "unsatisfiable request" message as specified in #2417, the error message is more explicit:

$ flux mini run -g1 hostname
job-event: exception type=alloc severity=0 Unsupported resource type 'gpu'

If that isn't ok then it is probably trivial to make the error message more generic.

Copy link
Member

left a comment

LGTM

@garlick

This comment has been minimized.

Copy link
Member

commented Sep 30, 2019

Fixed a chain-lint error and force pushed. Hope that's OK.

@garlick garlick force-pushed the grondo:issue#2417 branch from b210cf0 to 494bd16 Sep 30, 2019
@grondo

This comment has been minimized.

Copy link
Contributor Author

commented Sep 30, 2019

Thanks! Sorry about that.

grondo added 4 commits Sep 30, 2019
Problem: When parsing jobspec with libjj, all but the first resource
vertex in a list under an out edge was read by the parser, and other
resources were ignored. Thus, requests for resources of unsupported
types were being accepted by the scheduler, and the unsupported types
ignored (as long as they were not the first element in the resource
list)

Change the libjj implementation to read all resources in the list at
each level, and return an error if any resource doesn't match the
list of supported types.

Fixes #2417.
The error message from libjj for unsupported resource types has
been improved. Update tests to reflect the new wording.
Ensure that a request for GPU resources with `flux mini run -g` returns
an error from the simple-sched libjj JSON jobspec parser.
Ensure that sched-simple rejects a request with gpu type resources
at runtime, witha valid error message.
@grondo grondo force-pushed the grondo:issue#2417 branch 2 times, most recently from 3fece72 to 76aab90 Sep 30, 2019
@codecov-io

This comment has been minimized.

Copy link

commented Sep 30, 2019

Codecov Report

Merging #2425 into master will increase coverage by <.01%.
The diff coverage is 72.72%.

@@            Coverage Diff             @@
##           master    #2425      +/-   ##
==========================================
+ Coverage   81.12%   81.13%   +<.01%     
==========================================
  Files         225      225              
  Lines       36102    36110       +8     
==========================================
+ Hits        29289    29298       +9     
+ Misses       6813     6812       -1
Impacted Files Coverage Δ
src/modules/sched-simple/libjj.c 83.33% <72.72%> (-2.88%) ⬇️
src/common/libsubprocess/local.c 79.86% <0%> (-0.35%) ⬇️
src/shell/shell.c 85.75% <0%> (-0.27%) ⬇️
src/common/libflux/message.c 80.5% <0%> (+0.13%) ⬆️
src/broker/module.c 75.17% <0%> (+0.23%) ⬆️
src/broker/modservice.c 71.42% <0%> (+0.75%) ⬆️
src/common/libflux/mrpc.c 88.97% <0%> (+1.18%) ⬆️
@mergify mergify bot merged commit 0038f1b into flux-framework:master Sep 30, 2019
3 of 4 checks passed
3 of 4 checks passed
codecov/patch 72.72% of diff hit (target 81.12%)
Details
Summary 1 rule matches
Details
codecov/project 81.13% (+<.01%) compared to c8f28ed
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@garlick garlick referenced this pull request Sep 30, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.