Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fix for incorrectly handling implicit exclusivity #502

Merged
merged 3 commits into from Jul 31, 2019

Conversation

@dongahn
Copy link
Contributor

dongahn commented Jul 30, 2019

This PR fixes a bug where an implicit resource exclusivity created by slot is incorrectly handled.

Problem: When we have some sub-resources under the visiting resource have been allocated to another job, we can prune the further walk even though the visiting resource itself has not been exclusively allocated to another job. But it appears dfu_impl_t::by_excl () does not factor into account the implicit exclusivity that can arise from slot: e.g., with slot[2]->socket[1]->core[1], sockets and cores under slots are implicitly exclusive.

Augment dfu_impl_t::by_excl () to handle this case.

Augment t/t3001-resource-basic.t to cover this case.

dongahn added 3 commits Jul 25, 2019
Problem: When we have some sub-resources under the visiting
resource allocated to another job, we can
prune the further walk even though the visiting resource
itself has not been exclusively allocated to another job.
It appears by_excl does not factor into account the
implicit exclusivity that can arise from slot: e.g.,
with slot[2]->socket[1]->core[1], sockets and cores
under slots are implicitly exclusive.
As a result, we end up exclusively allocating partially
allocated resources. This is incorrect.

Augment dfu_impl_t::by_excl() to fix this issue.
Add cases where once all of the sockets have been
exclusively allocated, no other jobs can be scheduled
even though there are cores that have not been
allocated.
flux-core changed the status of an cancel job to 15 for
mockup runs and two qmanager tests in t1001-qmanager-basic.t
were failing.

Fix them by changing the code from 9 to 15.
@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Jul 30, 2019

Codecov Report

Merging #502 into master will decrease coverage by 0.04%.
The diff coverage is 42.85%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #502      +/-   ##
==========================================
- Coverage   75.56%   75.52%   -0.05%     
==========================================
  Files          60       60              
  Lines        6156     6161       +5     
==========================================
+ Hits         4652     4653       +1     
- Misses       1504     1508       +4
Impacted Files Coverage Δ
resource/traversers/dfu_impl.hpp 100% <ø> (ø) ⬆️
resource/traversers/dfu_impl.cpp 82.3% <42.85%> (-0.54%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b51c89d...2a4f3b2. Read the comment docs.

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Jul 30, 2019

@SteVwonder: Ok, this is ready for your review.

Copy link
Member

SteVwonder left a comment

LGTM! Presumably you want this merged before #503 and #504?

@SteVwonder SteVwonder mentioned this pull request Jul 31, 2019
6 of 10 tasks complete
@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Jul 31, 2019

@SteVwonder: Yes this should be merged as soon as possible and I believe this is ready to go in. Thanks.

@SteVwonder SteVwonder merged commit 376b82a into flux-framework:master Jul 31, 2019
1 of 3 checks passed
1 of 3 checks passed
codecov/patch 42.85% of diff hit (target 75.56%)
Details
codecov/project 75.52% (-0.05%) compared to b51c89d
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.