Disable immediate allocation for NP jobs #114

Merged: 2 commits, Mar 31, 2022

@odp odp (Collaborator) commented Mar 23, 2022

_allocate_one inspects existing pods to find an available node for a newly added job. Two consecutive non-preemptible jobs can therefore receive the same allocation when the earlier job hasn't spawned any replica pods yet. A later full allocation cycle can then reallocate the second non-preemptible job, which crashes the allocator because it violates the basic assumption that non-preemptible jobs stay pinned. This fix restricts the immediate-allocation optimization to preemptible jobs.
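
To illustrate the race, here is a minimal sketch, not the actual allocator code. `Job`, `free_nodes`, `allocate_one`, and the `preemptible_only` flag are hypothetical names invented for this example, and the cluster is reduced to a list of node names plus a dict of existing pods. It only shows why a decision based on pods that already exist can hand the same node to two jobs, and how skipping non-preemptible jobs avoids that.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    preemptible: bool


def free_nodes(nodes, pods):
    """Return nodes that host no pod, judging only from pods that exist now."""
    used = set(pods.values())  # pods maps pod name -> node name
    return [node for node in nodes if node not in used]


def allocate_one(job, nodes, pods, preemptible_only=True):
    """Pick a node for immediate allocation, or None to defer to the full cycle.

    With preemptible_only=True (the behavior this sketch attributes to the
    fix), non-preemptible jobs are never allocated immediately.
    """
    if preemptible_only and not job.preemptible:
        return None
    free = free_nodes(nodes, pods)
    return free[0] if free else None


nodes = ["node-0", "node-1"]
pods = {}  # the first job has not spawned any replica pods yet

job_a = Job("a", preemptible=False)
job_b = Job("b", preemptible=False)

# Without the guard, both jobs see node-0 as free and collide.
old_a = allocate_one(job_a, nodes, pods, preemptible_only=False)
old_b = allocate_one(job_b, nodes, pods, preemptible_only=False)

# With the guard, both non-preemptible jobs wait for the full cycle.
new_a = allocate_one(job_a, nodes, pods)
new_b = allocate_one(job_b, nodes, pods)

# A preemptible job still takes the fast path.
new_p = allocate_one(Job("p", preemptible=True), nodes, pods)
```

In this sketch the collision arises because `pods` is still empty when the second job arrives. Deferring non-preemptible jobs to the full allocation cycle means they are placed once and never need to be moved.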

@odp odp requested a review from rmfan March 23, 2022 07:35
@codecov-commenter

Codecov Report

Merging #114 (893e105) into master (66bc1ea) will decrease coverage by 0.07%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master     #114      +/-   ##
==========================================
- Coverage   61.39%   61.31%   -0.08%     
==========================================
  Files          32       32              
  Lines        2531     2531              
  Branches      416      416              
==========================================
- Hits         1554     1552       -2     
- Misses        896      897       +1     
- Partials       81       82       +1     
Impacted Files                      Coverage Δ
sched/adaptdl_sched/allocator.py    0.00% <0.00%> (ø)
adaptdl/adaptdl/reducer.py          88.00% <0.00%> (-2.00%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 66bc1ea...893e105.

@odp odp merged commit 11c5699 into petuum:master Mar 31, 2022