Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scheduler: RescheduleTracker dropped if follow-up fails placements #12319

Merged
merged 4 commits into from
Jun 10, 2024

Conversation

tgross
Copy link
Member

@tgross tgross commented Mar 18, 2022

When an allocation fails it triggers an evaluation. The evaluation is processed and the scheduler sees it needs to reschedule, which triggers a follow-up eval. The follow-up eval creates a plan to (stop 1) (place 1). The replacement alloc has a RescheduleTracker (or gets its RescheduleTracker updated).

But in the case where the follow-up eval can't place all allocs (there aren't enough resources), it can create a partial plan to (stop 1) (place 0). It then creates a blocked eval. The plan applier stops the failed alloc. Then when the blocked eval is processed, the job is missing an allocation, so the scheduler creates a new allocation. This allocation is not a replacement from the perspective of the scheduler, so it's not handed off a RescheduleTracker.

This changeset fixes this by annotating the reschedule tracker whenever the scheduler can't place a replacement allocation. We check this annotation for allocations that have the stop desired status when filtering out allocations to pass to the reschedule tracker. I've also included tests that cover this case and expands coverage of the relevant area of the code.

Fixes: #12147
Fixes: #17072
Fixes: https://hashicorp.atlassian.net/browse/NET-9551


Workflow of the typical happy path:

graph TD;
id1(alloc fails)--client status: failed-->id2(eval A);
id2(eval A)-->id3(follow-up eval B);
id3(follow-up eval B)--desired status: stop-->id4(alloc A);
id3(follow up eval B)--create with RescheduleTracker -->id5(alloc B);
Loading

Workflow of the buggy path:

graph TD;
id1(alloc fails)--client status: failed-->id2(eval A);
id2(eval A)-->id3(follow-up eval B);
id3(follow-up eval B)--desired status: stop-->id4(alloc A);
id3(follow-up eval B)--could not place alloc B-->id5(blocked eval C);
id5(blocked eval C)--create without RescheduleTracker -->id6(alloc B);
Loading

@tgross tgross self-assigned this Mar 18, 2022
@tgross tgross changed the title scheduler: RescheduleTracker dropped if follow-up fail placements scheduler: RescheduleTracker dropped if follow-up fails placements Mar 18, 2022
@vercel vercel bot temporarily deployed to Preview – nomad April 6, 2022 14:59 Inactive
@tgross tgross removed their assignment Oct 26, 2023
@tgross tgross self-assigned this May 15, 2024
@tgross tgross added theme/scheduling stage/needs-rebase This PR needs to be rebased on main before it can be backported to pick up new BPA workflows labels May 15, 2024
@tgross tgross removed the stage/needs-rebase This PR needs to be rebased on main before it can be backported to pick up new BPA workflows label May 17, 2024
tgross added a commit that referenced this pull request May 22, 2024
While working on #20462 #12319 I found that some of our scheduler tests around
down nodes or disconnected clients were enforcing invariants that were
unclear. This changeset pulls out some minor refactorings so that the bug fix PR
is easier to review. This includes:

* Migrating a few tests from `testify` to `shoenig/test` that I'm going to touch
  in #12319 anyways.
* Adding test names to the node down test
* Update the disconnected client test so that we always re-process the
  pending/blocked eval it creates; this eliminates 2 redundant sub-tests.
* Update the disconnected client test assertions so that they're explicit in the
  test setup rather than implied by whether we re-process the pending/blocked
  eval.

Ref: #20462
Ref: #12319
tgross added a commit that referenced this pull request May 22, 2024
While working on #20462 #12319 I found that some of our scheduler tests around
down nodes or disconnected clients were enforcing invariants that were
unclear. This changeset pulls out some minor refactorings so that the bug fix PR
is easier to review. This includes:

* Migrating a few tests from `testify` to `shoenig/test` that I'm going to touch
  in #12319 anyways.
* Adding test names to the node down test
* Update the disconnected client test so that we always re-process the
  pending/blocked eval it creates; this eliminates 2 redundant sub-tests.
* Update the disconnected client test assertions so that they're explicit in the
  test setup rather than implied by whether we re-process the pending/blocked
  eval.

Ref: #20462
Ref: #12319
When an allocation fails it triggers an evaluation. The evaluation is processed
and the scheduler sees it needs to reschedule, which triggers a follow-up
eval. The follow-up eval creates a plan to `(stop 1) (place 1)`. The replacement
alloc has a `RescheduleTracker` (or gets its `RescheduleTracker` updated).

But in the case where the follow-up eval can't place all allocs (there aren't
enough resources), it can create a partial plan to `(stop 1) (place 0)`. It then
creates a blocked eval. The plan applier stops the failed alloc. Then when the
blocked eval is processed, the job is missing an allocation, so the scheduler
creates a new allocation. This allocation is _not_ a replacement from the
perspective of the scheduler, so it's not handed off a `RescheduleTracker`.

This changeset fixes this by annotating the reschedule tracker whenever the
scheduler can't place a replacement allocation. We check this annotation for
allocations that have the `stop` desired status when filtering out allocations
to pass to the reschedule tracker. I've also included tests that cover this case
and expands coverage of the relevant area of the code.

Fixes: #12147
Fixes: #17072
@tgross tgross marked this pull request as ready for review June 4, 2024 17:42
@tgross tgross added this to the 1.8.1 milestone Jun 4, 2024
@tgross tgross added backport/ent/1.6.x+ent Changes are backported to 1.6.x+ent backport/ent/1.7.x+ent Changes are backported to 1.7.x+ent backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent backport/1.8.x backport to 1.8.x release line labels Jun 4, 2024
Copy link
Member

@gulducat gulducat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't fully absorbed all the implications of this, but here are my comments so far

nomad/structs/structs.go Outdated Show resolved Hide resolved
scheduler/generic_sched.go Outdated Show resolved Hide resolved
scheduler/generic_sched_test.go Outdated Show resolved Hide resolved
Copy link
Member

@gulducat gulducat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

Copy link
Contributor

@pkazmierczak pkazmierczak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

nomad/structs/structs.go Outdated Show resolved Hide resolved
@tgross
Copy link
Member Author

tgross commented Jun 7, 2024

Waiting on some comments Juanita told me she was working on before merging this.

Copy link
Member

@Juanadelacuesta Juanadelacuesta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work in this meta bug :)

@tgross tgross merged commit fa70267 into main Jun 10, 2024
19 checks passed
@tgross tgross deleted the b-blocked-reschedules branch June 10, 2024 15:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/ent/1.6.x+ent Changes are backported to 1.6.x+ent backport/ent/1.7.x+ent Changes are backported to 1.7.x+ent backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent backport/1.8.x backport to 1.8.x release line theme/restart/reschedule theme/scheduling type/bug
Projects
None yet
4 participants