Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scheduler: ensure dup alloc names are fixed before plan submit. #18873

Merged
merged 4 commits into from Oct 27, 2023

Conversation

jrasell
Copy link
Member

@jrasell jrasell commented Oct 26, 2023

This change fixes a bug within the generic scheduler which meant duplicate alloc indexes (names) could be submitted to the plan applier and written to state. The bug originates from the placements calculation notion that names of allocations being replaced are blindly copied to their replacement. This is not correct in all cases, particularly when dealing with canaries.

The fix updates the alloc name index tracker to include minor duplicate tracking. This can be used when computing placements to ensure duplicate are found, and a new name picked before the plan is submitted. The name index tracking is now passed from the reconciler to the generic scheduler via the results, so this does not have to be regenerated, or another data structure used.

closes #10727

Reviewer Notes

The new test TestServiceSched_JobModify_ProposedDuplicateAllocIndex mimics the reproduction behaviour and can be used to see how the code change has fixed the bug.

This reproduction can be used to test the code change in a manual way. The reproduction (in my testing) worked 100% of the time.

This change fixes a bug within the generic scheduler which meant
duplicate alloc indexes (names) could be submitted to the plan
applier and written to state. The bug originates from the
placements calculation notion that names of allocations being
replaced are blindly copied to their replacement. This is not
correct in all cases, particularly when dealing with canaries.

The fix updates the alloc name index tracker to include minor
duplicate tracking. This can be used when computing placements to
ensure duplicate are found, and a new name picked before the plan
is submitted. The name index tracking is now passed from the
reconciler to the generic scheduler via the results, so this does
not have to be regenerated, or another data structure used.
@jrasell jrasell self-assigned this Oct 26, 2023
@jrasell jrasell added backport/1.4.x backport to 1.4.x release line backport/1.5.x backport to 1.5.x release line backport/1.6.x backport to 1.6.x release line labels Oct 26, 2023
@jrasell jrasell marked this pull request as ready for review October 26, 2023 11:42
Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work on this @jrasell. The fix is tightly scoped and very clear, and the testing and extra commentary is really helpful.

scheduler/reconcile_util.go Outdated Show resolved Hide resolved
scheduler/reconcile_util.go Show resolved Hide resolved
scheduler/reconcile.go Outdated Show resolved Hide resolved
scheduler/generic_sched_test.go Show resolved Hide resolved
Copy link
Member

@schmichael schmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

Comment on lines +660 to +666
allocIndex := a.Index()

if bitmap.Check(allocIndex) {
duplicates[allocIndex]++
} else {
bitmap.Set(allocIndex)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think duplicates replaces bitmap entirely. bitmap is "a map of bools" when we need "a map of ints", hence the new duplicates mapping. However, I think ideally both bitmap and duplicates would be something like a slice of ints where the offset is the alloc index and the value is the number of duplicates: 0 indicating the index is unused, 1 indicating it is used, and >1 indicating it has duplicates. (this is all awkward to describe since both "alloc index" and "index of the slice" are two different concepts that could actually be same thing!)

That being said even if my suggestion works, it would be functionally identical to what you've implemented, so we should go with whatever the lowest-risk easiest-to-read implementation is. Since this is in the parallelized part of scheduling, and cpu time is usually plentiful on servers, I'm far more concerned with correctness than performance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the name index tracking could be updated to include duplicates without using a bitmap, however, I think reworking this should be done within any follow-up. This change seems fairly low risk and targeted, whereas a larger rewrite would require substantial regression testing to be included. I'll raise an issue to discuss the idea of reworking the tracking, which I could noodle on casually.

Copy link
Contributor

@lgfa29 lgfa29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sharing some comments early in case I don't have a chance to come back for a deeper review later today, but great work!

Comment on lines +638 to +641
// Pull the allocation name as a new variables, so we can alter
// this as needed without making changes to the original
// object.
newAllocName := missing.Name()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm misunderstanding it, but this comment seems a bit inaccurate. newAllocName doesn't seem to be modified, and even if it were strings are constants, so it shouldn't affect missing? 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newAllocName is potentially modified on L654 if it is found to be a duplicate.

// future version of Nomad.
if taskGroupNameIndex.IsDuplicate(allocIndex) {
oldAllocName := newAllocName
newAllocName = taskGroupNameIndex.Next(1)[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not immediately obvious to me how this could cause any problems, but given that we're trying to avoid duplicate names it could useful to further investigate if this bit of code could cause problems:

// We have exhausted the free set, now just pick overlapping indexes
var i uint
for i = 0; i < remainder; i++ {
next = append(next, structs.AllocName(a.job, a.taskGroup, i))
a.b.Set(i)
}

Maybe if, somehow, the count value differs between the time the allocNameIndex is created and the call to Next() (like in a job update? or a version revert?) we could maybe hit an overlap? 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good thread to pull on and something I will take a look at once this PR has been merged. This resolves a reproducible manifestation of the bug, so I would like to get that fixed given the time pressures before opening up new investigations.

Comment on lines 982 to 983
// Duplicate allocation indexes can be caused due to the way this piece of
// code works. The reproduction involved canaries, and performing both a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make I understood, this hot path can return early with duplicate alloc names? Is there a way to avoid that so that callers don't have to handle them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment has been clarified, hopefully that makes things clearer? I spent a while looking to see if I could fix the problem at the source, but I wasn't able to figure out a way.

Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@jrasell jrasell merged commit 3c8eb54 into main Oct 27, 2023
26 checks passed
@jrasell jrasell deleted the gh-10727-sched-fix-method branch October 27, 2023 13:16
jrasell added a commit that referenced this pull request Oct 27, 2023
This change fixes a bug within the generic scheduler which meant
duplicate alloc indexes (names) could be submitted to the plan
applier and written to state. The bug originates from the
placements calculation notion that names of allocations being
replaced are blindly copied to their replacement. This is not
correct in all cases, particularly when dealing with canaries.

The fix updates the alloc name index tracker to include minor
duplicate tracking. This can be used when computing placements to
ensure duplicate are found, and a new name picked before the plan
is submitted. The name index tracking is now passed from the
reconciler to the generic scheduler via the results, so this does
not have to be regenerated, or another data structure used.
jrasell added a commit that referenced this pull request Oct 27, 2023
…) (#18891)

This change fixes a bug within the generic scheduler which meant
duplicate alloc indexes (names) could be submitted to the plan
applier and written to state. The bug originates from the
placements calculation notion that names of allocations being
replaced are blindly copied to their replacement. This is not
correct in all cases, particularly when dealing with canaries.

The fix updates the alloc name index tracker to include minor
duplicate tracking. This can be used when computing placements to
ensure duplicate are found, and a new name picked before the plan
is submitted. The name index tracking is now passed from the
reconciler to the generic scheduler via the results, so this does
not have to be regenerated, or another data structure used.

Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
jrasell added a commit that referenced this pull request Oct 27, 2023
This change fixes a bug within the generic scheduler which meant
duplicate alloc indexes (names) could be submitted to the plan
applier and written to state. The bug originates from the
placements calculation notion that names of allocations being
replaced are blindly copied to their replacement. This is not
correct in all cases, particularly when dealing with canaries.

The fix updates the alloc name index tracker to include minor
duplicate tracking. This can be used when computing placements to
ensure duplicate are found, and a new name picked before the plan
is submitted. The name index tracking is now passed from the
reconciler to the generic scheduler via the results, so this does
not have to be regenerated, or another data structure used.
jrasell added a commit that referenced this pull request Oct 27, 2023
This change fixes a bug within the generic scheduler which meant
duplicate alloc indexes (names) could be submitted to the plan
applier and written to state. The bug originates from the
placements calculation notion that names of allocations being
replaced are blindly copied to their replacement. This is not
correct in all cases, particularly when dealing with canaries.

The fix updates the alloc name index tracker to include minor
duplicate tracking. This can be used when computing placements to
ensure duplicate are found, and a new name picked before the plan
is submitted. The name index tracking is now passed from the
reconciler to the generic scheduler via the results, so this does
not have to be regenerated, or another data structure used.
pkazmierczak pushed a commit that referenced this pull request Oct 30, 2023
This change fixes a bug within the generic scheduler which meant
duplicate alloc indexes (names) could be submitted to the plan
applier and written to state. The bug originates from the
placements calculation notion that names of allocations being
replaced are blindly copied to their replacement. This is not
correct in all cases, particularly when dealing with canaries.

The fix updates the alloc name index tracker to include minor
duplicate tracking. This can be used when computing placements to
ensure duplicate are found, and a new name picked before the plan
is submitted. The name index tracking is now passed from the
reconciler to the generic scheduler via the results, so this does
not have to be regenerated, or another data structure used.
jrasell added a commit that referenced this pull request Oct 31, 2023
jrasell added a commit that referenced this pull request Nov 1, 2023
nvanthao pushed a commit to nvanthao/nomad that referenced this pull request Mar 1, 2024
…icorp#18873)

This change fixes a bug within the generic scheduler which meant
duplicate alloc indexes (names) could be submitted to the plan
applier and written to state. The bug originates from the
placements calculation notion that names of allocations being
replaced are blindly copied to their replacement. This is not
correct in all cases, particularly when dealing with canaries.

The fix updates the alloc name index tracker to include minor
duplicate tracking. This can be used when computing placements to
ensure duplicate are found, and a new name picked before the plan
is submitted. The name index tracking is now passed from the
reconciler to the generic scheduler via the results, so this does
not have to be regenerated, or another data structure used.
nvanthao pushed a commit to nvanthao/nomad that referenced this pull request Mar 1, 2024
nvanthao pushed a commit to nvanthao/nomad that referenced this pull request Mar 1, 2024
…icorp#18873)

This change fixes a bug within the generic scheduler which meant
duplicate alloc indexes (names) could be submitted to the plan
applier and written to state. The bug originates from the
placements calculation notion that names of allocations being
replaced are blindly copied to their replacement. This is not
correct in all cases, particularly when dealing with canaries.

The fix updates the alloc name index tracker to include minor
duplicate tracking. This can be used when computing placements to
ensure duplicate are found, and a new name picked before the plan
is submitted. The name index tracking is now passed from the
reconciler to the generic scheduler via the results, so this does
not have to be regenerated, or another data structure used.
nvanthao pushed a commit to nvanthao/nomad that referenced this pull request Mar 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.4.x backport to 1.4.x release line backport/1.5.x backport to 1.5.x release line backport/1.6.x backport to 1.6.x release line
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NOMAD_ALLOC_INDEX is not always unique within a single service job version
5 participants