fix jobs exceeding plan concurrency #83
Conversation
LGTM
pkg/upgrade/plan/plan.go (Outdated)
@@ -184,3 +186,12 @@ func sha256sum(s ...string) string {
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

func AppendIfMissing(slice []string, element string) []string {
Is this function used outside of this package? If not, can you make it unexported?
pkg/upgrade/handle_batch.go (Outdated)
@@ -54,6 +54,17 @@ func (ctl *Controller) handleJobs(ctx context.Context) error {
	case err != nil:
		return obj, err
	}
	if func(jobNode string, applyingNodes []string) bool {
Can this entire block be simplified?
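One plausible simplification, sketched below: replace the immediately-invoked anonymous function with a small named membership helper (`containsNode` is a hypothetical name, not from the PR):

```go
package main

import "fmt"

// containsNode shows one way the inline closure in handleJobs might be
// simplified: a small named function instead of an immediately-invoked
// anonymous func, which also makes the call site easier to read.
func containsNode(applying []string, node string) bool {
	for _, n := range applying {
		if n == node {
			return true
		}
	}
	return false
}

func main() {
	applying := []string{"node-a", "node-b"}
	fmt.Println(containsNode(applying, "node-a"))
	fmt.Println(containsNode(applying, "node-z"))
}
```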
pkg/upgrade/plan/plan.go (Outdated)
for _, node := range applying {
	selected = AppendIfMissing(selected, node)
}
I do not see why this block is necessary given the following (from above):
applyingNodes, err := nodeCache.List(nodeSelector.Add(*requirementApplying))
if err != nil {
return nil, err
}
for _, node := range applyingNodes {
selected = append(selected, node.Name)
}
Additionally, I believe that this will prevent an "applying" node from ever dropping off of the "selected" slice returned by this func.
pkg/upgrade/handle_batch.go (Outdated)
	}
	return true
}(obj.Labels[upgradeapi.LabelNode], plan.Status.Applying) {
	return obj, deleteJob(jobs, obj, metav1.DeletePropagationBackground)
This will prematurely delete a job that has completed successfully (because the node for which it ran will drop off of the "applying" list).

I do not like the idea of deleting a job that is neither failed nor completed but otherwise meets all of the plan's criteria. But if we must prune here, because the node that the job is targeting IS NOT on the plan's .status.applying list, then it should be the last conditional action of this OnChange handler.

Additionally, given that there are no limits placed on the length of .status.applying, we should eschew looping through all elements looking for a match and instead leverage the fact that .status.applying is sorted, which means we can use https://golang.org/pkg/sort/#SearchStrings or https://golang.org/pkg/sort/#Search to perform a binary search.
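A minimal sketch of the binary-search approach being suggested, assuming (as the review states) that .status.applying is kept sorted (`nodeIsApplying` is a hypothetical name for illustration):

```go
package main

import (
	"fmt"
	"sort"
)

// nodeIsApplying checks membership with binary search via sort.SearchStrings,
// which requires the slice to be sorted in ascending order. SearchStrings
// returns the smallest index i such that applying[i] >= node, so we must
// verify that the element at that index actually equals node.
func nodeIsApplying(applying []string, node string) bool {
	i := sort.SearchStrings(applying, node)
	return i < len(applying) && applying[i] == node
}

func main() {
	applying := []string{"node-1", "node-2", "node-4"} // must be sorted
	fmt.Println(nodeIsApplying(applying, "node-2"))
	fmt.Println(nodeIsApplying(applying, "node-3"))
}
```

This is O(log n) per lookup instead of O(n), which matters given that the review notes .status.applying has no length limit.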
fix jobs exceeding plan concurrency 2
fix stash comments
@@ -59,7 +59,7 @@ func (ctl *Controller) handlePlans(ctx context.Context) error {
	)

 	// process plan events by creating jobs to apply the plan
-	upgradectlv1.RegisterPlanGeneratingHandler(ctx, plans, ctl.apply.WithCacheTypes(jobs, nodes, secrets).WithNoDelete(), "", ctl.Name,
+	upgradectlv1.RegisterPlanGeneratingHandler(ctx, plans, ctl.apply.WithCacheTypes(nodes, secrets).WithGVK(jobs.GroupVersionKind()).WithDynamicLookup().WithNoDelete(), "", ctl.Name,
So this is more correct but it didn't get us anything, is that right @MonzElmasry?
yes! only fixed by deleting unapplied jobs
LGTM
One small change requested: please comment the new block of code, telling the reader what we are doing and why.
@@ -94,6 +95,10 @@ func (ctl *Controller) handleJobs(ctx context.Context) error {
	}
	return obj, enqueueOrDelete(jobs, obj, upgradejob.ConditionComplete)
}
Would like a comment here, as with the other blocks, explaining what we are deciding to do and why, e.g.:
// if the job hasn't failed or completed but is not on the applying list, consider it running out-of-turn and delete it
Fixes #77