fix jobs exceeding plan concurrency #83

Merged
merged 4 commits into rancher:master from job_conc on Jun 22, 2020

Conversation

Contributor

@MonzElmasry MonzElmasry commented Jun 10, 2020

Fixes #77

@MonzElmasry MonzElmasry marked this pull request as draft June 17, 2020 00:36
@MonzElmasry MonzElmasry marked this pull request as ready for review June 21, 2020 06:48

@galal-hussein galal-hussein left a comment

LGTM

@@ -184,3 +186,12 @@ func sha256sum(s ...string) string {
}
return fmt.Sprintf("%x", h.Sum(nil))
}

func AppendIfMissing(slice []string, element string) []string {
Member

Is this function used outside of this package? If not, can you make it unexported?
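For reference, a minimal sketch of the unexported variant the reviewer is asking about; the body here is assumed from the helper's name rather than taken from the PR:

func appendIfMissing(slice []string, element string) []string {
	// return the slice unchanged if the element is already present
	for _, s := range slice {
		if s == element {
			return slice
		}
	}
	return append(slice, element)
}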

@@ -54,6 +54,17 @@ func (ctl *Controller) handleJobs(ctx context.Context) error {
case err != nil:
return obj, err
}
if func(jobNode string, applyingNodes []string) bool {
Member

Can this entire block be simplified?

Comment on lines 171 to 173
for _, node := range applying {
selected = AppendIfMissing(selected, node)
}
Contributor

I do not see why this block is necessary given the following (from above):

		applyingNodes, err := nodeCache.List(nodeSelector.Add(*requirementApplying))
		if err != nil {
			return nil, err
		}
		for _, node := range applyingNodes {
			selected = append(selected, node.Name)
		}

Additionally, I believe that this will prevent an "applying" node from ever dropping off of the "selected" slice returned by this func.
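To illustrate the concern, a small self-contained sketch; the node names and the appendIfMissing body are illustrative, not taken from the PR:

package main

import "fmt"

func appendIfMissing(slice []string, element string) []string {
	for _, s := range slice {
		if s == element {
			return slice
		}
	}
	return append(slice, element)
}

func main() {
	// nodes currently labeled as applying (what nodeCache.List returns)
	applyingNodes := []string{"node-a"}
	// plan.Status.Applying still lists a node whose job has since finished
	statusApplying := []string{"node-a", "node-b"}

	selected := append([]string{}, applyingNodes...)
	// the questioned block: re-appending from .status.applying keeps
	// "node-b" in selected even though it is no longer applying
	for _, node := range statusApplying {
		selected = appendIfMissing(selected, node)
	}
	fmt.Println(selected) // [node-a node-b]
}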

}
return true
}(obj.Labels[upgradeapi.LabelNode], plan.Status.Applying) {
return obj, deleteJob(jobs, obj, metav1.DeletePropagationBackground)
Contributor

@dweomer dweomer Jun 22, 2020

This will prematurely delete a job that has completed successfully (because the node for which it ran will drop off of the "applying" list).

I do not like the idea of deleting a job that is neither failed nor completed but otherwise meets all of the plan's criteria. But if we must prune here, because the node that the job is targeting IS NOT on the plan's .status.applying list, then it should be the last conditional action of this OnChange handler.

Additionally, given that there are no limits placed on the length of .status.applying, we should eschew looping through all elements looking for a match and instead leverage the fact that .status.applying is sorted, which means we can use https://golang.org/pkg/sort/#SearchStrings or https://golang.org/pkg/sort/#Search to perform a binary search.
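A sketch of the suggested binary-search check, assuming .status.applying is kept sorted as noted above; the function name and wiring are illustrative:

package main

import (
	"fmt"
	"sort"
)

// isApplying reports whether nodeName is present in the sorted
// .status.applying list using binary search instead of a linear scan.
func isApplying(applying []string, nodeName string) bool {
	i := sort.SearchStrings(applying, nodeName)
	return i < len(applying) && applying[i] == nodeName
}

func main() {
	applying := []string{"node-a", "node-b"} // hypothetical plan.Status.Applying
	fmt.Println(isApplying(applying, "node-b")) // true
	fmt.Println(isApplying(applying, "node-c")) // false
}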

Menna added 2 commits June 22, 2020 20:36
fix jobs exceeding plan concurrency 2
@@ -59,7 +59,7 @@ func (ctl *Controller) handlePlans(ctx context.Context) error {
)

// process plan events by creating jobs to apply the plan
upgradectlv1.RegisterPlanGeneratingHandler(ctx, plans, ctl.apply.WithCacheTypes(jobs, nodes, secrets).WithNoDelete(), "", ctl.Name,
upgradectlv1.RegisterPlanGeneratingHandler(ctx, plans, ctl.apply.WithCacheTypes(nodes, secrets).WithGVK(jobs.GroupVersionKind()).WithDynamicLookup().WithNoDelete(), "", ctl.Name,
Contributor

So this is more correct but it didn't get us anything, is that right @MonzElmasry?

Contributor Author

Yes! It only got fixed by deleting the unapplied jobs.

Member

@briandowns briandowns left a comment

LGTM

Contributor

@dweomer dweomer left a comment

one small change requested: please comment the new block of code telling the reader what we are doing and why.

@@ -94,6 +95,10 @@ func (ctl *Controller) handleJobs(ctx context.Context) error {
}
return obj, enqueueOrDelete(jobs, obj, upgradejob.ConditionComplete)
}
Contributor

Would like a comment here, as with the other blocks, explaining what we are deciding to do and why, e.g.:

// if the job hasn't failed or completed but is not on the applying list, consider it running out-of-turn and delete it
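A self-contained sketch of the decision the requested comment describes; all names and the condition wiring are illustrative, not the PR's actual identifiers:

package main

import "fmt"

// jobOutOfTurn captures the pruning rule: a job that has neither failed nor
// completed, and whose target node is not on the plan's .status.applying
// list, is considered to be running out-of-turn.
func jobOutOfTurn(failed, completed bool, jobNode string, applying []string) bool {
	if failed || completed {
		return false
	}
	for _, node := range applying {
		if node == jobNode {
			return false
		}
	}
	return true
}

func main() {
	// a plan applying on node-a and node-b; a running job targeting node-c
	// exceeds the plan's concurrency and would be deleted
	if jobOutOfTurn(false, false, "node-c", []string{"node-a", "node-b"}) {
		fmt.Println("delete job: running out-of-turn")
	}
}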

@dweomer dweomer merged commit 990c403 into rancher:master Jun 22, 2020
@MonzElmasry MonzElmasry deleted the job_conc branch October 16, 2020 18:47
Development

Successfully merging this pull request may close these issues.

can trigger number of apply jobs greater than plan concurrency
4 participants