Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix build controller performance issues #12623

Merged
merged 1 commit into from Jan 24, 2017

Conversation

csrwng
Copy link
Contributor

@csrwng csrwng commented Jan 23, 2017

Calls policy completed build processing as soon as it's marked Complete
instead of doing it when the build is handled by the BuildController.
Uses a build.openshift.io/accepted annotation to bump a build so the
BuildController can see it in its queue before the next Resync.

@csrwng
Copy link
Contributor Author

csrwng commented Jan 23, 2017

@bparees ptal

Copy link
Contributor

@bparees bparees left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming nits. Also we're going to want to run the full extended builds/image_ecosystem suite on this.

BuildCompletedAnnotation = "openshift.io/build.completed"
// BuildAcceptedAnnotation is an annotation used to update a build that has been
// created so it can be seen build controller queue before a resync.
BuildAcceptedAnnotation = "openshift.io/build.accepted"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the new annotation format(sigh) is
BuildSourceSecretMatchURIAnnotationPrefix = "build.openshift.io/source-secret-match-uri-"

please align these new annotations with that.

// will prevent the build controller from processing it further.
BuildCompletedAnnotation = "openshift.io/build.completed"
// BuildAcceptedAnnotation is an annotation used to update a build that has been
// created so it can be seen build controller queue before a resync.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BuildAcceptedAnnotation is an annotation used to update a build that can now be run based on the RunPolicy(e.g. Serial). Updating the build with this annotation forces the build to be processed by the build controller queue without waiting for a resync.

@bparees
Copy link
Contributor

bparees commented Jan 23, 2017

and technically this needs @openshift/api-review

@bparees bparees self-assigned this Jan 23, 2017
@csrwng
Copy link
Contributor Author

csrwng commented Jan 23, 2017

comments addressed [test]

@deads2k
Copy link
Contributor

deads2k commented Jan 23, 2017

I'm a little surprised these are annotations. Why wouldn't we have conditions for these on BuildStatus?

@smarterclayton
Copy link
Contributor

Um. Why?

@@ -102,9 +102,13 @@ func (bc *BuildController) HandleBuild(build *buildapi.Build) error {
}

if buildutil.IsBuildComplete(build) {
if _, ok := build.Annotations[buildapi.BuildCompletedAnnotation]; ok {
return nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you have complete... and really complete?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this safe? Why aren't you just creating a cache of "really complete" builds in memory and consulting that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:-) I couldn't think of a better name ... but basically it means that we shouldn't run the runPolicy.OnComplete command on it. Maybe ran-policy-on-complete ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smarterclayton how is building a cache better ? It really doesn't matter that we run the policy.OnComplete again, but we'd rather avoid it if we can.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smarterclayton how is building a cache better ? It really doesn't matter that we run the policy.OnComplete again, but we'd rather avoid it if we can.

Adding post-processing annotations instead of just using the straight status fields we have (you can index on them if you want) is effectively adding multiple (potentially conflicting) sources of truth. Adding them to annotations in particular gets weird quick, since you've created a "status" annotation you could set via a spec update. In addition, you don't even get label selection benefit as an annotation and you aren't using them for restricting selection from the server, so they aren't buying much in terms of performance over checking the actual status.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also memory caches die. we need this persisted so we don't have to run through all the builds again on a restart of the controller.

The distinction is a complete build means the build is complete. this annotation means "and we don't care about it anymore" (we still care about complete builds, the first time we see them enter the complete state)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deads2k so you're proposing to modify BuildStatus to add this state ("really completed") to it?

@bparees
Copy link
Contributor

bparees commented Jan 23, 2017

Um. Why?

can't tell if that's directed at @deads2k or the PR itself.

@deads2k
Copy link
Contributor

deads2k commented Jan 23, 2017

@deads2k so you're proposing to modify BuildStatus to add this state ("really completed") to it?

If it isn't actually computable from current status, yes. If it is computable from current status, do that and produce your own index (other controllers do this for various resources). Having it as annotation doesn't seem like a good result.

@smarterclayton
Copy link
Contributor

Yeah, I'm -1 on modifying the object. Your controller should cache the negatives (ignore this because we know it's calculated) and you should bound that check if you're concerned with memory.

@smarterclayton
Copy link
Contributor

Also, you cannot prevent racing controllers or severely delayed updates, so you will have to run this policy periodically, not just once.

@bparees
Copy link
Contributor

bparees commented Jan 23, 2017

Also, you cannot prevent racing controllers or severely delayed updates, so you will have to run this policy periodically, not just once.

resync logic will always figure out if a build needs to be run eventually. this logic is related to an optimization path around "whenever a build completes, see if there is another build queued that we should immediately run".

@deads2k
Copy link
Contributor

deads2k commented Jan 23, 2017

Yeah, I'm -1 on modifying the object. Your controller should cache the negatives (ignore this because we know it's calculated) and you should bound that check if you're concerned with memory.

Speaking in person, this sounds like a build condition that is just missing from status today and the phase,condition tuple controls behavior (which is why we moved to conditions, away from phases).

The starttime one is just supposed to be a poke, so a secondary path to kicking the controller logic locally ought to work.

@bparees
Copy link
Contributor

bparees commented Jan 23, 2017

(and the problem is that we are currently doing that check for every build, every time we resync. we want to only do it for builds that really did just complete)

@csrwng
Copy link
Contributor Author

csrwng commented Jan 23, 2017

[test]

@bparees
Copy link
Contributor

bparees commented Jan 24, 2017

i think it lgtm.

for _, build := range nextBuilds {
build.Status.StartTimestamp = &now
build.Annotations[buildapi.BuildAcceptedAnnotation] = uuid.NewRandom().String()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a TODO for this block - "replace with informer notification requeueing in the future"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Calls policy completed build processing as soon as it's marked Complete
instead of doing it when the build is handled by the BuildController.
Uses a build.openshift.io/accepted annotation to bump a build so the
BuildController can see it in its queue before the next Resync.
@csrwng
Copy link
Contributor Author

csrwng commented Jan 24, 2017

[testextended][extended:core(builds)]

@smarterclayton
Copy link
Contributor

Lgtm as well

@openshift-bot
Copy link
Contributor

Evaluated for origin test up to f3d26a3

@openshift-bot
Copy link
Contributor

Evaluated for origin testextended up to f3d26a3

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/13216/) (Base Commit: b0f2c58)

1 similar comment
@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/13216/) (Base Commit: b0f2c58)

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/testextended FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin_extended/1022/) (Base Commit: b0f2c58) (Extended Tests: core(builds))

@csrwng
Copy link
Contributor Author

csrwng commented Jan 24, 2017

1 extended test failed with a flake
[merge]

@openshift-bot
Copy link
Contributor

Evaluated for origin merge up to f3d26a3

@openshift-bot
Copy link
Contributor

openshift-bot commented Jan 24, 2017

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/13267/) (Base Commit: 2ea8719) (Image: devenv-rhel7_5766)

@openshift-bot openshift-bot merged commit ed6d1ad into openshift:master Jan 24, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants