Fix build controller performance issues #12623

Merged
merged 1 commit into from Jan 24, 2017

5 participants
@csrwng
Contributor

csrwng commented Jan 23, 2017

Runs the run policy's completed-build processing as soon as a build is
marked Complete, instead of deferring it until the build is next handled
by the BuildController. Uses a build.openshift.io/accepted annotation to
bump a build so the BuildController sees it in its queue before the next resync.

Contributor

csrwng commented Jan 23, 2017

@bparees ptal

@bparees

naming nits. Also we're going to want to run the full extended builds/image_ecosystem suite on this.

pkg/build/api/types.go
BuildCompletedAnnotation = "openshift.io/build.completed"
// BuildAcceptedAnnotation is an annotation used to update a build that has been
// created so it can be seen build controller queue before a resync.
BuildAcceptedAnnotation = "openshift.io/build.accepted"

@bparees

bparees Jan 23, 2017

Contributor

the new annotation format(sigh) is
BuildSourceSecretMatchURIAnnotationPrefix = "build.openshift.io/source-secret-match-uri-"

please align these new annotations with that.

pkg/build/api/types.go
// will prevent the build controller from processing it further.
BuildCompletedAnnotation = "openshift.io/build.completed"
// BuildAcceptedAnnotation is an annotation used to update a build that has been
// created so it can be seen build controller queue before a resync.

@bparees

bparees Jan 23, 2017

Contributor

BuildAcceptedAnnotation is an annotation used to update a build that can now be run based on the RunPolicy(e.g. Serial). Updating the build with this annotation forces the build to be processed by the build controller queue without waiting for a resync.
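
For readers following the thread, here is a minimal sketch of the "bump" described above: writing a fresh value into the annotation changes the object, so the update shows up as a watch event and the controller picks the build up without waiting for a resync. The `Build` struct, `MarkAccepted`, and `randomToken` are hypothetical stand-ins rather than the PR's code; only the annotation name comes from the PR description.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// BuildAcceptedAnnotation uses the name given in the PR description.
const BuildAcceptedAnnotation = "build.openshift.io/accepted"

// Build is a hypothetical stand-in for buildapi.Build, reduced to the
// fields this sketch needs.
type Build struct {
	Name        string
	Annotations map[string]string
}

// randomToken returns a fresh hex string. Any value that differs from the
// previous one would do; the only goal is to make the object change.
func randomToken() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// MarkAccepted stamps the build with a new annotation value so that the
// following update is observed as a modification by watchers.
func MarkAccepted(b *Build) error {
	token, err := randomToken()
	if err != nil {
		return err
	}
	if b.Annotations == nil {
		b.Annotations = map[string]string{}
	}
	b.Annotations[BuildAcceptedAnnotation] = token
	return nil
}

func main() {
	b := &Build{Name: "example-2"}
	if err := MarkAccepted(b); err != nil {
		panic(err)
	}
	fmt.Println("annotation set:", b.Annotations[BuildAcceptedAnnotation] != "")
}
```

In a real controller the stamped build would then be sent to the API server with an update call; that step is omitted here.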

Contributor

bparees commented Jan 23, 2017

and technically this needs @openshift/api-review

bparees self-assigned this Jan 23, 2017

Contributor

csrwng commented Jan 23, 2017

comments addressed [test]

Contributor

deads2k commented Jan 23, 2017

I'm a little surprised these are annotations. Why wouldn't we have conditions for these on BuildStatus?

Member

smarterclayton commented Jan 23, 2017

Um. Why?

pkg/build/controller/controller.go
@@ -102,9 +102,13 @@ func (bc *BuildController) HandleBuild(build *buildapi.Build) error {
}
if buildutil.IsBuildComplete(build) {
if _, ok := build.Annotations[buildapi.BuildCompletedAnnotation]; ok {
return nil
}

@smarterclayton

smarterclayton Jan 23, 2017

Member

So you have complete... and really complete?

@smarterclayton

smarterclayton Jan 23, 2017

Member

How is this safe? Why aren't you just creating a cache of "really complete" builds in memory and consulting that?

@csrwng

csrwng Jan 23, 2017

Contributor

:-) I couldn't think of a better name ... but basically it means that we shouldn't run the runPolicy.OnComplete command on it. Maybe ran-policy-on-complete ?

@csrwng

csrwng Jan 23, 2017

Contributor

@smarterclayton how is building a cache better ? It really doesn't matter that we run the policy.OnComplete again, but we'd rather avoid it if we can.

@deads2k

deads2k Jan 23, 2017

Contributor

> @smarterclayton how is building a cache better ? It really doesn't matter that we run the policy.OnComplete again, but we'd rather avoid it if we can.

Adding post-processing annotations instead of just using the straight status fields we have (you can index on them if you want) is effectively adding multiple (potentially conflicting) sources of truth. Adding them to annotations in particular gets weird quick, since you've created a "status" annotation you could set via a spec update. In addition, you don't even get label selection benefit as an annotation and you aren't using them for restricting selection from the server, so they aren't buying much in terms of performance over checking the actual status.

@bparees

bparees Jan 23, 2017

Contributor

also memory caches die. we need this persisted so we don't have to run through all the builds again on a restart of the controller.

The distinction is a complete build means the build is complete. this annotation means "and we don't care about it anymore" (we still care about complete builds, the first time we see them enter the complete state)

@csrwng

csrwng Jan 23, 2017

Contributor

@deads2k so you're proposing to modify BuildStatus to add this state ("really completed") to it?

Contributor

bparees commented Jan 23, 2017

> Um. Why?

can't tell if that's directed at @deads2k or the PR itself.

Contributor

deads2k commented Jan 23, 2017

> @deads2k so you're proposing to modify BuildStatus to add this state ("really completed") to it?

If it isn't actually computable from current status, yes. If it is computable from current status, do that and produce your own index (other controllers do this for various resources). Having it as annotation doesn't seem like a good result.

Member

smarterclayton commented Jan 23, 2017

Yeah, I'm -1 on modifying the object. Your controller should cache the negatives (ignore this because we know it's calculated) and you should bound that check if you're concerned with memory.
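
A minimal sketch of the alternative suggested here, assuming the controller tracks builds by a namespace/name key: a bounded in-memory set of builds whose completion policy has already run, consulted instead of an annotation on the object. `processedCache` and its methods are hypothetical names, and eviction is plain FIFO for brevity.

```go
package main

import (
	"container/list"
	"fmt"
)

// processedCache is a bounded FIFO set of build keys whose completion
// policy has already been run. When the limit is exceeded, the oldest
// key is evicted so memory use stays bounded.
type processedCache struct {
	limit int
	order *list.List               // oldest key at the front
	items map[string]*list.Element // key -> its element in order
}

func newProcessedCache(limit int) *processedCache {
	return &processedCache{
		limit: limit,
		order: list.New(),
		items: map[string]*list.Element{},
	}
}

// Add records key as processed, evicting the oldest entry if needed.
func (c *processedCache) Add(key string) {
	if _, ok := c.items[key]; ok {
		return // already recorded
	}
	c.items[key] = c.order.PushBack(key)
	if c.order.Len() > c.limit {
		oldest := c.order.Front()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(string))
	}
}

// Has reports whether key is currently recorded as processed.
func (c *processedCache) Has(key string) bool {
	_, ok := c.items[key]
	return ok
}

func main() {
	c := newProcessedCache(2)
	c.Add("myproject/app-1")
	c.Add("myproject/app-2")
	c.Add("myproject/app-3") // evicts myproject/app-1
	fmt.Println(c.Has("myproject/app-1"), c.Has("myproject/app-3")) // prints: false true
}
```

An evicted or never-seen key only means the policy runs again for that build, which the thread describes as harmless. As noted earlier in the thread, such a cache is lost on controller restart, so every completed build would be re-evaluated once after a restart.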

Member

smarterclayton commented Jan 23, 2017

Also, you cannot prevent racing controllers or severely delayed updates, so you will have to run this policy periodically, not just once.

Contributor

bparees commented Jan 23, 2017

> Also, you cannot prevent racing controllers or severely delayed updates, so you will have to run this policy periodically, not just once.

resync logic will always figure out if a build needs to be run eventually. this logic is related to an optimization path around "whenever a build completes, see if there is another build queued that we should immediately run".
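
To make the optimization path concrete, here is a rough sketch of what "see if there is another build queued that we should immediately run" could look like for a Serial run policy. The `Build` type, the reduced set of phases, and `nextSerialBuild` are simplifications assumed for illustration, not the code under review.

```go
package main

import (
	"fmt"
	"sort"
)

// Phase is a simplified build phase; the real API has more values.
type Phase string

const (
	PhaseNew      Phase = "New"
	PhaseRunning  Phase = "Running"
	PhaseComplete Phase = "Complete"
)

// Build is a hypothetical, reduced build record for one build config.
type Build struct {
	Name   string
	Number int // sequence number within the build config (assumed)
	Phase  Phase
}

// nextSerialBuild returns the lowest-numbered build still in phase New,
// or nil if a build is still running or nothing is queued. Under a
// serial policy only one build may run at a time.
func nextSerialBuild(builds []Build) *Build {
	var queued []Build
	for _, b := range builds {
		switch b.Phase {
		case PhaseRunning:
			return nil // something is still running; start nothing
		case PhaseNew:
			queued = append(queued, b)
		}
	}
	if len(queued) == 0 {
		return nil
	}
	sort.Slice(queued, func(i, j int) bool { return queued[i].Number < queued[j].Number })
	return &queued[0]
}

func main() {
	builds := []Build{
		{Name: "app-1", Number: 1, Phase: PhaseComplete},
		{Name: "app-3", Number: 3, Phase: PhaseNew},
		{Name: "app-2", Number: 2, Phase: PhaseNew},
	}
	if next := nextSerialBuild(builds); next != nil {
		fmt.Println("start next:", next.Name) // prints: start next: app-2
	}
}
```

Running this check once, at the moment a build completes, is the fast path; the periodic resync remains the safety net if that moment is missed.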

Contributor

deads2k commented Jan 23, 2017

> Yeah, I'm -1 on modifying the object. Your controller should cache the negatives (ignore this because we know it's calculated) and you should bound that check if you're concerned with memory.

Speaking in person, this sounds like a build condition that is just missing from status today and the phase,condition tuple controls behavior (which is why we moved to conditions, away from phases).

The starttime one is just supposed to be a poke, so a secondary path to kicking the controller logic locally ought to work.
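
As an illustration of the (phase, condition) idea, the sketch below models the missing piece of state as a status condition and derives the controller's decision from the pair. Everything here is assumed for the example: the `Condition` and `BuildStatus` shapes are simplified, and the `CompletionPolicyHandled` condition type is invented, not part of the build API.

```go
package main

import "fmt"

// Condition is a hypothetical, simplified status condition.
type Condition struct {
	Type   string
	Status bool
}

// BuildStatus is a reduced stand-in for the build status object.
type BuildStatus struct {
	Phase      string
	Conditions []Condition
}

// CompletionPolicyHandled is an invented condition type used only in
// this sketch to record that the completion policy has already run.
const CompletionPolicyHandled = "CompletionPolicyHandled"

// hasTrueCondition reports whether the status carries condType set to true.
func hasTrueCondition(s BuildStatus, condType string) bool {
	for _, c := range s.Conditions {
		if c.Type == condType && c.Status {
			return true
		}
	}
	return false
}

// needsCompletionPolicy decides from the (phase, condition) pair whether
// the completion policy still has to run for this build.
func needsCompletionPolicy(s BuildStatus) bool {
	return s.Phase == "Complete" && !hasTrueCondition(s, CompletionPolicyHandled)
}

func main() {
	fresh := BuildStatus{Phase: "Complete"}
	handled := BuildStatus{
		Phase:      "Complete",
		Conditions: []Condition{{Type: CompletionPolicyHandled, Status: true}},
	}
	fmt.Println(needsCompletionPolicy(fresh), needsCompletionPolicy(handled)) // prints: true false
}
```

Compared with an annotation, a condition lives in status, so it cannot be set through a spec update, which is the concern raised earlier in the thread.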

Contributor

bparees commented Jan 23, 2017

(and the problem is that we are currently doing that check for every build, every time we resync. we want to only do it for builds that really did just complete)

Contributor

csrwng commented Jan 23, 2017

[test]

Contributor

bparees commented Jan 24, 2017

i think it lgtm.

pkg/build/controller/policy/policy.go
for _, build := range nextBuilds {
build.Status.StartTimestamp = &now
build.Annotations[buildapi.BuildAcceptedAnnotation] = uuid.NewRandom().String()

@smarterclayton

smarterclayton Jan 24, 2017

Member

Add a TODO for this block - "replace with informer notification requeueing in the future"

@csrwng

csrwng Jan 24, 2017

Contributor

done
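
The TODO requested above points at requeueing the build directly instead of mutating it to trigger a watch event. A rough sketch of that direction, assuming the policy code can reach the controller's queue: a deduplicating key queue that the policy adds to when it accepts a build. `keyQueue` is a hand-rolled stand-in written for this example; it is not the informer or work queue machinery the TODO refers to.

```go
package main

import (
	"fmt"
	"sync"
)

// keyQueue is a minimal, deduplicating FIFO of build keys. Adding a key
// that is already waiting is a no-op, so repeated pokes stay cheap.
type keyQueue struct {
	mu      sync.Mutex
	pending map[string]bool
	order   []string
}

func newKeyQueue() *keyQueue {
	return &keyQueue{pending: map[string]bool{}}
}

// Add enqueues key unless it is already waiting to be processed.
func (q *keyQueue) Add(key string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Get removes and returns the oldest key; ok is false when the queue is empty.
func (q *keyQueue) Get() (key string, ok bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.order) == 0 {
		return "", false
	}
	key = q.order[0]
	q.order = q.order[1:]
	delete(q.pending, key)
	return key, true
}

func main() {
	q := newKeyQueue()
	// The policy accepts the next build and pokes the controller locally.
	q.Add("myproject/app-2")
	q.Add("myproject/app-2") // duplicate poke, ignored

	key, ok := q.Get()
	fmt.Println(key, ok) // prints: myproject/app-2 true
	_, ok = q.Get()
	fmt.Println(ok) // prints: false
}
```

With a local requeue like this, the object itself is left untouched, which addresses the objection to modifying the build just to get it noticed.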

Fix build controller performance issues
Calls policy completed build processing as soon as it's marked Complete
instead of doing it when the build is handled by the BuildController.
Uses a build.openshift.io/accepted annotation to bump a build so the
BuildController can see it in its queue before the next Resync.

Contributor

csrwng commented Jan 24, 2017

[testextended][extended:core(builds)]

Member

smarterclayton commented Jan 24, 2017

Lgtm as well

Member

openshift-bot commented Jan 24, 2017

Evaluated for origin test up to f3d26a3

Member

openshift-bot commented Jan 24, 2017

Evaluated for origin testextended up to f3d26a3

Member

openshift-bot commented Jan 24, 2017

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/13216/) (Base Commit: b0f2c58)

Member

openshift-bot commented Jan 24, 2017

continuous-integration/openshift-jenkins/testextended FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin_extended/1022/) (Base Commit: b0f2c58) (Extended Tests: core(builds))

Contributor

csrwng commented Jan 24, 2017

1 extended test failed with a flake
[merge]

Member

openshift-bot commented Jan 24, 2017

Evaluated for origin merge up to f3d26a3

Member

openshift-bot commented Jan 24, 2017

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/13267/) (Base Commit: 2ea8719) (Image: devenv-rhel7_5766)

openshift-bot merged commit ed6d1ad into openshift:master on Jan 24, 2017

3 of 4 checks passed

continuous-integration/openshift-jenkins/testextended Failed
continuous-integration/openshift-jenkins/merge Passed
continuous-integration/openshift-jenkins/test Passed
continuous-integration/travis-ci/pr The Travis CI build passed