Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deploy: remove generic deployment trigger controller #14910

Merged

Conversation

mfojtik
Copy link
Contributor

@mfojtik mfojtik commented Jun 27, 2017

Make main deployment config controller trigger rollouts on config changes and image changes.

This replaces the generictrigger controller which was calling instantiate to trigger rollouts, but we don't want to call instantiate from a controller.

This also handles the logic of initial deployment rollouts (with ICT/without ICT/...). The image change trigger is now delegated to @smarterclayton trigger controller so we only observe an image change which we treat differently from a config change.

This should be fully backward compatible with old logic.

@mfojtik
Copy link
Contributor Author

mfojtik commented Jun 27, 2017

[test]

@Kargakis @deads2k one less controller to worry about?

@@ -63,6 +63,8 @@ type DeploymentConfigController struct {

// queue contains deployment configs that need to be synced.
queue workqueue.RateLimitingInterface
// instantiateQueue contains deployment config that should be instantiated.
instantiateQueue workqueue.RateLimitingInterface
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kargakis i didn't want to pollute the main DC controller queue with instantiate calls, that's why I made separate queue just for instantiating (terrible word).

return false
}

if _, err := c.dn.DeploymentConfigs(dc.Namespace).Instantiate(deployutil.DefaultInstantiateRequestForConfig(dc.Name)); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, please, we should stop using instantiate inside a controller. Also I was imagining that we could do everything in a single queue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kargakis what you want to do instead? just update latest version?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to move away from the assumption that bumping latestVersion starts a deployment because that is a status field.

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 27, 2017
@mfojtik mfojtik force-pushed the remove-generic-trigger branch 2 times, most recently from c2271e9 to 9f86a34 Compare June 27, 2017 16:35
@mfojtik
Copy link
Contributor Author

mfojtik commented Jun 27, 2017

@Kargakis PTAL

[test]

@mfojtik
Copy link
Contributor Author

mfojtik commented Jun 27, 2017

(this need a unit test to guard us from adding labels/annotations to the RC templates otherwise bad things will happen).

@mfojtik
Copy link
Contributor Author

mfojtik commented Jun 27, 2017

@tnozicka for the review as well :)

@@ -95,18 +95,6 @@ func init() {
rbac.NewRule("create", "get", "list", "watch", "update", "patch", "delete").Groups(kapiGroup).Resources("replicationcontrollers").RuleOrDie(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Break down the removal of this controller into a separate commit.

return true, "", err
}
rcTemplate := strippedRc.Spec.Template
// there is difference between map[string]string(nil) vs. map[string]string{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course there is :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just for clarification if somebody will wonder why i'm doing below that ;-)

if err != nil {
return nil, err
}
delete(copy.Spec.Template.Annotations, deployapi.DeploymentVersionAnnotation)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks nicer, sure

// If we already have latest deployment running but the config template does not match
// the latest deployment assume this is a config change and trigger a new deployment if
// the config change trigger is enabled.
if deployutil.HasChangeTrigger(config) && !config.Spec.Paused {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Break this out in its own func

@0xmichalis
Copy link
Contributor

Fixes #13763

@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 27, 2017
@mfojtik
Copy link
Contributor Author

mfojtik commented Jun 27, 2017

@Kargakis addressed comments, but the integration tests looks bad.. seems like ICT is broken... i thought we now use generic triggers for ICT changes?

@mfojtik
Copy link
Contributor Author

mfojtik commented Jun 27, 2017

@Kargakis this got pretty complicated now :-) basically @smarterclayton controller is doing config change (to replace images)... however we have to record that as image change... you can also have DC with just ICT and without CTT... So the logic to handle this gets a bit crazier now.

@mfojtik mfojtik added this to the 3.7.0 milestone Jun 27, 2017
@mfojtik mfojtik force-pushed the remove-generic-trigger branch 2 times, most recently from eb67a45 to 32425bb Compare June 28, 2017 09:41
@mfojtik
Copy link
Contributor Author

mfojtik commented Jun 28, 2017

@Kargakis

ok      TestTriggers_MultipleICTs
ok      TestTriggers_configChange
ok      TestTriggers_imageChange
ok      TestTriggers_imageChange_nonAutomatic
ok      TestTriggers_manual

victory!

@mfojtik
Copy link
Contributor Author

mfojtik commented Jun 28, 2017

re-[test]

config.Status.Details = new(deployapi.DeploymentDetails)
config.Status.Details.Causes = []deployapi.DeploymentCause{{
Type: deployapi.DeploymentTriggerOnImageChange,
ImageTrigger: &deployapi.DeploymentCauseImageTrigger{From: api.ObjectReference{Kind: "DockerImage", Name: name}},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kargakis i guess this is ok, but just to be sure, this will always record the docker image (not image stream)... to do that I will have to search the triggers and find the trigger with lastTriggeredImage matching the name... can do that but don't want to complicate this if we can avoid it. wdyt?

@@ -375,13 +504,17 @@ func MakeDeploymentV1(config *deployapi.DeploymentConfig, codec runtime.Codec) (
for k, v := range config.Spec.Template.Labels {
podLabels[k] = v
}
// NOTE: You need to update the labelsToSkip variable in pkg/deploy/util when
// adding a new label here.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kargakis guess this check is enough? i don't think we want to keep adding more annotations or labels but if we do, this will remind you to update the list.

@@ -181,6 +181,21 @@ func (c *DeploymentConfigController) Handle(config *deployapi.DeploymentConfig)
c.recorder.Eventf(config, v1.EventTypeNormal, "DeploymentAwaitingCancellation", "Deployment of version %d awaiting cancellation of older running deployments", config.Status.LatestVersion)
return fmt.Errorf("found previous inflight deployment for %s - requeuing", deployutil.LabelForDeploymentConfig(config))
}
// Process triggers and start an initial rollouts
configCopy, err := deployutil.DeploymentConfigDeepCopy(config)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kargakis this is needed to avoid cache mutation (i believe)

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@mfojtik
Copy link
Contributor Author

mfojtik commented Oct 5, 2017

EDIT: So I was wrong, we used to compare the encoded DC template with the current DC but this PR switched that to compare RC template vs. DC template (which is what upstream does). I haven't realized that is big change and can cause a lot of rollouts and vaporize all changes users does to their RCs (which they should never-ever touch, but Origin allows that..)

@bparees @jim-minter @tnozicka so I figured out what caused the continuous rollouts for the mysql deployment config. Basically the DC generated with new-app (or I assume template) has RC template with "name":

  template:
    metadata:
      annotations:
        openshift.io/generated-by: OpenShiftNewApp
      creationTimestamp: null
      labels:
        app: cakephp-mysql-persistent
        name: mysql
      name: mysql # <--

However, when we create the deployment (RC) from this template, we don't copy config.Spec.Template.Name into the deployment. On reconcile, when the deployment config controller runs, it sees this as a configuration diff:

I1005 08:25:14.891263   27502 deploymentconfig_controller.go:610] Rolling out #6 deployment for myproject/mysql caused by config change, dif
f:.
object.ObjectMeta.Name:
  a: "mysql"
  b: ""

(thanks God I added this diffs ;-) This means the DC controller will keep bumping the LatestVersion because it thinks there was a config change in the DC but since we never copy the "name" into the RC it will do that forever... This only affects templates that has "name" set, which is the reason why none of our deployment conformance tests captured this. Also this was broken for a long time, so I dunno how the old config change trigger controller managed this....

EDIT: It also open a question whether we should copy everything from the template ObjectMeta to the RC to prevent this from happening if somebody sets 'namespace' for example, or whatever other field in ObjectMeta. Other option is to completely ignore the the ObjectMeta.

@openshift-merge-robot openshift-merge-robot removed the lgtm Indicates that a PR is ready to be merged. label Oct 5, 2017
@mfojtik
Copy link
Contributor Author

mfojtik commented Oct 5, 2017

actually copying the name/etc might not be a good solution here because it will cause all existing DC with latest deployment not having them to trigger which might lead to a mass redeployment in online or in big clusters....

@mfojtik
Copy link
Contributor Author

mfojtik commented Oct 5, 2017

@tnozicka i'm leaning toward to just ignore comparing the name in config change trigger for now to avoid mass rollouts. I will create issue to track this in 3.8.

@tnozicka
Copy link
Contributor

tnozicka commented Oct 5, 2017

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 5, 2017
@openshift-merge-robot openshift-merge-robot removed the lgtm Indicates that a PR is ready to be merged. label Oct 5, 2017
@@ -166,6 +166,7 @@ func OkPodTemplate() *kapi.PodTemplateSpec {
DNSPolicy: kapi.DNSClusterFirst,
TerminationGracePeriodSeconds: &one,
SchedulerName: kapi.DefaultSchedulerName,
SecurityContext: &kapi.PodSecurityContext{},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kargakis @deads2k @tnozicka honestly, I have no idea why this is needed.... I spend last hour or so trying to figure out why the TestHandle() is failing, but without this the HasLatestPodTemplate() will see a diff where this field is nil vs. defaulted...

Just want to double check, but the codec i'm passing down to HasLatestPodTemplate is the same as we passed to the original trigger (annotationCodec), so I have no idea what happened there...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also let this serve as argument for removing the annotations with encoded deployment configs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mfojtik I am not sure about this, it has been nil even before no matter the codec in tests - it might be sign of something else going wrong

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tnozicka it might well be that previous state was wrong :-) i checked the encoded DC in RC and they seems correct (in running server).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so my theory is this:

  1. Before we haven't decoded the DC nor we did the deepequal in main deployment config controller, because that was part of generic trigger.
  2. The SecurityContext{} is set to empty by default, see kube testing for pod_specs.

So I think this change is fine.

@mfojtik
Copy link
Contributor Author

mfojtik commented Oct 5, 2017

@tnozicka and we are green \o/

@jim-minter
Copy link
Contributor

@mfojtik are there any opportunities to increase test coverage in this codebase as part of your PR, e.g. around the case of repeated redeployments?

@mfojtik
Copy link
Contributor Author

mfojtik commented Oct 5, 2017

@jim-minter i will want to do some trigger tests as follow ups. The continuous rollouts was broken because of the new logic that this PR introduced. It is good that it was found via templateinstance, but we should really have unit tests for this.

@tnozicka
Copy link
Contributor

tnozicka commented Oct 5, 2017

I am ok with follow up
/lgtm

@openshift-merge-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mfojtik, tnozicka

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@mfojtik mfojtik added the lgtm Indicates that a PR is ready to be merged. label Oct 5, 2017
@tnozicka
Copy link
Contributor

tnozicka commented Oct 5, 2017

Flake #16370
/test unit
Flake #11315
/test extended_networking_minimal

@openshift-merge-robot
Copy link
Contributor

Automatic merge from submit-queue.

@openshift-merge-robot openshift-merge-robot merged commit 3c845a0 into openshift:master Oct 5, 2017
@deads2k
Copy link
Contributor

deads2k commented Oct 5, 2017

/retest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.