
Improve deployment scaling behavior #5875

Merged
merged 1 commit into openshift:master on Dec 7, 2015

Conversation

ironcladlou
Contributor

This change makes the DeploymentConfigController solely responsible for scaling RCs owned by a deployment config which aren't currently being manipulated by a deployer process. RCs owned by the deployment config will be reconciled so that the active deployment matches the config replica count and all other RCs are scaled to 0.

Rationale:

  • Prevents races with HPA, users performing manual scaling, the deployer process, and the deployer pod controller (which was also performing scaling operations).
  • Aligns replica count handling with upstream deployments; today, the replica count for deployments follows the last successful deployment and requires users to scale RCs in order to adjust subsequent deployment replica counts. Now, changes to the config replica count will drive new deployments and scale existing deployments.
  • Allows all scaling operations to target the deploymentConfig only rather than RCs.

This also adds a variety of new events to provide insight into what the controller is doing.

Fixes #5597
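For orientation, here is a minimal sketch of the reconciliation rule described above. It is not the PR's actual code: the types, fields, and helper are simplified stand-ins. The latest deployment, once no deployer process is manipulating it, is scaled to the config's replica count; every other RC owned by the config is scaled to zero.

package main

import "fmt"

// Simplified stand-ins for the real DeploymentConfig and ReplicationController types.
type deploymentConfig struct {
    Name          string
    LatestVersion int
    Replicas      int
}

type replicationController struct {
    Name     string
    Version  int  // which deployment of the config this RC represents
    Running  bool // true while a deployer process is still manipulating it
    Replicas int
}

// reconcile applies the rule from the PR summary: the latest, non-running
// deployment gets the config replica count; all other RCs owned by the
// config are scaled to zero. Running deployments are left alone.
func reconcile(config deploymentConfig, deployments []*replicationController) {
    for _, rc := range deployments {
        if rc.Running {
            continue // a deployer process owns this RC right now
        }
        if rc.Version == config.LatestVersion {
            rc.Replicas = config.Replicas
        } else {
            rc.Replicas = 0
        }
    }
}

func main() {
    config := deploymentConfig{Name: "database", LatestVersion: 2, Replicas: 3}
    rcs := []*replicationController{
        {Name: "database-1", Version: 1, Replicas: 1},
        {Name: "database-2", Version: 2, Replicas: 1},
    }
    reconcile(config, rcs)
    for _, rc := range rcs {
        fmt.Printf("%s -> %d replicas\n", rc.Name, rc.Replicas)
    }
}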

TODO

  • deployerPodController unit tests
  • update DC scaling subresource to scale the DC directly
  • update deployment registry to remove restriction on scaling during an in-progress deployment
  • e2e or extended tests to exercise scaling interactions
  • discussion of #5875 (comment)
  • implement compatibility logic for old clients which will still scale RCs directly

@ironcladlou
Contributor Author

cc @Kargakis @DirectXMan12

This is going to require a lot of review and testing.

@ironcladlou
Contributor Author

The way this is coded right now, replica counts for failed deployments won't be reconciled until the deployment config controller full resync interval, since there is no update to the deploymentConfig which would trigger a watch event. If that's a problem, we could do something like update an annotation or status of the deploymentConfig when a deployment fails (just some off the cuff examples).

// If the latest deployment already exists, reconcile replica counts.
if latestDeploymentExists {
    // If the latest deployment exists and is still running, there's nothing
    // to do.
Contributor

We could change the target replicas annotation here so that the rolling deployer shoots for the updated version

Contributor Author

The rolling updater code (upstreamed) won't consider mid-flight changes to the target replica count. The complexity involved in making it do so at this point is probably more than we're willing to take on given our push to use upstream deployments (which do support mid-stream scale changes).
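As a rough illustration of the limitation (a schematic with made-up types, not the upstream RollingUpdater): the target replica count is captured once when the update starts, so a scale issued while the update is in flight is never observed.

package main

import "fmt"

type replicationController struct {
    Name     string
    Replicas int
}

// rollingUpdate is schematic only: desired is fixed when the update begins,
// so a mid-flight change to the config replica count never changes the target.
func rollingUpdate(oldRC, newRC *replicationController, desired int) {
    for newRC.Replicas < desired {
        newRC.Replicas++ // scale the new RC up one step
        if oldRC.Replicas > 0 {
            oldRC.Replicas-- // scale the old RC down one step
        }
    }
}

func main() {
    oldRC := &replicationController{Name: "database-1", Replicas: 3}
    newRC := &replicationController{Name: "database-2"}
    rollingUpdate(oldRC, newRC, 3)
    fmt.Println(oldRC.Replicas, newRC.Replicas) // 0 3
}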

@DirectXMan12
Contributor

You should drop in a commit that makes the DC scale subresource use this, so we can test how well it works with the HPA


// Compute the desired replicas for the deployment. Use the last completed
// deployment's current replica count, or the config template if there is no
// prior completed deployment available.
desiredReplicas := config.Template.ControllerTemplate.Replicas
Contributor

yay!:)

Contributor

I would like a big fat comment above explaining what has changed, since this is a pretty big change.

Contributor Author

Did a ton of simplifying refactoring in here, PTAL. Holding off on unit tests until I do some more manual testing.

Contributor Author

I also ripped out the status update code since it'll be reintroduced in a refactored form with #5530

@0xmichalis
Contributor

You should drop in a commit that makes the DC scale subresource use this, so we can test how well it works with the HPA

I was about to add a note about this. We need to update the Update method to scale the DC only. I think it will work like a charm. After this change, #5852 can get in, too.

@0xmichalis
Contributor

ps. this definitely needs a lot of testing. Will start from today.

@ironcladlou
Contributor Author

Completely redid the deploymentConfigController unit test.

@ironcladlou changed the title from "WIP: Improve deployment scaling behavior" to "Improve deployment scaling behavior" on Nov 13, 2015
@0xmichalis
Contributor

The way this is coded right now, replica counts for failed deployments won't be reconciled until the deployment config controller full resync interval, since there is no update to the deploymentConfig which would trigger a watch event. If that's a problem, we could do something like update an annotation or status of the deploymentConfig when a deployment fails (just some off the cuff examples).

If you trigger a fake config change then you would have the config controller be another entrypoint for scaling? I don't like it. I don't like that we have this interval for the dc controller, too:) What is stopping us from having a controller that will do both? Reconciling at intervals and on config changes too?

@0xmichalis
Contributor

You should drop in a commit that makes the DC scale subresource use this, so we can test how well it works with the HPA

Any updates on this?

@0xmichalis
Contributor

You should drop in a commit that makes the DC scale subresource use this, so we can test how well it works with the HPA

Any updates on this?

I am trying to implement this locally and, based on this comment, I think we may want to add a new field in the deploymentConfig that will keep track of the total replicas across all of its deployments. 1) We would get closer to upstream deployments with this change and 2) we would compute this only in the controller, where we already list all the deployments in every dc reconciliation (so no need for

func (r *ScaleREST) deploymentsForConfig(namespace, configName string) (*kapi.ReplicationControllerList, error) {
    selector := util.ConfigSelector(configName)
    return r.rcNamespacer.ReplicationControllers(namespace).List(selector, fields.Everything())
}

func (r *ScaleREST) replicasForDeploymentConfig(namespace, configName string) (int, error) {
    rcList, err := r.deploymentsForConfig(namespace, configName)
    if err != nil {
        return 0, err
    }
    replicas := 0
    for _, rc := range rcList.Items {
        replicas += rc.Spec.Replicas
    }
    return replicas, nil
}
). Thoughts?

@ironcladlou
Contributor Author

The way this is coded right now, replica counts for failed deployments won't be reconciled until the deployment config controller full resync interval, since there is no update to the deploymentConfig which would trigger a watch event. If that's a problem, we could do something like update an annotation or status of the deploymentConfig when a deployment fails (just some off the cuff examples).

If you trigger a fake config change then you would have the config controller be another entrypoint for scaling? I don't like it. I don't like that we have this interval for the dc controller, too:) What is stopping us from having a controller that will do both? Reconciling at intervals and on config changes too?

The config controller is still the scaling controller. What I mean is, due to our pod based deployment mechanism, there are times when we want to reconcile the DCs in response to non-DC updates. In this case, when the deployer pod reaches a terminal state, we should reconcile the owning DC. The question is how to trigger that reconcile other than waiting for the resync interval since it's not the DC that changed, but a related resource (the deployer pod).
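One possible shape for this (purely illustrative; the annotation key and types below are invented, not part of the PR): when the deployer pod reaches a terminal phase, the deployer pod controller could stamp an annotation on the owning deploymentConfig, and the resulting update event would cause the config controller to reconcile immediately instead of waiting for the resync interval.

package main

import (
    "fmt"
    "strconv"
    "time"
)

// Hypothetical annotation key; not something the PR defines.
const reconcileRequestedAnnotation = "deploy.example.com/reconcile-requested-at"

type deploymentConfig struct {
    Name        string
    Annotations map[string]string
}

type pod struct {
    Name  string
    Phase string // "Succeeded" and "Failed" are terminal
}

// onDeployerPodTerminal stamps the owning config when the deployer pod ends.
// In a real controller this would be followed by an Update call against the
// deploymentConfig, and that update is what produces the watch event.
func onDeployerPodTerminal(p pod, owner *deploymentConfig) {
    if p.Phase != "Succeeded" && p.Phase != "Failed" {
        return
    }
    if owner.Annotations == nil {
        owner.Annotations = map[string]string{}
    }
    owner.Annotations[reconcileRequestedAnnotation] = strconv.FormatInt(time.Now().Unix(), 10)
}

func main() {
    dc := &deploymentConfig{Name: "database"}
    onDeployerPodTerminal(pod{Name: "database-2-deploy", Phase: "Failed"}, dc)
    fmt.Println(dc.Annotations)
}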

@ironcladlou
Contributor Author

You should drop in a commit that makes the DC scale subresource use this, so we can test how well it works with the HPA

Any updates on this?

I agree that the scale subresource should be updated along with this PR, will add it to my list of TODOs.

@0xmichalis
Contributor

Ah, makes sense now. Agreed that a mocked config change would fit here.


@ironcladlou
Contributor Author

I am trying to implement this locally and, based on this comment, I think we may want to add a new field in the deploymentConfig that will keep track of the total replicas across all of its deployments. [...] Thoughts?

Agree, but is there any reason that has to be bundled with this PR?

@0xmichalis
Contributor

Agree, but is there any reason that has to be bundled with this PR?

No reason at all, I just wanted to discuss it. I can work on a separate PR and add it once this PR lands.

@0xmichalis
Contributor

we could do something like update an annotation or status of the deploymentConfig when a deployment fails

I would be fine with marking the owning dc with an annotation while marking the deployment as failed.

config.Details = new(deployapi.DeploymentDetails)
// No deployments are running and the latest deployment doesn't exist, so
// create the new deployment.
deployment, err := deployutil.MakeDeployment(config, c.codec)
Contributor

This utility doesn't propagate the config namespace to the deployment, resulting in errors when the controller tries to create a new deployment.
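A minimal sketch of the kind of fix being asked for here, with simplified stand-in types (the real MakeDeployment lives in the deployutil package and does much more): copy the config's namespace onto the generated deployment's metadata so the controller creates it in the right namespace.

package main

import "fmt"

// Simplified stand-ins for the real API types.
type objectMeta struct {
    Name      string
    Namespace string
}

type deploymentConfig struct {
    objectMeta
    LatestVersion int
}

type replicationController struct {
    objectMeta
}

// makeDeployment builds the RC for the config's latest version. The point of
// the comment above is the Namespace assignment: without it the controller
// tries to create the RC in the wrong (empty) namespace and fails.
func makeDeployment(config *deploymentConfig) *replicationController {
    return &replicationController{
        objectMeta: objectMeta{
            Name:      fmt.Sprintf("%s-%d", config.Name, config.LatestVersion),
            Namespace: config.Namespace,
        },
    }
}

func main() {
    config := &deploymentConfig{
        objectMeta:    objectMeta{Name: "database", Namespace: "test"},
        LatestVersion: 2,
    }
    deployment := makeDeployment(config)
    fmt.Printf("%s/%s\n", deployment.Namespace, deployment.Name)
}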

Contributor Author

Good catch, thanks

@0xmichalis
Contributor

I have been testing this locally, including the fix I commented above, and so far it works as expected. It takes the resync interval, but older deployments are scaled down correctly. Still no stress testing though.

One question: since the HPA controller will now target the DC template, the only way to scale an older deployment is by manually scaling it, so somebody has to specifically run oc scale rc/deployment-older, right? An idea is to special-case oc scale rc/something-owned-by-a-dc so we can trigger a reconciliation on the spot, or, even better, prevent scaling deployments (i.e. RCs), older or even the latest, directly. If we agree, this has to happen both in upstream deployments and here (special-case RCs owned by deployments). Thoughts?

@ironcladlou
Contributor Author

@Kargakis

One question: since the HPA controller will now target the DC template, the only way to scale an older deployment is by manually scaling it, so somebody has to specifically run oc scale rc/deployment-older, right? An idea is to special-case oc scale rc/something-owned-by-a-dc so we can trigger a reconciliation on the spot, or, even better, prevent scaling deployments (i.e. RCs), older or even the latest, directly. If we agree, this has to happen both in upstream deployments and here (special-case RCs owned by deployments). Thoughts?

If you tried to scale the old RC manually, the deployment controller would scale it down when reconciling. As a user trying to scale up an old deployment, do you really just want a rollback?

@ironcladlou
Contributor Author

I would be fine with marking the owning dc with an annotation while marking the deployment as failed.

Can we think more about this and address it in a followup?

@ironcladlou
Contributor Author

Should we flesh out better e2e and extended tests separately in #5879?

@ironcladlou
Contributor Author

[test]

@ironcladlou
Contributor Author

@openshift/ui-review - want to make sure that the console is using the scaling API endpoint for DCs rather than manipulating RCs directly.

@openshift-bot
Contributor

Evaluated for origin test up to d970f0a

@0xmichalis
Contributor

Testing locally with a new client and so far it works fine. Will test with an older client too. And I still want to have one more look in unit tests.

@0xmichalis
Contributor

latest flake: #6176

@0xmichalis
Contributor

  • Scaling a running deployment with an older client doesn't work, but I think this is fine since it didn't work previously either (the deployer process stomps the latest RC replicas).
  • Scaling with an older client right before kicking off a new deployment doesn't work:
[vagrant@localhost sample-app]$ oc get rc
CONTROLLER   CONTAINER(S)               IMAGE(S)                            SELECTOR                                                        REPLICAS   AGE
database-1   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-1,deploymentconfig=database,name=database   1          1m
[vagrant@localhost sample-app]$ oc get dc/database -o yaml | grep replicas
  replicas: 1
[vagrant@localhost sample-app]$ oc get rc/database-1 -o yaml | grep replicas
    openshift.io/deployment.replicas: "1"
      {"kind":"DeploymentConfig","apiVersion":"v1","metadata":{"name":"database","namespace":"test","selfLink":"/oapi/v1/namespaces/test/deploymentconfigs/database","uid":"076fb8d8-9941-11e5-93e5-080027c5bfa9","resourceVersion":"2559","creationTimestamp":"2015-12-02T22:07:08Z","labels":{"template":"application-template-stibuild"}},"spec":{"strategy":{"type":"Recreate","recreateParams":{"pre":{"failurePolicy":"Abort","execNewPod":{"command":["/bin/true"],"env":[{"name":"CUSTOM_VAR1","value":"custom_value1"}],"containerName":"ruby-helloworld-database"}},"post":{"failurePolicy":"Ignore","execNewPod":{"command":["/bin/false"],"env":[{"name":"CUSTOM_VAR2","value":"custom_value2"}],"containerName":"ruby-helloworld-database"}}},"resources":{}},"triggers":[{"type":"ConfigChange"}],"replicas":1,"selector":{"name":"database"},"template":{"metadata":{"creationTimestamp":null,"labels":{"name":"database"}},"spec":{"containers":[{"name":"ruby-helloworld-database","image":"openshift/mysql-55-centos7:latest","ports":[{"containerPort":3306,"protocol":"TCP"}],"env":[{"name":"MYSQL_USER","value":"userAK4"},{"name":"MYSQL_PASSWORD","value":"wYRK11Xt"},{"name":"MYSQL_DATABASE","value":"root"}],"resources":{},"terminationMessagePath":"/dev/termination-log","imagePullPolicy":"Always","securityContext":{"capabilities":{},"privileged":false}}],"restartPolicy":"Always","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","securityContext":{}}}},"status":{"latestVersion":1,"details":{"causes":[{"type":"ConfigChange"}]}}}
  replicas: 1
  replicas: 1
[vagrant@localhost sample-app]$ oc scale dc/database --replicas 3; oc deploy database --latest
deploymentconfig "database" scaled
Started deployment #2
[vagrant@localhost sample-app]$ oc get rc -w
CONTROLLER   CONTAINER(S)               IMAGE(S)                            SELECTOR                                                        REPLICAS   AGE
database-1   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-1,deploymentconfig=database,name=database   3          5m
database-2   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-2,deploymentconfig=database,name=database   0          4s
CONTROLLER   CONTAINER(S)               IMAGE(S)                            SELECTOR                                                        REPLICAS   AGE
database-2   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-2,deploymentconfig=database,name=database   0          7s
database-1   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-1,deploymentconfig=database,name=database   0         5m
database-1   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-1,deploymentconfig=database,name=database   0         5m
database-1   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-1,deploymentconfig=database,name=database   0         5m
database-2   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-2,deploymentconfig=database,name=database   1         23s
database-2   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-2,deploymentconfig=database,name=database   1         23s
database-2   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-2,deploymentconfig=database,name=database   1         23s

@ironcladlou
Contributor Author

The desired replica count for the new deployment is now driven entirely by the config replicas, which was still set to 1 (the scale operation set only the RC replicas). The deployment started before the controller sync reverse-updated the config to match the manual scaling value. The controller doesn't reconcile unless the latest deployment exists and is terminated (i.e. steady state).

Guess we'll have to think more about that. Whack-a-mole continues.


@ncdc
Contributor

ncdc commented Dec 2, 2015

If you're rapid-fire wham-banging a scale followed by a deployment using an old client, I don't think there's much we can do to avoid the scenario from @Kargakis. I don't think it's a blocker to moving forward, and should be part of the education/documentation/release notes (tl;dr: upgrade your client).

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin/7620/)

@0xmichalis
Contributor

I was hoping this wouldn't work on master, but it does :) :(

[vagrant@localhost sample-app]$ oc get dc/database -o yaml | grep replicas
  replicas: 1
[vagrant@localhost sample-app]$ oc get rc/database-1 -o yaml | grep replicas
      {"kind":"DeploymentConfig","apiVersion":"v1","metadata":{"name":"database","namespace":"test","selfLink":"/oapi/v1/namespaces/test/deploymentconfigs/database","uid":"54f329e4-9946-11e5-91b9-080027c5bfa9","resourceVersion":"309","creationTimestamp":"2015-12-02T22:45:05Z","labels":{"template":"application-template-stibuild"}},"spec":{"strategy":{"type":"Recreate","recreateParams":{"pre":{"failurePolicy":"Abort","execNewPod":{"command":["/bin/true"],"env":[{"name":"CUSTOM_VAR1","value":"custom_value1"}],"containerName":"ruby-helloworld-database"}},"post":{"failurePolicy":"Ignore","execNewPod":{"command":["/bin/false"],"env":[{"name":"CUSTOM_VAR2","value":"custom_value2"}],"containerName":"ruby-helloworld-database"}}},"resources":{}},"triggers":[{"type":"ConfigChange"}],"replicas":1,"selector":{"name":"database"},"template":{"metadata":{"creationTimestamp":null,"labels":{"name":"database"}},"spec":{"containers":[{"name":"ruby-helloworld-database","image":"openshift/mysql-55-centos7:latest","ports":[{"containerPort":3306,"protocol":"TCP"}],"env":[{"name":"MYSQL_USER","value":"user4Y3"},{"name":"MYSQL_PASSWORD","value":"mROu8mNO"},{"name":"MYSQL_DATABASE","value":"root"}],"resources":{},"terminationMessagePath":"/dev/termination-log","imagePullPolicy":"Always","securityContext":{"capabilities":{},"privileged":false}}],"restartPolicy":"Always","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","securityContext":{}}}},"status":{"latestVersion":1,"details":{"causes":[{"type":"ConfigChange"}]}}}
  replicas: 1
  replicas: 1
[vagrant@localhost sample-app]$ oc scale dc/database --replicas 3; oc deploy database --latest
deploymentconfig "database" scaled
Started deployment #2
[vagrant@localhost sample-app]$ oc get rc -w
CONTROLLER   CONTAINER(S)               IMAGE(S)                            SELECTOR                                                        REPLICAS   AGE
database-1   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-1,deploymentconfig=database,name=database   3          2m
database-2   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-2,deploymentconfig=database,name=database   0          5s
CONTROLLER   CONTAINER(S)               IMAGE(S)                            SELECTOR                                                        REPLICAS   AGE
database-2   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-2,deploymentconfig=database,name=database   0          30s
database-1   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-1,deploymentconfig=database,name=database   0         3m
database-1   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-1,deploymentconfig=database,name=database   0         3m
database-1   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-1,deploymentconfig=database,name=database   0         3m
database-1   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-1,deploymentconfig=database,name=database   0         3m
database-2   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-2,deploymentconfig=database,name=database   3         48s
database-2   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-2,deploymentconfig=database,name=database   3         48s
database-2   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-2,deploymentconfig=database,name=database   3         48s
database-2   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-2,deploymentconfig=database,name=database   3         48s

@ncdc
Contributor

ncdc commented Dec 2, 2015

@Kargakis why the frown?

@0xmichalis
Contributor

Because we wouldn't really care if we didn't change any behavior, I guess.

@ncdc
Contributor

ncdc commented Dec 2, 2015

@Kargakis I'm confused... doesn't your most recent example (from master) show what we want? A scale to 3 immediately followed by a deployment results in 3 replicas. Am I missing something?

@0xmichalis
Contributor

I'm frowning because, at the end of the day, the same example using an old client against the server changes in this PR doesn't work...

@ironcladlou
Contributor Author

Since it seems like a pretty rare scenario, I'm certainly fine documenting it as a known issue and moving forward.


@ironcladlou
Contributor Author

So the user impact in the new server/old client case is that your quick scale/deployment isn't effective, and you have to run scale once more after the deployment finishes. Accurate? Still seems like a pretty rare event. If you're manually scaling and observe the bug, scale once more.

I'd be more concerned if we had this sort of issue with the auto scaler, but that'll be fine.

@0xmichalis
Contributor

I am fine with moving on with this. The design of the controller is more robust than before and I don't perceive the issue I reported as a blocker, supposing we have docs mentioning it.

@ncdc
Contributor

ncdc commented Dec 3, 2015

Do we need any additional reviews? @smarterclayton @liggitt @deads2k @DirectXMan12 ?

ObjectMeta: kapi.ObjectMeta{
    Name:              dc.Name,
    Namespace:         dc.Namespace,
    CreationTimestamp: dc.CreationTimestamp,
Contributor

This timestamp looks unusual. Why would we be forcing a CreationTimestamp on an object? Doesn't that happen server-side in FillObjectMetaSystemFields?

Contributor

Since Scale objects don't have the normal object lifecycle (they're never persisted, they don't use a strategy, etc), I don't think so.

Contributor

Actually, who uses this field at all? And if we're going to have it, why is it useful to be the dc.CreationTimestamp?

Contributor

And if we're going to have it, why is it useful to be the dc.CreationTimestamp?

Because this is a dc subresource and it's supposed to carry information about the main resource. Probably. Already upstream: https://github.com/kubernetes/kubernetes/blob/8c182c2713ea6e1b8ffff1da11e0d802cacd0bd8/pkg/apis/extensions/helpers.go#L75
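For context, a rough sketch (simplified, invented types; the upstream helper linked above is the authoritative example) of why the Scale mirrors the parent's metadata: the Scale is derived on every request and never persisted, so its ObjectMeta, including CreationTimestamp, is copied straight from the deploymentConfig.

package main

import (
    "fmt"
    "time"
)

// Simplified stand-ins for the real API types.
type objectMeta struct {
    Name              string
    Namespace         string
    CreationTimestamp time.Time
}

type deploymentConfig struct {
    objectMeta
    Replicas int
}

type scale struct {
    objectMeta
    SpecReplicas int
}

// scaleFromConfig mirrors the parent DC's metadata onto the Scale. The Scale
// is computed on demand and never stored, so there is no server-side
// lifecycle to fill in a CreationTimestamp for it.
func scaleFromConfig(dc *deploymentConfig) *scale {
    return &scale{
        objectMeta: objectMeta{
            Name:              dc.Name,
            Namespace:         dc.Namespace,
            CreationTimestamp: dc.CreationTimestamp,
        },
        SpecReplicas: dc.Replicas,
    }
}

func main() {
    dc := &deploymentConfig{
        objectMeta: objectMeta{Name: "database", Namespace: "test", CreationTimestamp: time.Now()},
        Replicas:   3,
    }
    fmt.Printf("%+v\n", scaleFromConfig(dc))
}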

@DirectXMan12
Contributor

I'll take one last look-through, but it should be all set.

@@ -123,68 +114,6 @@ func (c *DeployerPodController) Handle(pod *kapi.Pod) error {
return nil
}

func (c *DeployerPodController) cleanupFailedDeployment(deployment *kapi.ReplicationController) error {
Contributor

This is probably obvious if you know the controller better, but let's say I have a DC that has a PodSpec that grabs a NodePort. The post-deploy hook fails. Does this leave me with a running Pod that is claiming that NodePort instead of cleaning it up?

Contributor Author

If the deployment fails, the RC is scaled down to 0 eventually no matter what, so there won't be any lingering pods. Make sense?

@ironcladlou
Contributor Author

I think for anybody not very familiar with the code, validating the controller's test cases might be the most productive way to review. Behavioral issues in terms of scenarios are the primary concern at this point.

@ironcladlou
Contributor Author

@Kargakis please tag this if you agree it's ready.

@0xmichalis
Contributor

LGTM

@ironcladlou
Contributor Author

[merge]

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/4261/) (Image: devenv-rhel7_2896)

@openshift-bot
Contributor

Evaluated for origin merge up to d970f0a

openshift-bot pushed a commit that referenced this pull request Dec 7, 2015
@openshift-bot openshift-bot merged commit 214199f into openshift:master Dec 7, 2015
@smarterclayton
Contributor

BTW the tests you added here are phenomenally clear. Excellent job.
