
Improve deployment scaling behavior #5875

Merged
merged 1 commit into openshift:master on Dec 7, 2015

Conversation

ironcladlou
Contributor

This change makes the DeploymentConfigController solely responsible for scaling RCs owned by a deployment config which aren't currently being manipulated by a deployer process. RCs owned by the deployment config will be reconciled so that the active deployment matches the config replica count and all other RCs are scaled to 0.

Rationale:

  • Prevents races with HPA, users performing manual scaling, the deployer process, and the deployer pod controller (which was also performing scaling operations).
  • Aligns replica count handling with upstream deployments; today, the replica count for deployments follows the last successful deployment and requires users to scale RCs in order to adjust subsequent deployment replica counts. Now, changes to the config replica count will drive new deployments and scale existing deployments.
  • Allows all scaling operations to target the deploymentConfig only rather than RCs.

This also adds a variety of new events to provide insight into what the controller is doing.

Fixes #5597
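For orientation, here is a minimal sketch of the reconciliation rule described above. It is not the PR's actual code: the types, fields, and helper are simplified stand-ins. The latest deployment, once no deployer process is manipulating it, is scaled to the config's replica count; every other RC owned by the config is scaled to zero.

package main

import "fmt"

// Simplified stand-ins for the real DeploymentConfig and ReplicationController types.
type deploymentConfig struct {
    Name          string
    LatestVersion int
    Replicas      int
}

type replicationController struct {
    Name     string
    Version  int  // which deployment of the config this RC represents
    Running  bool // true while a deployer process is still manipulating it
    Replicas int
}

// reconcile applies the rule from the PR summary: the latest, non-running
// deployment gets the config replica count; all other RCs owned by the
// config are scaled to zero. Running deployments are left alone.
func reconcile(config deploymentConfig, deployments []*replicationController) {
    for _, rc := range deployments {
        if rc.Running {
            continue // a deployer process owns this RC right now
        }
        if rc.Version == config.LatestVersion {
            rc.Replicas = config.Replicas
        } else {
            rc.Replicas = 0
        }
    }
}

func main() {
    config := deploymentConfig{Name: "database", LatestVersion: 2, Replicas: 3}
    rcs := []*replicationController{
        {Name: "database-1", Version: 1, Replicas: 1},
        {Name: "database-2", Version: 2, Replicas: 1},
    }
    reconcile(config, rcs)
    for _, rc := range rcs {
        fmt.Printf("%s -> %d replicas\n", rc.Name, rc.Replicas)
    }
}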

TODO

  • deployerPodController unit tests
  • update DC scaling subresource to scale the DC directly
  • update deployment registry to remove restriction on scaling during an in-progress deployment
  • e2e or extended tests to exercise scaling interactions
  • discussion of #5875 (comment)
  • implement compatibility logic for old clients which will still scale RCs directly

@ironcladlou
Contributor Author

cc @Kargakis @DirectXMan12

This is going to require a lot of review and testing.

@ironcladlou
Contributor Author

The way this is coded right now, replica counts for failed deployments won't be reconciled until the deployment config controller full resync interval, since there is no update to the deploymentConfig which would trigger a watch event. If that's a problem, we could do something like update an annotation or status of the deploymentConfig when a deployment fails (just some off the cuff examples).

// If the latest deployment already exists, reconcile replica counts.
if latestDeploymentExists {
    // If the latest deployment exists and is still running, there's nothing
    // to do.
Contributor

We could change the target replicas annotation here so that the rolling deployer shoots for the updated version

Contributor Author

The rolling updater code (upstreamed) won't consider mid-flight changes to the target replica count. The complexity involved in making it do so at this point is probably more than we're willing to take on given our push to use upstream deployments (which do support mid-stream scale changes).
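As a rough illustration of the limitation (a schematic with made-up types, not the upstream RollingUpdater): the target replica count is captured once when the update starts, so a scale issued while the update is in flight is never observed.

package main

import "fmt"

type replicationController struct {
    Name     string
    Replicas int
}

// rollingUpdate is schematic only: desired is fixed when the update begins,
// so a mid-flight change to the config replica count never changes the target.
func rollingUpdate(oldRC, newRC *replicationController, desired int) {
    for newRC.Replicas < desired {
        newRC.Replicas++ // scale the new RC up one step
        if oldRC.Replicas > 0 {
            oldRC.Replicas-- // scale the old RC down one step
        }
    }
}

func main() {
    oldRC := &replicationController{Name: "database-1", Replicas: 3}
    newRC := &replicationController{Name: "database-2"}
    rollingUpdate(oldRC, newRC, 3)
    fmt.Println(oldRC.Replicas, newRC.Replicas) // 0 3
}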

@DirectXMan12
Contributor

You should drop in a commit that makes the DC scale subresource use this, so we can test how well it works with the HPA


// Compute the desired replicas for the deployment. Use the last completed
// deployment's current replica count, or the config template if there is no
// prior completed deployment available.
desiredReplicas := config.Template.ControllerTemplate.Replicas
Contributor

yay!:)

Contributor

I would like a big fat comment above explaining what has changed, since this is a pretty big change.

Contributor Author

Did a ton of simplifying refactoring in here, PTAL. Holding off on unit tests until I do some more manual testing.

Contributor Author

I also ripped out the status update code since it'll be reintroduced in a refactored form with #5530

@0xmichalis
Contributor

You should drop in a commit that makes the DC scale subresource use this, so we can test how well it works with the HPA

I was about to add a note about this. We need to update the Update method to scale the DC only. I think it will work like a charm. After this change, #5852 can get in, too.

@0xmichalis
Contributor

ps. this definitely needs a lot of testing. Will start from today.

@ironcladlou
Contributor Author

Completely redid the deploymentConfigController unit test.

@ironcladlou changed the title from "WIP: Improve deployment scaling behavior" to "Improve deployment scaling behavior" on Nov 13, 2015
@0xmichalis
Contributor

The way this is coded right now, replica counts for failed deployments won't be reconciled until the deployment config controller full resync interval, since there is no update to the deploymentConfig which would trigger a watch event. If that's a problem, we could do something like update an annotation or status of the deploymentConfig when a deployment fails (just some off the cuff examples).

If you trigger a fake config change then you would have the config controller be another entrypoint for scaling? I don't like it. I don't like that we have this interval for the dc controller, too:) What is stopping us from having a controller that will do both? Reconciling at intervals and on config changes too?

@0xmichalis
Contributor

You should drop in a commit that makes the DC scale subresource use this, so we can test how well it works with the HPA

Any updates on this?

@0xmichalis
Contributor

You should drop in a commit that makes the DC scale subresource use this, so we can test how well it works with the HPA

Any updates on this?

I am trying to implement this locally and, based on this comment, I think we may want to add a new field in the deploymentConfig that will keep track of the total replicas across all of its deployments. 1) We would get closer to upstream deployments with this change and 2) we would compute this only in the controller, where we already list all the deployments in every dc reconciliation (so no need for

func (r *ScaleREST) deploymentsForConfig(namespace, configName string) (*kapi.ReplicationControllerList, error) {
    selector := util.ConfigSelector(configName)
    return r.rcNamespacer.ReplicationControllers(namespace).List(selector, fields.Everything())
}

func (r *ScaleREST) replicasForDeploymentConfig(namespace, configName string) (int, error) {
    rcList, err := r.deploymentsForConfig(namespace, configName)
    if err != nil {
        return 0, err
    }
    replicas := 0
    for _, rc := range rcList.Items {
        replicas += rc.Spec.Replicas
    }
    return replicas, nil
}
). Thoughts?

@ironcladlou
Contributor Author

The way this is coded right now, replica counts for failed deployments won't be reconciled until the deployment config controller full resync interval, since there is no update to the deploymentConfig which would trigger a watch event. If that's a problem, we could do something like update an annotation or status of the deploymentConfig when a deployment fails (just some off the cuff examples).

If you trigger a fake config change then you would have the config controller be another entrypoint for scaling? I don't like it. I don't like that we have this interval for the dc controller, too:) What is stopping us from having a controller that will do both? Reconciling at intervals and on config changes too?

The config controller is still the scaling controller. What I mean is, due to our pod based deployment mechanism, there are times when we want to reconcile the DCs in response to non-DC updates. In this case, when the deployer pod reaches a terminal state, we should reconcile the owning DC. The question is how to trigger that reconcile other than waiting for the resync interval since it's not the DC that changed, but a related resource (the deployer pod).
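One possible shape for this (purely illustrative; the annotation key and types below are invented, not part of the PR): when the deployer pod reaches a terminal phase, the deployer pod controller could stamp an annotation on the owning deploymentConfig, and the resulting update event would cause the config controller to reconcile immediately instead of waiting for the resync interval.

package main

import (
    "fmt"
    "strconv"
    "time"
)

// Hypothetical annotation key; not something the PR defines.
const reconcileRequestedAnnotation = "deploy.example.com/reconcile-requested-at"

type deploymentConfig struct {
    Name        string
    Annotations map[string]string
}

type pod struct {
    Name  string
    Phase string // "Succeeded" and "Failed" are terminal
}

// onDeployerPodTerminal stamps the owning config when the deployer pod ends.
// In a real controller this would be followed by an Update call against the
// deploymentConfig, and that update is what produces the watch event.
func onDeployerPodTerminal(p pod, owner *deploymentConfig) {
    if p.Phase != "Succeeded" && p.Phase != "Failed" {
        return
    }
    if owner.Annotations == nil {
        owner.Annotations = map[string]string{}
    }
    owner.Annotations[reconcileRequestedAnnotation] = strconv.FormatInt(time.Now().Unix(), 10)
}

func main() {
    dc := &deploymentConfig{Name: "database"}
    onDeployerPodTerminal(pod{Name: "database-2-deploy", Phase: "Failed"}, dc)
    fmt.Println(dc.Annotations)
}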

@ironcladlou
Contributor Author

You should drop in a commit that makes the DC scale subresource use this, so we can test how well it works with the HPA

Any updates on this?

I agree that the scale subresource should be updated along with this PR, will add it to my list of TODOs.

@0xmichalis
Contributor

Ah, makes sense now. Agreed that a mocked config change would fit here.


@ironcladlou
Contributor Author

I am trying to implement this locally and, based on this comment, I think we may want to add a new field in the deploymentConfig that will keep track of the total replicas across all of its deployments. [...] Thoughts?

Agree, but is there any reason that has to be bundled with this PR?

@0xmichalis
Contributor

Agree, but is there any reason that has to be bundled with this PR?

No reason at all, I just wanted to discuss it. I can work on a separate PR and add it once this PR lands.

@0xmichalis
Contributor

we could do something like update an annotation or status of the deploymentConfig when a deployment fails

I would be fine with marking the owning dc with an annotation while marking the deployment as failed.

config.Details = new(deployapi.DeploymentDetails)
// No deployments are running and the latest deployment doesn't exist, so
// create the new deployment.
deployment, err := deployutil.MakeDeployment(config, c.codec)
Contributor

This utility doesn't propagate the config namespace to the deployment, resulting in errors when the controller tries to create a new deployment.
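A minimal sketch of the kind of fix being asked for here, with simplified stand-in types (the real MakeDeployment lives in the deployutil package and does much more): copy the config's namespace onto the generated deployment's metadata so the controller creates it in the right namespace.

package main

import "fmt"

// Simplified stand-ins for the real API types.
type objectMeta struct {
    Name      string
    Namespace string
}

type deploymentConfig struct {
    objectMeta
    LatestVersion int
}

type replicationController struct {
    objectMeta
}

// makeDeployment builds the RC for the config's latest version. The point of
// the comment above is the Namespace assignment: without it the controller
// tries to create the RC in the wrong (empty) namespace and fails.
func makeDeployment(config *deploymentConfig) *replicationController {
    return &replicationController{
        objectMeta: objectMeta{
            Name:      fmt.Sprintf("%s-%d", config.Name, config.LatestVersion),
            Namespace: config.Namespace,
        },
    }
}

func main() {
    config := &deploymentConfig{
        objectMeta:    objectMeta{Name: "database", Namespace: "test"},
        LatestVersion: 2,
    }
    deployment := makeDeployment(config)
    fmt.Printf("%s/%s\n", deployment.Namespace, deployment.Name)
}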

Contributor Author

Good catch, thanks

@0xmichalis
Contributor

I have been testing this locally, including the fix I commented above, and so far it works as expected. It takes the resync interval, but older deployments are scaled down correctly. Still no stress testing though.

One question: since the HPA controller will now target the DC template, the only way to scale an older deployment is by manually scaling it, so somebody has to specifically run oc scale rc/deployment-older, right? An idea is to special-case oc scale rc/something-owned-by-a-dc so we can trigger a reconciliation on the spot, or, even better, prevent scaling deployments (i.e. RCs), older or even the latest, directly. If we agree, this has to happen both in upstream deployments and here (special-case RCs owned by deployments). Thoughts?

@ironcladlou
Contributor Author

@Kargakis

One question: since the HPA controller will now target the DC template, the only way to scale an older deployment is by manually scaling it, so somebody has to specifically run oc scale rc/deployment-older, right? An idea is to special-case oc scale rc/something-owned-by-a-dc so we can trigger a reconciliation on the spot, or, even better, prevent scaling deployments (i.e. RCs), older or even the latest, directly. If we agree, this has to happen both in upstream deployments and here (special-case RCs owned by deployments). Thoughts?

If you tried to scale the old RC manually, the deployment controller would scale it down when reconciling. As a user trying to scale up an old deployment, do you really just want a rollback?

@ironcladlou
Contributor Author

I would be fine with marking the owning dc with an annotation while marking the deployment as failed.

Can we think more about this and address it in a followup?

@ironcladlou
Contributor Author

Should we flesh out better e2e and extended tests separately in #5879?

@ironcladlou
Contributor Author

[test]

@ironcladlou
Contributor Author

@openshift/ui-review - want to make sure that the console is using the scaling API endpoint for DCs rather than manipulating RCs directly.

@openshift-bot
Contributor

Evaluated for origin test up to d970f0a

@0xmichalis
Contributor

Testing locally with a new client and so far it works fine. Will test with an older client too. And I still want to have one more look in unit tests.

@0xmichalis
Contributor

latest flake: #6176

@0xmichalis
Contributor

  • Scaling a running deployment with an older client doesn't work, but I think this is fine since it didn't work previously either (the deployer process stomps the latest RC replicas).
  • Scaling with an older client right before kicking off a new deployment doesn't work:
[vagrant@localhost sample-app]$ oc get rc
CONTROLLER   CONTAINER(S)               IMAGE(S)                            SELECTOR                                                        REPLICAS   AGE
database-1   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-1,deploymentconfig=database,name=database   1          1m
[vagrant@localhost sample-app]$ oc get dc/database -o yaml | grep replicas
  replicas: 1
[vagrant@localhost sample-app]$ oc get rc/database-1 -o yaml | grep replicas
    openshift.io/deployment.replicas: "1"
      {"kind":"DeploymentConfig","apiVersion":"v1","metadata":{"name":"database","namespace":"test","selfLink":"/oapi/v1/namespaces/test/deploymentconfigs/database","uid":"076fb8d8-9941-11e5-93e5-080027c5bfa9","resourceVersion":"2559","creationTimestamp":"2015-12-02T22:07:08Z","labels":{"template":"application-template-stibuild"}},"spec":{"strategy":{"type":"Recreate","recreateParams":{"pre":{"failurePolicy":"Abort","execNewPod":{"command":["/bin/true"],"env":[{"name":"CUSTOM_VAR1","value":"custom_value1"}],"containerName":"ruby-helloworld-database"}},"post":{"failurePolicy":"Ignore","execNewPod":{"command":["/bin/false"],"env":[{"name":"CUSTOM_VAR2","value":"custom_value2"}],"containerName":"ruby-helloworld-database"}}},"resources":{}},"triggers":[{"type":"ConfigChange"}],"replicas":1,"selector":{"name":"database"},"template":{"metadata":{"creationTimestamp":null,"labels":{"name":"database"}},"spec":{"containers":[{"name":"ruby-helloworld-database","image":"openshift/mysql-55-centos7:latest","ports":[{"containerPort":3306,"protocol":"TCP"}],"env":[{"name":"MYSQL_USER","value":"userAK4"},{"name":"MYSQL_PASSWORD","value":"wYRK11Xt"},{"name":"MYSQL_DATABASE","value":"root"}],"resources":{},"terminationMessagePath":"/dev/termination-log","imagePullPolicy":"Always","securityContext":{"capabilities":{},"privileged":false}}],"restartPolicy":"Always","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","securityContext":{}}}},"status":{"latestVersion":1,"details":{"causes":[{"type":"ConfigChange"}]}}}
  replicas: 1
  replicas: 1
[vagrant@localhost sample-app]$ oc scale dc/database --replicas 3; oc deploy database --latest
deploymentconfig "database" scaled
Started deployment #2
[vagrant@localhost sample-app]$ oc get rc -w
CONTROLLER   CONTAINER(S)               IMAGE(S)                            SELECTOR                                                        REPLICAS   AGE
database-1   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-1,deploymentconfig=database,name=database   3          5m
database-2   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-2,deploymentconfig=database,name=database   0          4s
CONTROLLER   CONTAINER(S)               IMAGE(S)                            SELECTOR                                                        REPLICAS   AGE
database-2   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-2,deploymentconfig=database,name=database   0          7s
database-1   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-1,deploymentconfig=database,name=database   0         5m
database-1   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-1,deploymentconfig=database,name=database   0         5m
database-1   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-1,deploymentconfig=database,name=database   0         5m
database-2   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-2,deploymentconfig=database,name=database   1         23s
database-2   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-2,deploymentconfig=database,name=database   1         23s
database-2   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-2,deploymentconfig=database,name=database   1         23s

@ironcladlou
Contributor Author

The desired replica count for the new deployment is now driven entirely by the config replicas, which was still set to 1 (the scale operation set only the RC replicas). The deployment started before the controller sync reverse-updated the config to match the manual scaling value. The controller doesn't reconcile unless the latest deployment exists and is terminated (i.e. steady state).

Guess we'll have to think more about that. Whack-a-mole continues.


@ncdc
Contributor

ncdc commented Dec 2, 2015

If you're rapid-fire wham-banging a scale followed by a deployment using an old client, I don't think there's much we can do to avoid the scenario from @Kargakis. I don't think it's a blocker to moving forward, and should be part of the education/documentation/release notes (tl;dr: upgrade your client).

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin/7620/)

@0xmichalis
Contributor

I was hoping this wouldn't work on master, but it does :) :(

[vagrant@localhost sample-app]$ oc get dc/database -o yaml | grep replicas
  replicas: 1
[vagrant@localhost sample-app]$ oc get rc/database-1 -o yaml | grep replicas
      {"kind":"DeploymentConfig","apiVersion":"v1","metadata":{"name":"database","namespace":"test","selfLink":"/oapi/v1/namespaces/test/deploymentconfigs/database","uid":"54f329e4-9946-11e5-91b9-080027c5bfa9","resourceVersion":"309","creationTimestamp":"2015-12-02T22:45:05Z","labels":{"template":"application-template-stibuild"}},"spec":{"strategy":{"type":"Recreate","recreateParams":{"pre":{"failurePolicy":"Abort","execNewPod":{"command":["/bin/true"],"env":[{"name":"CUSTOM_VAR1","value":"custom_value1"}],"containerName":"ruby-helloworld-database"}},"post":{"failurePolicy":"Ignore","execNewPod":{"command":["/bin/false"],"env":[{"name":"CUSTOM_VAR2","value":"custom_value2"}],"containerName":"ruby-helloworld-database"}}},"resources":{}},"triggers":[{"type":"ConfigChange"}],"replicas":1,"selector":{"name":"database"},"template":{"metadata":{"creationTimestamp":null,"labels":{"name":"database"}},"spec":{"containers":[{"name":"ruby-helloworld-database","image":"openshift/mysql-55-centos7:latest","ports":[{"containerPort":3306,"protocol":"TCP"}],"env":[{"name":"MYSQL_USER","value":"user4Y3"},{"name":"MYSQL_PASSWORD","value":"mROu8mNO"},{"name":"MYSQL_DATABASE","value":"root"}],"resources":{},"terminationMessagePath":"/dev/termination-log","imagePullPolicy":"Always","securityContext":{"capabilities":{},"privileged":false}}],"restartPolicy":"Always","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","securityContext":{}}}},"status":{"latestVersion":1,"details":{"causes":[{"type":"ConfigChange"}]}}}
  replicas: 1
  replicas: 1
[vagrant@localhost sample-app]$ oc scale dc/database --replicas 3; oc deploy database --latest
deploymentconfig "database" scaled
Started deployment #2
[vagrant@localhost sample-app]$ oc get rc -w
CONTROLLER   CONTAINER(S)               IMAGE(S)                            SELECTOR                                                        REPLICAS   AGE
database-1   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-1,deploymentconfig=database,name=database   3          2m
database-2   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-2,deploymentconfig=database,name=database   0          5s
CONTROLLER   CONTAINER(S)               IMAGE(S)                            SELECTOR                                                        REPLICAS   AGE
database-2   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-2,deploymentconfig=database,name=database   0          30s
database-1   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-1,deploymentconfig=database,name=database   0         3m
database-1   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-1,deploymentconfig=database,name=database   0         3m
database-1   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-1,deploymentconfig=database,name=database   0         3m
database-1   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-1,deploymentconfig=database,name=database   0         3m
database-2   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-2,deploymentconfig=database,name=database   3         48s
database-2   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-2,deploymentconfig=database,name=database   3         48s
database-2   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-2,deploymentconfig=database,name=database   3         48s
database-2   ruby-helloworld-database   openshift/mysql-55-centos7:latest   deployment=database-2,deploymentconfig=database,name=database   3         48s

@ncdc
Contributor

ncdc commented Dec 2, 2015

@Kargakis why the frown?

@0xmichalis
Contributor

Because we wouldn't really care if we didn't change any behavior, I guess.

@ncdc
Contributor

ncdc commented Dec 2, 2015

@Kargakis I'm confused... doesn't your most recent example (from master) show what we want? A scale to 3 immediately followed by a deployment results in 3 replicas. Am I missing something?

@0xmichalis
Contributor

I'm frowning because, at the end of the day, the same example using an old client against the server changes in this PR doesn't work...

@ironcladlou
Contributor Author

Since it seems like a pretty rare scenario, I'm certainly fine documenting it as a known issue and moving forward.


@ironcladlou
Contributor Author

So the user impact in the new server/old client case is that your quick scale/deployment isn't effective, and you have to run scale once more after the deployment finishes. Accurate? Still seems like a pretty rare event. If you're manually scaling and observe the bug, scale once more.

I'd be more concerned if we had this sort of issue with the auto scaler, but that'll be fine.

@0xmichalis
Contributor

I am fine with moving on with this. The design of the controller is more robust than before and I don't perceive the issue I reported as a blocker, supposing we have docs mentioning it.

@ncdc
Contributor

ncdc commented Dec 3, 2015

Do we need any additional reviews? @smarterclayton @liggitt @deads2k @DirectXMan12 ?

ObjectMeta: kapi.ObjectMeta{
    Name:              dc.Name,
    Namespace:         dc.Namespace,
    CreationTimestamp: dc.CreationTimestamp,
Contributor

This timestamp looks unusual. Why would we be forcing a CreationTimestamp on an object? Doesn't that happen server-side in FillObjectMetaSystemFields?

Contributor

Since Scale objects don't have the normal object lifecycle (they're never persisted, they don't use a strategy, etc), I don't think so.

Contributor

Actually, who uses this field at all? And if we're going to have it, why is it useful to be the dc.CreationTimestamp?

Contributor

And if we're going to have it, why is it useful to be the dc.CreationTimestamp?

Because this is a dc subresource and it's supposed to carry information about the main resource. Probably. Already upstream: https://github.com/kubernetes/kubernetes/blob/8c182c2713ea6e1b8ffff1da11e0d802cacd0bd8/pkg/apis/extensions/helpers.go#L75
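For context, a rough sketch (simplified, invented types; the upstream helper linked above is the authoritative example) of why the Scale mirrors the parent's metadata: the Scale is derived on every request and never persisted, so its ObjectMeta, including CreationTimestamp, is copied straight from the deploymentConfig.

package main

import (
    "fmt"
    "time"
)

// Simplified stand-ins for the real API types.
type objectMeta struct {
    Name              string
    Namespace         string
    CreationTimestamp time.Time
}

type deploymentConfig struct {
    objectMeta
    Replicas int
}

type scale struct {
    objectMeta
    SpecReplicas int
}

// scaleFromConfig mirrors the parent DC's metadata onto the Scale. The Scale
// is computed on demand and never stored, so there is no server-side
// lifecycle to fill in a CreationTimestamp for it.
func scaleFromConfig(dc *deploymentConfig) *scale {
    return &scale{
        objectMeta: objectMeta{
            Name:              dc.Name,
            Namespace:         dc.Namespace,
            CreationTimestamp: dc.CreationTimestamp,
        },
        SpecReplicas: dc.Replicas,
    }
}

func main() {
    dc := &deploymentConfig{
        objectMeta: objectMeta{Name: "database", Namespace: "test", CreationTimestamp: time.Now()},
        Replicas:   3,
    }
    fmt.Printf("%+v\n", scaleFromConfig(dc))
}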

@DirectXMan12
Contributor

I'll take one last look-through, but it should be all set.

@@ -123,68 +114,6 @@ func (c *DeployerPodController) Handle(pod *kapi.Pod) error {
return nil
}

func (c *DeployerPodController) cleanupFailedDeployment(deployment *kapi.ReplicationController) error {
Contributor

This is probably obvious if you know the controller better, but let's say I have a DC that has a PodSpec that grabs a NodePort. The post-deploy hook fails. Does this leave me with a running Pod that is claiming that NodePort instead of cleaning it up?

Contributor Author

If the deployment fails, the RC is scaled down to 0 eventually no matter what, so there won't be any lingering pods. Make sense?

@ironcladlou
Contributor Author

I think for anybody not very familiar with the code, validating the controller's test cases might be the most productive way to review. Behavioral issues in terms of scenarios are the primary concern at this point.

@ironcladlou
Contributor Author

@Kargakis please tag this if you agree it's ready.

@0xmichalis
Contributor

LGTM

@ironcladlou
Contributor Author

[merge]

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/4261/) (Image: devenv-rhel7_2896)

@openshift-bot
Contributor

Evaluated for origin merge up to d970f0a

openshift-bot pushed a commit that referenced this pull request Dec 7, 2015
@openshift-bot openshift-bot merged commit 214199f into openshift:master Dec 7, 2015
@smarterclayton
Contributor

BTW the tests you added here are phenomenally clear. Excellent job.
