Consider instance upgraded when health check passes for it #3294

Closed
alena1108 opened this issue Jan 15, 2016 · 16 comments
Labels
internal kind/enhancement Issues that improve or augment existing functionality


@alena1108

During an in-service upgrade with startFirst=true, the old instance gets destroyed right after the upgraded one comes up. Today we consider an instance to be up when it goes to the Running state. For some applications the Running state is not an indication of "up"; the health check should really be that indication.

At the very least, we should make this option configurable.

@ibuildthecloud @vincent99
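
For reference, a minimal rancher-compose.yml sketch of the setup being discussed (a hypothetical service; the image, port, and request_line are placeholders, and the field names follow the health_check / upgrade_strategy syntax shared later in this thread):

web:
  scale: 1
  upgrade_strategy:
    # startFirst=true: bring the new container up before stopping the old one
    start_first: true
  health_check:
    port: 8080
    interval: 2000
    unhealthy_threshold: 3
    request_line: GET "/" "HTTP/1.0"
    healthy_threshold: 2
    response_timeout: 2000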

@alena1108 alena1108 added the kind/enhancement Issues that improve or augment existing functionality label Jan 15, 2016
@alena1108 alena1108 self-assigned this Jan 15, 2016
@alena1108 alena1108 added this to the Release 1.0 milestone Jan 15, 2016
@etlweather

Just to clarify: in my forum post, I mention that the state in the UI is "Initializing", not "Running", because it does not have a successful health check yet.

@alena1108
Author

@etlweather the state in the UI is a combination of the instance state and the health state. So an instance in the Running state with healthcheck = initializing is represented as Initializing in the UI.

@etlweather

Ah, got it.

@vincent99
Contributor

It seems like this should be the default, if not only, behavior for a service with a healthCheck.

@alena1108
Author

@sangeethah @soumyalj it works the same way for startFirst=true and startFirst=false

@soumyalj

@alena1108 : Tested with 2 containers, healthcheck enabled, and batch size 1 on master. After an upgrade, the second container starts within 25s of the first one starting. It is not supposed to start until the first one is in a healthy state.

mysql> select id, name, created, state, health_state from instance where name like "%testupgrade%";
+-----+--------------------------+---------------------+---------+--------------+
| id  | name                     | created             | state   | health_state |
+-----+--------------------------+---------------------+---------+--------------+
| 222 | Default_testupgradebug_1 | 2016-06-15 00:34:43 | stopped | initializing |
| 223 | Default_testupgradebug_2 | 2016-06-15 00:34:43 | stopped | initializing |
| 224 | Default_testupgradebug_1 | 2016-06-15 00:37:10 | running | initializing |
| 225 | Default_testupgradebug_2 | 2016-06-15 00:37:32 | running | initializing |
+-----+--------------------------+---------------------+---------+--------------+

@alena1108
Author

@soumyalj with the fix I've applied, even the first batch won't be upgraded until all instances are healthy. So the steps to test will be:

  1. Create a service with a valid health check. Wait until all instances are healthy.
  2. Upgrade to a config with an invalid health check (pointing at a non-existing page); see the sketch after these steps.
  3. Make sure the upgrade gets stuck once the first upgraded instance is stuck in the initializing state. Go to that instance, make it healthy by creating the page, then verify that the upgrade is now performed for the second instance.
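
A sketch of what step 2's invalid health check might look like (assuming the same health_check syntax used later in this thread; the /does-not-exist path is a placeholder for a page the app does not serve):

web:
  scale: 2
  health_check:
    port: 8080
    interval: 2000
    unhealthy_threshold: 3
    # Point the check at a page that does not exist yet, so the upgraded
    # instance stays in the initializing health state until the page is created
    request_line: GET "/does-not-exist" "HTTP/1.0"
    healthy_threshold: 2
    response_timeout: 2000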

@soumyalj

Followed the above steps and verified in v1.1.0-dev5-rc2.

@mariusstaicu

Upgraded to Rancher 1.1.0 and the behaviour is the same.

I have one running instance for a service. I perform an in-service upgrade and the old instance gets stopped right after the new instance enters the Initializing state, not after it becomes green. Is there any setting that needs to be changed for this to work?

@tcdev0

tcdev0 commented Aug 3, 2016

@wstudios2009 same for me with rancher 1.1.0.

@deniseschannon

@wstudios2009 @tcdev0 Are you using "Start before stop" or not for upgrading?

The specific change applies to the case where you have, say, 3 containers and perform an upgrade.

Previously, we'd end up stopping all 3 old containers while all 3 new containers were stuck in initializing. Therefore the service would be completely down if your service was stuck in initializing/unhealthy.

With the fix for this issue, we only stop 1 container and wait until the new container is active before moving on to remove the next old container and starting the next new container.
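
As an illustration only (not taken from this thread), a hedged sketch of such a 3-container, health-checked service; the field names mirror the compose examples shared below:

web:
  scale: 3
  health_check:
    port: 80
    interval: 2000
    unhealthy_threshold: 3
    request_line: GET "/" "HTTP/1.0"
    healthy_threshold: 2
    response_timeout: 2000
# With the fix, the upgrade replaces one container at a time, waiting for the
# new container's health check to pass before touching the next old container.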

@mariusstaicu

mariusstaicu commented Aug 4, 2016

@deniseschannon I use start_first: true in rancher-compose.yml.
Also, when upgrading, I have only one running instance (one running container), and this container waits only until the new one is in the "Initializing" state before shutting down.
I expected the old container to wait until the new one is in the "Running" (not "Initializing") state, and only then shut down.

What happens can be seen in the picture below.
[screenshot from 2016-08-04 14 11 32]

@deniseschannon

@wstudios2009 Do you have a health check on the services?

Can you share your docker-compose.yml for the service, both the old one and the new one?

@mariusstaicu

Yes, the health check is a simple HTTP GET check and it works (the state is 'Initializing' until my app is started). Here's my rancher-compose.yml:

db:
  scale: 1
display:
  scale: 1
  upgrade_strategy:
    start_first: true
  health_check:
    port: 8080
    interval: 2000
    initializing_timeout: 120000
    unhealthy_threshold: 3
    request_line: GET "/device" "HTTP/1.0"
    healthy_threshold: 2
    response_timeout: 2000
web:
  scale: 1
  upgrade_strategy:
    start_first: true
  health_check:
    port: 8080
    interval: 2000
    initializing_timeout: 120000
    unhealthy_threshold: 3
    request_line: GET "/" "HTTP/1.0"
    healthy_threshold: 2
    response_timeout: 2000

Here is my docker-compose file (with some passwords and URLs removed):

db:
  environment:
    POSTGRES_DB: db
    POSTGRES_PASSWORD: pass
    POSTGRES_USER: user
  labels:
    io.rancher.scheduler.affinity:host_label: location=esol
  image: postgres:9.5
  volumes:
  - myapp-int-db:/var/lib/postgresql/data
display:
  environment:
    GIT_SHA: e3eea3215301c31eb2d240c78d44347f0c9b81e7
    JAVA_OPTS: -Xms200m -Xmx380m -XX:MaxMetaspaceSize=80m
    RELEASE: '11'
  labels:
    io.rancher.container.pull_image: always
    io.rancher.container.hostname_override: container_name
  image: <private repo image path here>:11
web:
  environment:
    JAVA_OPTS: -Xms200m -Xmx380m -XX:MaxMetaspaceSize=80m
    RELEASE: '49'
  labels:
    io.rancher.container.pull_image: always
    io.rancher.container.hostname_override: container_name
  image: <private repo image path here>:49
  links:
  - db:db
  volumes:
  - myapp-int:/.home

The only things that change between deployments in docker-compose.yml are the image versions and some unrelated env variables (GIT_SHA, RELEASE, etc.).

@patrickkeller

This issue still exists in rancher 1.6.14. The old container gets stopped even when the new one is still in "Initializing" state.

@superseb
Contributor

@patrickkeller #11487
