Upgrading instance through webapp fails for the first time. #8502

Closed
1yuv opened this issue Aug 30, 2023 · 14 comments

Comments

@1yuv
Member

1yuv commented Aug 30, 2023

Describe the bug
When an upgrade is staged and installed from the webapp interface, the first installation attempt fails with the following message:

Error triggering update

To Reproduce
Steps to reproduce the behavior:

  1. Have a 4.3.0 instance.
  2. From the webapp's Upgrades page, select to upgrade the instance to 4.3.1.
  3. Once staging is completed, hit the Install button.
  4. An error is thrown.

Expected behavior
Installation should succeed on the first attempt.

Screen recording

Screen.Recording.2023-08-30.at.10.02.14.PM.mov
1yuv added the Type: Bug (fix something that isn't working as intended) and Affects: 4.3.0 labels on Aug 30, 2023
@henokgetachew
Contributor

Does this mean the second attempt works, or is it just broken?

@1yuv
Member Author

1yuv commented Aug 30, 2023

Does this mean the second attempt works, or is it just broken?

The second attempt works just fine, without any error.

@henokgetachew
Contributor

It's weird that the second attempt works but not the first one. I will have a look at this.

@mrjones-plip
Contributor

cc @nydr and @garethbowen - I'm seeing a lot of HTML being returned where JSON is expected in the video above. I suspect that #8179 will help a lot here, but I'm not sure it addresses the root cause.

@garethbowen
Member

There's also a curious http2 error which doesn't make sense to me. The http2 change doesn't exist in the tag, so I'm assuming it's something spurious. We need to dig deeper into the actual logs to see what happened.

@dianabarsan
Member

Hah, I think there's a bit of a race condition here: for a brief moment, some of the old containers are still running while others are down, and unexpected errors happen.
When an upgrade is triggered, we just do a docker-compose up on the updated docker-compose files. Docker recreates the containers concurrently, with no guarantees about which containers come up or go down first, so there can be a short moment where the old API is up while CouchDB is down and throwing that JSON error (or some other combination).

I think the incoming change that updates how haproxy and nginx respond when services are down may fix this.
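For illustration, here is roughly the shape of the upgrade trigger described above. This is a minimal sketch, not the actual cht-core code; the compose file paths and helper names are assumptions.

```ts
// Minimal sketch of the upgrade trigger (illustrative only; paths and names
// are not taken from cht-core).
import { promisify } from 'util';
import { execFile } from 'child_process';

const run = promisify(execFile);

// Hypothetical location of the staged, already-updated compose files.
const COMPOSE_FILES = ['/srv/compose/cht-core.yml', '/srv/compose/cht-couchdb.yml'];

const composeArgs = (command: string[]): string[] =>
  COMPOSE_FILES.flatMap(file => ['-f', file]).concat(command);

const triggerUpgrade = async (): Promise<void> => {
  // Pull the new images first so the restart window is as short as possible.
  await run('docker-compose', composeArgs(['pull']));
  // Recreate the containers. Docker recreates them concurrently, so there is a
  // short window where e.g. the old API is still up while CouchDB is already down.
  await run('docker-compose', composeArgs(['up', '-d', '--remove-orphans']));
};

triggerUpgrade().catch(err => console.error('Error triggering update', err));
```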

@1yuv
Member Author

1yuv commented Sep 11, 2023

the incoming change that updates how haproxy and nginx respond when services are down may fix this.

Hi @dianabarsan, is there a PR or issue for this? Can you link it here?

@dianabarsan
Member

dianabarsan commented Sep 11, 2023

@yrimal

#8179

@garethbowen
Member

garethbowen commented Oct 10, 2023

Confirming that this still happens with 4.4.0 -> 4.4.1, so it's not fixed by the issue Diana cited, though if I'm patient it does just work eventually. I think the yellow warning is shown unnecessarily and the upgrade trigger happens correctly.

@dianabarsan
Member

The warning is displayed "optimistically" after 1 minute, because we believed that 1 minute was enough time for the containers to stop and restart. This is clearly not true.
We can increase this interval.
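As a rough sketch of that optimistic timer (the constant names and the new interval are assumptions, not values from cht-core):

```ts
// Rough sketch of the "optimistic" warning timer; names and intervals are
// assumptions, not taken from cht-core.
const UPGRADE_WARNING_DELAY_MS = 5 * 60 * 1000; // previously ~1 minute

let warningTimer: ReturnType<typeof setTimeout> | undefined;

const onUpgradeTriggered = (showWarning: () => void): void => {
  // Only warn the user if the upgrade has not been confirmed after the delay.
  warningTimer = setTimeout(showWarning, UPGRADE_WARNING_DELAY_MS);
};

const onUpgradeConfirmed = (): void => {
  // The new version responded as expected; cancel the pending warning.
  if (warningTimer) {
    clearTimeout(warningTimer);
  }
};
```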

dianabarsan self-assigned this on Oct 11, 2023
dianabarsan added this to the 4.5.0 milestone on Oct 11, 2023
@dianabarsan
Member

A nice solution would be to check the health of the containers somehow and validate that we are indeed on the right versions, but right now I don't have a solution for this: since deployments are either in Docker or k8s, we'd need to add an endpoint to some external service that the API can reach.

@garethbowen
Member

@dianabarsan I see the warning immediately, not after 1 minute, just like in @1yuv's video. It's displayed behind the dialog, but if you close the dialog it's there.

@mrjones-plip
Contributor

A nice solution would be to check the health of the containers somehow and validate that we are indeed on the right versions, but right now I don't have a solution for this: since deployments are either in Docker or k8s, we'd need to add an endpoint to some external service that the API can reach.

For docker compose deployments, we do have access to the docker Unix socket, so we could tell container state as well as new-version image download progress. But yeah, as you already said, it'd only work for docker compose and not k*s. Since we're trying to migrate away from docker compose hosting, it's likely not worth pursuing.
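For what it's worth, a sketch of what that could look like using the Docker Engine API's `GET /containers/json` endpoint over the Unix socket; the container-name filter and the fields we'd actually need are assumptions:

```ts
// Sketch of reading container state over the Docker Unix socket (docker
// compose deployments only). Error handling kept minimal for brevity.
import * as http from 'http';

interface ContainerSummary {
  Names: string[];
  Image: string;
  State: string;  // e.g. "running"
  Status: string; // e.g. "Up 2 minutes"
}

const listContainers = (): Promise<ContainerSummary[]> =>
  new Promise((resolve, reject) => {
    const req = http.request(
      { socketPath: '/var/run/docker.sock', path: '/containers/json?all=true', method: 'GET' },
      res => {
        let body = '';
        res.on('data', chunk => (body += chunk));
        res.on('end', () => resolve(JSON.parse(body)));
      }
    );
    req.on('error', reject);
    req.end();
  });

// Example: report the image and state of every cht container.
listContainers().then(containers => {
  containers
    .filter(c => c.Names.some(name => name.includes('cht')))
    .forEach(c => console.log(c.Names[0], c.Image, c.State));
});
```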

@dianabarsan
Member

dianabarsan commented Oct 12, 2023

Yeah, after further testing I think I know what is happening:

  • we update the docker-compose files
  • we run docker-compose pull
  • we run docker-compose up
  • this makes docker download and restart the containers in parallel
  • haproxy (or another db container) is updated (downloaded and restarted) before API. API restarts because it rage quits when it doesn't have a DB, and momentarily believes that the upgrade has gone wrong. The upgrade continues, the old API is eventually killed, the new API comes up, and everything is fine.

The warning is displayed because the previous version of API goes down and comes back up while another container (likely haproxy or healthcheck) is updated.
I'm experimenting with not killing API when the database goes down.
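As a sketch of that experiment (the function name, retry interval, and retry count are assumptions, not what ended up in cht-core):

```ts
// Sketch of "not killing API when the database goes down": retry the CouchDB
// connection with a delay instead of exiting immediately. Names and retry
// policy are assumptions.
const RETRY_DELAY_MS = 5000;
const MAX_RETRIES = 60; // give up after roughly 5 minutes

const delay = (ms: number) => new Promise(resolve => setTimeout(resolve, ms));

// checkCouchUp is assumed to resolve once CouchDB answers on its root URL.
const waitForCouchDb = async (checkCouchUp: () => Promise<void>): Promise<void> => {
  for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
    try {
      await checkCouchUp();
      return; // the database is back; keep the API process alive
    } catch (err) {
      console.warn(`CouchDB not reachable (attempt ${attempt}/${MAX_RETRIES}), retrying...`);
      await delay(RETRY_DELAY_MS);
    }
  }
  // Only after exhausting the retries fall back to the old "rage quit".
  process.exit(1);
};
```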

dianabarsan added a commit that referenced this issue Oct 14, 2023
- increases interval before warning the user that the upgrade might have issues
- handles case where nginx sends a 502
- prevents upgrade interruption when state is completing

#8502
Benmuiruri pushed a commit that referenced this issue Oct 26, 2023
- increases interval before warning the user that the upgrade might have issues
- handles case where nginx sends a 502
- prevents upgrade interruption when state is completing

#8502