
Use two replicas for Kubeapps-apis service in CI #3867

Merged: 3 commits merged into master from ci-fail-rollback on Dec 1, 2021
Conversation

@absoludity (Contributor) commented on Dec 1, 2021:

Signed-off-by: Michael Nelson minelson@vmware.com

Description of the change

Updates the CI deployment to use two replicas for the kubeapps-apis service.

We've recently been experiencing intermittent failures of the last e2e test, 08-rollback.js, and they appear to be getting more frequent.

I first added the same retries as Antonio had done in #3866 so that I could continue debugging, and found that the first screenshot (first failure) shows an error:

[screenshot: 08-rollback-0]

while the second and third retries are just reloading the same page (though note that the update to the replicas is lost, so it shows 1 again) without error:

[screenshot: 08-rollback-1]

So an error is occurring, but the retry simply reloads the page (without re-clicking the deploy button), which works fine but leaves the test in a different state from the one expected.

I then re-ran the e2e test with SSH enabled three times and, of course, it passed all three times straight (perhaps more memory is allocated when SSH is enabled; I'm not sure, as I didn't check).

But I can reproduce the exact same error locally by getting a package ready to deploy, then scaling the apis server to 0, then deploying (same screenshot as first error above).
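A minimal sketch of that local reproduction, assuming a default install in the kubeapps namespace and the usual kubeapps-internal-kubeappsapis deployment name (adjust both for your release):

    # Prepare a package up to the point of clicking "Deploy" in the dashboard,
    # then scale the kubeapps-apis deployment down to zero:
    kubectl -n kubeapps scale deployment kubeapps-internal-kubeappsapis --replicas=0
    # Clicking "Deploy" now has no backend to handle the request, which produces
    # the same error as in the first screenshot above. Restore it afterwards:
    kubectl -n kubeapps scale deployment kubeapps-internal-kubeappsapis --replicas=1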

So I'm confident that the issue is a 502 at that point, perhaps due to OOM or similar.
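One way to check the OOM theory (a hedged sketch; the deployment and pod names are assumptions) is to look at the last terminated state of the kubeapps-apis container:

    # Find the kubeapps-apis pod and inspect its last container state:
    kubectl -n kubeapps get pods | grep kubeappsapis
    kubectl -n kubeapps describe pod <kubeappsapis-pod-name> | grep -A 5 "Last State"
    # A "Reason: OOMKilled" in that output would confirm the memory theory behind the 502.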

I started investigating how we could work around this in the CI test, either by retrying with the deploy click included or by handling a 502, but realised that we're working around a problem which Kubernetes already solves by avoiding a single point of failure. The only reason we're hitting this is that we scale all deployments down to a single replica in CI to keep resource usage down; a better solution here is not to run with a SPOF for the kubeapps-apis service and to let Kubernetes route the requests.

Of course, this doesn't mean we shouldn't also continue to learn more about the memory usage of the service, but it does provide a more realistic e2e test and solve the current intermittent CI issue, IMO.
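For reference, the change amounts to bumping only the kubeapps-apis replica count in the CI install script; a rough sketch of the relevant helm flags (the release name, namespace and chart path are assumptions here, see the actual diff below):

    helm upgrade --install kubeapps ./chart/kubeapps \
      --namespace kubeapps \
      --set frontend.replicaCount=1 \
      --set kubeops.replicaCount=1 \
      --set dashboard.replicaCount=1 \
      --set kubeappsapis.replicaCount=2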

Benefits

More stable CI.

Possible drawbacks

Extra resources required in CI (not a big deal).

Applicable issues

  • fixes #

Additional information

@absoludity changed the title from "Add retries to get screenshots" to "Use two replicas for Kubeapps-apis service in CI" on Dec 1, 2021
@absoludity marked this pull request as ready for review on December 1, 2021 at 03:35
@absoludity (Contributor, Author) commented:
Passed 2 of 3, with one failure in between, which was "Execution context was destroyed, most likely because of a navigation."

@@ -189,7 +189,7 @@ installOrUpgradeKubeapps() {
     --set frontend.replicaCount=1
     --set kubeops.replicaCount=1
     --set dashboard.replicaCount=1
-    --set kubeappsapis.replicaCount=1
+    --set kubeappsapis.replicaCount=2
A contributor commented on the diff:
Oh my... I didn't remember that scale-down. I wondered how it could be possible when we had already set two replicas and lowered the memory limits to 256!
Thanks for the fix!
(I wonder if the Bitnami people are also manually setting the number of replicas...)

@absoludity merged commit 85a484f into master on Dec 1, 2021
@absoludity deleted the ci-fail-rollback branch on December 1, 2021 at 08:47