Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flaky chart tests #1820

Merged
merged 4 commits into from
Jun 25, 2020
Merged

Fix flaky chart tests #1820

merged 4 commits into from
Jun 25, 2020

Conversation

andresmgot
Copy link
Contributor

Description of the change

After adding the upgrade test, the scenario in which that test is run (helm3+pg) fails many times. After a bit of investigation I noticed that since we are running the install plus the upgrade in a short time, the upgrade hook may finish after the repos are populated. This causes that the catalog doesn't include the chart required for the tests.

This PR, adds:

  • A retry for the assetsvc test so if the populate job are still running, we retry a few times.
  • Use bitnami charts rather than stable (which is being deprecated).
  • Ensure that the sync job is run and finishes after the upgrade.

Copy link
Contributor

@absoludity absoludity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs an increment in the loop, and other comments, but +1

- -c
- curl -o /tmp/output $ASSETSVC_HOST:$ASSETSVC_PORT/v1/ns/{{ .Release.Namespace }}/charts && cat /tmp/output && cat /tmp/output | grep wordpress
- |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always worth a comment so in 3 months we know why this was added (and whether it can be removed etc.)

if curl -o /tmp/output $ASSETSVC_HOST:$ASSETSVC_PORT/v1/ns/{{ .Release.Namespace }}/charts && cat /tmp/output && cat /tmp/output | grep wordpress; then
break
fi
sleep 10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're missing an increment ((n+=1)) here, so this would just keep going until timing out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, yes

@@ -288,8 +288,12 @@ else
fi

# Wait for Kubeapps Jobs
k8s_wait_for_job_completed kubeapps apprepositories.kubeapps.com/repo-name=stable
info "Job apprepositories.kubeapps.com/repo-name=stable ready"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You said that

After a bit of investigation I noticed that since we are running the install plus the upgrade in a short time, the upgrade hook may finish after the repos are populated. This causes that the catalog doesn't include the chart required for the tests.

Rather than making CI do something different to what people do in real life, could we not instead:

  1. Install (and upgrade) with only the bitnami repository
  2. Update the above wait which you've deleted to wait for the bitnami repository sync to be completed

Wouldn't that effectively do what you've done below, ensuring that the upgrade hook cannot begin until the (single) repo is initially populated? Then, the upgrade happens, which may re-populate, but we'll still wait for the job to complete. Not sure if I'm missing something.

+1'ing but see what you think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that effectively do what you've done below, ensuring that the upgrade hook cannot begin until the (single) repo is initially populated? Then, the upgrade happens, which may re-populate, but we'll still wait for the job to complete. Not sure if I'm missing something.

Nope, the wait just ensures that any sync job for the given repo is finished, that's why I am removing all the jobs beforehand. Also, the problem is that the flow is the following:

  1. Application install, a sync job is started.
  2. Application upgrade, a second sync job and the upgrade hook are started.
  3. The two sync jobs finish.
  4. The upgrade hook finishes, deleting the database. This is sometimes slower than the sync jobs because (I assume) it needs to pull the images.

That way, no matter how much we wait, the repo won't be populated, that's why I am refreshing the repo again.

@andresmgot andresmgot merged commit 9e03645 into master Jun 25, 2020
@andresmgot andresmgot deleted the retryAssetsvc branch September 8, 2020 09:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants