
E2e tests fixes and improvements #4243

Merged: 5 commits merged into main from e2e-tests on Feb 8, 2022

Conversation

@castelblanque (Collaborator) commented Feb 7, 2022

Description of the change

This PR tries to fix problems with CI E2E tests in both local Kind and GKE environments.
It adds some logging of the parameters used in the e2e-test.sh script.
Different timeouts are set when running in Kind or GKE, as GKE turned out to be slower while developing. Why not have the same long timeout for both? Because this way we get earlier feedback when running in Kind, which is what runs on every commit push.
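For illustration only (a minimal sketch, not code from this PR; the 4-minute default is assumed): with TEST_TIMEOUT expressed in minutes, as the CI config below sets it, a Playwright config can derive its per-test timeout from the environment, so the Kind and GKE jobs simply export different values:

  // playwright.config.ts (hypothetical sketch)
  import { PlaywrightTestConfig } from "@playwright/test";

  const timeoutMinutes = parseInt(process.env.TEST_TIMEOUT || "4", 10);
  const config: PlaywrightTestConfig = {
    timeout: timeoutMinutes * 60 * 1000, // Playwright timeouts are in milliseconds
  };
  export default config;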

One of the problems was that resources created by test 03 could not pull their images due to a 401 Unauthorized error. Now the existing Docker credentials are used, but the secret is still created in the test.

More comments below.

Benefits

E2E tests should now pass successfully in both local Kind and GKE.

Possible drawbacks

No one knows what the next surprise with E2E tests will be...

Applicable issues

castelblanque and others added 3 commits February 7, 2022 17:14
* Trying new CI setup

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

* Fix bearer token login on E2E tests (#2898)

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

* Added positive and negative permissions tests for regular user (#4081)

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

* Re-add rolebinding for kubeapps-user in its namespace (#4081)

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

* Fixing tests for GKE env (#2898)

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

* Fix for E2E running in GKE (#2898)

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

* Trying to fix E2E tests for GKE (#2898)

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

* Improving E2E tests setup (#2898)

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

* Fix for CI pipeline in GKE

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

* Fixes for E2E tests in GKE

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

* Trying E2E tests with lower cluster resources

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

* Removed temporary comments in CI config

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

* Temporary fix for e2e CI development

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

* Trying test without Docker credentials

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

* Use new integration image. Small fixes for E2E.

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

* E2E Tests fix

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

* Fixing E2E tests

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

* Debugging CI

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

* Improving GKE CI setup

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

* Debugging E2E CI tests

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

* Adding missing timeout param for local E2E

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

* Tidy up after debugging E2E test

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>
@@ -120,7 +120,7 @@ build_on_tag_or_prerelease: &build_on_tag_or_prerelease
tags:
only: /^v.*/
branches:
-        only: prerelease
+        only: e2e-tests
Collaborator Author:

Intentionally left until the PR is merged so that tests can be run in both local Kind and GKE.

Contributor:

Wouldn't that mean, now that you've tested in both, that you'd want to revert this now? (before review and landing?)

Collaborator Author:

That branch name will not be merged, of course. I left it intentionally while this PR is open so that, if we need to push new changes, we can still run the tests in GKE too. I didn't want to touch the actual prerelease branch.

@@ -241,6 +241,10 @@ common_envars: &common_envars
POSTGRESQL_VERSION: << pipeline.parameters.POSTGRESQL_VERSION >>
RUST_VERSION: << pipeline.parameters.RUST_VERSION >>

+common_gke_envars: &common_gke_envars
Collaborator Author:

Added common environment vars for GKE in a single place. Before, USE_MULTICLUSTER_OIDC_ENV was repeated.

@@ -364,7 +368,6 @@ install_additional_cluster: &install_additional_cluster
kubectl --context kind-kubeapps-ci-additional --kubeconfig ${HOME}/.kube/kind-config-kubeapps-ci-additional apply --kubeconfig=${HOME}/.kube/kind-config-kubeapps-ci-additional -f ./docs/user/manifests/kubeapps-local-dev-users-rbac.yaml &&
kubectl --context kind-kubeapps-ci-additional --kubeconfig ${HOME}/.kube/kind-config-kubeapps-ci-additional apply --kubeconfig=${HOME}/.kube/kind-config-kubeapps-ci-additional -f ./docs/user/manifests/kubeapps-local-dev-namespace-discovery-rbac.yaml &&

-kubectl --context kind-kubeapps-ci-additional --kubeconfig ${HOME}/.kube/kind-config-kubeapps-ci-additional delete rolebinding kubeapps-user -n kubeapps-user-namespace &&
Collaborator Author:

After adding the new tests, this rolebinding, which is created in the file kubeapps-local-dev-users-rbac.yaml, was needed, so it is no longer deleted here.

# that only applies to Kind clusters
name: Apply customizations to GKE cluster
command: |
kubectl create namespace kubeapps-user-namespace
Collaborator Author:

This applies only to GKE. Needed now that we are making use of new rolebindings (see script/e2e-test.sh).

Comment on lines +9 to +16
console.log("Setting up Playwright tests");
console.log(`>> Global timeout: ${config.globalTimeout / timeDisplayFactor} mins`);
config.projects.forEach(project => {
console.log(
`>> Project ${project.name} test timeout: ${project.timeout / timeDisplayFactor} mins`,
);
});
console.log(`>> Deployments timeout: ${utils.getDeploymentTimeout() / timeDisplayFactor} mins`);
Collaborator Author:

Adds some logging about different timeouts at the beginning of Playwright execution.

@@ -21,7 +21,7 @@ spec:
- tail
- -f
- /dev/null
-      image: kubeapps/integration-tests-pw:v1.0.0
+      image: kubeapps/integration-tests-pw:v1.0.1
Collaborator Author:

Manually pushed.

Comment on lines +14 to +17
// Change to user's namespace using UI
await page.click(".kubeapps-dropdown .kubeapps-nav-link");
await page.selectOption('select[name="namespaces"]', "default");
await page.click('cds-button:has-text("Change Context")');
Collaborator Author:

Due to the different RBAC in GKE, the test was running in a different namespace.

Contributor:

Ah - because token authentication is being used in GKE, it would have been defaulting to the namespace of the token - nice catch.

Comment on lines +140 to +145
await expect(page.locator(".left-menu")).toContainText("Up to date", { timeout: deployTimeout });
await page.waitForTimeout(5000);
await page.click('a.nav-link:has-text("Applications")');
await page.waitForTimeout(3000); // Sometimes typing was too fast to get the result shown
await page.locator("input#search").fill(appName);
await page.waitForTimeout(3000);
Collaborator Author:

Perhaps in the future we can fine-tune these timeouts, or even remove them completely.

Contributor:

Yeah - I worry about adding timeouts like this unless absolutely necessary. As above, it's not clear to me why we can't instead wait for the selector below to be available before clicking. There's 11 seconds of waiting in the last 5 lines.

Collaborator Author:

I don't like those timeouts either; they came later in the process out of necessity. Without them, sequential execution of e.g.

  await page.click('a.nav-link:has-text("Applications")');
  await page.locator("input#search").fill(appName);

sometimes would not produce any search result (and the app was definitely there).
The same happened with other combinations of instructions where, I would say, the frontend was unable to keep up with the speed of the test execution.
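For reference, the selector-based alternative the reviewer suggests could look roughly like this (a sketch only; the ".card-title" results selector is an assumption about the dashboard markup, not verified against it):

  // Hypothetical sketch: wait for rendered elements instead of fixed timeouts
  await page.click('a.nav-link:has-text("Applications")');
  const search = page.locator("input#search");
  await search.waitFor(); // ensure the search box exists before typing
  await search.fill(appName);
  // Block until a result matching the app name actually shows up
  await page.waitForSelector(`.card-title:has-text("${appName}")`, {
    timeout: utils.getDeploymentTimeout(),
  });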


// Trigger deploy
await page.locator('cds-button:has-text("Deploy")').click();

// Check deployment
await page.waitForSelector("css=.application-status-pie-chart-number >> text=1", {
-  timeout: 60000,
+  timeout: utils.getDeploymentTimeout(),
Collaborator Author:

Every waitForSelector for a deployment now uses the same unified timeout.

Comment on lines +19 to +20
DEX_IP=${8:-"172.18.0.2"}
ADDITIONAL_CLUSTER_IP=${9:-"172.18.0.3"}
Collaborator Author:

Intentionally left these two as the last params: they don't exist when running in GKE. Otherwise the param indexes would be wrong.

Comment on lines +194 to +196
--set frontend.replicaCount=1
--set kubeops.replicaCount=1
--set dashboard.replicaCount=1
Collaborator Author:

Running tests with 1 replica here turned out to work fine.

Contributor:

Hmm, it might be worth checking the commit msg there? I remember some environment we found during release was returning 502s, but it may have been for the kubeappsapis service only, which you've left with 2 below.

Collaborator Author:

Indeed, previously the only service with 2 replicas was kubeappsapis.

Comment on lines +393 to +394
kubectl create rolebinding kubeapps-view-user-apprepo-read -n kubeapps-user-namespace --clusterrole=kubeapps:kubeapps:apprepositories-read --serviceaccount kubeapps:kubeapps-view
kubectl create rolebinding kubeapps-view-user -n kubeapps-user-namespace --clusterrole=edit --serviceaccount kubeapps:kubeapps-view
Collaborator Author:

This has the same effect as the RBAC YAML file used for OIDC.

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>
getRandomName: base => {
  const randomNumber = Math.floor(Math.random() * Math.floor(100000));
  return base + "-" + randomNumber;
},

getDeploymentTimeout: () => {
  // TODO(castelblanque) Fine tune this for cases of tests with two deployments or more
  return (process.env.TEST_TIMEOUT ? parseInt(process.env.TEST_TIMEOUT) / 2 : 2) * 60 * 1000;
},
Collaborator Author:

The wait time for deployments to be ready defaults to half of each test's timeout. In some tests we wait for two deployments, so we might run over the total time.
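As a worked example (assuming TEST_TIMEOUT=6, the value set for GKE above):

  // TEST_TIMEOUT=6   -> 6 / 2 = 3 minutes -> 3 * 60 * 1000 = 180000 ms per deployment wait
  // TEST_TIMEOUT unset -> falls back to 2 minutes = 120000 ms
  const deploymentTimeout = (6 / 2) * 60 * 1000; // 180000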

@castelblanque changed the title from "E2e tests" to "E2e tests fixes and improvements" on Feb 7, 2022
@absoludity (Contributor) left a comment:

Thanks Rafa - glad to see more consistency with the GKE tests.

This doesn't address the issue we mentioned on the standup yesterday of the screenshots not appearing though right? (as in, it's still worth my time to investigate that separately?)


@@ -241,6 +241,10 @@ common_envars: &common_envars
POSTGRESQL_VERSION: << pipeline.parameters.POSTGRESQL_VERSION >>
RUST_VERSION: << pipeline.parameters.RUST_VERSION >>

+common_gke_envars: &common_gke_envars
+  USE_MULTICLUSTER_OIDC_ENV: "false"
+  TEST_TIMEOUT: 6 # Timeout minutes for each test
Contributor:

Given that most times are in milliseconds, can we rename this to TEST_TIMEOUT_MINUTES rather than relying on the comment here? Ah - I see, you're keeping it consistent with the existing CI_TIMEOUT. Up to you whether you leave them as they are or update both with the _MINUTES suffix, but yes, better not to do one and not the other.

GKE_REGULAR_VERSION_LATEST_RELEASE:
<<: *gke_test
environment:
GKE_BRANCH: << pipeline.parameters.GKE_REGULAR_VERSION >>
TEST_LATEST_RELEASE: 1
-      USE_MULTICLUSTER_OIDC_ENV: "false"
-      <<: *common_envars
+      <<: [*common_envars, *common_gke_envars]
Contributor:

Much nicer, including the oidc env in the common ones :)


Comment on lines +42 to +43
await page.fill("input#kubeapps-docker-cred-username", process.env.DOCKER_USERNAME);
await page.fill("input#kubeapps-docker-cred-password", process.env.DOCKER_PASSWORD);
Contributor:

So I'm still confused why we are adding real credentials here, given that the image itself is public (the chartmuseum repo is private, but the chart in the repo refers to public images), so no image pull secret should be necessary (as a user, I wouldn't create one in this situation right?). If it is simply because we want to verify creating the image pull secret, even if not used, and can no longer send dummy credentials to fetch a public image, then let's be clear about that here with a comment, something like:

// We want to verify the use of adding an image pull secret here even though the images for the chart
// are public. We're setting the actual credentials as the docker registry server rejects a request to pull
// a public image with invalid credentials, returning a 403.

so it doesn't take us time to unravel in a year why we were sending creds in the first place?

Collaborator Author:

I haven't changed the essence of this test. Docker credentials were being added before in the same way, except that they were fake credentials. Does it make sense to instruct the retrieval of public images with credentials? Maybe we were relying on some Docker URL feature that changes over time or is inconsistent?

As mentioned here, that test was failing in both Kind and GKE because Docker was unable to pull the image docker.io/bitnami/apache:2.4.48-debian-10-r74 (and r75), with a 401 Unauthorized. That error looks right to me, as we were sending non-existent credentials. In Kind I worked around it by pre-pushing the apache images to Kind (only service images were being pushed before). But when coming to GKE, I preferred to use actual credentials rather than shortcuts, given that the credentials were already set up in CI... and that we really wanted to test the application of Docker credentials, I assume. Actually, we might be able to remove the pre-pushing used for Kind; I haven't tried.

Why wasn't it failing before? I'm not sure failures were being correctly detected with Puppeteer (remember the "retry" pilgrimage? :D). Nor am I sure whether credentials were sometimes being ignored/lost when pulling the images, maybe due to https://index.docker.io/v1/ inconsistently ignoring them when a public image is requested.
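Combining the reviewer's suggested comment with the existing code, the test could carry the rationale inline (a sketch; DOCKER_USERNAME/DOCKER_PASSWORD are the env vars already used above):

  // We want to verify adding an image pull secret even though the chart's images are
  // public: the registry rejects pulls that present *invalid* credentials (401/403),
  // so the real credentials already configured in CI are used instead of fake ones.
  await page.fill("input#kubeapps-docker-cred-username", process.env.DOCKER_USERNAME);
  await page.fill("input#kubeapps-docker-cred-password", process.env.DOCKER_PASSWORD);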

await page.click('a.nav-link:has-text("Applications")');
await page.waitForTimeout(3000); // Sometimes typing was too fast to get the result shown
await page.locator("input#search").fill(appName);
await page.waitForTimeout(3000);
Contributor:

That's an extra 6 seconds added to the test here. Could we instead just wait until the selector below is available to click on?


@@ -73,7 +75,8 @@ test.describe("Limited user simple deployments", () => {

// Search for package deployed
await page.click('a.nav-link:has-text("Applications")');
await page.locator("input#search").type("alertmanager");
await page.waitForTimeout(3000); // Sometimes typing was too fast to get the result shown
await page.locator("input#search").fill("alertmanager");
await page.waitForTimeout(3000);
Contributor:

Same here with the timeouts rather than waiting for the selector.


Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>
@castelblanque merged commit b584702 into main on Feb 8, 2022
@castelblanque deleted the e2e-tests branch on February 8, 2022 10:23