Improve the visibility into individual e2e test case failures #68
Conversation
Holding for CI feedback in #66. /hold

These changes only affect the dev testing suite. Adding the required no-ff labels. /label px-approved
tylerslaton left a comment:
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: timflannagan, tylerslaton. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing /approve in a comment.

Removing the hold now that #66 had e2e runs that expressed the desired behavior. /hold cancel
```go
	return err
}

cmd := exec.Command("/bin/bash", "-c", "./collect-ci-artifacts.sh")
```
Follow-up: avoid hardcoding the path to the script location.
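For reference, a minimal sketch of how that follow-up could look, resolving the script next to the test source file instead of relying on the working directory; the helper names here are illustrative, not part of this PR:

```go
package e2e

import (
	"os/exec"
	"path/filepath"
	"runtime"
)

// collectScriptPath is a hypothetical helper: it locates collect-ci-artifacts.sh
// relative to this source file (via runtime.Caller) so the exec.Command call no
// longer depends on the process working directory.
func collectScriptPath() string {
	_, thisFile, _, _ := runtime.Caller(0)
	return filepath.Join(filepath.Dir(thisFile), "collect-ci-artifacts.sh")
}

// runCollectScript shells out to the artifact-collection script using the
// resolved path rather than a hardcoded relative one.
func runCollectScript() error {
	return exec.Command("/bin/bash", collectScriptPath()).Run()
}
```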
```go
// current test case failed. attempt to collect CI artifacts if the
// $ARTIFACT_DIR environment variable has been set. This variable is
// always present in downstream CI environments.
artifactDir := os.Getenv("ARTIFACT_DIR")
```
Potential follow-up: There's overlap between how we handle the $ARTIFACT_DIR environment variable in the Makefile invocation vs. how we're handling the value of that shared variable here. It could be worth consolidating this logic into a shared CLI flag, but it's unclear whether that's the correct implementation going forward.
The main problem with this approach is the potential for skew between how ginkgo handles relative paths vs. how we handle relative paths within this internal testing code. The expectation is that downstream CI sets $ARTIFACT_DIR to an absolute path, so we're in the clear there, but there's no guarantee that we're correctly configuring (or enforcing) an absolute path when running the e2e suite locally during dev workflows.
I don't think anything I mentioned is blocking, but it's still something worth noting and potentially revisiting in future phase implementations.
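As a rough illustration of the absolute-path concern, here's a sketch under the assumption that normalization happens on the Go side; this is not code from this PR:

```go
package e2e

import (
	"os"
	"path/filepath"
)

// resolveArtifactDir is a hypothetical helper: if $ARTIFACT_DIR was set to a
// relative path (e.g. during a local dev run), resolve it against the current
// working directory so the Go code and the collection script agree on a single
// location.
func resolveArtifactDir() (string, error) {
	artifactDir := os.Getenv("ARTIFACT_DIR")
	if artifactDir == "" || filepath.IsAbs(artifactDir) {
		return artifactDir, nil
	}
	return filepath.Abs(artifactDir)
}
```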
| # "oc" binary is located at "/cli/oc" path. This is problematic as the /cli directory | ||
| # doesn't exist in the $PATH environment variable, which causes issues when running | ||
| # this script via the exec.Command Golang function. | ||
| if [[ "$OPENSHIFT_CI" == "true" ]]; then |
Note: We could also perform a -d check on whether the /cli directory exists. The OPENSHIFT_CI variable will always be set when running the e2e suite in downstream CI environments, so it feels like we can make a really solid assumption on where the oc binary will live going forward.
This isn't completely future-proof in case the directory location changes, but we should be fine given that we stat the configured kubectl binary immediately after performing this conditional check. In the future, updating this script to also validate that the directory exists could produce a more robust implementation.
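A hedged sketch of that more defensive variant; the PATH handling below is an assumption about the script's shape, not a quote from it:

```bash
#!/usr/bin/env bash
# Only trust the downstream CI binary location when both signals agree:
# the OPENSHIFT_CI flag is set and the /cli directory actually exists.
if [[ "${OPENSHIFT_CI:-}" == "true" && -d /cli ]]; then
    export PATH="/cli:${PATH}"
fi
```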
@timflannagan: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
```bash
set -o nounset
set -o errexit

: "${KUBECONFIG:?}"
```
Do we need/use the KUBECONFIG variable?
Nope, we're not explicitly using that variable here, but it's a nice sanity check: it verifies the variable has been set and that we'll be able to use a kubectl/oc binary further into the control flow.
Nit: comment to that effect then? Otherwise someone might decide to delete it in the future.
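For example, the check could carry its own explanation (wording illustrative):

```bash
#!/usr/bin/env bash
# KUBECONFIG isn't referenced directly in this script, but failing fast here
# guarantees the kubectl/oc calls further down can reach the cluster.
: "${KUBECONFIG:?KUBECONFIG must be set so kubectl/oc can reach the cluster}"
```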
Any idea as to what's going on in the CI failure here? I took a look and it seems like things failed pretty quickly, so I'm not sure whether we know it's a flake or not.
There are some flakes in the e2e testing suite, which is why I wanted to introduce these changes. If you dive into the test case directory that failed, you can see the relevant YAML output of the resources that get reconciled. I had a hunch that our consistent usage of cert-manager throughout the e2e suite was problematic: we were constantly installing those contents, performing a cascading deletion of them, and then installing the same contents again without giving that deletion any buffer to complete. That hunch looks validated by the following status on a failed BundleDeployment resource:

```yaml
- lastTransitionTime: "2022-09-28T03:37:23Z"
  message: 'rendered manifests contain a resource that already exists. Unable
    to continue with install: Namespace "openshift-cert-manager-operator" in namespace
    "" exists and cannot be imported into the current release: invalid ownership
    metadata; annotation validation error: key "meta.helm.sh/release-name" must
    equal "cert-managerzv79l": current value is "cert-managerlwcq9"'
  reason: InstallFailed
  status: "False"
  type: Installed
```

Not too sure whether this is a rukpak bug, or whether we need to add some buffer to ensure we're not re-installing the same contents that are being deleted. The potential problem with the latter is that we're hiding controller failures.
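To make the "buffer" idea concrete, here's a minimal sketch of how the suite could wait for the previous install's namespace to finish its cascading deletion before re-installing. This is an assumption about one possible approach, reusing the gomega and controller-runtime client setup the e2e suite already has, not code from this PR:

```go
package e2e

import (
	"context"
	"time"

	. "github.com/onsi/gomega"
	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// waitForNamespaceGone is a hypothetical helper: it polls until the namespace
// left behind by the previous install is fully deleted, so the next install
// never races the cascading deletion and trips Helm's ownership validation.
func waitForNamespaceGone(ctx context.Context, c client.Client, name string) {
	Eventually(func() bool {
		err := c.Get(ctx, client.ObjectKey{Name: name}, &corev1.Namespace{})
		return apierrors.IsNotFound(err)
	}, 2*time.Minute, 5*time.Second).Should(BeTrue())
}
```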
/lgtm
Related to #66, which has the debug information present.