@timflannagan
Contributor

Related to #66, which has the debug information present.

Signed-off-by: timflannagan <timflannagan@gmail.com>
openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Sep 27, 2022
@timflannagan
Contributor Author

Holding for CI feedback in #66.

/hold

openshift-ci bot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Sep 27, 2022
@timflannagan
Contributor Author

These changes only affect the dev testing suite. Adding the required no-ff labels.

/label px-approved
/label docs-approved
/label qe-approved

openshift-ci bot added the px-approved (Product Support has signed off on this PR), docs-approved (Docs has signed off on this PR), and qe-approved (QE has signed off on this PR) labels on Sep 27, 2022
Contributor

@tylerslaton tylerslaton left a comment

/lgtm

openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) on Sep 27, 2022
@openshift-ci
Contributor

openshift-ci bot commented Sep 27, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: timflannagan, tylerslaton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot removed the lgtm label (indicates that a PR is ready to be merged) on Sep 27, 2022
@timflannagan
Contributor Author

Removing the hold now that #66 has had e2e runs that show the desired behavior.

/hold cancel

openshift-ci bot removed the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Sep 27, 2022
environments

Signed-off-by: timflannagan <timflannagan@gmail.com>
timflannagan force-pushed the e2e/gather-testing-artifacts-when-failed branch from ae08d58 to 4c7276b on September 28, 2022 02:40
return err
}

cmd := exec.Command("/bin/bash", "-c", "./collect-ci-artifacts.sh")
Contributor Author

Follow-up: avoid hardcoding the path to the script location.
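
One way that follow-up could look, as a minimal sketch rather than what this PR does: accept an optional override for the script location and fall back to a path joined onto a known base directory. The `COLLECT_ARTIFACTS_SCRIPT` variable and both helper functions are hypothetical names introduced only for this illustration.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// Hypothetical names, not part of the PR: the override variable and the
// helpers below exist only for this sketch.
const (
	collectScriptEnv  = "COLLECT_ARTIFACTS_SCRIPT"
	collectScriptName = "collect-ci-artifacts.sh"
)

// resolveCollectScript returns the explicit override when one is given, and
// otherwise joins the script name onto baseDir.
func resolveCollectScript(override, baseDir string) string {
	if override != "" {
		return filepath.Clean(override)
	}
	return filepath.Join(baseDir, collectScriptName)
}

// newCollectCommand builds the command without running it. The script path is
// passed to bash as an argument instead of through "-c", so a path containing
// spaces is not re-split by the shell.
func newCollectCommand(scriptPath string) *exec.Cmd {
	return exec.Command("/bin/bash", scriptPath)
}

func main() {
	wd, err := os.Getwd()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine working directory:", err)
		os.Exit(1)
	}
	script := resolveCollectScript(os.Getenv(collectScriptEnv), wd)
	cmd := newCollectCommand(script)
	fmt.Println(cmd.Args)
}
```

With no override set, the default still resolves relative to the working directory, so behavior would match the hardcoded `./collect-ci-artifacts.sh` while leaving room to point at another location.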

// current test case failed. attempt to collect CI artifacts if the
// $ARTIFACT_DIR environment variable has been set. This variable is
// always present in downstream CI environments.
artifactDir := os.Getenv("ARTIFACT_DIR")
Contributor Author

Potential follow-up: the Makefile invocation and this code both handle the $ARTIFACT_DIR environment variable, and the two overlap. It could be worth consolidating that logic into a shared CLI flag, but it's unclear whether that's the right implementation going forward.

The main risk with the current approach is skew between how ginkgo resolves relative paths and how this internal testing code resolves them. Downstream CI is expected to set $ARTIFACT_DIR to an absolute path, so we're in the clear there, but nothing guarantees (or enforces) an absolute path when the e2e suite is run locally during dev workflows.

None of this is blocking, but it's worth noting and potentially revisiting in a future phase.
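
To make the relative-path concern concrete, here is a rough sketch of normalizing the value once, up front, so every consumer sees the same absolute path. `resolveArtifactDir` is a hypothetical helper, and resolving against the current working directory is an assumption; the real suite might want a different base.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveArtifactDir is a hypothetical helper for this sketch. It reports
// ok=false when raw is empty (artifact collection should be skipped), returns
// absolute paths cleaned but otherwise unchanged, and resolves relative paths
// against baseDir.
func resolveArtifactDir(raw, baseDir string) (dir string, ok bool) {
	if raw == "" {
		return "", false
	}
	if filepath.IsAbs(raw) {
		return filepath.Clean(raw), true
	}
	return filepath.Join(baseDir, raw), true
}

func main() {
	wd, err := os.Getwd()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine working directory:", err)
		os.Exit(1)
	}
	dir, ok := resolveArtifactDir(os.Getenv("ARTIFACT_DIR"), wd)
	if !ok {
		fmt.Println("ARTIFACT_DIR is not set; skipping artifact collection")
		return
	}
	fmt.Println("collecting artifacts into", dir)
}
```

Whether the base should be the working directory or the repository root is exactly the kind of skew described above, so this only shows the shape of the normalization.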

# "oc" binary is located at "/cli/oc" path. This is problematic as the /cli directory
# doesn't exist in the $PATH environment variable, which causes issues when running
# this script via the exec.Command Golang function.
if [[ "$OPENSHIFT_CI" == "true" ]]; then
Contributor Author

Note: We could also perform a -d check on whether the /cli directory exists. The OPENSHIFT_CI variable is always set when running the e2e suite in downstream CI environments, so it feels like a solid assumption about where the oc binary will live going forward.

This isn't completely future-proof if the directory location changes, but we should be fine given that we stat the configured kubectl binary immediately after this conditional check. In the future, adding a check to this script that validates the directory exists could make the implementation more robust.
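
For illustration only, the same directory check could be done on the Go side before the script is launched, instead of in the script itself. This is a different placement from what the comment describes, and `prependIfDir` is a hypothetical helper; the sketch assumes a Unix-style PATH.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// prependIfDir is a hypothetical helper for this sketch. It returns pathEnv
// with dir prepended, but only when dir exists, is a directory, and is not
// already listed. Otherwise pathEnv is returned unchanged.
func prependIfDir(pathEnv, dir string) string {
	info, err := os.Stat(dir)
	if err != nil || !info.IsDir() {
		return pathEnv
	}
	for _, entry := range filepath.SplitList(pathEnv) {
		if entry == dir {
			return pathEnv
		}
	}
	if pathEnv == "" {
		return dir
	}
	return dir + string(os.PathListSeparator) + pathEnv
}

func main() {
	pathEnv := os.Getenv("PATH")
	// Mirror the script's conditional: only adjust PATH in downstream CI.
	if os.Getenv("OPENSHIFT_CI") == "true" {
		pathEnv = prependIfDir(pathEnv, "/cli")
	}
	fmt.Println("PATH for the collection script:", pathEnv)
}
```

The resulting value could then be set on the command's environment, which keeps the existence check and the PATH change in one place.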

@openshift-ci
Contributor

openshift-ci bot commented Sep 28, 2022

@timflannagan: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-techpreview-operator | 4c7276b | link | false | /test e2e-aws-techpreview-operator |

set -o nounset
set -o errexit

: "${KUBECONFIG:?}"
Member

Do we need/use the KUBECONFIG variable?

Contributor Author

Nope, we don't use that variable explicitly here, but it's a useful sanity check: it verifies that the variable is set and that we'll be able to use a kubectl/oc binary later in the control flow.

Member

Nit: comment to that effect then? Otherwise someone might decide to delete it in the future.

@tylerslaton
Contributor

Any idea what's going on with the CI failure here? I took a look and it seems like things failed pretty quickly, so I'm not sure whether it's a flake or not.

@timflannagan
Contributor Author

There are some flakes in the e2e testing suite, which is why I wanted to introduce these changes. If you dive into the directory for the test case that failed, you can see the relevant YAML output of the resources that get reconciled.

I had a hunch that our consistent usage of cert-manager throughout the e2e suite was problematic: we constantly install those contents, perform a cascading deletion of them, and then install the same contents again without leaving any buffer for the deletion to complete. The following status of a failed BundleDeployment resource appears to validate that hunch:

    - lastTransitionTime: "2022-09-28T03:37:23Z"
      message: 'rendered manifests contain a resource that already exists. Unable
        to continue with install: Namespace "openshift-cert-manager-operator" in namespace
        "" exists and cannot be imported into the current release: invalid ownership
        metadata; annotation validation error: key "meta.helm.sh/release-name" must
        equal "cert-managerzv79l": current value is "cert-managerlwcq9"'
      reason: InstallFailed
      status: "False"
      type: Installed

I'm not sure whether this is a rukpak bug or whether we need to add some buffer to ensure we're not re-installing the same contents while they're still being deleted. The potential problem with the latter is that we'd be hiding controller failures.
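
If the buffer option were chosen, one shape it could take is a poll that blocks until the previous install is fully gone before the next one starts. This is a sketch under assumptions, not a claim about where the fix belongs: `waitUntilGone`, `existsFunc`, and `simulateDeletion` are hypothetical names, and in a real suite the callback would presumably wrap a client lookup and treat a not-found result as "gone".

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// existsFunc reports whether the resource is still present. Hypothetical: a
// real test would wrap a Kubernetes client call here.
type existsFunc func(ctx context.Context) (bool, error)

// waitUntilGone polls exists every interval until it reports false, returns
// an error, or ctx is done. interval must be positive.
func waitUntilGone(ctx context.Context, interval time.Duration, exists existsFunc) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		present, err := exists(ctx)
		if err != nil {
			return err
		}
		if !present {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("resource still present: %w", ctx.Err())
		case <-ticker.C:
		}
	}
}

// simulateDeletion stands in for a namespace that disappears on the Nth
// check. It returns how many checks were made and any error from the wait.
func simulateDeletion(checksUntilGone int) (int, error) {
	calls := 0
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	err := waitUntilGone(ctx, 5*time.Millisecond, func(context.Context) (bool, error) {
		calls++
		return calls < checksUntilGone, nil
	})
	return calls, err
}

func main() {
	calls, err := simulateDeletion(3)
	fmt.Println("checks:", calls, "error:", err)
}
```

Bounding the wait with a context deadline means a deletion that never completes still fails the test instead of hanging, which limits how much a buffer like this can hide a controller failure.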

@tylerslaton
Contributor

/lgtm

openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) on Sep 28, 2022
@openshift-merge-robot openshift-merge-robot merged commit 4feb541 into openshift:main Sep 28, 2022
@timflannagan timflannagan deleted the e2e/gather-testing-artifacts-when-failed branch September 28, 2022 13:27