Bug 1865998: Tolerate multiple package manifests with the same name #6225

Merged

Conversation

spadgett
Member

@spadgett spadgett commented Aug 5, 2020

Generate a unique key for package manifests in our k8s reducer when name and namespace aren't unique.
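
For illustration, here is a minimal sketch of that idea in TypeScript, assuming the catalog source fields on the PackageManifest status are what distinguish manifests that share a name and namespace; the interface and helper names are illustrative, not the PR's actual reducer code.

```typescript
// Hypothetical sketch only -- not the PR's actual reducer code.
interface PackageManifestLike {
  metadata: { name: string; namespace?: string };
  status?: { catalogSource?: string; catalogSourceNamespace?: string };
}

// Build a reducer key that stays unique even when two package manifests
// share the same name and namespace, by folding in the catalog source.
const packageManifestKey = (obj: PackageManifestLike): string =>
  [
    obj.metadata.namespace,
    obj.metadata.name,
    obj.status?.catalogSource,
    obj.status?.catalogSourceNamespace,
  ]
    .filter(Boolean)
    .join('~');
```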

@openshift-ci-robot openshift-ci-robot added the bugzilla/severity-urgent Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting. label Aug 5, 2020
@openshift-ci-robot
Contributor

@spadgett: This pull request references Bugzilla bug 1865998, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST)

In response to this:

Bug 1865998: Tolerate multiple package manifests with the same name

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Aug 5, 2020
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 5, 2020
@spadgett
Member Author

spadgett commented Aug 5, 2020

Note this is only a partial fix. Links to package manifest details pages will still be ambiguous since we rely on name + namespace to be unique in the URL.

@spadgett
Member Author

spadgett commented Aug 5, 2020

/cherry-pick release-4.5

@openshift-cherrypick-robot

@spadgett: once the present PR merges, I will cherry-pick it on top of release-4.5 in a new PR and assign it to you.

In response to this:

/cherry-pick release-4.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@spadgett
Member Author

spadgett commented Aug 5, 2020

/assign @TheRealJon @andrewballantyne

This change partially works around upstream OLM bug:

https://bugzilla.redhat.com/show_bug.cgi?id=1814822

Generate a unique key for package manifests in our k8s reducer when name and namespace aren't unique.
@spadgett
Member Author

spadgett commented Aug 5, 2020

/retest

@andrewballantyne
Contributor

/lgtm

If the tests pass I think we are good to go. It appears to have addressed the issue.

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 5, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andrewballantyne, spadgett

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@spadgett
Member Author

spadgett commented Aug 5, 2020

It got past the original failure but failed a later test. I'll dig into this once the artifacts are captured. It might be a different flake.

80) Interacting with an `AllNamespaces` install mode Operator (Jaeger)
   ✔ displays subscription creation form for selected Operator
   ✔ selects all namespaces for Operator subscription
   ✔ displays Operator as subscribed in OperatorHub
   ✔ displays Operator in "Cluster Service Versions" view for "test-hjnyy" namespace
   ✔ creates Operator `Deployment`
   ✔ displays metadata about Operator in the "Overview" section
   ✔ displays empty message in the "Jaeger" section
   ✔ displays form editor for creating a new `Jaeger` instance
   ✔ displays new `Jaeger` that was created from YAML editor
   ✔ displays metadata about the created `Jaeger` in its "Overview" section
A Jasmine spec timed out. Resetting the WebDriver Control Flow.
A Jasmine spec timed out. Resetting the WebDriver Control Flow.
A Jasmine spec timed out. Resetting the WebDriver Control Flow.
   ✖ displays the raw YAML for the `Jaeger` (3 failures)

@spadgett
Member Author

spadgett commented Aug 5, 2020

/retest

@andrewballantyne
Contributor

andrewballantyne commented Aug 5, 2020

/retest

Hoping it's an unrelated flake?

@spadgett
Member Author

spadgett commented Aug 5, 2020

I think it's an unrelated flake. I'm testing locally now.

@spadgett
Member Author

spadgett commented Aug 5, 2020

It works locally for me (both testing manually and running the protractor tests). I have a suspicion the resource updated in the background while the test was running, which prevented the test from saving the YAML. But I'm not sure.

@andrewballantyne
Contributor

It works locally for me (both testing manually and running the protractor tests). I have a suspicion the resource updated in the background while the test was running, which prevented the test from saving the YAML. But I'm not sure.

I'm not an expert on this, but I thought you've referenced screenshots before when tests fail? Like the page as-is when the test failed.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@spadgett
Member Author

spadgett commented Aug 5, 2020

I'm not an expert on this, but I thought you've referenced screenshots before when tests fail? Like the page as-is when the test failed.

Yeah, screenshot is here:

https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_console/6225/pull-ci-openshift-console-master-e2e-gcp-console/1291048652772478976/artifacts/e2e-gcp-console/gui_test_screenshots/bc627daf4d44611dd42152266e071756.png

It's under Artifacts -> e2e-gcp-console -> gui_test_screenshots from the test details page.

@openshift-merge-robot openshift-merge-robot merged commit cd87130 into openshift:master Aug 5, 2020
@openshift-ci-robot
Contributor

@spadgett: All pull requests linked via external trackers have merged: openshift/console#6225. Bugzilla bug 1865998 has been moved to the MODIFIED state.

In response to this:

Bug 1865998: Tolerate multiple package manifests with the same name

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot

@spadgett: new pull request created: #6237

In response to this:

/cherry-pick release-4.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@spadgett spadgett deleted the package-manifest-qn branch August 5, 2020 20:42
@spadgett
Member Author

spadgett commented Aug 6, 2020

Digging into the YAML flake more, I've confirmed the resource was updated in the background:

[SEVERE] https://console-openshift-console.apps.ci-op-tmbri0v5-75d12.origin-ci-int-gce.dev.openshift.com/api/kubernetes/apis/jaegertracing.io/v1/namespaces/test-hjnyy/jaegers/jaeger-all-in-one-inmemory - Failed to load resource: the server responded with a status of 409 (Conflict)

I'm tempted to remove the lines that save the YAML editor in the OLM tests. That's already well-covered by the CRUD tests and doesn't seem specific to OLM.

https://github.com/openshift/console/blob/master/frontend/packages/operator-lifecycle-manager/integration-tests/scenarios/global-installmode.scenario.ts#L201-L206

I'm surprised there isn't an error message in the screenshot, though.
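
As background on why that save fails, Kubernetes uses optimistic concurrency: a PUT that carries a stale metadata.resourceVersion is rejected with 409 Conflict, which matches the log line above. A rough sketch of the sequence, with a hypothetical API endpoint and token (not the console's actual client code):

```typescript
// Rough sketch with a hypothetical endpoint and token -- not the console's client code.
const base = 'https://api.cluster.example.com:6443'; // hypothetical
const path =
  '/apis/jaegertracing.io/v1/namespaces/test-hjnyy/jaegers/jaeger-all-in-one-inmemory';

async function saveStaleYaml(token: string): Promise<void> {
  // 1. Read the object; its metadata.resourceVersion reflects the state at read time.
  const current = await fetch(base + path, {
    headers: { Authorization: `Bearer ${token}` },
  }).then((res) => res.json());

  // 2. Suppose the operator (or another browser tab) updates the object here,
  //    bumping resourceVersion on the server.

  // 3. A PUT that still carries the old resourceVersion is rejected.
  const res = await fetch(base + path, {
    method: 'PUT',
    headers: {
      Authorization: `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(current), // stale resourceVersion in metadata
  });
  console.log(res.status); // 409 when the object changed in between
}
```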

@andrewballantyne
Contributor

Digging into the YAML flake more, I've confirmed the resource was updated in the background:

[SEVERE] https://console-openshift-console.apps.ci-op-tmbri0v5-75d12.origin-ci-int-gce.dev.openshift.com/api/kubernetes/apis/jaegertracing.io/v1/namespaces/test-hjnyy/jaegers/jaeger-all-in-one-inmemory - Failed to load resource: the server responded with a status of 409 (Conflict)

I'm tempted to remove the lines that save the YAML editor in the OLM tests. That's already well-covered by the CRUD tests and doesn't seem specific to OLM.

Sounds like overlap that adds to the flakes. I don't think we need tests for this YAML save ... not sure it adds any quality to our tests.

https://github.com/openshift/console/blob/master/frontend/packages/operator-lifecycle-manager/integration-tests/scenarios/global-installmode.scenario.ts#L201-L206

I'm surprised there isn't an error message in the screenshot, though.

I was too - figured maybe the error got replaced with the info or something haha.

@spadgett
Member Author

spadgett commented Aug 6, 2020

I was too - figured maybe the error got replaced with the info or something haha.

That's exactly it! Here's how to reproduce manually:

  1. Start editing resource YAML in browser tab 1.
  2. Commit some changes to the same resource in tab 2.
  3. Click save in tab 1. (You get a conflict error.)
  4. Commit some other changes in tab 2.

The error goes away in tab 1. I'm almost certain that's what happened in the tests, in which case we just got unlucky and things are working as expected. I'm not sure there's a good way to fix this other than removing the save from the test.

Arguably it's a bug that we clear the error on background updates, though.

@spadgett
Member Author

spadgett commented Aug 6, 2020

I guess an alternate fix is to always click Reload before Save if we want to keep the test.
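
If we kept the test, the sequence could look roughly like this in Protractor; the data-test selectors below are hypothetical stand-ins, not the console's actual test IDs:

```typescript
import { $, browser, ExpectedConditions as until } from 'protractor';

// Hedged sketch: selectors are hypothetical, not the console's actual test IDs.
export const reloadThenSave = async (): Promise<void> => {
  const reloadButton = $('[data-test="reload-object"]');
  const saveButton = $('[data-test="save-changes"]');

  // Reload pulls in the latest resourceVersion so the subsequent save is less
  // likely to hit a 409 Conflict from a background update.
  await browser.wait(until.elementToBeClickable(reloadButton), 10000);
  await reloadButton.click();

  await browser.wait(until.elementToBeClickable(saveButton), 10000);
  await saveButton.click();
};
```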

@andrewballantyne
Contributor

I guess an alternate fix is to always click Reload before Save if we want to keep the test.

I think that will just reduce the flake count... not eliminate it. I don't think there's anything about a resource's spec that says it can't be updated back to back whenever some criteria it deems necessary are met.

@andrewballantyne
Contributor

Arguably it's a bug that we clear the error on background updates, though.

Imo, definitely a bug :) Clearing errors isn't something I think we should do until another submit is triggered. Submit errors should stick around even if the data reloads, since the user is the only one who really has control over how quickly they consume and understand that error message... so programmatically removing it feels like bad UX to me.

@spadgett
Member Author

spadgett commented Aug 6, 2020

I opened https://bugzilla.redhat.com/show_bug.cgi?id=1866875 for the error message getting cleared.

@spadgett spadgett added this to the v4.6 milestone Aug 11, 2020