Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1898745: set progressing false on imagestream events as well if no active streams #338

Conversation

gabemontero
Copy link
Contributor

@gabemontero gabemontero commented Nov 12, 2020

Seen a few runs in CI today for various repos fail where the the samples operator did not get out of progressing == true state

When examining the pods logs, some odd message after our attempt to update progressing to false after getting a configmap delete event failed:

2020-11-12T17:29:36.043718980Z W1112 17:29:36.043675      16 reflector.go:424] github.com/openshift/client-go/image/informers/externalversions/factory.go:101: watch of *v1.ImageStream ended with: an error on the server ("unable to decode an event from the watch stream: stream error: stream ID 225; INTERNAL_ERROR") has prevented the request from succeeding
2020-11-12T17:29:45.622049521Z time="2020-11-12T17:29:45Z" level=info msg="CRDERROR progressing false update"
2020-11-12T17:29:45.622089759Z time="2020-11-12T17:29:45Z" level=error msg="unable to sync: context deadline exceeded, requeuing"
2020-11-12T17:29:51.556538089Z time="2020-11-12T17:29:51Z" level=error msg="unable to sync: Timeout: request did not complete within requested timeout 34s, requeuing"

Even with returning an error, the delete event for the config map is not requeued.

We continue to get imagestream events.

This change (re)adds attempts to set progressing to false on imagestream events as well, when it is clear all the imagestream configmaps are gone. We may get a few conflicts when several imagestreams come in at the same time, but in the end that is no big deal.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 12, 2020
@gabemontero
Copy link
Contributor Author

@@ -379,5 +379,5 @@ func (h *Handler) processImportStatus(is *imagev1.ImageStream, cfg *v1.Config) (
}
}

return cfg, nonMatchDetail, err
return cfg, err
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forgot to clean this now unused return field when BUILD-125 merged

@gabemontero
Copy link
Contributor Author

e2e-aws

* could not run steps: step e2e-aws failed: failed to acquire lease: resources not found

@gabemontero
Copy link
Contributor Author

gabemontero commented Nov 12, 2020

ocp image-eco more env and lower level in the stack hoopla ... more of these Failed to get endpoints I've seen in other PRs today

sample operator pod logs and installed imagestreams were fine

@gabemontero
Copy link
Contributor Author

/retest

@gabemontero
Copy link
Contributor Author

/test e2e-aws-operator

@gabemontero
Copy link
Contributor Author

/test e2e-aws-upgrade

@gabemontero
Copy link
Contributor Author

samples operator upgrade looks good in the last 2 e2e-aws-upgrade runs ... other operators had issues

/test e2e-aws-upgrade

@gabemontero
Copy link
Contributor Author

samples still good on upgrade

/test e2e-aws-upgrade

@gabemontero
Copy link
Contributor Author

samples still good in the failing e2e-aws-upgrade test

@gabemontero gabemontero force-pushed the more-aggressive-progress-false branch 2 times, most recently from 673c854 to bf58580 Compare November 15, 2020 22:01
do not just depend on configmap events (many of which are deletes so no relist)
more proactive version update, move off migration condition to configmap/sample inspection
@gabemontero
Copy link
Contributor Author

/assign @coreydaley

Copy link
Member

@coreydaley coreydaley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few nits, but overall seems to be resolving the issue.

@@ -221,6 +221,10 @@ func (h *Handler) buildFileMaps(cfg *v1.Config, forceRebuild bool) error {
cm = &corev1.ConfigMap{}
cm.Name = util.IST2ImageMap
cm.Namespace = v1.OperatorNamespace
if cm.Annotations == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nit, but you could move this one and the one inside of the if err == nil block right under the cm, err := h.configmapclientwrapper.Get(util.IST2ImageMap) to de-duplicate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the Get will return a nil for cm though, right? ... at least I believe that is what I observed

that said, I don't need the nil check when I'm constructing the cm from scratch here ... so I will at least get rid of that

for key, value := range h.imagestreatagToImage {
cm.Data[key] = value
}
_, err = h.configmapclientwrapper.Update(cm)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if _, err = h.configmapclientwrapper.Update(cm); err != nil {
    return err
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

part of next push

pkg/stub/handler.go Show resolved Hide resolved
return err
}
logrus.Printf("clearImageStreamTagError: stream %s already deleted so no worries on clearing tags",
isName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This shouldn't need to be on it's own line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

part of next push

pkg/stub/imagestreams.go Outdated Show resolved Hide resolved
@gabemontero
Copy link
Contributor Author

updates pushed in separate commit @coreydaley ... PTAL

@coreydaley
Copy link
Member

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 16, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: coreydaley, gabemontero

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gabemontero
Copy link
Contributor Author

thanks @coreydaley

I'll squash the separate commit I had for you comments and then reapply lgtm

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Nov 16, 2020
@openshift-ci-robot
Copy link
Contributor

New changes are detected. LGTM label has been removed.

@gabemontero gabemontero added the lgtm Indicates that a PR is ready to be merged. label Nov 16, 2020
@gabemontero
Copy link
Contributor Author

/hold

given the e2e aws instability around openshift kube apiserver auth errors and the flakes we've seen over the last few days that is getting worked

will unhold when that clears up to avoid repeated e2e launching

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 16, 2020
@gabemontero
Copy link
Contributor Author

/retest

@gabemontero
Copy link
Contributor Author

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 17, 2020
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@gabemontero
Copy link
Contributor Author

/override ci/prow/okd-e2e-aws

@openshift-ci-robot
Copy link
Contributor

@gabemontero: Overrode contexts on behalf of gabemontero: ci/prow/okd-e2e-aws

In response to this:

/override ci/prow/okd-e2e-aws

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-merge-robot openshift-merge-robot merged commit 39ed058 into openshift:master Nov 17, 2020
@gabemontero gabemontero deleted the more-aggressive-progress-false branch November 17, 2020 21:13
@gabemontero gabemontero changed the title set progressing false on imagestream events as well if no active streams Bug 1898745: set progressing false on imagestream events as well if no active streams Nov 18, 2020
@openshift-ci-robot
Copy link
Contributor

@gabemontero: All pull requests linked via external trackers have merged:

Bugzilla bug 1898745 has been moved to the MODIFIED state.

In response to this:

Bug 1898745: set progressing false on imagestream events as well if no active streams

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants