
emit events for each new payload #411

Merged

Conversation

@deads2k (Contributor) commented Jul 21, 2020

Since the CVO replaces itself during every update, we always lose the logs for what happens when the update starts. Updates are infrequent enough that we can emit events for them without loading down the kube-apiserver.
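
For context, here is a minimal sketch of the client-go event-recorder wiring a change like this relies on; the helper and the component name are assumptions for illustration, not necessarily what this PR does:

package events

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	typedcorev1 "k8s.io/client-go/kubernetes/typed/core/v1"
	"k8s.io/client-go/tools/record"
)

// newEventRecorder wires a record.EventRecorder whose Events land in the
// given namespace. The broadcaster buffers and writes Events to the
// apiserver asynchronously, so occasional events add negligible load.
func newEventRecorder(client kubernetes.Interface, namespace string) record.EventRecorder {
	broadcaster := record.NewBroadcaster()
	broadcaster.StartRecordingToSink(&typedcorev1.EventSinkImpl{
		// The sink is namespaced; the Events it receives must carry the
		// same namespace (see the namespace discussion further down).
		Interface: client.CoreV1().Events(namespace),
	})
	// "cluster-version-operator" as the source component is an assumption.
	return broadcaster.NewRecorder(scheme.Scheme, corev1.EventSource{Component: "cluster-version-operator"})
}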

@smarterclayton (Contributor)

This is eminently reasonable

@smarterclayton (Contributor)

One comment, this lgtm assuming e2e and upgrade pass and output events

@mfojtik (Member) commented Jul 22, 2020

/lgtm

Clayton's comment is reasonable to address :-)

@openshift-ci-robot added the lgtm label Jul 22, 2020
@deads2k (Contributor, Author) commented Jul 22, 2020

One comment, this lgtm assuming e2e and upgrade pass and output events

That's tricksy. You can't see the events until the CVO doing the upgrade (current master) emits them.

@openshift-ci-robot removed the lgtm label Jul 22, 2020
pkg/cvo/cvo.go (outdated)
@@ -474,6 +480,8 @@ func (w *SyncWorker) syncOnce(ctx context.Context, work *SyncWork, maxWorkers in
 	validPayload := w.payload
 	if validPayload == nil || !equalUpdate(configv1.Update{Image: validPayload.ReleaseImage}, configv1.Update{Image: update.Image}) {
 		klog.V(4).Infof("Loading payload")
+		cvoObjectRef := &corev1.ObjectReference{APIVersion: "config.openshift.io/v1", Kind: "ClusterVersion", Name: "version"}
+		w.eventRecorder.Eventf(cvoObjectRef, corev1.EventTypeNormal, "RetrievePayload", "retrieving payload version=%q image=%q", update.Version, update.Image)
Member:

it would be nice to have the version and image as structured data on the event. Is that possible, or are flat strings all we have available?

Contributor (Author):

it would be nice to have the version and image as structured data on the event. Is that possible, or are flat strings all we have available?

To my knowledge, flat strings are what you have.

Contributor:

Technically we could do that as annotation, but let's not for now.
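
For reference, client-go's EventRecorder interface does offer AnnotatedEventf, which attaches annotations to the Event's metadata alongside the flat message. A sketch of what the annotation route could look like here; the annotation keys are hypothetical:

// Hypothetical: carry version and image as Event annotations in addition
// to the flat message string. The annotation keys are made up for
// illustration.
annotations := map[string]string{
	"release.openshift.io/version": update.Version,
	"release.openshift.io/image":   update.Image,
}
w.eventRecorder.AnnotatedEventf(cvoObjectRef, annotations, corev1.EventTypeNormal,
	"RetrievePayload", "retrieving payload version=%q image=%q", update.Version, update.Image)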

@@ -483,6 +491,7 @@ func (w *SyncWorker) syncOnce(ctx context.Context, work *SyncWork, maxWorkers in
 	})
 	info, err := w.retriever.RetrievePayload(ctx, update)
 	if err != nil {
+		w.eventRecorder.Eventf(cvoObjectRef, corev1.EventTypeWarning, "RetrievePayloadFailed", "retrieving payload failed version=%q image=%q failure=%v", update.Version, update.Image, err)
 		reporter.Report(SyncWorkerStatus{
Member:

Seems like we could move the event recorder under reporter.Report?

Contributor (Author):

Seems like we could move the event recorder under reporter.Report?

That would be a significant change. It's used for all error calls, and I'm not certain of the fanout that would cause. I'd like to solve the immediate "special" events and allow someone closer to the operator to decide if and how to emit an event for each report.

Contributor:

Yeah, let's keep reporter separate for now.

@deads2k (Contributor, Author) commented Jul 22, 2020

/retest

@smarterclayton (Contributor)

Weird, none of these events are showing up in the monitor. Are events getting sent?

@deads2k (Contributor, Author) commented Jul 22, 2020

Weird, none of these events are showing up in the monitor. Are events getting sent?

No, because the version of the CVO that would produce these is master:HEAD, since that's the version looking up the new payload. We won't see these until after this is merged. Same problem you had.

@wking (Member) commented Jul 22, 2020

I launched a cluster-bot test with test upgrade openshift/cluster-version-operator#411 4.6.0-0.ci gcp, which succeeded. But:

$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp/1286037162390720512/artifacts/launch/events.json | jq -r '.items[] | select(.metadata.namespace == "openshift-cluster-version") | .firstTimestamp + " " + .type + " " + .reason + ": " + .message' | grep 'Payload\|Preconditions\|version='
...no hits...

Checking the source release, the build log has:

2020/07/22 20:36:15 Resolved source https://github.com/openshift/cluster-version-operator to master@28e4400e, merging: #411 7d3621f4 @system:serviceaccount:ci:ci-chat-bot

so it's not clear to me why I'm not seeing these events. Do we actually set a real eventRecorder in 7d3621f?

@wking (Member) commented Jul 23, 2020

e2e with 42d74a0:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/411/pull-ci-openshift-cluster-version-operator-master-e2e/1286074666141618176/artifacts/e2e/gather-extra/pods/openshift-cluster-version_cluster-version-operator-b5c76fb4b-zrf76_cluster-version-operator.log | grep -A3 DEADS
#### DEADS patch  "openshift-cluster-version"!!!
I0722 23:19:59.995655       1 payload.go:230] Loading updatepayload from "/"
E0722 23:19:59.998582       1 event.go:272] Unable to write event: 'can't create an event with namespace 'default' in namespace 'openshift-cluster-version'' (may retry after sleeping)
I0722 23:19:59.998691       1 event.go:281] Event(v1.ObjectReference{Kind:"ClusterVersion", Namespace:"", Name:"version", UID:"", APIVersion:"config.openshift.io/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'Something' someevent

So some sort of hiccup around which namespace this is living in. I'm not familiar enough with Event publishing to know exactly what we're missing...

@wking (Member) commented Jul 23, 2020

The logged line is from here. It looks like our sink expects openshift-cluster-version (good), but the *Event being fed in uses default (bad). I'm still trying to figure out how eventRecorder.Eventf picks its namespace so we can figure out how to configure it... [edit, here is how]
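
For context: client-go's record package derives the Event's namespace from the involved object's ObjectReference, falling back to "default" when the reference's Namespace is empty, which is exactly the mismatch in the log above. A minimal sketch of one fix, pinning the namespace on the reference so it agrees with the sink (assuming the sink writes to openshift-cluster-version):

// ClusterVersion is cluster-scoped, so the Namespace here only steers
// where the recorder files the Events; it must match the namespace of
// the sink's Events client or the write is rejected.
cvoObjectRef := &corev1.ObjectReference{
	APIVersion: "config.openshift.io/v1",
	Kind:       "ClusterVersion",
	Name:       "version",
	Namespace:  "openshift-cluster-version",
}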

pkg/cvo/sync_worker.go (outdated)
@wking (Member) commented Jul 23, 2020

New test upgrade openshift/cluster-version-operator#411 4.6.0-0.ci gcp job (job failed, but completed the update):

$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp/1286331887161184256/artifacts/launch/events.json | jq -r '.items[] | select(.metadata.namespace == "openshift-cluster-version") | .firstTimestamp + " " + .type + " " + .reason + ": " + .message' | grep 'Payload\|Preconditions\|version='
2020-07-23T16:25:39Z Normal RetrievePayload: retrieving payload version="0.0.1-2020-07-23-161007" image="registry.svc.ci.openshift.org/ci-ln-2iqzykk/release@sha256:7e61fe41593f01779cbb6e1960201a13ae08e254e3c86215ad9916e4cff32b51"
2020-07-23T16:25:39Z Normal VerifyPayload: verifying payload version="0.0.1-2020-07-23-161007" image="registry.svc.ci.openshift.org/ci-ln-2iqzykk/release@sha256:7e61fe41593f01779cbb6e1960201a13ae08e254e3c86215ad9916e4cff32b51"
2020-07-23T16:25:40Z Normal PayloadLoaded: payload loaded version="0.0.1-2020-07-23-161007" image="registry.svc.ci.openshift.org/ci-ln-2iqzykk/release@sha256:7e61fe41593f01779cbb6e1960201a13ae08e254e3c86215ad9916e4cff32b51"
2020-07-23T16:54:01Z Normal RetrievePayload: retrieving payload version="0.0.1-2020-07-23-161007" image="registry.svc.ci.openshift.org/ci-ln-2iqzykk/release@sha256:7e61fe41593f01779cbb6e1960201a13ae08e254e3c86215ad9916e4cff32b51"
2020-07-23T16:54:01Z Normal VerifyPayload: verifying payload version="0.0.1-2020-07-23-161007" image="registry.svc.ci.openshift.org/ci-ln-2iqzykk/release@sha256:7e61fe41593f01779cbb6e1960201a13ae08e254e3c86215ad9916e4cff32b51"
2020-07-23T16:54:01Z Normal PayloadLoaded: payload loaded version="0.0.1-2020-07-23-161007" image="registry.svc.ci.openshift.org/ci-ln-2iqzykk/release@sha256:7e61fe41593f01779cbb6e1960201a13ae08e254e3c86215ad9916e4cff32b51"
2020-07-23T16:57:24Z Normal RetrievePayload: retrieving payload version="" image="registry.svc.ci.openshift.org/ci-ln-2iqzykk/release@sha256:bcdcc30fef79ad503c4931800554f1ab258c36f25ff646670e673ead68456db6"
2020-07-23T16:57:33Z Normal VerifyPayload: verifying payload version="" image="registry.svc.ci.openshift.org/ci-ln-2iqzykk/release@sha256:bcdcc30fef79ad503c4931800554f1ab258c36f25ff646670e673ead68456db6"
2020-07-23T16:57:33Z Normal PreconditionsPassed: preconditions passed for payload loaded version="" image="registry.svc.ci.openshift.org/ci-ln-2iqzykk/release@sha256:bcdcc30fef79ad503c4931800554f1ab258c36f25ff646670e673ead68456db6"
2020-07-23T16:57:33Z Normal PayloadLoaded: payload loaded version="" image="registry.svc.ci.openshift.org/ci-ln-2iqzykk/release@sha256:bcdcc30fef79ad503c4931800554f1ab258c36f25ff646670e673ead68456db6"

Do we want to include the current reconciliation mode in the events, so folks understand why we aren't running preconditions as the install-time release's CVO hops from node to node? And do we want to do anything about version=""? For --to-image flows, we only know the version after we've loaded the payload, though there may be a way to get a version string into the PreconditionsPassed and PayloadLoaded events.

Also in this space, we should teach the CVO that it does not need to retrieve a release image whose pullspec matches its current pod pullspec. That would cover the node-hopping angle.

@deads2k (Contributor, Author) commented Jul 23, 2020

Do we want to include the current reconciliation mode in the events, so folks understand why we aren't running preconditions as the install-time release's CVO hops from node to node? And do we want to do anything about version=""? For --to-image flows, we only know the version after we've loaded the payload, though there may be a way to get a version string into the PreconditionsPassed and PayloadLoaded events.

Also in this space, we should teach the CVO that it does not need to retrieve a release image whose pullspec matches its current pod pullspec. That would cover the node-hopping angle.

For my purposes, the current behavior works fine.

@deads2k (Contributor, Author) commented Jul 23, 2020

/retest

@deads2k (Contributor, Author) commented Jul 23, 2020

Amusingly, I need this in master to figure out why its latest upgrade failed.

@wking (Member) commented Jul 23, 2020

@deads2k assures me that there should be no downstream consumers that expect specific Event patterns as an API. I've filed kubernetes/kubernetes#93396 with an attempt to formalize that. With that understanding, I am fine landing this PR as it stands, knowing we are free to reroll any and all of it later without worrying about breaking compat.

/lgtm

@openshift-ci-robot removed the lgtm label Jul 24, 2020
@openshift-ci-robot (Contributor)

New changes are detected. LGTM label has been removed.

@deads2k added the lgtm label and removed the do-not-merge/hold label Jul 24, 2020
@deads2k (Contributor, Author) commented Jul 24, 2020

Simple rebase, retagged.

@openshift-bot (Contributor)

/retest

Please review the full test history for this PR and help us cut down flakes.


@openshift-merge-robot merged commit a49fef5 into openshift:master Jul 24, 2020
wking added a commit to wking/cluster-version-operator that referenced this pull request Aug 13, 2020
Only during updates, because:

* Install-time is a free-for-all, where the CVO doesn't block on
  anything.  This would be a lot of "node complete" noise about nodes
  where we had only attempted to push manifests, and that's unlikely
  to be what event-readers expect TaskNodeComplete to imply.

* Reconcile-time hopefully has very few instances where the CVO needs
  to stomp on changes, block on a recently Available=False operator,
  etc.  Eventing on each completed TaskNode would be lots of noise
  without much interesting signal.

During updates, we have the structured graph and blocking TaskNodes
described in docs/user/reconciliation.md, and the flow through that
graph is what the events from this commit will help shed light on.
You could also achieve this by preserving logs from the CVO pods as
they are repositioned throughout an update, but we don't have tooling
in CI to do that conveniently today.

The hardcoded name and namespace for cvoObjectRef isn't great (for
example, it won't work in pkg/start/start_integration_test.go , where
the ClusterVersion's name and namespace are random).  But it's the
pattern we've used since we started eventing in 475e71f (emit
events for each new payload, 2020-07-21, openshift#411), so I'm recycling it
for now.
wking added a commit to wking/cluster-version-operator that referenced this pull request Aug 25, 2020
Only during updates, because:

* Install-time is a free-for-all, where the CVO doesn't block on
  anything.  This would be a lot of "node complete" noise about nodes
  where we had only attempted to push manifests, and that's unlikely
  to be what event-readers expect TaskNodeComplete to imply.

* Reconcile-time hopefully has very few instances where the CVO needs
  to stomp on changes, block on a recently Available=False operator,
  etc.  Eventing on each completed TaskNode would be lots of noise
  without much interesting signal.

During updates, we have the structured graph and blocking TaskNodes
described in docs/user/reconciliation.md, and the flow through that
graph is what the events from this commit will help shed light on.
You could also achieve this by preserving logs from the CVO pods as
they are repositioned throughout an update, but we don't have tooling
in CI to do that conveniently today.

The hardcoded name and namespace for cvoObjectRef isn't great (for
example, it won't work in pkg/start/start_integration_test.go , where
the ClusterVersion's name and namespace are random).  But it's the
pattern we've used since we started eventing in 475e71f (emit
events for each new payload, 2020-07-21, openshift#411), so I'm recycling it
for now.

Also log ApplyFailed events when we fail in apply, to remove some of
the guesswork in determining what manifest(s) had trouble.
wking added a commit to wking/cluster-version-operator that referenced this pull request Apr 7, 2022
…failure

For unforced updates, when signature, etc. verification fails,
RetrievePayload returns an error, and we have emitted events for
RetrievePayload errors since 475e71f (emit events for each new
payload, 2020-07-21, openshift#411).  However, when the user forces the update,
we log but do not error out on verification failures.  With this
commit, we will also emit a warning event with an error message, which
will make it easier to understand how signature verification failed,
even when we don't have access to the logs of the outgoing
cluster-version operator [1].

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2071998#c2
wking added a commit to wking/cluster-version-operator that referenced this pull request May 24, 2022
…failure

For unforced updates, when signature, etc. verification fails,
RetrievePayload returns an error, and we have emitted events for
RetrievePayload errors since 475e71f (emit events for each new
payload, 2020-07-21, openshift#411).  However, when the user forces the update,
we log but do not error out on verification failures.  With this
commit, we will also emit a warning event with an error message, which
will make it easier to understand how signature verification failed,
even when we don't have access to the logs of the outgoing
cluster-version operator [1].

We should transition away from the maintenance-mode
github.com/pkg/errors [2], but I'm deferring that to follow-up work.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2071998#c2
[2]: https://pkg.go.dev/github.com/pkg/errors#readme-roadmap