Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

k8s.io/client-go/tools/[events|record]: support context #120729

Merged
merged 2 commits into from Oct 4, 2023

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Sep 18, 2023

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

In events, using StartRecordingToSinkWithContext instead of StartRecordingToSink and StartLogging instead of StartStructuredLogging has several advantages:

  • Spawned goroutines no longer get stuck for extended periods of time during shutdown when passing in a context that gets canceled.
  • Log output can be directed towards a specific logger instead of the global default, for example one which writes to a testing.T instance.
  • The new methods return an error when something went wrong instead of merely recording the error.

That last point is the reason for deprecating the old methods instead of merely adding new alternatives.

In record, setting a context when constructing an EventBroadcaster makes calling Shutdown
optional. It can also be used to specify the logger.

The EventRecorder doesn't need a context, so its construction just takes a
logger as optional parameter.

Special notes for your reviewer:

Does this PR introduce a user-facing change?

client-go: k8s.io/client-go/tools events and record packages have new APIs for specifying a context and logger

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Sep 18, 2023
@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 18, 2023
@pohly
Copy link
Contributor Author

pohly commented Sep 18, 2023

Once this is merged, I intend to update existing code to use the modified APIs. I can also include that in this PR, if it is preferred.

@pohly
Copy link
Contributor Author

pohly commented Sep 18, 2023

/cc @dgrisonnet

@@ -45,19 +45,20 @@ func (recorder *recorderImpl) Eventf(regarding runtime.Object, related runtime.O
message := fmt.Sprintf(note, args...)
refRegarding, err := reference.GetReference(recorder.scheme, regarding)
if err != nil {
klog.Errorf("Could not construct reference to: '%#v' due to: '%v'. Will not report event: '%v' '%v' '%v'", regarding, err, eventtype, reason, message)
// TODO: extend Eventf with ctx parameter.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are currently 226 such calls. That's a lot, but IMHO it is the right thing to do. Otherwise the error will not get logged as part of the current activity which triggered the event.

I prefer context over klog.Logger here because if we ever need to do some real work inside Eventf, that context would provide the deadline for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also an important point in favor of context: the logger is normally not needed, so the usual argument against context when a logger is needed (the extra lookup) doesn't apply.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it is really worth adding a context to Eventf. We have 226 such calls in-tree, but it will also impact the consumers of client-go. Realistically, this change will impact the majority of the operators that are in the wild today.

here because if we ever need to do some real work inside Eventf

I don't think we will ever need to do anything more than create an Event object here. All the aggregation and apiserver related logics are done at the broadcaster's level today and I don't think this is bound to change.
Also, with the changes you made from this PR, we will now be able to cancel broadcasting from the context, which isn't something we could do before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternative is to add an EventfWithLogger which takes a klog.Logger as first parameter. Using that function is optional, i.e. Eventf will not get deprecated, but code which cares about getting the error log messages recorded "correctly" have an API to do it.

Because it's not a context, Eventf can be implemented as EventfWithLogger(klog.Background(), ...). If we needed a context, Eventf would remain incomplete because it would forever have to use context.TODO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried a different approach: EventRecorder.WithContext creates a new EventRecorder which holds a context for the Eventf call. I chose that approach because record.EventRecorder contains three calls which all would have needed separate variants with an explicit ctx parameter.

Extending a published interface that could have been implemented elsewhere is always risky because it is an API break. I think it's safe in this case.

@dgrisonnet: what do you think?

Copy link
Contributor Author

@pohly pohly Sep 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change of plan... there are enough "fake EventSource" implementations that extending that interface is annoying. I changed the code so that EventSource is unchanged and added EventSourceContext as extension of it. To avoid type casting, constructors return that. This leads to an API change in EventBroadcaster but perhaps that one is only used and not implemented by clients?

Code using the API still works as before:

var recorder EventRecorder

recorder = broadcaster.NewRecorder(...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a bit unsure whether ctx or logger is a better parameter. Some code calling Eventf calls has a logger, some a context. Ideally, passing one or the other would be possible without additional overhead. I'll explore that...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing an interface instead of plain ctx or logger turned out to be too complicated. As some code only has a logger and changing that code would be more complicated than adding klog.FromContext in code which has a ctx, let's just use logger.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed the version which uses WithLogger instead of WithContext.

Converting existing code is often as easy as s/recorder record.EventRecorder/recorder record.EventRecorderLogger/ and s/recorder.Event/recorder.WithLogger(logger).Event/. I converted two controllers as a trial, but won't include those changes in this PR - that can come separately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with your last approach with WithLogger. It feels a bit odd to have the logger in a context in the broadcaster and a logger directly in the recorder, but since the recorder doesn't need a context, I think it makes sense.

@alexzielenski
Copy link
Contributor

alexzielenski commented Sep 19, 2023

/sig instrumentation

@k8s-ci-robot k8s-ci-robot added the sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. label Sep 19, 2023
@k8s-ci-robot
Copy link
Contributor

@alexzielenski: Those labels are not set on the issue: sig/apimachinery

In response to this:

/sig instrumentation
/remove-sig apimachinery

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@alexzielenski
Copy link
Contributor

/remove-sig api-machinery

@k8s-ci-robot k8s-ci-robot removed the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Sep 19, 2023
for {
select {
case <-e.cancelationCtx.Done():
watcher.Stop()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're risking double-stopping here if the called would decide to stop it too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And that is not safe?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems here it is:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/watch/mux.go#L316

[I just saw things across codebase, where Stop() isn't necessarily idempotent...]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, definitely worth checking.

@wojtek-t wojtek-t self-assigned this Sep 21, 2023
@@ -45,19 +45,20 @@ func (recorder *recorderImpl) Eventf(regarding runtime.Object, related runtime.O
message := fmt.Sprintf(note, args...)
refRegarding, err := reference.GetReference(recorder.scheme, regarding)
if err != nil {
klog.Errorf("Could not construct reference to: '%#v' due to: '%v'. Will not report event: '%v' '%v' '%v'", regarding, err, eventtype, reason, message)
// TODO: extend Eventf with ctx parameter.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it is really worth adding a context to Eventf. We have 226 such calls in-tree, but it will also impact the consumers of client-go. Realistically, this change will impact the majority of the operators that are in the wild today.

here because if we ever need to do some real work inside Eventf

I don't think we will ever need to do anything more than create an Event object here. All the aggregation and apiserver related logics are done at the broadcaster's level today and I don't think this is bound to change.
Also, with the changes you made from this PR, we will now be able to cancel broadcasting from the context, which isn't something we could do before.


type config struct {
CorrelatorOptions
context.Context
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the fact that you added a context here. Although it is usually not recommend to have context in structs, here I think it makes sense since Eventf doesn't really have a scope as it is delegating all the work to the broadcaster.
This makes even more sense in future events/v1 implementation since we wanted to add a backoff mechanism to the broadcaster to avoid burst of Events on the apiserver.

@dashpole
Copy link
Contributor

/triage accepted
/assign @dgrisonnet
/priority important-soon

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Sep 21, 2023
@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Sep 21, 2023
@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Sep 26, 2023
// Package internal is needed to break an import cycle: record.EventRecorderAdapter
// needs this interface definition to implement it, but event.NewEventBroadcasterAdapter
// needs record.NewBroadcaster. Therefore this interface cannot be in event/interfaces.go.
package internal
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unfortunate that this internal package is needed.

This could be avoided by moving record.EventRecorderAdapter into events, but that's going to break all code currently expecting it in record. I'm not sure why it was added there in the first place, but it is what it is now...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cyclic dependency is caused by the fact that core/v1 needs to be compatible with events/v1 and events/v1 with core/v1.
But I don't know why this wasn't added to the events package instead of record. Might be an oversight.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way, I think it is fine to have an internal package because of this limitation

@pohly pohly force-pushed the events-context branch 3 times, most recently from 4aa0c3f to 4e27815 Compare September 27, 2023 06:18
Using StartRecordingToSinkWithContext instead of StartRecordingToSink and
StartLogging instead of StartStructuredLogging has several advantages:

- Spawned goroutines no longer get stuck for extended periods of
  time during shutdown when passing in a context that gets canceled.
- Log output can be directed towards a specific logger instead of the global
  default, for example one which writes to a testing.T instance.
- The new methods return an error when something went wrong instead of
  merely recording the error.

That last point is the reason for deprecating the old methods instead of merely
adding new alternatives.

Setting a context when constructing an EventBroadcaster makes calling Shutdown
optional. It can also be used to specify the logger.

Both EventRecorder interfaces in tools/events and tools/record now have a
WithLogger helper. Using that method is optional, but recommended to support
contextual logging properly. Without it, errors that occur while emitting an
event are not associated with the caller.
Because the EventBroadcaster code now has a a context, changing the EventSink
interface so that the methods accepts a context instead of using context.TODO
becomes possible.
// StartLogging starts sending events received from this EventBroadcaster to the structured logger.
// To adjust verbosity, use the logger's V method (i.e. pass `logger.V(3)` instead of `logger`).
// The returned function can be ignored or used to stop recording, if desired.
func (e *eventBroadcasterImpl) StartLogging(logger klog.Logger) (func(), error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to propagate this logger inside of the broadcaster context?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is only one broadcaster context, and it was set up earlier already. I don't think we should change it.

Besides, StartLogging might be called more than once and each invocation starts recording events to some other logger. This might never be done, but it shows the conceptual difference: this logger really is only for this particular call.

@dgrisonnet
Copy link
Member

This looks good to me.

@wojtek-t could you have a second look at this PR when you get the chance?

@wojtek-t
Copy link
Member

wojtek-t commented Oct 2, 2023

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 2, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 491322d8d300b0d4d72803b1e774e4945c60f7c7

@pohly
Copy link
Contributor Author

pohly commented Oct 2, 2023

/assign @sttts

For dependency approval. The new internal module shows up in vendor/modules.txt, that's all.

@sttts
Copy link
Contributor

sttts commented Oct 4, 2023

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly, sttts, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 4, 2023
@pohly
Copy link
Contributor Author

pohly commented Oct 4, 2023

/retest

@k8s-ci-robot k8s-ci-robot merged commit f936f69 into kubernetes:master Oct 4, 2023
15 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Oct 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants