Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client-go: Add support for API streaming to the reflector #110772

Merged

Conversation

p0lyn0mial
Copy link
Contributor

@p0lyn0mial p0lyn0mial commented Jun 24, 2022

What type of PR is this?

/kind feature

What this PR does / why we need it:

This pull request introduces new alpha functionality to the reflector, allowing users to enable API streaming.
To activate this feature, users can set the ENABLE_CLIENT_GO_WATCH_LIST_ALPHA environmental variable.
It is important to note that the server must support streaming for this feature to function properly.
If streaming is not supported by the server, the reflector will revert to the previous method
of obtaining data through LIST/WATCH semantics.

Which issue(s) this PR fixes:

xref: kubernetes/enhancements#3157
xref: #115402
xref: #110960

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Introduces new alpha functionality to the reflector, allowing user to enable API streaming.

To activate this feature, users can set the `ENABLE_CLIENT_GO_WATCH_LIST_ALPHA` environmental variable.
It is important to note that the server must support streaming for this feature to function properly.
If streaming is not supported by the server, the reflector will revert to the previous method
of obtaining data through LIST/WATCH semantics.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/3157-watch-list

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jun 24, 2022
@p0lyn0mial
Copy link
Contributor Author

/assign @wojtek-t

@k8s-ci-robot k8s-ci-robot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 24, 2022
@p0lyn0mial p0lyn0mial force-pushed the upstream-reflector-gets-stream branch from deea5e9 to 8632649 Compare June 28, 2022 20:41
@p0lyn0mial p0lyn0mial force-pushed the upstream-reflector-gets-stream branch from 8632649 to ac00d9e Compare June 29, 2022 12:26
@leilajal
Copy link
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 30, 2022
@p0lyn0mial p0lyn0mial force-pushed the upstream-reflector-gets-stream branch from ac00d9e to 04bcbfa Compare July 1, 2022 14:54
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 1, 2022
@p0lyn0mial p0lyn0mial force-pushed the upstream-reflector-gets-stream branch 4 times, most recently from 7b7c78c to 25a8421 Compare July 4, 2022 14:10
@k8s-ci-robot k8s-ci-robot removed the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jul 4, 2022
@p0lyn0mial p0lyn0mial force-pushed the upstream-reflector-gets-stream branch from 95f3bd1 to 19fd154 Compare March 9, 2023 13:53
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 9, 2023
@p0lyn0mial p0lyn0mial force-pushed the upstream-reflector-gets-stream branch 2 times, most recently from 5dc3c67 to 718904b Compare March 9, 2023 14:14
if isErrorRetriableWithSideEffectsFn(err) {
continue
}
klog.V(2).Infof("%s: watch-list of %v ended with: %v", r.name, r.expectedGVK, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if bookmarkReceived {
break // success we got initial data and the bookmark
}
// try one more time
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove this comment - it doesn't bring any value.

}
// try one more time
}
r.setIsLastSyncResourceVersionUnavailable(false) // watch-list was successful
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

// We successfully got initial state from watch-list confirmed by the
// "k8s.io/initial-events-end" bookmark.
initTrace.Step(...)
r.setIsLastSyncResource...
...

@@ -651,17 +801,35 @@ func (r *Reflector) relistResourceVersion() string {
r.lastSyncResourceVersionMutex.RLock()
defer r.lastSyncResourceVersionMutex.RUnlock()

if r.lastSyncResourceVersion == "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't correct - I mean, we're changing semantics here.

If isLastSyncResourceVersionUnavailable is true, then previously we would return "", and now we will return '0" if also lastSyncResourceVersion is empty.

Let's not change the semantics for the old path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[FWIW, I'm not sure why we actually did that, but even if we want to change it, we should do that explicitly in a separate PR and not change it as part of larger complex PR.]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't setIsLastSyncResourceVersionUnavailable imply that r. lastSyncResourceVersion is != "" ? After all we must have provided some RV to the server.

Anyway I can bring it back to the original state just in case.

// It begins with synthetic "Added" events of all resources up to the most recent ResourceVersion.
// It ends with a synthetic "Bookmark" event containing the most recent ResourceVersion.
// After receiving a "Bookmark" event the reflector is considered to be synchronized.
// It replaces its internal store with the collected items (syncWith) and reuses the current watch requests for getting further events.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove "(syncWith)" and split the line a bit :)

var err error
var temporaryStore Store
var resourceVersion string
var bookmarkReceived *bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This one isn't used in the scope of function - let's define it where it is used only.

var temporaryStore Store
var resourceVersion string
var bookmarkReceived *bool
isErrorRetriableWithSideEffectsFn := func(err error) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'm not a huge fan of those nested functions - I'm wondering if we can make it an actual reflector function...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe in the future, for now it applies only to the watchList method.
I can add a TODO for me - make it a method and find a way to unify error handling in the reflector.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not great, but I can probably live with this.

@@ -546,16 +682,13 @@ func watchHandler(start time.Time,
name string,
expectedTypeName string,
setLastSyncResourceVersion func(string),
exitOnInitEventBookmark *bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'm sorry - that was my mental shortcut, I think we should name it
"exitOnInitialEventsEndBookmark"
(for consistency with annotation name)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np

@@ -609,6 +742,11 @@ loop:
}
case watch.Bookmark:
// A `Bookmark` means watch has synced here, just update the resourceVersion
if _, ok := meta.GetAnnotations()["k8s.io/initial-events-end"]; ok {
if exitOnInitEventBookmark != nil {
exitOnInitEventBookmark = pointer.Bool(true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*exitOn.. = true ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right :) - thx.

@p0lyn0mial p0lyn0mial force-pushed the upstream-reflector-gets-stream branch 3 times, most recently from cb4b9cb to aac3889 Compare March 9, 2023 16:47
Copy link
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the test coverage - I added some comments, but there isn't anything big there.

if !apierrors.IsInvalid(err) {
return err
}
klog.V(2).Info("the watch-list feature is not supported by the server, falling back to the previous LIST/WATCH semantic")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this a warning

}
klog.V(2).Info("the watch-list feature is not supported by the server, falling back to the previous LIST/WATCH semantic")
fallbackToList = true
w = nil // just in case it was actually set
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit;

// Ensure that we won't accidentally pass some garbage down the watch.
w = nil

var temporaryStore Store
var resourceVersion string
var bookmarkReceived *bool
isErrorRetriableWithSideEffectsFn := func(err error) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not great, but I can probably live with this.

}
return nil, err
}
// if we are here, that means we either hit the case 1 or the case 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this comment - this doesn't bring any value

return nil, err
}
if *bookmarkReceived {
break // success we got initial data and the bookmark
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove this comment - this is obvious from variables naming.

lw.listCounter++
lw.requestOptions = append(lw.requestOptions, options)
if lw.listCounter == lw.closeAfterListRequests {
defer lw.close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why it's defer here? Can't we just close it immediately?

if lw.watchCounter == lw.closeAfterWatchRequests {
defer lw.close()
}
if lw.watcherErrorPredicate != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's call it maybe watchOptionsPredicate

{
name: "the reflector can fall back to old LIST/WATCH semantics when a server doesn't support streaming",
watchServerErrorPredicate: func(options metav1.ListOptions) error {
if options.ResourceVersionMatch == metav1.ResourceVersionMatchNotOlderThan {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be more intuitive to check options.SendInitialEvents here...

returnedWatchEvents: []watch.Event{
{Type: watch.Added, Object: makePod("p1", "1")},
// second request
{Type: watch.Added, Object: makePod("p2", "2")},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would sent the above p1 event again here - so that it looks more like a retry.

}
}(),
stopAfterWatchEvents: 3,
closeAfterWatchEvents: 5,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand it - there aren't 5 events here...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must have copy&past it from the previous test cases. Will remove.
It didn't affect the test - was unused.

@p0lyn0mial p0lyn0mial force-pushed the upstream-reflector-gets-stream branch from aac3889 to 3a238c3 Compare March 10, 2023 09:34
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Mar 10, 2023
@p0lyn0mial p0lyn0mial changed the title reflector gets a stream of data client-go: Add support for API streaming to the reflector Mar 10, 2023
Copy link
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last 3 minor nits - other than that LGTM.

go func() {
for i, e := range scenario.watchEvents {
listWatcher.fakeWatcher.Action(e.Type, e.Object)
i++
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand it. I played with it:
https://go.dev/play/p/FEuLkFBGb_g

having i++ there and not having it actually doesn't change the output - that's super misleading...

It would be much more intuitive to me if you instead change to:

  • removing this i++ fline
  • switching the following ifs to if i+1 == ....

actualPods = append(actualPods, *p.(*v1.Pod))
}

for _, expectedPod := range expectedPods {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of this, let's just sort the both lists by name and compare the whole lists - it's more intuitive.

[This code in general would return true for the following lists:

  • [a, a, b]
  • [a, b, b]

if r.UseWatchList {
w, err = r.watchList(stopCh)
if w == nil && err == nil {
return nil // stopCh requested
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

// stopCh was closed
return nil

@p0lyn0mial p0lyn0mial force-pushed the upstream-reflector-gets-stream branch from 3a238c3 to 9d872db Compare March 10, 2023 11:14
@p0lyn0mial p0lyn0mial force-pushed the upstream-reflector-gets-stream branch from 9d872db to 966b26d Compare March 10, 2023 11:16
@wojtek-t
Copy link
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 10, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 1e1503c42c1a45be2c251bab6cabcfed37b1e119

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: p0lyn0mial, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 10, 2023
@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Mar 10, 2023

@p0lyn0mial: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-e2e-gce-ubuntu-containerd 077c5bd link true /test pull-kubernetes-e2e-gce-ubuntu-containerd
pull-kubernetes-e2e-gce-100-performance 077c5bd link true /test pull-kubernetes-e2e-gce-100-performance

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@p0lyn0mial
Copy link
Contributor Author

/test pull-kubernetes-e2e-kind-ipv6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants