fix endpointslicemirroring controller not create endpointslice when delete endpointslice after kube-controller-manager restarted #112197
@Dingshujie: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@Dingshujie: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@thockin @robscott @swetharepakula @aojea PTAL, thanks |
/assgin @thockin @robscott @swetharepakula @aojea |
/test pull-kubernetes-conformance-kind-ga-only-parallel |
return
}
// requeue the service for another sync, if
// 1. endpointSliceTracker don't has endpoint.
Why should it not be requeued if it is managed and the endpointSliceTracker has the endpoint?
- I create an endpoint and it is mirrored; is it not added to the tracker here?
- I delete the EndpointSlice; it is in the tracker, but I don't requeue, so it is not recreated.
I think it should be recreated.
Sorry, my mistake, that was a typo: I meant that the endpointSliceTracker doesn't have this EndpointSlice, not the endpoint.
- I create an endpoint and it is mirrored; is it not added to the tracker here?
When the EndpointSlice mirroring controller creates this EndpointSlice, it adds the EndpointSlice to the tracker. But if kube-controller-manager restarts, the existing EndpointSlice is not added to the tracker again.
- I delete the EndpointSlice; it is in the tracker, but I don't requeue, so it is not recreated.
I think it should be recreated.
Under normal conditions (the mirrored EndpointSlice was created and kcm has not restarted), if I delete an EndpointSlice that exists in the tracker, onEndpointSliceDelete checks whether the tracker has this EndpointSlice and whether it was marked as expected to be deleted. So if the deleted EndpointSlice is in the tracker, the controller requeues it.
But if kcm has restarted, the tracker doesn't have this EndpointSlice, so deleting it does not trigger a requeue.
// onEndpointSliceDelete queues a sync for the relevant Endpoints resource for a
// sync if the EndpointSlice resource version does not match the expected
// version in the endpointSliceTracker.
func (c *Controller) onEndpointSliceDelete(obj interface{}) {
	endpointSlice := getEndpointSliceFromDeleteAction(obj)
	if endpointSlice == nil {
		utilruntime.HandleError(fmt.Errorf("onEndpointSliceDelete() expected type discovery.EndpointSlice, got %T", obj))
		return
	}
	if managedByController(endpointSlice) && c.endpointSliceTracker.Has(endpointSlice) {
		// This returns false if we didn't expect the EndpointSlice to be
		// deleted. If that is the case, we queue the Service for another sync.
		if !c.endpointSliceTracker.HandleDeletion(endpointSlice) {
			c.queueEndpointsForEndpointSlice(endpointSlice)
		}
	}
}
So I changed the check: if the tracker doesn't have this EndpointSlice, requeue the corresponding Endpoints.
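For readers following along, here is a minimal, self-contained sketch of the decision described above. It is not the controller's code: the `tracker` type, its string-keyed map, and `shouldRequeueOnDelete` are hypothetical stand-ins that only model the behaviour discussed in this thread (requeue when the tracker has never seen the slice, or when the deletion was not expected).

```go
package main

import "fmt"

// tracker is a hypothetical stand-in for the controller's endpointSliceTracker.
// It maps an EndpointSlice name to whether a deletion is currently expected.
type tracker struct {
	expectDeletion map[string]bool
}

// Has reports whether the tracker knows about the named slice.
func (t *tracker) Has(name string) bool {
	_, ok := t.expectDeletion[name]
	return ok
}

// HandleDeletion forgets the slice and reports whether its deletion was
// expected, i.e. initiated by the controller itself.
func (t *tracker) HandleDeletion(name string) bool {
	expected := t.expectDeletion[name]
	delete(t.expectDeletion, name)
	return expected
}

// shouldRequeueOnDelete models the proposed check (an assumption based on the
// discussion): a managed slice is requeued if the tracker has never seen it,
// for example after a restart, or if its deletion was not expected.
func shouldRequeueOnDelete(t *tracker, name string, managed bool) bool {
	if !managed {
		return false
	}
	if !t.Has(name) {
		return true
	}
	return !t.HandleDeletion(name)
}

func main() {
	// A tracker right after a restart knows nothing about existing slices.
	fresh := &tracker{expectDeletion: map[string]bool{}}
	fmt.Println(shouldRequeueOnDelete(fresh, "slice-a", true)) // true

	// A warm tracker: slice-a was deleted by the controller, slice-b by a user.
	warm := &tracker{expectDeletion: map[string]bool{"slice-a": true, "slice-b": false}}
	fmt.Println(shouldRequeueOnDelete(warm, "slice-a", true)) // false
	fmt.Println(shouldRequeueOnDelete(warm, "slice-b", true)) // true
}
```

The first case is the one this PR is about: with the original `Has(...) &&` guard, an empty tracker would return false and the slice would never be recreated.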
Ah, no, I didn't mean the naming. I mean: what happens if the endpointSliceTracker has the EndpointSlice and it is deleted? Why don't we reconcile?
Isn't the problem that the slice is not in the tracker? Shouldn't we handle this case and add it to the tracker?
@robscott has a better understanding of the internals of this controller.
As I understand it, the tracker helps avoid some race conditions. If the controller creates an EndpointSlice but has not yet received the event from the apiserver, the lister cannot return that EndpointSlice, and reconciling at that point may go wrong. So whenever the controller calls the Create/Update/Delete API, it also calls the tracker's ExpectDeletion/Update functions to record the new generation in the tracker.
If we have stale information, we skip this sync and try again later:
if c.endpointSliceTracker.StaleSlices(svc, endpointSlices) {
	return endpointsliceutil.NewStaleInformerCache("EndpointSlice informer cache is out of date")
}
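To make the race described above concrete, here is a simplified model of generation tracking. It is only a sketch under stated assumptions: `sliceInfo`, `generationTracker`, and its `Stale` method are hypothetical and much simpler than the real tracker, but they show why a sync is skipped while the informer cache lags behind what the controller last wrote.

```go
package main

import "fmt"

// sliceInfo is a hypothetical, simplified view of an EndpointSlice.
type sliceInfo struct {
	Name       string
	Generation int64
}

// generationTracker records the generation the controller last wrote for
// each slice. It is an illustrative model, not the real tracker.
type generationTracker struct {
	expected map[string]int64
}

// Update records the generation of a slice the controller just wrote.
func (g *generationTracker) Update(s sliceInfo) {
	g.expected[s.Name] = s.Generation
}

// Stale reports whether the cached slices lag behind what the controller
// wrote: a tracked slice is missing from the cache or has an older generation.
func (g *generationTracker) Stale(cached []sliceInfo) bool {
	seen := make(map[string]int64, len(cached))
	for _, s := range cached {
		seen[s.Name] = s.Generation
	}
	for name, want := range g.expected {
		got, ok := seen[name]
		if !ok || got < want {
			return true
		}
	}
	return false
}

func main() {
	g := &generationTracker{expected: map[string]int64{}}
	g.Update(sliceInfo{Name: "slice-a", Generation: 2})

	// The cache still holds generation 1, so the sync should be skipped.
	fmt.Println(g.Stale([]sliceInfo{{Name: "slice-a", Generation: 1}})) // true
	// Once the cache catches up, the sync can proceed.
	fmt.Println(g.Stale([]sliceInfo{{Name: "slice-a", Generation: 2}})) // false
}
```

Note that in this model a freshly constructed tracker is never stale, which mirrors the restart situation discussed in this thread: after a restart the tracker has no expectations about slices that already exist.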
/assign @robscott |
I'd like to see an integration test verifying this scenario; can you please add one? |
yeah, my pleasure. |
…elete endpointslice after kube-controller-manager restarted Signed-off-by: DingShujie <dingshujie@huawei.com>
adfeae8
to
c6b1824
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Dingshujie The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@@ -528,6 +528,123 @@ func TestEndpointSliceMirroringSelectorTransition(t *testing.T) {
}
}

func TestEndpointSliceMirroringDeleteWhenEndpoointSliceMirroringControllerRestart(t *testing.T) {
If I run this test without the other commit, it passes.
t.Fatalf("Error deleting EndpointSlices(%s/%s): %v", ns.Name, esList.Items[0].Name, err)
}
// wait endpoint to be created
err = waitForMirroredSlices(t, client, ns.Name, service.Name, len(esList.Items))
I think you have to replace len(esList.Items) with 1 here.
Ah, never mind, this has to be > 0.
this is ok
I think we first have to assert that the current slice has been deleted; otherwise the List can return the slice that is being deleted.
diff --git a/test/integration/endpointslice/endpointslicemirroring_test.go b/test/integration/endpointslice/endpointslicemirroring_test.go
index 1959872d219..993a33a0db3 100644
--- a/test/integration/endpointslice/endpointslicemirroring_test.go
+++ b/test/integration/endpointslice/endpointslicemirroring_test.go
@@ -26,7 +26,9 @@ import (
corev1 "k8s.io/api/core/v1"
discovery "k8s.io/api/discovery/v1"
apiequality "k8s.io/apimachinery/pkg/api/equality"
+ apierrors "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+
"k8s.io/apimachinery/pkg/util/wait"
"k8s.io/client-go/informers"
clientset "k8s.io/client-go/kubernetes"
@@ -638,8 +640,22 @@ func TestEndpointSliceMirroringDeleteWhenEndpoointSliceMirroringControllerRestar
if err != nil {
t.Fatalf("Error deleting EndpointSlices(%s/%s): %v", ns.Name, esList.Items[0].Name, err)
}
+ // wait endpoint slice to be deleted
+ err = wait.PollImmediate(1*time.Second, wait.ForeverTestTimeout, func() (bool, error) {
+ _, err := client.DiscoveryV1().EndpointSlices(ns.Name).Get(ctx, esList.Items[0].Name, metav1.GetOptions{})
+ if err != nil {
+ if apierrors.IsNotFound(err) {
+ return true, nil
+ }
+ return false, err
+ }
+ return false, nil
+ })
+ if err != nil {
+ t.Fatalf("Error deleting EndpointSlices(%s/%s): %v", ns.Name, esList.Items[0].Name, err)
+ }
// wait endpoint to be created
this way we avoid the race
It needs to wait for the add event to finish processing. I will try to update the integration test to reproduce it. |
With this patch (#112197 (comment)), the test waits until it confirms the original has been deleted and then asserts that it has been recreated ... I think the controller recreates it. |
6460737
to
c4c56a6
Compare
@aojea PTAL. When I run this test without this commit, it fails. |
The test is racy (https://github.com/kubernetes/kubernetes/pull/112197/files#r961381285): once you Delete the slice, it is not immediately removed, and the List can return the slice that is being deleted, giving a false positive. We have to assert that the slice is deleted and that a new one is created. Maybe you can store the UID of the old one and compare it to the new one? |
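A minimal sketch of the UID comparison suggested here, using hypothetical stand-in types rather than the real client: `slice` and `recreated` are illustrative names, and an actual test would read the UID from the object metadata returned by the List call and poll until the condition holds.

```go
package main

import "fmt"

// slice is a hypothetical, simplified view of a listed EndpointSlice.
type slice struct {
	Name string
	UID  string
}

// recreated reports whether the listed slices show a genuine recreation:
// at least one slice exists and none of them still carries the old UID.
// A slice that is still being deleted keeps its old UID, so it is rejected.
func recreated(oldUID string, current []slice) bool {
	if len(current) == 0 {
		return false
	}
	for _, s := range current {
		if s.UID == oldUID {
			return false
		}
	}
	return true
}

func main() {
	// The List still returns the slice that is being deleted: not recreated.
	fmt.Println(recreated("uid-1", []slice{{Name: "svc-abc", UID: "uid-1"}})) // false
	// Nothing listed yet: not recreated.
	fmt.Println(recreated("uid-1", nil)) // false
	// A slice with a different UID exists: the controller recreated it.
	fmt.Println(recreated("uid-1", []slice{{Name: "svc-xyz", UID: "uid-2"}})) // true
}
```

Comparing UIDs rather than names or counts avoids the false positive, because a count-based check cannot tell the old slice from its replacement.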
…ntSliceMirroringControllerRestart Signed-off-by: DingShujie <dingshujie@huawei.com>
c4c56a6
to
68002a7
Compare
Good point. I changed the test to compare UIDs. PTAL, thanks. |
/test pull-kubernetes-e2e-kind |
@@ -228,6 +228,12 @@ func (c *Controller) Run(workers int, stopCh <-chan struct{}) {
<-stopCh
}

// Queue return Controller ratelimit queue
// only for testing
func (c *Controller) Queue() workqueue.RateLimitingInterface {
Why do we want to export the queue and check the queue length in the tests if we can assert on the slices created?
this comment still stands
I will not have much time this week, but I still wonder if this scenario should be part of the reconcile loop:
But it is better to wait for @robscott; he will return from vacation soon. |
ok. waiting for @robscott reply |
@robscott do you have time to review this bugfix? |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/remove-lifecycle stale |
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close |
@k8s-triage-robot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
What type of PR is this?
/kind bug
What this PR does / why we need it:
Fixes the endpointslicemirroring controller not recreating a mirrored EndpointSlice when that EndpointSlice is deleted after kube-controller-manager has restarted.
Which issue(s) this PR fixes:
Fixes #112143
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: