endpointslicemirroring controller not create endpointslice #112143

Open
Dingshujie opened this issue Aug 31, 2022 · 12 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/network Categorizes an issue or PR as relevant to SIG Network.

Comments

@Dingshujie
Member

What happened?

  1. Create a Service without a selector, and manually create an Endpoints object for it.
  2. The endpointslicemirroring controller creates an EndpointSlice for this Service.
  3. After the EndpointSlice has been created, restart kube-controller-manager.
  4. After kube-controller-manager has restarted, delete the EndpointSlice.

The EndpointSlice is not recreated until an Endpoints/Service update event arrives or kube-controller-manager restarts again.

What did you expect to happen?

The EndpointSlice should be recreated.

How can we reproduce it (as minimally and precisely as possible)?

  1. Create a Service without a selector, and manually create an Endpoints object for it.
  2. The endpointslicemirroring controller creates an EndpointSlice for this Service.
  3. After the EndpointSlice has been created, restart kube-controller-manager.
  4. After kube-controller-manager has restarted, delete the EndpointSlice.

Anything else we need to know?

After the restart, the endpointslicemirroring controller syncs the Endpoints once. At that point the EndpointSlice already exists and needs no update, so it is never added to the endpointSliceTracker.

When the endpointslicemirroring controller receives an EndpointSlice delete event, it checks whether the endpointSliceTracker has this EndpointSlice. If it does not, the controller does not requeue the corresponding Endpoints, so unless another relevant event occurs, the EndpointSlice is not recreated.

// onEndpointSliceDelete queues a sync for the relevant Endpoints resource for a
// sync if the EndpointSlice resource version does not match the expected
// version in the endpointSliceTracker.
func (c *Controller) onEndpointSliceDelete(obj interface{}) {
	endpointSlice := getEndpointSliceFromDeleteAction(obj)
	if endpointSlice == nil {
		utilruntime.HandleError(fmt.Errorf("onEndpointSliceDelete() expected type discovery.EndpointSlice, got %T", obj))
		return
	}
	if managedByController(endpointSlice) && c.endpointSliceTracker.Has(endpointSlice) {
		// This returns false if we didn't expect the EndpointSlice to be
		// deleted. If that is the case, we queue the Service for another sync.
		if !c.endpointSliceTracker.HandleDeletion(endpointSlice) {
			c.queueEndpointsForEndpointSlice(endpointSlice)
		}
	}
}
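
To make the gating above easier to follow, here is a minimal, self-contained sketch. It is not the controller code: `slice`, `tracker`, and `onDelete` are hypothetical stand-ins, and `onDelete` collapses the `Has` and `HandleDeletion` checks into a single step for an unexpected deletion.

```go
package main

import "fmt"

// slice is a hypothetical stand-in for discovery.EndpointSlice.
type slice struct {
	UID        string
	Generation int64
}

// tracker is a simplified stand-in for endpointSliceTracker: it only
// remembers slices that the controller itself created or updated.
type tracker struct {
	generations map[string]int64
}

func newTracker() *tracker { return &tracker{generations: map[string]int64{}} }

// Update records the slice, as the reconciler does after a create or update.
func (t *tracker) Update(s slice) { t.generations[s.UID] = s.Generation }

// Has reports whether the slice was recorded by this process.
func (t *tracker) Has(s slice) bool { _, ok := t.generations[s.UID]; return ok }

// onDelete mimics the gating in onEndpointSliceDelete: an unexpected
// deletion is only requeued when the tracker already knows the slice.
func onDelete(t *tracker, s slice) (requeued bool) {
	if !t.Has(s) {
		return false
	}
	delete(t.generations, s.UID)
	return true
}

func main() {
	s := slice{UID: "uid-1", Generation: 1}

	// The controller created the slice in this process, so it is tracked.
	tracked := newTracker()
	tracked.Update(s)
	fmt.Println("tracked slice requeued:", onDelete(tracked, s))

	// After a restart the tracker starts empty, and a no-op sync leaves it empty.
	restarted := newTracker()
	fmt.Println("untracked slice requeued:", onDelete(restarted, s))
}
```

In this sketch only the first deletion is requeued, which matches the behaviour described in the issue.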

Kubernetes version

v1.23.5

Cloud provider

huawei cloud

OS version

CentOS 7.6

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@Dingshujie added the kind/bug label Aug 31, 2022
@k8s-ci-robot added the needs-sig and needs-triage labels Aug 31, 2022
@Dingshujie
Member Author

/sig network

@k8s-ci-robot added the sig/network label and removed the needs-sig label Aug 31, 2022
@Dingshujie
Member Author

Dingshujie commented Aug 31, 2022

@robscott @freehan @aojea @thockin @kubernetes/sig-network-bugs

@aojea
Member

aojea commented Aug 31, 2022

When the endpointslicemirroring controller receives an EndpointSlice delete event, it checks whether the endpointSliceTracker has this EndpointSlice. If it does not, the controller does not requeue the corresponding Endpoints, so unless another relevant event occurs, the EndpointSlice is not recreated.

@Dingshujie the first thing the informer does after a restart is list all existing EndpointSlices and send them to the event handler. Unless there is a race, the EndpointSlice should be present in the tracker.

Checking the code you linked, how do you know it is not the !c.endpointSliceTracker.HandleDeletion(endpointSlice) comparison that is failing instead?

// HandleDeletion removes the generation in this endpointSliceTracker for the
// provided EndpointSlice. This returns true if the tracker expected this
// EndpointSlice to be deleted and false if not.
func (est *endpointSliceTracker) HandleDeletion(endpointSlice *discovery.EndpointSlice) bool {
	est.lock.Lock()
	defer est.lock.Unlock()
	gfs, ok := est.generationsForSliceUnsafe(endpointSlice)
	if ok {
		g, ok := gfs[endpointSlice.UID]
		delete(gfs, endpointSlice.UID)
		if ok && g != deletionExpected {
			return false
		}
	}
	return true
}
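
To separate the two checks in question, here is a rough sketch of how `Has` and `HandleDeletion` combine. `miniTracker` and `shouldRequeue` are hypothetical names, the tracker is flattened to a single map keyed by UID, and the `deletionExpected` value is chosen for illustration only.

```go
package main

import "fmt"

// deletionExpected is a sentinel generation; the value is an assumption.
const deletionExpected = -1

// miniTracker is a hypothetical, single-service simplification of
// endpointSliceTracker keyed by EndpointSlice UID.
type miniTracker struct {
	gens map[string]int64
}

func (t *miniTracker) Update(uid string, gen int64) { t.gens[uid] = gen }
func (t *miniTracker) ExpectDeletion(uid string)    { t.gens[uid] = deletionExpected }
func (t *miniTracker) Has(uid string) bool          { _, ok := t.gens[uid]; return ok }

// HandleDeletion follows the shape of the method quoted above: it forgets
// the UID and reports whether the deletion was expected.
func (t *miniTracker) HandleDeletion(uid string) bool {
	g, ok := t.gens[uid]
	delete(t.gens, uid)
	if ok && g != deletionExpected {
		return false
	}
	return true
}

// shouldRequeue combines both checks the way onEndpointSliceDelete does.
func shouldRequeue(t *miniTracker, uid string) bool {
	return t.Has(uid) && !t.HandleDeletion(uid)
}

func main() {
	// Tracked slice deleted by someone else: unexpected, so requeue.
	a := &miniTracker{gens: map[string]int64{}}
	a.Update("uid-1", 2)
	fmt.Println("tracked, unexpected delete:", shouldRequeue(a, "uid-1"))

	// Tracked slice deleted by the controller itself: expected, no requeue.
	b := &miniTracker{gens: map[string]int64{}}
	b.ExpectDeletion("uid-1")
	fmt.Println("tracked, expected delete:", shouldRequeue(b, "uid-1"))

	// Untracked slice: Has is false, so HandleDeletion is never consulted.
	c := &miniTracker{gens: map[string]int64{}}
	fmt.Println("untracked delete:", shouldRequeue(c, "uid-1"))
}
```

Under these assumptions, an untracked slice is rejected by `Has` before `HandleDeletion` runs, and `HandleDeletion` on its own would also report such a deletion as expected.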

@Dingshujie
Member Author

@Dingshujie the first thing the informer does after a restart is list all existing EndpointSlices and send them to the event handler. Unless there is a race, the EndpointSlice should be present in the tracker.

@aojea thanks for the reply. The endpointslicemirroring controller only adds an EndpointSlice to the endpointSliceTracker when that EndpointSlice needs to be created or updated. When the controller restarts, the informer receives all EndpointSlices, but the reconcile finds nothing to create or update, so the EndpointSlices are not present in the endpointSliceTracker.

// finalize creates, updates, and deletes slices as specified
func (r *reconciler) finalize(endpoints *corev1.Endpoints, slices slicesByAction) error {
	// If there are slices to create and delete, recycle the slices marked for
	// deletion by replacing creates with updates of slices that would otherwise
	// be deleted.
	recycleSlices(&slices)

	epsClient := r.client.DiscoveryV1().EndpointSlices(endpoints.Namespace)

	// Don't create more EndpointSlices if corresponding Endpoints resource is
	// being deleted.
	if endpoints.DeletionTimestamp == nil {
		for _, endpointSlice := range slices.toCreate {
			createdSlice, err := epsClient.Create(context.TODO(), endpointSlice, metav1.CreateOptions{})
			if err != nil {
				// If the namespace is terminating, creates will continue to fail. Simply drop the item.
				if errors.HasStatusCause(err, corev1.NamespaceTerminatingCause) {
					return nil
				}
				return fmt.Errorf("failed to create EndpointSlice for Endpoints %s/%s: %v", endpoints.Namespace, endpoints.Name, err)
			}
			r.endpointSliceTracker.Update(createdSlice)
			metrics.EndpointSliceChanges.WithLabelValues("create").Inc()
		}
	}

	for _, endpointSlice := range slices.toUpdate {
		updatedSlice, err := epsClient.Update(context.TODO(), endpointSlice, metav1.UpdateOptions{})
		if err != nil {
			return fmt.Errorf("failed to update %s EndpointSlice for Endpoints %s/%s: %v", endpointSlice.Name, endpoints.Namespace, endpoints.Name, err)
		}
		r.endpointSliceTracker.Update(updatedSlice)
		metrics.EndpointSliceChanges.WithLabelValues("update").Inc()
	}

	for _, endpointSlice := range slices.toDelete {
		err := epsClient.Delete(context.TODO(), endpointSlice.Name, metav1.DeleteOptions{})
		if err != nil {
			return fmt.Errorf("failed to delete %s EndpointSlice for Endpoints %s/%s: %v", endpointSlice.Name, endpoints.Namespace, endpoints.Name, err)
		}
		r.endpointSliceTracker.ExpectDeletion(endpointSlice)
		metrics.EndpointSliceChanges.WithLabelValues("delete").Inc()
	}

	return nil
}
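
The effect of finalize on the tracker can be sketched in isolation. This is a simplified model, not the reconciler: slices are represented by name only, the tracker is a plain map, and the slice name `my-svc-abcde` is made up.

```go
package main

import "fmt"

// slicesByAction is a simplified version of the struct used by finalize;
// slices are represented by their names only.
type slicesByAction struct {
	toCreate, toUpdate, toDelete []string
}

// finalize records what the tracker would hold after a sync. As in the
// function above, only slices that are created, updated, or deleted by the
// controller ever reach the tracker.
func finalize(tracked map[string]string, actions slicesByAction) {
	for _, name := range actions.toCreate {
		tracked[name] = "tracked"
	}
	for _, name := range actions.toUpdate {
		tracked[name] = "tracked"
	}
	for _, name := range actions.toDelete {
		tracked[name] = "deletionExpected"
	}
}

func main() {
	// First process: the slice does not exist yet, so it is created and tracked.
	first := map[string]string{}
	finalize(first, slicesByAction{toCreate: []string{"my-svc-abcde"}})
	fmt.Println("entries after initial sync:", len(first))

	// After a restart: the slice already matches, so there is nothing to do
	// and the fresh tracker stays empty.
	restarted := map[string]string{}
	finalize(restarted, slicesByAction{})
	fmt.Println("entries after post-restart sync:", len(restarted))
}
```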

@Dingshujie
Member Author

/assign

@thockin
Member

thockin commented Sep 1, 2022

@Dingshujie did you self-assign because you want to try to fix?

@Dingshujie
Member Author

@Dingshujie did you self-assign because you want to try to fix?

Yes, could you review this bugfix? #112197 Thanks.

@aojea
Member

aojea commented Sep 2, 2022

I see, so the problem is that the Delete is not handled because the endpointSlice is not in the tracker ...

... and the slice is not in the tracker because after the restart, the tracker starts clean, and the reconcile loop doesn't add it to the tracker, since there is no action to perform on the slice ...

If the controller relies on the tracker, maybe we should handle this case in the reconcile loop, so the slice is correctly added to the tracker ...
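
One way to read this suggestion is sketched below. It is only an illustration of the idea, not necessarily what #112197 does: `epSlice`, `trackUnchanged`, and `requeueOnDelete` are hypothetical names, and the tracker is reduced to a map from UID to generation.

```go
package main

import "fmt"

// epSlice is a hypothetical stand-in for discovery.EndpointSlice.
type epSlice struct {
	UID        string
	Generation int64
}

// trackUnchanged is a hypothetical helper for the reconcile loop: it records
// every existing slice that needed no change, so that a later delete event
// for such a slice is recognised by the tracker.
func trackUnchanged(tracker map[string]int64, existing []epSlice, changed map[string]bool) {
	for _, s := range existing {
		if changed[s.UID] {
			continue // created or updated slices are tracked elsewhere
		}
		if _, ok := tracker[s.UID]; !ok {
			tracker[s.UID] = s.Generation
		}
	}
}

// requeueOnDelete models the delete handler: only tracked slices trigger a
// new sync of the parent Endpoints.
func requeueOnDelete(tracker map[string]int64, uid string) bool {
	_, ok := tracker[uid]
	delete(tracker, uid)
	return ok
}

func main() {
	existing := []epSlice{{UID: "uid-1", Generation: 3}}

	// Current behaviour after a restart: nothing is tracked.
	without := map[string]int64{}
	fmt.Println("requeued without tracking:", requeueOnDelete(without, "uid-1"))

	// With unchanged slices tracked during reconcile.
	with := map[string]int64{}
	trackUnchanged(with, existing, nil)
	fmt.Println("requeued with tracking:", requeueOnDelete(with, "uid-1"))
}
```

The trade-off in this sketch is that the reconcile loop touches the tracker on every sync, including syncs that change nothing.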

@bridgetkromhout
Member

/triage accepted

@k8s-ci-robot added the triage/accepted label and removed the needs-triage label Sep 15, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Dec 14, 2022
@thockin removed the lifecycle/stale label Dec 22, 2022
@k8s-triage-robot

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot added the needs-triage label and removed the triage/accepted label Jan 20, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Apr 19, 2024