Implement ServiceL2Status CRD to expose service announcing status #2198

lwabish · 2023-12-05T09:57:33Z

This is an implementation of what I was assigned in #2158.
The core implementation is almost done and I have tested the feature in my local dev-env.
I will continue with the remaining details, such as the helm chart files.
Please do an early code review if possible; any suggestions would be appreciated.

/kind feature

Implement ServiceL2Status CRD to expose service announcing status in layer2 mode

oribon · 2023-12-06T16:23:00Z

Thanks a lot for opening this!
I think we should take a slightly different approach to how we manage these resources.
Instead of an imperative approach, where this is embedded in the existing layer2_controller and we take action according to the current step, we should go for a more declarative one, where a new k8s controller reacts to general events, fetches the desired state of a service and builds the status accordingly.
More specifically: we can make the layer2_controller expose the state of advertised services (e.g. via a func that accepts the namespaced name of a service and returns its advertisement state), add a new k8s controller that listens to a channel, and feed that channel from the layer2_controller with an event saying "service x has changed" every time the state of an advertised service changes. When the new k8s controller gets that event, it queries the service's state and builds the relevant status.
You can take inspiration from how we did it in frr-k8s, where the nodestate controller fetches the different pieces to create the state from other components:
https://github.com/metallb/frr-k8s/blob/main/internal/controller/frrstate_controller.go
https://github.com/metallb/frr-k8s/blob/8ea5c2e33beaab49af944d14452c85993095e844/cmd/main.go#L197

This will allow us to delegate the concerns of managing the statuses to a single point which should be easy to follow.
Also, there are still some corners we need to make sure we're covering such as not leaving stale statuses around and making sure the user doesn't edit them but we can leave them for later once we settle on the general picture. wdyt?

Again, thanks a lot for your efforts and please let me know if you need any clarifications or help!
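
For readers following along, the flow proposed in this comment can be sketched with the standard library alone. This is only an illustration under stated assumptions: the advertiser type, the reconcile function and the in-memory statuses map are hypothetical stand-ins for the layer2_controller, the new k8s controller and the ServiceL2Status objects; none of this is MetalLB or controller-runtime code.

```go
package main

import (
	"fmt"
	"sync"
)

// advertiser is a hypothetical stand-in for the layer2 controller: it owns the
// advertisement state and only tells listeners WHICH service changed.
type advertiser struct {
	mu      sync.Mutex
	ips     map[string][]string // "namespace/name" -> advertised IPs
	changed chan<- string
}

// set records the new state for a service and emits a change event.
func (a *advertiser) set(key string, ips []string) {
	a.mu.Lock()
	a.ips[key] = ips
	a.mu.Unlock()
	a.changed <- key
}

// statusFor exposes the current advertisement state of one service.
func (a *advertiser) statusFor(key string) []string {
	a.mu.Lock()
	defer a.mu.Unlock()
	return append([]string(nil), a.ips[key]...)
}

// reconcile is the declarative part: given a key, fetch the current state and
// make the stored status match it, deleting the status when nothing is advertised.
func reconcile(fetch func(string) []string, statuses map[string][]string, key string) {
	ips := fetch(key)
	if len(ips) == 0 {
		delete(statuses, key)
		return
	}
	statuses[key] = ips
}

// run wires the two pieces together through a channel and returns the statuses.
func run() map[string][]string {
	events := make(chan string, 4)
	adv := &advertiser{ips: map[string][]string{}, changed: events}
	statuses := map[string][]string{}

	adv.set("default/web", []string{"192.0.2.10"})
	adv.set("default/db", []string{"192.0.2.11"})
	adv.set("default/db", nil) // db is no longer advertised
	close(events)

	for key := range events {
		reconcile(adv.statusFor, statuses, key)
	}
	return statuses
}

func main() {
	fmt.Println(run())
}
```

Because the event carries only the key, the reconcile step never depends on which transition happened; it always rebuilds the status from the current state, which is what makes stale events harmless in this sketch.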

lwabish · 2023-12-07T03:07:14Z

Got it. I'll do some refactoring.

lwabish · 2023-12-15T02:29:08Z

@oribon Please review the new implementation

oribon

Thanks a lot for the PR!
did a first pass, can you please add an e2e as well so we have a general sense if it's working? Please let me know if you need anything :)

oribon · 2023-12-18T09:28:03Z

internal/layer2/announcer.go

@@ -14,6 +14,8 @@ import (
 	"github.com/go-kit/log/level"
 )

+type StatusFetcher func(string) []IPAdvertisement


this should be defined in layer2_status_controller.go (see https://go.dev/wiki/CodeReviewComments#interfaces)
also I think it is nicer if it would receive namespace+name instead of namespaced name

oribon · 2023-12-18T10:13:34Z

speaker/main.go

+		Layer2StatusChange: func(svc *v1.Service) {
+			statusNotifyChan <- event.GenericEvent{Object: svc}
+		},


Since we aren't really using the service object for this purpose, it makes sense to define a new object for the event under the new controller:

type l2StatusEvent struct {
	metav1.TypeMeta
	metav1.ObjectMeta
}

func (evt *l2StatusEvent) DeepCopyObject() runtime.Object {
	res := new(l2StatusEvent)
	res.Name = evt.Name
	res.Namespace = evt.Namespace
	return res
}

func NewL2StatusEvent(namespace, name string) event.GenericEvent {
	evt := l2StatusEvent{}
	evt.Name = name
	evt.Namespace = namespace
	return event.GenericEvent{Object: &evt}
}

This is mostly copied from the new frrk8s_config_controller.go which is worth comparing against and making the adjustments (I'll refer to it later again probably)

oribon · 2023-12-18T10:21:18Z

speaker/layer2_controller.go

+	// Before, without exposing status to cluster, all status are inside speakers.
+	// So We didn't need the svc instance and pure name of string is enough to act as key to maintain internal status.
+	// But Now we have to parse the name to build an object due to controller channel limitations.
+	parts := strings.Split(name, string(types.Separator))
+	if len(parts) != 2 {
+		level.Warn(l).Log("name is not a namespacedName", name)
+		return fmt.Errorf("not a namespacedName")
+	}
+	svc := &v1.Service{}
+	svc.Namespace, svc.Name = parts[0], parts[1]


this can be replaced by:

svcNamespace, svcName, err := cache.SplitMetaNamespaceKey(name) ...

from "k8s.io/client-go/tools/cache"
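
To make the difference from the hand-written split concrete, here is a stdlib-only sketch of the key parsing. splitKey is a hypothetical stand-in written for illustration, not the client-go function itself; it assumes the helper's behaviour is to accept both "namespace/name" and a bare "name", so please check the client-go documentation before relying on that detail.

```go
package main

import (
	"fmt"
	"strings"
)

// splitKey is a hypothetical stand-in that mimics how a "namespace/name" key
// is usually split: one part means a name without a namespace, two parts mean
// namespace and name, anything else is an error.
func splitKey(key string) (namespace, name string, err error) {
	parts := strings.Split(key, "/")
	switch len(parts) {
	case 1:
		return "", parts[0], nil
	case 2:
		return parts[0], parts[1], nil
	}
	return "", "", fmt.Errorf("unexpected key format: %q", key)
}

func main() {
	for _, key := range []string{"default/web", "web", "a/b/c"} {
		ns, name, err := splitKey(key)
		fmt.Printf("key=%q namespace=%q name=%q err=%v\n", key, ns, name, err)
	}
}
```

Note that, under this assumption, a bare name is not an error, whereas the original strings.Split code rejected anything that did not have exactly two parts.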

oribon · 2023-12-18T10:23:54Z

speaker/layer2_controller.go

+		if !statusUpdated {
+			c.onStatusChange(svc)
+			statusUpdated = true
+		}


I think it makes sense to change this to updateStatus := false and call onStatusChange after the loop (so the status is updated only after we have finished processing)
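
A minimal sketch of that shape, assuming a much simplified loop: setBalancer, the announced map and the onStatusChange callback below are hypothetical stand-ins for illustration, not the real SetBalancer signature.

```go
package main

import "fmt"

// setBalancer is a simplified, hypothetical stand-in for the loop under
// review: it records whether anything changed while iterating and notifies
// once, after all IPs have been processed.
func setBalancer(announced map[string]bool, lbIPs []string, onStatusChange func()) {
	updateStatus := false
	for _, ip := range lbIPs {
		if announced[ip] {
			continue // already announced, nothing changed for this IP
		}
		announced[ip] = true
		updateStatus = true
	}
	if updateStatus {
		onStatusChange()
	}
}

func main() {
	announced := map[string]bool{}
	notifications := 0
	notify := func() { notifications++ }

	// The first call announces two new IPs but notifies only once.
	setBalancer(announced, []string{"192.0.2.10", "2001:db8::10"}, notify)
	// The second call changes nothing, so no notification is sent.
	setBalancer(announced, []string{"192.0.2.10", "2001:db8::10"}, notify)

	fmt.Println("notifications:", notifications)
}
```

Notifying after the loop means the listener always observes the fully processed state rather than a partially updated one.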

oribon · 2023-12-18T10:30:45Z

speaker/main.go

@@ -66,6 +67,8 @@ type service interface {
 	Errorf(svc *v1.Service, desc, msg string, args ...interface{})
 }

+type svcStatusChange func(*v1.Service)


nit: personal preference, I think it is cleaner to drop this type, that is (with the move to ns/name):

Layer2StatusChange func(namespace, name string)

oribon · 2023-12-18T10:36:09Z

internal/k8s/controllers/layer2_status_controller.go

+		err = r.Client.Create(ctx, state)
+	}
+	if err != nil {
+		level.Error(r.Logger).Log("controller", "Layer2StatusReconciler", "failed to get", err)


nit: this is redundant, controller-runtime will log this for us

oribon · 2023-12-18T10:40:37Z

internal/k8s/controllers/layer2_status_controller.go

+	if len(ipAdvS) < 1 {
+		level.Debug(r.Logger).Log("controller", "Layer2StatusReconciler", "delete serviceL2status", req.NamespacedName.String())
+		if err = r.Client.Delete(ctx, state); err != nil {
+			level.Error(r.Logger).Log("controller", "Layer2StatusReconciler", "failed to delete", err)
+			return ctrl.Result{}, err
+		}
+		return ctrl.Result{}, nil
+	}


this can be reduced to

if len(ipAdvS) == 0 {
	err := r.Delete(ctx, state)
	return ctrl.Result{}, client.IgnoreNotFound(err)
}

also, correct me if I'm wrong, but by doing this the speakers would delete each other's statuses, each thinking the object shouldn't exist because it is not advertising the service. if that's the case, we need a way for each speaker to delete only what belongs to it.

The channel a speaker watches receives elements only from that same speaker, after it has determined that the service belongs to it. So it seems that this is not a problem.

not sure I understand, the controller listens for both events that come from the channel and those that come from a change to an object (the For(&v1beta1.ServiceL2Status{})). what prevents speakers from deleting each other's resources?

oribon · 2023-12-18T10:45:28Z

internal/k8s/controllers/layer2_status_controller.go

+}
+
+func (r *Layer2StatusReconciler) buildDesiredStatus(advertisements []layer2.IPAdvertisement) v1beta1.MetalLBServiceL2Status {
+	s := v1beta1.MetalLBServiceL2Status{


as in the design doc we should also put the metallb.io/node and service labels

oribon · 2023-12-18T10:59:44Z

internal/k8s/controllers/layer2_status_controller.go

+	state.Status = desiredStatus
+	err = r.Client.Status().Update(ctx, state)
+	if err != nil {
+		level.Error(r.Logger).Log("controller", "Layer2StatusReconciler", "failed to update status", err)


nit: redundant

oribon · 2023-12-18T11:03:20Z

internal/k8s/controllers/layer2_status_controller.go

can you add some tests under reconciliation_test.go (and maybe layer2_status_controller_test.go)?

lwabish · 2023-12-20T02:41:05Z

Thanks for your review and suggestions. I'll keep working on it. @oribon

lwabish · 2023-12-21T07:44:08Z

I ran into this problem while trying to process the Go module dependencies in the e2etest directory.

❯ cd e2etest
❯ go mod download
go: k8s.io/endpointslice@v0.0.0: invalid version: unknown revision v0.0.0
❯ go mod why k8s.io/endpointslice
go: downloading k8s.io/endpointslice v0.0.0
# k8s.io/endpointslice
(main module does not need package k8s.io/endpointslice)

Somehow this didn't affect inv e2etest.
Adding k8s.io/endpointslice => k8s.io/endpointslice v0.28.2 to go.mod could solve this problem but I'm not quite sure if this is an env specific problem.
Should I commit my e2etest/go.mod and related go module files?
Of course this is not a fatal or urgent problem but a little confusing.

oribon · 2023-12-21T14:48:42Z

it's a bit hard for me to say, but in general committing whatever go mod tidy changes should be fine 😅

lwabish · 2024-01-08T07:12:38Z

@oribon I have made the changes according to your suggestions, please review them

oribon · 2024-01-09T08:25:54Z

thanks a lot, I'll get to this in the following days :)
in the meanwhile I ran CI (and will rerun it again whenever you push)

lwabish · 2024-01-11T07:44:34Z

I ran some git rebase commands following the git help page to fix the DCO test. The DCO test now passes.
But it seems that my commit log is mixed with some other commits. Not sure if this is a problem.
Besides, the CI workflow seems to be waiting for approval to run.
@oribon

oribon · 2024-01-11T08:18:38Z

approved again. about the commit log, it might cause some problems, but anyway I'll ask you sooner or later to clean it up a bit 😅 the easiest way imo would be squashing your commits together (for example using rebase -i, cherry-picking them on top of main and overriding this branch). Also, it'll be easier to review when the commits are organized into a few (I guess it shouldn't be more than ~4-5 commits for this PR: API / Manifests / implementation / E2E)

lwabish · 2024-01-11T11:20:29Z

No problem. This is what I planned to do after solving the other problems.

lwabish · 2024-01-11T12:21:47Z

@oribon workflow needs approval😅

oribon

looks a lot better thanks a lot!

oribon · 2024-01-14T09:55:09Z

internal/k8s/controllers/layer2_status_controller.go

+	client.Client
+	Logger   log.Logger
+	NodeName string
+	Chan     <-chan event.GenericEvent


nit: give this a more meaningful name (reconcileChan/updateChan/..)

oribon · 2024-01-14T10:02:57Z

internal/k8s/controllers/layer2_status_controller.go

+		Node: r.NodeName,
+	}
+	// multiple advertisement objects share all fields except lb ip, so we use the first one
+	adv := advertisements[0]


I know it's like this in the original design, but I think it'd add value to include all the ips we advertise as well (which should match the service's ips unless we did something really wrong).
that way the user can go directly to the status resource to understand if/how the service is reachable
wdyt? cc @fedepaol

oribon · 2024-01-14T10:05:35Z

internal/k8s/controllers/layer2_status_controller.go

+		return ctrl.Result{}, err
+	}
+
+	state.Status = desiredState.Status


redundant here? this is assigned before (also as in the other comments, you can revert to what you initially did and remove the todo)

oribon · 2024-01-14T10:13:14Z

internal/k8s/controllers/layer2_status_controller.go

+		// only trigger reconcile function when the source object is not a ServiceL2Status
+		// this prevents reconciling cycle because we operate ServiceL2Status itself in the reconcile function


I'd say this should be the opposite, reconciling only when the relevant resource changes.
A way to avoid a reconciling cycle is comparing the desired status with the current object and not updating if they're equal.

Yes I got you but in our case I think it's slightly different.
With ctrl.NewControllerManagedBy(mgr).For().WatchesRawSource(), both ServiceL2Status and GenericEvent objects can trigger the reconcile, and the reconciler itself may create a ServiceL2Status object, which leads to a reconcile cycle.
Meanwhile, I tried removing the For(), but then the controller stops working. So we have to keep the For() and add an EventFilter to keep ServiceL2Status objects from triggering the reconcile.

I understand, I'm talking about us wanting to reconcile on ServiceL2Status objects to make sure they always match the speaker. For example, if the user (or an unrelated speaker) modifies status.node, you'd expect the relevant speaker to override it to the correct value.
Regarding the reconcile cycle, we can compare the current object with the desired status and update only if needed; that way we won't trigger another reconciliation, e.g. in frrk8s_config_controller.go we do:

if reflect.DeepEqual(current.Spec, r.desiredConfiguration.Spec) {
	level.Debug(r.Logger).Log("controller", "FRRK8sReconciler", "event", "not reconciling because of no change")
	return ctrl.Result{}, nil
}
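
The same guard can be shown in isolation with a stdlib-only sketch. Everything below is an assumption made for illustration: l2Status, fakeStore and its reconcile method are hypothetical, and the write counter stands in for API updates that would each trigger a new watch event.

```go
package main

import (
	"fmt"
	"reflect"
)

// l2Status is a hypothetical, trimmed-down status payload.
type l2Status struct {
	Node       string
	Interfaces []string
}

// fakeStore stands in for the API server; every write would normally
// generate a new event for the same object.
type fakeStore struct {
	objects map[string]l2Status
	writes  int
}

// reconcile writes the desired status only when it differs from the stored
// one, and reports whether a write happened.
func (s *fakeStore) reconcile(name string, desired l2Status) bool {
	current, found := s.objects[name]
	if found && reflect.DeepEqual(current, desired) {
		return false // nothing to do, so no new event and no cycle
	}
	s.objects[name] = desired
	s.writes++
	return true
}

func main() {
	store := &fakeStore{objects: map[string]l2Status{}}
	desired := l2Status{Node: "node-a", Interfaces: []string{"eth0"}}

	fmt.Println(store.reconcile("web-node-a", desired)) // object is created
	fmt.Println(store.reconcile("web-node-a", desired)) // already up to date

	// Someone tampers with the object; the next reconcile restores it.
	store.objects["web-node-a"] = l2Status{Node: "someone-else"}
	fmt.Println(store.reconcile("web-node-a", desired))
	fmt.Println("writes:", store.writes)
}
```

In this sketch the event caused by the reconciler's own write leads to one more reconcile that finds nothing to change, which is where the cycle stops, while external edits are still corrected.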

Sorry, I didn't notice your reply above. This also explains your question above: #2198 (comment)

Indeed. In this case, it seems that I have to remove the EventFilter to allow ServiceL2Status objects to trigger the reconcile function. I'll work on it and see if there is any problem.

oribon · 2024-01-14T13:19:42Z

speaker/layer2_controller.go

@@ -145,6 +146,7 @@ func (c *layer2Controller) ShouldAnnounce(l log.Logger, name string, toAnnounce

 func (c *layer2Controller) SetBalancer(l log.Logger, name string, lbIPs []net.IP, pool *config.Pool, client service, svc *v1.Service) error {
 	ifs := c.announcer.GetInterfaces()
+	statusUpdated := false


nit: rename to "updateStatus"

oribon · 2024-01-14T13:27:55Z

internal/k8s/controllers/layer2_status_controller.go

+	"sigs.k8s.io/controller-runtime/pkg/source"
+)
+
+type StatusFetcher func(namespace, name string) []layer2.IPAdvertisement


sorry, on second thought maybe it's better for this one to take a namespaced name instead of separate arguments, as that's more common in the code (sorry for the hassle)

oribon · 2024-01-14T13:30:01Z

internal/k8s/controllers/reconciliation_test.go

+				}
+				return toCheck.Status.Node
+			}, 5*time.Second, 200*time.Millisecond).Should(Equal(testNodeName))
+


can you add an update to the interface and verify they are reflected in the status?

oribon · 2024-01-14T13:31:16Z

e2etest/l2tests/interface_selector.go

@@ -83,6 +85,40 @@ var _ = ginkgo.Describe("L2-interface selector", func() {
 			framework.ExpectNoError(err)
 		})

+		ginkgo.It("Validate ServiceL2Status interface", func() {
+			ginkgo.By("generate a random int to choose some interface for announcing")
+			i := rand.Intn(len(NodeNics))


let's just choose the first, no need for this rand (and also fail the test if len(NodeNics) == 0)

oribon · 2024-01-14T13:32:16Z

e2etest/l2tests/interface_selector.go

+				var err error
+				l2Status, err = status.GetL2Status(ConfigUpdater.Client(), svc)
+				return err
+			}, 2*time.Minute, 1*time.Second).ShouldNot(gomega.HaveOccurred())


can you add a 5s gomega.Consistently after this to make sure the speakers do not fight?

oribon · 2024-01-14T13:32:31Z

e2etest/l2tests/l2.go

+					return err.Error()
+				}
+				return node.Name
+			}, time.Minute, time.Second).Should(gomega.Equal(l2Status.Status.Node))


same (adding a consistently)

lwabish · 2024-01-18T12:22:13Z

@oribon All done except for the proposal to add advertise ips. Please rerun the ci workflow 😄

lwabish · 2024-01-20T04:02:16Z

@oribon
I think there's something about the reconcile function logic we should dive a little deeper into.
Here are the two approaches I have in mind for how the reconcile function works.

The way I wrote it before:
With an event filter that filters out all ServiceL2Status objects, the reconciler is only triggered by the channel elements, which means only the relevant speaker reconciles a given service and no conflicts occur. But as you mentioned recently, if some other controller or even a user changes the ServiceL2Status, our reconciler isn't triggered.
This is also how the frr controller is implemented.

Without an event filter:
Every speaker can reconcile a given service, and external changes to the ServiceL2Status object are taken into account, which is more cloud-native. The difficulty is that the authoritative state is stored in only one particular speaker; the others can't determine from their own internal state whether a ServiceL2Status should be deleted.
In my latest reconcile function, besides removing the event filter, I added some conditions to help the reconciler determine whether a ServiceL2Status should be deleted. This resolved the conflicts, and things seem OK right now according to the unit tests and e2e tests.
But I'm not sure whether this approach has other edge cases, wdyt?

oribon · 2024-01-21T07:43:48Z

I see what you're saying, our problem is that currently a speaker can't know if it's responsible for a given status resource. I think a good approach (and the most bulletproof one) would be embedding the node's name as part of the resource object like we intend doing in the bgp service status resource.
That way the question "is this speaker responsible for the object" boils down to "does the resource's name have the node's name as a suffix" (leveraging the fact that a resource's name is immutable) and reflecting this in the event filter.
Doing that, we can revert to your original implementation without the new auxiliary function, deleting the object if it has 0 advs but still having the advantage of listening for objects changes.
wdyt? cc @fedepaol
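
The ownership test described in this comment can be sketched in a few lines. ownedBy is a hypothetical helper, and the "service-node" naming scheme is taken from the discussion rather than from merged code.

```go
package main

import (
	"fmt"
	"strings"
)

// ownedBy reports whether a status object named objName belongs to the
// speaker running on node, assuming objects are named "<service>-<node>".
func ownedBy(objName, node string) bool {
	return strings.HasSuffix(objName, "-"+node)
}

func main() {
	for _, name := range []string{"web-node-a", "web-node-b", "db-node-a"} {
		fmt.Printf("%s owned by node-a: %v\n", name, ownedBy(name, "node-a"))
	}
}
```

One limitation of this sketch: a plain suffix match cannot tell apart node names where one is a suffix of another (for example "node-1" and "worker-node-1"), so a real filter may need an additional check such as the node label.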

fedepaol · 2024-01-22T12:51:31Z

@oribon @lwabish haven't checked the implementation yet.

My gut feeling is, if a given l2 status is shared among different speakers (i.e. the service moves from one speaker to the other, for example because an endpoint moved or a node selector changed), the old owner only knows it doesn't own the service anymore unless we add extra logic to understand if no one should advertise the service at all.
So a given speaker will have to choose whether to delete the status or not, just because it's not advertising the service, and getting this right by looking at the current status is not easy (what if the speaker that should delete it crashes, for example?)

So, given we are doing the same for BGP, I am not against not naming the resource after the service and instead having a "per node" l2status instance (where the service name / node name can be added as labels, as in BGP). The name can be the concatenation of service and node, or something autogenerated, but by doing this each speaker will take care of the lifecycle of a single instance and its life (and ours) will be easier.

Hope I got things right

lwabish · 2024-01-23T07:44:39Z

With this design, I think the controller logic could be simpler.
As a user, if I want to investigate my service's announcement status, my command would look like:

kubectl get servicel2status -n $NS --selector=metallb.io/service=$SVC-NAME

whose output would contain a couple of servicel2status objects.
If I understand this right, maybe we should adjust the design of the servicel2status status object. For example, status.node as a string would be empty for all objects except the one that is advertising; in that case, would a boolean be clearer?

fedepaol · 2024-01-23T10:18:59Z

With this design, I think the controller logic could be simpler. As a user, if I want to investigate my service's announcement status, my command would look like:
kubectl get servicel2status -n $NS --selector=metallb.io/service=$SVC-NAME

And this would be the same as we are proposing for BGP (because then, we'd have multiple nodes advertising the service)

whose output would contain a couple of servicel2status objects. If I understand this right, maybe we should adjust the design of the servicel2status status object. For example, status.node as a string would be empty for all objects except the one that is advertising; in that case, would a boolean be clearer?

What I am suggesting here is, if a given speaker is not advertising the service anymore, then it should delete its own instance. So at a given time, there would be only one l2 adv per service (assuming one speaker is advertising).
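
A small in-memory sketch of this per-node lifecycle, under stated assumptions: the cluster map, the sync method and the "service-node" name are hypothetical, while the metallb.io/service and metallb.io/node label keys are the ones mentioned earlier in this thread.

```go
package main

import (
	"fmt"
	"sort"
)

// l2Status is a hypothetical, trimmed-down status object.
type l2Status struct {
	Name   string
	Labels map[string]string
}

// cluster stands in for the API server, keyed by object name.
type cluster map[string]l2Status

// sync is what a single speaker would do for a single service: keep its own
// instance while advertising, delete its own instance otherwise.
func (c cluster) sync(node, svc string, advertising bool) {
	name := svc + "-" + node
	if !advertising {
		delete(c, name)
		return
	}
	c[name] = l2Status{Name: name, Labels: map[string]string{
		"metallb.io/service": svc,
		"metallb.io/node":    node,
	}}
}

// forService mimics a label-selector query on metallb.io/service.
func (c cluster) forService(svc string) []string {
	var names []string
	for _, s := range c {
		if s.Labels["metallb.io/service"] == svc {
			names = append(names, s.Name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	c := cluster{}
	c.sync("node-a", "web", true)
	c.sync("node-b", "web", false)
	fmt.Println(c.forService("web"))

	// The service moves: each speaker touches only its own instance.
	c.sync("node-a", "web", false)
	c.sync("node-b", "web", true)
	fmt.Println(c.forService("web"))
}
```

Since no speaker ever writes or deletes another speaker's object, the question of who may delete a shared status does not arise in this model.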

oribon

left a few comments but this looks really good and almost ready!
I have a few concerns now about something related to the naming we chose in regards to the cleanup, but we'll defer it to another PR after we merge this (which ofc you can take if you'd like) - I'll open a separate issue for that. thanks again!

oribon · 2024-01-30T07:27:57Z

speaker/layer2_controller.go

+
+	svcNamespace, svcName, err := cache.SplitMetaNamespaceKey(name)
+	if err != nil {
+		level.Warn(l).Log("name is not a namespacedName", name)


nit: can you use the same format as the others? e.g

level.Error(l).Log("op", "DeleteBalancer", "protocol", "layer2", "service", name, "msg", "received a non namespaced name")

oribon · 2024-01-30T07:28:22Z

speaker/layer2_controller.go

+	svc := &v1.Service{}
+	svc.Namespace, svc.Name = svcNamespace, svcName


nit: this is unnecessary, we can pass:

c.onStatusChange(types.NamespacedName{Name:svcName, Namespace: svcNamespace})

oribon · 2024-01-30T07:31:27Z

speaker/main.go

+			if ns, name, err := cache.SplitMetaNamespaceKey(namespacedName.String()); err != nil {
+				level.Error(logger).Log("op", "startup", "msg", "failed to split meta namespace key", "error", err)
+			} else {
+				statusNotifyChan <- controllers.NewL2StatusEvent(ns, name)


nit: omit the if/else, you can just use:

controllers.NewL2StatusEvent(namespacedName.Namespace, namespacedName.Name)

oribon · 2024-01-30T07:40:16Z

speaker/main.go

@@ -254,26 +270,30 @@ func newController(cfg controllerConfig) (*controller, error) {
 	}
 	protocols := []config.Proto{config.BGP}

+	var layer2StatusFetcher controllers.StatusFetcher


nit: change this to

layer2StatusFetcher := func(types.NamespacedName) []layer2.IPAdvertisement { return nil }

that way it's clear that we don't care about statuses if l2 is disabled (and also I think that the current way assigns nil here)

oribon · 2024-01-30T08:26:15Z

internal/k8s/controllers/layer2_status_controller.go

+
+	ipAdvS := r.StatusFetcher(req.NamespacedName)
+
+	objName := fmt.Sprintf("%s-%s", req.Name, r.NodeName)


I think this is going to be problematic, since events can come from both the channel and k8s in different formats. When the event comes from a change to a k8s object, req.Name is already "name-node", so objName would be "name-node-node"; when the event comes from the channel, req.Name is the bare service name and objName is "name-node".
I suggest we make all the requests look as if they were coming from k8s (with the "name-node" format), see my other comment.
When all the events come with that format, this can be changed to:

svcName := strings.TrimSuffix(req.Name, fmt.Sprintf("-%s", r.NodeName))
ipAdvs := r.StatusFetcher(svcName, req.Namespace)
...

and objName will always match req.Name

Something related to this did go wrong when I was debugging the tests. Later I found that it only worked because the wrong objNames fetched no statuses, so the reconcile stopped.
Of course, the change you are suggesting is really good; I'll do it.
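
The naming mismatch discussed above is easy to reproduce with plain string handling. The helper names below are hypothetical and only mirror the fmt.Sprintf and strings.TrimSuffix calls quoted in the review.

```go
package main

import (
	"fmt"
	"strings"
)

// objNameAppend mirrors the code under review: it always appends the node
// name to whatever request name arrives.
func objNameAppend(reqName, node string) string {
	return fmt.Sprintf("%s-%s", reqName, node)
}

// requestFromChannel mirrors the proposed map func: channel events are
// rewritten so they look like requests for the "<service>-<node>" object.
func requestFromChannel(svcName, node string) string {
	return fmt.Sprintf("%s-%s", svcName, node)
}

// serviceName recovers the service name from a unified request name.
func serviceName(reqName, node string) string {
	return strings.TrimSuffix(reqName, fmt.Sprintf("-%s", node))
}

func main() {
	node := "node-a"

	// Before: the two event sources disagree on the object name.
	fmt.Println(objNameAppend("web", node))        // request from the channel
	fmt.Println(objNameAppend("web-node-a", node)) // request from a k8s object event

	// After: both sources deliver "web-node-a", and the service name is derived.
	req := requestFromChannel("web", node)
	fmt.Println(req, serviceName(req, node))
}
```

With a single request format, the object name is simply req.Name and only the service lookup needs the suffix removed.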

oribon · 2024-01-30T08:32:01Z

internal/k8s/controllers/layer2_status_controller.go

+		Complete(r)
+}
+
+func (r *Layer2StatusReconciler) buildDesiredStatus(objName, svcNamespace, svcName string, advertisements []layer2.IPAdvertisement) v1beta1.ServiceL2Status {


wdyt about making this func build only the status portion, and let the reconciliation part handle the meta (calling this only to fill the state's status)?

oribon · 2024-01-30T08:39:47Z

internal/k8s/controllers/layer2_status_controller.go

+func (r *Layer2StatusReconciler) SetupWithManager(mgr ctrl.Manager) error {
+	return ctrl.NewControllerManagedBy(mgr).
+		For(&v1beta1.ServiceL2Status{}).
+		WatchesRawSource(&source.Channel{Source: r.ReconcileChan}, &handler.EnqueueRequestForObject{}).


to get what I proposed above (unifying the formats when an event is reconciled) the handler can be changed to:

handler.EnqueueRequestsFromMapFunc(func(ctx context.Context, obj client.Object) []reconcile.Request {
	evt, ok := obj.(*l2StatusEvent)
	if !ok {
		level.Error(r.Logger).Log("controller", "Layer2StatusReconciler", "error", "received an object that is not an l2 status event from the channel")
		return []reconcile.Request{}
	}
	level.Debug(r.Logger).Log("controller", "Layer2StatusReconciler", "enqueueing", evt.Name)
	return []reconcile.Request{{NamespacedName: types.NamespacedName{Namespace: evt.Namespace, Name: fmt.Sprintf("%s-%s", evt.Name, r.NodeName)}}}
})

oribon · 2024-01-30T08:42:20Z

internal/k8s/controllers/layer2_status_controller.go

+	if result, err = controllerutil.CreateOrPatch(ctx, r.Client, state, func() error {
+		state.Labels = desiredState.Labels
+		state.Status = desiredState.Status
+		return nil
+	}); err != nil {
+		return ctrl.Result{}, err
+	} else if result == controllerutil.OperationResultCreated {
+		// According to https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/controller/controllerutil#CreateOrPatch
+		// If the object is created for the first time, we have to requeue it to ensure that the status is updated.
+		return ctrl.Result{Requeue: true}, nil
+	}


nit: (we try to avoid elses 😅 ) format to

result, err = controllerutil.CreateOrPatch(ctx, r.Client, state, func() error {
	state.Labels = desiredState.Labels
	state.Status = desiredState.Status
	return nil
})
if err != nil {
	return ctrl.Result{}, err
}
if result == controllerutil.OperationResultCreated {
	// According to https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/controller/controllerutil#CreateOrPatch
	// If the object is created for the first time, we have to requeue it to ensure that the status is updated.
	return ctrl.Result{Requeue: true}, nil
}

oribon · 2024-01-30T10:26:06Z

e2etest/pkg/status/l2status.go

+		if s, err := GetL2Status(cs, svc, node.Name); err == nil {
+			l2Statuses = append(l2Statuses, s)
+		} else if errors.IsNotFound(err) {
+			continue
+		} else {
+			return nil, err
+		}


same as before, let's avoid elses 😅

s, err := GetL2Status(cs, svc, node.Name)
if err != nil && errors.IsNotFound(err) {
	continue
}
if err != nil {
	return nil, err
}
l2Statuses = append(l2Statuses, s)

oribon · 2024-01-30T10:29:44Z

internal/k8s/controllers/layer2_status_controller.go

+	return ctrl.NewControllerManagedBy(mgr).
+		For(&v1beta1.ServiceL2Status{}).
+		WatchesRawSource(&source.Channel{Source: r.ReconcileChan}, &handler.EnqueueRequestForObject{}).
+		Complete(r)


we can now add a WithEventFilter to reconcile only those that end with the node's name

lwabish · 2024-01-31T07:51:43Z

@oribon Done :)

oribon

lgtm! thanks a lot, I know this took a while but the effort you put here is going to be very useful with the next status resources we want to implement 😄
can you please rebase your commits again (also on top of main)?
ccing @fedepaol to also give this a look

lwabish · 2024-02-04T10:10:57Z

@oribon It was a wonderful experience working with you guys, and I learned a lot from your reviews. Thank you, too.
Should I still rebase the commits to four like before or squash them all?

oribon · 2024-02-04T10:12:20Z

4-5 where each has its role :)

fedepaol · 2024-02-15T15:02:50Z

Folks, this looks really neat!
Sending to merge queue. Thanks a lot @lwabish for doing this, and @oribon for assisting!

fedepaol · 2024-02-15T15:08:29Z

@lwabish I think you need to regenerate the manifests?

lwabish · 2024-02-15T23:20:22Z

Sorry I will fix that tonight

fedepaol · 2024-02-16T17:07:48Z

Please rebase to the right commit

lwabish · 2024-02-16T17:15:07Z

Ok, just finished

fedepaol · 2024-03-04T10:38:59Z

@lwabish apologies, I was sure I sent this to be merged.
Would you be so kind to rebase again and fix the conflicts? Thanks a lot!

lwabish · 2024-03-04T11:59:01Z

No problem.

lwabish · 2024-03-04T12:29:51Z

@fedepaol Just finished.

lwabish requested review from fedepaol, russellb, gclawes and oribon as code owners December 5, 2023 09:57

oribon reviewed Dec 18, 2023

lwabish force-pushed the main branch 2 times, most recently from e6e5583 to 2f5ce8d Compare January 11, 2024 07:26

lwabish force-pushed the main branch from 72e15ac to f078af2 Compare January 11, 2024 11:51

lwabish closed this Jan 11, 2024

lwabish force-pushed the main branch from f078af2 to 544588e Compare January 11, 2024 11:56

lwabish reopened this Jan 11, 2024

oribon reviewed Jan 14, 2024

oribon reviewed Jan 30, 2024

oribon approved these changes Feb 4, 2024

lwabish force-pushed the main branch 2 times, most recently from 306cc87 to 30ff915 Compare February 5, 2024 01:49

lwabish force-pushed the main branch from 28413af to 7582865 Compare February 16, 2024 17:14

feat: layer2 status api

e15361a

Add a new go struct called ServiceL2Status for later implements of exposing layer2 service status.
Signed-off-by: lwabish <wubw@pku.edu.cn>
(cherry picked from commit a4f2654)

lwabish force-pushed the main branch from ccf4c89 to e15361a Compare March 4, 2024 12:20

lwabish added 3 commits March 4, 2024 20:26

feat: implement of layer2 status

b7cd6b0

This is the core implement of layer2 service status exposing. A new k8s controller called layer2_status_controller was added to speakers.
Signed-off-by: lwabish <wubw@pku.edu.cn>

tests: layer2 status

bd8b478

Add unit tests for layer2 status controller. Integrate layer2 status exposing to E2E test.
Signed-off-by: lwabish <wubw@pku.edu.cn>

manifest: layer2 status

6b020ea

This includes the following non go codes for layer2 status related function:
- helm chart
- kubebuilder crds/rbac
- changelog
- deepcopy go files
Signed-off-by: lwabish <wubw@pku.edu.cn>

github-actions bot added the kind/feature label Mar 4, 2024

fedepaol enabled auto-merge March 4, 2024 13:47

Merge branch 'main' into main

30e83c7

fedepaol added this pull request to the merge queue Mar 4, 2024

Merged via the queue into metallb:main with commit fe94349 Mar 4, 2024
33 of 35 checks passed

lwabish mentioned this pull request Apr 9, 2024

Fix #2311: change layer2status object namespace and add owner reference #2351

Merged

fedepaol mentioned this pull request Apr 9, 2024

CRD Status: Implement IPAddressPoolStatus #2155

Open

2 tasks

		// only trigger reconcile function when the source object is not a ServiceL2Status
		// this prevents reconciling cycle because we operate ServiceL2Status itself in the reconcile function

		svc := &v1.Service{}
		svc.Namespace, svc.Name = svcNamespace, svcName


		ipAdvS := r.StatusFetcher(req.NamespacedName)

		objName := fmt.Sprintf("%s-%s", req.Name, r.NodeName)
