Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ServiceL2Status CRD to expose service announcing status #2198

Merged
merged 5 commits into from
Mar 4, 2024

Conversation

lwabish
Copy link
Contributor

@lwabish lwabish commented Dec 5, 2023

This is an implement of what I was assigned in #2158.
The core implementation have almost been done and I have test the feature in my local dev-env.
I will continue doing some detailed work such as the helm chart files, etc.
Please do some early code review if possible, any suggestions would be appreciated.

/kind feature

Implement ServiceL2Status CRD to expose service announcing status in layer2 mode

@oribon
Copy link
Member

oribon commented Dec 6, 2023

Thanks a lot for opening this!
I think we should take a bit different approach regarding how we manage these resources.
Instead of an imperative approach where this is embedded in the existing layer2_controller and we take action according to the current step, we should go for a more declarative one where we have a new k8s controller that can react to general events, fetching the desired state of a service and building the status accordignly.
More specifically - we can make the layer2_controller expose the state of advertised services (e.g by a func that accepts a namespaced name for a service and return its advertisement state), add a new k8s controller that listens to a channel and feed that channel from the layer2_controller every time a state of an advertised service changes with an event that service x has changed. When the new k8s controller gets the event that service x has changed it queries its state and builds the relevant status.
You can take inspiration from how we did it in frr-k8s, where the nodestate controller fetches the different pieces to create the state from other components:
https://github.com/metallb/frr-k8s/blob/main/internal/controller/frrstate_controller.go
https://github.com/metallb/frr-k8s/blob/8ea5c2e33beaab49af944d14452c85993095e844/cmd/main.go#L197

This will allow us to delegate the concerns of managing the statuses to a single point which should be easy to follow.
Also, there are still some corners we need to make sure we're covering such as not leaving stale statuses around and making sure the user doesn't edit them but we can leave them for later once we settle on the general picture. wdyt?

Again, thanks a lot for your efforts and please let me know if you need any clarifications or help!

@lwabish
Copy link
Contributor Author

lwabish commented Dec 7, 2023

Got it. I'll do some refactor.

@lwabish
Copy link
Contributor Author

lwabish commented Dec 15, 2023

@oribon Please review the new implement

Copy link
Member

@oribon oribon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the PR!
did a first pass, can you please add an e2e as well so we have a general sense if it's working? Please let me know if you need anything :)

@@ -14,6 +14,8 @@ import (
"github.com/go-kit/log/level"
)

type StatusFetcher func(string) []IPAdvertisement
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be defined in layer2_status_controller.go (see https://go.dev/wiki/CodeReviewComments#interfaces)
also I think it is nicer if it would receive namespace+name instead of namespaced name

speaker/main.go Outdated
Comment on lines 164 to 166
Layer2StatusChange: func(svc *v1.Service) {
statusNotifyChan <- event.GenericEvent{Object: svc}
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we aren't really using the service object for this purpose, it makes sense to define a new object for the event under the new controller:

type l2StatusEvent struct {
	metav1.TypeMeta
	metav1.ObjectMeta
}

func (evt *l2StatusEvent) DeepCopyObject() runtime.Object {
	res := new(l2StatusEvent)
	res.Name = evt.Name
	res.Namespace = evt.Namespace
	return res
}

func NewL2StatusEvent(namespace, name string) event.GenericEvent {
	evt := l2StatusEvent{}
	evt.Name = name
	evt.Namespace = namespace
	return event.GenericEvent{Object: &evt}
}

This is mostly copied from the new frrk8s_config_controller.go which is worth comparing against and making the adjustments (I'll refer to it later again probably)

Comment on lines 176 to 185
// Before, without exposing status to cluster, all status are inside speakers.
// So We didn't need the svc instance and pure name of string is enough to act as key to maintain internal status.
// But Now we have to parse the name to build an object due to controller channel limitations.
parts := strings.Split(name, string(types.Separator))
if len(parts) != 2 {
level.Warn(l).Log("name is not a namespacedName", name)
return fmt.Errorf("not a namespacedName")
}
svc := &v1.Service{}
svc.Namespace, svc.Name = parts[0], parts[1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be replaced by:

svcNamespace, svcName, err := cache.SplitMetaNamespaceKey(name)
...

from "k8s.io/client-go/tools/cache"

Comment on lines 162 to 165
if !statusUpdated {
c.onStatusChange(svc)
statusUpdated = true
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to change this to updateStatus := false and calling onStatusChange after the loop (to call the status after we finished processing)

speaker/main.go Outdated
@@ -66,6 +67,8 @@ type service interface {
Errorf(svc *v1.Service, desc, msg string, args ...interface{})
}

type svcStatusChange func(*v1.Service)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: personal preference, I think it is cleaner to drop this struct, that is (with the move to ns/name):

Layer2StatusChange           func(namespace, name string)

err = r.Client.Create(ctx, state)
}
if err != nil {
level.Error(r.Logger).Log("controller", "Layer2StatusReconciler", "failed to get", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is redundant, controller-runtime will log this for us

Comment on lines 45 to 84
if len(ipAdvS) < 1 {
level.Debug(r.Logger).Log("controller", "Layer2StatusReconciler", "delete serviceL2status", req.NamespacedName.String())
if err = r.Client.Delete(ctx, state); err != nil {
level.Error(r.Logger).Log("controller", "Layer2StatusReconciler", "failed to delete", err)
return ctrl.Result{}, err
}
return ctrl.Result{}, nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be reduced to

	if len(ipAdvS) == 0 {
		err := r.Delete(ctx, state)
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

also, correct me if I'm wrong but by doing this is the speakers would delete eachother's statuses, each thinking the object shouldn't exist because they are not advertising it. if that's the case, we need to have a way for the speaker only to delete only what belongs to it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The channel some speaker is watching gets an element only from the same speaker after it ensures that some service belongs to itself. So It seems that this is not a problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I understand, the controller listens for both events that come from the channel and those that come from a change to an object (the For(&v1beta1.ServiceL2Status{})). what prevents speakers from deleting eachother's resources?

}

func (r *Layer2StatusReconciler) buildDesiredStatus(advertisements []layer2.IPAdvertisement) v1beta1.MetalLBServiceL2Status {
s := v1beta1.MetalLBServiceL2Status{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as in the design doc we should also put the metallb.io/node and service labels

state.Status = desiredStatus
err = r.Client.Status().Update(ctx, state)
if err != nil {
level.Error(r.Logger).Log("controller", "Layer2StatusReconciler", "failed to update status", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: redundant

Copy link
Member

@oribon oribon Dec 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some tests under reconciliation_test.go (and maybe layer2_status_controller_test.go)?

@lwabish
Copy link
Contributor Author

lwabish commented Dec 20, 2023

Thanks for your reviewing and sugguestions.I'll keep working on it. @oribon

@lwabish
Copy link
Contributor Author

lwabish commented Dec 21, 2023

Thanks a lot for the PR! did a first pass, can you please add an e2e as well so we have a general sense if it's working? Please let me know if you need anything :)

I met this problem while trying to process go module dependencies in e2etest directory.

❯ cd e2etest
❯ go mod download
go: k8s.io/endpointslice@v0.0.0: invalid version: unknown revision v0.0.0
❯ go mod why k8s.io/endpointslice
go: downloading k8s.io/endpointslice v0.0.0
# k8s.io/endpointslice
(main module does not need package k8s.io/endpointslice)

Somehow this didn't affect inv e2etest.
Adding k8s.io/endpointslice => k8s.io/endpointslice v0.28.2 to go.mod could solve this problem but I'm not quite sure if this is an env specific problem.
Should I commit my e2etest/go.mod and related go module files?
Of course this is not a fatal or urgent problem but a little confusing.

@oribon
Copy link
Member

oribon commented Dec 21, 2023

it's a bit hard for me to say, but in general committing whatever go mod tidy changes should be fine 😅

@lwabish
Copy link
Contributor Author

lwabish commented Jan 8, 2024

@oribon I have made the changes according to your sugguestions, please review them

@oribon
Copy link
Member

oribon commented Jan 9, 2024

@oribon I have made the changes according to your sugguestions, please review them

thanks a lot, I'll get to this in the following days :)
in the meanwhile I ran CI (and will rerun it again whenever you push)

@lwabish
Copy link
Contributor Author

lwabish commented Jan 11, 2024

I ran some git rebase commands according to git help page to fix DCO test.Now The DCO test is ok.
But It seems that my commit log is mixed with some other commits.Not sure if this is a problem.
Besides, the CI workflow seems to be waiting for approval to run
@oribon

@oribon
Copy link
Member

oribon commented Jan 11, 2024

I ran some git rebase commands according to git help page to fix DCO test.Now The DCO test is ok. But It seems that my commit log is mixed with some other commits.Not sure if this is a problem. Besides, the CI workflow seems to be waiting for approval to run @oribon

approved again, about the commit log it might cause some problems but anyways I'll ask you sooner or later to clean it up a bit 😅 the easiest way imo would be squashing your commits together (for example using rebase -i, cherry-picking them on top of main and overriding this branch. Also, it'll be easier to review when the commits are organized into a few (I guess it shouldn't be more than ~4-5 commits for this PR: API / Manifests / implementation / E2E)

@lwabish
Copy link
Contributor Author

lwabish commented Jan 11, 2024

I ran some git rebase commands according to git help page to fix DCO test.Now The DCO test is ok. But It seems that my commit log is mixed with some other commits.Not sure if this is a problem. Besides, the CI workflow seems to be waiting for approval to run @oribon

approved again, about the commit log it might cause some problems but anyways I'll ask you sooner or later to clean it up a bit 😅 the easiest way imo would be squashing your commits together (for example using rebase -i, cherry-picking them on top of main and overriding this branch. Also, it'll be easier to review when the commits are organized into a few (I guess it shouldn't be more than ~4-5 commits for this PR: API / Manifests / implementation / E2E)

No problem.This is what I planned to to after solving other problems.

@lwabish
Copy link
Contributor Author

lwabish commented Jan 11, 2024

@oribon workflow needs approval😅

Copy link
Member

@oribon oribon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks a lot better thanks a lot!

client.Client
Logger log.Logger
NodeName string
Chan <-chan event.GenericEvent
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: give this a more meaningful name (reconcileChan/updateChan/..)

Node: r.NodeName,
}
// multiple advertisement objects share all fields except lb ip, so we use the first one
adv := advertisements[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's like this in the original design, but I think it'd add value to include all the ips we advertise as well (which should match the service's ips unless we did something really wrong).
that way the user can go directly to the status resource to understand if/how the service is reachable
wdyt? cc @fedepaol

return ctrl.Result{}, err
}

state.Status = desiredState.Status
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant here? this is assigned before (also as in the other comments, you can revert to what you initially did and remove the todo)

Comment on lines 105 to 106
// only trigger reconcile function when the source object is not a ServiceL2Status
// this prevents reconciling cycle because we operate ServiceL2Status itself in the reconcile function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say this should be the opposite, reconciling only when the relevant resource changes.
A way to avoid a reconciling cycle is comparing the desired status with the current object and not updating if they're equal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I got you but in our case I think it's slightly different.
Withctrl.NewControllerManagedBy(mgr).For().WatchesRawSource(), both ServiceL2Status and GenericEvent objects can trigger the reconcile and the reconciler itself may create ServiceL2Status object, which leads to a cycle reconciling.
Meanwhile, I have tried deleting the For() but the controller stops working.So We have to keep the For() but add an EventFilter to filter out ServiceL2Status from triggering the reconciling.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand, I'm talking about us wanting to reconcile on ServiceL2Status objects to make sure they always match the speaker. For example, if the user (or an unrelated speaker) modifies status.node, you'd expect the relevant speaker to override it to the correct value.
Regarding the cycle reconciliation, we can compare the current object with the desired status and updating only if it's needed, that way we won't trigger another reconciliation, e.g in frrk8s_config_controller.go we do:

	if reflect.DeepEqual(current.Spec, r.desiredConfiguration.Spec) {
		level.Debug(r.Logger).Log("controller", "FRRK8sReconciler", "event", "not reconciling because of no change")
		return ctrl.Result{}, nil
	}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I got you but in our case I think it's slightly different. Withctrl.NewControllerManagedBy(mgr).For().WatchesRawSource(), both ServiceL2Status and GenericEvent objects can trigger the reconcile and the reconciler itself may create ServiceL2Status object, which leads to a cycle reconciling. Meanwhile, I have tried deleting the For() but the controller stops working.So We have to keep the For() but add an EventFilter to filter out ServiceL2Status from triggering the reconciling.

Sorry I didn't notice your reply above.This also explains your question above:#2198 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand, I'm talking about us wanting to reconcile on ServiceL2Status objects to make sure they always match the speaker. For example, if the user (or an unrelated speaker) modifies status.node, you'd expect the relevant speaker to override it to the correct value.

Indeed.In this case, it seems that I have to remove the EventFilter to allow ServiceL2Status object triggering the reconcile funtion.I'll work on it to see if there is any problem.

@@ -145,6 +146,7 @@ func (c *layer2Controller) ShouldAnnounce(l log.Logger, name string, toAnnounce

func (c *layer2Controller) SetBalancer(l log.Logger, name string, lbIPs []net.IP, pool *config.Pool, client service, svc *v1.Service) error {
ifs := c.announcer.GetInterfaces()
statusUpdated := false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename to "updateStatus"

"sigs.k8s.io/controller-runtime/pkg/source"
)

type StatusFetcher func(namespace, name string) []layer2.IPAdvertisement
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry in a second thought maybe it's better to have this one namespaced name instead separated as it's more common in the code (sorry for the hassle)

}
return toCheck.Status.Node
}, 5*time.Second, 200*time.Millisecond).Should(Equal(testNodeName))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add an update to the interface and verify they are reflected in the status?

@@ -83,6 +85,40 @@ var _ = ginkgo.Describe("L2-interface selector", func() {
framework.ExpectNoError(err)
})

ginkgo.It("Validate ServiceL2Status interface", func() {
ginkgo.By("generate a random int to choose some interface for announcing")
i := rand.Intn(len(NodeNics))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's just choose the first no need for this rand (and also fail the test if len(NodeNics) == 0

var err error
l2Status, err = status.GetL2Status(ConfigUpdater.Client(), svc)
return err
}, 2*time.Minute, 1*time.Second).ShouldNot(gomega.HaveOccurred())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a 5s gomega.Consistently after this to make sure the speakers do not fight?

return err.Error()
}
return node.Name
}, time.Minute, time.Second).Should(gomega.Equal(l2Status.Status.Node))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same (adding a consistently)

@lwabish
Copy link
Contributor Author

lwabish commented Jan 18, 2024

@oribon All done except for the proposal to add advertise ips. Please rerun the ci workflow 😄

@lwabish
Copy link
Contributor Author

lwabish commented Jan 20, 2024

@oribon
I think there's something about the reconcile funtion logic we should dive a little deeper.
Here are the two different approaches I have organized regarding how the reconcile function works.

  1. The way that I wrote before:
    With an eventfilter that filter out all serviceL2Status objects, the reconciler only got triggered by the channel elements, which means only the relavent speaker reconcile for a single service, no confilcts occur. But just like you mentioned recently, if some other controller or even users change the serviceL2Status, our reconciler can't get triggered.
    This is also how the frr controller got implemented.
  2. Without an eventfilter:
    Every speaker can reconcile for a single service and external operations to the serviceL2Status object is considerd, which is more cloud-native. The difficulty lies in: the most direct status is stored only in one particular speaker, others can't determine if a serviceL2Status should be delete or not from the status inside itself.
    In my latest reconcile funtion, besides removing the eventfilter, I added some conditions to help reconciler determine if a serviceL2Status should be deleted, which solved the confilcts and things seems ok right now according to the unit tests and e2e tests.
    But I'm not sure if there is any other edge situation about this approach, wdyt ?

@oribon
Copy link
Member

oribon commented Jan 21, 2024

@oribon I think there's something about the reconcile funtion logic we should dive a little deeper. Here are the two different approaches I have organized regarding how the reconcile function works.

  1. The way that I wrote before:
    With an eventfilter that filter out all serviceL2Status objects, the reconciler only got triggered by the channel elements, which means only the relavent speaker reconcile for a single service, no confilcts occur. But just like you mentioned recently, if some other controller or even users change the serviceL2Status, our reconciler can't get triggered.
    This is also how the frr controller got implemented.
  2. Without an eventfilter:
    Every speaker can reconcile for a single service and external operations to the serviceL2Status object is considerd, which is more cloud-native. The difficulty lies in: the most direct status is stored only in one particular speaker, others can't determine if a serviceL2Status should be delete or not from the status inside itself.
    In my latest reconcile funtion, besides removing the eventfilter, I added some conditions to help reconciler determine if a serviceL2Status should be deleted, which solved the confilcts and things seems ok right now according to the unit tests and e2e tests.
    But I'm not sure if there is any other edge situation about this approach, wdyt ?

I see what you're saying, our problem is that currently a speaker can't know if it's responsible for a given status resource. I think a good approach (and the most bulletproof one) would be embedding the node's name as part of the resource object like we intend doing in the bgp service status resource.
That way the question "is this speaker responsible for the object" boils down to "does the resource's name have the node's name as a suffix" (leveraging the fact that a resource's name is immutable) and reflecting this in the event filter.
Doing that, we can revert to your original implementation without the new auxiliary function, deleting the object if it has 0 advs but still having the advantage of listening for objects changes.
wdyt? cc @fedepaol

@fedepaol
Copy link
Member

@oribon @lwabish haven't checked the implementation yet.

My gut feeling is, if a given l2 status is shared among different speakers (i.e. the service moves from one speaker to the other, for example because an endpoint moved or a node selector changed), the old owner only knows it doesn't own the service anymore unless we add extra logic to understand if no one should advertise the service at all.
So a given speaker will have to choose wether delete the status or not, just because it's not advertising the service, and getting this right looking at the current status is not easy (what if the speaker that should delete crashes for example)?

So, given we are doing the same for BGP, I am not against to not naming the resource as the service but having a "per node" l2status instance (where the service name / node name can be added as labels as in BGP). The name can be the concatenation of service and node, or something autogenerated, but by doing this any speaker will take care of the lifecycle of a single instance and its life (and ours) will be easier.

Hope I got things right

@lwabish
Copy link
Contributor Author

lwabish commented Jan 23, 2024

@oribon @lwabish haven't checked the implementation yet.

My gut feeling is, if a given l2 status is shared among different speakers (i.e. the service moves from one speaker to the other, for example because an endpoint moved or a node selector changed), the old owner only knows it doesn't own the service anymore unless we add extra logic to understand if no one should advertise the service at all. So a given speaker will have to choose wether delete the status or not, just because it's not advertising the service, and getting this right looking at the current status is not easy (what if the speaker that should delete crashes for example)?

So, given we are doing the same for BGP, I am not against to not naming the resource as the service but having a "per node" l2status instance (where the service name / node name can be added as labels as in BGP). The name can be the concatenation of service and node, or something autogenerated, but by doing this any speaker will take care of the lifecycle of a single instance and its life (and ours) will be easier.

Hope I got things right

With this design, I think the controller logic could be simpler.
As a user , if I want to investigate my service annoucement status, my command would be like:

kubectl get servicel2status -n $NS --selector=metallb.io/service=$SVC-NAME

whose output would contains a couple of servicel2status objects.
If I am understanding this right, maybe we should optimize the servicel2status status object design.For example the status.node as a string would be empty except for the only one which is advertising, in this case a boolean type may be more clear?

@fedepaol
Copy link
Member

@oribon @lwabish haven't checked the implementation yet.
My gut feeling is, if a given l2 status is shared among different speakers (i.e. the service moves from one speaker to the other, for example because an endpoint moved or a node selector changed), the old owner only knows it doesn't own the service anymore unless we add extra logic to understand if no one should advertise the service at all. So a given speaker will have to choose wether delete the status or not, just because it's not advertising the service, and getting this right looking at the current status is not easy (what if the speaker that should delete crashes for example)?
So, given we are doing the same for BGP, I am not against to not naming the resource as the service but having a "per node" l2status instance (where the service name / node name can be added as labels as in BGP). The name can be the concatenation of service and node, or something autogenerated, but by doing this any speaker will take care of the lifecycle of a single instance and its life (and ours) will be easier.
Hope I got things right

With this design, I think the controller logic could be simpler. As a user , if I want to investigate my service annoucement status, my command would be like:

kubectl get servicel2status -n $NS --selector=metallb.io/service=$SVC-NAME

And this would be the same as we are proposing for BGP (because then, we'd have multiple nodes advertising the service)

whose output would contains a couple of servicel2status objects. If I am understanding this right, maybe we should optimize the servicel2status status object design.For example the status.node as a string would be empty except for the only one which is advertising, in this case a boolean type may be more clear?

What I am suggesting here is, if a given speaker is not advertising the service anymore, then it should delete its own instance. So at a given time, there would be only one l2 adv per service (assuming one speaker is advertising).

Copy link
Member

@oribon oribon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a few comments but this looks really good and almost ready!
I have a few concerns now about something related to the naming we chose in regards to the cleanup, but we'll defer it to another PR after we merge this (which ofc you can take if you'd like) - I'll open a separate issue for that. thanks again!


svcNamespace, svcName, err := cache.SplitMetaNamespaceKey(name)
if err != nil {
level.Warn(l).Log("name is not a namespacedName", name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you use the same format as the others? e.g

level.Error(l).Log("op", "DeleteBalancer", "protocol", "layer2", "service", name, "msg", "received a non namespaced name")

Comment on lines 163 to 164
svc := &v1.Service{}
svc.Namespace, svc.Name = svcNamespace, svcName
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is unnecessary, we can pass:

c.onStatusChange(types.NamespacedName{Name:svcName, Namespace: svcNamespace})

speaker/main.go Outdated
Comment on lines 164 to 167
if ns, name, err := cache.SplitMetaNamespaceKey(namespacedName.String()); err != nil {
level.Error(logger).Log("op", "startup", "msg", "failed to split meta namespace key", "error", err)
} else {
statusNotifyChan <- controllers.NewL2StatusEvent(ns, name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: omit the if/else, you can just use:

controllers.NewL2StatusEvent(namespacedName.Namespace, namespacedName.Name)

speaker/main.go Outdated
@@ -254,26 +270,30 @@ func newController(cfg controllerConfig) (*controller, error) {
}
protocols := []config.Proto{config.BGP}

var layer2StatusFetcher controllers.StatusFetcher
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: change this to

layer2StatusFetcher := func(types.NamespacedName) []layer2.IPAdvertisement { return nil }

that way it's clear that we don't care about statuses if l2 is disabled (and also I think that the current way assigns nil here)


ipAdvS := r.StatusFetcher(req.NamespacedName)

objName := fmt.Sprintf("%s-%s", req.Name, r.NodeName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is going to be problematic, since events can come from both the channel and k8s in different formats, that is: when a k8s object event changes it is going to be "name-node" and objName would be "name-node-node" and when the event comes from the cannel this is going to be "name-node".
I suggest we make all the requests to look as if they were coming from k8s (with the "name-node" format), see my other comment.
When all the events come with that format, this can be changed to:

svcName := strings.TrimSuffix(req.Name, fmt.Sprintf("-%s",r.NodeName))

ipAdvs := r.StatusFetcher(svcName, req.Namespace)
...

and objName will always match req.Name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something wrong related to this did occured when I was debugging the tests.But later I found this worked just because wrong objNames fetched none statues so the reconcile stopped.
Of course the optimization you are offering is really good, will do it

Complete(r)
}

func (r *Layer2StatusReconciler) buildDesiredStatus(objName, svcNamespace, svcName string, advertisements []layer2.IPAdvertisement) v1beta1.ServiceL2Status {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdyt about making this func build only the status portion, and let the reconciliation part handle the meta (calling this only to fill the state's status)?

func (r *Layer2StatusReconciler) SetupWithManager(mgr ctrl.Manager) error {
return ctrl.NewControllerManagedBy(mgr).
For(&v1beta1.ServiceL2Status{}).
WatchesRawSource(&source.Channel{Source: r.ReconcileChan}, &handler.EnqueueRequestForObject{}).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to get what I proposed above (unifying the formats when an event is reconciled) the handler can be changed to:

handler.EnqueueRequestsFromMapFunc(func(ctx context.Context, obj client.Object) []reconcile.Request {
			evt, ok := obj.(*l2StatusEvent)
			if !ok {
				level.Error(r.Logger).Log("controller", "Layer2StatusReconciler", "error", "received an object that is not an l2 status event from the channel")
				return []reconcile.Request{}
			}

			level.Debug(r.Logger).Log("controller", "Layer2StatusReconciler", "enqueueing", evt.Name)
			return []reconcile.Request{{NamespacedName: types.NamespacedName{Namespace: evt.Namespace, Name: fmt.Sprintf("%s-%s", evt.Name, r.NodeName)}}}
		})

Comment on lines 84 to 99
if result, err = controllerutil.CreateOrPatch(ctx, r.Client, state, func() error {
state.Labels = desiredState.Labels
state.Status = desiredState.Status
return nil
}); err != nil {
return ctrl.Result{}, err
} else if result == controllerutil.OperationResultCreated {
// According to https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/controller/controllerutil#CreateOrPatch
// If the object is created for the first time, we have to requeue it to ensure that the status is updated.
return ctrl.Result{Requeue: true}, nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: (we try to avoid elses 😅 ) format to

	if result, err = controllerutil.CreateOrPatch(ctx, r.Client, state, func() error {
		state.Labels = desiredState.Labels
		state.Status = desiredState.Status
		return nil
	})

        if err != nil {
		return ctrl.Result{}, err
	}

       if result == controllerutil.OperationResultCreated {
		// According to https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/controller/controllerutil#CreateOrPatch
		// If the object is created for the first time, we have to requeue it to ensure that the status is updated.
		return ctrl.Result{Requeue: true}, nil
	}

Comment on lines 30 to 36
if s, err := GetL2Status(cs, svc, node.Name); err == nil {
l2Statuses = append(l2Statuses, s)
} else if errors.IsNotFound(err) {
continue
} else {
return nil, err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as before, let's avoid elses 😅

		s, err := GetL2Status(cs, svc, node.Name)
		if err != nil && errors.IsNotFound(err) {
			continue
		}
		if err != nil {
			return nil, err
		}
		l2Statuses = append(l2Statuses, s)

return ctrl.NewControllerManagedBy(mgr).
For(&v1beta1.ServiceL2Status{}).
WatchesRawSource(&source.Channel{Source: r.ReconcileChan}, &handler.EnqueueRequestForObject{}).
Complete(r)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can now add a WithEventFilter to reconcile only those that end with the node's name

@lwabish
Copy link
Contributor Author

lwabish commented Jan 31, 2024

@oribon Done :)

Copy link
Member

@oribon oribon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! thanks a lot, I know this took a while but the effort you put here is going to be very useful with the next status resources we want to implement 😄
can you please rebase your commits again (also on top of main)?
ccing @fedepaol to also give this a look

@lwabish
Copy link
Contributor Author

lwabish commented Feb 4, 2024

@oribon Wonderful experience working with you guys and I learned a lot from your reviews.Thank you, too.
Should I still rebase the commits to four like before or squash them all?

@oribon
Copy link
Member

oribon commented Feb 4, 2024

@oribon Wonderful experience working with you guys and I learned a lot from your reviews.Thank you, too. Should I still rebase the commits to four like before or squash them all?

4-5 where each has its role :)

@fedepaol
Copy link
Member

Folks, this looks really neat!
Sending to merge queue. Thanks a lot @lwabish for doing this, and @oribon for assisting!

@fedepaol
Copy link
Member

@lwabish I think you need to regenerate the manifests?

@lwabish
Copy link
Contributor Author

lwabish commented Feb 15, 2024

@lwabish I think you need to regenerate the manifests?

Sorry I will fix that tonight

@fedepaol
Copy link
Member

Please rebase to the right commit

@lwabish
Copy link
Contributor Author

lwabish commented Feb 16, 2024

Please rebase to the right commit

Ok, just finished

@fedepaol
Copy link
Member

fedepaol commented Mar 4, 2024

@lwabish apologies, I was sure I sent this to be merged.
Would you be so kind to rebase again and fix the conflicts? Thanks a lot!

@lwabish
Copy link
Contributor Author

lwabish commented Mar 4, 2024

@lwabish apologies, I was sure I sent this to be merged. Would you be so kind to rebase again and fix the conflicts? Thanks a lot!

No problem.

Add a new go struct called ServiceL2Status for later implements
of exposing layer2 service status.

Signed-off-by: lwabish <wubw@pku.edu.cn>
(cherry picked from commit a4f2654)
This is the core implement of layer2 service status exposing.
A new k8s controller called layer2_status_controller was added to speakers.

Signed-off-by: lwabish <wubw@pku.edu.cn>
Add unit tests for layer2 status controller.
Integrate layer2 status exposing to E2E test.

Signed-off-by: lwabish <wubw@pku.edu.cn>
This includes the following non go codes for layer2 status related function:
	helm chart
	kubebuilder crds/rbac
	changelog
        deepcopy go files

Signed-off-by: lwabish <wubw@pku.edu.cn>
@lwabish
Copy link
Contributor Author

lwabish commented Mar 4, 2024

@lwabish apologies, I was sure I sent this to be merged. Would you be so kind to rebase again and fix the conflicts? Thanks a lot!

No problem.

@fedepaol Just finished.

@fedepaol fedepaol added this pull request to the merge queue Mar 4, 2024
Merged via the queue into metallb:main with commit fe94349 Mar 4, 2024
33 of 35 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants