
dst: Update GetProfile's stream when pod associated to HostPort lookup changes #11334

Merged
merged 9 commits into main from alpeb/hostport-podwatcher on Sep 28, 2023

Conversation

alpeb
Member

@alpeb alpeb commented Sep 4, 2023

Followup to #11328

Implements a new pod watcher, instantiated alongside the other watchers in the Destination server. It also watches Servers and carries over all the logic from ServerWatcher, which has now been decommissioned.

The CreateAddress() function has been moved into the PodWatcher, because we now call it on every update, given that the pod associated with an ip:port might change and we need to regenerate the Address object. That function also takes care of capturing opaque-protocol info from associated Servers, which is not new and had some logic that was duplicated in the now-defunct ServerWatcher. getAnnotatedOpaquePorts() was also moved for similar reasons.

Other things to note about PodWatcher:

  • It publishes a new pair of metrics, ip_port_subscribers and ip_port_updates, leveraging the framework in prometheus.go.
  • The complexity in updatePod() comes from only sending stream updates when the pod's readiness changes, to avoid sending duplicate messages on every pod lifecycle event (a minimal sketch of that check follows below).
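For readers unfamiliar with the watcher code, here is a minimal sketch of the readiness-based dedup described in the last bullet. The podPublisher fields and helper names below are illustrative stand-ins, not the actual code in pod_watcher.go:

```go
package watcher

import corev1 "k8s.io/api/core/v1"

// podPublisher is a stripped-down stand-in for the real type in
// pod_watcher.go; listeners here are plain callbacks for brevity.
type podPublisher struct {
	pod       *corev1.Pod
	listeners []func()
}

// isRunningAndReady reports whether the pod is Running and has the Ready
// condition set to True.
func isRunningAndReady(pod *corev1.Pod) bool {
	if pod == nil || pod.Status.Phase != corev1.PodRunning {
		return false
	}
	for _, cond := range pod.Status.Conditions {
		if cond.Type == corev1.PodReady {
			return cond.Status == corev1.ConditionTrue
		}
	}
	return false
}

// updatePod only notifies listeners when readiness flips, so resyncs and
// other lifecycle events don't produce duplicate stream messages.
func (pp *podPublisher) updatePod(pod *corev1.Pod) {
	wasReady := isRunningAndReady(pp.pod)
	nowReady := isRunningAndReady(pod)
	pp.pod = pod
	if wasReady != nowReady {
		for _, notify := range pp.listeners {
			notify()
		}
	}
}
```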

Finally, since endpointProfileTranslator's endpoint (*pb.WeightedAddr) is no longer a static object, the Update() function now receives an Address that allows it to rebuild the endpoint on the fly (and so createEndpoint() was converted into a method of endpointProfileTranslator).


Original PR description

(What follows is the original description of this PR's first approach, which evolved as described above after taking all the feedback into account.)

Implements a new pod watcher, instantiated alongside the other watchers in the Destination server. It's generic enough to catch all pod events in the cluster, so it's up to the subscribers to filter out the ones they're interested in and to set up any metrics.

The endpointProfileTranslator was refactored to subscribe to the pod watcher's events. Server subscriptions are now also handled by it, and are recycled whenever the pod changes.

A new gauge metric host_port_subscribers has been created, tracking the number of subscribers for a given HostIP+port combination.
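For context, a gauge like this is typically built on a Prometheus GaugeVec keyed by host IP and port. The snippet below is only a generic sketch of that pattern; it does not reproduce the helpers in prometheus.go, and the label names are assumptions:

```go
package watcher

import (
	"strconv"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// hostPortSubscribers counts the active subscribers per hostIP:port pair.
var hostPortSubscribers = promauto.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "host_port_subscribers",
		Help: "Number of subscribers for a given hostIP:port combination.",
	},
	[]string{"host", "port"},
)

// trackSubscription adjusts the gauge: +1 on subscribe, -1 on unsubscribe.
func trackSubscription(hostIP string, port uint32, delta float64) {
	hostPortSubscribers.With(prometheus.Labels{
		"host": hostIP,
		"port": strconv.FormatUint(uint64(port), 10),
	}).Add(delta)
}
```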

Other Changes

  • Moved the server.createAddress method into a static function in endpoints_watcher.go, for better reusability.
  • The "Return profile for host port pods" test introduced in dst: Stop overriding Host IP with Pod IP on HostPort lookup #11328 was extended to track the ensuing events after a pod is deleted and then recreated (:taco: to @adleong for the test).
  • Given that this test consumes multiple events, we had to change the profileStream test helper to run GetProfile in a goroutine so that it keeps running. This implies the stream now needs to be manually canceled, and access to the stream updates has to be guarded with a mutex (see the sketch below).
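A rough sketch of the goroutine-plus-mutex pattern referred to in the last bullet; this is not the actual profileStream helper, just the shape of it, with a placeholder type standing in for the destination API's stream messages:

```go
package destination

import (
	"context"
	"sync"
)

// profileUpdate stands in for the real stream message type (e.g. the
// protobuf DestinationProfile); kept abstract here on purpose.
type profileUpdate = interface{}

// testStream collects updates produced by a GetProfile-style call running in
// its own goroutine; the test cancels the stream explicitly when done.
type testStream struct {
	mu      sync.Mutex
	updates []profileUpdate
	Cancel  context.CancelFunc
}

func (s *testStream) push(u profileUpdate) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.updates = append(s.updates, u)
}

// Updates returns a copy of everything received so far.
func (s *testStream) Updates() []profileUpdate {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]profileUpdate(nil), s.updates...)
}

// watchProfile starts fn (a GetProfile-style call that invokes push for each
// message it receives) in a goroutine and returns the stream handle.
func watchProfile(fn func(ctx context.Context, push func(profileUpdate))) *testStream {
	ctx, cancel := context.WithCancel(context.Background())
	s := &testStream{Cancel: cancel}
	go fn(ctx, s.push)
	return s
}
```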

@alpeb alpeb requested a review from a team as a code owner September 4, 2023 22:17
@alpeb alpeb changed the title Complete fix for hostport staleness - pod watcher dst: Update GetProfile's stream when pod associated to HostPort lookup changes Sep 5, 2023
@mateiidavid mateiidavid left a comment

Looks great! Think you broke the problem down really well too, it was easy for me to jump in with no context. I left a few comments, I need to understand the adaptor a bit better. Using the pattern here is difficult because it would otherwise be hard to slot the pod watcher into the endpoint translator? (e.g. running a watcher that can propagate updates to endpoint translator listeners). The idea being that the adaptor is the listener that understands pod events?

As far as testing goes, I gave this a try and it works well!

INFO[0000] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} endpoint:{addr:{ip:{ipv4:2886860802} port:9999} weight:10000 metric_labels:{key:"control_plane_ns" value:"linkerd"} metric_labels:{key:"deployment" value:"nginx-deployment"} metric_labels:{key:"namespace" value:"default"} metric_labels:{key:"pod" value:"nginx-deployment-674b6bd49b-5tj4r"} metric_labels:{key:"pod_template_hash" value:"674b6bd49b"} metric_labels:{key:"serviceaccount" value:"default"} tls_identity:{dns_like_identity:{name:"default.default.serviceaccount.identity.linkerd.cluster.local"}} protocol_hint:{h2:{}}}
//
// RESTART POD
//
INFO[0000] 
INFO[0008] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} endpoint:{addr:{ip:{ipv4:2886860802} port:9999} weight:10000}
INFO[0008]
INFO[0012] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} endpoint:{addr:{ip:{ipv4:2886860802} port:9999} weight:10000 metric_labels:{key:"control_plane_ns" value:"linkerd"} metric_labels:{key:"deployment" value:"nginx-deployment"} metric_labels:{key:"namespace" value:"default"} metric_labels:{key:"pod" value:"nginx-deployment-674b6bd49b-6ngm2"} metric_labels:{key:"pod_template_hash" value:"674b6bd49b"} metric_labels:{key:"serviceaccount" value:"default"} tls_identity:{dns_like_identity:{name:"default.default.serviceaccount.identity.linkerd.cluster.local"}} protocol_hint:{h2:{}}}
//
// RESTART POD
//
INFO[0036] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} endpoint:{addr:{ip:{ipv4:2886860802} port:9999} weight:10000}
INFO[0036]
INFO[0041] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} endpoint:{addr:{ip:{ipv4:2886860802} port:9999} weight:10000 metric_labels:{key:"control_plane_ns" value:"linkerd"} metric_labels:{key:"deployment" value:"nginx-deployment"} metric_labels:{key:"namespace" value:"default"} metric_labels:{key:"pod" value:"nginx-deployment-674b6bd49b-gnd6w"} metric_labels:{key:"pod_template_hash" value:"674b6bd49b"} metric_labels:{key:"serviceaccount" value:"default"} tls_identity:{dns_like_identity:{name:"default.default.serviceaccount.identity.linkerd.cluster.local"}} protocol_hint:{h2:{}}}
INFO[0041]

The only issue is that we first send the endpoint and then the endpoint information in a subsequent update (i.e. pod metrics). idk if this is a big problem, it's not fresh in my brain how the proxy would handle the connections here, since the IP is the same I assume it just re-uses the connection, it won't tear it down.

controller/api/destination/watcher/pod_watcher.go (5 review threads, resolved)
controller/api/destination/hostport_adaptor.go (1 review thread, resolved)
@alpeb
Member Author

alpeb commented Sep 5, 2023

Thanks for the detailed review @mateiidavid.

Using the pattern here is difficult because it would otherwise be hard to slot the pod watcher into the endpoint translator? (e.g. running a watcher that can propagate updates to endpoint translator listeners). The idea being that the adaptor is the listener that understands pod events?

You're right to inquire about the logic behind that separation. I think it made sense at some point during my implementation process, but I see no reason for it anymore, so I'll be coalescing all the logic into a refactored endpointProfileTranslator.

The only issue is that we first send the endpoint and then the endpoint information in a subsequent update (i.e. pod metrics). idk if this is a big problem, it's not fresh in my brain how the proxy would handle the connections here, since the IP is the same I assume it just re-uses the connection, it won't tear it down.

I think the reason behind those separate messages is that after deleting the pod, the hostip:port no longer has a backing pod, so the following message only sends the IP. Then, when the new pod becomes ready, a new message is sent with all the pod's info (metric labels, identity and protocol hint).
This seemed to behave well in my cursory testing, but I'd love to get more confirmation that this is the right behavior.

Base automatically changed from alpeb/hostport-fixup-stopgap to main September 6, 2023 15:35
@olix0r
Member

olix0r commented Sep 6, 2023

This seems big enough that I think we ought to hold merging this until we've branched release/stable-2.14.

@adleong
Member

adleong commented Sep 8, 2023

There are a few things here that deviate from the patterns established in other watchers/translators:

  • subscribing to watches in a translator
  • having a watcher broadcast ALL events to all listeners without filtering

These are surprising and make this code difficult for me to follow. In particular, the logic in the endpointProfileTransformer has become very complex. I had something different in mind:

Rather than having the PodWatcher broadcast all pod events to all listeners, instead have it contain a map of "<ip>:<port>" to PodPublisher, following a similar pattern to the other watchers. PodWatcher would need to register event handlers for both Pods and Servers. When it gets a pod event, it should update all the registered PodPublishers which match the pod IP or which match the hostport if the pod has one. When it gets a server event, it should update all the registered PodPublishers which have a pod which matches the server selector. This moves most of the complexity into the watcher (where it belongs, imo).

This makes PodWatcher follow the same pattern as EndpointsWatcher, but for Pods instead of for Services.

The result of this is that the endpointProfileTranslator just needs to be updated to implement PodListener instead of ServerListener. All endpointProfile queries (i.e. calls through subscribeToEndpointProfile) can use PodWatcher regardless of whether the query is for a pod network IP or a host network IP. ServerWatcher becomes unused and can be deleted.
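To make the proposed shape concrete, here is a rough sketch of a PodWatcher keyed by ip:port. The listener interface, publisher fields and matching logic are illustrative assumptions, not the code that eventually landed, and the Server event handler is omitted:

```go
package watcher

import (
	"sync"

	corev1 "k8s.io/api/core/v1"
)

// PodListener is a stand-in for the listener interface a translator would implement.
type PodListener interface {
	Update(pod *corev1.Pod)
}

type ipPort struct {
	ip   string
	port uint32
}

type PodPublisher struct {
	pod       *corev1.Pod
	listeners []PodListener
}

type PodWatcher struct {
	mu         sync.RWMutex
	publishers map[ipPort]*PodPublisher
}

func NewPodWatcher() *PodWatcher {
	return &PodWatcher{publishers: make(map[ipPort]*PodPublisher)}
}

// Subscribe registers a listener for an ip:port, creating the publisher on
// first use (mirroring how EndpointsWatcher keys publishers per target).
func (pw *PodWatcher) Subscribe(ip string, port uint32, l PodListener) {
	pw.mu.Lock()
	defer pw.mu.Unlock()
	key := ipPort{ip, port}
	pp, ok := pw.publishers[key]
	if !ok {
		pp = &PodPublisher{}
		pw.publishers[key] = pp
	}
	pp.listeners = append(pp.listeners, l)
}

// updatePod is the Pod event handler: it notifies publishers whose key
// matches the pod IP, or the host IP when the pod exposes a HostPort.
func (pw *PodWatcher) updatePod(pod *corev1.Pod) {
	pw.mu.RLock()
	defer pw.mu.RUnlock()
	for key, pp := range pw.publishers {
		if key.ip != pod.Status.PodIP && key.ip != pod.Status.HostIP {
			continue
		}
		pp.pod = pod
		for _, l := range pp.listeners {
			l.Update(pod)
		}
	}
}
```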

Let me know if this makes sense. Happy to discuss further.

@alpeb
Member Author

alpeb commented Sep 11, 2023

Thanks @adleong for the comprehensive guidelines, it all makes sense to me.

instead have it contain a map of ":" to PodPublisher

You mean a map of ip:port?

@adleong
Member

adleong commented Sep 11, 2023

Yes, I got tripped up by markdown formatting. Fixed.

@alpeb
Member Author

alpeb commented Sep 14, 2023

This latest change moves most of the logic into the PodWatcher, which now watches both pods and Servers. As suggested, ServerWatcher has been decommissioned.

The CreateAddress() function has been moved into the PodWatcher, because we now call it on every update, given that the pod associated with an ip:port might change and we need to regenerate the Address object. That function also takes care of capturing opaque-protocol info from associated Servers, which is not new and had some logic that was duplicated in the now-defunct ServerWatcher. getAnnotatedOpaquePorts() was also moved for similar reasons.

Other things to note about PodWatcher:

  • It publishes a new pair of metrics, ip_port_subscribers and ip_port_updates, leveraging the framework in prometheus.go.
  • The complexity in updatePod() comes from only sending stream updates when the pod's readiness changes, to avoid sending duplicate messages on every pod lifecycle event. Let me know if that logic isn't clear enough.

Finally, since endpointProfileTranslator's endpoint (*pb.WeightedAddr) is no longer a static object, the Update() function now receives an Address that allows it to rebuild the endpoint on the fly (and so createEndpoint() was converted into a method of endpointProfileTranslator).

go func(pp *podPublisher) {
	updated := false
	for _, listener := range pp.listeners {
		addr, err := CreateAddress(pw.k8sAPI, pw.metadataAPI, pw.defaultOpaquePorts, pp.pod, pp.ip, pp.port)
Member

I'm a bit confused about how opaque ports flow from servers to pods. It seems like, whenever we process a server update, we find all publishers with pods which match that server and then call into CreateAddress, which then goes and finds all Servers which select that pod and builds the opaque ports that way.

Is the reason we do this because it's not possible to determine the new opaque ports for a pod just by looking at the updated server because it may have opaque ports which overlap with other servers which match that pod? That sounds right, but it's a subtle point that probably warrants a comment.

Member

Actually, taking this thinking a step further, we probably need to update ALL podPublishers, even ones which don't match the server's selector, because the server's selector may have changed so that it no longer selects certain pods. Those pods which are no longer selected may therefore need their opaque ports updated.

Taking this all together, I think that any time any server is updated, we need to recalculate all the opaque ports for all the pod publishers. Assuming we're using client-go caches, this shouldn't be too expensive since it won't result in API calls.

Member Author

Is the reason we do this because it's not possible to determine the new opaque ports for a pod just by looking at the updated server because it may have opaque ports which overlap with other servers which match that pod? That sounds right, but it's a subtle point that probably warrants a comment.

Right, the handler carries information limited to the changed Server, and so it only acts to signal that the final state for the relevant pods needs to be recalculated. I'll add a comment to that effect.

Actually, taking this thinking a step further, we probably need to potentially update ALL podPublishers, even ones which don't match the server's selector because the server's selector may have changed so that it no longer selects certain pods. Those pods which are no longer selected may therefore need their opaque ports updated.

Taking this all together, I think that any time any server is updated, we need to recalculate all the opaque ports for all the pod publishers. Assuming we're using client-go caches, this shouldn't be too expensive since it won't result in API calls.

Good point. The Server watcher and the lister used in CreateAddress() rely on the same informer, so indeed this shouldn't result in extra network calls. I'm a little worried though about CPU usage. I think we can be a little smarter in the Server watcher handlers. Currently all events are handled via this single updateServer() method. I think it's fine to keep the same logic for the add and delete events, in that only the pods matching the selectors are concerned. For the update event, we can check whether the selector changed, and consider the union of pods that match both the old and new selectors.
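For reference, a sketch of the selector-diff idea floated here, using the standard metav1/labels helpers. Note that this is not what was ultimately merged; the final code recalculates state for all publishers on any Server change:

```go
package watcher

import (
	"reflect"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// podsToRefresh returns the pods whose opaque-ports state may need to be
// recalculated after a Server update; oldSel/newSel are the Server's pod
// selector before and after the update.
func podsToRefresh(oldSel, newSel *metav1.LabelSelector, pods []*corev1.Pod) ([]*corev1.Pod, error) {
	old, err := metav1.LabelSelectorAsSelector(oldSel)
	if err != nil {
		return nil, err
	}
	updated, err := metav1.LabelSelectorAsSelector(newSel)
	if err != nil {
		return nil, err
	}

	var out []*corev1.Pod

	// If the selector didn't change, only pods matching it are affected.
	if reflect.DeepEqual(oldSel, newSel) {
		for _, p := range pods {
			if updated.Matches(labels.Set(p.Labels)) {
				out = append(out, p)
			}
		}
		return out, nil
	}

	// Otherwise consider the union: pods matching either the old or the new
	// selector may need their opaque ports recalculated.
	for _, p := range pods {
		ls := labels.Set(p.Labels)
		if old.Matches(ls) || updated.Matches(ls) {
			out = append(out, p)
		}
	}
	return out, nil
}
```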

Member

Can we estimate whether this is going to cause problematic CPU usage? Server updates probably wouldn't happen more than once a second. On each Server update we would have to evaluate a selector against every pod in the cluster. Assuming a 10000 pod cluster, doing 10000 selector evaluations every time a server updates doesn't seem like it should be too expensive. Are there other computational costs we should consider?

Member Author

Ok I wasn't taking into account the frequency of Server updates. I'll leave the logic as is for now and see if I can come up with a way to actually test resource consumption when an update is triggered. Another thing we'd like to avoid is sending dupe messages to every stream on every Server update. I'll add something into the endpointProfileTranslator to avoid that.

controller/api/destination/watcher/pod_watcher.go (2 review threads, resolved)
@alpeb
Member Author

alpeb commented Sep 19, 2023

I moved the Address/pod object creation logic into the podWatcher, in the getOrNewPodPublisher() method, called from Subscribe(). The latter receives service, hostname and ip, some of which can be empty depending on the case.

With this change, the auxiliary functions getIndexedPod() and podReceivingTraffic() were also moved into the watcher package.

Also, the Server updates handler now triggers updates for all podPublishers regardless of the Server selectors. The endpointProfileTranslator now tracks the last message sent, to avoid triggering unneeded duplicate messages (sketched below).
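A minimal sketch of that last-message tracking, assuming protobuf messages compared with proto.Equal; the real endpointProfileTranslator differs in the details:

```go
package destination

import (
	"sync"

	"google.golang.org/protobuf/proto"
)

// dedupSender drops an update if it would repeat the previous one, so every
// Server or pod event doesn't translate into a duplicate stream message.
type dedupSender struct {
	mu   sync.Mutex
	last proto.Message
	send func(proto.Message) error
}

func (d *dedupSender) maybeSend(msg proto.Message) error {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.last != nil && proto.Equal(d.last, msg) {
		return nil // identical to the previous update; skip it
	}
	if err := d.send(msg); err != nil {
		return err
	}
	d.last = msg
	return nil
}
```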

Other changes:

  • Added fields defaultOpaquePorts, k8sAPI and metadataAPI into the podPublisher struct.
  • Removed the fields ip and port from the endpointProfileTranslator, which are not used.

@adleong adleong left a comment

Very nice!

@alpeb
Member Author

alpeb commented Sep 26, 2023

@mateiidavid and @adleong , while testing this again I think I came across an oversight that I wanted to check with you before fixing it.
SetToServerProtocol will update the Address object's OpaqueProtocol field whenever a Server resource gets updated. But we're only checking ContainerPort in there, ignoring whether there's a match against a HostPort, so I'll have to add that as well. Does that sound right?

@adleong
Member

adleong commented Sep 26, 2023

After thinking about this a bit, yes, I think you're right: the container is available at both the container port (on the pod network) AND the host port (on the host network).

@mateiidavid mateiidavid left a comment

wow! great change, lots of moving parts but it all makes sense and the design is imo really clean.

This is ready to ship from my side. Left two concurrency related questions though.

controller/api/destination/server.go (review thread, resolved)
controller/api/destination/watcher/pod_watcher.go (review thread, resolved)
@@ -1321,7 +1321,8 @@
 	case intstr.String:
 		for _, c := range address.Pod.Spec.Containers {
 			for _, p := range c.Ports {
-				if p.ContainerPort == int32(port) && p.Name == server.Spec.Port.StrVal {
+				if (p.ContainerPort == int32(port) || p.HostPort == int32(port)) &&

Check failure: Code scanning / CodeQL

Incorrect conversion between integer types (High): incorrect conversion of an integer with architecture-dependent bit size from strconv.Atoi to a lower bit size type int32 without an upper bound check.
Member Author

This alert is bogus because right after the strconv.Atoi call mentioned there, we already check the boundaries.
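For illustration, the pattern being referred to looks roughly like this: the Atoi result is range-checked before it is ever narrowed, which is what the scanner fails to see across function boundaries. This is a generic sketch, not the code flagged by the alert:

```go
package destination

import (
	"fmt"
	"math"
	"strconv"
)

// parsePort converts a port string, rejecting anything outside 1..65535
// before the value is ever narrowed to a smaller integer type.
func parsePort(s string) (uint32, error) {
	p, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("invalid port %q: %w", s, err)
	}
	if p < 1 || p > math.MaxUint16 {
		return 0, fmt.Errorf("port %d out of range", p)
	}
	return uint32(p), nil
}
```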

controller/api/destination/watcher/endpoints_watcher.go (CodeQL alert dismissed)
alpeb and others added 9 commits September 27, 2023 17:48
Followup to #11328, based off of `alpeb/hostport-fixup-stopgap`.

Implements a new pod watcher, instantiated along the other ones in the
Destination server. It's generic enough to catch all pod events in the
cluster, so it's up to the subscribers to filter out the ones they're
interested in, and to set up any metrics.

In the Destination server's `subscribeToEndpointProfile` method, we
create a new `HostPortAdaptor` that is subscribed to the pod watcher,
and forwards the pod and protocol updates to the
`endpointProfileTranslator`. Server subscriptions are now handled by this
adaptor and are recycled whenever the pod changes.

A new gauge metric `host_port_subscribers` has been created, tracking
the number of subscribers for a given HostIP+port combination.

## Other Changes

- Moved the `server.createAddress` method into a static function in
  `endpoints_watcher.go`, for better reusability.
- The "Return profile for host port pods" test introduced in #11328 was
  extended to track the ensuing events after a pod is deleted and then
  recreated (:taco: to @adleong for the test).
- Given that test consumes multiple events, we had to change the
  `profileStream` test helper to allow for the `GetProfile` call to
  block. Callers to `profileStream` now need to manually cancel the
  returned stream.
This change moves most of the logic into the PodWatcher, which now watches over pods and Servers. As suggested, ServerWatcher has been decommissioned.

The `CreateAddress()` function has been moved into a function of the PodWatcher, because now we're calling it on every update given the pod associated to an ip:port might change and we need to regenerate the Address object. That function also takes care of capturing opaque protocol info from associated Servers, which is not new and had some logic that was duped in the now defunct ServerWatcher. `getAnnotatedOpaquePorts()` shared the same fate for similar reasons.

Other things to note about PodWatcher:
- It publishes a new pair of metrics `ip_port_subscribers` and `ip_port_updates` leveraging the framework in `prometheus.go`.
- The complexity in `updatePod()` comes from only sending stream updates when the pod's readiness changes, to avoid sending duplicate messages on every pod lifecycle event. Let me know if that logic isn't clear enough.

Finally, since endpointProfileTranslator's `endpoint` (*pb.WeightedAddr) is no longer a static object, its `Update()` function now receives an Address that allows it to rebuild the endpoint on the fly (and so `createEndpoint()` was converted into a method of endpointProfileTranslator).
Moved the Address/pod object creation logic into the podWatcher, on the `getOrNewPodPublisher()` method, called from `Subscribe()`. The latter receives `service`, `hostname` and `ip`, which can be empty depending on the case.

With this change also the auxiliary functions `getIndexedPod()` and `podReceivingTraffic()` were moved into the `watcher` package.

Also the Server updates handler now triggers updates for all podPublishers regardless of the Server selectors. The endpointProfileTranslator is now tracking the last message sent, to avoid sending dupe messages.

Other changes:
- Added fields `defaultOpaquePorts`, `k8sAPI` and `metadataAPI` into the podPublisher struct.
- Removed the fields `ip` and `port` from the endpointProfileTranslator, which are not used.
Co-authored-by: Matei David <matei@buoyant.io>
@mateiidavid mateiidavid left a comment

You have my 🦭 of ✅ !

@alpeb
Member Author

alpeb commented Sep 28, 2023

Thanks folks for the great feedback 😄
I'm merging this and will shortly follow up with an extension of the unit test to cover the bug I fixed (described in the #11334 (comment) above).

@alpeb alpeb closed this Sep 28, 2023
@alpeb alpeb reopened this Sep 28, 2023
@alpeb alpeb merged commit 65ddba4 into main Sep 28, 2023
66 checks passed
@alpeb alpeb deleted the alpeb/hostport-podwatcher branch September 28, 2023 13:57
alpeb added a commit that referenced this pull request Sep 28, 2023
Followup to #11334 (comment)

This extends the test introduced in #11334 to exercise updating a
Server associated with a pod's HostPort and observing how the stream
updates the OpaqueProtocol field.

Helper functions were refactored a bit to allow retrieving the
l5dCRDClientSet used when building the fake API.
olix0r added a commit that referenced this pull request Sep 28, 2023
This edge release makes Linkerd even better.

* Added a controlPlaneVersion override to the `linkerd-control-plane` Helm chart
  to support including SHA256 image digests in Linkerd manifests (thanks
  @cromulentbanana!) ([#11406])
* Improved `linkerd viz check` to attempt to validate that the Prometheus scrape
  interval will work well with the CLI and Web query parameters ([#11376])
* Fixed an issue where the destination controller would not update pod metadata
  for profile resolutions for a pod accessed via the host network (e.g.
  HostPort endpoints) ([#11334]).
* Added a validating webhook config for httproutes.gateway.networking.k8s.io
  resources (thanks @mikutas!) ([#11150])

[#11150]: #11150
[#11334]: #11334
[#11376]: #11376
[#11406]: #11406
@olix0r olix0r mentioned this pull request Sep 28, 2023
olix0r added a commit that referenced this pull request Sep 28, 2023
This edge release makes Linkerd even better.

* Added a controlPlaneVersion override to the `linkerd-control-plane` Helm chart
  to support including SHA256 image digests in Linkerd manifests (thanks
  @cromulentbanana!) ([#11406])
* Improved `linkerd viz check` to attempt to validate that the Prometheus scrape
  interval will work well with the CLI and Web query parameters ([#11376])
* Fixed an issue where the destination controller would not update pod metadata
  for profile resolutions for a pod accessed via the host network (e.g.
  HostPort endpoints) ([#11334]).
* Added a validating webhook config for httproutes.gateway.networking.k8s.io
  resources (thanks @mikutas!) ([#11150])

[#11150]: #11150
[#11334]: #11334
[#11376]: #11376
[#11406]: #11406
olix0r added a commit that referenced this pull request Sep 29, 2023
This edge release makes Linkerd even better.

* Added a controlPlaneVersion override to the `linkerd-control-plane` Helm chart
  to support including SHA256 image digests in Linkerd manifests (thanks
  @cromulentbanana!) ([#11406])
* Improved `linkerd viz check` to attempt to validate that the Prometheus scrape
  interval will work well with the CLI and Web query parameters ([#11376])
* Improved CLI error handling to print differentiated error information when
  versioncheck.linkerd.io cannot be resolved (thanks @dtaskai) ([#11377])
* Fixed an issue where the destination controller would not update pod metadata
  for profile resolutions for a pod accessed via the host network (e.g.
  HostPort endpoints) ([#11334]).
* Added a validating webhook config for httproutes.gateway.networking.k8s.io
  resources (thanks @mikutas!) ([#11150])
* Introduced a new `multicluster check --timeout` flag to limit the time
  allowed for Kubernetes API calls (thanks @moki1202) ([#11420])

[#11150]: #11150
[#11334]: #11334
[#11376]: #11376
[#11377]: #11377
[#11406]: #11406
[#11420]: #11420
alpeb added a commit that referenced this pull request Oct 2, 2023
Followup to #11334 (comment)

This extends the test introduced in #11334 to exercise updating a
Server associated with a pod's HostPort and observing how the stream
updates the OpaqueProtocol field.

Helper functions were refactored a bit to allow retrieving the
l5dCRDClientSet used when building the fake API.
mateiidavid pushed a commit that referenced this pull request Oct 26, 2023
…kup changes (#11334)

Followup to #11328

Implements a new pod watcher, instantiated along the other ones in the Destination server. It also watches on Servers and carries all the logic from ServerWatcher, which has now been decommissioned.

The `CreateAddress()` function has been moved into a function of the PodWatcher, because now we're calling it on every update given the pod associated to an ip:port might change and we need to regenerate the Address object. That function also takes care of capturing opaque protocol info from associated Servers, which is not new and had some logic that was duped in the now defunct ServerWatcher. `getAnnotatedOpaquePorts()` got also moved for similar reasons.

Other things to note about PodWatcher:

- It publishes a new pair of metrics `ip_port_subscribers` and `ip_port_updates` leveraging the framework in `prometheus.go`.
- The complexity in `updatePod()` comes from only sending stream updates when the pod's readiness changes, to avoid sending duplicate messages on every pod lifecycle event.

Finally, since endpointProfileTranslator's `endpoint` (*pb.WeightedAddr) is no longer a static object, the `Update()` function now receives an Address that allows it to rebuild the endpoint on the fly (and so `createEndpoint()` was converted into a method of endpointProfileTranslator).
mateiidavid pushed a commit that referenced this pull request Oct 26, 2023
Followup to #11334 (comment)

This extends the test introduced in #11334 to exercise updating a
Server associated with a pod's HostPort and observing how the stream
updates the OpaqueProtocol field.

Helper functions were refactored a bit to allow retrieving the
l5dCRDClientSet used when building the fake API.
mateiidavid added a commit that referenced this pull request Oct 26, 2023
This stable release fixes issues in the proxy and Destination controller which
can result in Linkerd proxies sending traffic to stale endpoints. In addition,
it contains a bug fix for profile resolutions for pods bound on host ports and
includes patches for security advisory [CVE-2023-44487]/GHSA-qppj-fm5r-hxr3

* Control Plane
  * Fixed an issue where the Destination controller could stop processing
    changes in the endpoints of a destination, if a proxy subscribed to that
    destination stops reading service discovery updates. This issue results in
    proxies attempting to send traffic for that destination to stale endpoints
    ([#11483], fixes [#11480], [#11279], [#10590])
  * Fixed an issue where the Destination controller would not update pod
    metadata for profile resolutions for a pod accessed via the host network
    (e.g. HostPort endpoints) ([#11334])
  * Addressed [CVE-2023-44487]/GHSA-qppj-fm5r-hxr3 by upgrading several
    dependencies (including Go's gRPC and net libraries)

* Proxy
  * Fixed a regression where the proxy rendered `grpc_status` metric labels as
    a string rather than as the numeric status code ([linkerd2-proxy#2480];
    fixes [#11449])
  * Fixed a regression introduced in stable-2.13.0 where proxies would not
terminate unused service discovery watches, exerting backpressure on the
    Destination controller which could cause it to become stuck
    ([linkerd2-proxy#2484])

[#10590]: #10590
[#11279]: #11279
[#11483]: #11483
[#11480]: #11480
[#11334]: #11334
[#11449]: #11449
[CVE-2023-44487]: GHSA-qppj-fm5r-hxr3
[linkerd2-proxy#2480]: linkerd/linkerd2-proxy#2480
[linkerd2-proxy#2484]: linkerd/linkerd2-proxy#2484

Signed-off-by: Matei David <matei@buoyant.io>
@mateiidavid mateiidavid mentioned this pull request Oct 26, 2023
mateiidavid added a commit that referenced this pull request Oct 26, 2023
This stable release fixes issues in the proxy and Destination controller which
can result in Linkerd proxies sending traffic to stale endpoints. In addition,
it contains a bug fix for profile resolutions for pods bound on host ports and
includes patches for security advisory [CVE-2023-44487]/GHSA-qppj-fm5r-hxr3

* Control Plane
  * Fixed an issue where the Destination controller could stop processing
    changes in the endpoints of a destination, if a proxy subscribed to that
    destination stops reading service discovery updates. This issue results in
    proxies attempting to send traffic for that destination to stale endpoints
    ([#11483], fixes [#11480], [#11279], [#10590])
  * Fixed an issue where the Destination controller would not update pod
    metadata for profile resolutions for a pod accessed via the host network
    (e.g. HostPort endpoints) ([#11334])
  * Addressed [CVE-2023-44487]/GHSA-qppj-fm5r-hxr3 by upgrading several
    dependencies (including Go's gRPC and net libraries)

* Proxy
  * Fixed a regression where the proxy rendered `grpc_status` metric labels as
    a string rather than as the numeric status code ([linkerd2-proxy#2480];
    fixes [#11449])
  * Fixed a regression introduced in stable-2.13.0 where proxies would not
terminate unused service discovery watches, exerting backpressure on the
    Destination controller which could cause it to become stuck
    ([linkerd2-proxy#2484])

[#10590]: #10590
[#11279]: #11279
[#11483]: #11483
[#11480]: #11480
[#11334]: #11334
[#11449]: #11449
[CVE-2023-44487]: GHSA-qppj-fm5r-hxr3
[linkerd2-proxy#2480]: linkerd/linkerd2-proxy#2480
[linkerd2-proxy#2484]: linkerd/linkerd2-proxy#2484

---------

Signed-off-by: Matei David <matei@buoyant.io>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
alpeb added a commit that referenced this pull request Nov 7, 2023
This removes `endpointProfileTranslator`'s dependency on the k8sAPI and
metadataAPI, which are not used. This was introduced in one of #11334's
refactorings, but ended up being not required. No functional changes
here.
alpeb added a commit that referenced this pull request Nov 14, 2023
…11587)

This removes `endpointProfileTranslator`'s dependency on the k8sAPI and
metadataAPI, which are not used. This was introduced in one of #11334's
refactorings, but ended up being not required. No functional changes
here.