Manage endpoints when KIngress status contains an IP address #11843

dprotaso · 2021-08-20T20:58:10Z

Part of: #11821

Proposed Changes

add an error status where the route doesn't own the endpoints object
Ingress LBs with IP now have priority over Domain & DomainInternal
continue preserving the clusterIP if set - include a test
refactor the route constructor to remove duplication
manage endpoints when the Ingress returns an IP load balancer status
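
To make the priority change above concrete, here is a minimal Go sketch of the selection order. The `lbIngress` and `targetKind` types and the `pickTarget` function are hypothetical stand-ins, not the types used in the PR. Only "IP wins over Domain and DomainInternal" comes from the description; the relative order of Domain and DomainInternal, and mapping both to an ExternalName Service, are assumptions made for illustration.

```go
package main

import "fmt"

// lbIngress is a hypothetical stand-in for one entry of a KIngress
// load balancer status. It is not the real networking API type.
type lbIngress struct {
	IP             string
	Domain         string
	DomainInternal string
}

// targetKind describes how the placeholder Service would reach the ingress.
type targetKind string

const (
	targetIP           targetKind = "ip"            // the Route manages an Endpoints object
	targetExternalName targetKind = "external-name" // the Service forwards by DNS name
	targetNone         targetKind = "none"          // nothing usable in the status
)

// pickTarget gives an IP priority over Domain and DomainInternal.
// The Domain-before-DomainInternal order is an assumption of this sketch.
func pickTarget(in lbIngress) (targetKind, string) {
	switch {
	case in.IP != "":
		return targetIP, in.IP
	case in.Domain != "":
		return targetExternalName, in.Domain
	case in.DomainInternal != "":
		return targetExternalName, in.DomainInternal
	default:
		return targetNone, ""
	}
}

func main() {
	kind, addr := pickTarget(lbIngress{
		IP:             "10.0.0.7",
		DomainInternal: "gateway.example.svc.cluster.local",
	})
	fmt.Println(kind, addr)
}
```

With this ordering, a net-* plugin that never sets an IP keeps the previous ExternalName behaviour, which matches the point made later in the thread that the new code path is only triggered once a plugin starts reporting an IP.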

TODO

Test with a fork of net-istio
Test with a fork of net-contour
Test with a fork of net-kourier

Release Note

NONE

codecov · 2021-08-20T22:51:00Z

Codecov Report

Merging #11843 (e1d50a1) into main (21e0d8e) will decrease coverage by 0.07%.
The diff coverage is 84.61%.

@@            Coverage Diff             @@
##             main   #11843      +/-   ##
==========================================
- Coverage   87.81%   87.73%   -0.08%     
==========================================
  Files         196      196              
  Lines        9393     9430      +37     
==========================================
+ Hits         8248     8273      +25     
- Misses        890      895       +5     
- Partials      255      262       +7

Impacted Files	Coverage Δ
pkg/reconciler/route/route.go	`79.89% <ø> (+0.19%)`	⬆️
pkg/reconciler/route/reconcile_resources.go	`76.66% <71.42%> (-4.94%)`	⬇️
pkg/reconciler/route/resources/service.go	`93.02% <97.10%> (+1.77%)`	⬆️
pkg/apis/serving/v1/route_lifecycle.go	`100.00% <100.00%> (ø)`
pkg/reconciler/route/controller.go	`100.00% <100.00%> (ø)`
pkg/reconciler/revision/reconcile_resources.go	`80.72% <0.00%> (-2.41%)`	⬇️
pkg/reconciler/revision/controller.go	`86.00% <0.00%> (-0.28%)`	⬇️
pkg/reconciler/gc/controller.go	`0.00% <0.00%> (ø)`
pkg/queue/health/health_state.go	`100.00% <0.00%> (ø)`
... and 8 more

dprotaso · 2021-08-21T16:55:15Z

Contour changes are here: knative-extensions/net-contour#582 - all I'm doing is including the service IP with the Domain*

dprotaso · 2021-08-21T17:41:17Z

Istio changes are here: knative-extensions/net-istio#731 - same sorta diff as contour

note: probably not worth including this change to istio unless they disable external name forwarding like contour

dprotaso · 2021-08-21T17:54:29Z

Kourier changes are here: knative-extensions/net-kourier#605 - same diff as the others

dprotaso · 2021-08-21T21:32:55Z

/retest

dprotaso · 2021-08-22T00:48:23Z

I think the istio mesh tests are just flakey - as the code path hasn't changed
/retest

dprotaso · 2021-08-22T00:50:02Z

Everything else went green which is great

knative-prow-robot · 2021-08-22T02:41:21Z

@dprotaso: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Rerun command
pull-knative-serving-istio-stable-mesh	`c5b8f66`	link	`/test pull-knative-serving-istio-stable-mesh`

dprotaso · 2021-08-22T03:53:17Z

ok - so testgrid is showing incorrect results - no scale tests appear there

https://testgrid.k8s.io/r/knative-own-testgrid/serving#istio-stable-mesh&show-stale-tests=

but in fact it is failing ie. latest run - https://storage.googleapis.com/knative-prow/logs/ci-knative-serving-istio-stable-mesh/1429246291283546112/build-log.txt

nak3 · 2021-08-23T00:19:15Z

The mesh test is often killed in the middle of the scale test.
This is related to proposal #11552, which changes the test order. (That may not fix the scale test completely, but it exposes some HA tests that the scale test currently hides.)

nak3 · 2021-08-23T03:34:28Z

pkg/reconciler/route/controller.go

@@ -60,6 +61,7 @@ func newController(
 ) *controller.Impl {
 	logger := logging.FromContext(ctx)
 	serviceInformer := serviceinformer.Get(ctx)
+	endpointsInformer := endpointsinformer.Get(ctx)


Shouldn't we add an event handler for the endpoints, i.e. endpointsInformer.Informer().AddEventHandler(handleControllerOf)?

I thought about it but decided it wasn't necessary because:

Users will probably not have access to modify this endpoints object (because of CVE-2021-25740: Endpoint & EndpointSlice permissions allow cross-Namespace forwarding kubernetes/kubernetes#103675), and we have the option of the visibility label on the service.

No information from the endpoints object is propagated to the route.

That said, we may want to 'scope' the endpoints informer to only watch endpoints with that controller route label.
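
The ownership question in this thread (and the new error status for endpoints the route doesn't own) comes down to a controller owner-reference check. The sketch below illustrates that check with simplified, hypothetical types (`ownerRef`, `object`, `controlledByRoute`); it does not use the real Kubernetes API types or the PR's actual helper.

```go
package main

import "fmt"

// ownerRef is a simplified stand-in for a Kubernetes owner reference.
type ownerRef struct {
	Kind       string
	Name       string
	Controller bool
}

// object is a simplified stand-in for an Endpoints or Service object.
type object struct {
	Name   string
	Owners []ownerRef
}

// controlledByRoute reports whether obj carries a controller owner
// reference that points at the named Route.
func controlledByRoute(obj object, route string) bool {
	for _, o := range obj.Owners {
		if o.Controller && o.Kind == "Route" && o.Name == route {
			return true
		}
	}
	return false
}

func main() {
	owned := object{
		Name:   "hello",
		Owners: []ownerRef{{Kind: "Route", Name: "hello", Controller: true}},
	}
	foreign := object{Name: "hello"} // created by someone else, no owner reference

	fmt.Println(controlledByRoute(owned, "hello"))   // object the route may update
	fmt.Println(controlledByRoute(foreign, "hello")) // object the route should report as not owned
}
```

A reconciler built along these lines would update only objects for which the check passes and surface an error status otherwise. An event handler filtered on the same predicate is what the question above proposes adding.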

pkg/reconciler/route/resources/service.go

nak3 · 2021-08-23T05:36:28Z

pkg/reconciler/route/resources/service.go


-	lbStatus := ingressStatus.PublicLoadBalancer
-	if isPrivate || ingressStatus.PrivateLoadBalancer != nil {
+	if isPrivate || privateLB != nil && len(privateLB.Ingress) != 0 {


Is len(privateLB.Ingress) != 0 needed here? The following line already checks whether len(privateLB.Ingress) is 0 or more than 1.

I added this because I don't think we should error out if the private load balancer has no ingress statuses but the public one does.
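
The effect of the extra length check can be seen in a small, self-contained Go sketch. The `lbStatus` type and `selectLB` function are hypothetical simplifications. Only the condition itself is taken from the diff above.

```go
package main

import "fmt"

// lbStatus is a simplified stand-in for a load balancer status;
// Ingress holds one string per ingress entry.
type lbStatus struct {
	Ingress []string
}

// selectLB picks the private load balancer when the route is private, or
// when a private status exists and actually has entries. Otherwise it
// falls back to the public one.
func selectLB(isPrivate bool, publicLB, privateLB *lbStatus) *lbStatus {
	// In Go, && binds tighter than ||, so this reads as
	// isPrivate || (privateLB != nil && len(privateLB.Ingress) != 0).
	if isPrivate || privateLB != nil && len(privateLB.Ingress) != 0 {
		return privateLB
	}
	return publicLB
}

func main() {
	public := &lbStatus{Ingress: []string{"public.example.com"}}
	emptyPrivate := &lbStatus{}

	// A private status with no entries no longer shadows the public one.
	fmt.Println(selectLB(false, public, emptyPrivate) == public)
}
```

Without the length check, an empty private status would be selected and then rejected by the later validation, even though the public status is usable.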

pkg/reconciler/route/reconcile_resources.go

nak3 · 2021-08-23T06:21:05Z

/test pull-knative-serving-istio-stable-mesh-short

nak3 · 2021-08-23T11:32:41Z

pkg/reconciler/route/reconcile_resources.go

+				case corev1.ServiceTypeExternalName:
+					canUpdate = true
+				default:
+					// Transitions from ClusterIP to ExternalName Fail


We need to delete the placeholder services manually when downgrading Serving. We should probably document that?

Good point, I'll add this as a release note.

I think this is bad, as it could cause downgrades to fail by default for all users unless they do manual work. I really feel that we should handle the backward-compatibility issue instead of just documenting it.

Would it be OK to:

put this feature behind a flag and set the flag to false in the 0.26 release,

handle the backward-compatibility issue in the 0.26 release, and

enable the feature by default in the 0.27 release?

Did you see my comment here: #11843 (comment) about a potential downgrade path that doesn't require manually deleting k8s services?

Also, the prior behaviour is preserved: the changes in this PR are only triggered once the net-* plugins start setting the LoadBalancer status IP. So if net-istio/kourier don't opt into setting that value, downgrade will work.

Thanks for the pointer, Dave. The comment makes sense.

For downgrading, I am not sure how easy it would be for users to follow the order described in that comment. To make the downgrade easier, I would suggest we ship this PR in the 0.26 release and populate the KIngress IP from the net-* repos in the 0.27+ releases, if the maintainers are OK with it. WDYT @nak3 ?

Yes, net-contour needs this change ASAP but net-istio/kourier do not need to rush so adopting it on 0.27+ would be fine. It depends on each repo maintainer's decision as you also said, though.

net-istio/kourier do not need to rush so adopting it on 0.27+ would be fine. It depends on each repo maintainer's decision as you also said, though.

I'd say only set the IP address if you need to

julz

this is awesome, very cool that this works \o/ ❤️

pkg/apis/serving/v1/route_lifecycle.go

pkg/reconciler/route/reconcile_resources.go

evankanderson

Do we need a flag here (default false this release) to handle the downgrade case?

dprotaso · 2021-08-24T17:10:39Z

Re: Downgrade

So it's actually the ingress provider that drives us to managing an endpoints object by setting the IP property in the Kingress status. So upgrade and downgrade will work fine when the net-* version remains stable.

If the net-* version were to vary then the safest thing to do would be

When upgrading:

Upgrade knative-serving
Upgrade net-* plugin

When downgrading:

Downgrade net-* plugin
Wait for things to reconcile back to what they were before
Downgrade knative-serving

dprotaso · 2021-08-24T17:11:39Z

Probably worth stating explicitly:

I'm not going to PR the mentioned IP changes to net-kourier and net-istio; they were done to verify the changes in this PR. I think it's for each net-* maintainer to make that call, and to decide whether they can even support routing traffic to headless services (I don't see why it wouldn't work).

For net-contour we'll make the change since the default setting is to not support these ExternalName Services.

dprotaso · 2021-08-24T17:57:33Z

/hold cancel

pkg/reconciler/route/resources/service.go

ZhiminXiang · 2021-08-24T22:02:30Z

pkg/reconciler/route/reconcile_resources.go

+				case corev1.ServiceTypeExternalName:
+					canUpdate = true
+				default:
+					// Transitions from ClusterIP to ExternalName Fail


I think this is bad, as it could cause downgrades to fail by default for all users unless they do manual work. I really feel that we should handle the backward-compatibility issue instead of just documenting it.

Would it be OK to:

put this feature behind a flag and set the flag to false in the 0.26 release,

handle the backward-compatibility issue in the 0.26 release, and

enable the feature by default in the 0.27 release?

nak3 · 2021-08-26T02:14:30Z

/lgtm
/hold

/hold for other reviewers.

julz

🔥 this is great

/lgtm

knative-prow-robot · 2021-08-26T09:05:55Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dprotaso, julz

Needs approval from an approver in each of these files:

~~pkg/apis/OWNERS~~ [dprotaso,julz]
~~pkg/reconciler/route/OWNERS~~ [dprotaso,julz]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ZhiminXiang

/lgtm

dprotaso · 2021-08-26T17:35:09Z

/hold cancel

knative-prow-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 20, 2021

google-cla bot added the cla: yes Indicates the PR's author has signed the CLA. label Aug 20, 2021

knative-prow-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. area/API API objects and controllers area/networking approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 20, 2021

dprotaso added 4 commits August 20, 2021 16:58

add an error status were the route doesn't own the endpoints object

17d674c

Ingress LBs with IP now have priority over Domain & DomainInternal

8c0cb48

continue preserving the clusterIP if set - include a test

7170dfe

refactor the route constructor to remove duplication

3c2f0f0

dprotaso force-pushed the route-endpoint-management branch from 85f30f5 to 675c723 Compare August 20, 2021 22:33

knative-prow-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 20, 2021

dprotaso added 2 commits August 20, 2021 18:44

manage endpoints when the Ingress returns an IP load balancer status

0075eeb

fix comment & drop deleted function

f1640e5

dprotaso force-pushed the route-endpoint-management branch from b385454 to f1640e5 Compare August 20, 2021 22:44

fix comment

4f74779

knative-prow-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 22, 2021

nak3 reviewed Aug 23, 2021

fix linter warning - remove unused function

84c2f20

knative-prow-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 23, 2021

nak3 reviewed Aug 23, 2021

nak3 mentioned this pull request Aug 23, 2021

Fali to create/update placeholder service when k8s open IPv4/IPv6 dual-stack feature #9045

Open

julz reviewed Aug 23, 2021

evankanderson reviewed Aug 23, 2021

address PR feedback

ae1a8d8

knative-prow-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 24, 2021

ZhiminXiang reviewed Aug 24, 2021

fix comment

e1d50a1

knative-prow-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Aug 26, 2021

julz approved these changes Aug 26, 2021

knative-prow-robot assigned julz Aug 26, 2021

ZhiminXiang reviewed Aug 26, 2021

knative-prow-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 26, 2021

knative-prow-robot merged commit ce627e5 into knative:main Aug 26, 2021

dprotaso deleted the route-endpoint-management branch August 26, 2021 20:08

nak3 added a commit to nak3/serving that referenced this pull request Oct 12, 2021

Revert "Manage endpoints when KIngress status contains an IP address (k…

3d87ae6

…native#11843)" This reverts commit ce627e5.

dprotaso mentioned this pull request Oct 27, 2021

Disable Knative's Contour support for ExternalName K8s Services knative-extensions/net-contour#582

Merged

This was referenced Apr 15, 2024

Not all Gateways have a local K8s Service knative-extensions/net-gateway-api#665

Closed

Contour External has ExternalName Services Support knative-extensions/net-gateway-api#638

Closed
