Manage endpoints when KIngress status contains an IP address #11843
Conversation
Force-pushed from 85f30f5 to 675c723
Force-pushed from b385454 to f1640e5
Codecov Report
@@ Coverage Diff @@
## main #11843 +/- ##
==========================================
- Coverage 87.81% 87.73% -0.08%
==========================================
Files 196 196
Lines 9393 9430 +37
==========================================
+ Hits 8248 8273 +25
- Misses 890 895 +5
- Partials 255 262 +7
Continue to review full report at Codecov.
Contour changes are here: knative-extensions/net-contour#582 - all I'm doing is including the service IP in the load balancer status.
Istio changes are here: knative-extensions/net-istio#731 - same sort of diff as Contour. Note: it's probably not worth including this change in Istio unless they disable ExternalName forwarding like Contour does.
Kourier changes are here: knative-extensions/net-kourier#605 - same diff as the others.
/retest
I think the Istio mesh tests are just flaky - the code path hasn't changed.
@dprotaso: The following test failed, say /retest to rerun all failed tests:

Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
ok - so testgrid is showing incorrect results - no scale tests appear there (https://testgrid.k8s.io/r/knative-own-testgrid/serving#istio-stable-mesh&show-stale-tests=) but it is in fact failing, i.e. the latest run - https://storage.googleapis.com/knative-prow/logs/ci-knative-serving-istio-stable-mesh/1429246291283546112/build-log.txt
The mesh test is often killed in the middle of the scale test.
@@ -60,6 +61,7 @@ func newController(
 ) *controller.Impl {
 	logger := logging.FromContext(ctx)
 	serviceInformer := serviceinformer.Get(ctx)
+	endpointsInformer := endpointsinformer.Get(ctx)
Don't we need to add an event handler for the endpoints, i.e. endpointsInformer.Informer().AddEventHandler(handleControllerOf)?
I thought about it but decided it wasn't necessary since:
- Users will probably not have access to modify this endpoint object (because of CVE-2021-25740: Endpoint & EndpointSlice permissions allow cross-Namespace forwarding kubernetes/kubernetes#103675) - and we have the option for the visibility label on the service
- There's no information from the endpoint being propagated to the route
Though we may want to 'scope' the endpoint informer to only watch endpoints with that controller route label.
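For illustration, a minimal sketch of what that scoping could look like, assuming the knative.dev/pkg filtering helpers and the serving.knative.dev/route label (my assumptions, not something this PR adds):

```go
import (
	"k8s.io/client-go/tools/cache"
	"knative.dev/pkg/controller"
	"knative.dev/pkg/reconciler"
)

// Inside newController: only react to Endpoints objects that carry the
// route label, instead of re-enqueuing for every Endpoints in the cluster.
endpointsInformer.Informer().AddEventHandler(cache.FilteringResourceEventHandler{
	// LabelExistsFilterFunc returns true when the object has the label key.
	FilterFunc: reconciler.LabelExistsFilterFunc("serving.knative.dev/route"),
	// Enqueue matching objects onto the route reconciler's work queue.
	Handler: controller.HandleAll(impl.Enqueue),
})
```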
 	lbStatus := ingressStatus.PublicLoadBalancer
-	if isPrivate || ingressStatus.PrivateLoadBalancer != nil {
+	if isPrivate || privateLB != nil && len(privateLB.Ingress) != 0 {
Is len(privateLB.Ingress) != 0 needed here? The following line already checks whether len(privateLB.Ingress) is 0 or more than 1.
I added this because I don't think we should error out if the private load balancer has no ingress statuses but the public one does.
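To make the intent concrete, here's a hedged reading of that condition as a standalone helper (chooseLBStatus is my name, not the PR's; the types are from knative.dev/networking):

```go
import (
	netv1alpha1 "knative.dev/networking/pkg/apis/networking/v1alpha1"
)

// chooseLBStatus prefers the private load balancer for cluster-local
// routes, but treats a private status with zero ingress entries as
// unusable so a populated public status can still be picked up.
func chooseLBStatus(status netv1alpha1.IngressStatus, isPrivate bool) *netv1alpha1.LoadBalancerStatus {
	privateLB := status.PrivateLoadBalancer
	// `&&` binds tighter than `||`: pick the private LB when the route is
	// private, or when a private LB exists and actually has entries.
	if isPrivate || privateLB != nil && len(privateLB.Ingress) != 0 {
		return privateLB
	}
	return status.PublicLoadBalancer
}
```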
/test pull-knative-serving-istio-stable-mesh-short
case corev1.ServiceTypeExternalName:
	canUpdate = true
default:
	// Transitions from ClusterIP to ExternalName fail
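For context, a sketch (my reconstruction, not the exact upstream code) of the decision this default branch sits in: the existing Service's type determines whether an in-place update is possible, because a ClusterIP Service can't simply be updated into an ExternalName one.

```go
import corev1 "k8s.io/api/core/v1"

// canUpdateService mirrors the switch above: a placeholder Service can be
// updated in place only when the type transition is accepted by the API.
func canUpdateService(existing *corev1.Service) bool {
	canUpdate := false
	switch existing.Spec.Type {
	case corev1.ServiceTypeExternalName:
		// ExternalName Services have no allocated clusterIP, so moving
		// them to another type updates cleanly.
		canUpdate = true
	default:
		// Transitions from ClusterIP to ExternalName fail: the existing
		// object's allocated clusterIP conflicts with an ExternalName spec.
	}
	return canUpdate
}
```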
We need to delete the placeholder services manually when downgrading Serving. We probably need to document it?
Good point - I will add this as a release note.
I think this is bad, as it could cause downgrades to fail by default for all users unless they do manual work. And I really feel that we should handle the backward-compatibility issue instead of just documenting it.
Would it be OK to:
- put this feature behind a flag and set the flag to false in the 0.26 release
- handle the backward-compatibility issue in the 0.26 release
- enable the feature by default in the 0.27 release
Did you see my comment here: #11843 (comment) about a potential downgrade path that doesn't require manually deleting k8s services?
Also, the prior behaviour is preserved - the changes in this PR are only triggered once the net-* plugins start setting the IP in the LoadBalancer status. So if net-istio/kourier don't plan on setting that value then downgrade will work.
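To make that concrete, here's roughly what opting in looks like from a net-* plugin's side (illustrative only; the IP shown is a made-up cluster Service IP):

```go
import (
	netv1alpha1 "knative.dev/networking/pkg/apis/networking/v1alpha1"
)

// Reporting an IP (rather than only DomainInternal) in the KIngress load
// balancer status is what opts a cluster into the new endpoints-managing
// behaviour in this PR.
status := netv1alpha1.IngressStatus{
	PublicLoadBalancer: &netv1alpha1.LoadBalancerStatus{
		Ingress: []netv1alpha1.LoadBalancerIngressStatus{{
			IP: "10.96.23.42", // hypothetical ingress Service clusterIP
		}},
	},
}
_ = status
```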
Thanks for the pointer, Dave. The comment makes sense.
For downgrading, I am not sure how easy it would be for users to do the downgrade following the steps mentioned in that comment. To make the downgrade easier, I would suggest we ship this PR in the 0.26 release, and populate the KIngress IP from the net-* repos in the 0.27+ releases if the maintainers are OK with it. WDYT @nak3?
Yes, net-contour needs this change ASAP, but net-istio/kourier do not need to rush, so adopting it in 0.27+ would be fine. It depends on each repo maintainer's decision, as you also said, though.
> net-istio/kourier do not need to rush so adopting it on 0.27+ would be fine. It depends on each repo maintainer's decision as you also said, though.
I'd say only set the IP address if you need to
this is awesome, very cool that this works \o/ ❤️
Do we need a flag here (default false this release) to handle the downgrade case?
Re: Downgrade

So it's actually the ingress provider that drives us to managing an endpoints object, by setting an IP in the KIngress load balancer status. If the IP isn't reported, the prior behaviour is preserved.

When upgrading:

When downgrading:
Probably worth stating explicitly: I'm not going to PR the mentioned net-istio/net-kourier changes. For net-contour we'll make the change since the default setting is to not support these ExternalName services.
/hold cancel
/lgtm
🔥 this is great
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dprotaso, julz

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
/lgtm
/hold cancel
Manage endpoints when KIngress status contains an IP address (#11843)

* add an error status where the route doesn't own the endpoints object
* Ingress LBs with IP now have priority over Domain & DomainInternal
* continue preserving the clusterIP if set - include a test
* refactor the route constructor to remove duplication
* manage endpoints when the Ingress returns an IP load balancer status
* fix comment & drop deleted function
* fix comment
* fix linter warning - remove unused function
* address PR feedback
* fix comment
Revert "Manage endpoints when KIngress status contains an IP address (knative#11843)" - This reverts commit ce627e5.
Part of: #11821
Proposed Changes
TODO
Release Note