Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add ipvs metaproxier endpointslice support #84089

Closed
wants to merge 1 commit into from

Conversation

khenidak
Copy link
Contributor

What type of PR is this?
/kind feature

What this PR does / why we need it:
Adds endpointSlice support metaproxier (ipv6 dualstack).

Which issue(s) this PR fixes:
n/a

Special notes for your reviewer:
n/a

Does this PR introduce a user-facing change?:

EndpointSlice and ipv6 dualstack (ipvs) are no longer mutually exclusive features. They can be enabled concurrently on a cluster 

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]:  https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/20180612-ipv4-ipv6-dual-stack.md
- [KEP]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/20190603-EndpointSlice-API.md

@robscott FYI..

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Oct 18, 2019
@k8s-ci-robot k8s-ci-robot added area/ipvs sig/network Categorizes an issue or PR as relevant to SIG Network. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 18, 2019
@robscott
Copy link
Member

robscott commented Oct 18, 2019

@khenidak Thanks for taking this on! Could you add some tests that cover the combination of ipvs, dualstack, and endpointslices?

@khenidak
Copy link
Contributor Author

@robscott the metaproxy is just a thin shell over actual proxy. there is no dualstackness per se, it calls what actual proxies do. A test covering meta proxy itself specially the part it selects which proxy to call might be a good idea, but beyond that is really up to proxy implementation, and is tested separately

@@ -153,25 +153,30 @@ func (proxier *metaProxier) OnEndpointsSynced() {
// OnEndpointSliceAdd is called whenever creation of a new endpoint slice object
// is observed.
func (proxier *metaProxier) OnEndpointSliceAdd(endpointSlice *discovery.EndpointSlice) {
// noop
proxier.ipv4Proxier.OnEndpointSliceAdd(endpointSlice)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the IP family need to be checked first similar to the OnEndpoints* methods?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if EndpointSlice can contain address in both families should we still check if it contains only v4 or only v6 first before calling both event handlers??

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would metaProxier be used if the dual stack feature gate hadn't been enabled? If not, it seems likely that EndpointSlices would consistently contain both v4 and v6 addresses consistently. Are there other use cases where there might only be v4 or v6 addresses in an EndpointSlice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might be over optimization, giving that a slice can grow to a 1000 entry we may actually get opposite results. I understand that you are trying to avoid the call to syncProxyRules() Let us talk tom, i want to understand more.

@andrewsykim
Copy link
Member

/assign

@robscott robscott mentioned this pull request Oct 25, 2019
8 tasks
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 5, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: khenidak
To complete the pull request process, please assign andrewsykim
You can assign the PR to them by writing /assign @andrewsykim in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@khenidak
Copy link
Contributor Author

khenidak commented Nov 5, 2019

@andrewsykim the filtering is now done at the poxy level. it should work, though i am not totally comfortable with this change.

@andrewsykim
Copy link
Member

I thought we were goign to check if an EndpointSlice is v4 or v6 (by checking it's endpoints) and then call the appropriate proxier from there. That can happen at the meta proxier level.

@robscott
Copy link
Member

robscott commented Nov 5, 2019

@andrewsykim EndpointSlices can contain both v4 and v6 addresses in the addresses field for each Endpoint. This is different than the approach with Endpoints where addresses are separated in different v4 and v6 resources.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 5, 2019
@andrewsykim
Copy link
Member

andrewsykim commented Nov 5, 2019

Gotcha, thanks for clarifying. The problem is that syncProxyRules is not cheap even if we are skipping individual addresses not in the same IP family. Wondering if the logic can be something like this instead:

func (proxier *metaProxier) OnEndpointSliceUpdate(old, endpointSlice *discovery.EndpointSlice) {
       hasIPv4, hasIPv6 := getIPFamilies(endpointSlice)
       if hasIPv4{
           proxier.ipv4Proxier.OnEndpointSliceUpdate(old, endpointSlice)
      }

      if hasIPv6{
          proxier.ipv6Proxier.OnEndpointSliceUpdate(old, endpointSlice)
      }
}

@khenidak
Copy link
Contributor Author

khenidak commented Nov 5, 2019

Gotcha, thanks for clarifying. The problem is that syncProxyRules is not cheap even if we are skipping addresses not in the same IP family. Wondering if the logic can be something like this instead:

func (proxier *metaProxier) OnEndpointSliceUpdate(old, endpointSlice *discovery.EndpointSlice) {
       hasIPv4, hasIPv6 := getIPFamilies(endpointSlice)
       if hasIPv4{
           proxier.ipv4Proxier.OnEndpointSliceUpdate(old, endpointSlice)
      }

      if hasIPv6{
          proxier.ipv6Proxier.OnEndpointSliceUpdate(old, endpointSlice)
      }
}

Wouldn't that still require skipping in syncProxyRules()

@andrewsykim
Copy link
Member

Skipping syncProxyRules() (when IP family doesn't match) is what we want though? What we should avoid is the IPv6 proxier running syncProxyRules() when an EndpointSlice is all v4 (and vice versa).

Comment on lines +930 to +934
// (khenidak) the previous assumption is valid only for single stack
// for dual stack, we have no way for endpointslice (or general endpoint processing)
// to know if this endpoint is being processed for an IPv6 or IPv4 proxier
// TODO: Find a way where the endpoints/slice can be loaded with knowledge
// of proxy/ipfamily and perform the filtering there.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe our caching structures (endpointsMap and/or endpointSliceCache?) could be updated in the future to make this easier?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a strategy covering the both proxies and endpoint processing. But that strategy can not be implemented now it needs a deeper look at the stack. For example, there is a proxy under development that does not care about ipfamily (nftables iirc). That strategy also needs to be ready to handle dualstack services (as in dual cluster IPs). that said, this strategy is way beyond the scope of this PR.

What i am trying to say this PR is a stop gap until this strategy is made, and applied.

@@ -153,25 +153,30 @@ func (proxier *metaProxier) OnEndpointsSynced() {
// OnEndpointSliceAdd is called whenever creation of a new endpoint slice object
// is observed.
func (proxier *metaProxier) OnEndpointSliceAdd(endpointSlice *discovery.EndpointSlice) {
// noop
proxier.ipv4Proxier.OnEndpointSliceAdd(endpointSlice)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would metaProxier be used if the dual stack feature gate hadn't been enabled? If not, it seems likely that EndpointSlices would consistently contain both v4 and v6 addresses consistently. Are there other use cases where there might only be v4 or v6 addresses in an EndpointSlice?

Comment on lines +942 to +948
// ignore stale services that does not match ipfamily of the proxier
isIpv6Svc := utilnet.IsIPv6(svcInfo.ClusterIP())
if proxier.isIPv6 != isIpv6Svc {
klog.V(10).Infof("ipvs proxier (ipv6:%v) is ignoring a stale service:%v (ipv6:%v) because it is not of same ipfamily", proxier.isIPv6, svcPortName, isIpv6Svc)
continue
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something we could do in the caches instead of syncProxyRules? As I understand it, right now a change to a v4 address would be considered a change by the v6 proxier since both addresses would be stored in both caches. If on the other hand we filtered addresses before storing them in the cache, changes would be more accurate by the time they got here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@khenidak
Copy link
Contributor Author

khenidak commented Nov 5, 2019

skipping/ignoring is what we thought is the best approach the last time we discussed this. Because EndPointSlices carry mixed ip families. I do believe skipping/ignoring is ok not great. but ok.

Please do note pods in a dualstack clusters carry dualstack IPs (v4/v6 or v6/v4). As in, a change in pod (create, update, delete) will always result into a change in two EndPoints. That means max of two EndpointSlices will be affected when a pod ips change (only one EndpointSlice if the two endpoints landed in the same slice, which what the controller always attempt to do). So skipping will only happen in max one extra EndpointSlice.

For single stack, no skipping will happen since all endpoints will carry ips from the same family.

@robscott
Copy link
Member

robscott commented Nov 5, 2019

Thanks for the explanation @khenidak! Since EndpointSlice endpoints are tied directly to a Pod, a change affecting a Pod should only change 1 EndpointSlice. So unless there's some kind of case where a v4 or v6 address could change independently of each other for the same Pod, this does seem like a good approach.

@khenidak
Copy link
Contributor Author

khenidak commented Nov 6, 2019

:-) max of two, if you need to overflow into another endpointslice. I will reword. Thanks.

@andrewsykim
Copy link
Member

andrewsykim commented Nov 6, 2019

So to clarify - if I set the ipFamily field in Services to IPv4, the cluster IP is IPv4 but it's endpoints set of endpoints can be both IPv4 and IPv6?

@khenidak
Copy link
Contributor Author

khenidak commented Nov 6, 2019

So to clarify - if I set the ipFamily field in Services to IPv4, the cluster IP is IPv4 but it's endpoints set of endpoints can be both IPv4 and IPv6?

No. Service.Spec.ipFamily drives the endpoint selection for a service. User will not get a different ip family assigned.

@andrewsykim
Copy link
Member

andrewsykim commented Nov 6, 2019

Gotcha, so doesn't that mean there are EndpointSlice updates where the updates are exclusively IPv4 or exclusively IPv6 (re: #84089 (comment))? Thinking of a case where pods are dual-stack but some user of the cluster just care about one IP family still. That seems like a pretty likely scenario.

@khenidak
Copy link
Contributor Author

khenidak commented Nov 6, 2019

Thanks!

I need to/let us push this forward because EndpointSlice BETA stage has dependency on supporting dualstack.

serviceUpdateResult := proxy.UpdateServiceMap(proxier.serviceMap, proxier.serviceChanges)
endpointUpdateResult := proxier.endpointsMap.Update(proxier.endpointsChanges)

staleServices := serviceUpdateResult.UDPStaleClusterIP
// merge stale services gathered from updateEndpointsMap
for _, svcPortName := range endpointUpdateResult.StaleServiceNames {
if svcInfo, ok := proxier.serviceMap[svcPortName]; ok && svcInfo != nil && svcInfo.Protocol() == v1.ProtocolUDP {
// ignore stale services that does not match ipfamily of the proxier
isIpv6Svc := utilnet.IsIPv6(svcInfo.ClusterIP())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about ExternalIP?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A service that has ExternalIPs must have ClusterIP AFAIK. Do you know of any case where a service can carry ExternalIPs but ClusterIP is empty?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Headless Services (ClusterIP: None) with external IPs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  clusterIP: None
  externalIPs:
  - 1.1.1.1
  selector:
    app: nginx
  ports:
  - port: 80
    protocol: TCP
$ kubectl get svc nginx
NAME    TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
nginx   ClusterIP   None         1.1.1.1       80/TCP    62s

@@ -1023,6 +1040,14 @@ func (proxier *Proxier) syncProxyRules() {
klog.Errorf("Failed to cast serviceInfo %q", svcName.String())
continue
}

// ignore services that does not match proxier ipfamily
isIpv6Svc := utilnet.IsIPv6(svcInfo.ClusterIP())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, what about ExternalIP?

@andrewsykim
Copy link
Member

/milestone v1.17

@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Nov 13, 2019
@robscott
Copy link
Member

I think @khenidak is pretty busy so I'm going to work on a separate PR that integrates the EndpointSlice address type changes.

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 13, 2019
@khenidak
Copy link
Contributor Author

closing in favor of #85246

@khenidak khenidak closed this Nov 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ipvs cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/network Categorizes an issue or PR as relevant to SIG Network. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants