Change in behavior of ServiceEntries merge #50478
@hzxuzhonghu I remember you changed something related to service entry merge recently? |
Maybe it's a side effect of this PR: https://github.com/istio/istio/pull/49573/files#diff-a8e332e06b003dfbf58c29f37de893fbb0e45f642200ea3ea2faa9847fc7e1e0L1426. I solved this by setting different ports[].name. |
I will take a look |
@j2gg0s is right, from debug/endpointsz, we can verify this |
Right, with different port names there are no issues (except 1.20.0-1.20.3, which was already resolved by #49489). |
My understanding of this, comparing 1.20.4 with 1.20.3: #49573 modified this part of the code:

```diff
shards, ok := env.EndpointIndex.ShardsForService(string(s.Hostname), s.Attributes.Namespace)
if ok {
- ps.ServiceIndex.instancesByPort[svcKey] = shards.CopyEndpoints(portMap)
+ instancesByPort := shards.CopyEndpoints(portMap)
+ for port, instances := range instancesByPort {
+ ps.ServiceIndex.instancesByPort[svcKey][port] = instances
+ }
}
```

The premise for the problem is: so in 1.20.3, we can only see one cluster.

1.20.0 vs 1.19.8: #46329 removed service.InstancesByPort.
In 1.19.8, InstancesByPort filters by port:

```go
// InstancesByPort retrieves instances for a service on the given ports with labels that
// match any of the supplied labels. All instances match an empty tag list.
func (s *Controller) InstancesByPort(svc *model.Service, port int) []*model.ServiceInstance {
out := make([]*model.ServiceInstance, 0)
s.mutex.RLock()
instanceLists := s.serviceInstances.getByKey(instancesKey{svc.Hostname, svc.Attributes.Namespace})
s.mutex.RUnlock()
for _, instance := range instanceLists {
if portMatchSingle(instance, port) {
out = append(out, instance)
}
}
return out
}
```

I have not reproduced the issue, so I cannot guarantee that my understanding is correct.
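To make the effect of that diff concrete, here is a minimal standalone sketch (hypothetical types and data, not Istio's actual index code) of the difference between replacing the per-service map wholesale and merging into it port by port:

```go
package main

import "fmt"

func main() {
	// State built from an earlier push: endpoints indexed by port.
	index := map[int][]string{
		80:   {"10.0.0.1"},
		8080: {"10.0.0.2"},
	}

	// An update that only carries port 80, e.g. after a merge in which one
	// port name shadowed the other.
	update := map[int][]string{
		80: {"10.0.0.3"},
	}

	// Pre-#49573 behavior: wholesale replacement; port 8080 disappears.
	replaced := update
	fmt.Println("replace:", replaced) // map[80:[10.0.0.3]]

	// Post-#49573 behavior: per-port merge; the old 8080 entry survives.
	for port, eps := range update {
		index[port] = eps
	}
	fmt.Println("merge:  ", index) // map[80:[10.0.0.3] 8080:[10.0.0.2]]
}
```
|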
Considering that a ServiceEntry's port.number and port.targetPort may be different, it is difficult to distinguish abnormal situations within initServiceRegistry. |
We rely on the service port name in many places. In a K8s Service it must be unique within the service, but it is tricky in Istio because we support merging SEs. It is not possible to prevent creating two SEs with the same port name, since they can be created concurrently. But what can be done is for Istio to warn when duplicate port names exist, as in this case; a sketch of such a check follows.
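A minimal sketch of what such a warning could check, assuming a hypothetical Port type rather than Istio's model types:

```go
package main

import "fmt"

// Port is a hypothetical stand-in for a ServiceEntry port after merging
// entries for one hostname.
type Port struct {
	Name   string
	Number int
}

// duplicatePortNames reports port names that map to more than one port
// number across the merged ports.
func duplicatePortNames(ports []Port) []string {
	seen := map[string]int{}
	var dups []string
	for _, p := range ports {
		if prev, ok := seen[p.Name]; ok && prev != p.Number {
			dups = append(dups, p.Name)
			continue
		}
		seen[p.Name] = p.Number
	}
	return dups
}

func main() {
	merged := []Port{{Name: "http", Number: 8000}, {Name: "http", Number: 8080}}
	// Prints: conflicting port names: [http]
	fmt.Println("conflicting port names:", duplicatePortNames(merged))
}
```
|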
Found an issue: the following configuration works in 1.18 but fails in 1.20+ (not sure about 1.19):

```yaml
apiVersion: v1
kind: Service
metadata:
name: httpbin-ext
spec:
externalName: httpbin.default.svc.cluster.local
ports:
- name: http
port: 8080
protocol: TCP
targetPort: 8000
type: ExternalName
---
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
name: httpbin-ext
spec:
hosts:
- httpbin.default.svc.cluster.local
location: MESH_EXTERNAL
ports:
- name: http
number: 8000
protocol: HTTP
  resolution: DNS
```

In 1.20+, pilot will send EDS like the following, which is rejected by the proxy with this error:

```
2024-04-25T12:21:22.487063Z warning envoy config external/envoy/source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:138 gRPC config for type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment rejected: malformed IP address: httpbin.default.svc.cluster.local. Consider setting resolver_name or setting cluster type to 'STRICT_DNS' or 'LOGICAL_DNS' thread=17
```

```json
{
"clusterName": "outbound|8000||httpbin.default.svc.cluster.local",
"endpoints": [
{
"locality": {},
"lbEndpoints": [
{
"endpoint": {
"address": {
"socketAddress": {
"address": "httpbin.default.svc.cluster.local",
"portValue": 8000
}
}
},
"metadata": {
"filterMetadata": {
"istio": {
"workload": ";;;;"
}
}
},
"loadBalancingWeight": 1
},
{
"endpoint": {
"address": {
"socketAddress": {
"address": "10.244.0.86",
"portValue": 8080
}
}
},
"healthStatus": "HEALTHY",
"metadata": {
"filterMetadata": {
"envoy.transport_socket_match": {
"tlsMode": "istio"
},
"istio": {
"workload": "httpbin;default;httpbin;v1;Kubernetes"
}
}
},
"loadBalancingWeight": 1
}
],
"loadBalancingWeight": 2
}
]
}
```
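The rejection itself is easy to see outside Envoy: per the error above, an EDS socketAddress must be a literal IP unless the cluster type is STRICT_DNS or LOGICAL_DNS, and the ExternalName hostname is not one. A tiny illustration in plain Go (not Envoy or Istio code):

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// The two endpoint addresses from the ClusterLoadAssignment above.
	for _, addr := range []string{"10.244.0.86", "httpbin.default.svc.cluster.local"} {
		fmt.Printf("%-40s parses as IP: %v\n", addr, net.ParseIP(addr) != nil)
	}
	// The hostname does not parse as an IP, hence Envoy's
	// "malformed IP address" rejection of the whole assignment.
}
```
|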
Let me investigate this regression |
@howardjohn I kind of remember you changed things to use the target port of ExternalName services, which may be related. |
Definitely not the cause of the original issue; maybe @zirain's issue. |
Before (1.18):

```json
"httpbin.default.svc.cluster.local": {
"default": {
"Shards": {
"Kubernetes/Kubernetes": [
{
"Labels": {
"app": "httpbin",
"kubernetes.io/hostname": "envoy-gateway-control-plane",
"pod-template-hash": "86b8ffc5ff",
"security.istio.io/tlsMode": "istio",
"service.istio.io/canonical-name": "httpbin",
"service.istio.io/canonical-revision": "v1",
"topology.istio.io/cluster": "Kubernetes",
"topology.istio.io/network": "",
"version": "v1"
},
"Address": "10.244.0.121",
"ServicePortName": "http",
"ServiceAccount": "spiffe://cluster.local/ns/default/sa/httpbin",
"Network": "",
"Locality": {
"Label": "",
"ClusterID": "Kubernetes"
},
"EndpointPort": 8080,
"LbWeight": 0,
"TLSMode": "istio",
"Namespace": "default",
"WorkloadName": "httpbin",
"HostName": "",
"SubDomain": "",
"HealthStatus": 1,
"NodeName": "envoy-gateway-control-plane"
}
]
},
"ServiceAccounts": {
"spiffe://cluster.local/ns/default/sa/httpbin": {}
}
}
}
```

After (1.21):

```json
"httpbin.default.svc.cluster.local": {
"default": {
"Shards": {
"External/Kubernetes": [
{
"Labels": null,
"Address": "httpbin.default.svc.cluster.local",
"ServicePortName": "http",
"ServiceAccount": "",
"Network": "",
"Locality": {
"Label": "",
"ClusterID": ""
},
"EndpointPort": 8000,
"LbWeight": 0,
"TLSMode": "disabled",
"Namespace": "",
"WorkloadName": "",
"HostName": "",
"SubDomain": "",
"HealthStatus": 0,
"NodeName": ""
}
],
"Kubernetes/Kubernetes": [
{
"Labels": {
"app": "httpbin",
"kubernetes.io/hostname": "envoy-gateway-control-plane",
"pod-template-hash": "86b8ffc5ff",
"security.istio.io/tlsMode": "istio",
"service.istio.io/canonical-name": "httpbin",
"service.istio.io/canonical-revision": "v1",
"topology.istio.io/cluster": "Kubernetes",
"topology.istio.io/network": "",
"version": "v1"
},
"Address": "10.244.0.117",
"ServicePortName": "http",
"ServiceAccount": "spiffe://cluster.local/ns/default/sa/httpbin",
"Network": "",
"Locality": {
"Label": "",
"ClusterID": "Kubernetes"
},
"EndpointPort": 8080,
"LbWeight": 0,
"TLSMode": "istio",
"Namespace": "default",
"WorkloadName": "httpbin",
"HostName": "",
"SubDomain": "",
"HealthStatus": 1,
"NodeName": "envoy-gateway-control-plane"
}
]
},
"ServiceAccounts": {
"spiffe://cluster.local/ns/default/sa/httpbin": {}
}
}
}
```

Note the extra External/Kubernetes shard in 1.21: its endpoint address is the ExternalName hostname itself, which is exactly the entry that shows up in the rejected ClusterLoadAssignment above. |
@zirain doesn't it deserve its own issue? |
There may be some bug with the delta cluster builder. If I create the SEs after a sidecar has started, it will only see one cluster.
But if I create the SEs before the sidecar starts, it will get two clusters.
|
👍 I see the same |
Fixes istio#50478 (comment), maybe other bugs
Good find, this is a regression in 1.22. Fixed it up in #50712 |
@howardjohn @hzxuzhonghu is this fixed by #50691? |
No, by #50711. I am finishing up tests, will be done in an hour |
@howardjohn I think this is fixed? |
Yep should be good |
I think a new bug occurred in release 1.22.2. I applied the 2 SEs below.
Then I ran a command to check the SE after merging.
Then I deleted all the SEs and re-applied them. The merged SE changed after that.
Is there a new bug? @howardjohn @hzxuzhonghu |
I am not sure that's really a bug. You have duplicate hostnames, and the behavior of that is undefined. |
It has something to do with the creation order; we do not merge services with different attributes. Here the selector is different. A toy illustration of the resulting order dependence follows.
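A small sketch (hypothetical types, not Istio's merge code) of how refusing to merge services with different attributes makes the result depend on creation order, first writer wins:

```go
package main

import "fmt"

type Service struct {
	Hostname string
	Selector map[string]string
}

// buildIndex keeps the first service seen for each hostname and skips
// later ones whose attributes (here: the selector) differ.
func buildIndex(services []Service) map[string]Service {
	index := map[string]Service{}
	for _, s := range services {
		if existing, ok := index[s.Hostname]; ok &&
			fmt.Sprint(existing.Selector) != fmt.Sprint(s.Selector) {
			continue
		}
		index[s.Hostname] = s
	}
	return index
}

func main() {
	a := Service{"example.internal", map[string]string{"app": "a"}}
	b := Service{"example.internal", map[string]string{"app": "b"}}
	// Same two services, different creation order, different winner.
	fmt.Println(buildIndex([]Service{a, b})["example.internal"].Selector) // map[app:a]
	fmt.Println(buildIndex([]Service{b, a})["example.internal"].Selector) // map[app:b]
}
```
|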
Bug Description
The clusters created for merged ServiceEntries have changed in recent Istio releases and in current master it's still broken.
Consider the following two simple ServiceEntries with the same host and port names but different port numbers:
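The manifests themselves are not preserved in this extract; a hypothetical pair matching that description (made-up hostname and names) might look like:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: example-a
spec:
  hosts:
  - example.internal
  ports:
  - name: http
    number: 8000
    protocol: HTTP
  resolution: DNS
---
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: example-b
spec:
  hosts:
  - example.internal
  ports:
  - name: http     # same port name as example-a
    number: 8080   # different port number
    protocol: HTTP
  resolution: DNS
```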
The behavior differs across releases:
- In 1.19.8, the created clusters are one per port, without LB between ports.
- In 1.20.4 and 1.21.1, there are clusters per port, with LB between ports.
- In 1.20.0-1.20.3 and in master, there is one cluster for one of the ports, which LBs across both.

I believe the behavior of the older releases (cluster per port, without LB) is the correct one, unless there was a decision to change this behavior.
Version
Additional Information
No response
Affected product area