LDS ACK error for headless service instance #17748

Closed
hzxuzhonghu opened this issue Oct 10, 2019 · 11 comments
@hzxuzhonghu (Member) commented Oct 10, 2019


Bug description

A headless service instance NACKs its LDS push with a duplicate listener error.

	ADS:LDS: ACK ERROR 10.32.0.18:44890 sidecar~10.32.0.18~redis-2.default~default.svc.cluster.local-10 (redis-2.default) version_info:"2019-10-08T06:48:02Z/1" node:<id:"sidecar~10.32.0.18~redis-2.default~default.svc.cluster.local" cluster:"redis-cart.default" metadata:<fields:<key:"CLUSTER_ID" value:<string_value:"Kubernetes" > > fields:<key:"CONFIG_NAMESPACE" value:<string_value:"default" > > fields:<key:"EXCHANGE_KEYS" value:<string_value:"NAME,NAMESPACE,INSTANCE_IPS,LABELS,OWNER,PLATFORM_METADATA,WORKLOAD_NAME,CANONICAL_TELEMETRY_SERVICE,MESH_ID,SERVICE_ACCOUNT" > > fields:<key:"INCLUDE_INBOUND_PORTS" value:<string_value:"6379" > > fields:<key:"INSTANCE_IPS" value:<string_value:"10.32.0.18" > > fields:<key:"INTERCEPTION_MODE" value:<string_value:"REDIRECT" > > fields:<key:"ISTIO_PROXY_SHA" value:<string_value:"istio-proxy:e383776139e4c69b49237bad84882fb972718307" > > fields:<key:"ISTIO_VERSION" value:<string_value:"master-20191004-09-15" > > fields:<key:"LABELS" value:<struct_value:<fields:<key:"app" value:<string_value:"redis-cart" > > fields:<key:"controller-revision-hash" value:<string_value:"redis-85d5755949" > > fields:<key:"statefulset.kubernetes.io/pod-name" value:<string_value:"redis-2" > > > > > fields:<key:"NAME" value:<string_value:"redis-2" > > fields:<key:"NAMESPACE" value:<string_value:"default" > > fields:<key:"OWNER" value:<string_value:"kubernetes://api/apps/v1/namespaces/default/statefulsets/redis" > > fields:<key:"POD_NAME" value:<string_value:"redis-2" > > fields:<key:"POD_PORTS" value:<string_value:"[{\"containerPort\":6379,\"protocol\":\"TCP\"}]" > > fields:<key:"SERVICE_ACCOUNT" value:<string_value:"default" > > fields:<key:"WORKLOAD_NAME" value:<string_value:"redis" > > fields:<key:"app" value:<string_value:"redis-cart" > > fields:<key:"controller-revision-hash" value:<string_value:"redis-85d5755949" > > fields:<key:"statefulset.kubernetes.io/pod-name" value:<string_value:"redis-2" > > > locality:<> build_version:"e383776139e4c69b49237bad84882fb972718307/1.12.0-dev/Clean/RELEASE/BoringSSL" > type_url:"type.googleapis.com/envoy.api.v2.Listener" response_nonce:"7aaef486-ecc6-40cc-9e14-0b87bc4267a5" error_detail:<code:13 message:"Error adding/updating listener(s) 10.32.0.18_6379: duplicate listener 10.32.0.18_6379 found" > 

Affected product area (please put an X in all that apply)

[ ] Configuration Infrastructure
[ ] Docs
[ ] Installation
[x] Networking
[ ] Performance and Scalability
[ ] Policies and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure


@hzxuzhonghu (Member, Author) commented Oct 10, 2019

Without a deep dive, I can say that since 10.32.0.18 is the instance IP, Pilot generates two listeners named 10.32.0.18_6379: one outbound and one inbound.
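
A minimal sketch of the collision in Go, assuming a hypothetical listenerName helper (the real Pilot code differs): both the pod's own inbound listener and the per-instance outbound listener generated for the headless service derive their name from the same IP and port, so the LDS push contains two listeners with the same name.

    package main

    import "fmt"

    // listenerName mirrors (hypothetically) how a listener's name is derived
    // from its bind address and port.
    func listenerName(ip string, port int) string {
        return fmt.Sprintf("%s_%d", ip, port)
    }

    func main() {
        inbound := listenerName("10.32.0.18", 6379)  // the pod's own inbound listener
        outbound := listenerName("10.32.0.18", 6379) // the headless per-instance outbound listener
        fmt.Println(inbound == outbound)             // true: same name, so Envoy NACKs the push
    }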

@hzxuzhonghu (Member, Author) commented Oct 10, 2019

@istio/wg-networking-maintainers

What should we do?

  1. Do not generate the outbound listener for the instance itself. For normal proxies (with HTTP protocol), the filter chain is like:
            {
                "filterChainMatch": {
                    "prefixRanges": [
                        {
                            "addressPrefix": "10.32.0.13",
                            "prefixLen": 32
                        }
                    ]
                },
                "filters": [
                    {
                        "name": "envoy.tcp_proxy",
                        "typedConfig": {
                            "@type": "type.googleapis.com/envoy.config.filter.network.tcp_proxy.v2.TcpProxy",
                            "statPrefix": "BlackHoleCluster",
                            "cluster": "BlackHoleCluster"
                        }
                    }
                ]
            },

We do not allow a pod to access itself via its pod IP.

  2. Generate the outbound listener, but rename it. However, this listener would always proxy the traffic to the blackhole cluster; a rename sketch follows below.
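
A minimal sketch of option 2, assuming a hypothetical outboundListenerName helper; the "_outbound" suffix is an assumption, not Pilot's actual naming scheme:

    package main

    import "fmt"

    // outboundListenerName sketches option 2: suffix the outbound listener's
    // name so it can no longer collide with the inbound "ip_port" listener.
    // The listener would still forward to BlackHoleCluster, as in the filter
    // chain above.
    func outboundListenerName(ip string, port int) string {
        return fmt.Sprintf("%s_%d_outbound", ip, port)
    }

    func main() {
        fmt.Println(outboundListenerName("10.32.0.18", 6379)) // 10.32.0.18_6379_outbound
    }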
@lambdai (Member) commented Oct 10, 2019

Solution 3: remove the inbound listener 10.32.0.18_6379; it should not be handed off by the 15001 listener.
This might need a slight ordering change in ListenerBuilder: we need to aggregate into 15001 and remove the inbound 10.32.0.18_6379 before generating the outbound 10.32.0.18_6379.
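
A rough sketch of that ordering in Go, with hypothetical builder and method names (aggregateVirtualInbound and buildPerInstanceOutbound are assumptions, not the real ListenerBuilder API):

    package main

    type listener struct{ name string }

    // listenerBuilder is a hypothetical stand-in for Pilot's ListenerBuilder.
    type listenerBuilder struct{ listeners []*listener }

    func (b *listenerBuilder) buildInbound()             {} // bind_to_port = false ip_port listeners
    func (b *listenerBuilder) aggregateVirtualInbound()  {} // fold them into the virtual listener, dropping inbound ip_port
    func (b *listenerBuilder) buildPerInstanceOutbound() {} // headless per-instance outbound listeners

    // build orders the steps so the inbound 10.32.0.18_6379 is already removed
    // by the time the outbound 10.32.0.18_6379 is generated.
    func (b *listenerBuilder) build() []*listener {
        b.buildInbound()
        b.aggregateVirtualInbound()
        b.buildPerInstanceOutbound()
        return b.listeners
    }

    func main() { _ = (&listenerBuilder{}).build() }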

@hzxuzhonghu (Member, Author) commented Oct 10, 2019

There is also an inconsistency: a TCP service instance can communicate with itself via podip:port, but an HTTP one cannot.

EDIT: ignore this; the cause is that I tested it with nc, which uses plain TCP.

@hzxuzhonghu (Member, Author) commented Oct 10, 2019

> remove the inbound listener 10.32.0.18_6379.

@lambdai I don't quite understand this. I can see the same virtualHosts in both the 15006 and podip_port listeners. Does that mean inbound traffic will only flow through the virtualInbound 15006 listener?

@rshriram (Member) commented Oct 10, 2019

So I think I know the problem. It's happening because we generate listeners for each service instance of the headless service in the listener code. We need to fix that code to skip the pod's own service instance [or more specifically, skip instances where instance.address == node.address].
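
A sketch of that skip in Go, with hypothetical types and names (buildHeadlessOutbound is an assumption, not the actual Pilot function): when generating per-instance outbound listeners for a headless service, drop any instance whose address is one of the proxy's own IPs.

    package main

    import "fmt"

    type serviceInstance struct {
        address string
        port    int
    }

    // buildHeadlessOutbound builds per-instance outbound listener names for a
    // headless service, skipping instances where instance.address matches one
    // of the proxy's own IPs.
    func buildHeadlessOutbound(nodeIPs map[string]bool, instances []serviceInstance) []string {
        var names []string
        for _, inst := range instances {
            if nodeIPs[inst.address] {
                continue // the pod's own instance: the inbound path already owns ip_port
            }
            names = append(names, fmt.Sprintf("%s_%d", inst.address, inst.port))
        }
        return names
    }

    func main() {
        nodeIPs := map[string]bool{"10.32.0.18": true}
        instances := []serviceInstance{
            {"10.32.0.16", 6379}, {"10.32.0.17", 6379}, {"10.32.0.18", 6379},
        }
        fmt.Println(buildHeadlessOutbound(nodeIPs, instances)) // [10.32.0.16_6379 10.32.0.17_6379]
    }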

@lambdai (Member) commented Oct 10, 2019

> @lambdai I don't quite understand this. I can see the same virtualHosts in both the 15006 and podip_port listeners. Does that mean inbound traffic will only flow through the virtualInbound 15006 listener?

Yes... Since 1.3, the 15006 listener does not actually hand traffic off to the bind_to_port = false inbound listeners.

@lambdai (Member) commented Oct 10, 2019

> So I think I know the problem. It's happening because we generate listeners for each service instance of the headless service in the listener code. We need to fix that code to skip the pod's own service instance [or more specifically, skip instances where instance.address == node.address].

Are the new per-instance listeners supposed to be inbound or outbound? Either way, listeners are expensive in Envoy.

@hzxuzhonghu (Member, Author) commented Oct 11, 2019

They are outbound.

@phenixblue commented Oct 25, 2019

Is there a planned release that will include the fix merged as part of #17791?

@hzxuzhonghu (Member, Author) commented Oct 26, 2019

It will be in release-1.4, and I think this should go into 1.3 as well; I will cherry-pick it.
