Requests fail when src & dst are the same #1585

Closed
tanuck opened this issue Sep 5, 2018 · 14 comments · Fixed by linkerd/linkerd2-proxy#122 or #1863

Comments

@tanuck
Contributor

tanuck commented Sep 5, 2018

Hello,

When making HTTP/1.1 requests where the src and dst are the same (a pod sending a request to itself), the proxy responds with a 500 status code. If the request is sent to a different pod in the deployment, everything works fine. If you send the request to the loopback address rather than the service DNS name, that is also fine. Is this expected?

$ linkerd version
Client version: v18.8.4
Server version: v18.8.4

Using the default sidecar generated from linkerd inject

Thanks

@grampelberg
Contributor

Well that sounds interesting. Is there anything special with your application? Are you using TLS? Could we see your k8s resource yaml?

@tanuck
Contributor Author

tanuck commented Sep 5, 2018

So no TLS. It was initially found on a deployment running a Node.js GraphQL application on port 80. I've since reproduced this on every other deployment I've tried.

Here is the simplest reproduction that I've found:

  • kubectl run nginx --image=nginx --port=80 --replicas=2 -o yaml --dry-run | linkerd inject - | kc apply -f -
  • kubectl expose deploy nginx --port=80 --target-port=80 --type=ClusterIP
  • exec into one of the nginx containers and run curl -v nginx - every other request should return 500 (see the sketch below)
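
A minimal sketch of the last step, assuming curl is available in the nginx image and that kubectl run labeled the pods with run=nginx (pod names will differ per cluster):

$ POD=$(kubectl get pod -l run=nginx -o jsonpath='{.items[0].metadata.name}')
$ kubectl exec -it $POD -- sh -c 'for i in 1 2 3 4 5 6; do curl -s -o /dev/null -w "request $i -> HTTP %{http_code}\n" http://nginx; done'
# Roughly half the requests return 200 (routed to the other replica) and the
# rest 500 (routed back to the pod itself).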

@grampelberg
Contributor

Those are fantastic replication steps, thank you!

@tanuck
Contributor Author

tanuck commented Sep 13, 2018

Quick update - just upgraded to v18.9.1 and this problem still persists.

@olix0r
Member

olix0r commented Sep 13, 2018

I'd be curious to see what linkerd tap deploy nginx shows while the curl command is run. Also, the output of curl localhost:4191/metrics | grep -e request_total -e response_total might be informative.
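
For anyone reproducing this, a sketch of how to run both of those against the nginx pods (assumes the default proxy metrics port 4191 and that curl exists in the app container, as in the repro above):

# terminal 1: watch traffic while the curls run
$ linkerd tap deploy nginx

# terminal 2: scrape the proxy's metrics from inside the meshed pod
$ kubectl exec -it <nginx-pod> -c nginx -- \
    curl -s localhost:4191/metrics | grep -e request_total -e response_total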

@seanmonstar seanmonstar self-assigned this Sep 13, 2018
@seanmonstar
Contributor

Hm, so if the dst is a socket address, the proxy will use it directly, which would explain the loopback succeeding. However, if it's a hostname, then it will either:

  • If it looks like a service in the cluster, ask the controller for the socket address.
  • Or perform a system DNS lookup, and try to use that.

Is it possible to collect debug logs from the proxy? Or do we have an environment that I can poke into and enable them myself?
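
A sketch of one way to collect them, assuming the injected sidecar honors the LINKERD2_PROXY_LOG environment variable that linkerd inject sets on the linkerd-proxy container:

$ kubectl set env deploy/nginx -c linkerd-proxy LINKERD2_PROXY_LOG=debug
# once the pods have restarted, re-run the curl repro, then read the sidecar logs:
$ kubectl logs <nginx-pod> -c linkerd-proxy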

@tanuck
Contributor Author

tanuck commented Sep 19, 2018

So I used my steps from above. Then after sending 4 curl -v nginx requests, the tap and prometheus data look like this:

$ linkerd tap deploy nginx
req id=0:71 proxy=out src=10.0.1.2:34890 dst=10.0.1.2:80 tls=no_identity :method=GET :authority=nginx :path=/
end id=0:71 proxy=out src=10.0.1.2:34890 dst=10.0.1.2:80 tls=no_identity reset-error=6 duration=0µs response-length=0B


req id=0:72 proxy=out src=10.0.1.2:35054 dst=10.0.2.3:80 tls=no_identity :method=GET :authority=nginx :path=/
req id=0:0 proxy=in  src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled :method=GET :authority=nginx :path=/
rsp id=0:0 proxy=in  src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled :status=200 latency=911µs
end id=0:0 proxy=in  src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled duration=27µs response-length=612B
rsp id=0:72 proxy=out src=10.0.1.2:35054 dst=10.0.2.3:80 tls=no_identity :status=200 latency=2562µs
end id=0:72 proxy=out src=10.0.1.2:35054 dst=10.0.2.3:80 tls=no_identity duration=46µs response-length=612B


req id=0:73 proxy=out src=10.0.1.2:35184 dst=10.0.1.2:80 tls=no_identity :method=GET :authority=nginx :path=/
end id=0:73 proxy=out src=10.0.1.2:35184 dst=10.0.1.2:80 tls=no_identity reset-error=6 duration=0µs response-length=0B


req id=0:74 proxy=out src=10.0.1.2:35350 dst=10.0.2.3:80 tls=no_identity :method=GET :authority=nginx :path=/
req id=0:1 proxy=in  src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled :method=GET :authority=nginx :path=/
rsp id=0:1 proxy=in  src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled :status=200 latency=518µs
end id=0:1 proxy=in  src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled duration=40µs response-length=612B
rsp id=0:74 proxy=out src=10.0.1.2:35350 dst=10.0.2.3:80 tls=no_identity :status=200 latency=2677µs
end id=0:74 proxy=out src=10.0.1.2:35350 dst=10.0.2.3:80 tls=no_identity duration=68µs response-length=612B

$ curl localhost:4191/metrics | grep -e request_total -e response_total
# HELP request_total Total count of HTTP requests.
# TYPE request_total counter
request_total{authority="nginx",direction="outbound",dst_control_plane_ns="linkerd",dst_deployment="nginx",dst_namespace="default",dst_pod="nginx-665d5c9995-k6rkx",dst_pod_template_hash="2218175551",dst_service="nginx",tls="no_identity",no_tls_reason="not_provided_by_service_discovery"} 4
request_total{authority="nginx",direction="outbound",dst_control_plane_ns="linkerd",dst_deployment="nginx",dst_namespace="default",dst_pod="nginx-665d5c9995-v2rqk",dst_pod_template_hash="2218175551",dst_service="nginx",tls="no_identity",no_tls_reason="not_provided_by_service_discovery"} 2
# HELP response_total Total count of HTTP responses
# TYPE response_total counter
response_total{authority="nginx",direction="outbound",dst_control_plane_ns="linkerd",dst_deployment="nginx",dst_namespace="default",dst_pod="nginx-665d5c9995-v2rqk",dst_pod_template_hash="2218175551",dst_service="nginx",tls="no_identity",no_tls_reason="not_provided_by_service_discovery",classification="success",status_code="200"} 2

@olix0r hope that helps!

@seanmonstar seanmonstar removed their assignment Oct 30, 2018
@JCMais

JCMais commented Nov 1, 2018

Having the same issue.

We have a pod that acts as an authorization microservice; this pod can make requests to itself to check other permissions, so the hostname is http://authorization. This previously worked, but after enabling linkerd2 it stopped working. The linkerd-proxy container gives the following error:

ERR! proxy={server=out listen=127.0.0.1:4140 remote=10.16.0.48:60652} linkerd2_proxy::proxy::http::router service error: Error caused by underlying HTTP/2 error: protocol error: frame with invalid size

@seanmonstar
Contributor

  • The tap logs show the request to the other pod both from out and in, so the two proxies were involved.
  • The tap logs don't show the request in when the pod is the same, suggesting to me that the proxy never receives the request it should be sending itself.
  • The reset-error=6 is a FRAME_SIZE_ERROR from HTTP/2, which would be the out proxy making an HTTP/2 request to the dst; the bytes it got back were likely not HTTP/2, triggering that error.
  • The proxies will speak HTTP/2 to each other when they know there is a proxy on the other side, so a connection returning bytes that aren't HTTP/2 suggests it's connecting to something else.

All this makes me wonder if something is preventing the connection from being redirected to the proxy. Perhaps something in the iptables rules that are set up during proxy-init.
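
One way to check that theory (a sketch; it needs root inside the pod's network namespace, e.g. a privileged debug container or access from the node):

# dump the nat table that proxy-init configured for this pod
$ iptables -t nat -L -n -v
# look for REDIRECT rules pointing at the proxy's outbound (4140) and inbound
# (4143) ports, and check whether traffic the pod sends to its own IP is
# matched by them or slips past.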

@seanmonstar
Contributor

Actually, while there was a proxy change for this, it won't be fixed until the iptables config is changed in this repo also.

@seanmonstar seanmonstar reopened this Nov 15, 2018
dadjeibaah pushed a commit that referenced this issue Nov 15, 2018
When a pod sends requests to itself, the proxy properly redirects traffic from the originating container through the proxy's outbound listener. But once the request leaves the outbound side of the proxy, it bypasses the inbound proxy and connects straight back to the original container that made the request. This can cause problems for containers that serve HTTP, as the proxy naively tries to initiate an HTTP/2 connection to the destination of a request. (See #1585 for a concrete example.)

This PR adds a new iptables rule, coupled with a proxy [change](linkerd/linkerd2-proxy#122), to ensure that requests that occur in the aforementioned scenario always redirect to the inbound listener of the proxy first.

fixes #1585

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
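
Illustrative only: the shape of the rule the fix describes, assuming the proxy's default inbound port 4143. The real rules installed by proxy-init live in dedicated chains and also match on the proxy's UID so the proxy's own redirected traffic isn't looped again; this is just a sketch of the idea.

# send outbound traffic that loops back to the pod itself (but not plain
# 127.0.0.1 traffic) to the proxy's inbound listener instead of letting it
# bypass the mesh
$ iptables -t nat -A OUTPUT -o lo ! -d 127.0.0.1/32 -p tcp -j REDIRECT --to-ports 4143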
@JCMais

JCMais commented Nov 16, 2018

Thanks for fixing this!

@glindsell

I'm seeing the same reset-error=6 when trying to load balance gRPC using linkerd2 and nginx ingress.

Steps to recreate here:

https://github.com/glindsell/free-peer/tree/ingress/stream-meshed

@olix0r
Member

olix0r commented Mar 21, 2019

@glindsell thanks for putting together a repro and sharing! It's a little hard to tease out a clear problem description from that README, though. Would you mind opening a new issue so that we can make sure we get to the bottom of it?

@glindsell

@olix0r good idea, I've updated the issue I opened specifically for gRPC stream load balancing with this info:

#2120

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 18, 2021