Requests fail when src & dst are the same #1585

Closed
tanuck opened this issue Sep 5, 2018 · 14 comments · Fixed by linkerd/linkerd2-proxy#122 or #1863

Comments

@tanuck
Contributor

tanuck commented Sep 5, 2018

Hello,

When making HTTP/1.1 requests where the src and dst are the same (a pod sending a request to itself), the proxy responds with a 500 status code. If the request is sent to a different pod in the deployment, everything works fine. If you send the request to the loopback address rather than the service DNS name, that is also fine. Is this expected?

$ linkerd version
Client version: v18.8.4
Server version: v18.8.4

Using the default sidecar generated from linkerd inject

Thanks

@grampelberg
Contributor

Well that sounds interesting. Is there anything special with your application? Are you using TLS? Could we see your k8s resource yaml?

@tanuck
Contributor Author

tanuck commented Sep 5, 2018

So no TLS. It was initially found on a deployment running a Node.js GraphQL application on port 80. I've since reproduced this on every other deployment I've tried.

Here is the simplest reproduction that I've found:

  • kubectl run nginx --image=nginx --port=80 --replicas=2 -o yaml --dry-run | linkerd inject - | kc apply -f -
  • kubectl expose deploy nginx --port=80 --target-port=80 --type=ClusterIP
  • exec into one of the nginx containers and run curl -v nginx - every other request should return 500 (see the sketch below)
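
A minimal sketch of the last step, assuming curl is available in the nginx image and that kubectl run labeled the pods with run=nginx (pod names will differ per cluster):

$ POD=$(kubectl get pod -l run=nginx -o jsonpath='{.items[0].metadata.name}')
$ kubectl exec -it $POD -- sh -c 'for i in 1 2 3 4 5 6; do curl -s -o /dev/null -w "request $i -> HTTP %{http_code}\n" http://nginx; done'
# Roughly half the requests return 200 (routed to the other replica) and the
# rest 500 (routed back to the pod itself).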

@grampelberg
Contributor

Those are fantastic replication steps, thank you!

@tanuck
Contributor Author

tanuck commented Sep 13, 2018

Quick update - just upgraded to v18.9.1 and this problem still persists.

@olix0r
Member

olix0r commented Sep 13, 2018

I'd be curious to see what linkerd tap deploy nginx shows while the curl command is run. Also, the output of curl localhost:4191/metrics | grep -e request_total -e response_total might be informative.
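
For anyone reproducing this, a sketch of how to run both of those against the nginx pods (assumes the default proxy metrics port 4191 and that curl exists in the app container, as in the repro above):

# terminal 1: watch traffic while the curls run
$ linkerd tap deploy nginx

# terminal 2: scrape the proxy's metrics from inside the meshed pod
$ kubectl exec -it <nginx-pod> -c nginx -- \
    curl -s localhost:4191/metrics | grep -e request_total -e response_total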

@seanmonstar seanmonstar self-assigned this Sep 13, 2018
@seanmonstar
Contributor

Hm, so if the dst is a socket address, the proxy will use it directly, which would explain the loopback succeeding. However, if it's a hostname, then it will either:

  • If it looks like a service in the cluster, ask the controller for the socket address.
  • Or perform a system DNS lookup, and try to use that.

Is it possible to collect debug logs from the proxy? Or do we have an environment that I can poke into and enable them myself?
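
A sketch of one way to collect them, assuming the injected sidecar honors the LINKERD2_PROXY_LOG environment variable that linkerd inject sets on the linkerd-proxy container:

$ kubectl set env deploy/nginx -c linkerd-proxy LINKERD2_PROXY_LOG=debug
# once the pods have restarted, re-run the curl repro, then read the sidecar logs:
$ kubectl logs <nginx-pod> -c linkerd-proxy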

@tanuck
Contributor Author

tanuck commented Sep 19, 2018

So I used my steps from above. Then after sending 4 curl -v nginx requests, the tap and prometheus data look like this:

$ linkerd tap deploy nginx
req id=0:71 proxy=out src=10.0.1.2:34890 dst=10.0.1.2:80 tls=no_identity :method=GET :authority=nginx :path=/
end id=0:71 proxy=out src=10.0.1.2:34890 dst=10.0.1.2:80 tls=no_identity reset-error=6 duration=0µs response-length=0B


req id=0:72 proxy=out src=10.0.1.2:35054 dst=10.0.2.3:80 tls=no_identity :method=GET :authority=nginx :path=/
req id=0:0 proxy=in  src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled :method=GET :authority=nginx :path=/
rsp id=0:0 proxy=in  src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled :status=200 latency=911µs
end id=0:0 proxy=in  src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled duration=27µs response-length=612B
rsp id=0:72 proxy=out src=10.0.1.2:35054 dst=10.0.2.3:80 tls=no_identity :status=200 latency=2562µs
end id=0:72 proxy=out src=10.0.1.2:35054 dst=10.0.2.3:80 tls=no_identity duration=46µs response-length=612B


req id=0:73 proxy=out src=10.0.1.2:35184 dst=10.0.1.2:80 tls=no_identity :method=GET :authority=nginx :path=/
end id=0:73 proxy=out src=10.0.1.2:35184 dst=10.0.1.2:80 tls=no_identity reset-error=6 duration=0µs response-length=0B


req id=0:74 proxy=out src=10.0.1.2:35350 dst=10.0.2.3:80 tls=no_identity :method=GET :authority=nginx :path=/
req id=0:1 proxy=in  src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled :method=GET :authority=nginx :path=/
rsp id=0:1 proxy=in  src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled :status=200 latency=518µs
end id=0:1 proxy=in  src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled duration=40µs response-length=612B
rsp id=0:74 proxy=out src=10.0.1.2:35350 dst=10.0.2.3:80 tls=no_identity :status=200 latency=2677µs
end id=0:74 proxy=out src=10.0.1.2:35350 dst=10.0.2.3:80 tls=no_identity duration=68µs response-length=612B

$ curl localhost:4191/metrics | grep -e request_total -e response_total
# HELP request_total Total count of HTTP requests.
# TYPE request_total counter
request_total{authority="nginx",direction="outbound",dst_control_plane_ns="linkerd",dst_deployment="nginx",dst_namespace="default",dst_pod="nginx-665d5c9995-k6rkx",dst_pod_template_hash="2218175551",dst_service="nginx",tls="no_identity",no_tls_reason="not_provided_by_service_discovery"} 4
request_total{authority="nginx",direction="outbound",dst_control_plane_ns="linkerd",dst_deployment="nginx",dst_namespace="default",dst_pod="nginx-665d5c9995-v2rqk",dst_pod_template_hash="2218175551",dst_service="nginx",tls="no_identity",no_tls_reason="not_provided_by_service_discovery"} 2
# HELP response_total Total count of HTTP responses
# TYPE response_total counter
response_total{authority="nginx",direction="outbound",dst_control_plane_ns="linkerd",dst_deployment="nginx",dst_namespace="default",dst_pod="nginx-665d5c9995-v2rqk",dst_pod_template_hash="2218175551",dst_service="nginx",tls="no_identity",no_tls_reason="not_provided_by_service_discovery",classification="success",status_code="200"} 2

@olix0r hope that helps!

@seanmonstar seanmonstar removed their assignment Oct 30, 2018
@JCMais

JCMais commented Nov 1, 2018

Having the same issue.

We have a pod that acts as an authorization microservice; this pod can make requests to itself to check other permissions, so the hostname is http://authorization. This previously worked, but after enabling linkerd2 it stopped working. The linkerd-proxy container gives the following error:

ERR! proxy={server=out listen=127.0.0.1:4140 remote=10.16.0.48:60652} linkerd2_proxy::proxy::http::router service error: Error caused by underlying HTTP/2 error: protocol error: frame with invalid size

@seanmonstar
Contributor

  • The tap logs show the request to the other pod both from out and in, so the two proxies were involved.
  • The tap logs don't show the request in when the pod is the same, suggesting to me that the proxy never receives the request it should be sending itself.
  • The reset-error=6 is a FRAME_SIZE_ERROR from HTTP/2, which would be the out proxy making an HTTP/2 request to the dst; the bytes it got back were likely not HTTP/2, triggering that error.
  • The proxies will speak HTTP/2 to each other when they know there is a proxy on the other side, so a connection returning bytes that aren't HTTP/2 suggests it's connecting to something else.

All this makes me wonder if something is preventing the connection from being redirected to the proxy. Perhaps something in the iptables rules that are set up during proxy-init.
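
One way to check that theory (a sketch; it needs root inside the pod's network namespace, e.g. a privileged debug container or access from the node):

# dump the nat table that proxy-init configured for this pod
$ iptables -t nat -L -n -v
# look for REDIRECT rules pointing at the proxy's outbound (4140) and inbound
# (4143) ports, and check whether traffic the pod sends to its own IP is
# matched by them or slips past.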

@seanmonstar
Contributor

Actually, while there was a proxy change for this, it won't be fixed until the iptables config is changed in this repo also.

@seanmonstar seanmonstar reopened this Nov 15, 2018
dadjeibaah pushed a commit that referenced this issue Nov 15, 2018
When a pod sends requests to itself, the proxy properly redirects traffic from the originating container through the proxy's outbound listener. But once the request leaves the outbound side of the proxy, it bypasses the inbound proxy and connects straight back to the original container that made the request. This can cause problems for containers that serve HTTP, as the proxy naively tries to initiate an HTTP/2 connection to the destination of a request. (See #1585 for a concrete example.)

This PR adds a new iptables rule, coupled with a proxy [change](linkerd/linkerd2-proxy#122), to ensure that requests that occur in the aforementioned scenario always redirect to the inbound listener of the proxy first.

fixes #1585

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
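
Illustrative only: the shape of the rule the fix describes, assuming the proxy's default inbound port 4143. The real rules installed by proxy-init live in dedicated chains and also match on the proxy's UID so the proxy's own redirected traffic isn't looped again; this is just a sketch of the idea.

# send outbound traffic that loops back to the pod itself (but not plain
# 127.0.0.1 traffic) to the proxy's inbound listener instead of letting it
# bypass the mesh
$ iptables -t nat -A OUTPUT -o lo ! -d 127.0.0.1/32 -p tcp -j REDIRECT --to-ports 4143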
@JCMais

JCMais commented Nov 16, 2018

Thanks for fixing this!

@glindsell

I'm seeing the same reset-error=6 when trying to load balance gRPC using linkerd2 and nginx ingress.

Steps to recreate here:

https://github.com/glindsell/free-peer/tree/ingress/stream-meshed

@olix0r
Member

olix0r commented Mar 21, 2019

@glindsell thanks for putting together a repro and sharing! It's a little hard to tease out a clear problem description from that README, though. Would you mind opening a new issue so that we can make sure we get to the bottom of it?

@glindsell

@olix0r good idea, I've updated the issue I opened specifically for gRPC stream load balancing with this info:

#2120

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 18, 2021