linkerd-proxy blocks actual proxy requests #6035

Closed
divanikus opened this issue Apr 15, 2021 · 28 comments

Comments

@divanikus

Bug Report

What is the issue?

We have several apps that access resources behind our corporate proxy. They use HTTP proxy requests (CONNECT, etc.) for that. It appears that linkerd-proxy sidecars effectively block all proxy requests, so the apps simply don't get any responses. However, linkerd-viz shows those requests as actually happening and finishing successfully.

How can it be reproduced?

Just run the Chromium browser with an HTTP proxy enabled.

@kleimkuhler
Contributor

@divanikus What Linkerd version are you observing this on? This is probably a duplicate of #5951. I left my reproduction steps on that issue but was not able to see the CONNECT requests fail; they were going through for me.

Since you are observing this, is there any chance you could try the latest edge release, edge-21.4.3? There are a few proxy fixes in there since 2.10, and it is slated to become 2.10.1. It would be good to know if this issue was fixed by one of those proxy changes.

@divanikus
Author

The problem was observed on 2.10.0; nothing changed on edge-21.4.3.

@divanikus
Author

To be clear, I'm talking about outgoing requests, not incoming ones. The proxy itself is outside of the k8s cluster.

@olix0r olix0r added this to the stable-2.11.0 milestone Apr 15, 2021
@olix0r
Member

olix0r commented Apr 15, 2021

@divanikus Can you share more details about the HTTP proxy? Does it require authentication? If so, how is authentication implemented?

Can you share proxy logs with the annotation config.linkerd.io/proxy-log-level: linkerd_proxy_http=debug,linkerd=info,warn? You may need to manually obfuscate header values if basic authentication is used.

@adleong
Member

adleong commented May 10, 2021

Closing due to inactivity.

@adleong adleong closed this as completed May 10, 2021
@divanikus
Author

I bet it wasn't fixed. Unfortunately, I didn't have time to reproduce it because of my vacation. Hopefully I'll post more info this week. Did you at least try to curl something through an external proxy?

@kleimkuhler
Contributor

@divanikus I mentioned my attempt at reproducing this that I left in a comment here. I confirmed that with an injected Nginx forward proxy, CONNECT requests were going through it fine.

If you can provide a reproducible example where this fails that would be super helpful. We're glad to keep looking into it, but we'll definitely need more information to make progress.

@divanikus
Author

divanikus commented May 17, 2021

So, here's the thing. Everything works if you try to access a plain HTTP site behind a proxy. But if you try to access an HTTPS site, things break.
Example curl output with Linkerd 2.10.1.

Outside of the cluster (or with disabled linkerd-proxy):

# curl https://icanhazip.com -v -x my-proxy:20000

*   Trying 91.z.y.x...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x564ce0c3ef90)
* Connected to my-proxy (91.z.y.x) port 20000 (#0)
* allocate connect buffer!
* Establish HTTP proxy tunnel to icanhazip.com:443
> CONNECT icanhazip.com:443 HTTP/1.1
> Host: icanhazip.com:443
> User-Agent: curl/7.64.0
> Proxy-Connection: Keep-Alive
>
< HTTP/1.0 200 Connection established
< Set-Cookie: SERVERID=a3813c8ed9b2329c44c96e3415cad387; path=/
<
* Proxy replied 200 to CONNECT request
* CONNECT phase completed!
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* CONNECT phase completed!
* CONNECT phase completed!
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=US; ST=CA; L=San Francisco; O=Cloudflare, Inc.; CN=sni.cloudflaressl.com
*  start date: Aug  4 00:00:00 2020 GMT
*  expire date: Aug  4 12:00:00 2021 GMT
*  subjectAltName: host "icanhazip.com" matched cert's "icanhazip.com"
*  issuer: C=US; O=Cloudflare, Inc.; CN=Cloudflare Inc ECC CA-3
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x564ce0c3ef90)
> GET / HTTP/2
> Host: icanhazip.com
> User-Agent: curl/7.64.0
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Connection state changed (MAX_CONCURRENT_STREAMS == 256)!
< HTTP/2 200
< date: Mon, 17 May 2021 15:35:06 GMT
< content-type: text/plain
< content-length: 15
< access-control-allow-origin: *
< access-control-allow-methods: GET
< x-otter: 🦦
< x-rtfm: Learn about this site at http://bit.ly/icanhazip-faq
< x-thank-you: Many thanks to the fine people at Cloudflare for keeping this site afloat!
< cf-request-id: 0a1c9086570000d15fc4a55000000001
< expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
< server: cloudflare
< cf-ray: 650de9ea2c4ed15f-BUF
< alt-svc: h3-27=":443"; ma=86400, h3-28=":443"; ma=86400, h3-29=":443"; ma=86400
<
193.x.y.z

Inside the cluster, with linkerd-proxy:

# curl https://icanhazip.com -v -x my-proxy:20000

*   Trying 91.z.y.x...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x55a1826d5fb0)
* Connected to my-proxy (91.z.y.x) port 20000 (#0)
* allocate connect buffer!
* Establish HTTP proxy tunnel to icanhazip.com:443
> CONNECT icanhazip.com:443 HTTP/1.1
> Host: icanhazip.com:443
> User-Agent: curl/7.64.0
> Proxy-Connection: Keep-Alive
>
< HTTP/1.0 200 OK
< set-cookie: SERVERID=3754bda0d46fa2622994381bbd09cc39; path=/
< date: Mon, 17 May 2021 15:35:18 GMT
<
* Proxy replied 200 to CONNECT request
* CONNECT phase completed!
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to icanhazip.com:443
* Closing connection 0
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to icanhazip.com:443
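
For anyone comparing the two transcripts above: the visible difference starts at the reply to the CONNECT request. Below is a minimal sketch for printing that status line directly, assuming a forward proxy reachable over plain TCP. `read_connect_status` is a hypothetical helper name, not part of curl or Linkerd, and the host, port, and target you pass in would be your own (for example the `my-proxy`, `20000`, and `icanhazip.com:443` placeholders from the curl runs).

```python
import socket


def read_connect_status(proxy_host, proxy_port, target, timeout=5.0):
    """Send a bare CONNECT to a forward proxy and return (version, status, reason).

    Hypothetical helper for debugging; it only reads the response head and
    does not attempt the TLS handshake that follows in a real client.
    """
    request = (
        f"CONNECT {target} HTTP/1.1\r\n"
        f"Host: {target}\r\n"
        "\r\n"
    ).encode("ascii")

    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        sock.sendall(request)
        # Read until the blank line that ends the response head, or until EOF.
        head = b""
        while b"\r\n\r\n" not in head:
            chunk = sock.recv(4096)
            if not chunk:
                break
            head += chunk

    if not head:
        raise ConnectionError("proxy closed the connection without replying")

    # The status line looks like: HTTP/1.0 200 Connection established
    status_line = head.split(b"\r\n", 1)[0].decode("iso-8859-1")
    version, status, *reason = status_line.split(" ", 2)
    return version, int(status), reason[0] if reason else ""
```

Calling `read_connect_status("my-proxy", 20000, "icanhazip.com:443")` from a meshed pod and from an unmeshed one would show whether the status line reaching the client differs between the two.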

@olix0r olix0r reopened this May 17, 2021
@kleimkuhler kleimkuhler self-assigned this May 19, 2021
@olix0r olix0r added the bug label May 21, 2021
@trentmurray

I'm hitting this at the moment. For one particular outbound URL, I get the EXACT same curl output as @divanikus - does anyone have a fix, patch, or workaround for this?

@kleimkuhler
Contributor

@trentmurray Do you have manifests for reproducing this? I attempted to reproduce this and was unable to. I observed the linkerd-proxy proxy requests to http/https sites. What forward proxy are you using?

@divanikus
Author

divanikus commented Jul 23, 2021

My proxies are outside of the k8s cluster; try something like tinyproxy.

> I observed the linkerd-proxy proxy requests to http/https sites.

I can observe them too, but my app isn't getting any response, even though all requests look OK in linkerd-web.

@trentmurray

trentmurray commented Jul 24, 2021

@kleimkuhler - I'm not sure what forward proxies are to be honest. I have a very simple k8s cluster; Traefik ingress, linkerd meshing our application namespace, and that's it. We are experiencing this SSL handshake failure only on one URL, all others are completely fine. I'm trying to track down what on earth it could be, but not having much luck.

Postman works, local curl works. I have to try outside of the namespace (or at least a pod not meshed); then I'll come back with the results.

@trentmurray

OK, @divanikus @kleimkuhler - my issue ended up being the NACL in AWS not having an egress rule for port 444 for the URL I was trying to access.

The error from Linkerd was a red herring. My application container tried a Python requests.post(); linkerd-proxy grabbed that request and returned an SSL handshake error rather than a connection timeout.

When I took the pod out of the mesh, it just sat there trying to connect to the server until it eventually timed out.

@kleimkuhler
Contributor

Okay, thanks for the update @trentmurray, that is helpful and good to hear. @divanikus, since Trent was seeing the same errors as you, I'd be curious whether you have the same issue going on.

@divanikus
Author

@kleimkuhler nope, i'm running on private cloud infrastructure, no restrictions on egresses.

@andronux

andronux commented Oct 4, 2021

Hello everyone!
any news regarding this issue?

@kleimkuhler
Contributor

@andronux There is no update from my side of things. I have still not been able to reproduce this with the steps I've provided above (linked in #5951).

I'd be happy to look into this more, but until I receive reproduction steps or additional information that helps with me trying to reproduce it myself, I'm not sure there is more I can do.

@trentmurray

> @kleimkuhler nope, i'm running on private cloud infrastructure, no restrictions on egresses.

What happens when you unmesh the pod and then try to make an HTTP request to the endpoint/port you're trying to access? Do you still get the SSL error, or is it a timeout or some other HTTP error?

@AKishchak

I have the same issue: everything works well when the pod is outside of the Linkerd mesh. The test was done on the same cluster but outside Linkerd, with no extra egress rules added at the cloud level, and everything works well.

I've done some tests for pods inside the Linkerd mesh, and the results are as follows:

Simple request - works:

curl https://httpbin.org/ip

Proxied request works as well, when it's HTTP:

curl -x "http://user:pass@150.150.150.150:1234" http://httpbin.org/ip 

HTTPS doesn't work:

curl -x "http://user:pass@150.150.150.150:1234" https://httpbin.org/ip 

The error is:

curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to httpbin.org:443

@olix0r
Member

olix0r commented May 16, 2022

We have unsuccessfully attempted to reproduce this in the past.

We're most likely to resolve this if someone can help provide kubernetes manifests that we can use to reproduce the issue locally.

@olix0r
Member

olix0r commented May 24, 2022

#8539 describes a similar problem which may cover some of the reports in this issue; it's hard to say without more details.

I've pushed a proxy image (which will be included in this week's edge release) that can be used for testing: ghcr.io/olix0r/l2-proxy:main.c7b9c6565.

@jon-walton
Contributor

Hi @olix0r , I've tested this with edge-22.6.2 and can still reproduce it. Here's a write up and steps for you to hopefully also reproduce it 🤞 https://github.com/jon-walton/linkerd2-repro-6035

@kleimkuhler
Contributor

@jon-walton I'll look into this again with your repo. Thanks for putting it together!

@kleimkuhler
Contributor

After some investigation it looks like the issue has been figured out, but it'd be great to have some confirmation from the participants of this issue.

The issue is related to the HTTP version that the forward proxy is responding with. The CONNECT method was introduced in HTTP/1.1; when a forward proxy responds with that version, the Linkerd proxy knows that the forward proxy supports the request and it completes successfully.

However, there seem to be cases where the forward proxy is responding to CONNECT requests with HTTP/1.0. When the Linkerd proxy sees it's interacting with a server with that version, it doesn't attempt to send the request because the server shouldn't support that HTTP method.

We can see this happen with the tinyproxy example @jon-walton provided. When we curl a website through tinyproxy, it responds with

$ curl -v --proxy localhost:8888 https://google.com
...
< HTTP/1.0 200 Connection established
< Proxy-agent: tinyproxy/1.11.0
...

This is surprising because if tinyproxy is correctly serving HTTP/1.0, it shouldn't be responding to CONNECT requests with 2xx, since it shouldn't be aware of that method. It should either be responding with an error, or responding with HTTP/1.1 200 Connection established.
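
The version check described above can be modeled roughly as follows. This is only an illustrative sketch of the behavior as explained in this comment, not the Linkerd proxy's actual code (which is written in Rust); `StatusLine`, `parse_status_line`, and `is_connect_tunnel` are hypothetical names introduced for the example.

```python
from typing import NamedTuple


class StatusLine(NamedTuple):
    version: str
    status: int
    reason: str


def parse_status_line(line: str) -> StatusLine:
    """Split a status line such as 'HTTP/1.0 200 Connection established'."""
    version, status, *reason = line.strip().split(" ", 2)
    return StatusLine(version, int(status), reason[0] if reason else "")


def is_connect_tunnel(request_method: str, response: StatusLine) -> bool:
    """Model of the rule: only a 2xx HTTP/1.1 reply to CONNECT opens a tunnel.

    Assumption: this mirrors the described behavior, where an HTTP/1.0 reply
    is not treated as a CONNECT response because HTTP/1.0 has no CONNECT.
    """
    return (
        request_method.upper() == "CONNECT"
        and 200 <= response.status < 300
        and response.version == "HTTP/1.1"
    )
```

With the tinyproxy reply shown above, `is_connect_tunnel("CONNECT", parse_status_line("HTTP/1.0 200 Connection established"))` evaluates to false under this model, while the same reply with `HTTP/1.1` evaluates to true.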

Earlier in this issue I referenced when I last tried to reproduce this (comment) without success. This is because the forward proxy I was using was properly responding with HTTP/1.1.

@divanikus based off the original issue description this seems to be the issue for you as well. Are you able to confirm this using a forward proxy that responds with the expected server version?

@jon-walton @AKishchak @trentmurray it'd be helpful if you can provide the forward proxies that you were using when you encountered this issue. It's easy to say that this issue is actually with the forward proxies and not Linkerd proxy, but if this is a common enough issue it may be worth enumerating the forward proxies that are responding with incorrect versions for future debugging.

@jon-walton
Contributor

@kleimkuhler for my test, I'm using tinyproxy 1.11.0. version 1.11.1 adds support for returning the correct http version (tinyproxy/tinyproxy@a869e71) which seems promising but not quite... Unfortunately it looks like it's hard coded to respond with HTTP/1.0... https://github.com/tinyproxy/tinyproxy/blob/master/src/reqs.c#L306

If I hard code it to HTTP/1.1 the request works via the linkerd proxy 🎉

I'll update the issue on the tinyproxy side tinyproxy/tinyproxy#152 (too dangerous for me to try to fix 😉), but our infra is also running an older version which I probably won't be able to update quickly 🤔

@jon-walton
Contributor

@kleimkuhler i've also updated my repro-repo with the "fix" for tinyproxy https://github.com/jon-walton/linkerd2-repro-6035 for you to confirm on your side if you need

@kleimkuhler
Contributor

Awesome thanks for the quick reply and updating the repro @jon-walton! This checks out with what I observed and I'm glad to see the fix is at least in progress on tinyproxy.

I'm going to leave this open for additional replies, but this does seem like something that more than one forward proxy handles incorrectly. I don't think we'll be making any changes to the Linkerd proxy to close this.

hawkw added a commit to linkerd/linkerd2-proxy that referenced this issue Jul 18, 2022
The issue linkerd/linkerd2#6035 describes a situation where Linkerd
rejects HTTP/1.1 CONNECT requests because the remote (non-Linkerd)
forward proxy returns a successful response that erroneously has the
wrong HTTP version (HTTP/1.0). Since the CONNECT method does not exist
in the HTTP/1.0 protocol, Linkerd does not treat this response as a
CONNECT.

It turns out that in the accursed Real World, some HTTP forward proxies
actually do return the wrong protocol version when successfully handling
CONNECT requests. We don't want to remove the check that the protocol
version is correct, because it could lead to us incorrectly treating
responses to CONNECTs as establishing a tunnel when they don't actually
do that. However, in order to make it easier for future users who
encounter issues where other proxies return wrong HTTP versions for
CONNECT responses, it might be nice to log a warning, so that users can
determine *why* their CONNECT requests aren't working.

This branch changes the proxy to log a warning if a successful response
to a CONNECT request had the wrong HTTP version. We won't log the
warning if the response is not a success or if the request was not a
CONNECT, so it should only cover this specific case.
@kleimkuhler
Contributor

Closing this for now. You'll see from the change above (linkerd/linkerd2-proxy#1827) that we will also log these responses when they occur.
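
The condition under which that warning fires, as described in the commit message, can be sketched like this. It is a hypothetical illustration in Python rather than the proxy's real implementation; the function name, logger name, and message text are all assumptions made for the example.

```python
import logging

# Hypothetical logger name, chosen only for this sketch.
log = logging.getLogger("connect_version_check")


def warn_on_wrong_connect_version(method: str, status: int, version: str) -> bool:
    """Log a warning for a successful CONNECT response that is not HTTP/1.1.

    Returns True when a warning was logged. Non-CONNECT requests and
    non-2xx responses are ignored, matching the scope the commit describes.
    """
    if method.upper() != "CONNECT":
        return False
    if not 200 <= status < 300:
        return False
    if version == "HTTP/1.1":
        return False
    log.warning(
        "CONNECT response has unexpected HTTP version %r; expected 'HTTP/1.1'",
        version,
    )
    return True
```

Under this sketch, a `200` reply with `HTTP/1.0` to a CONNECT request logs the warning, while a failed CONNECT or a plain GET stays silent.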

olix0r pushed a commit to linkerd/linkerd2-proxy that referenced this issue Jul 21, 2022

* http: log a warning when a CONNECT response has the wrong version

* format version with debug

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 21, 2022