This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

TLS handshake errors caused by Controller ELB running tcp healthchecks to 443 #295

Closed
iamsaso opened this issue Feb 2, 2017 · 21 comments

iamsaso (Contributor) commented Feb 2, 2017

We are seeing a lot of "TLS handshake error" messages on kube-apiserver and authproxy.

[Screenshot: kube-apiserver log output showing repeated TLS handshake errors]

Anyone have any ideas why this is happening?

redbaron (Contributor) commented Feb 2, 2017

Check from the client side; there should be a more informative error there.

iamsaso (Author) commented Feb 2, 2017

The client side works and I don't see any errors.

redbaron (Contributor) commented Feb 2, 2017

There must be. These IPs belong to pods, right? Something there failed to establish a connection.

iamsaso (Author) commented Feb 2, 2017

Huh, OK, I think I can pin it to a service and an ELB health check. Thanks!

@iamsaso iamsaso closed this as completed Feb 2, 2017
@iamsaso iamsaso reopened this Feb 2, 2017
iamsaso (Author) commented Feb 2, 2017

[Screenshots: ELB configuration showing a TCP health check against the SSL listener on port 443]

This happens because we are doing a TCP health check against an SSL listener.

@mumoshu mumoshu changed the title TLS handshake error TLS handshake errors from ELB running tcp healthchecks to 443 Feb 5, 2017
@mumoshu mumoshu changed the title TLS handshake errors from ELB running tcp healthchecks to 443 TLS handshake errors caused by Controller ELB running tcp healthchecks to 443 Feb 16, 2017
whereisaaron (Contributor) commented

The apiserver requires a client certificate the ELB doesn't have, so the ELB can only do a TCP connect health check, and that connect/disconnect triggers those errors in the logs. I get them too. It would be nice if there were some way to suppress them, or to not log a zero-data connect/disconnect at all.

mumoshu (Contributor) commented Mar 22, 2017

Thanks for the info @whereisaaron!
How about allowing unauthenticated HTTP access to the apiserver from the ELB via a security group, so that we can use HTTP health checks instead of TCP checks?

whereisaaron (Contributor) commented

Enabling any unauthenticated/anonymous access to the API makes me a bit uneasy, @mumoshu!

It is possible for failed client-cert requests to fall back to anonymous, but I don't see a way to restrict those anonymous requests to just the ELB, so everyone with access to the API (from either side of the ELB) could make anonymous requests. These would map to the system:unauthenticated RBAC group, so access could be restricted by RBAC. But if you didn't have RBAC enabled, you'd have the whole cluster open for unauthenticated admin access.

So with --anonymous-auth=true combined with RBAC authorization, you could enable successful HTTPS health checks from the ELB. That would also avoid these false-positive log entries.
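This combination boils down to two apiserver flags. A minimal sketch, assuming typical certificate paths (the paths and the exact flag set are illustrative, not the project's actual configuration):

```shell
# --anonymous-auth=true: requests without credentials are admitted as the
# system:unauthenticated group instead of being rejected at authentication,
# which lets an ELB HTTPS health check reach /healthz without a client cert.
# --authorization-mode=RBAC: those anonymous requests are then denied
# everything that RBAC does not explicitly grant.
# Certificate paths below are illustrative.
kube-apiserver \
  --anonymous-auth=true \
  --authorization-mode=RBAC \
  --tls-cert-file=/etc/kubernetes/ssl/apiserver.pem \
  --tls-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem
```

Without RBAC (or another restrictive authorizer) in the mode list, the first flag alone would be dangerous, as noted above.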

I personally think these log entries inappropriately conflate an empty TCP connect with a real TLS negotiation error. If the client doesn't send a single byte, I'm not sure you can claim it was trying to negotiate anything. It's like someone visiting your website's login page without attempting to log in, and counting that as a failed login: a specious stance. Fixing that would require a patch to the authproxy, though.

mumoshu (Contributor) commented Mar 23, 2017

Thanks @whereisaaron!
How about --insecure-port then? I guess we can restrict access to the insecure port of kube-apiserver with a security group so that only the ELBs are allowed to reach it.

mumoshu (Contributor) commented Mar 23, 2017

Ah, but doing so would let attackers reach the insecure port via compromised pods on controller nodes?

jeremyd (Contributor) commented Apr 4, 2017

I think we can close this. This is just how TCP health checks work, and there's no easy way to 'fix' it without sacrificing stability... cc @Saso #bugcleaning

redbaron (Contributor) commented Apr 6, 2017

Do we have any other ports opened by the apiserver that could be checked by the ELB?

cknowles (Contributor) commented Apr 6, 2017

/etc/kubernetes/manifests/kube-apiserver.yaml says:

```yaml
- containerPort: 443
  hostPort: 443
  name: https
- containerPort: 8080
  hostPort: 8080
  name: local
```

8080 is used for the liveness probe too (/healthz).

danielfm (Contributor) commented Apr 6, 2017

AFAIK 8080 is the insecure port (bound to 127.0.0.1 by default), so I don't think using this port would work.

mumoshu (Contributor) commented Apr 28, 2017

This seems to have been fixed by SSL healthchecks. See #604
@Sasso @danielfm @c-knowles @redbaron @whereisaaron Could you confirm? Thanks!
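For reference, a classic ELB health check of type SSL can be set with the AWS CLI. A sketch only: the load balancer name and thresholds below are placeholders, not kube-aws's actual values, and some health-check settings may differ per cluster:

```shell
# Target=SSL:443 makes the ELB complete a TLS handshake rather than a bare
# TCP connect/disconnect, so the apiserver stops logging handshake EOFs.
# Load balancer name and thresholds are hypothetical.
aws elb configure-health-check \
  --load-balancer-name my-k8s-api-elb \
  --health-check Target=SSL:443,Interval=10,Timeout=5,UnhealthyThreshold=3,HealthyThreshold=2
```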

@mumoshu mumoshu closed this as completed Apr 28, 2017
@mumoshu mumoshu added this to the v0.9.6-rc.6 milestone Apr 28, 2017
cknowles (Contributor) commented

Anyone seeing this again with a more recent version of kube-aws like 0.10.0?

whereisaaron (Contributor) commented

Hi @c-knowles, I think this fix was specific to the 'classic' ELB. Have you (like me) switched to the ELBv2 option that kube-aws offers now? That currently configures a TCP health check, which would have the same problem.

ELBv2 load balancers do support HTTP(S) health checks, even on TCP load balancers. So a similar fix may be possible for the ELBv2 configuration.
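If so, the equivalent fix for an ELBv2 target group would be something like the following sketch. The target group ARN is a placeholder, and note that depending on the target group type, some health-check settings can only be set at creation time:

```shell
# Switch the target group's health check from TCP to HTTPS against the
# apiserver's /healthz endpoint. ARN below is a placeholder.
aws elbv2 modify-target-group \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/k8s-api/0123456789abcdef \
  --health-check-protocol HTTPS \
  --health-check-path /healthz
```

Whether an HTTPS check against /healthz succeeds also depends on the apiserver accepting the request without a client certificate, per the anonymous-auth discussion above.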

cknowles (Contributor) commented

@whereisaaron nope, we haven't swapped unless kube-aws changed the default. I'll investigate further; for now, all the info I have is that this seems to be recurring and I'm not entirely sure why. My current guess is that it's the health checks we have inside the cloud-init scripts/systemd units, and the nodes were unhealthy at the time (unrelated health issues).

sonnysideup commented

I'm using kube-aws v0.10.2, a classic ELB for a single API endpoint and I'm seeing the TLS errors:

```
I0720 16:09:21.904678       1 logs.go:41] http: TLS handshake error from 10.30.12.238:34902: EOF
I0720 16:09:31.904127       1 logs.go:41] http: TLS handshake error from 10.30.12.238:34940: EOF
I0720 16:09:41.904106       1 logs.go:41] http: TLS handshake error from 10.30.12.238:34970: EOF
I0720 16:09:51.904188       1 logs.go:41] http: TLS handshake error from 10.30.12.238:35010: EOF
```

Maybe this is a regression?

g00nix commented Jul 23, 2019

When you dominate a market, you don't really care about small details like this. So what if the LB can only send TCP health checks? Just change the source code of your apps so they don't throw EOF errors when TCP health checks come in.

EASY!
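Sarcasm aside: until the logging changes upstream, the noise can at least be filtered when reading logs. A minimal grep sketch (the first two sample lines echo the excerpt above; in practice you would pipe `kubectl logs` or journald output through the same filter):

```shell
# The first two lines are health-check noise; the third is a made-up
# "real" entry we want to keep.
printf '%s\n' \
  'I0720 16:09:21.904678       1 logs.go:41] http: TLS handshake error from 10.30.12.238:34902: EOF' \
  'I0720 16:09:31.904127       1 logs.go:41] http: TLS handshake error from 10.30.12.238:34940: EOF' \
  'I0720 16:09:32.000000       1 trace.go:76] Trace: "List /api/v1/pods"' \
  | grep -v 'TLS handshake error'
# -> prints only the Trace line
```

This hides the symptom rather than fixing it, so it only makes sense once you have confirmed the errors really come from the load balancer's health checks.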

10 participants