
long-running tunnel breaks cluster connectivity: ssh: handshake failed: connection reset by peer #4240

Open
ghost opened this issue May 10, 2019 · 14 comments
Labels
area/tunnel Support for the tunnel command
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
kind/bug Categorizes issue or PR as related to a bug.
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.
os/linux
priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@ghost

ghost commented May 10, 2019

The exact command to reproduce the issue:
minikube ip

The real trigger, though, is that I kept minikube tunnel running for a while. I can still access everything using kubectl, but any minikube command that communicates with the cluster in any way just hangs.

I have done nothing special: just minikube start, and minikube tunnel in another window.

I also tried minikube tunnel --cleanup; it did not fail, but it did not help either.

Then I tried minikube ip -v=7:
Using SSH client type: native
&{{{ 0 [] [] []} docker [0x83b580] 0x83b550 [] 0s} 127.0.0.1 37871 }
About to run SSH command:
ip addr show

and after a while: Error dialing TCP: ssh: handshake failed: read tcp 127.0.0.1:35568->127.0.0.1:37871: read: connection reset by peer

The full output of the command that failed:
It just hangs, with no output.

The output of the minikube logs command:
That just hangs too.

The operating system version:
Kubuntu 17.10
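
A minimal way to narrow this down (sketch only; 37871 is the forwarded SSH port taken from the -v=7 output above and will differ per machine, and nc only probes TCP reachability, not the SSH handshake itself):

nc -vz 127.0.0.1 37871          # is the SSH endpoint minikube dials still accepting connections?
pgrep -af "minikube tunnel"     # is a stale tunnel process still running?
ip route show                   # any leftover routes added by the tunnel?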

@tstromberg tstromberg changed the title minikube commands hang minikube commands hang after tunnel usage May 15, 2019
@tstromberg tstromberg changed the title minikube commands hang after tunnel usage post-tunnel: commands hang, ssh: handshake failed: connection reset by peer May 15, 2019
@tstromberg tstromberg changed the title post-tunnel: commands hang, ssh: handshake failed: connection reset by peer post-tunnel: ssh: handshake failed: connection reset by peer May 15, 2019
@tstromberg
Contributor

Sorry to hear that this is happening. Do you mind sharing the output of:

minikube status
ip r s
sudo iptables -S
minikube tunnel --cleanup --alsologtostderr -v=8

Also, can you share the minikube start command-line used, as well as the output it showed? I'm curious if this is kvm2 or virtualbox.
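
For convenience, a sketch that captures all four outputs into a single file to attach here (the filename is arbitrary):

{
  minikube status
  ip r s
  sudo iptables -S
  minikube tunnel --cleanup --alsologtostderr -v=8
} > minikube-diag.txt 2>&1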

@tstromberg tstromberg added area/tunnel Support for the tunnel command kind/bug Categorizes issue or PR as related to a bug. os/linux triage/needs-information Indicates an issue needs more information in order to work on it. labels May 15, 2019
@ghost
Author

ghost commented May 15, 2019

I was using minikube with the none driver, directly on my Linux distro (Kubuntu 17.10). In the meantime my problem was solved; I'm not sure exactly what fixed it, but using the extra kubelet config for resolv.conf (to make sure it points to reachable DNS resolvers), in combination with clearing iptables and restarting docker/kubelet, seems to have fixed my issues.
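
A rough sketch of that recovery sequence, assuming a systemd host and the none driver; the resolv.conf path is an assumption (on systemd-resolved hosts it is typically /run/systemd/resolve/resolv.conf), and flushing iptables also drops Docker's own rules, which restarting docker recreates:

sudo iptables -F && sudo iptables -t nat -F   # clear leftover rules
sudo systemctl restart docker                 # recreates Docker's chains
sudo systemctl restart kubelet
minikube start --vm-driver=none --extra-config=kubelet.resolv-conf=/run/systemd/resolve/resolv.conf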

@tstromberg
Contributor

Closing as unreproducible. Please re-open if you see this problem again. It's very mysterious.

@Eelis

Eelis commented Aug 6, 2019

I have the same issue.

Output of minikube status:

💣  Error getting bootstrapper: getting kubeadm bootstrapper: command runner: getting ssh client for bootstrapper: Error dialing tcp via ssh client: ssh: handshake failed: read tcp 127.0.0.1:60878->127.0.0.1:45871: read: connection reset by peer

Output of ip r s:

default via 192.168.0.1 dev enp3s0 proto dhcp metric 100 
169.254.0.0/16 dev virbr0 scope link metric 1000 linkdown 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
192.168.0.0/24 dev enp3s0 proto kernel scope link src 192.168.0.20 metric 100 
192.168.99.0/24 dev vboxnet0 proto kernel scope link src 192.168.99.1 
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown 

Output of sudo iptables -S:

-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-N DOCKER
-N DOCKER-ISOLATION-STAGE-1
-N DOCKER-ISOLATION-STAGE-2
-N DOCKER-USER
-A INPUT -i virbr0 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -i virbr0 -p tcp -m tcp --dport 53 -j ACCEPT
-A INPUT -i virbr0 -p udp -m udp --dport 67 -j ACCEPT
-A INPUT -i virbr0 -p tcp -m tcp --dport 67 -j ACCEPT
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -d 192.168.122.0/24 -o virbr0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -s 192.168.122.0/24 -i virbr0 -j ACCEPT
-A FORWARD -i virbr0 -o virbr0 -j ACCEPT
-A FORWARD -o virbr0 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -i virbr0 -j REJECT --reject-with icmp-port-unreachable
-A OUTPUT -o virbr0 -p udp -m udp --dport 68 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN

Output of minikube tunnel --cleanup --alsologtostderr -v=8:

I0806 20:31:11.908621   12963 tunnel.go:48] Checking for tunnels to cleanup...

minikube start command:

minikube start --memory 10000 --disk-size 50g --dns-domain myapp-kube --extra-config=kubelet.cluster-domain=myapp-kube

@tstromberg tstromberg reopened this Aug 6, 2019
@Eelis

Eelis commented Aug 12, 2019

I noticed that when the problem occurs, I can fix everything by running:

vboxmanage controlvm minikube setlinkstate1 off
vboxmanage controlvm minikube setlinkstate1 on
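
For reference, a small wrapper around that workaround (sketch only; the function name is arbitrary, and it simply bounces the VM's first NIC and waits until minikube status responds again):

fix_minikube_link() {
  vboxmanage controlvm minikube setlinkstate1 off
  sleep 2
  vboxmanage controlvm minikube setlinkstate1 on
  until timeout 30 minikube status >/dev/null 2>&1; do sleep 2; done
}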

@tstromberg tstromberg added kind/support Categorizes issue or PR as a support question. and removed triage/needs-information Indicates an issue needs more information in order to work on it. kind/bug Categorizes issue or PR as related to a bug. labels Sep 20, 2019
@tstromberg tstromberg changed the title post-tunnel: ssh: handshake failed: connection reset by peer long-running tunnel breaks cluster connectivity: ssh: handshake failed: connection reset by peer Sep 20, 2019
@tstromberg
Contributor

This is still an issue in v1.4 as far as I know.

@pollend

pollend commented Nov 7, 2019

I'm still having this problem and I'm running v1.5.1.

@olivierlemasle
Member

Could this be linked to #4151?

@medyagh
Member

medyagh commented Dec 16, 2019

I believe this is still an issue in v1.6.1, since we didn't add any new code for tunnel.

@medyagh medyagh added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. area/networking networking issues area/storage storage bugs kind/bug Categorizes issue or PR as related to a bug. and removed kind/support Categorizes issue or PR as a support question. labels Dec 16, 2019
@tstromberg tstromberg added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed area/networking networking issues area/storage storage bugs labels Dec 19, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 18, 2020
@medyagh
Member

medyagh commented May 13, 2020

I wonder if this issue exists with the docker driver too? Has anyone tried it with the docker driver?

The SSH connection keep-alive could still be an issue.
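
One way to test the keep-alive theory is to hold a manual SSH session to the node with OpenSSH client keep-alives enabled and see whether it survives the same idle period that breaks the tunnel (sketch only; the identity path and the docker user are the usual defaults for VM drivers and may differ on other setups):

ssh -o ServerAliveInterval=30 -o ServerAliveCountMax=3 -i ~/.minikube/machines/minikube/id_rsa docker@$(minikube ip)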

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 19, 2020
@sharifelgamal
Collaborator

This remains an issue I believe.
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Jul 15, 2020
@pluveto

pluveto commented Mar 16, 2023

I ran into this while installing minikube.

log:

sudo hostname minikube && echo "minikube" | sudo tee /etc/hostname
I0316 15:29:18.568636   24627 main.go:141] libmachine: Error dialing TCP: ssh: handshake failed: read tcp 127.0.0.1:50064->127.0.0.1:32797: read: connection reset by peer
I0316 15:31:31.688700   24627 main.go:141] libmachine: Error dialing TCP: ssh: handshake failed: read tcp 127.0.0.1:60178->127.0.0.1:32797: read: connection reset by peer
I0316 15:33:44.808708   24627 main.go:141] libmachine: Error dialing TCP: ssh: handshake failed: read tcp 127.0.0.1:42278->127.0.0.1:32797: read: connection reset by peer
$ ip r s

default via 192.168.0.1 dev eth0 proto dhcp metric 100
169.254.169.254 via 192.168.0.1 dev eth0 proto dhcp metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.245 metric 100
192.168.49.0/24 dev br-d6ae2df29860 proto kernel scope link src 192.168.49.1
$ sudo iptables -S

-P INPUT ACCEPT
-P FORWARD DROP
-P OUTPUT ACCEPT
-N DOCKER
-N DOCKER-ISOLATION-STAGE-1
-N DOCKER-ISOLATION-STAGE-2
-N DOCKER-USER
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o br-d6ae2df29860 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o br-d6ae2df29860 -j DOCKER
-A FORWARD -i br-d6ae2df29860 ! -o br-d6ae2df29860 -j ACCEPT
-A FORWARD -i br-d6ae2df29860 -o br-d6ae2df29860 -j ACCEPT
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A DOCKER -d 192.168.49.2/32 ! -i br-d6ae2df29860 -o br-d6ae2df29860 -p tcp -m tcp --dport 32443 -j ACCEPT
-A DOCKER -d 192.168.49.2/32 ! -i br-d6ae2df29860 -o br-d6ae2df29860 -p tcp -m tcp --dport 8443 -j ACCEPT
-A DOCKER -d 192.168.49.2/32 ! -i br-d6ae2df29860 -o br-d6ae2df29860 -p tcp -m tcp --dport 5000 -j ACCEPT
-A DOCKER -d 192.168.49.2/32 ! -i br-d6ae2df29860 -o br-d6ae2df29860 -p tcp -m tcp --dport 2376 -j ACCEPT
-A DOCKER -d 192.168.49.2/32 ! -i br-d6ae2df29860 -o br-d6ae2df29860 -p tcp -m tcp --dport 22 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i br-d6ae2df29860 ! -o br-d6ae2df29860 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o br-d6ae2df29860 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
$ minikube tunnel --cleanup --alsologtostderr -v=8


I0316 15:33:49.296913   25030 out.go:296] Setting OutFile to fd 1 ...
I0316 15:33:49.297038   25030 out.go:348] isatty.IsTerminal(1) = true
I0316 15:33:49.297053   25030 out.go:309] Setting ErrFile to fd 2...
I0316 15:33:49.297060   25030 out.go:348] isatty.IsTerminal(2) = true
I0316 15:33:49.297166   25030 root.go:334] Updating PATH: /root/.minikube/bin
I0316 15:33:49.297367   25030 mustload.go:65] Loading cluster: minikube
I0316 15:33:49.297671   25030 config.go:180] Loaded profile config "minikube": Driver=docker, ContainerRuntime=docker, KubernetesVersion=v1.26.1
I0316 15:33:49.298015   25030 cli_runner.go:164] Run: docker container inspect minikube --format={{.State.Status}}
I0316 15:33:49.344649   25030 host.go:66] Checking if "minikube" exists ...
I0316 15:33:49.344859   25030 cli_runner.go:164] Run: docker system info --format "{{json .}}"
I0316 15:33:49.428177   25030 info.go:266] docker info: {ID:6d8873e2-970b-43de-a349-142d5b2d9518 Containers:1 ContainersRunning:1 ContainersPaused:0 ContainersStopped:0 Images:1 Driver:overlay2 DriverStatus:[[Backing Filesystem extfs] [Supports d_type true] [Using metacopy false] [Native Overlay Diff true] [userxattr false]] SystemStatus:<nil> Plugins:{Volume:[local] Network:[bridge host ipvlan macvlan null overlay] Authorization:<nil> Log:[awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog]} MemoryLimit:true SwapLimit:true KernelMemory:false KernelMemoryTCP:false CPUCfsPeriod:true CPUCfsQuota:true CPUShares:true CPUSet:true PidsLimit:true IPv4Forwarding:true BridgeNfIptables:true BridgeNfIP6Tables:true Debug:false NFd:31 OomKillDisable:false NGoroutines:37 SystemTime:2023-03-16 15:33:49.422163619 +0800 CST LoggingDriver:json-file CgroupDriver:systemd NEventsListener:0 KernelVersion:5.15.0-67-generic OperatingSystem:Ubuntu 22.04.2 LTS OSType:linux Architecture:x86_64 IndexServerAddress:https://index.docker.io/v1/ RegistryConfig:{AllowNondistributableArtifactsCIDRs:[] AllowNondistributableArtifactsHostnames:[] InsecureRegistryCIDRs:[127.0.0.0/8] IndexConfigs:{DockerIo:{Name:docker.io Mirrors:[] Secure:true Official:true}} Mirrors:[]} NCPU:8 MemTotal:16576102400 GenericResources:<nil> DockerRootDir:/var/lib/docker HTTPProxy: HTTPSProxy: NoProxy: Name:ecs-ebe5 Labels:[] ExperimentalBuild:false ServerVersion:23.0.1 ClusterStore: ClusterAdvertise: Runtimes:{Runc:{Path:runc}} DefaultRuntime:runc Swarm:{NodeID: NodeAddr: LocalNodeState:inactive ControlAvailable:false Error: RemoteManagers:<nil>} LiveRestoreEnabled:false Isolation: InitBinary:docker-init ContainerdCommit:{ID:2456e983eb9e37e47538f59ea18f2043c9a73640 Expected:2456e983eb9e37e47538f59ea18f2043c9a73640} RuncCommit:{ID:v1.1.4-0-g5fd4c4d Expected:v1.1.4-0-g5fd4c4d} InitCommit:{ID:de40ad0 Expected:de40ad0} SecurityOptions:[name=apparmor name=seccomp,profile=builtin name=cgroupns] ProductLicense: Warnings:<nil> ServerErrors:[] ClientInfo:{Debug:false Plugins:[map[Name:buildx Path:/usr/libexec/docker/cli-plugins/docker-buildx SchemaVersion:0.1.0 ShortDescription:Docker Buildx Vendor:Docker Inc. Version:v0.10.2] map[Name:compose Path:/usr/libexec/docker/cli-plugins/docker-compose SchemaVersion:0.1.0 ShortDescription:Docker Compose Vendor:Docker Inc. Version:v2.16.0] map[Name:scan Path:/usr/libexec/docker/cli-plugins/docker-scan SchemaVersion:0.1.0 ShortDescription:Docker Scan Vendor:Docker Inc. Version:v0.23.0]] Warnings:<nil>}}
I0316 15:33:49.430640   25030 out.go:177]

W0316 15:33:49.431870   25030 out.go:239] ❌  Exiting due to DRV_CP_ENDPOINT: failed to lookup ip for ""
W0316 15:33:49.431965   25030 out.go:239] 💡  Suggestion:

    Recreate the cluster by running:
    minikube delete
    minikube start
W0316 15:33:49.431976   25030 out.go:239]

W0316 15:33:49.433123   25030 out.go:239] ╭───────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                           │
│    😿  If the above advice does not help, please let us know:                             │
│    👉  https://github.com/kubernetes/minikube/issues/new/choose                           │
│                                                                                           │
│    Please run `minikube logs --file=logs.txt` and attach logs.txt to the GitHub issue.    │
│    Please also attach the following file to the GitHub issue:                             │
│    - /tmp/minikube_tunnel_9355d93b403d830041bf7d26a54ff1e776fdc191_0.log                  │
│                                                                                           │
╰───────────────────────────────────────────────────────────────────────────────────────────╯
I0316 15:33:49.434453   25030 out.go:177]
