VM eth0 fails unpredictably #1710

Closed
drigz opened this issue Jul 18, 2017 · 13 comments
Labels
kind/bug, lifecycle/rotten

Comments

drigz commented Jul 18, 2017

minikube version: v0.20.0

Environment:

  • OS: Ubuntu 14.04.5
  • VM Driver: virtualbox
  • ISO version: v0.20.0

What happened:
The VM can't contact the internet (or other local networks, e.g. 10.0.2.0/24). I notice this because pods crash when they lose internet access.

minikube stop then minikube start resolves the issue.

How to reproduce it:
I've seen this a couple of times after leaving minikube running for a few days, but don't know what causes it.

Let me know if there's more I can test to debug the problem if it occurs again, or anything I should try in order to reproduce it.

More details:
Trying to use the dashboard gives the error:
Get https://10.0.0.1:443/api/v1/namespaces/default/pods: dial tcp 10.0.0.1:443: getsockopt: network is unreachable

minikube ssh fails with:
E0718 14:40:44.106142 24866 ssh.go:53] Error attempting to ssh/run-ssh-command: exit status 255
Most other minikube commands also fail because ssh doesn't work.

ssh docker@192.168.99.100 (password: tcuser) works. Running on the VM:

$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:fd:21:89 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::a00:27ff:fefd:2189/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:42:22:ff brd ff:ff:ff:ff:ff:ff
    inet 192.168.99.100/24 brd 192.168.99.255 scope global dynamic eth1
       valid_lft 225sec preferred_lft 225sec
    inet6 fe80::a00:27ff:fe42:22ff/64 scope link 
       valid_lft forever preferred_lft forever
4: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1
    link/sit 0.0.0.0 brd 0.0.0.0
6: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:81:b4:2c:27 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:81ff:feb4:2c27/64 scope link 
       valid_lft forever preferred_lft forever
8: vethb33de0e@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether 7e:da:79:a3:10:1b brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::7cda:79ff:fea3:101b/64 scope link 
       valid_lft forever preferred_lft forever
10: veth76f1964@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether 2a:cc:d4:3b:a5:36 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::28cc:d4ff:fe3b:a536/64 scope link 
       valid_lft forever preferred_lft forever
42: veth802e9a9@if41: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether 6e:90:52:be:f4:e8 brd ff:ff:ff:ff:ff:ff link-netnsid 4
    inet6 fe80::6c90:52ff:febe:f4e8/64 scope link 
       valid_lft forever preferred_lft forever
44: veth561c50c@if43: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether b6:d3:5f:33:7d:4c brd ff:ff:ff:ff:ff:ff link-netnsid 5
    inet6 fe80::b4d3:5fff:fe33:7d4c/64 scope link 
       valid_lft forever preferred_lft forever
46: vethbed633a@if45: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether 26:fb:df:e3:8a:04 brd ff:ff:ff:ff:ff:ff link-netnsid 6
    inet6 fe80::24fb:dfff:fee3:8a04/64 scope link 
       valid_lft forever preferred_lft forever
$ ip route
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 
192.168.99.0/24 dev eth1 proto kernel scope link src 192.168.99.100 
$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
ping: sendto: Network is unreachable
$ ping 192.168.99.1
PING 192.168.99.1 (192.168.99.1): 56 data bytes
64 bytes from 192.168.99.1: seq=0 ttl=64 time=0.154 ms
$ sudo ifconfig eth0 10.0.2.15 netmask 255.255.255.0
$ ip addr show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:fd:21:89 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fefd:2189/64 scope link 
       valid_lft forever preferred_lft forever
$ ip route
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 
192.168.99.0/24 dev eth1 proto kernel scope link src 192.168.99.100 
$ ping 10.0.2.2
PING 10.0.2.2 (10.0.2.2): 56 data bytes
^C
--- 10.0.2.2 ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss
$ dmesg
[SNIP]
[424957.621845] audit: type=1325 audit(1500382119.619:49626): table=nat family=2 entries=63
[424957.622072] audit: type=1300 audit(1500382119.619:49626): arch=c000003e syscall=54 success=yes exit=0 a0=4 a1=0 a2=40 a3=166d930 items=0 ppid=3419 pid=25816 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="iptables" exe="/usr/sbin/xtables-multi" subj=kernel key=(null)
[424957.622083] audit: type=1327 audit(1500382119.619:49626): proctitle=69707461626C6573002D7732002D43004B5542452D4D41524B2D4D415351002D74006E6174002D6A004D41524B002D2D7365742D786D61726B00307830303030343030302F30783030303034303030
[424957.644089] audit: type=1325 audit(1500382119.641:49627): table=nat family=2 entries=63
[424957.644125] audit: type=1300 audit(1500382119.641:49627): arch=c000003e syscall=54 success=yes exit=0 a0=4 a1=0 a2=40 a3=1466690 items=0 ppid=3419 pid=25818 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="iptables" exe="/usr/sbin/xtables-multi" subj=kernel key=(null)
[424957.644132] audit: type=1327 audit(1500382119.641:49627): proctitle=69707461626C6573002D7732002D43004B5542452D504F5354524F5554494E47002D74006E6174002D6D00636F6D6D656E74002D2D636F6D6D656E74006B756265726E657465732073657276696365207472616666696320726571756972696E6720534E4154002D6D006D61726B002D2D6D61726B0030783030303034303030
[424962.368001] audit: type=1325 audit(1500382124.366:49628): table=filter family=2 entries=22
[424992.373528] audit_printk_skb: 15 callbacks suppressed
[SNIP]
drigz changed the title from "VM eth0 goes down unpredictably" to "VM eth0 fails unpredictably" on Jul 18, 2017
aaron-prindle (Contributor) commented:

What driver are you using for minikube to provision the VM?


drigz commented Jul 21, 2017

Oops, missed that in the report. It's virtualbox.

r2d4 added the kind/bug label on Sep 2, 2017
Multiply commented:

@drigz Are you by any chance using this https://stevesloka.com/2017/05/19/access-minikube-services-from-host/ or something similar?

I'm having an issue where eth0 crashes, and keeps crashing, after trying to pull Docker images. When I remove the IP route from the blog post above, it stops crashing and the cluster works again.

I can add the route back afterwards and get no issues until I pull Docker images again.
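For reference, the route from that post is along these lines on macOS (assuming the default 10.0.0.0/24 service CIDR; adjust it to your cluster):

# send service-IP traffic to the minikube VM
sudo route -n add 10.0.0.0/24 $(minikube ip)
# and the removal that makes things recover for me
sudo route -n delete 10.0.0.0/24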


drigz commented Sep 28, 2017

@Multiply, yes, I am adding a route to reach the containers running in minikube.

However, I don't think I'm seeing the issue as reproducibly as you. When you say "pull docker images", do you mean starting a container for which the image has not yet been downloaded? That's normally no problem for me.


Multiply commented Sep 28, 2017

@drigz I have a setup.sh that does a few things, such as clean up routes, minikube delete, minikube start, recreate routes.
Then I use helm and helmfile to start everything up at once. We're talking 20+ containers, and all of them are using imagePullPolicy: Always.

As long as I have a route for 10.0.0.0/24 in place, it fails every time.
If I do this using the xhyve driver, there is no problem so far, but I haven't run any real tests yet.

Edit:
I am still not sure that downloading Docker images is actually the trigger; the pulls fail because the DNS server on 10.0.2.3 is unreachable, which in turn is because eth0 is down.

As I wrote before, if I remove the route from my Mac, it starts working after a few seconds, and I can easily add the route back afterwards, so there's something odd going on here.


ursuad commented Oct 23, 2017

Ok, so I had this issue as well. After some digging, I could see lines like this in the output of the dmesg command:

[19198.708086] e1000 0000:00:03.0 eth0: Reset adapter
[19198.708133] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[19198.708572] e1000 0000:00:03.0 eth0: Detected Tx Unit Hang
  Tx Queue             <0>
  TDH                  <6a>
  TDT                  <76>
  next_to_use          <76>
  next_to_clean        <6a>
buffer_info[next_to_clean]
  time_stamp           <1012041db>
  next_to_watch        <6a>
  jiffies              <101205c02>
  next_to_watch.status <0>

and

[19401.969811] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[19402.011988] e1000: eth0 NIC Link is Down
[19402.012000] e1000 0000:00:03.0 eth0: Reset adapter
[19402.053944] e1000 0000:00:03.0 eth0: Reset adapter
[19402.097207] e1000 0000:00:03.0 eth0: Reset adapter
[19402.139891] e1000 0000:00:03.0 eth0: Reset adapter

I've found a similar issue where, from what I understood, the problem was with the VirtualBox Intel adapter.

A solution that seems to work (at least so far) was to:

  • $ minikube stop - stop the minikube VM
  • edit the ~/.minikube/machines/minikube/config.json file: look for the NatNicType and HostOnlyNicType fields and set both values to Am79C973 (the PCnet-FAST III network adapter type); see the sketch after this list
  • $ minikube start - start the minikube VM
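For illustration, the edited part of config.json ends up looking something like this (just a fragment; the surrounding driver fields stay as they are):

"Driver": {
    ...
    "HostOnlyNicType": "Am79C973",
    "NatNicType": "Am79C973",
    ...
}

The same change can be made with VBoxManage while the VM is stopped (assuming the default VM name minikube; in a stock minikube VirtualBox setup, adapter 1 is the NAT NIC and adapter 2 is the host-only NIC):

# switch both adapters from the Intel e1000 to the AMD PCnet-FAST III
VBoxManage modifyvm minikube --nictype1 Am79C973
VBoxManage modifyvm minikube --nictype2 Am79C973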

TopSwagCode commented:

I have the same issue on Windows 10, both with Hyper-V and VirtualBox.

My quick fix was to:

  1. Disable Network Card 1 and press OK.
  2. Re-enable Network Card 1 as NAT and press OK.

After a few seconds minikube got its connection back and worked again.
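A rough command-line equivalent of that toggle, assuming the VM is named minikube and Network Card 1 is adapter 1 (a sketch, not something I've scripted):

# unplug adapter 1's virtual cable...
VBoxManage controlvm minikube setlinkstate1 off
# ...and plug it back in
VBoxManage controlvm minikube setlinkstate1 on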

The problem didn't only occur while downloading Docker images; it would also happen after the VM had been running for some time.

The only thing I had been able to find so far was:

00:52:42.617495 NAT: Can't allocate mbuf
00:52:42.617505 NAT: Can't allocate mbuf
00:52:42.617515 NAT: Can't allocate mbuf
00:52:42.617525 NAT: Can't allocate mbuf
00:52:42.617535 NAT: Can't allocate mbuf
00:52:42.617544 NAT: Can't allocate mbuf
00:52:42.617556 NAT: Can't allocate mbuf
00:52:42.617565 NAT: Can't allocate mbuf
00:52:42.617574 NAT: Can't allocate mbuf
00:52:42.617583 NAT: Can't allocate mbuf
00:52:42.617592 NAT: Can't allocate mbuf
00:52:42.617603 NAT: Can't allocate mbuf
00:52:42.617614 NAT: Can't allocate mbuf
00:52:42.617624 NAT: Can't allocate mbuf
00:52:42.617633 NAT: Can't allocate mbuf
00:52:42.617643 NAT: Can't allocate mbuf
00:52:42.617653 NAT: Can't allocate mbuf
00:52:42.617662 NAT: Can't allocate mbuf
00:52:42.617671 NAT: Can't allocate mbuf
00:52:42.617680 NAT: Can't allocate mbuf
00:52:44.118919 NAT: Can't allocate mbuf

That's what gave me the idea to simply disable and re-enable the network card.

I am currently running about 20 microservices on my local machine.

Hope this helps someone else out there.
I haven't tried any of the prior solutions; I'm just happy to have this workaround.


yagosys commented Jan 27, 2018

I am having a similar issue with Debian 9 as the guest machine.
Debian 9 has a NAT interface (eth0) and a bridged interface (eth1); eth0 goes through my host (a Mac) for internet access.

As soon as I add a static route on the host machine pointing somewhere with the next hop set to Debian's eth1, the eth0 adapter resets appear and internet access on the Debian guest goes down.

I have tried many things, but none of them work. What I tried:

  1. replaced the Intel e1000 adapter with the AMD Am79C973: same issue (the log shows transmit timeouts etc.)
  2. added vb.customize ['modifyvm', :id, '--cableconnected1', 'on'] to my Vagrantfile: didn't work (a fuller sketch follows below)
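In case it helps anyone reproduce, the relevant Vagrantfile section looks roughly like this (a sketch; adapter numbering follows VirtualBox's, so --nictype1 targets the NAT NIC):

Vagrant.configure("2") do |config|
  config.vm.provider "virtualbox" do |vb|
    # attempt 1: swap the NAT NIC from the Intel e1000 to the AMD PCnet-FAST III
    vb.customize ['modifyvm', :id, '--nictype1', 'Am79C973']
    # attempt 2: keep the virtual cable flagged as connected
    vb.customize ['modifyvm', :id, '--cableconnected1', 'on']
  end
end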

fejta-bot commented:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Apr 27, 2018
fejta-bot commented:

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on May 27, 2018
fejta-bot commented:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close


micdah commented Oct 1, 2018

I'm still experiencing this issue running on Windows 10, using VirtualBox as the vm-driver.

sanitariu commented:

I have the same problem using VirtualBox 6.1.14. The problem is the e1000 driver. Any ideas how to fix it?
