QEMU stops working with minikube #15021

Open
spowelljr opened this issue Sep 26, 2022 · 6 comments
Labels

  • co/qemu-driver: QEMU related issues
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • os/macos
  • priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@spowelljr
Member

spowelljr commented Sep 26, 2022

QEMU version: 7.1.0
Machine: macOS 12.6 M1 (arm64)

The qemu driver was working fine and then all of a sudden it stopped working.

Ran minikube delete --all and then top | grep qemu to verify no QEMU instance was running.

Tried minikube start --driver qemu and it hangs with the following logs:
logs.txt

Ran minikube delete --all --purge and then started again; it hangs further along in the process with the following logs:
logs2.txt

Also tried different Kubernetes versions.

Tried uninstalling QEMU, restarting the computer, then reinstalling, but still hit the same error.
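
For reference, the sequence described above boils down to roughly the following (a sketch; only commands already mentioned in this report):

# remove any previous cluster state and make sure no stray QEMU process is left
$ minikube delete --all --purge
$ top | grep qemu
# start a fresh cluster with the qemu driver (this is the step that hangs)
$ minikube start --driver qemu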

@spowelljr added the kind/bug, priority/important-soon, and co/qemu-driver labels on Sep 26, 2022
@afbjorklund
Collaborator

Unfortunately, the "user" networking is still somewhat flaky when running on macOS.

But it seems related to DNS, which is supposed to be answering on the host side:
dial tcp: lookup k8s.gcr.io on 10.0.2.3:53: read udp 10.0.2.15:36748->10.0.2.3:53: i/o timeout

https://wiki.qemu.org/Documentation/Networking#User_Networking_.28SLIRP.29
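
A quick guest-side check (assuming nslookup is available in the minikube ISO) is to query the SLIRP virtual resolver at 10.0.2.3 directly:

# open a shell inside the guest VM
$ minikube ssh
# query the guest-visible SLIRP nameserver; when host-side forwarding is broken,
# this times out instead of returning an address
$ nslookup k8s.gcr.io 10.0.2.3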

@spowelljr
Member Author

https://unix.stackexchange.com/a/614603

In the default "user mode" networking, QEMU uses only the first DNS nameserver from the host machine.

It is a known QEMU behavior, which is not expected to be fixed in QEMU.

More details: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=625689
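
On the host side, the nameserver that matters is simply the first entry in /etc/resolv.conf (a quick check, assuming stock macOS tooling):

# SLIRP forwards guest DNS queries (to 10.0.2.3) only to the first "nameserver" line here
$ cat /etc/resolv.conf
# macOS's full resolver configuration, for comparison
$ scutil --dns | grep nameserver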

@spowelljr
Member Author

spowelljr commented Sep 28, 2022

To expand on the above comment, it's related to DNS and the user network.

In the default "user mode" networking, QEMU uses only the first DNS nameserver from the host machine.

If I look at /etc/resolv.conf on my Mac, the first nameserver is a corp one.

So when I try curling from inside the ISO, the DNS lookup is routed to the corp DNS server and fails.

When starting minikube with the qemu driver and the user network, we can confirm this from the following error:
❗ This VM is having trouble accessing https://registry.k8s.io

And if you SSH into the machine, you're not able to curl anything:

$ curl www.google.com
curl: (6) Could not resolve host: www.google.com

In the logs we can see the DNS errors holding everything up are from Docker:

Sep 28 20:07:30 minikube dockerd[860]: time="2022-09-28T20:07:30.312679028Z" level=warning msg="Error getting v2 registry: Get \"https://k8s.gcr.io/v2/\": dial tcp: lookup k8s.gcr.io on 10.0.2.3:53: read udp 10.0.2.15:58598->10.0.2.3:53: i/o timeout"
Sep 28 20:07:30 minikube dockerd[860]: time="2022-09-28T20:07:30.312819695Z" level=info msg="Attempting next endpoint for pull after error: Get \"https://k8s.gcr.io/v2/\": dial tcp: lookup k8s.gcr.io on 10.0.2.3:53: read udp 10.0.2.15:58598->10.0.2.3:53: i/o timeout"
Sep 28 20:07:30 minikube dockerd[860]: time="2022-09-28T20:07:30.320921361Z" level=error msg="Handler for POST /v1.40/images/create returned error: Get \"https://k8s.gcr.io/v2/\": dial tcp: lookup k8s.gcr.io on 10.0.2.3:53: read udp 10.0.2.15:58598->10.0.2.3:53: i/o timeout"
Sep 28 20:08:20 minikube dockerd[860]: time="2022-09-28T20:08:20.338281844Z" level=warning msg="Error getting v2 registry: Get \"https://k8s.gcr.io/v2/\": dial tcp: lookup k8s.gcr.io on 10.0.2.3:53: read udp 10.0.2.15:57092->10.0.2.3:53: i/o timeout"
Sep 28 20:08:20 minikube dockerd[860]: time="2022-09-28T20:08:20.338366219Z" level=info msg="Attempting next endpoint for pull after error: Get \"https://k8s.gcr.io/v2/\": dial tcp: lookup k8s.gcr.io on 10.0.2.3:53: read udp 10.0.2.15:57092->10.0.2.3:53: i/o timeout"
Sep 28 20:08:20 minikube dockerd[860]: time="2022-09-28T20:08:20.343149260Z" level=error msg="Handler for POST /v1.40/images/create returned error: Get \"https://k8s.gcr.io/v2/\": dial tcp: lookup k8s.gcr.io on 10.0.2.3:53: read udp 10.0.2.15:57092->10.0.2.3:53: i/o timeout"
Sep 28 20:08:40 minikube dockerd[860]: time="2022-09-28T20:08:40.342848186Z" level=warning msg="Error getting v2 registry: Get \"https://k8s.gcr.io/v2/\": dial tcp: lookup k8s.gcr.io on 10.0.2.3:53: read udp 10.0.2.15:44128->10.0.2.3:53: i/o timeout"
Sep 28 20:08:40 minikube dockerd[860]: time="2022-09-28T20:08:40.343528645Z" level=info msg="Attempting next endpoint for pull after error: Get \"https://k8s.gcr.io/v2/\": dial tcp: lookup k8s.gcr.io on 10.0.2.3:53: read udp 10.0.2.15:44128->10.0.2.3:53: i/o timeout"
Sep 28 20:08:40 minikube dockerd[860]: time="2022-09-28T20:08:40.349768811Z" level=error msg="Handler for POST /v1.40/images/create returned error: Get \"https://k8s.gcr.io/v2/\": dial tcp: lookup k8s.gcr.io on 10.0.2.3:53: read udp 10.0.2.15:44128->10.0.2.3:53: i/o timeout"
Sep 28 20:09:10 minikube dockerd[860]: time="2022-09-28T20:09:10.349929159Z" level=warning msg="Error getting v2 registry: Get \"https://k8s.gcr.io/v2/\": dial tcp: lookup k8s.gcr.io on 10.0.2.3:53: read udp 10.0.2.15:36394->10.0.2.3:53: i/o timeout"
Sep 28 20:09:10 minikube dockerd[860]: time="2022-09-28T20:09:10.350541201Z" level=error msg="Not continuing with pull after error: Get \"https://k8s.gcr.io/v2/\": dial tcp: lookup k8s.gcr.io on 10.0.2.3:53: read udp 10.0.2.15:36394->10.0.2.3:53: i/o timeout"
Sep 28 20:09:10 minikube dockerd[860]: time="2022-09-28T20:09:10.350810076Z" level=error msg="Handler for POST /v1.40/images/create returned error: Get \"https://k8s.gcr.io/v2/\": dial tcp: lookup k8s.gcr.io on 10.0.2.3:53: read udp 10.0.2.15:36394->10.0.2.3:53: i/o timeout"
Sep 28 20:09:40 minikube dockerd[860]: time="2022-09-28T20:09:40.357594007Z" level=warning msg="Error getting v2 registry: Get \"https://k8s.gcr.io/v2/\": dial tcp: lookup k8s.gcr.io on 10.0.2.3:53: read udp 10.0.2.15:50671->10.0.2.3:53: i/o timeout"
Sep 28 20:09:40 minikube dockerd[860]: time="2022-09-28T20:09:40.357712840Z" level=info msg="Attempting next endpoint for pull after error: Get \"https://k8s.gcr.io/v2/\": dial tcp: lookup k8s.gcr.io on 10.0.2.3:53: read udp 10.0.2.15:50671->10.0.2.3:53: i/o timeout"
Sep 28 20:09:40 minikube dockerd[860]: time="2022-09-28T20:09:40.364860423Z" level=error msg="Handler for POST /v1.40/images/create returned error: Get \"https://k8s.gcr.io/v2/\": dial tcp: lookup k8s.gcr.io on 10.0.2.3:53: read udp 10.0.2.15:50671->10.0.2.3:53: i/o timeout"

It then just hangs until it eventually fails.
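
The failing pull can also be reproduced by hand from inside the guest; the image name below is only illustrative, since the DNS lookup fails before any image data is fetched:

# any pull from a remote registry hits the same DNS timeout at 10.0.2.3
$ minikube ssh -- docker pull k8s.gcr.io/pause:3.6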

However, if I start minikube using --network=socket_vmnet (using #14989), it starts fine and I'm able to curl without issues. And if I start minikube using --network=user but with --container-runtime=containerd, it also starts successfully, as the Docker step is avoided, but I'm still unable to curl.
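
For reference, the two workarounds look roughly like this (a sketch; flag spellings as in the minikube CLI):

# workaround 1: switch from the default "user" network to socket_vmnet
$ minikube start --driver qemu --network socket_vmnet
# workaround 2: keep the user network but avoid the Docker pull step that hangs
# (DNS is still broken, so curl inside the guest still fails)
$ minikube start --driver qemu --network user --container-runtime containerd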

Thanks for pointing me in the right direction @afbjorklund

@medyagh
Member

medyagh commented Sep 29, 2022

Good job getting to the bottom of this, @spowelljr. We should add this to our documentation as a known issue.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Dec 28, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jan 27, 2023