
KVM platform doesn't seem to work #11

Closed
evanphx opened this issue May 2, 2018 · 22 comments
Labels: auto-closed · platform: kvm (Issue related to the kvm platform) · stale-issue (This issue has not been updated in 120 days) · type: bug (Something isn't working)

Comments

evanphx commented May 2, 2018

First of all, what a cool project! I'm trying to use the kvm platform backend and am running into an issue. I turned logging on and got the following:

I0502 09:38:41.822182    2663 x:0] ***************************
I0502 09:38:41.822305    2663 x:0] Args: [/usr/local/bin/runsc --network=host --debug-log-dir=/tmp/runsc --debug --strace --platform=kvm --root /var/run/docker/runtime-runsc/moby --log /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/1570d7168cf1a1ef25052cf0433701779f9d7fa11a98c241f88b25bf1c5c8b05/log.json --log-format json start 1570d7168cf1a1ef25052cf0433701779f9d7fa11a98c241f88b25bf1c5c8b05]
I0502 09:38:41.822353    2663 x:0] PID: 2663
I0502 09:38:41.822379    2663 x:0] UID: 0, GID: 0
I0502 09:38:41.822402    2663 x:0] Configuration:
I0502 09:38:41.822432    2663 x:0]              RootDir: /var/run/docker/runtime-runsc/moby
I0502 09:38:41.822456    2663 x:0]              Platform: kvm
I0502 09:38:41.822483    2663 x:0]              FileAccess: proxy, overlay: false
I0502 09:38:41.822518    2663 x:0]              Network: host, logging: false
I0502 09:38:41.822546    2663 x:0]              Strace: true, max size: 1024, syscalls: []
I0502 09:38:41.822571    2663 x:0] ***************************
D0502 09:38:41.822599    2663 x:0] Load sandbox "/var/run/docker/runtime-runsc/moby" "1570d7168cf1a1ef25052cf0433701779f9d7fa11a98c241f88b25bf1c5c8b05"
D0502 09:38:41.824406    2663 x:0] Signal sandbox "1570d7168cf1a1ef25052cf0433701779f9d7fa11a98c241f88b25bf1c5c8b05"
D0502 09:38:41.824445    2663 x:0] Start sandbox "1570d7168cf1a1ef25052cf0433701779f9d7fa11a98c241f88b25bf1c5c8b05", pid: 2639
D0502 09:38:41.824476    2663 x:0] Executing hook {Path:/usr/bin/dockerd Args:[libnetwork-setkey 1570d7168cf1a1ef25052cf0433701779f9d7fa11a98c241f88b25bf1c5c8b05 39ab48b69d8788a6b7c56f380259da7713ca1247b463b0f7317b03767a59c2bc] Env:[] Timeout:<nil>}, state: {Version:1.0.1-dev ID:1570d7168cf1a1ef25052cf0433701779f9d7fa11a98c241f88b25bf1c5c8b05 Status:created Pid:2639 Bundle:/var/run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/1570d7168cf1a1ef25052cf0433701779f9d7fa11a98c241f88b25bf1c5c8b05 Annotations:map[]}
D0502 09:38:41.851177    2663 x:0] Destroy sandbox "1570d7168cf1a1ef25052cf0433701779f9d7fa11a98c241f88b25bf1c5c8b05"
D0502 09:38:41.851331    2663 x:0] Killing sandbox "1570d7168cf1a1ef25052cf0433701779f9d7fa11a98c241f88b25bf1c5c8b05"
D0502 09:38:41.851369    2663 x:0] Killing gofer for sandbox "1570d7168cf1a1ef25052cf0433701779f9d7fa11a98c241f88b25bf1c5c8b05"
W0502 09:38:41.852246    2663 x:0] FATAL ERROR: error starting sandbox: failure executing hook "/usr/bin/dockerd", err: exit status 1
stdout:
stderr: time="2018-05-02T09:38:41-07:00" level=fatal msg="no such file or directory"

The command run was docker run --runtime=runsc hello-world

Docker version: 17.12.0-ce, build c97c6d6

I guess it's trying to execute the hooks but the fs namespace has already been unbound?
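
For reference, the runtime was registered with Docker via /etc/docker/daemon.json along these lines (the path and platform flag match the Args line in the log above; dockerd needs a restart after editing):

{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc",
      "runtimeArgs": [
        "--platform=kvm"
      ]
    }
  }
}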

evanphx (Author) commented May 2, 2018

If I go in and disable the code in runsc that executes the hooks, I get this error every time:

docker: Error response from daemon: OCI runtime create failed: unable to retrieve OCI runtime error (invalid character 'm' after object key:value pair): /var/lib/docker/runtimes/runsc did not terminate sucessfully: unknown.
ERRO[0011] error waiting for container: context canceled

prattmic self-assigned this May 2, 2018
prattmic (Member) commented May 2, 2018

Hi, thanks for trying out gVisor!

The KVM platform is still experimental and has some rough edges, as you've found. :)

I think there are actually two issues here:

  1. Hooks aren't working correctly. This may be occurring regardless of platform. Can you try the ptrace platform and see if the hooks work correctly there? cc @fvoznika to take a look at hooks.

  2. The sandbox is failing to initialize (or is crashing) when using KVM. That is the second error you posted (the error message needs a lot of work).

It looks like you have debug logging enabled. If you look in the log directory, there should be several files with close timestamps and suffixes like "create", "gofer", "boot", "start", etc. These go together with a single run. Could you upload a set of these logs from a run with the second error you posted?
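
To try the ptrace platform, swapping the platform flag in the daemon.json runtimeArgs should be enough; the debug flags below are the same ones already visible in the Args line of your first log:

"runtimeArgs": [
  "--platform=ptrace",
  "--debug-log-dir=/tmp/runsc",
  "--debug",
  "--strace"
]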

Thanks!

fvoznika (Member) commented May 2, 2018

Thanks for reporting it! In addition to the logs, please also post the docker command you used.

evanphx (Author) commented May 2, 2018

@fvoznika I've updated the issue to include the command, sorry about that!

evanphx (Author) commented May 2, 2018

This time I got a whole new set of errors, different from the ones I originally reported (again, I have the hook code disabled). The log.log file shows the commands I ran and the crash output.

runsc.tar.gz

prattmic (Member) commented May 2, 2018

Thanks for the logs; this looks similar to failures I've seen before. We'll look into it.

What CPU model is this running on? Could you paste one of the processor blocks from /proc/cpuinfo?

evanphx (Author) commented May 2, 2018

Sure! It's a bit of an older machine running Ubuntu 16.04:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Duo CPU     P8600  @ 2.40GHz
stepping        : 10
microcode       : 0xa07
cpu MHz         : 1592.731
cache size      : 3072 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm pti retpoline tpr_shadow vnmi flexpriority dtherm
bugs            : cpu_meltdown spectre_v1 spectre_v2
bogomips        : 4778.19
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Duo CPU     P8600  @ 2.40GHz
stepping        : 10
microcode       : 0xa07
cpu MHz         : 1618.093
cache size      : 3072 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm pti retpoline tpr_shadow vnmi flexpriority dtherm
bugs            : cpu_meltdown spectre_v1 spectre_v2
bogomips        : 4778.19
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

prattmic (Member) commented May 4, 2018

iwankgb has provided more logs in #25.

iwankgb commented May 4, 2018

I gave it another try on another device:

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 77
model name	: Intel(R) Atom(TM) CPU  C2550  @ 2.40GHz
stepping	: 8
microcode	: 0x127
cpu MHz		: 2393.999
cache size	: 1024 KB
physical id	: 0
siblings	: 4
core id		: 3
cpu cores	: 4
apicid		: 6
initial apicid	: 6
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes rdrand lahf_lm 3dnowprefetch ida arat epb dtherm retpoline kaiser tpr_shadow vnmi flexpriority ept vpid tsc_adjust smep erms
bugs		: cpu_meltdown spectre_v1 spectre_v2
bogomips	: 4787.99
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

I'm now getting the following error in the majority (4 out of 5) of cases (detailed logs available at: https://critical.today/files/create_failed_atom.tar.gz):

docker: Error response from daemon: OCI runtime create failed: unable to retrieve OCI runtime error (invalid character 'm' after object key): /var/lib/docker/runtimes/runsc did not terminate sucessfully: error creating sandbox: unexpected error waiting for sandbox "2307b9c9a43c3962446cbdc8d1fadfa83b5a3850bd4ee7c0012edc2467b6c60a", err: process (18440) is not running, err: <nil>
: unknown.

In 1 out of 5 cases I got an address space conflict error (detailed logs available at: https://critical.today/files/address_space_atom.tar.gz).

amscanne (Contributor) commented May 7, 2018

I think the common thread here is that the physical address size on these CPUs is only 36 bits. (The Core is pretty old, and it seems newer Atoms support VT-x but still have a small physical address size?) We'll have to constrain the virtual hole punching or at least provide a better error here.
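
For anyone who wants to check whether their machine is in this bucket, the physical width is reported in /proc/cpuinfo. A minimal standalone Go sketch (not part of runsc; the 36-bit cutoff is just the value observed in this thread, not a documented limit):

package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	f, err := os.Open("/proc/cpuinfo")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := sc.Text()
		// Lines look like: "address sizes   : 36 bits physical, 48 bits virtual"
		if !strings.HasPrefix(line, "address sizes") {
			continue
		}
		parts := strings.SplitN(line, ":", 2)
		if len(parts) < 2 {
			continue
		}
		fields := strings.Fields(parts[1]) // e.g. ["36", "bits", "physical,", "48", "bits", "virtual"]
		if len(fields) == 0 {
			continue
		}
		phys, err := strconv.Atoi(fields[0])
		if err != nil {
			continue
		}
		fmt.Printf("physical address bits: %d\n", phys)
		if phys <= 36 {
			fmt.Println("small physical address space; likely affected by this issue")
		}
		return
	}
	fmt.Println("no 'address sizes' line found")
}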

ultimoguerrero commented
I'm having issues related to KVM as well.

[root@istanbul ~]# docker --version
Docker version 18.03.1-ce, build 9ee9f40
[root@istanbul ~]# docker run --runtime=runsc hello-world
docker: Error response from daemon: OCI runtime create failed: /var/lib/docker/runtimes/runsc did not terminate sucessfully: unknown. 
ERRO[0000] error waiting for container: context canceled

VM info:

[root@istanbul ~]# cat /etc/redhat-release 
CentOS Linux release 7.4.1708 (Core)

Host system is Fedora Core 28 running Virt-Manager 1.5.1

CPU Info

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 71
model name	: Intel(R) Core(TM) i7-5700HQ CPU @ 2.70GHz
stepping	: 1
microcode	: 0x1d
cpu MHz		: 822.526
cache size	: 6144 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 20
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap intel_pt xsaveopt ibpb ibrs stibp dtherm ida arat pln pts
bugs		: cpu_meltdown spectre_v1 spectre_v2
bogomips	: 5387.71
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

Any suggestions?

amscanne (Contributor) commented
After chasing down the heap reservation semantics, it looks like they've changed recently:
golang/go@51ae88e#diff-10660d1f0eb047497573dadfb42bd1ec

This should either fix the issue, or it may fail at start-up with the message from here:
https://github.com/google/gvisor/blob/master/pkg/sentry/platform/kvm/physical_map.go#L127

Is anyone able to try with a Go runtime that includes that commit?
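
For context, the interaction here is with the large PROT_NONE reservations the KVM platform makes in the virtual address space. A rough, hypothetical sketch of that style of reservation (not the actual physical_map.go code):

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	const size = 1 << 30 // try to reserve 1 GiB of virtual address space

	// PROT_NONE plus MAP_NORESERVE asks the kernel for an address range
	// without committing any memory to it; the region only claims a slot
	// in the virtual address space. Reservations of this style can collide
	// with the Go runtime's own heap reservations, which is consistent
	// with the address space conflict errors reported above.
	b, err := unix.Mmap(-1, 0, size, unix.PROT_NONE,
		unix.MAP_PRIVATE|unix.MAP_ANONYMOUS|unix.MAP_NORESERVE)
	if err != nil {
		fmt.Println("reservation failed:", err)
		return
	}
	defer unix.Munmap(b)
	fmt.Printf("reserved %d bytes at %p\n", len(b), &b[0])
}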

iwankgb commented May 12, 2018 via email

danrabinowitz commented
@iwankgb Any news?

prattmic assigned amscanne and unassigned prattmic Jun 13, 2018
plehdeen commented
Thanks to @iwankgb, I tested KVM on another CPU, and it just works. It seems that gVisor just doesn't work on my old machine with a 36-bit physical address space.

zhang2639 commented Jul 9, 2018
I'm seeing some similar problems; see the bottom of #84.

amscanne (Contributor) commented
When the noted Go commit is in a runtime release (Go 1.11?), this issue should be resolved for smaller physical address spaces.

ianlewis (Contributor) commented Oct 9, 2018

@evanphx @iwankgb @ultimoguerrero @zhang2639 Since Go 1.11 is out can one of you verify that it's fixed for you?

jshachm commented Dec 29, 2018

@amscanne @ultimoguerrero
I found that your CPU doesn't have the xsave feature, so executing xsetbv will raise SIGILL.

^ ^
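
A quick way to check for the flag from userspace (a standalone sketch that only inspects /proc/cpuinfo; it does not execute xsetbv):

package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	data, err := os.ReadFile("/proc/cpuinfo")
	if err != nil {
		panic(err)
	}
	for _, line := range strings.Split(string(data), "\n") {
		if !strings.HasPrefix(line, "flags") {
			continue
		}
		// Pad with spaces so "xsave" doesn't match "xsaveopt" etc.
		if strings.Contains(" "+line+" ", " xsave ") {
			fmt.Println("xsave is supported")
		} else {
			fmt.Println("no xsave flag: executing xsetbv here would raise SIGILL")
		}
		return
	}
}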

ianlewis (Contributor) commented
@jshachm Can you file a separate issue?

fvoznika added the type: bug label Jan 11, 2019
ianlewis added the platform: kvm label Jan 17, 2019
amscanne pushed a commit to amscanne/gvisor that referenced this issue May 6, 2020
Use cni v0.7.0 in the integration test.
Signed-off-by: Lantao Liu <lantaol@google.com>
github-actions (bot) commented
A friendly reminder that this issue had no activity for 120 days.

github-actions bot added the stale-issue label Sep 15, 2023
github-actions (bot) commented
This issue has been closed due to lack of activity.
