System freezes after loading kvm module #1

Closed
jasonbking opened this Issue Aug 27, 2011 · 6 comments

@jasonbking

CPU is a Core i5-2400 (Sandy Bridge).

One time prior to a freeze, I did see 'kvm: NOTICE: unhanded wrmsr: 0x0 data 3000000018' on the console, but I have not seen it since. I set a breakpoint in kvm_set_msr_common, and it does not appear to be reached in subsequent lockups.

Disabling kvm leaves the system stable; running 'rem_drv kvm; add_drv kvm' causes it to lock up shortly thereafter.

This is on a stock illumos debug build (source as of 8/26).

I also experienced similar issues with SmartOS live (though I was never able to narrow them down).

@bcantrill
Member

Interesting. What guest? (Or does it hang without any guest at all?) Do you have a dump? And can you do this on the running system:

echo "vmcs_config::print" | mdb -k

@jasonbking

No guests running -- just a regular boot. It doesn't generate a dump, I cannot drop to kmdb, and I tried "dtrace -wn 'tick-1m { panic(); }'".

If I boot with -B disable-kvm=true, things are stable. However, when I run 'rem_drv kvm; add_drv kvm', it freezes shortly thereafter (just as when I boot the BE normally), and I cannot drop to kmdb (this is also a DEBUG kernel).

Because of all that, I set a breakpoint in setup_vmcs_config; the output below was captured immediately before it returns (hopefully this is sufficient; if not, let me know another point where it would be more useful to print the value):

{
size = 0x400
order = 0
revision_id = 0x10
pin_based_exec_ctrl = 0x3f
cpu_based_exec_ctrl = 0xb6a065fa
cpu_based_2nd_exec_ctrl = 0xeb
vmexit_ctrl = 0xf6fff
vmentry_ctrl = 0x51ff
}

@jasonbking

Additional data points: I set breakpoints on kvm`kvm_{open,close,ioctl,devmap,segmap}. None are hit before the system locks up. I also set a bp on kvm`kvm_attach; that succeeds without any issue.

@jasonbking

...and during boot it appears to be trying to unload the kvm module: a bp set on kvm_detach gets triggered.

I stepped over each instruction, and after kvm_arch_hardware_unsetup is called (or perhaps during), kmdb reports 'single-step stop on miscellaneous trap' and pc is within xc_serv. ::stack shows it was called as xc_serv(0, 0). Doing :c drops it back into xc_serv with the same message; after doing this several times, it drops back into the OS.

At this point, the system no longer locks up. Uneducated guess: is the lockup perhaps a nasty interrupt deadlock triggered by kvm_arch_hardware_unsetup?

@rmustacc
Member
rmustacc commented Nov 3, 2011

We finally have a box on hand to test this against. Our investigation shows that while the kvm driver is inducing it, there is a problem much deeper in the system. Basically, the act of taking a spin lock in cross-call context can lead to the behavior you're seeing. As a workaround on a Sandy Bridge system, consider setting apix_enable=0 in /etc/system or via mdb -kd. The issue is likely in the apix module, which was taken in a not-quite-refined state when the source closed. We're going to be doing further work to determine what's going on there, but it'll be some time before we get there.
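
For reference, a minimal sketch of what the /etc/system form of this workaround might look like. It assumes the usual /etc/system 'set variable=value' syntax and that apix_enable is a plain kernel global needing no module prefix; neither detail is confirmed in this thread. A reboot would be needed for the setting to take effect.

```
* /etc/system fragment (assumed syntax): disable the apix module as a workaround
set apix_enable=0
```

Lines beginning with '*' are comments in /etc/system, so the note can stay in the file as a reminder to remove the setting once a fixed kernel is installed.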

@rmustacc
Member

This has been resolved in illumos-joyent. See joyent/illumos-joyent@4d86fb7 for the fix.

@rmustacc rmustacc closed this Nov 14, 2012