Assertion `t->regs().syscall_result_signed() == -syscall_state.expect_errno' failed to hold. #1577

Closed
ghost opened this Issue Nov 4, 2015 · 15 comments


ghost commented Nov 4, 2015

Seeing this assertion when attempting to debug VMware Workstation:

ALSA lib conf.c:3782:(snd_config_update_r) cannot access file /usr/share/alsa/alsa.conf
[FATAL /home/roc/rr/rr/src/record_syscall.cc:3001:rec_process_syscall_arch() errno: 0 'Success'](task 28326 (rec:28326) at time 136608)
-> Assertion `t->regs().syscall_result_signed() == -syscall_state.expect_errno' failed to hold. Expected EINVAL for 'ioctl' but got result 0; Unknown ioctl(0x81785501): type:0x55 nr:0x1 dir:0x2 size:376 addr:0x7fffdfe99ac0

The Googles have told me this is the SNDRV_CTL_IOCTL_CARD_INFO ioctl from ALSA.
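
The fields printed in the assertion message (type, nr, dir, size) follow from the way Linux packs an ioctl request number. Below is a minimal sketch of that decoding, assuming the generic layout used on x86 (8 bits of nr, 8 bits of type, 14 bits of size, 2 bits of direction). The `IoctlFields` struct and `decode_ioctl` function are illustrative names, not part of rr or ALSA.

```cpp
#include <cstdint>

// Field layout of a Linux ioctl request number on x86 (asm-generic/ioctl.h):
// bits 0-7 nr, bits 8-15 type, bits 16-29 size, bits 30-31 direction.
// Some other architectures use a different split; this sketch assumes x86.
struct IoctlFields {
  uint32_t nr;
  uint32_t type;
  uint32_t size;
  uint32_t dir;  // 0 = none, 1 = write, 2 = read, 3 = read and write
};

// Illustrative helper (hypothetical name): split a request into its fields.
constexpr IoctlFields decode_ioctl(uint32_t request) {
  return IoctlFields{
      request & 0xffu,            // nr
      (request >> 8) & 0xffu,     // type
      (request >> 16) & 0x3fffu,  // size in bytes
      (request >> 30) & 0x3u,     // direction
  };
}

// The request from the assertion message: ioctl(0x81785501).
constexpr IoctlFields kFromLog = decode_ioctl(0x81785501u);
static_assert(kFromLog.type == 0x55, "type 0x55 ('U')");
static_assert(kFromLog.nr == 0x1, "nr 0x1");
static_assert(kFromLog.dir == 0x2, "dir 0x2 (read: kernel writes to user memory)");
static_assert(kFromLog.size == 376, "size 376 bytes");
```

The decoded values match the ones in the log: a read-direction ioctl of type 0x55 whose argument is a 376-byte structure at the reported address.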

Member

rocallahan commented Nov 4, 2015

Ugh.

Feel like figuring out the ALSA ioctls and adding them to prepare_ioctl in record_syscall.cc?
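
As a rough illustration of what teaching a recorder about an ioctl can involve, the sketch below keeps an allow-list of known requests and reports how many bytes the kernel may write through the argument pointer. This is a hypothetical helper, not rr's actual `prepare_ioctl`. It assumes the x86 request layout and that the ioctl writes only the fixed-size structure encoded in the request number, with no embedded pointers. The request value comes from the assertion message in this issue; the names `kSndrvCtlIoctlCardInfo` and `recorded_output_bytes` are made up for the example.

```cpp
#include <cstddef>
#include <cstdint>

// Direction bit meaning "kernel writes to user memory" (_IOC_READ on x86).
constexpr uint32_t kIocRead = 2;

// Request number taken from the assertion message in this issue, which the
// reporter identified as SNDRV_CTL_IOCTL_CARD_INFO.
constexpr uint32_t kSndrvCtlIoctlCardInfo = 0x81785501u;

// Hypothetical helper, NOT rr's real code. Returns true and sets *out_bytes
// when the request is on the allow-list; returns false and leaves *out_bytes
// untouched for unknown requests.
bool recorded_output_bytes(uint32_t request, std::size_t* out_bytes) {
  const uint32_t dir = (request >> 30) & 0x3u;
  const std::size_t size = (request >> 16) & 0x3fffu;
  switch (request) {
    case kSndrvCtlIoctlCardInfo:
      // Assumption: the kernel fills exactly `size` bytes of the argument.
      *out_bytes = (dir & kIocRead) ? size : 0;
      return true;
    default:
      // Unknown ioctl: the caller must decide how to handle it.
      return false;
  }
}
```

For the request in this issue, the helper reports 376 bytes to capture. Any request not on the list is reported as unknown instead of guessed at.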

Member

rocallahan commented Nov 4, 2015

Actually, finding any documentation at all for these ioctls would be nice...

ghost commented Nov 4, 2015

I took a stab at this here: https://github.com/awalton/rr/commit/559804fe324d78183e193f6503eb9082050cea4e

I admit I have no idea if it's right, but it runs Workstation!

Member

rocallahan commented Nov 4, 2015

> I admit I have no idea if it's right, but it runs Workstation!

Woah really? Cool! ... Hey, any chance you could help with http://robert.ocallahan.org/2014/09/vmware-cpuid-conditional-branch.html ? :-)

I left a couple of minor comments in your commit. The only other thing this needs before landing is a test.

Thanks!!!

ghost commented Nov 4, 2015

No problem. I filed an internal bug on VMware's bugzilla and forwarded the post on to the monitor team so hopefully either I or someone from that team will have something to tell you at some point.

Until then, I hit another issue around the iopl syscall being unimplemented and I'm poking around to see what it will take to get around that. I'll respin and post an updated patch later today.

Member

rocallahan commented Nov 4, 2015

FWIW if you're calling iopl to enable use of the x86 `in` instruction, rr is not going to work because we have no mechanism to record and replay the results of those instructions. (That could probably be fixed by making rr drop the iopl and then trap, emulate and record every `in` instruction, with significant overhead of course.)

Member

rocallahan commented Nov 4, 2015

There may be other exotic things that VMware does that rr can't handle, e.g. ioctls that you use to communicate with your host kernel drivers. If you share memory between user space and the kernel drivers, that could create additional issues. I'm happy to help you work through them, just setting expectations :-).

ghost commented Nov 4, 2015

Sure, and I expect there are probably a bunch of gotchas. Right now I'm trying to hunt down a bug in the UI though, so a lot of those are likely avoidable. (In fact, I may not even need to implement iopl at all - I think I may be able to get around this for the time being.)

One of our internal team members has already commented on the monitor bug:

"Could you ask him to try this in the VM's .vmx file:

monitor_control.disable_hvsim_clusters = true
"

Member

rocallahan commented Nov 4, 2015

That fixes it! Thanks, at very least we now have a workaround!

Member

rocallahan commented Nov 5, 2015

@awalton is it OK for us to include that advice in rr's message when it detects the VMware bug? Just want to make sure adding that setting won't cause any harm.

ghost commented Nov 5, 2015

I round-tripped it through our monitor team again, and they said it's okay to use that setting and that it's likely the best workaround you'll get for the time being. The reason, as you correctly deduced, is to reduce hardware virtualization exits in order to improve performance (quite significantly). So the worst this setting does for you is slow things down a bit, which might make certain kinds of bugs harder to debug, but otherwise the impact should be pretty low. I think.

The advice they gave me to give to you is to make sure to read and cite their paper on this specific topic (http://dl.acm.org/citation.cfm?id=2342856) and to use the given workaround above. It was certainly educational for me - I work with VMs every day and even I didn't know some of the stuff they're doing!

I'll look into bumping the patch later this week - I got caught up actually debugging my problem in Workstation and forgot to rev the patch today.

Member

rocallahan commented Nov 5, 2015

Thanks a ton!

ghost commented Nov 7, 2015

I updated my tree here:
https://github.com/awalton/rr/tree/alsa-fixes

However, when running the tests I ran into a whole slew of failed tests, including the one I just added, so it probably still needs more review:

10 - alsa_ioctl-no-syscallbuf (Failed)
418 - blocked_bad_ip-no-syscallbuf (Failed)
....(skip a few hundred lines)
1360 - when-32-no-syscallbuf (Failed)

Probably the case of me doing something wrong, but I don't quite know the tool well enough to know what I've done...

Member

rocallahan commented Nov 7, 2015

Interesting. I assume those failures occur without your patch? Would be worth investigating if you're interested.

I squashed your patches, reworked the test somewhat and merged: 89374ab

@rocallahan rocallahan closed this Nov 7, 2015

Member

rocallahan commented Nov 7, 2015

And thanks!
