Get rr working with valgrind (or vice versa), as far as possible #16
Comments
As of valgrind 3.9.0, the valgrind CPU still reports itself as merom. The release notes claim to support AVX2 now, which is way up in haswell land. But crucially, AVX2 is only supported for x86-64. Sigh. The other sandy bridge \ merom features may be gated on x86-64 too. But we have another option: merom has a deterministic store-insn counter according to this paper. It would be pretty straightforward to add that support, but not sure it's worth it yet. |
Valgrind reports itself as sandy bridge on x86-64 machines that have AVX, but not on x86. Looks like x86 support is falling behind. |
#606 is a way to get valgrind working. |
On second thought, rr's "syscall injection" is technically self-modifying code, which voids valgrind's warranty. But I somewhat suspect (hope) that things will just work, and indeed that'll be a good test of whether rr is cleaning up after itself properly in tracee tasks. |
Yet another problem is that valgrind-instrumented code longjmp's out of some signal handlers, which rr doesn't know how to deal with. |
#1262 is YET ANOTHER annoying bug caused by uninitialized C++ object fields. We deal with enough really nasty bugs in rr that this class is just insulting! Will probably take a bit of time soon to retry valgrind'ing rr. CC @rocallahan |
This bug is about running valgrind tools on tracee code, right? Whereas for uninitialized fields in rr itself, you'd just need to run valgrind on rr itself, which should just work, I'd have thought... |
I'd like to have both. I agree it's worth treating those problems separately. valgrind doesn't work on even just rr out of the box, which is what I care about more, because per discussion above valgrind's emulated CPU identifies itself as Merom and rr barfs. ISTR force-overriding the detected microarch, but something broke. I'd like to try again. Now that I think about it, for the rr-only case, we can have it launch a subprocess to make a non-emulated cpuid call, and then users wouldn't even have to force their microarch. (Sorry Julian!) |
The reason is that libpfm does its own CPUIDs to decide on the encoding of the raw microarch-specific events that rr chooses based on its own CPUID. This isn't how abstractions are supposed to work. So we'll need #974 to get libpfm out of the way before we can work around the CPUID emulation. An interesting question is what valgrind does with perf_event_open attributes. If the syscall is passed through, then clients can observe CPU emulation by passing raw events that don't match the emulated microarch. But I doubt anyone cares. |
After #1274, here's the status
So with valgrind patched to support kcmp, I think we're good to go on the tracer-side checking, which is the more useful of the two. |
Filed a valgrind bug for kcmp support. |
The valgrind patch landed. The last to-do item is exec'ing tracees more quickly, in order to "detach" valgrind. Currently in replay, valgrind observes tid values change when the rr fork child switches from "real" execution mode to using emulated data from the trace. I think the easiest thing to do might be to add a "hidden" rr command, |
The problem referenced above may still exist, but we have another problem to deal with: the image after valgrind starts running rr doesn't have a vdso image mapped. I assume this is because valgrind's dynamic linker resolves references to vdso symbols to valgrind helpers. To work around this, we could include code to make traced syscalls in the rr image itself, at a known offset in the binary. Then, if we don't find the vdso, we can use the rr code. But that's pretty tricky and a bit delicate, so I don't think it's worth it at this point. |
Valgrind unmaps this for its own nefarious purposes (rr-debugger#16). However, it is simple enough to provide our own syscall instruction to use instead.
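As a rough illustration of what "our own syscall instruction" could look like, here is a minimal sketch that issues a zero-argument syscall from an instruction located in the program's own image rather than in the vdso. The helper name `raw_syscall0` is hypothetical, the inline assembly path assumes x86-64 Linux, and other architectures fall back to libc's `syscall()`. This is not the code from the referenced commit.

```cpp
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical helper: make a zero-argument syscall using an
// instruction that lives in our own binary, not in the vdso.
static long raw_syscall0(long nr) {
#if defined(__x86_64__)
  long ret;
  // The x86-64 `syscall` instruction takes the number in rax,
  // returns the result in rax, and clobbers rcx and r11.
  __asm__ volatile("syscall"
                   : "=a"(ret)
                   : "a"(nr)
                   : "rcx", "r11", "memory");
  return ret;
#else
  // Assumption: on other architectures, defer to libc.
  return syscall(nr);
#endif
}
```

For example, `raw_syscall0(SYS_getpid)` should return the same value as `getpid()`.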
Status update after #1796:
|
Do I understand correctly that running Basically, this use case is a massive speed boost for valgrind? |
No. If you do that, Valgrind will not check the application. It is possible to implement binary instrumentation of the replay, but the approach has to be a bit different. We've actually implemented this in a closed-source branch. And yes, we could use this to implement a memcheck-during-replay tool. Unfortunately we have to hold onto this code until we figure out a business model to make this work sustainable. |
I'm very interested in being able to apply some type of binary instrumentation during the replay. Any plans on open-sourcing the branch with that, or anyone else interested in looking into reimplementing something like this with me? |
For what it is worth, I believe that the combination of ASAN and MSAN is equivalent or superior to Valgrind’s default MemorySanitizer does involve some additional effort, because all linked code (except |
Yeah rr + the various sanitizers is pretty powerful, and I don't think there's a ton of need to ever deal with valgrind proper. Time to let this issue die. |
It appears that valgrind identifies itself as Merom, which apparently isn't supported yet. I'm testing with distro-supplied valgrind 3.7.0, so things may be different in later valgrind. If not, then we can either add Merom support to rr, or add support for a newer architecture to valgrind.
The question of how well rr+valgrind could work is also interesting. Needs some thought. But even basic checking allowed me to diagnose #15.