mcount: Allow full-dynamic tracing to instrument unsupported functions on x86_64 (w/ capstone) #870
…s on x86_64 (w/ capstone)

Some functions can't be instrumented by full-dynamic tracing, for safety reasons. A function that jumps back to its own prologue after that prologue has been modified may trigger undefined behavior. To be able to instrument these functions, one can embed a trapping instruction (for example "int3") at the head of every instruction that has been moved. This way, if a thread branches to the function prologue, it will step on the trapping instruction and the kernel will call a handler that redirects the thread to the original instruction. Afterwards, the thread jumps back to the function and resumes its execution normally.

Instrumenting unsupported functions is similar to regular full-dynamic tracing, except for the following differences:

1. Create a constraint based on the position of the "int3" in the offset of the call instruction.
2. Store the instructions located at the prologue of the function, if patchable.
3. Find and allocate a free address that respects the constraint created previously.
4. Store the position of the "int3" and the address of the original instruction.
5. Instrument the prologue of the function with a relative-address call instruction that calls the trampoline.

The execution flow is similar to full-dynamic tracing, except when a thread branches to a prologue and steps on an "int3":

1. The trap handler is called.
2. The address from which the trap was raised is computed.
3. If the trap was raised from an address related to our tracepoint, the thread is redirected to the original instruction. Otherwise, the original handler (set by the user) is called and the execution is resumed.

Signed-off-by: Anas Balboul <anasbalbo@gmail.com>
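The three handler steps above can be sketched as a plain lookup. This is a minimal illustration, not the code from this patch: `struct trap_entry`, `trap_address` and `resolve_trap` are hypothetical names, and the table layout is an assumption. The only hardware fact relied on is that, on x86_64, the saved instruction pointer after an "int3" points one byte past the trapping byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one entry per "int3" embedded in a patched prologue. */
struct trap_entry {
    uintptr_t int3_addr;  /* address of the embedded int3 byte */
    uintptr_t orig_insn;  /* address of the saved copy of the original instruction */
};

/* Step 2: int3 is one byte long and the saved RIP points just past it. */
uintptr_t trap_address(uintptr_t saved_rip)
{
    return saved_rip - 1;
}

/*
 * Step 3: return the address the thread should resume at, or 0 when the
 * trap does not belong to any tracepoint. In the 0 case the caller is
 * expected to chain to the handler the user had installed.
 */
uintptr_t resolve_trap(const struct trap_entry *table, size_t n,
                       uintptr_t saved_rip)
{
    uintptr_t addr = trap_address(saved_rip);

    for (size_t i = 0; i < n; i++) {
        if (table[i].int3_addr == addr)
            return table[i].orig_insn;
    }
    return 0;
}
```

In a real SIGTRAP handler the `saved_rip` value would come from the signal context, and a non-zero result would be written back as the new instruction pointer before returning from the handler.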
Full dynamic tracing is only enabled when libcapstone is available. No need to do the test when the lib is missing. Signed-off-by: Anas Balboul <anasbalbo@gmail.com>
The added function branches twice to its prologue. It can be used to test dynamic_full tracing of unsupported functions. Signed-off-by: Anas Balboul <anasbalbo@gmail.com>
Hello @honggyukim. Thank you for your reply. I understand that it may take some time.
@AnsBal an interesting approach. So IIUC it changes offset of the call instruction to have |
Hi @namhyung, thank you for the reply!
Yes, that's right.
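One way to read steps 1 and 3 of the commit message is sketched below. It assumes the prologue is overwritten by a 5-byte `call rel32` (opcode `0xE8` followed by a little-endian 32-bit displacement), and that every moved instruction that started at byte offset 1 to 4 of the patched area must now find an "int3" (`0xCC`) there. That turns some displacement bytes into fixed values, which in turn restricts where the trampoline may live. The names `rel32_constraint`, `make_constraint` and `satisfies` are hypothetical and the patch itself may encode the constraint differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Bits of the rel32 displacement that are forced to a fixed value. */
struct rel32_constraint {
    uint32_t mask;   /* 0xFF for every displacement byte that must be int3 */
    uint32_t value;  /* 0xCC in those byte positions */
};

/*
 * insn_offsets: start offsets (within the patched bytes) of the original
 * instructions that were moved out. Offset 0 is the call opcode itself,
 * offsets 1..4 fall inside the displacement.
 */
struct rel32_constraint make_constraint(const unsigned *insn_offsets, size_t n)
{
    struct rel32_constraint c = { 0, 0 };

    for (size_t i = 0; i < n; i++) {
        unsigned off = insn_offsets[i];

        if (off >= 1 && off <= 4) {
            unsigned shift = 8 * (off - 1);  /* little-endian byte position */
            c.mask  |= (uint32_t)0xFF << shift;
            c.value |= (uint32_t)0xCC << shift;
        }
    }
    return c;
}

/* Does a candidate trampoline address satisfy the constraint? */
int satisfies(struct rel32_constraint c, uint64_t call_addr, uint64_t trampoline)
{
    /* The displacement is relative to the end of the 5-byte call. */
    int64_t disp = (int64_t)(trampoline - (call_addr + 5));

    if (disp < INT32_MIN || disp > INT32_MAX)
        return 0;  /* not reachable with rel32 */
    return ((uint32_t)(int32_t)disp & c.mask) == c.value;
}
```

For a prologue made of a 1-byte instruction followed by a 3-byte instruction, the moved instructions start at offsets 1 and 4, so the displacement must look like `0xCC....CC`. Under these assumptions the search in step 3 only has to try addresses whose distance from the call matches that pattern.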
I thought about it too. But the thing with patching the call-sites is that you can't patch all of them. Some are too small (in instruction size) to be replaced by an alternative call-site that can reach the original instruction. Besides that, it is sometimes difficult to find the destination of a call-site (in the case of indirect branches). This technique covers all of these cases and, in theory, has a higher success rate. Another alternative could be using "int3" at the function entry, just like kprobes/uprobes/gdb do. But it adds the overhead of dispatching the trap handler every time the function is called.
Yeah, I agree that patching call-sites cannot catch indirect jumps. But it'd be possible to handle direct jumps only. I don't understand what you said about the size. I think we can patch the first byte to

The downside I see in your approach is that it will spread trampolines for each function based on the instruction pattern. While some of them might be shared, this will increase the number of mmaps, and it can cause real uses of mmap in the target process to fail later. That's why I tried to find a trampoline location in the same text mapping.
Yes, this is the safe and slow approach. Maybe we can use it only for indirect jump cases (assuming they are rare).
What I tried to say is that some call-sites are too small to be patched with an instruction that can reach the original one. If we assume that functions are, most of the time, small enough for the compiler or the programmer to use a short jump (or another short branch instruction), then we won't be able to patch those call-sites because of their size.
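To make the size argument concrete: on x86_64 a short `jmp rel8` is 2 bytes and reaches only -128..+127 bytes from its end, while a near `jmp rel32` needs 5 bytes. The sketch below checks whether a branch of a given size could be rewritten in place to reach a new target. The helper names are hypothetical, and only unconditional jumps are modelled.

```c
#include <stdint.h>

enum { SHORT_JMP_SIZE = 2, NEAR_JMP_SIZE = 5 };

/* Signed distance from the end of an instruction to the target. */
static int64_t displacement(uint64_t insn_addr, unsigned insn_size,
                            uint64_t target)
{
    return (int64_t)(target - (insn_addr + insn_size));
}

int fits_rel8(uint64_t insn_addr, uint64_t target)
{
    int64_t d = displacement(insn_addr, SHORT_JMP_SIZE, target);
    return d >= INT8_MIN && d <= INT8_MAX;
}

int fits_rel32(uint64_t insn_addr, uint64_t target)
{
    int64_t d = displacement(insn_addr, NEAR_JMP_SIZE, target);
    return d >= INT32_MIN && d <= INT32_MAX;
}

/*
 * Can a branch occupying site_size bytes at insn_addr be rewritten in
 * place so that it reaches new_target?
 */
int can_patch_in_place(unsigned site_size, uint64_t insn_addr,
                       uint64_t new_target)
{
    if (site_size >= NEAR_JMP_SIZE)
        return fits_rel32(insn_addr, new_target);
    if (site_size >= SHORT_JMP_SIZE)
        return fits_rel8(insn_addr, new_target);
    return 0;
}
```

A 2-byte jump back to a prologue 34 bytes earlier passes the check, but the same 2-byte site cannot be pointed at a relocated instruction that lives in a distant trampoline mapping.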
Indeed, it's consuming the mapping address space of the target process. I'm wondering whether we are consuming too much, since we are using it only for the small subset of functions that we failed to patch. Edit: In the worst case we need one page for each function. 42 * 4096 = 172 KB for uftrace and 77 KB for uftrace. Knowing that the size of the user-space virtual memory is 128 TB, the worst case is not that bad.
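The worst-case figure above is simple arithmetic and can be reproduced as follows. This is only a sketch: the 4096-byte page size is taken from the comment, and `worst_case_bytes` is a hypothetical helper, not part of uftrace.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, as used in the estimate above. */
#define PAGE_SIZE_BYTES 4096u

/* Worst case: every function that needs this technique gets its own page. */
uint64_t worst_case_bytes(uint64_t functions)
{
    return functions * PAGE_SIZE_BYTES;
}
```

With 42 functions this gives 172032 bytes, which matches the roughly 172 KB quoted above.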
It could be used for the indirect jump cases as well as for the cases where we can't patch a call-site, because the call-site patching optimization is still worth trying first.
Oh, I thought adding
Thanks for the numbers. It's good to see that there aren't many. Did you try to patch all functions in the libraries as well? In general, we cannot predict how many there will be for each binary or after possible compiler changes. Also there's a limit on the number of mappings ( |
It may be an option, but I think that an 'int3' in the call-site may significantly degrade performance if it's a loop instead of a single branch to the prologue.
Only functions in the static object have been patched. How can I patch dynamic objects with dynamic tracing?
Oh, the default limit is too small. It's true that patching the entry with an 'int3' directly doesn't need to map a memory area for each patched function, but it still degrades performance. I think of it as a time-space trade-off.
I don't follow. I think the effect is the same since your approach also needs to hit
Are you talking about the DSOs? The commit 1147486 added library support. As it uses prefix matching, you may add |
Some Drawbacks and limitations: