-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Intermittent crash on (un?)loading because of non-atomic ftrace #87
Comments
Hm... I understand that bug happens when LKRG was doing initialization. However, from the logs it looks completely unrelated to LKRG:
|
While the crash isn't in LKRG, maybe LKRG had done something that caused it. Also, I now think the "BUG: kernel NULL pointer dereference, address: 0000000000000000" diagnostic is wrong. We could want to see why the kernel would think so given these circumstances - if it's in fact wrong, this could be something to fix in the kernel. Further two lines are more likely correct:
The failing instruction, if the |
LKRG does hook |
Hm... Originally, I was thinking it might be related to the other problem (tracepoint) but I think you might be right that it is related to FTRACE. Yes, first instruction should be FTRACE dummy NOP replaced by optimized KRETPROBE hook. However, hooking should(is) atomic and kernel sends IPI to all cores to synchronize the new changes between the cores. |
I doubt it. There's this: * text_poke - Update instructions on a live kernel
* @addr: address to modify
* @opcode: source of the copy
* @len: length to copy
*
* Only atomic text poke/set should be allowed when not doing early patching.
* It means the size must be writable atomically and the address must be aligned
* in a way that permits an atomic write. It also makes sure we fit on a single
* page. Of course, a size of 5 doesn't meet "the size must be writable atomically", and in fact, if I read the code right, ftrace doesn't even appear to call the above function, instead calling: /**
* text_poke_early - Update instructions on a live kernel at boot time
* @addr: address to modify
* @opcode: source of the copy
* @len: length to copy
*
* When you use this code to patch more than one byte of an instruction
* you need to make sure that other CPUs cannot execute this code in parallel.
* Also no thread must be currently preempted in the middle of these
* instructions. And on the local CPU you need to be protected against NMI or
* MCE handlers seeing an inconsistent instruction while you patch.
*/
void __init_or_module text_poke_early(void *addr, const void *opcode,
size_t len)
{
unsigned long flags;
if (boot_cpu_has(X86_FEATURE_NX) &&
is_module_text_address((unsigned long)addr)) {
/*
* Modules text is marked initially as non-executable, so the
* code cannot be running and speculative code-fetches are
* prevented. Just change the code.
*/
memcpy(addr, opcode, len);
} else {
local_irq_save(flags);
memcpy(addr, opcode, len);
local_irq_restore(flags);
sync_core();
[...]
}
} As you can see, "you need to make sure that other CPUs cannot execute this code in parallel". I don't see ftrace doing that. ftrace does use a 5-byte I think that on a real CPU, we'd probably hit the problem when one hardware thread of a core installs (or removes) the ftrace hook and another thread on the same core runs the code being patched - then it's the same cache line, so even byte-size updates can propagate individually. On QEMU, we can probably hit the problem for any combination of logical CPUs, if it doesn't simulate the cache. |
Hm... from my understanding we don't go to
Before any modification in FTRACE happens, kernel must call:
which forces
when check at [1] fails we won't call it ( Regardless of that corner case, |
We got something similar in https://github.com/openwall/lkrg/pull/94/checks?check_run_id=2632803616
|
@Adam-pi3 I'm not sure I'm reading the code right, but it appears that |
One more: https://github.com/openwall/lkrg/runs/4485495918?check_suite_focus=true
|
As seen in https://github.com/openwall/lkrg/runs/2570683885
The text was updated successfully, but these errors were encountered: