
kernel.stext integrity check failed due to an error in handling the p_arch_jump_label_transform_entry address on ARM64 #269

Open
root-hardenedvault opened this issue May 14, 2023 · 12 comments


@root-hardenedvault

Reproduction steps:

  1. Build LKRG & load the LKM
  2. Reboot and then the kernel will panic:

Hardware: Raspberry Pi 4
OS: Ubuntu 22.04

[  302.646797] VED: ALERT: DETECT: Kernel: _stext hash changed unexpectedly
[  302.660354] VED: ALERT: DETECT: Kernel: 1 checksums changed unexpectedly
[  302.667154] VED: ALERT: BLOCK: Kernel: 1 checksums changed unexpectedly
[  302.673863] Kernel panic - not syncing: Kernel: 1 checksums changed unexpectedly
[  302.681366] CPU: 2 PID: 484 Comm: kworker/u8:6 Tainted: G         C OE     5.15.0-1027-raspi #29-Ubuntu
[  302.690899] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[  302.696816] Workqueue: events_unbound p_check_integrity [ved]
[  302.702698] Call trace:
[  302.705173]  dump_backtrace+0x0/0x200
[  302.708890]  show_stack+0x20/0x30
[  302.712250]  dump_stack_lvl+0x8c/0xb8
[  302.715965]  dump_stack+0x18/0x34
[  302.719324]  panic+0x1e4/0x3e4
[  302.722420]  p_check_integrity+0x1370/0x18d4 [ved]
[  302.727316]  process_one_work+0x204/0x4e0
[  302.731385]  worker_thread+0x144/0x490
[  302.735187]  kthread+0x128/0x134
[  302.738457]  ret_from_fork+0x10/0x20
[  302.742086] SMP: stopping secondary CPUs
[  302.746067] Kernel Offset: 0x5a489ca00000 from 0xffff800008000000
[  302.752245] PHYS_OFFSET: 0xffffadd980000000
[  302.756482] CPU features: 0x800804f1,00000846
[  302.760899] Memory Limit: none
[  302.763996] ---[ end Kernel panic - not syncing: Kernel: 1 checksums changed unexpectedly ]---

We noticed that p_arch_jump_label_transform_entry parses which segment contains the destination address, and p_arch_jump_label_transform_ret then updates the hash of the corresponding segment based on that result. However, a parsing error may cause the kernel.stext hash, which needs to be updated, to be missed in p_arch_jump_label_transform_ret. As a workaround, we added an update of that hash in two places, and it seems to work:

diff --git a/src/modules/database/JUMP_LABEL/p_arch_jump_label_transform/p_arch_jump_label_transform.c b/src/modules/database/JUMP_LABEL/p_arch_jump_label_transform/p_arch_jump_label_transform.c
index 12d0ac9..0e75432 100644
--- a/src/modules/database/JUMP_LABEL/p_arch_jump_label_transform/p_arch_jump_label_transform.c
+++ b/src/modules/database/JUMP_LABEL/p_arch_jump_label_transform/p_arch_jump_label_transform.c
@@ -121,10 +121,10 @@ notrace int p_arch_jump_label_transform_ret(struct kretprobe_instance *ri, struc
          break;
 
       case P_JUMP_LABEL_MODULE_TEXT:
-
+#if defined(CONFIG_ARM64)
+         p_db.kernel_stext.p_hash = p_lkrg_fast_hash((unsigned char *)p_db.kernel_stext.p_addr,
+                                                (unsigned int)p_db.kernel_stext.p_size);
-
+#endif
          for (p_tmp = 0; p_tmp < p_db.p_module_list_nr; p_tmp++) {
             if (p_db.p_module_list_array[p_tmp].p_mod == p_db.p_jump_label.p_mod) {
                /*
@@ -186,8 +186,10 @@ notrace int p_arch_jump_label_transform_ret(struct kretprobe_instance *ri, struc
           * FTRACE might generate dynamic trampoline which is not part of .text section.
           * This is not abnormal situation anymore.
           */
+#if defined(CONFIG_ARM64)
+         p_db.kernel_stext.p_hash = p_lkrg_fast_hash((unsigned char *)p_db.kernel_stext.p_addr,
+                                                (unsigned int)p_db.kernel_stext.p_size);
+#endif
          break;
    }
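To make the "parsing error" hypothesis concrete, here is a minimal user-space sketch of the entry/return pattern described above: the entry step classifies the patch target, and the return step rehashes only the region it chose. Everything here is hypothetical: `fast_hash` is FNV-1a standing in for LKRG's `p_lkrg_fast_hash`, and `classify`, `rehash`, and the `jl_state` values are illustrative names, not LKRG code. The `arm64_workaround` flag mimics the diff above by refreshing the core hash in the non-core branches too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for LKRG's p_lkrg_fast_hash(): 32-bit FNV-1a. */
static uint32_t fast_hash(const unsigned char *p, size_t len)
{
    assert(p != NULL || len == 0);
    uint32_t h = 2166136261u;            /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;                  /* FNV prime */
    }
    return h;
}

/* Hypothetical mirror of the states kept in p_db.p_jump_label.state. */
enum jl_state { JL_NONE, JL_CORE_TEXT, JL_MODULE_TEXT, JL_OTHER };

/* A tracked code region and the hash recorded for it. */
struct region {
    unsigned char *addr;
    size_t size;
    uint32_t hash;
};

/* Half-open range check done on integers to avoid comparing unrelated pointers. */
static int contains(const struct region *r, const unsigned char *target)
{
    uintptr_t t = (uintptr_t)target, base = (uintptr_t)r->addr;
    return t >= base && t - base < r->size;
}

/* "Entry" step: decide which region the patch target belongs to. */
static enum jl_state classify(const struct region *core, const struct region *mod,
                              const unsigned char *target)
{
    if (contains(core, target))
        return JL_CORE_TEXT;
    if (contains(mod, target))
        return JL_MODULE_TEXT;
    return JL_OTHER;
}

/* "Return" step: rehash the region chosen at entry. With arm64_workaround
 * set, the core hash is also refreshed in the non-core branches, as in the
 * diff above; this hides a misclassification instead of fixing it. */
static void rehash(enum jl_state st, struct region *core, struct region *mod,
                   int arm64_workaround)
{
    switch (st) {
    case JL_CORE_TEXT:
        core->hash = fast_hash(core->addr, core->size);
        break;
    case JL_MODULE_TEXT:
        mod->hash = fast_hash(mod->addr, mod->size);
        if (arm64_workaround)
            core->hash = fast_hash(core->addr, core->size);
        break;
    default:
        if (arm64_workaround)
            core->hash = fast_hash(core->addr, core->size);
        break;
    }
}
```

In this model, if a core-text patch is wrongly classified as module text, the recorded core hash goes stale, which would produce the same kind of "_stext hash changed unexpectedly" alert; the workaround flag makes the symptom disappear without explaining the misclassification.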
@solardiz
Contributor

We had (a different fork of) recent LKRG run on Ubuntu 22.04.1 with kernel 5.15.0-1019-aws #23-Ubuntu SMP Wed Aug 17 18:35:04 UTC 2022 aarch64 aarch64 in AWS instance type c6g.medium for 4+ months with no such issue showing up. However, that instance has only 1 vCPU, so perhaps the issue is a race condition showing up on multi-[v]CPU systems.

@Adam-pi3
Collaborator

However, parsing errors may have caused the hash of kernel.stext that needs to be updated but missed in p_arch_jump_label_transform_ret

What parsing error are you referring to? If something is read incorrectly, you likely see a different memory layout than LKRG does, which may result in this type of problem.

Additionally, LKRG synchronizes with JUMP_LABEL using various locks, which means the integrity routine cannot miss the result of JUMP_LABEL's work. It looks like you might be hitting an issue that is not the root cause, and the patch is masking the real problem. Did you try running with a very verbose log level to see what JUMP_LABEL really does?

@solardiz
Contributor

the patch is masking the real problem.

Sure, which I assume is @root-hardenedvault's understanding too, which is why he calls this a "workaround" and doesn't send us a PR with these changes right away. Ideally, we'd figure out the real problem and arrive at a proper fix.

@root-hardenedvault
Author

It appears that the issue is caused by a race condition. LKRG does not require any lock to be held when accessing p_db.p_jump_label.state. The panic consistently occurs while the core text hash is being updated in arch_jump_label_transform_ret. We have also observed that p_db.p_jump_label.state is set to 1 (P_JUMP_LABEL_CORE_TEXT) when the integrity_timer calculates and compares the core text hash. LKRG may therefore be updating the core text hash while it is checking whether that hash has changed, which could lead to the race condition. Is there a mechanism in LKRG to avoid this situation? However, this cannot explain why the above patch works, since those updates would not be executed. Another scenario that triggers the panic (with similar kernel logs) is running nftables as a systemd service at boot time.
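The window this hypothesis depends on can be replayed by hand in a single-threaded toy model: the text is modified first, the recorded hash catches up later, and an unsynchronized check that lands in between sees a mismatch while the state flag reads 1. This is a hypothetical illustration only (FNV-1a stands in for LKRG's hash, and `text_db`, `patch_begin`, `patch_end`, `check_unlocked` are made-up names); it does not show that LKRG's real code actually has this window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for LKRG's hash: 32-bit FNV-1a. */
static uint32_t fast_hash(const unsigned char *p, size_t len)
{
    uint32_t h = 2166136261u;            /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;                  /* FNV prime */
    }
    return h;
}

/* Toy database: core text, its recorded hash, and a flag playing the role
 * of p_db.p_jump_label.state (1 = core text is being patched). */
struct text_db {
    unsigned char text[64];
    uint32_t hash;
    int state;
};

static void db_init(struct text_db *db)
{
    memset(db->text, 0, sizeof db->text);
    db->hash = fast_hash(db->text, sizeof db->text);
    db->state = 0;
}

/* Entry + patch: the text changes before the hash is refreshed. */
static void patch_begin(struct text_db *db, size_t off, unsigned char insn)
{
    assert(off < sizeof db->text);
    db->state = 1;
    db->text[off] = insn;
}

/* Return handler: the recorded hash catches up with the new text. */
static void patch_end(struct text_db *db)
{
    db->hash = fast_hash(db->text, sizeof db->text);
    db->state = 0;
}

/* Integrity check that takes no lock: returns 1 if the hash matches. */
static int check_unlocked(const struct text_db *db)
{
    return fast_hash(db->text, sizeof db->text) == db->hash;
}
```

Calling `check_unlocked` between `patch_begin` and `patch_end` reports a mismatch with `state == 1`, matching the observation above; calling it after `patch_end` succeeds.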

@Adam-pi3
Collaborator

Function arch_jump_label_transform is called under the JUMP_LABEL lock. When LKRG intercepts the call, it is also running under the JUMP_LABEL lock, and we do synchronize against it. The integrity verification routine won't run before acquiring this lock:
https://github.com/lkrg-org/lkrg/blob/main/src/modules/database/p_database.h#L192

If LKRG has this lock acquired, the JUMP_LABEL engine won't modify the .text section. I don't think this is the correct root cause.
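The argument here is that the transform and the integrity check take the same lock, so the checker can never observe the gap between a text write and the matching hash update. A minimal single-threaded sketch of that idea follows, assuming a pthread mutex as a hypothetical stand-in for the JUMP_LABEL lock; it is not LKRG's code, and the linked p_database.h is not reproduced. A real checker would block on the lock; `pthread_mutex_trylock` is used only so the demo can report that it would have waited. On older glibc, build with `-pthread`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for LKRG's hash: 32-bit FNV-1a. */
static uint32_t fast_hash(const unsigned char *p, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

struct text_db {
    pthread_mutex_t lock;     /* plays the role of the JUMP_LABEL lock */
    unsigned char text[64];
    uint32_t hash;
};

enum check_result { CHECK_OK, CHECK_MISMATCH, CHECK_DEFERRED };

static void db_init(struct text_db *db)
{
    pthread_mutex_init(&db->lock, NULL);
    memset(db->text, 0, sizeof db->text);
    db->hash = fast_hash(db->text, sizeof db->text);
}

/* The patcher holds the lock from the text write until the rehash; it is
 * split in two only so the usage checks can look at the middle. */
static void transform_begin(struct text_db *db, size_t off, unsigned char insn)
{
    assert(off < sizeof db->text);
    pthread_mutex_lock(&db->lock);
    db->text[off] = insn;
}

static void transform_end(struct text_db *db)
{
    db->hash = fast_hash(db->text, sizeof db->text);
    pthread_mutex_unlock(&db->lock);
}

/* The checker only compares while holding the same lock. */
static enum check_result check_locked(struct text_db *db)
{
    if (pthread_mutex_trylock(&db->lock) != 0)
        return CHECK_DEFERRED;   /* patcher is mid-update; a real checker waits */
    enum check_result r = fast_hash(db->text, sizeof db->text) == db->hash
                              ? CHECK_OK : CHECK_MISMATCH;
    pthread_mutex_unlock(&db->lock);
    return r;
}
```

Under this model, a mismatch can only be observed if some text modification or hash update happens outside the locked region.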

@accelbread

I'm also seeing this issue, also on a Raspberry Pi 4. It occurs consistently, a few seconds after my system makes it to the login prompt.

Adam-pi3 added a commit to Adam-pi3/lkrg-work that referenced this issue Oct 25, 2023
The reported problem with integrity verification on ARM64 (lkrg-org#269)
is a result of a very tight race condition with tracepoints.
The changes that simplified synchronization with the JUMP_LABEL engine
(f98da1b) affected the ARM64 platform differently, which made such a race possible.
However, the same race problem may potentially exist on x86 as well;
this commit fixes it and should address lkrg-org#269
solardiz pushed a commit that referenced this issue Oct 25, 2023
@solardiz
Contributor

@root-hardenedvault @accelbread We think we've just fixed this issue with #294 here - can you please test and let us know? Thank you!

@accelbread

I'll give it a test over the weekend, thanks!

@accelbread

Unfortunately, this does not fix the issue for me :(

@Adam-pi3
Copy link
Collaborator

Adam-pi3 commented Oct 31, 2023

@accelbread can you provide some details about the problem? What is the kernel version? How easy is it to reproduce? Can you recompile LKRG with P_LKRG_JUMP_LABEL_STEXT_DEBUG, enable log_level=3, and show the logs?

By the way, I heavily tested Ubuntu 23.10 with kernel 6.5.0-1005-raspi, and the issue is not there. If you have an opportunity to check the same OS/kernel, it would be helpful.

@accelbread

accelbread commented Nov 1, 2023

I am on 6.1.57-hardened1 on NixOS. I have LKRG built into the kernel.

It is easy to reproduce. If I have default settings, a few seconds after boot, the device restarts. If I boot with "lkrg.kint_validate=1", the device does not restart a few seconds after boot, and runs fine.
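The kint_validate=1 observation fits a stale-hash picture. As we understand LKRG's documentation (an assumption worth double-checking against the README of the version in use), level 1 runs the kernel integrity check only when manually triggered, while levels 2 and 3 also run it from the periodic timer, which is the path visible as p_check_integrity in the traces above. A minimal sketch of that mapping, with hypothetical names (`kint_policy`, `kint_policy_for`):

```c
#include <assert.h>
#include <stdbool.h>

/* Our reading of LKRG's documented kint_validate levels (assumption):
 * 0 = disabled, 1 = manual trigger only, 2 = manual + periodic timer,
 * 3 = manual + periodic + random events (the default). */
struct kint_policy {
    bool manual;    /* check runs only when explicitly triggered */
    bool periodic;  /* check also runs from the integrity timer */
    bool random;    /* check also runs on random kernel events */
};

static struct kint_policy kint_policy_for(int kint_validate)
{
    assert(kint_validate >= 0 && kint_validate <= 3);
    struct kint_policy p = {
        .manual = kint_validate >= 1,
        .periodic = kint_validate >= 2,
        .random = kint_validate >= 3,
    };
    return p;
}
```

If this reading is right, kint_validate=1 avoids the reboot only because nothing compares the stale hash automatically; it hides the mismatch rather than fixing it.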

I can recompile and retest later with debug enabled and get back with logs. It seems the 6.5.9 kernel is now available too, so I will upgrade first.

I could also produce a minimal reproducing SD card image if you'd like.

@citypw

citypw commented May 13, 2024

Reproduction steps:

  • Build LKRG (c6654b1) & load the LKM
  • Run for a few hours

Hardware: Raspberry Pi 4
OS: Raspberry Pi OS
Kernel: 6.6.28+rpt-rpi-v8

[ 76.633929] LKRG: ALERT: DETECT: Kernel: _stext hash changed unexpectedly
[ 76.646008] LKRG: ALERT: DETECT: Kernel: Module hash changed unexpectedly, name ipv6
[ 76.653862] LKRG: ALERT: DETECT: Kernel: Module list hash changed unexpectedly
[ 76.661198] LKRG: ALERT: DETECT: Kernel: Module KOBJ list hash changed unexpectedly
[ 76.668959] LKRG: ALERT: DETECT: Kernel: Module KOBJ hash changed unexpectedly, name ipv6
[ 76.677271] LKRG: ALERT: DETECT: Kernel: 5 checksums changed unexpectedly
[ 76.684152] LKRG: ALERT: BLOCK: Kernel: 5 checksums changed unexpectedly
[ 76.690944] Kernel panic - not syncing: Kernel: 5 checksums changed unexpectedly
[ 76.698442] CPU: 2 PID: 38 Comm: kworker/u12:0 Tainted: G C O 6.6.28+rpt-rpi-v8 #1 Debian 1:6.6.28-1+rpt1
[ 76.709469] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[ 76.715380] Workqueue: events_unbound p_check_integrity [lkrg]
[ 76.721325] Call trace:
[ 76.723798] dump_backtrace+0xa0/0x100
[ 76.727600] show_stack+0x20/0x38
[ 76.730956] dump_stack_lvl+0x48/0x60
[ 76.734667] dump_stack+0x18/0x28
[ 76.738024] panic+0x330/0x398
[ 76.741118] p_check_integrity+0x1068/0x1900 [lkrg]
[ 76.746082] process_one_work+0x148/0x3b8
[ 76.750145] worker_thread+0x32c/0x450
[ 76.753942] kthread+0x11c/0x128
[ 76.757213] ret_from_fork+0x10/0x20
[ 76.760834] SMP: stopping secondary CPUs
[ 76.764813] Kernel Offset: 0x2c2c200000 from 0xffffffc080000000
[ 76.770811] PHYS_OFFSET: 0x0
[ 76.773724] CPU features: 0x0,80000201,3c020000,0000421b
[ 76.779105] Memory Limit: none
[ 76.782199] ---[ end Kernel panic - not syncing: Kernel: 5 checksums changed unexpectedly ]---
