corrupt kernel image read in stage 2 #552

wjhun · 2019-02-21T17:51:44Z

This is similar to the first crash found in #541, seen elsewhere as well (here I am just attempting to run webg - the bug is in very early init and doesn't appear to be related to the program). While parsing the kernel elf symbol table in stage3 (which is read in stage2 but temporarily mapped in stage3 to ingest the symbols), an invalid symbol name offset is followed past the end of the kernel image buffer. I verified that this invalid offset is in place - and doesn't match the file contents - in stage2 in kernel_read_complete().

This is kind of a heisenbug; if you change the kernel code by a line or two, it vanishes. It is apparently affected by the size and alignment of the elf sections. I can't give instructions to reproduce, and I just happen to have a tree that is reproducing it ... though not reliably depending on the build or added debug code.

Of course another issue here is that the elf parsing needs to be hardened. But I'd like to fix the root of the problem before putting protective measures in place.

0000000200164fe0 [debug output - this is the address of the corrupted word in the mapped kernel image...note that the previous several words before it match the kernel image, as well as 0x164ff0 and 0x164ff8 ... only 0x164fe0 and 0x164fe8 differ]

no fault handler
Page fault
interrupt: 000000000000000e
frame: 000000007fa70000
error code: 0000000000000000
address: 000000020016a6f8
rax: 0000000000004200
rbx: 0000000200164fe0
rcx: 0000000000000000
rdx: 0000000000000043
rsi: 000000007f000000
rdi: 000000020016a6f8
rbp: 000000010001fed0
rsp: 000000010001fec8
r8: 0000000000000000
r9: 000000007f000385    (move_gdt + 0000000000000000/0000000000000009)
r10: 000000007f000385   (move_gdt + 0000000000000000/0000000000000009)
r11: 0000000000000010
r12: 0000000200169f08
r13: 000000010001ff50
r14: 0000000000005340
r15: 0000000100a001c0
rip: 000000007f01e1fe
flags: 0000000000000006
stack trace:
000000007f01c070
000000007f01e47c
000000007f01db04        (read_kernel_syms + 000000000000009d/00000000000000d7)
000000007f01ddc5        (init_service_new_stack + 0000000000000072/000000000000016a)
000000007f01e157
000000007f000384        (_start + 0000000000000005/0000000000000006)
Makefile:73: recipe for target 'run-nokvm' failed
make: [run-nokvm] Error 1 (ignored)

The problem doesn't appear to be in read_sectors(), at first glance anyway, since the words affected are in the middle of a copy. I kind of doubt it's a tfs issue, since the file only has two extents, this is smack in the middle of one, and everything is sector aligned. Also, a dump of the fs image produces a kernel file that matches the original.

The text was updated successfully, but these errors were encountered:

wjhun · 2019-02-22T06:00:08Z

It appears as if this the culprit is related to the real mode sector read in stage2. Apply the following patch to the latest master and - whether or not the crash occurs - observe that this specific address changes, always to the same value, after a call to bios_read_sectors:

diff --git a/boot/stage2.c b/boot/stage2.c
index fa6e06e..168f5e3 100644
--- a/boot/stage2.c
+++ b/boot/stage2.c
@@ -42,7 +42,13 @@ static void read_sectors(char *dest, u64 start_sector, u64 nsectors)
 {
     while (nsectors > 0) {
         int read_sectors = MIN(nsectors, SCRATCH_LEN >> SECTOR_OFFSET);
+        console("pre ");
+        print_u64(*(u64*)0x7ffd1fe0);
+        console("\n");
        bios_read_sectors(start_sector, read_sectors);
+        console("post ");
+        print_u64(*(u64*)0x7ffd1fe0);
+        console("\n");
        runtime_memcpy(dest, pointer_from_u64(SCRATCH_BASE), read_sectors << SECTOR_OFFSET);
        dest += read_sectors << SECTOR_OFFSET;
        start_sector += read_sectors;

see how the location always changes to 0x4200:

/share/src/nanovms/qemu/x86_64-softmmu/qemu-system-x86_64 -display none -serial stdio -m 2G -device isa-debug-exit -no-reboot -drive file=/share/src/nanovms/master00/output/image/image,format=raw,if=virtio -device virtio-net,netdev=n0 -ne\
tdev user,id=n0,hostfwd=tcp::8080-:8080,hostfwd=tcp::9090-:9090,hostfwd=udp::5309-:5309
pre 0000000000000000
post 0000000000004200
pre 0000000000004200
post 0000000000004200
pre 0000000000004200
post 0000000000004200
[...]
pre 0001001200002e98
post 0000000000004200
assigned: 10.0.2.15

That last one was the correct word from the kernel image being clobbered. This is what was leading the elf parsing off into the weeds.

Using a watchpoint in gdb shows the address being modified - presumably because it's in esp!

(gdb) c
Continuing.

Hardware watchpoint 2: *(0x7ffd1fe0)

Old value = 10890
New value = 16896
0x0000d562 in ?? ()
(gdb) i r
eax            0x4200   16896
ecx            0xfffffffe       -2
edx            0x80     128
ebx            0xde1c   56860
esp            0x7ffd1fe0       0x7ffd1fe0
ebp            0x7ffdfac4       0x7ffdfac4
esi            0x8062   32866
edi            0x7ffd3000       2147299328
eip            0xd562   0xd562
eflags         0x46     [ PF ZF ]
cs             0xf000   61440
ss             0x0      0
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0
(gdb) bt
#0  0x0000d562 in ?? ()
#1  0x00008954 in ?? ()
#2  0x0000c277 in ?? ()
#3  0x0000a320 in ?? ()
#4  0x0000c406 in ?? ()
#5  0x0000c56a in ?? ()
#6  0x000087cf in ?? ()
#7  0x00008852 in ?? ()
#8  0x0000bef4 in ?? ()
#9  0x0000d2c8 in ?? ()
#10 0x0000d2e7 in ?? ()
#11 0x00008ac1 in ?? ()
#12 0x0000d36b in ?? ()
#13 0x0000d3fe in ?? ()
#14 0x0000cdb4 in ?? ()
#15 0x00008c74 in ?? ()
#16 0x00008db8 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)

0xd562 is off in the weeds, but after entering bios_read_sectors we are changing modes and stacks, and I'm not sure if gdb is keeping up with this (though if I change to arch i8086 it basically shows the same regs and backtrace).

My hunch is that somehow, in the process of switching to real mode, the high 16 bits of esp (stage2 stack) are merged with REAL_SP (0x1ff0), leading to the corruption as the stack is used while in real mode. (Note that the half word at 0x7ffd1fe8 is also corrupted.) But I'm not totally clear on why this is if SS is cleared; I would think the high word of ESP would only apply in protected mode.

If I add a 'xor esp, esp' before the jump to real mode, the corruption goes away. I'm not sure if this is the right fix though. I will PR for further discussion.

mkhon · 2019-02-22T06:32:53Z

Will, I will take it from here Max

…

On Fri, 22 Feb 2019 at 14:00, Will Jhun ***@***.***> wrote: It appears as if this the culprit is related to the real mode sector read in stage2. Apply the following patch to the latest master and - whether or not the crash occurs - observe that this specific address changes, always to the same value, after a call to bios_read_sectors: diff --git a/boot/stage2.c b/boot/stage2.c index fa6e06e..168f5e3 100644 --- a/boot/stage2.c +++ b/boot/stage2.c @@ -42,7 +42,13 @@ static void read_sectors(char *dest, u64 start_sector, u64 nsectors) { while (nsectors > 0) { int read_sectors = MIN(nsectors, SCRATCH_LEN >> SECTOR_OFFSET); + console("pre "); + print_u64(*(u64*)0x7ffd1fe0); + console("\n"); bios_read_sectors(start_sector, read_sectors); + console("post "); + print_u64(*(u64*)0x7ffd1fe0); + console("\n"); runtime_memcpy(dest, pointer_from_u64(SCRATCH_BASE), read_sectors << SECTOR_OFFSET); dest += read_sectors << SECTOR_OFFSET; start_sector += read_sectors; see how the location always changes to 0x4200: /share/src/nanovms/qemu/x86_64-softmmu/qemu-system-x86_64 -display none -serial stdio -m 2G -device isa-debug-exit -no-reboot -drive file=/share/src/nanovms/master00/output/image/image,format=raw,if=virtio -device virtio-net,netdev=n0 -ne\ tdev user,id=n0,hostfwd=tcp::8080-:8080,hostfwd=tcp::9090-:9090,hostfwd=udp::5309-:5309 pre 0000000000000000 post 0000000000004200 pre 0000000000004200 post 0000000000004200 pre 0000000000004200 post 0000000000004200 [...] pre 0001001200002e98 post 0000000000004200 assigned: 10.0.2.15 That last one was the correct word from the kernel image being clobbered. This is what was leading the elf parsing off into the weeds. Using a watchpoint in gdb shows the address being modified - presumably because it's in esp! (gdb) c Continuing. Hardware watchpoint 2: *(0x7ffd1fe0) Old value = 10890 New value = 16896 0x0000d562 in ?? () (gdb) i r eax 0x4200 16896 ecx 0xfffffffe -2 edx 0x80 128 ebx 0xde1c 56860 esp 0x7ffd1fe0 0x7ffd1fe0 ebp 0x7ffdfac4 0x7ffdfac4 esi 0x8062 32866 edi 0x7ffd3000 2147299328 eip 0xd562 0xd562 eflags 0x46 [ PF ZF ] cs 0xf000 61440 ss 0x0 0 ds 0x0 0 es 0x0 0 fs 0x0 0 gs 0x0 0 (gdb) bt #0 0x0000d562 in ?? () #1 0x00008954 in ?? () #2 0x0000c277 in ?? () #3 0x0000a320 in ?? () #4 0x0000c406 in ?? () #5 0x0000c56a in ?? () #6 0x000087cf in ?? () #7 0x00008852 in ?? () #8 0x0000bef4 in ?? () #9 0x0000d2c8 in ?? () #10 0x0000d2e7 in ?? () #11 0x00008ac1 in ?? () #12 0x0000d36b in ?? () #13 0x0000d3fe in ?? () #14 0x0000cdb4 in ?? () #15 0x00008c74 in ?? () #16 0x00008db8 in ?? () Backtrace stopped: previous frame inner to this frame (corrupt stack?) (gdb) 0xd562 is off in the weeds, but after entering bios_read_sectors we are changing modes and stacks, and I'm not sure if gdb is keeping up with this (though if I change to arch i8086 it basically shows the same regs and backtrace). My hunch is that somehow, in the process of switching to real mode, the high 16 bits of esp (stage2 stack) are merged with REAL_SP (0x1ff0), leading to the corruption as the stack is used while in real mode. (Note that the half word at 0x7ffd1fe8 is also corrupted.) But I'm not totally clear on why this is if SS is cleared; I would think the high word of ESP would only apply in protected mode. If I add a 'xor esp, esp' before the jump to real mode, the corruption goes away. I'm not sure if this is the right fix though. I will PR for further discussion. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#552 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AC4fneJLQc9CilYMyxsMrh8QVogN24K1ks5vP4d6gaJpZM4bH_f4> .

mkhon · 2019-02-23T05:22:07Z

I found the root cause - stack is still 32-bit in real-mode because we don't re-load segment registers in prot16 mode with data16 descriptor.

wjhun self-assigned this Feb 21, 2019

wjhun added a commit that referenced this issue Feb 22, 2019

clear esp before switching to real mode - #552

deabf61

wjhun mentioned this issue Feb 22, 2019

clear esp before switching to real mode #553

Merged

mkhon closed this as completed Feb 23, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

corrupt kernel image read in stage 2 #552

corrupt kernel image read in stage 2 #552

wjhun commented Feb 21, 2019

wjhun commented Feb 22, 2019

mkhon commented Feb 22, 2019 via email

mkhon commented Feb 23, 2019

corrupt kernel image read in stage 2 #552

corrupt kernel image read in stage 2 #552

Comments

wjhun commented Feb 21, 2019

wjhun commented Feb 22, 2019

mkhon commented Feb 22, 2019 via email

mkhon commented Feb 23, 2019