Incorrect passing of e820 memory map to Linux kexec guests #72

Closed
Googulator opened this issue Feb 12, 2024 · 16 comments · Fixed by #74

Comments

@Googulator
Contributor

The following 2 excerpts are from the same live-bootstrap qemu session.

Fiwix reported the memory map as follows:

memory    0x0000000000000000-0x000000000009fbff available
          0x000000000009fc00-0x000000000009ffff reserved
          0x00000000000f0000-0x00000000000fffff reserved
          0x0000000000100000-0x00000000bffdffff available
          0x00000000bffe0000-0x00000000bfffffff reserved
          0x00000000feffc000-0x00000000feffffff reserved
          0x00000000fffc0000-0x00000000ffffffff reserved
          0x0000000100000000-0x000000013fffffff available

Then, after kexec, Linux reports:

[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009cfff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009d000-0x000000000009efff] reserved
[    0.000000] BIOS-e820: [mem 0x000000000009f000-0x000000000009fbfe] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009fffe] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000ffffe] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdfffe] usable
[    0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bffffffe] reserved
[    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000fefffffe] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000fffffffe] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000013ffffffe] usable

Two oddities are visible:

  1. All of the memory regions reported by Fiwix are missing one byte at the end, when re-reported by Linux.
  2. An additional reservation is visible in the range 9d000-9efff. This block, for some reason, doesn't have the off-by-1 ending seen in other blocks.
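
The first oddity is what one would expect if an inclusive end address were treated as an exclusive one somewhere on the kexec path. The following is only a sketch of that arithmetic, not code from Fiwix; `struct region` and both helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical region record; 'end' may be inclusive or exclusive. */
struct region {
    uint64_t start;
    uint64_t end;
};

/* Correct only when 'end' is exclusive (one past the last byte). */
static uint64_t size_from_exclusive_end(struct region r)
{
    return r.end - r.start;
}

/* Last byte of a range, in the style of the BIOS-e820 lines. */
static uint64_t last_byte(uint64_t start, uint64_t size)
{
    return start + size - 1;
}
```

Feeding it the start 0x100000 and the inclusive end 0xbffdffff from the Fiwix output gives a size that is one byte short, and `last_byte()` then returns 0xbffdfffe, which matches the Linux log above.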
@mikaku
Owner

mikaku commented Feb 13, 2024

I've run both Fiwix and Linux on QEMU with the same configuration:

qemu-system-i386 \
        -drive file=<floppy.img>,format=raw,if=floppy,index=0 \
        -boot a \
        -m 4G \
        -enable-kvm \
        -machine pc

Here are the results:

Screenshot from 2024-02-13 08-22-49

Screenshot from 2024-02-13 08-23-21

As you can see there is no difference in the memory output.
The boot loader used in the floppy was GRUB v1 (legacy) in both cases.

What was your QEMU configuration?
What was your Linux kernel version?

@Googulator
Contributor Author

Googulator commented Feb 13, 2024

The problem is only seen if you first boot Fiwix, and then kexec from Fiwix to Linux. Kexec'd Linux will then see an incorrect memory map.

The issue was seen both on QEMU 6.2 and on bare metal.

@mikaku
Owner

mikaku commented Feb 13, 2024

Can you please attach the Linux kernel binary you are using?

@Googulator
Contributor Author

kernels.zip

This contains the Fiwix and Linux kernels, as well as the kexec loaders for them, as compiled in live-bootstrap. These were captured from one of my bare metal test machines.

@mikaku
Owner

mikaku commented Feb 13, 2024

I don't have enough console scrollback to read the e820 lines, as they are at the very top of the Linux kernel output.
Adding console=/dev/ttyS0 to the kexec_cmdline= argument doesn't work either.
Are the serial devices enabled in this Linux kernel?

@Googulator
Contributor Author

Googulator commented Feb 13, 2024

They are, but the syntax is slightly different in Linux than in Fiwix: console=ttyS0

@mikaku
Owner

mikaku commented Feb 13, 2024

> They are, but the syntax is slightly different in Linux than in Fiwix: console=ttyS0

I'm completely unable to redirect the output to the serial line. I've used console=ttyS0 but it doesn't work.
I'm unable to reproduce your problem.

@Googulator
Contributor Author

Easiest way to reproduce is probably using live-bootstrap. Apply this patch to rootfs.py to get a serial log:

diff --git a/rootfs.py b/rootfs.py
index c31d5a1..5d7df2c 100755
--- a/rootfs.py
+++ b/rootfs.py
@@ -282,6 +282,8 @@ print(shutil.which('chroot'))
             arg_list += [
                 '-machine', 'kernel-irqchip=split',
                 '-nic', 'user,ipv6=off,model=e1000',
+                '-chardev', 'socket,id=char0,port=45454,host=0.0.0.0,server=on,wait=on,telnet=on,logfile=serial.log',
+                '-serial', 'chardev:char0',
                 '-nographic'
             ]
             run(args.qemu_cmd, *arg_list)

Then use "telnet localhost 45454" or equivalent to start receiving the log on screen. It won't start building until it sees a connection; unfortunately, I haven't found a way to keep the normal stdio chardev and still log to a file.

@mikaku
Owner

mikaku commented Feb 14, 2024

I'm sorry, it doesn't work at all.

I only see the messages generated by kernel/kexec.c during the transition from Fiwix to Linux.
No Linux kernel boot messages are shown on the serial line, and the output scrolls too fast and is too long for the console history to let me go back.

Here is the cmdline I use in Fiwix:

ro root=/dev/hda2 console=/dev/ttyS0 kexec_proto=linux kexec_size=8000 kexec_cmdline=\"ro root=/dev/sda2 console=ttyS0\"

I don't know why the Linux boot messages aren't redirected to the serial line.

@mikaku
Owner

mikaku commented Feb 22, 2024

As mentioned on the #bootstrappable IRC channel, I've finally built a Linux kernel simple enough to show very few boot messages. This let me scroll back up to the top of the console history and see the memory map when the Linux kernel is kexec'ed from Fiwix.

Here is a screenshot of that memory map:

Screenshot from 2024-02-21 21-16-15

> All of the memory regions reported by Fiwix are missing one byte at the end, when re-reported by Linux.

Yes, the addresses ending in ...ffe instead of ...fff are caused by a bug in the bios_map_init() function. However, I think the fix can be a bit simpler than the one in your PR.

diff --git a/mm/bios_map.c b/mm/bios_map.c
index b1829bd..4d71295 100644
--- a/mm/bios_map.c
+++ b/mm/bios_map.c
@@ -100,7 +100,7 @@ void bios_map_init(struct multiboot_mmap_entry *bmmap_addr, unsigned int bmmap_l
 {
        struct multiboot_mmap_entry *bmmap;
        unsigned int from_high, from_low, to_high, to_low;
-       unsigned long long to;
+       unsigned long long to, to_orig;
        int n, type;
 
        bmmap = bmmap_addr;
@@ -112,6 +112,7 @@ void bios_map_init(struct multiboot_mmap_entry *bmmap_addr, unsigned int bmmap_l
                while((unsigned int)bmmap < (unsigned int)bmmap_addr + bmmap_length) {
                        from_high = (unsigned int)(bmmap->addr >> 32);
                        from_low = (unsigned int)(bmmap->addr & 0xFFFFFFFF);
+                       to_orig = (bmmap->addr + bmmap->len);
                        to = (bmmap->addr + bmmap->len) - 1;
                        to_high = (unsigned int)(to >> 32);
                        to_low = (unsigned int)(to & 0xFFFFFFFF);
@@ -124,6 +125,9 @@ void bios_map_init(struct multiboot_mmap_entry *bmmap_addr, unsigned int bmmap_l
                                to_low,
                                bios_mem_type[type]
                        );
+                       /* restore the original end address */
+                       to_high = (unsigned int)(to_orig >> 32);
+                       to_low = (unsigned int)(to_orig & 0xFFFFFFFF);
                        if(n < NR_BIOS_MM_ENT && bmmap->len) {
                                bios_mem_map[n].from = from_low;
                                bios_mem_map[n].from_hi = from_high;
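
The idea in the patch above is to print the inclusive end but store the exclusive one (addr + len), split into 32-bit halves. The following is a stand-alone sketch of that idea rather than the Fiwix code itself: `struct map_entry`, `make_entry()` and `display_end()` are hypothetical names, and only the field names follow the diff.

```c
#include <stdint.h>

/* Hypothetical stand-in for one stored map entry (32-bit halves). */
struct map_entry {
    uint32_t from, from_hi;
    uint32_t to, to_hi;     /* exclusive end: addr + len */
};

/* Store the exclusive end so that consumers can compute to - from. */
static struct map_entry make_entry(uint64_t addr, uint64_t len)
{
    uint64_t end = addr + len;
    struct map_entry e;

    e.from    = (uint32_t)(addr & 0xFFFFFFFFu);
    e.from_hi = (uint32_t)(addr >> 32);
    e.to      = (uint32_t)(end & 0xFFFFFFFFu);
    e.to_hi   = (uint32_t)(end >> 32);
    return e;
}

/* Inclusive end, used only when printing the range. */
static uint64_t display_end(uint64_t addr, uint64_t len)
{
    return addr + len - 1;
}
```

For the block starting at 0x100000000 with a length of 0x40000000, this stores to_hi = 1 and to = 0x40000000, while the printed end remains 0x13fffffff.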

After applying this patch, the memory map in the Fiwix kernel looks like this (same as before, nothing changed):

memory    0x0000000000000000-0x000000000009fbff available
          0x000000000009fc00-0x000000000009ffff reserved
          0x00000000000f0000-0x00000000000fffff reserved
          0x0000000000100000-0x000000000ffdffff available
          0x000000000ffe0000-0x000000000fffffff reserved
          0x00000000feffc000-0x00000000feffffff reserved
          0x00000000fffc0000-0x00000000ffffffff reserved
          0x000000000009d000-0x000000000009efff reserved

Then, after kexec, Linux reports:

Screenshot from 2024-02-22 18-59-58

> An additional reservation is visible in the range 9d000-9efff. This block, for some reason, doesn't have the off-by-1 ending seen in other blocks.

This memory reservation is needed by the kexec implementation:

Fiwix/mm/memory.c

Lines 308 to 313 in 3a71a2f

#ifdef CONFIG_KEXEC
        if(kexec_size > 0) {
                bios_map_reserve(KEXEC_BOOT_ADDR, KEXEC_BOOT_ADDR + (PAGE_SIZE * 2));
                _last_data_addr = map_kaddr(KEXEC_BOOT_ADDR, KEXEC_BOOT_ADDR + (PAGE_SIZE * 2), _last_data_addr, PAGE_PRESENT | PAGE_RW);
        }
#endif /* CONFIG_KEXEC */

But you are right that this bug prevented this address from having the off-by-1 ending.

It looks like this is also fixed.
Can you please confirm that this patch fixes all these memory map problems?

Additionally, there is also a bug in kernel/kexec.c that affected the multiboot1 protocol:

diff --git a/kernel/kexec.c b/kernel/kexec.c
index c35a15d..aab5ada 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -247,7 +247,7 @@ void kexec_multiboot1(void)
 
        /* space reserved for the memory map structure */
        nmaps = 0;
-       while(bios_mem_map[nmaps].to) {
+       while(bios_mem_map[nmaps].type) {
                nmaps++;
        }
        esp -= sizeof(struct multiboot_mmap_entry) * nmaps;
@@ -259,7 +259,7 @@ void kexec_multiboot1(void)
                map->addr = map->addr << 32 | bios_mem_map[n].from;
                map->len = bios_mem_map[n].to_hi;
                map->len = map->len << 32 | bios_mem_map[n].to;
-               map->len -= map->addr - 1;
+               map->len -= map->addr;
                map->type = bios_mem_map[n].type;
                map++;
        }
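
Once the stored end is exclusive, the low word `to` of a valid entry can legitimately be zero (for example, a region that ends exactly at 4 GiB has to_hi = 1 and to = 0), so a loop that stops at the first zero `to` could end early. That may be why the loop condition changed; the thread does not say so explicitly. A minimal sketch of both changes, with a hypothetical `struct map_entry` and helper names, and assuming that valid entries always have a nonzero type:

```c
#include <stdint.h>

/* Hypothetical stand-in for one stored map entry. */
struct map_entry {
    uint32_t from, from_hi;
    uint32_t to, to_hi;     /* exclusive end */
    int type;               /* assumed nonzero for every valid entry */
};

/* Count valid entries; a zero type marks the end of the table. */
static int count_entries(const struct map_entry *map, int max)
{
    int n = 0;

    while (n < max && map[n].type)
        n++;
    return n;
}

/* With an exclusive end, the length is simply end - start. */
static uint64_t entry_len(const struct map_entry *e)
{
    uint64_t addr = ((uint64_t)e->from_hi << 32) | e->from;
    uint64_t end  = ((uint64_t)e->to_hi << 32) | e->to;

    return end - addr;
}
```

With the reserved block 0xfffc0000-0xffffffff stored this way, `to` is 0 but the entry is still counted, and `entry_len()` returns 0x40000.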

@mikaku
Owner

mikaku commented Feb 26, 2024

@Googulator, I've just pushed some changes to your PR #74 that merge your patch with mine.
Can you please check the final patch and see if you get the same results?

@mikaku
Owner

mikaku commented Mar 12, 2024

@Googulator, did you have an opportunity to check the final patch?

@Googulator
Contributor Author

Not yet, will do in the next few days.

@mikaku
Owner

mikaku commented Apr 25, 2024

@Googulator, did you find time to check the final patch?
I have some modifications to push, but I need to close this first.

@Googulator
Contributor Author

With the current code in #74:

memory    0x0000000000000000-0x000000000009fbff available
          0x000000000009fc00-0x000000000009ffff reserved
          0x00000000000f0000-0x00000000000fffff reserved
          0x0000000000100000-0x00000000bffdffff available
          0x00000000bffe0000-0x00000000bfffffff reserved
          0x00000000feffc000-0x00000000feffffff reserved
          0x00000000fffc0000-0x00000000ffffffff reserved
          0x0000000100000000-0x000000013fffffff available
WARNING: detected a total of 3071MB of available memory below 4GB.
WARNING: only up to 2GB of physical memory will be used.
memory    0x000000000009d000-0x000000000009efff available -> reserved
...
kexec_linux: jumping to linux_trampoline() ...
[    0.000000] Linux version 4.14.341-openela_1 (@) (gcc version 4.0.4) #1 SMP PREEMPT @0
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000013fffffff] usable

Looks to be fixed.
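
For anyone who wants to repeat this check on a longer serial log, the comparison can be automated. The following is a rough sketch that assumes the two line formats shown above; `struct range`, `parse_line()` and `same_range()` are hypothetical names, and the parser only distinguishes usable memory from everything else.

```c
#include <stdio.h>
#include <string.h>

/* One parsed range; 'end' is the inclusive end as printed in the log. */
struct range {
    unsigned long long start, end;
    int usable;
};

/* Parse a Fiwix "memory" line or a Linux "BIOS-e820" line.
 * Returns 1 on success, 0 if the line holds no address range. */
static int parse_line(const char *line, struct range *r)
{
    const char *p = strstr(line, "0x");
    const char *kind;

    if (p == NULL || sscanf(p, "%llx-%llx", &r->start, &r->end) != 2)
        return 0;

    /* The memory type is the last space-separated word on the line. */
    kind = strrchr(line, ' ');
    if (kind == NULL)
        return 0;
    kind++;
    r->usable = strncmp(kind, "available", 9) == 0 ||
                strncmp(kind, "usable", 6) == 0;
    return 1;
}

/* Two ranges match if bounds and usability agree. */
static int same_range(const struct range *a, const struct range *b)
{
    return a->start == b->start && a->end == b->end &&
           a->usable == b->usable;
}
```

Parsing each Fiwix line and the corresponding BIOS-e820 line and passing the results to `same_range()` flags any entry whose end differs, such as the ...ffe endings from the original report.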

@mikaku
Owner

mikaku commented Apr 30, 2024

Excellent, thank you.
