Implement AVX xsave/xrstor to avoid workaround when checkpointing #434

Closed
abmerop opened this issue Oct 11, 2023 · 3 comments
Labels
arch-x86 The X86 ISA bug cpu-kvm gem5's KVM CPU

Comments

@abmerop
Member

abmerop commented Oct 11, 2023

Describe the bug
The AVX / YMM register state is not saved or restored in gem5 with the X86KvmCPU, which leads to crashes on checkpoint restoration when AVX is enabled in CPUID.
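
For context, here is a minimal sketch of where the YMM state lives in a standard-format XSAVE area, which is the kind of buffer a checkpoint would need to carry. The offsets are the architectural ones as I understand them (XMM registers at byte 160 of the legacy region, `XSTATE_BV` at byte 512, and the AVX upper halves at byte 576, which is what CPUID leaf 0xD reports on common processors). The 4096-byte buffer size mirrors `struct kvm_xsave`. The helper names `has_avx_state` and `read_ymm` are hypothetical and are not gem5 or KVM APIs.

```python
import struct

# Offsets in the standard (non-compacted) XSAVE format. These are assumptions
# taken from the architectural layout, not values read from gem5.
XMM_OFFSET = 160          # XMM0..XMM15 in the legacy FXSAVE region
XSTATE_BV_OFFSET = 512    # first 8 bytes of the XSAVE header
YMM_HI128_OFFSET = 576    # state component 2: upper 128 bits of YMM0..YMM15
XFEATURE_AVX = 1 << 2     # XSTATE_BV bit for the AVX component
NUM_VEC_REGS = 16


def has_avx_state(area: bytes) -> bool:
    """Return True if XSTATE_BV marks the AVX component as present."""
    (xstate_bv,) = struct.unpack_from("<Q", area, XSTATE_BV_OFFSET)
    return bool(xstate_bv & XFEATURE_AVX)


def read_ymm(area: bytes, index: int) -> bytes:
    """Reassemble the 32-byte value of YMM<index> from its two halves.

    This sketch does not check the SSE bit for the low half.
    """
    if not 0 <= index < NUM_VEC_REGS:
        raise IndexError("YMM index out of range")
    lo = XMM_OFFSET + 16 * index
    low = bytes(area[lo:lo + 16])
    if has_avx_state(area):
        hi = YMM_HI128_OFFSET + 16 * index
        high = bytes(area[hi:hi + 16])
    else:
        # A component whose XSTATE_BV bit is clear is in its initial (zero) state.
        high = bytes(16)
    return low + high


# Build a fake 4096-byte area with x87, SSE and AVX marked present.
area = bytearray(4096)
struct.pack_into("<Q", area, XSTATE_BV_OFFSET, 0b111)
area[XMM_OFFSET:XMM_OFFSET + 16] = b"\x11" * 16
area[YMM_HI128_OFFSET:YMM_HI128_OFFSET + 16] = b"\x22" * 16
ymm0 = read_ymm(area, 0)
```

If only the legacy 512-byte region is checkpointed, the upper halves at offset 576 are lost, which matches the symptom described here.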

Affects version
develop @ 141b06d

gem5 Modifications
No modification

To Reproduce
This is easiest to reproduce using a full system GPU configuration, as it enables AVX by default and supports checkpoint/restore. This requires the VEGA_X86 build. The application doesn't really matter here, so one can simply use a blank shell script.

  1. scons build/VEGA_X86/gem5.opt -j
  2. touch hello.sh
  3. build/VEGA_X86/gem5.opt configs/example/gpufs/vega10_kvm.py --disk-image x86-gpu-fs-20220512.img --kernel vmlinux-5.4.0-105-generic --gpu-mmio-trace gem5-resources/src/gpu-fs/vega_mmio.log --app hello.sh --checkpoint-dir hello_cpt
  4. build/VEGA_X86/gem5.opt configs/example/gpufs/vega10_kvm.py --disk-image x86-gpu-fs-20220512.img --kernel vmlinux-5.4.0-105-generic --gpu-mmio-trace gem5-resources/src/gpu-fs/vega_mmio.log --app hello.sh --restore-dir hello_cpt

Terminal Output


[    8.736510] ------------[ cut here ]------------
[    8.736510] Bad FPU state detected at switch_fpu_return+0x7d/0x120, reinitializing FPU registers.
[    8.736510] WARNING: CPU: 0 PID: 461 at /build/linux-hwe-5.4-utjlqf/linux-hwe-5.4-5.4.0/arch/x86/mm/extable.c:114 ex_handler_fprestore+0x65/0x70
[    8.736510] Modules linked in: ib_uverbs ib_core amdgpu(OE) amd_iommu_v2 amd_sched(OE) amdttm(OE) amdkcl(OE) drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt pata_acpi input_leds mac_hid edac_mce_amd serio_raw sch_fq_codel ip_tables x_tables autofs4
[    8.736510] CPU: 0 PID: 461 Comm: check-new-relea Tainted: G           OE     5.4.0-105-generic #119~18.04.1-Ubuntu
[    8.736510] Hardware name:  , BIOS  06/08/2008
[    8.736510] RIP: 0010:ex_handler_fprestore+0x65/0x70
[    8.736510] Code: 00 00 00 5d c3 48 0f ae 0d 78 30 bc 01 b8 01 00 00 00 5d c3 48 89 c6 48 c7 c7 20 87 34 82 c6 05 80 97 b8 01 01 e8 1b ea 01 00 <0f> 0b eb b9 0f 1f 80 00 00 00 00 66 66 66 66 90 55 48 89 e5 e8 e2
[    8.736510] RSP: 0018:ffffc9000042fde0 EFLAGS: 00010086
[    8.736510] RAX: 0000000000000000 RBX: ffffc9000042fe48 RCX: 0000000000000000
[    8.736510] RDX: 0000000000000005 RSI: ffffffff82f965f5 RDI: 0000000000000046
[    8.736510] RBP: ffffc9000042fde0 R08: ffffffff82f965a0 R09: 0000000000000055
[    8.736510] R10: 0000000000000000 R11: 00000000000001cd R12: 000000000000000d
[    8.736510] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    8.736510] FS:  00007f4cdc05b740(0000) GS:ffff8880bca00000(0000) knlGS:0000000000000000
[    8.736510] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.736510] CR2: 00000000004e5ef0 CR3: 00000000b8f9a000 CR4: 00000000000406f0
[    8.736510] Call Trace:
[    8.736510]  fixup_exception+0x4a/0x60
[    8.736510]  do_general_protection+0x4e/0x150
[    8.736510]  general_protection+0x28/0x30
[    8.736510] RIP: 0010:switch_fpu_return+0x7d/0x120
[    8.736510] Code: 74 67 49 8d bc 24 00 14 00 00 48 89 7d d0 66 66 90 66 90 db e2 0f 77 db 45 d0 66 66 90 66 90 b8 ff ff ff ff 89 c2 48 0f c7 1f <65> 4c 89 2d 0b d7 fd 7e 66 66 66 66 90 45 89 b4 24 c0 13 00 00 65
[    8.736510] RSP: 0018:ffffc9000042fef8 EFLAGS: 00010086
[    8.736510] RAX: 00000000ffffffff RBX: ffff8880b8190000 RCX: 00000000000004dd
[    8.736510] RDX: 00000000ffffffff RSI: 7133cdb5d72e6598 RDI: ffff8880b8191400
[    8.736510] RBP: ffffc9000042ff28 R08: 0000000000000068 R09: 0000000000000001
[    8.736510] R10: 0000000000000068 R11: 000000000000ba5a R12: ffff8880b8190000
[    8.736510] R13: ffff8880b81913c0 R14: 0000000000000000 R15: 0000000000000000
[    8.736510]  ? schedule+0x33/0xa0
[    8.736510]  prepare_exit_to_usermode+0x98/0xa0
[    8.736510]  retint_user+0x8/0x8
[    8.736510] RIP: 0033:0x7f4cd9d7cbf5
[    8.736510] Code: 49 8b b7 80 00 00 00 4c 89 7c 24 48 c7 44 24 50 00 00 00 00 48 8d 04 80 48 8d 04 86 48 85 c0 48 89 44 24 40 0f 84 91 00 00 00 <8b> 10 48 c1 e2 04 49 03 97 88 00 00 00 48 39 c6 48 89 54 24 58 74
[    8.736510] RSP: 002b:00007fffb0954620 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
[    8.736510] RAX: 00007f4cd3f2980c RBX: 00007fffb0954640 RCX: 00007f4cd2fe8000
[    8.736510] RDX: 0000000000001281 RSI: 00007f4cd2fe8000 RDI: 00000000001b7f83
[    8.736510] RBP: 00007f4cd1ca850e R08: 00007f4cd30762fd R09: 00007f4cd9d62e00
[    8.736510] R10: 000000000234bcf0 R11: 00007f4cdbc0eb20 R12: 00007fffb0954660
[    8.736510] R13: 000000000260c4b0 R14: 00007f4cd3f1b4a0 R15: 000000000234bcf0
[    8.736510] ---[ end trace b5790e806846cb11 ]---
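
As a quick sanity check on the dump above, the CR4 value (`00000000000406f0`) can be decoded to see whether the guest has OSXSAVE enabled. This is a small sketch using the architectural CR4 bit names; `decode_cr4` is a hypothetical helper, not part of gem5.

```python
# Architectural CR4 flag names, keyed by bit position.
CR4_BITS = {
    0: "VME", 1: "PVI", 2: "TSD", 3: "DE", 4: "PSE", 5: "PAE",
    6: "MCE", 7: "PGE", 8: "PCE", 9: "OSFXSR", 10: "OSXMMEXCPT",
    11: "UMIP", 13: "VMXE", 14: "SMXE", 16: "FSGSBASE", 17: "PCIDE",
    18: "OSXSAVE", 20: "SMEP", 21: "SMAP",
}


def decode_cr4(value: int) -> list[str]:
    """Return the names of the CR4 flags set in value, lowest bit first."""
    return [name for bit, name in sorted(CR4_BITS.items()) if value & (1 << bit)]


flags = decode_cr4(0x406F0)
print(flags)
# ['PSE', 'PAE', 'MCE', 'PGE', 'OSFXSR', 'OSXMMEXCPT', 'OSXSAVE']
```

OSXSAVE is set, so the guest kernel is using the XSAVE family of instructions to manage FPU state, which is consistent with the fault being taken in `switch_fpu_return`.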

Expected behavior
There should be no kernel backtrace dumps.

Host Operating System
Ubuntu 20.04

Host ISA
amd64

Compiler used
gcc 9.4.0

Additional information

Manual "backtrace" from Linux KVM call:
https://elixir.bootlin.com/linux/v5.4/source/arch/x86/kvm/x86.c#L3442
https://elixir.bootlin.com/linux/v5.4/source/arch/x86/kernel/fpu/core.c#L338
https://elixir.bootlin.com/linux/v5.4/source/arch/x86/include/asm/fpu/internal.h#L534
https://elixir.bootlin.com/linux/v5.4/source/arch/x86/include/asm/fpu/internal.h#L457
https://elixir.bootlin.com/linux/v5.4/source/arch/x86/include/asm/fpu/internal.h#L445
https://elixir.bootlin.com/linux/v5.4/source/arch/x86/include/asm/fpu/internal.h#L338
https://elixir.bootlin.com/linux/v5.4/source/arch/x86/include/asm/fpu/internal.h#L260
https://elixir.bootlin.com/linux/v5.4/source/arch/x86/include/asm/asm.h#L153
https://elixir.bootlin.com/linux/v5.4/source/arch/x86/mm/extable.c#L106

@BobbyRBruce BobbyRBruce added bug cpu-kvm gem5's KVM CPU arch-x86 The X86 ISA and removed enhancement labels Oct 11, 2023
@BobbyRBruce
Member

@abmerop : Just so I know if this is assignable or not: are you just reporting this or are you working on this fix?

@abmerop
Member Author

abmerop commented Oct 11, 2023

Just reporting for now. If I have time to look into this, I'll self-assign.

@abmerop
Member Author

abmerop commented Apr 1, 2024

Accidentally duplicated this... Will use new ticket #958

@abmerop abmerop closed this as completed Apr 1, 2024