WSL Crashes and Exits #8250

Closed
1 of 2 tasks
Thernn88 opened this issue Apr 7, 2022 · 7 comments

Comments

@Thernn88

Thernn88 commented Apr 7, 2022

Windows Version

Version 10.0.22000.593

WSL Version

  • [x] WSL 2
  • [ ] WSL 1

Kernel Version

5.10.102.1

Distro Version

22.04

Other Software

Happens while Orthograph is running.

https://github.com/mptrsen/Orthograph

A secondary computer running a cloned instance, however, is fine. The crashes also happened on 20.04; I upgraded to see whether that would fix them. No such luck.

Repro Steps

Random. Occurs during heavy usage of Orthograph.

Expected Behavior

Not crashing.

Actual Behavior

Randomly crashes.

Diagnostic Logs

'Virtual Machine' has encountered a fatal error. The guest operating system reported that it failed with the following error codes: ErrorCode0: 0x0, ErrorCode1: 0x0, ErrorCode2: 0x0, ErrorCode3: 0x0, ErrorCode4: 0x0. PreOSId: 0. If the problem persists, contact Product Support for the guest operating system. (Virtual machine ID D2AC38E3-B3E4-430E-A7F9-9E46B31F91E7)

Guest message:
[ 2089.359138] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2089.359139] CR2: ffffc9000fa0bc00 CR3: 000000050f25a000 CR4: 0000000000350ea0
[ 2089.359141] Call Trace:
[ 2089.359146] get_page_from_freelist+0x1cc/0x1070
[ 2089.359150] ? __memcg_kmem_charge_page+0xa4/0x200
[ 2089.359152] __alloc_pages_nodemask+0x12c/0x2c0
[ 2089.359155] handle_mm_fault+0xfdd/0x1670
[ 2089.359159] __get_user_pages+0x235/0x6c0
[ 2089.359161] __get_user_pages_remote+0xd4/0x2b0
[ 2089.359164] get_arg_page+0x3e/0xa0
[ 2089.359166] copy_string_kernel+0xb7/0x190
[ 2089.359168] do_execveat_common.isra.0+0x132/0x1c0
[ 2089.359170] __x64_sys_execve+0x33/0x40
[ 2089.359173] do_syscall_64+0x33/0x80
[ 2089.359176] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2089.359178] RIP: 0033:0x7fd054ac90fb
[ 2089.359180] Code: f8 01 0f 8e bd fe ff ff 5b 48 8d 3d df 59 13 00 5d 41 5c e9 97 62 fa ff 0f 1f 80 00 00 00 00 f3 0f 1e fa b8 3b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 05 dd 12 00 f7 d8 64 89 01 48
[ 2089.359184] RSP: 002b:00007ffea4527028 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
[ 2089.359187] RAX: ffffffffffffffda RBX: 000055f7015a21c8 RCX: 00007fd054ac90fb
[ 2089.359189] RDX: 000055f7015a2408 RSI: 000055f7015a21c8 RDI: 000055f7015a24f0
[ 2089.359190] RBP: 000055f700c7c027 R08: 000055f700c7c1ff R09: 000055f7015a2400
[ 2089.359192] R10: 0000000000000008 R11: 0000000000000246 R12: 000055f7015a2408
[ 2089.359194] R13: 00007ffea4527118 R14: 000055f7015a2408 R15: 000055f7015a24f0
[ 2089.359197] Modules linked in:
[ 2089.359199] CR2: ffffc9000fa0bc00
[ 2089.359201] ---[ end trace 182f4cac976c929d ]---
[ 2089.359525] RIP: 0010:__list_del_entry_valid+0x25/0x90
[ 2089.359527] Code: c3 0f 1f 40 00 48 b8 00 01 00 00 00 00 ad de 48 8b 17 4c 8b 47 08 48 39 c2 74 26 48 b8 22 01 00 00 00 00 ad de 49 39 c0 74 2b <49> 8b 30 48 39 fe 75 3a 48 8b 52 08 48 39 f2 75 48 b8 01 00 00 00
[ 2089.359531] RSP: 0018:ffffc9000f9cbb10 EFLAGS: 00010016
[ 2089.359533] RAX: dead000000000122 RBX: ffffea00cd8d12c8 RCX: ffff88b73f931960
[ 2089.359535] RDX: ffffc9000fa0bc00 RSI: ffff88b73f931980 RDI: ffffea00cd8d12c8
[ 2089.359537] RBP: 0000000000100dca R08: ffffc9000fa0bc00 R09: 0000000000000000
[ 2089.359539] R10: 0000000000000006 R11: 0000000000031960 R12: 0000000000000010
[ 2089.359541] R13: ffffffff82e184c0 R14: ffffc9000f9cbc50 R15: ffffffff82e17440
[ 2089.359544] FS: 00007fd0549db740(0000) GS:ffff88b73f900000(0000) knlGS:0000000000000000
[ 2089.359546] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2089.359548] CR2: ffffc9000fa0bc00 CR3: 000000050f25a000 CR4: 0000000000350ea0
[ 2089.359550] Kernel panic - not syncing: Fatal exception
[ 2094.352707] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 2099.355711] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 2104.358277] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 2109.360228] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 2114.361858] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 2119.364134] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 2124.367249] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 2129.370524] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 2134.373424] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 2139.375716] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 2144.378817] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 2149.381651] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 2154.386864] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 2159.390046] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 2164.392649] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 2169.396300] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 2174.398822] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 2179.402194] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 2184.405356] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 2189.421204] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 2189.431214] hv_vmbus: Continuing even though VMBus UNLOAD did not complete
[ 2189.431218] Kernel Offset: disabled
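
For anyone triaging a similar guest message, here is a minimal Python sketch that pulls the faulting kernel symbol, the call-trace frames, and the panic reason out of a log like the one above. The function name `summarize_oops` and the regular expressions are hypothetical helpers written for this illustration, not part of WSL or any kernel tooling, and they assume the line layout seen in this log.

```python
import re

# Hypothetical helpers for illustration only; they assume the log layout shown above.
# A frame looks like "[ 2089.359146] get_page_from_freelist+0x1cc/0x1070".
FRAME_RE = re.compile(r"\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")
# Kernel-mode RIP lines carry a symbol; user-mode ones (CS 0033) do not and are ignored.
RIP_RE = re.compile(r"RIP: 0010:([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")
PANIC_RE = re.compile(r"Kernel panic - not syncing: (.*)")


def summarize_oops(log_text):
    """Return the faulting symbol, the non-'?' call-trace frames, and the panic reason."""
    summary = {"rip": None, "frames": [], "panic": None}
    in_trace = False
    for line in log_text.splitlines():
        rip = RIP_RE.search(line)
        if rip:
            summary["rip"] = rip.group(1)
            in_trace = False
            continue
        panic = PANIC_RE.search(line)
        if panic:
            summary["panic"] = panic.group(1)
            in_trace = False
            continue
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            frame = FRAME_RE.search(line)
            if frame is None:
                in_trace = False  # the first non-frame line ends the trace
            elif frame.group(1) is None:
                # Entries prefixed with "?" are unreliable stack guesses; skip them.
                summary["frames"].append(frame.group(2))
    return summary


# A few lines copied from the guest message above.
SAMPLE = """\
[ 2089.359141] Call Trace:
[ 2089.359146] get_page_from_freelist+0x1cc/0x1070
[ 2089.359150] ? __memcg_kmem_charge_page+0xa4/0x200
[ 2089.359152] __alloc_pages_nodemask+0x12c/0x2c0
[ 2089.359178] RIP: 0033:0x7fd054ac90fb
[ 2089.359525] RIP: 0010:__list_del_entry_valid+0x25/0x90
[ 2089.359550] Kernel panic - not syncing: Fatal exception
"""

if __name__ == "__main__":
    print(summarize_oops(SAMPLE))
```

Comparing these summaries across several crashes is one way to see whether the fault always lands in the same code path or moves around between runs.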

@Thernn88
Author

Thernn88 commented Apr 8, 2022

The issue appears to have stopped after reverting to kernel 5.10.16.

I was able to reproduce the WSL crash on the second computer after I allowed it to update its kernel to 5.10.102.

Looks like a kernel bug.

Let me know if there are any logs I can produce to help track it down.

@benhillis
Member

@tyhicks - looks like potentially a new issue in the 5.10.102 kernel. Any additional information that we could ask for?

@Thernn88
Author

This seems to have mostly resolved after moving to the latest WSL version .58 and the Microsoft WSLg Store install.

A new, potentially related issue is that bash hangs every 24 hours or so. It did not use to do this. This is tolerable compared to a crash every 30 minutes.

Are there any options other than a memory dump? I can have up to 200 GB of RAM in use when it hangs. That would be a chunky .dmp file.

I am wondering if it might be hardware related. The issues occur more frequently when the memory is overclocked and less frequently when I revert to base. Both the overclocked and base memory configurations pass every stress test and error check under the sun, though.

@tyhicks
Collaborator

tyhicks commented Apr 11, 2022

Can you mention what memory tests you've performed? My initial thoughts do point to hardware problems. It would be good to run (or know that you've run) Memtest86+.

@Thernn88
Author

Memory error. It passed Memtest86+ but one stick fails in Karhu.

Frustrating as this is the 2nd memory kit I've purchased for this workstation.

The first kit had 4/8 sticks fail immediately in Memtest86. The errors on this one were sneakier. I will try cleaning the socket and reseating the RAM.

I must be really unlucky to get two bad kits in a row. At least I can refund this one instead of waiting on an RMA.

Closing this as I'll chalk it up to bad RAM causing this issue.

@tyhicks
Collaborator

tyhicks commented Apr 13, 2022

Sorry to hear about the bad RAM but I'm glad we got to the bottom of it before attempting to debug software issues.

@Thernn88
Author

I have had no CTD or hangs since removing the bad stick.

Sorry for the trouble!
