WSL hang up after upgrade to version 2.0.0 #10534

Closed
1 of 2 tasks
TaylorTrz opened this issue Sep 24, 2023 · 6 comments
@TaylorTrz

TaylorTrz commented Sep 24, 2023

Windows Version

10.0.22621.2338

WSL Version

2.0.0.0

Are you using WSL 1 or WSL 2?

  • WSL 2
  • WSL 1

Kernel Version

5.15.123.1-1

Distro Version

Ubuntu 20.04.5 LTS

Other Software

No response

Repro Steps

  1. Upgrade to WSL 2.0.0.

  2. Enable the new experimental features in .wslconfig:

[experimental]
autoMemoryReclaim=gradual
sparseVhd=true

  3. After WSL has been running for almost 5 hours, dmesg shows:

[Sun Sep 24 18:16:29 2023] WSL (1) ERROR: WriteToFile:3321: write(/sys/fs/cgroup/memory.reclaim, 35785605
[Sun Sep 24 18:16:29 2023] ) failed -1 11
[Sun Sep 24 18:16:29 2023] WSL (1) ERROR: Resource temporarily unavailable @main.cpp:399 (operator())

  4. Even a bash terminal can no longer be opened.
    image

  5. According to strace, the process hangs in a stat() call on a Windows file on the C: drive:

19:08:45.214606 stat(".", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
19:08:45.214712 stat("/usr/local/sbin/mandoc", 0x7fffa3218c70) = -1 ENOENT (No such file or directory)
19:08:45.214818 stat("/usr/local/bin/mandoc", 0x7fffa3218c70) = -1 ENOENT (No such file or directory)
19:08:45.214920 stat("/usr/sbin/mandoc", 0x7fffa3218c70) = -1 ENOENT (No such file or directory)
19:08:45.215025 stat("/usr/bin/mandoc", 0x7fffa3218c70) = -1 ENOENT (No such file or directory)
19:08:45.215119 stat("/sbin/mandoc", 0x7fffa3218c70) = -1 ENOENT (No such file or directory)
19:08:45.215224 stat("/bin/mandoc", 0x7fffa3218c70) = -1 ENOENT (No such file or directory)
19:08:45.215330 stat("/usr/games/mandoc", 0x7fffa3218c70) = -1 ENOENT (No such file or directory)
19:08:45.215437 stat("/usr/local/games/mandoc", 0x7fffa3218c70) = -1 ENOENT (No such file or directory)
19:08:45.215534 stat("/usr/lib/wsl/lib/mandoc", 0x7fffa3218c70) = -1 ENOENT (No such file or directory)
19:08:45.215640 stat("/mnt/c/Python310/Scripts/mandoc",
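
The strace output above shows a PATH lookup for mandoc walking each directory in turn: every stat() on a Linux directory returns immediately, and the call that never returns is the first one under /mnt/c. A minimal Python sketch, assuming Windows drives are mounted under /mnt/ (the function name and the prefix parameter are illustrative, not part of WSL), separates PATH entries on Windows mounts from native Linux ones so the affected entries are easy to spot:

```python
def split_path_entries(path_value, windows_prefix="/mnt/"):
    """Split a Linux PATH string into (native, windows_mount) entries.

    windows_prefix is an assumption: it is the mount root used in this
    report (/mnt/c/...), and may differ on other setups.
    """
    native, windows = [], []
    for entry in path_value.split(":"):
        if not entry:
            continue  # skip empty components such as a trailing ':'
        if entry.startswith(windows_prefix):
            windows.append(entry)
        else:
            native.append(entry)
    return native, windows


if __name__ == "__main__":
    import os

    native, windows = split_path_entries(os.environ.get("PATH", ""))
    print("native entries:", native)
    print("entries on Windows mounts:", windows)
```

This is only a diagnostic aid. It shows which lookups go through the Windows file system; it does not explain or fix the hang itself.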

Expected Behavior

The system should run smoothly, and Bash should open successfully.

Actual Behavior

The system appears to hang inside a syscall.

Diagnostic Logs

No response

@zcobol

zcobol commented Sep 24, 2023

@TaylorTrz with autoMemoryReclaim=gradual, WSL uses cgroup v2 only, and memory.reclaim is located under unified:

/sys/fs/cgroup/unified/memory.reclaim

Why is your distro trying to access /sys/fs/cgroup/memory.reclaim? That file is probably missing, hence the ERROR.
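
One way to check which of the two paths mentioned here actually exists on a given distro is a small Python sketch like the following. The candidate list contains only the two locations named in this thread, and the helper name find_memory_reclaim is hypothetical:

```python
from pathlib import Path

# The two locations discussed in this thread, relative to the cgroup root.
CANDIDATES = ("memory.reclaim", "unified/memory.reclaim")


def find_memory_reclaim(cgroup_root="/sys/fs/cgroup"):
    """Return the candidate memory.reclaim paths that exist under cgroup_root."""
    root = Path(cgroup_root)
    return [str(root / rel) for rel in CANDIDATES if (root / rel).exists()]


if __name__ == "__main__":
    found = find_memory_reclaim()
    if found:
        print("memory.reclaim found at:", ", ".join(found))
    else:
        print("no memory.reclaim file found in the expected locations")
```

Note that the error in the report is errno 11 (Resource temporarily unavailable), so the result of this check is worth comparing against the assumption that the file is missing.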

@pmartincic
Collaborator

Thank you for reporting, @TaylorTrz. This was well written and easy to follow.

@pmartincic
Collaborator

@TaylorTrz, I wasn't able to reproduce the hang in question. Thanks for pointing out WSL (1) ERROR: Resource temporarily unavailable @main.cpp:399 (operator()), you shouldn't see that anymore, but you'll still see the previous error line get printed when we try to reclaim memory and the kernel is unable to do so.

To confirm, you're saying it's possible to clear the hang you see with ctrl+c?

@TaylorTrz
Author

TaylorTrz commented Dec 8, 2023

> @TaylorTrz with autoMemoryReclaim=gradual, WSL uses cgroup v2 only, and memory.reclaim is located under unified:
>
> /sys/fs/cgroup/unified/memory.reclaim
>
> Why is your distro trying to access /sys/fs/cgroup/memory.reclaim? That file is probably missing, hence the ERROR.

Hi, I reproduced the issue above after switching from cgroup v1 to cgroup v2.

Inside WSL:

# stat -fc %T /sys/fs/cgroup/
cgroup2fs

.wslconfig:

[wsl2]
kernelCommandLine = "cgroup_no_v1=all"  # Disable cgroup v1 so that memory reclaim works properly

[experimental]
autoMemoryReclaim=gradual
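
To double-check what a .wslconfig like the one above contains, it can be read with Python's standard configparser. This is only a sketch: it shows how Python's INI parser interprets the text, which is not necessarily how WSL itself parses the file (the handling of the trailing # comment and of the quotes in particular is an assumption). The helper name load_wslconfig is hypothetical:

```python
import configparser


def load_wslconfig(text):
    """Parse .wslconfig-style INI text into a {section: {key: value}} dict."""
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.optionxform = str  # keep key case, e.g. autoMemoryReclaim
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


if __name__ == "__main__":
    sample = (
        "[wsl2]\n"
        'kernelCommandLine = "cgroup_no_v1=all"  # Disable cgroup v1\n'
        "\n"
        "[experimental]\n"
        "autoMemoryReclaim=gradual\n"
    )
    for section, values in load_wslconfig(sample).items():
        print(section, values)
```

With this parser the quotes around the kernelCommandLine value are kept as part of the value and the trailing comment is dropped.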

However, the weird kernel hang still happened.

@TaylorTrz
Author

TaylorTrz commented Dec 8, 2023

After trying different autoMemoryReclaim options, I still hit the kernel hang.

[Fri Dec  8 01:45:06 2023] ) failed -1 11
[Fri Dec  8 01:47:36 2023] WSL (1) ERROR: WriteToFile:3389: write(/sys/fs/cgroup/memory.reclaim, 72455700
[Fri Dec  8 01:47:36 2023] ) failed -1 11
[Fri Dec  8 01:48:06 2023] WSL (1) ERROR: WriteToFile:3389: write(/sys/fs/cgroup/memory.reclaim, 71955087
[Fri Dec  8 01:48:06 2023] ) failed -1 11
[Fri Dec  8 01:49:06 2023] WSL (1) ERROR: WriteToFile:3389: write(/sys/fs/cgroup/memory.reclaim, 72191016
[Fri Dec  8 01:49:06 2023] ) failed -1 11
[Fri Dec  8 01:49:36 2023] WSL (1) ERROR: WriteToFile:3389: write(/sys/fs/cgroup/memory.reclaim, 72598609
[Fri Dec  8 01:49:36 2023] ) failed -1 11
[Fri Dec  8 01:50:06 2023] WSL (1) ERROR: WriteToFile:3389: write(/sys/fs/cgroup/memory.reclaim, 72918097
[Fri Dec  8 01:50:06 2023] ) failed -1 11
[Fri Dec  8 01:55:06 2023] WSL (1) ERROR: WriteToFile:3389: write(/sys/fs/cgroup/memory.reclaim, 73623429
[Fri Dec  8 01:55:06 2023] ) failed -1 11
[Fri Dec  8 01:55:36 2023] WSL (1) ERROR: WriteToFile:3389: write(/sys/fs/cgroup/memory.reclaim, 73704898
[Fri Dec  8 01:55:36 2023] ) failed -1 11
[Fri Dec  8 01:59:24 2023] rcu: INFO: rcu_sched self-detected stall on CPU
[Fri Dec  8 01:59:24 2023] rcu:         8-....: (6000 ticks this GP) idle=8c1/1/0x4000000000000000 softirq=112980/112980 fqs=2863
[Fri Dec  8 01:59:24 2023]      (t=6000 jiffies g=704373 q=16226)
[Fri Dec  8 01:59:24 2023] NMI backtrace for cpu 8
[Fri Dec  8 01:59:24 2023] CPU: 8 PID: 47388 Comm: iou-wrk-47319 Not tainted 5.15.133.1-microsoft-standard-WSL2 #1
[Fri Dec  8 01:59:24 2023] Call Trace:
[Fri Dec  8 01:59:24 2023]  <IRQ>
[Fri Dec  8 01:59:24 2023]  dump_stack_lvl+0x34/0x48
[Fri Dec  8 01:59:24 2023]  nmi_cpu_backtrace.cold+0x30/0x70
[Fri Dec  8 01:59:24 2023]  ? lapic_can_unplug_cpu+0x80/0x80
[Fri Dec  8 01:59:24 2023]  nmi_trigger_cpumask_backtrace+0xcd/0xd0
[Fri Dec  8 01:59:24 2023]  rcu_dump_cpu_stacks+0xc1/0xf3
[Fri Dec  8 01:59:24 2023]  rcu_sched_clock_irq.cold+0xe8/0x220
[Fri Dec  8 01:59:24 2023]  ? trigger_load_balance+0x60/0x2e0
[Fri Dec  8 01:59:24 2023]  update_process_times+0x8c/0xc0
[Fri Dec  8 01:59:24 2023]  tick_sched_timer+0x8c/0xa0
[Fri Dec  8 01:59:24 2023]  ? tick_sched_do_timer+0x90/0x90
[Fri Dec  8 01:59:24 2023]  __hrtimer_run_queues+0x124/0x270
[Fri Dec  8 01:59:24 2023]  hrtimer_interrupt+0x10e/0x240
[Fri Dec  8 01:59:24 2023]  __sysvec_hyperv_stimer0+0x2e/0x60
[Fri Dec  8 01:59:24 2023]  sysvec_hyperv_stimer0+0x6d/0x90
[Fri Dec  8 01:59:24 2023]  </IRQ>
[Fri Dec  8 01:59:24 2023]  <TASK>
[Fri Dec  8 01:59:24 2023]  asm_sysvec_hyperv_stimer0+0x16/0x20
[Fri Dec  8 01:59:24 2023] RIP: 0010:prepare_to_wait_event+0x65/0x180
[Fri Dec  8 01:59:24 2023] Code: 18 49 39 c6 74 7e 65 48 8b 14 25 80 ac 01 00 89 d8 87 42 18 45 31 f6 4c 89 ee 4c 89 e7 e8 f3 96 de 00 4c 89 f0 48 83 c4 08 5b <5d> 41 5c 41 5d 41 5e 41 5f e9 9d 35 0b 01 48 8b 10 f7 c2 00 00 02
[Fri Dec  8 01:59:24 2023] RSP: 0018:ffffb56f865bb758 EFLAGS: 00000296
[Fri Dec  8 01:59:24 2023] RAX: fffffffffffffe00 RBX: ffffa0291df30000 RCX: 0000000000000001
[Fri Dec  8 01:59:24 2023] RDX: ffffb56f865bb7b8 RSI: 0000000000000296 RDI: ffffa029127daeb0
[Fri Dec  8 01:59:24 2023] RBP: ffffb56f865bb7a0 R08: ffffb56f865bb7b8 R09: ffffa029ffc2a330
[Fri Dec  8 01:59:24 2023] R10: 0000000000000001 R11: 0000000000000000 R12: ffffa029127daeb0
[Fri Dec  8 01:59:24 2023] R13: 0000000000000296 R14: fffffffffffffe00 R15: ffffa02900bd1200
[Fri Dec  8 01:59:24 2023]  p9_client_rpc+0x143/0x5c0
[Fri Dec  8 01:59:24 2023]  ? do_wait_intr_irq+0xb0/0xb0
[Fri Dec  8 01:59:24 2023]  p9_client_flush+0xa1/0x140
[Fri Dec  8 01:59:24 2023]  p9_client_rpc+0x592/0x5c0
[Fri Dec  8 01:59:24 2023]  ? do_wait_intr_irq+0xb0/0xb0
[Fri Dec  8 01:59:24 2023]  ? idr_alloc_u32+0x8d/0xd0
[Fri Dec  8 01:59:24 2023]  p9_client_walk+0x92/0x2d0
[Fri Dec  8 01:59:24 2023]  ? v9fs_fid_find+0x7e/0x140
[Fri Dec  8 01:59:24 2023]  ? v9fs_vfs_lookup.part.0+0x65/0x150
[Fri Dec  8 01:59:24 2023]  v9fs_vfs_lookup.part.0+0x65/0x150
[Fri Dec  8 01:59:24 2023]  __lookup_slow+0x88/0x150
[Fri Dec  8 01:59:24 2023]  walk_component+0x158/0x1d0
[Fri Dec  8 01:59:24 2023]  ? path_init+0x2c0/0x3f0
[Fri Dec  8 01:59:24 2023]  path_lookupat+0x6e/0x1c0
[Fri Dec  8 01:59:24 2023]  filename_lookup+0xcf/0x1d0
[Fri Dec  8 01:59:24 2023]  ? __check_object_size+0x146/0x160
[Fri Dec  8 01:59:24 2023]  ? strncpy_from_user+0x4e/0x150
[Fri Dec  8 01:59:24 2023]  ? getname_flags.part.0+0x48/0x1b0
[Fri Dec  8 01:59:24 2023]  user_path_at_empty+0x3a/0x60
[Fri Dec  8 01:59:24 2023]  vfs_statx+0x74/0x130
[Fri Dec  8 01:59:24 2023]  ? __wake_up_common+0x80/0x190
[Fri Dec  8 01:59:24 2023]  do_statx+0x40/0x80
[Fri Dec  8 01:59:24 2023]  ? __wake_up_common_lock+0x8a/0xc0
[Fri Dec  8 01:59:24 2023]  ? update_load_avg+0x7a/0x5c0
[Fri Dec  8 01:59:24 2023]  ? newidle_balance+0x124/0x3d0
[Fri Dec  8 01:59:24 2023]  io_issue_sqe+0x17f5/0x20c0
[Fri Dec  8 01:59:24 2023]  ? finish_task_switch.isra.0+0x80/0x290
[Fri Dec  8 01:59:24 2023]  io_wq_submit_work+0x79/0xd0
[Fri Dec  8 01:59:24 2023]  io_worker_handle_work+0x182/0x530
[Fri Dec  8 01:59:24 2023]  io_wqe_worker+0x2ab/0x320
[Fri Dec  8 01:59:24 2023]  ? finish_task_switch.isra.0+0x80/0x290
[Fri Dec  8 01:59:24 2023]  ? io_worker_handle_work+0x530/0x530
[Fri Dec  8 01:59:24 2023]  ? io_worker_handle_work+0x530/0x530
[Fri Dec  8 01:59:24 2023]  ret_from_fork+0x22/0x30
[Fri Dec  8 01:59:24 2023]  </TASK>
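
A log like the one above can be summarized with a short Python sketch. It pairs each write(...memory.reclaim, N) line with the ") failed" line that follows it, and it lists the call-trace frames that are not prefixed with "?". The regular expressions are written against the line shapes in this log only, the function names are hypothetical, and the errno table holds just the one Linux value seen here (11, which the log itself reports as "Resource temporarily unavailable"):

```python
import re

WRITE_RE = re.compile(r"write\((?P<path>[^,]+), (?P<amount>\d+)")
FAIL_RE = re.compile(r"\) failed (?P<ret>-?\d+) (?P<errno>\d+)")
FRAME_RE = re.compile(
    r"\]\s+(?P<guess>\?\s+)?(?P<func>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# Linux errno value seen in this log; extend as needed.
LINUX_ERRNO_NAMES = {11: "EAGAIN"}


def reclaim_failures(lines):
    """Return (path, amount, errno_name) for each failed memory.reclaim write."""
    failures, pending = [], None
    for line in lines:
        write = WRITE_RE.search(line)
        if write:
            pending = (write["path"], int(write["amount"]))
            continue
        fail = FAIL_RE.search(line)
        if fail and pending:
            code = int(fail["errno"])
            failures.append((*pending, LINUX_ERRNO_NAMES.get(code, str(code))))
        pending = None
    return failures


def reliable_frames(lines):
    """Return call-trace function names, skipping frames marked with '?'."""
    frames = []
    for line in lines:
        match = FRAME_RE.search(line)
        if match and not match["guess"]:
            frames.append(match["func"])
    return frames


if __name__ == "__main__":
    import sys

    log = sys.stdin.read().splitlines()
    print(reclaim_failures(log))
    print([f for f in reliable_frames(log) if f.startswith(("p9_", "v9fs_"))])
```

Run against the trace above, the p9_ and v9fs_ frames show the stalled worker sitting in a 9p file lookup reached from a statx call, which matches the strace output in the original report where the hang occurs on a path under /mnt/c.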

What should I do to avoid this kernel hang?
20231208-kernel-hung.log
20231208-trace-wsl.txt

NOTE: This bug can lead to file loss after a reboot!
If I happen to have a file open in vim when the hang occurs, the file is automatically fsck'ed after I restart WSL 2 and its inode is removed from the filesystem.

This is far more dangerous than expected.

@TaylorTrz
Author

> @TaylorTrz, I wasn't able to reproduce the hang in question. Thanks for pointing out WSL (1) ERROR: Resource temporarily unavailable @main.cpp:399 (operator()), you shouldn't see that anymore, but you'll still see the previous error line get printed when we try to reclaim memory and the kernel is unable to do so.
>
> To confirm, you're saying it's possible to clear the hang you see with ctrl+c?

Hi, Ctrl+C only brings back the bash shell; the kernel is still hung.
