WSL Becomes Non Responsive, Uses High Amounts Of CPU & Memory #9429

Open
1 of 2 tasks
ThatRex opened this issue Jan 3, 2023 · 28 comments

@ThatRex

ThatRex commented Jan 3, 2023

Version

Version 10.0.19044.2364

WSL Version

  • WSL 2
  • WSL 1

Kernel Version

5.15.79.1-microsoft-standard-WSL2

Distro Version

Ubuntu 20.04

Other Software

No response

Repro Steps

  1. Install WSL
  2. Start Ubuntu (does not seem to depend on the distro)
  3. Wait for an undefined amount of time?

Expected Behavior

To not become non responsive and use up high amounts of CPU and memory after a period of time.

Actual Behavior

WSL becomes non-responsive and uses high amounts of CPU and memory after a period of time. When this happens I am unable to use or interact with WSL and the WSL CLI. In order to restart WSL I have to restart my PC. Before I restricted the amount of CPU and memory in the WSL config, it would make my whole system lag out.
[Screenshots: 2023-01-08_12-53, 2023-01-08_12-48_1]
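
For reference, the CPU and memory restriction mentioned above is normally set in a .wslconfig file in the Windows user profile. A minimal sketch follows, assuming illustrative limits (4GB, 2 processors, no swap) that are not taken from this report; the helper name print_wslconfig is hypothetical:

```shell
#!/usr/bin/env bash
# Print a minimal .wslconfig that caps the resources of the WSL 2 VM.
# The values passed in are illustrative assumptions, not recommendations.
print_wslconfig() {
  local memory="$1" processors="$2" swap="$3"
  printf '[wsl2]\n'
  printf 'memory=%s\n' "$memory"
  printf 'processors=%s\n' "$processors"
  printf 'swap=%s\n' "$swap"
}

# Example (hypothetical values; replace <you> with the Windows user name):
#   print_wslconfig 4GB 2 0 > /mnt/c/Users/<you>/.wslconfig
```

The limits only take effect after WSL has been restarted.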

Diagnostic Logs

No response

@cristianpuddu

Same problem. I use VS Code and Docker Desktop, and run containers for LAMP development.
Sometimes it works again with "wsl --shutdown" and a restart of Docker Desktop, but I often have to restart the PC.

@dvdmrn

dvdmrn commented Jan 5, 2023

I have the same problem, this seems to happen after my computer goes to sleep and I leave WSL running. I do what @cristianpuddu does to mitigate it

@ShaleenAg

similar to this #9383 except in my case disk usage balloons up and as soon as I force stop my react app, WSL becomes responsive again.

@ThatRex
Author

ThatRex commented Jan 8, 2023

> Same problem. I use VS Code and Docker Desktop, and run containers for LAMP development. Sometimes it works again with "wsl --shutdown" and a restart of Docker Desktop, but I often have to restart the PC.

Unfortunately for me, wsl --shutdown and other WSL commands don't do anything, as the WSL CLI is completely non-responsive when this issue occurs.

@ThatRex
Author

ThatRex commented Jan 8, 2023

> I have the same problem, this seems to happen after my computer goes to sleep and I leave WSL running. I do what @cristianpuddu does to mitigate it

Hmm, I do put my PC into hibernation, sometimes multiple times a day, so maybe that has something to do with it.

@codebymikey

Just like @ThatRex, I also hibernated my PC fairly regularly; however, this never led to any non-responsiveness until around last week, when my Ubuntu 18.04 image started prompting me to run wsl --update.

So it's probably something to do with that recent WSL update.

@dvdmrn

dvdmrn commented Jan 10, 2023

> > Same problem. I use VS Code and Docker Desktop, and run containers for LAMP development. Sometimes it works again with "wsl --shutdown" and a restart of Docker Desktop, but I often have to restart the PC.

> Unfortunately for me, wsl --shutdown and other WSL commands don't do anything, as the WSL CLI is completely non-responsive when this issue occurs.

@ThatRex What I typically do is force quit WSL and run this command in PowerShell

@mklueh

mklueh commented Jan 12, 2023

I'm constantly running into this problem, using Jetbrains Gateway.

When

wsl --shutdown

does not work anymore and hangs itself, I use

taskkill /f /im wslservice.exe

But now, after I did this, Jetbrains Gateway does not work anymore at all for one of my projects where the freeze happened, even after re-installation.

There is also this issue with a lot of active comments, that has been closed for whatever reason!

#8529

@sba923

sba923 commented Jan 13, 2023

Same here.

wsl --shutdown hangs

I hibernate the PC every day at the end of my work day, so maybe that's linked with the fact I had a Ubuntu WSL tab open in Windows Terminal?

Restarting the LxssManager doesn't help.

Can't restart wslservice.exe.

@cristianpuddu

> Same problem. I use VS Code and Docker Desktop, and run containers for LAMP development. Sometimes it works again with "wsl --shutdown" and a restart of Docker Desktop, but I often have to restart the PC.

I did other tests; in my case the problem seems to occur only when I use VS Code.

@thangnq1001

thangnq1001 commented Jan 13, 2023

Try my steps: #8725 (comment)

@benzman81

Same problem, also hibernating (nowadays, who shuts down a computer anyway if they don't have to, e.g. for a Windows update or some installation). We use Rancher Desktop with WSL, and we constantly need to restart WSL and wait for all Kubernetes pods to spin up again when this problem occurs. This is pretty annoying and seems to have started with a recent Windows update.

@jeffska

jeffska commented Jan 20, 2023

Just dropping in from #8696 which has 150+ comments and still no official response.

@OneBlue
Collaborator

OneBlue commented Jan 20, 2023

Thanks everyone for all the info.

We have introduced a new feature to diagnose issues like this in WSL 1.1.0: wsl.exe --debug-shell

We have a couple theories on what the root cause of this could be, but we need more information from a live repro to move forward. Instructions:

  • Update to Store WSL 1.1.0 with: wsl.exe --update --pre-release (Administrator prompt)
  • Reproduce the issue where WSL is stuck
  • Open a debug shell via: wsl.exe --debug-shell
  • Inside that shell, run (just paste into the shell):
stack_log=stacks.txt
fd_log="fd.txt"
for pid in $(pgrep init); do
  echo -e "\nProcess: $pid" >> "$stack_log"
  echo -e "\nProcess: $pid " >> "$fd_log"
  cat "/proc/$pid/cmdline" >> "$fd_log"
  echo -e '\n' >> "$fd_log"
  for tid in $(ls "/proc/$pid/task" || true); do
    echo "tid: $tid" >> "$stack_log"
    cat "/proc/$pid/task/$tid/stack" >> "$stack_log" || true
  done

  ls -la "/proc/$pid/fd" >> "$fd_log" || true
done

ss -lap --vsock > "sockets.txt"

echo "stacks:"
cat stacks.txt

echo "fds:"
cat fd.txt

echo "sockets:" 
cat sockets.txt

echo "dmesg:"
dmesg

echo "meminfo:"
cat /proc/meminfo
  • Share the output of that script with a dump of wslservice.exe on this issue (Task Manager -> Details -> select wslservice.exe -> Right click -> Create dump file)

Note: The debug shell is not user friendly. By design it doesn't use an hvsocket relay like the other shells, but a named pipe. That means that it should work even if the service is completely crashed, as long as the VM is still running. Sometimes the prompt doesn't show; if that's the case, pressing 'return' should be enough to display it.

If the VM is completely crashed, the debug-shell will just remain empty. If that's the case, please capture a dump of the wsl.exe --debug-shell process and share it here.
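
Once the script above has produced stacks.txt, one way to skim it is to list the threads whose kernel stack mentions a given symbol. This is only a sketch: the helper name tids_matching is hypothetical, and it assumes the file layout written by the script ("Process:" and "tid:" header lines followed by stack frames).

```shell
#!/usr/bin/env bash
# List thread IDs in a stacks.txt-style log whose kernel stack matches a pattern.
# Usage: tids_matching PATTERN [FILE]   (FILE defaults to stacks.txt)
tids_matching() {
  local pattern="$1" file="${2:-stacks.txt}"
  awk -v pat="$pattern" '
    /^Process: / { tid = ""; next }            # new process section
    /^tid: /     { tid = $2; seen = 0; next }  # new thread section
    tid != "" && !seen && $0 ~ pat {           # report each thread only once
      print tid
      seen = 1
    }
  ' "$file"
}

# Example: tids_matching virtio stacks.txt
```

The pattern is an awk regular expression, so a plain substring such as a subsystem prefix is usually enough.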

@alex-reach
Copy link

@OneBlue I followed your instructions above. My DMP File is too big to be shared here (only 25 MB allowed). You can download it here: https://bitbit.de/wslservice.zip

@OneBlue
Collaborator

OneBlue commented Jan 20, 2023

> @OneBlue I followed your instructions above. My DMP File is too big to be shared here (only 25 MB allowed). You can download it here: https://bitbit.de/wslservice.zip

Thank you @alex-reach. Can you also share the script's output? We need both the script output and the .dmp file to investigate this issue.

@alex-reach

> > @OneBlue I followed your instructions above. My DMP File is too big to be shared here (only 25 MB allowed). You can download it here: https://bitbit.de/wslservice.zip

> Thank you @alex-reach. Can you also share the script's output? We need both the script output and the .dmp file to investigate this issue.

wsl_debug.txt

@OneBlue Sure, I attached the script output now

@OneBlue
Collaborator

OneBlue commented Jan 20, 2023

Thanks a lot @alex-reach. Pasting my answer from the other issue to get more information:

With this we can see that the issue appears to be related to virtio (since that's where the guest is stuck).

We need to dig deeper to understand why things are stuck here though, let's gather more information. Can you please:

  1. Repeat the same steps as before to get wsl hung and share the wslservice.exe dump and the script output again
  2. Run the following command and share its output (elevated cmd.exe): tasklist /m vp9fs.dll

Its output should look like this:

tasklist /m vp9fs.dll

Image Name                     PID Modules
========================= ======== ============================================
dllhost.exe                  28572 vp9fs.dll
vmwp.exe                      4956 vp9fs.dll

The second column gives you the PID of the processes (there should be one vmwp.exe, and one or two dllhost.exe). Please dump all of them (use the PID to differentiate if there are processes running with the same name) and share the dumps here.

To summarize, we need:

  • The output of the script I shared #8696 (comment)
  • The output of tasklist /m vp9fs.dll
  • The dumps for wslservice.exe, and all the vmwp.exe, wmwpWSL.exe, and dllhost.exe processes returned by tasklist
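
To keep track of which processes still need a dump, the PID column can be pulled out of the tasklist output. A small sketch, assuming a POSIX shell with awk is available on the Windows side (for example Git Bash, since WSL itself is hung at this point) or that the output was saved to a file first; the helper name pids_from_tasklist is hypothetical:

```shell
#!/usr/bin/env bash
# Read `tasklist /m vp9fs.dll` output on stdin and print "PID image-name" pairs.
# Header and separator lines are skipped because their first field does not
# end in .exe or their second field is not numeric.
pids_from_tasklist() {
  awk '$1 ~ /\.exe$/ && $2 ~ /^[0-9]+$/ { print $2, $1 }'
}

# Example: pids_from_tasklist < tasklist-output.txt
```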

@ThatRex
Author

ThatRex commented Jan 22, 2023

Okay, so WSL has done it again. I have updated to 1.1.0. When I try to run wsl.exe --debug-shell once it's crashed, nothing happens. Nonetheless, I dumped wslservice.exe on the off chance it may be useful. Here is the download: https://drive.google.com/file/d/1akb4CIDesRxgZLdlsanm_JZddnWcudRM/view?usp=sharing
[Screenshots: 2023-01-22_21-08, image]

@alex-reach

@OneBlue Sure, here we go:

You can download everything here:

https://www.bitbit.de/wsl-debug.zip

But I have no "wmwpWSL.exe" running ..

@EvanSchalton

Just redirected here from #8529
I'm having the same issue. Someone over there indicated it might be a RAM issue. I have 64 GB of RAM and nothing else running, and 56 GB is accessible via the .wslconfig (w/ 16 slot for good measure), so I don't think it's a RAM issue.

@EvanSchalton

EvanSchalton commented Jan 24, 2023

For me this seems to happen when my build context is over 24 MB (despite 64 GB RAM, 16 GB swap and 128 GB SSD space). I suspect, though I'm not sure how to validate it, that BuildKit is hitting a recursion issue with symlinks in the build dir and isn't exiting properly. I think that as my build context gets larger I lose insight into the dir structure, so I've been conflating the two issues. If I have time I'll create a minimal example with a recursive symlink build context and see if it causes a hang.
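
One way to check a build context for the kind of recursive symlink suspected above is to look for links that resolve to their own directory or to one of its ancestors. This is a rough sketch rather than a confirmed diagnosis of the hang; it assumes GNU find and readlink (-f), and the helper name find_ancestor_symlinks is hypothetical.

```shell
#!/usr/bin/env bash
# Print symlinks under DIR whose target is the link's own directory
# or one of its ancestors (following such a link recurses forever).
find_ancestor_symlinks() {
  local dir="$1" link target parent
  find "$dir" -type l -print0 | while IFS= read -r -d '' link; do
    # Resolve the link; skip it if it is broken or cannot be resolved.
    target="$(readlink -f -- "$link")" || continue
    [ -d "$target" ] || continue
    # Resolve the directory that contains the link, for a like-for-like comparison.
    parent="$(readlink -f -- "$(dirname -- "$link")")"
    case "$parent/" in
      "$target"/*) printf '%s\n' "$link" ;;
    esac
  done
}

# Example: find_ancestor_symlinks ./my-build-context
```

An empty result does not rule out other kinds of loops, such as two links in sibling directories pointing at each other.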

@OneBlue
Collaborator

OneBlue commented Jan 24, 2023

Thank you @alex-reach.

This is really interesting because your repro is a bit different from the one in #8696.

In your case, dllhost.exe is still there after resuming from hibernation, but it doesn't seem to be notified properly.
Let's capture some logs to figure out why. Can you please:

  • Start a log collection (see 8) Collect WSL logs (recommended method)
  • Reproduce the issue
  • Stop the log collection
  • Run the debug shell script I shared earlier
  • Dump wslservice.exe, dllhost.exe and vmwp.exe as I explained earlier
  • Share the dumps, the debug shell output, and the log result here

@alex-reach

@OneBlue Thanks for your feedback. In the meantime I had to revert to WSL 1.0.3.0, because the Apache service frequently stopped responding in the preview WSL 1.1.0.0. I will install the preview again in a few days to send you a new log collection.

@fontanka16

wsl.exe --debug-shell has as little effect as wsl --shutdown for me (none). I have been waiting for five minutes.
When looking into Task Manager, there is a lot of activity though:
[image: Task Manager screenshot]

I am not sure if any of the above commands kicked off that level of CPU usage

@N0xFF

N0xFF commented Feb 18, 2023

>   • Start a log collection (see 8) Collect WSL logs (recommended method)
>   • Reproduce the issue
>   • Stop the log collection
>   • Run the debug shell script I shared earlier
>   • Dump wslservice.exe, dllhost.exe and vmwp.exe as I explained earlier
>   • Share the dumps, the debug shell output, and the log result here

@OneBlue I repeated all your steps and collected logs and dumps.

My steps to reproduce:

  1. Run Docker Desktop.
  2. Open Bash.
  3. Sleep (not hibernate).
  4. Wake up.
  5. Collect logs and dumps.

-> LOGS

There is no dllhost.exe listed for vp9fs.dll:

PS C:\WINDOWS\system32> tasklist /m vp9fs.dll

Image Name                     PID Modules
========================= ======== ============================================
vmwp.exe                     11812 vp9fs.dll

@OneBlue OneBlue mentioned this issue Mar 10, 2023
2 tasks
@jak-sdk

jak-sdk commented Jun 21, 2023

This happens to me daily, for me it seems to be excessive interrupts from Hyper-V?

top - 10:21:03 up 10:07,  3 users,  load average: 1.04, 1.20, 1.12
Tasks:  72 total,   1 running,  71 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.3 us,  0.0 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  0.0 us,100.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  0.4 us,  0.0 sy,  0.0 ni, 99.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  0.9 us,  0.0 sy,  0.0 ni, 98.2 id,  0.0 wa,  0.0 hi,  0.9 si,  0.0 st
%Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   5930.4 total,   1543.8 free,    789.2 used,   3597.4 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   4855.8 avail Mem

Perf:

   PerfTop:    4561 irqs/sec  kernel:99.4%  exact:  0.0% lost: 0/0 drop: 0/0 [2200Hz cycles],  (all, 6 CPUs)
---------------------------------------------------------------------------------------------

    31.01%  [kernel]          [k] hv_ce_set_next_event
    23.17%  [kernel]          [k] native_apic_mem_write
    22.73%  [kernel]          [k] asm_sysvec_hyperv_stimer0
     3.16%  [kernel]          [k] read_tsc
     1.96%  [kernel]          [k] __sysvec_hyperv_stimer0
     1.88%  [kernel]          [k] read_hv_clock_tsc
     1.80%  [kernel]          [k] add_interrupt_randomness
     1.66%  [kernel]          [k] hrtimer_interrupt
     1.15%  [kernel]          [k] clockevents_program_event
     0.84%  [kernel]          [k] irq_exit_rcu
# cat /proc/interrupts  | grep HVS
HVS:  254943891  294333063  651585438  293415102  274733401  217126954   Hyper-V stimer0 interrupts

And to compare to normal behaviour:

   PerfTop:      24 irqs/sec  kernel:95.8%  exact:  0.0% lost: 0/0 drop: 0/0 [4000Hz cycles],  (all, 6 CPUs)
---------------------------------------------------------------------------------------------

    24.13%  perf              [.] __symbols__insert
    15.97%  perf              [.] rb_next
     5.58%  [kernel]          [k] kallsyms_expand_symbol.constprop.0
     4.60%  perf              [.] rb_insert_color
     3.60%  perf              [.] kallsyms__parse
     2.68%  libc.so.6         [.] 0x00000000001b2ef9
     2.25%  [kernel]          [k] vsnprintf
     2.06%  libelf-0.186.so   [.] gelf_getphdr
# grep HVS /proc/interrupts
HVS:       3439       3326       3454       4390       3559       3150   Hyper-V stimer0 interrupts
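
To compare a stuck and a healthy session, the HVS counters shown above can be totalled and sampled over time. A minimal sketch, assuming the HVS line format shown in this comment; the helper names hvs_total and hvs_rate are hypothetical:

```shell
#!/usr/bin/env bash
# Sum the per-CPU counters on the "HVS:" line of /proc/interrupts-style input.
# Prints 0 if no HVS line is present (for example when not running under Hyper-V).
hvs_total() {
  awk '
    $1 == "HVS:" {
      found = 1
      total = 0
      # Counters come first; stop at the trailing text description.
      for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
      printf "%.0f\n", total
    }
    END { if (!found) print 0 }
  ' "${1:-/proc/interrupts}"
}

# Approximate Hyper-V stimer0 interrupts per second over an interval (default 5 s).
hvs_rate() {
  local secs="${1:-5}" before after
  before="$(hvs_total)"
  sleep "$secs"
  after="$(hvs_total)"
  echo $(( (after - before) / secs ))
}

# Example: hvs_rate 10
```

The totals are cumulative since boot, so the rate between two samples is more telling than a single reading.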

@TheWizier

+1
