WSL complete freeze #8824

Closed
1 of 2 tasks
schiorean opened this issue Sep 13, 2022 · 70 comments

Comments

@schiorean

Version

Microsoft Windows [Version 10.0.22000.918]

WSL Version

  • WSL 2
  • WSL 1

Kernel Version

5.15.57.1

Distro Version

Ubuntu 22.04

Other Software

PhpStorm 2022.2.1

Repro Steps

  1. Open a large project in PhpStorm (it starts indexing project files...)
  2. Open another large project in PhpStorm (it starts indexing project files...)
  3. Keep switching between large projects and sooner or later WSL is completely frozen while PhpStorm is indexing files

Expected Behavior

PhpStorm finishes files indexing and WSL is usable.

Actual Behavior

WSL freezes completely including any wsl.exe command, so even wsl.exe --shutdown will hang forever. The only way to restart WSL is by doing a computer restart.

Diagnostic Logs

Uploading via Feedback Hub.

@schiorean
Author

Here are the Feedback Hub WSL logs: https://aka.ms/AAi265d

@OneBlue
Collaborator

OneBlue commented Sep 13, 2022

Thanks for reporting this @schiorean. Unfortunately I'm not seeing any logs on Feedback Hub. Can you share logs here?

@OneBlue
Collaborator

OneBlue commented Sep 13, 2022

/logs

@ghost

ghost commented Sep 13, 2022

Hello! Could you please provide more logs to help us better diagnose your issue?

To collect WSL logs, download and execute collect-wsl-logs.ps1 in an administrative powershell prompt:

Invoke-WebRequest -UseBasicParsing "https://raw.githubusercontent.com/microsoft/WSL/master/diagnostics/collect-wsl-logs.ps1" -OutFile collect-wsl-logs.ps1
Set-ExecutionPolicy Bypass -Scope Process -Force
.\collect-wsl-logs.ps1

The script will output the path of the log file once done.

Once completed, please upload the output files to this GitHub issue.


Thank you!

@schiorean
Author

@OneBlue here's the log. Many thanks!

WslLogs-2022-09-13_14-38-49.zip

@OneBlue
Collaborator

OneBlue commented Sep 13, 2022

Thank you @schiorean. I'm not seeing anything that jumps out in the logs.

This looks like the 'freeze' issue we've been investigating for a while.

If you can reproduce this consistently, can you please:

  • First clear everything by running wsl --shutdown (make sure nothing is running). If this is unresponsive, kill wslservice.exe.
  • Open a shell inside the system distro (and leave it open) via: wsl -u root --system
  • Install gdb in that shell via: [edited]
  • Run the steps you described to get WSL into that "frozen" state in your regular distro (not inside the system distro)
  • Once WSL is frozen, use the previously opened shell to dump all the 'init' processes via: gcore -a $(pgrep init)
  • This will generate a few core.* files; please share those on this issue
  • Also please share a dump of wslservice.exe (you can do this via Task Manager: in the 'Details' tab, right-click wslservice.exe, then 'Create dump file')
  • Also please share the output of dmesg in the system shell

This should give us enough information to understand what's happening.
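The dumps are collected by hand in this thread. If they end up on a machine with Python 3, a minimal sketch like the following could bundle the core.* files written by gcore into one zip and flag whether the result still exceeds the roughly 25 MB attachment limit that comes up later in the thread. The function name, the output path and the exact limit constant are assumptions for illustration only, not part of any WSL tooling.

```python
import glob
import os
import zipfile
from typing import List, Tuple

# Assumption: the ~25 MB GitHub attachment limit mentioned in this thread.
GITHUB_LIMIT_BYTES = 25 * 1024 * 1024


def bundle_core_files(src_dir: str, zip_path: str) -> Tuple[List[str], bool]:
    """Zip every core.* file in src_dir; return (names, fits_attachment_limit)."""
    cores = sorted(glob.glob(os.path.join(src_dir, "core.*")))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in cores:
            # Store only the file name, not the full directory path.
            zf.write(path, arcname=os.path.basename(path))
    fits = os.path.getsize(zip_path) <= GITHUB_LIMIT_BYTES
    return [os.path.basename(p) for p in cores], fits
```

Writing the zip outside src_dir (or under a name that does not start with "core.") keeps a rerun from picking up the previous archive as another core file. If fits comes back False, the thread's own fallback of OneDrive or Google Drive applies.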

@schiorean
Author

schiorean commented Sep 14, 2022

Hi @OneBlue,

I reproduced it again, but I don't have the gcore output because I can't figure out how to install gdb in the system shell (none of the usual commands are recognized, e.g. apt install gdb gives -bash: apt: command not found). Until I can install gdb in the system shell (I sent an email to [edited]), I am sharing the output of dmesg and the wslservice.exe dump here.

Some extra notes, in case they help:

  1. Even though wsl.exe is frozen the currently open shells are still usable (btw, can't I install gdb in the normal, non-system, shell and provide the dump from there?)
  2. The only way to restart WSL besides a full computer restart, is by opening a terminal as Administrator, then run hcsdiag list followed by hcsdiag kill.

I tried to upload the zip file containing the dumps here, but the file size is too big. I uploaded it again via Feedback Hub https://aka.ms/AAi265d, but when I open Details I can't see it. If you can't see it in Feedback Hub, please tell me where to send the file.

@OneBlue
Collaborator

OneBlue commented Sep 14, 2022

Oh sorry @schiorean, that was a missed copy-paste from me.

What I meant was:
"Install gdb in that shell via tdnf install gdb"

But there was an unrelated email address in my clipboard (edited out). Sorry about that.

Can you try that and share the Linux dumps? It should be possible to upload them directly on this issue.

@schiorean
Author

@OneBlue
I uploaded the dmesg output; the wslservice.exe dump is too big (> 25MB) and I can't upload it here.
dmesg.txt

Later today I hope to provide the gdb dump too.

@OneBlue
Collaborator

OneBlue commented Sep 15, 2022

Thank you @schiorean. Sadly nothing jumps out from dmesg so we'll need the dumps to root cause this.

If the dumps are too big for Github, OneDrive / Google drive should work.
Given the symptoms I'm suspecting that the issue is on the Linux side, so the Linux dumps should be the most interesting files for this issue.

@philmb3487

Hi, I am also getting random freezes like that.

@schiorean
Author

@OneBlue finally, attached are the core files.
And in case you missed it, here are the WSL service logs I uploaded a few days ago: #8824 (comment)

core.zip

@schiorean
Author

schiorean commented Sep 20, 2022

@OneBlue here's another core dump. This time it happened faster than the previous one: roughly 20 minutes after starting WSL and the system console.

core-take-2.zip

@nexton-winjeel

We've hit a similar issue here. I can't reliably reproduce it, but I do have a workaround that fixes our specific problem. In our case:

  • We have a CI build that is running inside WSL.
  • If Windows Defender starts scanning the build folder, WSL will freeze (as per @schiorean's description).
  • If we exclude the build folder from Windows Defender, we don't see this behaviour.

As I said, I can't reliably reproduce this issue, but it seems to happen most often when a file is deleted in WSL (it seems to occur more frequently in the clean stage of our build).
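The freezes are reported during heavy I/O, and most often around file deletion. A hypothetical stress loop like this one might make a repro more consistent. It only creates, reads back and deletes small files under a directory you choose. The round count, file count and file size are arbitrary, and nothing here is a confirmed trigger for this bug.

```python
import os


def churn_files(root: str, rounds: int = 10, files_per_round: int = 1000,
                size: int = 4096) -> int:
    """Create, read back and delete many small files under root.

    Returns the total number of files processed. All parameters are
    arbitrary illustration values, not a known reproducer.
    """
    payload = os.urandom(size)
    done = 0
    for r in range(rounds):
        batch = os.path.join(root, f"round-{r}")
        os.makedirs(batch, exist_ok=True)
        paths = [os.path.join(batch, f"f{i}.bin") for i in range(files_per_round)]
        for p in paths:
            with open(p, "wb") as f:
                f.write(payload)
        for p in paths:
            with open(p, "rb") as f:
                f.read()
        # The deletion phase is where the freeze reportedly showed up most often.
        for p in paths:
            os.remove(p)
        os.rmdir(batch)
        done += len(paths)
    return done
```

For example, run churn_files on a scratch folder inside the distro with a larger rounds value, optionally while Windows Defender is scanning that folder. The scratch path is up to you.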

@zed76r

zed76r commented Sep 21, 2022

There is another way to make WSL usable again:

  1. Press Win+S and search for WSL.
  2. Right-click it and choose "App settings" (应用设置 on my localized system).
  3. Then click "Reset".

@schiorean
Author

schiorean commented Sep 21, 2022

"If we exclude the build folder from Windows Defender, we don't see this behaviour."

@sypaq-nexton Unfortunately, I have had the WSL home folder in the "Virus & threat protection" exclusions for many months already, so it's not a solution (for me at least).

@OneBlue
Collaborator

OneBlue commented Sep 21, 2022

Thanks a lot for the dumps @schiorean.

We've published a new version of the Store WSL package. If you can still reproduce the issue with the latest version, can you please share the Windows and Linux dumps of the same repro?

That would help us a lot.

@schiorean
Author

@OneBlue wsl --update says I already have the latest version installed (0.66.2.0).

@OneBlue
Collaborator

OneBlue commented Sep 22, 2022

If you're not enrolled in Windows insider, that makes sense since we haven't published that package everywhere yet.

You can install it without an insider account by downloading the package and running something like this (elevated PowerShell):

$installedPackage = Get-AppxPackage MicrosoftCorporationII.WindowsSubsystemforLinux -AllUsers
Remove-AppxPackage $installedPackage -AllUsers
Add-AppxPackage /path/to/msixbundle

@schiorean
Author

@OneBlue so it happened again with 0.67.6.0. I attached the Linux core dump and dmesg.
I uploaded the wslservice.exe dump file via Feedback Hub again; if for some reason you can't access it, let me know and I will upload it to Google Drive.
core_take_3.zip
dmesg.log

@schiorean
Author

Actually here's the wslservice.exe dump as well https://drive.google.com/file/d/19nPGc6-NOpv_f0RSp8-5kgRwW1Y-1y5n

@OneBlue
Collaborator

OneBlue commented Sep 24, 2022

Thanks @schiorean.

After looking at all the dumps, I have a good idea of where the issue is, but I'll need a bit more info to root cause it.

I built a private version of the package with extra logging: https://1drv.ms/u/s!AiWXuqXSX5K2d45U1MHyOchps0k?e=w8SSjC
To install it (elevated powershell):

# Remove the installed package
$installedPackage = Get-AppxPackage MicrosoftCorporationII.WindowsSubsystemforLinux -AllUsers
Remove-AppxPackage $installedPackage -AllUsers

# Trust the private package's certificate (since it's not an official build, it's not signed with the official Microsoft certificate)
(Get-AuthenticodeSignature "/path/to/msixbundle").SignerCertificate | Export-Certificate -FilePath private-wsl.cert
Import-Certificate -FilePath .\private-wsl.cert -CertStoreLocation Cert:\LocalMachine\Root

# Install the package
Add-AppxPackage /path/to/msixbundle

Once the package is installed:

  • Before doing anything, please start the log collection with collect-wsl-logs.ps1
  • Open a shell inside the system distro (and leave it open) via: wsl -u root --system
  • Reproduce the hang
  • Inside the system distro shell, run: ss -lap --vsock, ls -la /proc/*/fd and dmesg and share the output of the three commands
  • Dump wslservice.exe

This should give us more information to identify what's happening.
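A small wrapper can capture the three requested outputs into separate files so none gets lost when the terminal is stuck. This is a minimal sketch that assumes a Python 3 interpreter is available in the shell where it runs. The output file names and the DIAG_COMMANDS mapping are hypothetical. ls -la /proc/*/fd is wrapped in sh -c because the glob needs a shell to expand it.

```python
import subprocess
from pathlib import Path
from typing import Dict, List

# Hypothetical mapping of output file -> command, mirroring the three requested commands.
DIAG_COMMANDS: Dict[str, List[str]] = {
    "vsock.txt": ["ss", "-lap", "--vsock"],
    "proc.txt": ["sh", "-c", "ls -la /proc/*/fd"],
    "dmesg.txt": ["dmesg"],
}


def capture(commands: Dict[str, List[str]], out_dir: str,
            timeout: float = 60) -> Dict[str, int]:
    """Run each command, save stdout+stderr to out_dir/<name>, return exit codes."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    codes: Dict[str, int] = {}
    for name, cmd in commands.items():
        try:
            res = subprocess.run(cmd, stdout=subprocess.PIPE,
                                 stderr=subprocess.STDOUT, text=True,
                                 timeout=timeout)
            (out / name).write_text(res.stdout)
            codes[name] = res.returncode
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Record the failure instead of aborting: a partial capture still helps.
            (out / name).write_text(f"failed: {exc}\n")
            codes[name] = -1
    return codes
```

Calling capture(DIAG_COMMANDS, "/tmp/wsl-diag") right after the hang would leave one file per command, ready to attach. The per-command timeout keeps a single stuck command from blocking the other two.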

@schiorean
Author

schiorean commented Sep 26, 2022

@OneBlue attached are the logs as per your latest instructions (it took 2-3 hours until the hang happened). The wslservice.exe dump is uploaded separately here: https://drive.google.com/file/d/19nPGc6-NOpv_f0RSp8-5kgRwW1Y-1y5n/view?usp=sharing

WslLogs-2022-09-26_11-09-22.zip
dmesg.txt
proc.txt
vsoc.txt

@OneBlue
Collaborator

OneBlue commented Sep 27, 2022

Thank you @schiorean.

With this information, our current theory is that this issue was introduced by the latest Linux kernel upgrade.

To validate this, can you please:

  • Add the following to %UserProfile%\.wslconfig, with kernel pointing to the 5.10 kernel image:

[wsl2]
kernel=C:\\path\\to\\kernel-5-10

(Make sure that every '\' is doubled.)

  • Then shutdown wsl via: wsl.exe --shutdown
  • Run: wsl uname -a to make sure that the correct kernel is in use. It should say: Linux [hostname] 5.10.102.1-microsoft-standard-WSL2 #1 SMP Wed Mar 2 00:30:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • Then try to reproduce the issue (please share dmesg output if you can reproduce again with this kernel)
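Two details in these steps are easy to get wrong: the doubled backslashes in the kernel= path, and confirming from uname -a which kernel actually booted. A small hypothetical helper pair in Python can cover both. The C:\path\to\kernel-5-10 path is the placeholder from the instructions, and the expected prefix 5.10.102.1 is taken from the expected uname output. Neither function is part of WSL.

```python
from typing import Optional


def wslconfig_kernel_section(kernel_path: str) -> str:
    """Build a [wsl2] section whose kernel= value has every backslash doubled."""
    escaped = kernel_path.replace("\\", "\\\\")
    return f"[wsl2]\nkernel={escaped}\n"


def kernel_release(uname_line: str) -> Optional[str]:
    """Return the release field (third token) of `uname -a` output."""
    parts = uname_line.split()
    return parts[2] if len(parts) >= 3 else None


def is_expected_kernel(uname_line: str, prefix: str = "5.10.102.1") -> bool:
    """True if the running kernel release starts with the expected version."""
    release = kernel_release(uname_line)
    return release is not None and release.startswith(prefix)
```

After wsl.exe --shutdown, passing the text of wsl uname -a to is_expected_kernel tells you whether the override took effect before you spend hours trying to reproduce.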

@schiorean
Author

schiorean commented Sep 27, 2022

@OneBlue sorry, but it happened again with the 5.10 kernel as well. Attached is the dmesg output; I didn't have the system console started, but I was able to get it from a normal console.
Let me know if I can do anything else to help you guys. This is really frustrating and I really don't want to leave Windows, it's perfect for my development besides this thing.

And when I dumped dmesg I confirmed the kernel version:

sorin@think:~$ uname -a
Linux think 5.10.102.1-microsoft-standard-WSL2 #1 SMP Wed Mar 2 00:30:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

dmesg.log

@michaelarnauts

I have the same problem as the OP. Same use case (PhpStorm causes WSL to hang during indexing, or probably anything that does heavy I/O). It happens multiple times a day, and I can reproduce it quite easily.

A faster way to get a fresh WSL env without rebooting (since the wsl --shutdown indeed hangs forever) is to kill wslservice.exe from the task manager.
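Because wsl --shutdown itself can hang, one option is to give it a deadline and only then fall back to killing the service. This is a minimal sketch of that idea in Python. The fallback uses the standard Windows taskkill /F /IM command in place of Task Manager, and it most likely needs an elevated prompt to kill a service process. The command lists and the 30-second default are assumptions. Both commands are parameters so the logic can be exercised without Windows.

```python
import subprocess
from typing import Sequence

# Windows commands assumed here; wslservice.exe is the process named in this thread.
SHUTDOWN_CMD = ["wsl.exe", "--shutdown"]
KILL_CMD = ["taskkill", "/F", "/IM", "wslservice.exe"]


def shutdown_or_kill(shutdown: Sequence[str] = SHUTDOWN_CMD,
                     kill: Sequence[str] = KILL_CMD,
                     timeout: float = 30) -> str:
    """Try a clean shutdown; if it is still running after `timeout`, kill the service."""
    try:
        subprocess.run(list(shutdown), timeout=timeout, check=False)
        return "shutdown"
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the hung wsl.exe child at this point.
        subprocess.run(list(kill), timeout=timeout, check=False)
        return "killed"
```

The return value says which path was taken. If even the kill step times out, TimeoutExpired propagates, which in this thread would be the point where only hcsdiag or a reboot helped.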

The output of collect-wsl-logs.ps1 is here:
WslLogs-2022-09-28_14-05-40.zip

wsl --version:

WSL version: 0.66.2.0
Kernel version: 5.15.57.1
WSLg version: 1.0.42
MSRDC version: 1.2.3401
Direct3D version: 1.606.4
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22000.1042

I'll try the "unpublished" package above and retry.

@XhmikosR

Unfortunately, this is definitely an issue for me too. I confirmed it on 2 different machines, and it makes WSL totally unusable. wsl --shutdown doesn't work either, so the only solution is to kill wslservice but it's not a real workaround since VS Code is stuck too...

C:\Users\user>wsl --version
WSL version: 0.70.4.0
Kernel version: 5.15.68.1
WSLg version: 1.0.45
MSRDC version: 1.2.3575
Direct3D version: 1.606.4
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22621.819

@bowmanjd

I, too, had this issue for some time. I got in the habit of closing all WSL windows at the end of the day, and this seemed to help. But I would still occasionally have freezing issues even after installing the fixed WSL version.

So... last week in a terrible and painful mistake, I deleted my WSL virtual disk for the distro I use the most. Since re-creating it, I haven't had this issue at all. Go figure. @XhmikosR, I never had the "totally unusable" experience that you had; it was more of a minor inconvenience. But if it is that bad, you may want to consider redoing your WSL distros.

For what it is worth, I moved from Fedora 36 to Fedora 37. Unsure if that is relevant, though.

@XhmikosR

The thing is that the issue just started to appear for me a few weeks ago. Before that, everything worked fine.

Now my Docker + VS Code setup breaks quite frequently during the day. The only workaround for me is stopping and starting wslservice, but it totally breaks my workflow.

I will try reinstalling Ubuntu, but there are other people having this issue, see also #9114.

@StewartWon

StewartWon commented Nov 18, 2022

I too have been having issues with this. It always occurs when doing a large C++ CMake build (30 minutes): WSL freezes. This started happening a couple of months ago and basically makes WSL useless. It seems like compiling in Visual Studio on Windows while also compiling in WSL makes it worse, but that may be a coincidence. This happens with multiple fresh Ubuntu installations too.

Also get:
The Windows Subsystem for Linux service is stopping........
The Windows Subsystem for Linux service could not be stopped.

A reboot is all that works.

@XhmikosR

XhmikosR commented Nov 19, 2022

I uninstalled any WSL Windows updates, removed my distro, disabled Virtual Machine Platform and WSL features and then installed everything again including the latest WSL from the store. I also gave WSL 6GB of RAM instead of 4GB that I was using in .wslconfig. Now it seems I'm not hitting the issue anymore, but unsure which thing exactly has fixed it.
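Raising the VM memory limit is a one-key change under [wsl2] in .wslconfig. A hypothetical helper that edits that key in place while keeping the other keys (such as a kernel= override) could look like this. The function name and the 6GB default are illustrative, taken from the comment above. The commenter could not say which of their changes fixed it, so this is not a known fix. Note that configparser drops comments when it rewrites the file.

```python
import configparser
from pathlib import Path


def set_wsl2_memory(config_path: str, memory: str = "6GB") -> str:
    """Set memory= under [wsl2] in a .wslconfig file, keeping the other keys."""
    # interpolation=None so '%' in Windows paths is not treated specially.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep key names exactly as written, not lowercased
    path = Path(config_path)
    if path.exists():
        cfg.read(path, encoding="utf-8")
    if not cfg.has_section("wsl2"):
        cfg.add_section("wsl2")
    cfg.set("wsl2", "memory", memory)
    with path.open("w", encoding="utf-8") as f:
        # Write key=value without spaces, matching the .wslconfig examples in this thread.
        cfg.write(f, space_around_delimiters=False)
    return path.read_text(encoding="utf-8")
```

Any .wslconfig change only takes effect after wsl --shutdown and a restart of the distro.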

@GeksterOne

GeksterOne commented Dec 28, 2022

I have the same with
WSL version: 1.0.3.0
Kernel version: 5.15.79.1
WSLg version: 1.0.47
MSRDC version: 1.2.3575
Direct3D version: 1.606.4
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.19045.2364

It happens mostly in PyCharm/IntelliJ, sometimes in the terminal.

I'm considering dropping WSL in favour of Linux itself. If it hasn't been fixed after a year, it seems Microsoft has spoiled the technology and doesn't know what to do.

@alonbl

alonbl commented Dec 28, 2022 via email

@GeksterOne

Added this. It helped in the case where PyCharm worked with files stored in the normal file system using a Python interpreter from WSL, but it didn't help for PyCharm with projects stored in the WSL file system. The product is still as raw as the meat in my fridge.

@achs0

achs0 commented Jan 27, 2023

Happens for me, too. WSL freezes completely while consuming 100% CPU.
I'm using Docker Desktop with a local k8s in WSL, which is mostly on standby because I am currently testing stuff in k8s.

WSL-Version: 1.0.3.0
Kernelversion: 5.15.79.1
WSLg-Version: 1.0.47
MSRDC-Version: 1.2.3575
Direct3D-Version: 1.606.4
DXCore-Version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows Version: 10.0.19045.2486

Processor: AMD Ryzen 9 5900X 12-Core Processor, 3701 MHz
RAM: 64 GB

The only thing that helps me is killing WSL in Task Manager and starting everything up again. It's annoying.

@lawndoc

lawndoc commented Feb 2, 2023

"Happens for me, too. WSL freezes completely while consuming 100% CPU."

Same problem, same versions for everything. It looks like our problem is being tracked over here.

@funduck

funduck commented Feb 10, 2023

Windows 11
Dell XPS 9310
WSL version: 1.0.3.0
Kernel version: 5.15.79.1
WSLg version: 1.0.47
MSRDC version: 1.2.3575
Direct3D version: 1.606.4
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22621.1105

I get freezes every day but cannot tell what exactly causes them. I work in WSL from VS Code, so I always hit the freezes there, not in Windows apps.
I do not use Docker.
I tried to collect logs, but I'm not sure there is anything useful, since the system just freezes and I can't let the log script finish.
https://drive.google.com/file/d/1ooPJDSywR-ogUjRR9_k6CIqACKIpgMsq/view?usp=sharing

@mungojam

See #9454 (comment) for the brute-force command to shut down WSL. Before that I always had to reboot.

@funduck

funduck commented Feb 16, 2023

The problem is that the whole of Windows 11 freezes and I can't do anything except hard-reset the machine by holding the power button for 10 seconds.

@samd1993

wsl.exe --shutdown works for me but I am unable to run anything for a long time on this.
I wonder if people experiencing this are utilizing multiple cores? Seems to happen to me when I run a CPU intensive program (CPU usage going to 100%)

@GeksterOne

After almost half a year since WSL stopped working, and with no reaction from Microsoft, I think it's time to say that the WSL project is dead. It's time to move on to Linux/Mac systems for development. We have stopped using Windows/WSL in our projects.

@samd1993

"After almost half a year since WSL stopped working, and with no reaction from Microsoft, I think it's time to say that the WSL project is dead. It's time to move on to Linux/Mac systems for development. We have stopped using Windows/WSL in our projects."

Does using Ubuntu on Windows 10 count as this, or did you switch to actual Linux machines?

@GeksterOne

GeksterOne commented Mar 25, 2023 via email

@schiorean
Author

I am the original reporter of this issue and it has been fixed for a while already. WSL is not dead at all; I am using it daily at my day job.
Your problem is a different problem, please open a new issue.

@jacksimpsoncartesian

I'm still getting this issue happen to me - I've wiped and reinstalled Windows 3 times, had Dell replace hardware like the battery and hard drive. Issue only happens after I've installed WSL, and then it occurs multiple times a day.

@lawndoc

lawndoc commented Oct 16, 2023

Hey all, I have had this problem for well over a year and I finally solved it for myself. For me the bug was in how WSL generates the /etc/hosts file inside the Linux subsystem. The hosts file generator appended a bad control character of some kind after the line that contained my hostname. Removing the character wasn't enough because WSL would regenerate the file, so I had to create /etc/wsl.conf and add:

[network]
generateHosts=false

Then removing the bad character fixed it and WSL works as expected.

I don't have this issue on my home computer, so part of me wonders if this issue happens only on domain-joined systems when the hostname line includes a domain computer name. I have no evidence to back this up of course, but I'd be curious to hear from anyone else who this helps whether or not the system with the issue was domain-joined.
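The comment does not say which control character it was. A minimal sketch for finding and removing such bytes could match the whole C0 range except tab, LF and CR, plus DEL. That character class is an assumption. The function names are hypothetical. The scan splits on "\n" rather than using splitlines(), because splitlines() treats \x0b, \x0c and \x1c-\x1e as line breaks and would hide exactly the characters being looked for.

```python
import re
from typing import List, Tuple

# C0 control characters except tab (\x09), LF (\x0a) and CR (\x0d), plus DEL (\x7f).
# Which byte WSL actually wrote is unknown, so this class is an assumption.
_CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def find_control_chars(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, hex_code) for each stray control character."""
    hits = []
    # split("\n") on purpose: splitlines() would swallow \x0b, \x0c, \x1c-\x1e.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for ch in _CONTROL.findall(line):
            hits.append((lineno, f"0x{ord(ch):02x}"))
    return hits


def strip_control_chars(text: str) -> str:
    """Remove the stray control characters, leaving tabs and line endings alone."""
    return _CONTROL.sub("", text)
```

As root, you could read /etc/hosts, check find_control_chars on its contents, and write strip_control_chars back. Per the comment above, generateHosts=false has to be in /etc/wsl.conf first, or WSL regenerates the file.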

@befuddled9

This happens to me as well when my laptop goes to sleep with an SSH session open. I would think that the session would just time out/die, but it hangs.

@Appla

Appla commented Dec 5, 2023

After upgrading to https://github.com/microsoft/WSL/releases/tag/2.0.12 the issue disappeared
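Several comments compare wsl --version output, and this one reports the issue gone from 2.0.12 onward. A hypothetical helper can parse that output and compare it against a minimum. The 2.0.12 threshold comes only from this single report, not from release notes. The regex covers the English "WSL version:" line and the German "WSL-Version:" form seen earlier in the thread, and it deliberately skips the "WSLg version:" line. Other localizations are not handled.

```python
import re
from typing import Optional, Tuple

# Matches "WSL version: 2.0.14.0" and "WSL-Version: 1.0.3.0", but not "WSLg version:".
_WSL_VERSION = re.compile(r"^\s*WSL[ -]version:\s*(\d+(?:\.\d+)*)",
                          re.IGNORECASE | re.MULTILINE)


def parse_wsl_version(output: str) -> Optional[Tuple[int, ...]]:
    """Extract the WSL package version from `wsl --version` text as an int tuple."""
    match = _WSL_VERSION.search(output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def at_least(version: Optional[Tuple[int, ...]],
             minimum: Tuple[int, ...] = (2, 0, 12)) -> bool:
    """Element-wise tuple comparison, so (2, 0, 14, 0) >= (2, 0, 12) is True."""
    return version is not None and version >= minimum
```

For example, the 0.70.4.0 and 1.0.3.0 outputs quoted above both fall below the threshold, which is consistent with those commenters still seeing freezes.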

@vlinx

vlinx commented Dec 7, 2023

Try:

wsl --update --pre-release

@xz-viewray

Just updated to wsl 2.0.14. Will see if the issue has been fixed.
