vhdx files becoming corrupted since 2.0.4.0 pre-release install #10609

Open
1 of 2 tasks
jtabox opened this issue Oct 8, 2023 · 27 comments

Comments

@jtabox

jtabox commented Oct 8, 2023

Windows Version

Microsoft Windows [Version 10.0.22621.2361]

WSL Version

2.0.4.0

Are you using WSL 1 or WSL 2?

  • WSL 2
  • WSL 1

Kernel Version

5.15.123.1-1

Distro Version

Ubuntu 22.04

Other Software

No response

Repro Steps

I can't really reproduce this; I was mostly wondering if anyone else has had their virtual hard disks gradually become corrupted with the pre-release version of WSL2.
I installed v2.0.4.0 two days ago and already had an Ubuntu 22.04 (the standard distro from the Store) installed. After some time I suddenly started getting filesystem read-only errors. I've had this distro installed (same vhdx file) for almost a year now and never had a similar issue (or any other, for that matter). The only things that have changed are the pre-release version and two settings activated in .wslconfig, autoMemoryReclaim=gradual and sparseVhd=true (I also ran --manage --set-sparse true for my already existing image).
Google said it's probably a corrupted disk image, e2fsck found errors and supposedly repaired them, but the read-only errors persisted. I loaded a copy of the vhdx file that I had from before the pre-release install, soon enough it also started throwing read-only errors. Debug console showed the root filesystem was being mounted with errors, I would correct them with e2fsck but they persisted.
I ended up nuking both files, and did a fresh distro install. Went fine but at some point it started throwing corruption errors too, this time dpkg wouldn't run because of corrupted files.
I've now spent the last two days uninstalling and reinstalling WSL and testing out distros, but have been having the same issue. I wonder if it could be the sparseVhd option. Has anyone else had any similar issues? I've deactivated the option for now and watching if I get a corrupted file again, if I do I'll probably revert to the previous release version.
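
For readers who want to check whether their own configuration has the option turned on, here is a minimal Python sketch. It assumes that .wslconfig is plain INI text and that sparseVhd sits under an [experimental] section; the section name and the helper name `sparse_vhd_enabled` are assumptions for illustration and are not stated in this report.

```python
import configparser


def sparse_vhd_enabled(wslconfig_text: str) -> bool:
    """Return True if the given .wslconfig text enables sparseVhd.

    Assumption: the key lives under an [experimental] section.
    """
    # interpolation=None keeps '%' characters in values from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(wslconfig_text)
    # Option names are matched case-insensitively by configparser.
    return parser.getboolean("experimental", "sparseVhd", fallback=False)


if __name__ == "__main__":
    sample = "[experimental]\nautoMemoryReclaim=gradual\nsparseVhd=true\n"
    print(sparse_vhd_enabled(sample))
```

A missing section or key is treated as "not enabled" here, which is a design choice for this sketch, not a statement about WSL's own defaults.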

Expected Behavior

Mainly I'd expect my vhdx files not to become corrupted 😅

Actual Behavior

They became corrupted.

Diagnostic Logs

No response

@benhillis
Member

@jtabox - is it possible your distro vhd is full?

@OneBlue
Collaborator

OneBlue commented Oct 9, 2023

/logs

@jtabox
Author

jtabox commented Oct 9, 2023

@jtabox - is it possible your distro vhd is full?

No, I don't think so. As I said, this happened with fresh installs of Ubuntu 22.04 directly from the Store, so I hadn't installed many things. Maybe a miniconda installation at most; the vhdx files were never above 2 GB.

Regarding logs, I have actually not had a similar issue since I deactivated sparseVhd in the configuration file. I have installed a lot of things by now, the image is at 30 GB at the moment, and it seems to be working fine. So I don't know how much use any logs would be now.
I'm not sure if my corruption issues were directly related to sparseVhd option, or indirectly, in some obscure way. But the fact is I haven't had any corruption the last day or so, while previously with sparseVhd enabled, the image would become corrupted within an hour or two.
I'll try to make a backup of my working installation and re-enable sparseVhd and see if the issues come back.

@OneBlue
Collaborator

OneBlue commented Oct 10, 2023

Thank you @jtabox. Can you try to reproduce the issue under log collection? We'd need to see logs covering the moment the disk goes from normal to corrupt in order to root-cause the issue.

/logs

@ASleepyCat

ASleepyCat commented Oct 11, 2023

I've also had this issue when enabling sparseVhd and --set-sparse on Ubuntu. It's also affecting files outside the VM for me:

  • Games that are on my C:/ drive were getting corrupted (random crashes, softlocks). Reinstalling them fixed the issues
  • Steam forgot that most of my games were installed on my D:/ drive
  • I was getting notifications to restart my computer to fix disk corruption

These issues started from the very first 2.0.0 pre-release version.

Edit: My drive got corrupted again, although this time I had disabled sparseVhd on my distro. I guess my C:/ drive corruption issues are unrelated?

@NGRhodes

NGRhodes commented Oct 11, 2023

This is a clean Win11 install today, fully updated and running 2.0.4 with sparseVhd in my .wslconfig and --set-sparse against Ubuntu

I run touch test successfully, activate a conda env, and try running PyCharm from Windows (remote connection to WSL2). Then I try touch test2 and get a read-only error. All of this happens within a few minutes.
Here are my logs for the above.

WslLogs-2023-10-11_22-06-43.zip

This is chkdsk straight after:

```
The type of the file system is NTFS.

WARNING!  /F parameter not specified.
Running CHKDSK in read-only mode.

Stage 1: Examining basic file system structure ...
Attribute list for file 6196 is corrupt.
Attribute list for file 6197 is corrupt.
  270592 file records processed.
File verification completed.
 Phase duration (File record verification): 5.54 seconds.
File record segment 1783E is an orphan.
File record segment 1783F is an orphan.
File record segment 2FBD0 is an orphan.
File record segment 2FBD1 is an orphan.
  8188 large file records processed.
 Phase duration (Orphan file record recovery): 10.84 milliseconds.

Errors found.  CHKDSK cannot continue in read-only mode.
```
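
When comparing several chkdsk runs, it can help to pull out only the lines that indicate damage. The following is a rough Python sketch that matches the three message patterns visible in the output above; the function name `summarize_chkdsk` is hypothetical, and other chkdsk messages are simply ignored.

```python
import re
from typing import Dict, List, Union


def summarize_chkdsk(output: str) -> Dict[str, Union[List[str], bool]]:
    """Collect the damage indicators seen in the chkdsk output above."""
    # File numbers whose attribute list was reported as corrupt.
    corrupt = re.findall(r"Attribute list for file (\w+) is corrupt", output)
    # File record segments reported as orphans.
    orphans = re.findall(r"File record segment (\w+) is an orphan", output)
    return {
        "corrupt_attribute_lists": corrupt,
        "orphaned_segments": orphans,
        "errors_found": "Errors found." in output,
    }


if __name__ == "__main__":
    sample = (
        "Attribute list for file 6196 is corrupt.\n"
        "File record segment 1783E is an orphan.\n"
        "Errors found.  CHKDSK cannot continue in read-only mode.\n"
    )
    print(summarize_chkdsk(sample))
```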

@jtabox
Author

jtabox commented Oct 11, 2023

@NGRhodes So you're getting similar errors I assume? With sparseVhd active? At least I'm not the only one, I haven't seen any similar feedback so I was worried it's something specific to my PC. Do you use antivirus software? Besides Microsoft's Defender. I've installed Avast Free Antivirus recently, and it's been a bit too eager to block and meddle in stuff in general, so I was wondering if it's related in some way.

@ASleepyCat Luckily I haven't had any issues with anything else outside the vhdx file getting corrupted. Gotta admit it sounds a bit far-fetched, I'd assume sparseVhd only affects the vhdx files, though I might be totally wrong here.

@NGRhodes

NGRhodes commented Oct 11, 2023

wsl --manage Ubuntu --set-sparse false and I can still reproduce the error.

@jtabox - I have tried with Kaspersky and Windows Defender and the drive goes read-only in both cases.

@zirco77

zirco77 commented Oct 13, 2023

I had a similar issue

Context:

  • had an existing Ubuntu22.04 on WSL 1.2.5 Release. Windows 11 Pro 22H2, build 22621.2361
  • upgraded WSL to 2.0.3
  • added sparseVhd=true in .wslconfig
  • ran --manage --set-sparse true on existing image

Then I started to get random problems and did a few shutdowns/restarts of WSL, until I figured out that the file system was locking into read-only after less than a minute of use after each "reboot" of the WSL distro.

Attempts to fix:

  • removed sparseVhd from .wslconfig and ran --manage --set-sparse false. No luck, same problems.
  • created a new distro, mounted the broken ext4.vhdx in it, ran e2fsck which found (and reportedly fixed) many errors. Back in first distro I had the same problem (and e2fsck would not find any new errors.)

I ended up copying most of my data/config files from the first to the second distro (Ubuntu22.04 as well), re-installed what I needed, and deleted the first one. It was simply broken.

I've used the second distro every day for over a week and it's working totally fine under WSL 2.0.3. I strongly suspect --set-sparse true to be the cause, and I didn't take the chance of enabling it on the second distro. I didn't have time to try to reproduce the issue though.

@zavocc

zavocc commented Oct 22, 2023

This also marks areas of the C: drive as dirty! In my case, setting sparse not only caused read-only errors everywhere on the distro filesystem but also caused minor filesystem corruption on the host: chkdsk (in Windows RE) reports free space that could not be properly freed up.

image

However, I managed to fix the read-only file system errors by running e2fsck from the WSL system distro (wsl --system). I mounted the vhdx I was using with
wsl --mount --vhd .\ext4.vhdx --bare and then ran e2fsck /dev/sdc -f -y, and it works, though it can corrupt some files (in my case, Oh My Zsh now prints a lot of errors).
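
The sequence described in this comment can be written down as a dry-run checklist. The Python sketch below only formats the three commands from the comment and executes nothing; `guest_repair_steps` is a hypothetical helper, and the device name is an assumption that has to be verified on each machine, since /dev/sdc was simply what the disk appeared as for this commenter.

```python
from typing import List


def guest_repair_steps(vhdx_path: str, device: str) -> List[str]:
    """Return the repair steps from the comment as printable command lines.

    Nothing is executed. `device` is whatever block device the bare-mounted
    disk shows up as inside WSL; it is not guaranteed to be /dev/sdc.
    """
    return [
        # Host side: attach the vhdx to the WSL VM without mounting its filesystem.
        f"wsl --mount --vhd {vhdx_path} --bare",
        # Host side: open a shell in the WSL system distro.
        "wsl --system",
        # Inside that shell: force a full check and answer yes to every fix.
        f"e2fsck {device} -f -y",
    ]


if __name__ == "__main__":
    for step in guest_repair_steps(r".\ext4.vhdx", "/dev/sdc"):
        print(step)
```

Keeping this as a printed checklist rather than an automated script is deliberate: e2fsck with -y accepts every proposed fix, so working on a copy of the vhdx is the safer route.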

@ChGen

ChGen commented Mar 15, 2024

I experimented with pre-release versions of WSL2 and the sparseVhd option too, and I also ran fstrim in WSL2. It seems that my host Windows 11 23H2 NTFS filesystem is now quite corrupted (and the ext4 vhd too, by the way), to the point that sfc and dism cannot repair it.
Quite dangerous stuff...

@AlexeyMatskevich

I enabled sparseVhd a few months ago. For the last month, my filesystem in Ubuntu has been getting corrupted every other programming session, especially when running Docker Desktop.
Prior to enabling this option, I had been using this system for over a year and had no problems. The Windows file system also started to get corrupted when using WSL; when I don't use WSL, this behaviour is not observed.

@jtabox
Author

jtabox commented Apr 15, 2024

I love WSL as a concept and for the amazing utility it offers for free, and I truly appreciate the work being poured into it. But honestly, I'm staying as far away from sparseVhd as humanly possible, at least for the time being.

Since I opened this issue a few months ago, every one of the 3-4 times I changed my mind and decided to give sparseVhd one more try, it has always ended with me straight up deleting the test distro's vhdx file within the first hour of use and having to create a new one from scratch (after deactivating sparseVhd of course) because it's impossible to fix its corruption issues.

There surely must be some kind of interaction between sparseVhd and something on my part, but I can't figure out what it is, after multiple tries. Luckily the host system doesn't seem to have been corrupted so far, but I don't dare use my main Windows PC for my tries.

@BtbN

BtbN commented Apr 21, 2024

Just chiming in here to say that I've observed exactly what's being described here as well.
Any distro with sparseVhd enabled will eventually suffer FS corruption.
It also resulted in FS corruption of the host NTFS, to the point that I was about to throw out my SSD.

@Krmloo

Krmloo commented May 21, 2024

Same problem, both with WSL2 corruption and the host drive.

@btrude

btrude commented May 26, 2024

I am also experiencing the same WSL and host corruption as everyone else since switching to --set-sparse true.

@widewind2015

Unfortunately, my vhdx file gets corrupted after a few hours when I enable --set-sparse true.

@BtbN

BtbN commented Jun 6, 2024

unfortunately, my vhdx file gets corrupted after hours when I enable --set-sparse true.

Run chkdsk on your host fs while you still can, and delete any vhds that were in sparse mode.

@Krmloo

Krmloo commented Jun 6, 2024

So I've managed to somewhat curb the corruption:

  • Remove all WSL distros that have at any point in time been set to sparse mode
  • Delete the vhds themselves
  • Run chkdsk, sfc /scannow, dism restore multiple times until all of them reported no detectable corruption
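
The "run multiple times until clean" step above can be sketched as a small loop. This Python example is only an illustration: the exact flags in `HOST_CHECKS` are assumptions (the comment says just "chkdsk", "sfc /scannow" and "dism restore"), and how a tool's result is judged clean is left to a caller-supplied function, because that interpretation is not described in this thread.

```python
from typing import Callable, Optional, Sequence

# Host-side checks named in the list above. The flags are assumptions:
# "dism restore" is read here as DISM's RestoreHealth operation.
HOST_CHECKS: Sequence[str] = (
    "chkdsk C: /scan",
    "sfc /scannow",
    "DISM /Online /Cleanup-Image /RestoreHealth",
)


def rounds_until_clean(
    run_check: Callable[[str], bool], max_rounds: int = 5
) -> Optional[int]:
    """Repeat every check until one full round reports no corruption.

    run_check(command) must return True when that tool reported a clean
    result. Returns the number of the first clean round, or None if
    max_rounds was reached without one.
    """
    for round_no in range(1, max_rounds + 1):
        # Run all checks every round, even if an earlier one already failed.
        results = [run_check(cmd) for cmd in HOST_CHECKS]
        if all(results):
            return round_no
    return None
```

In practice `run_check` would launch the command from an elevated prompt and inspect its report; a stub that returns canned results is enough to exercise the loop.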

@CheyenneForbes

My VHDXs are showing 0 bytes also, can the data be recovered?

@albertocavalcante

It has been almost a year since this feature was released and, from the looks of it, we still have no fix for the corruption problem?

The recommendation at MicrosoftDocs/WSL#1855 should at least be changed.

@jtabox
Author

jtabox commented Aug 21, 2024

It has been almost a year since this feature was released and, from the looks of it, we still have no fix for the corruption problem?

The recommendation at MicrosoftDocs/WSL#1855 should at least be changed.

Ever since I opened this thread, I've been consistently and periodically getting notifications of a new post here, so I'm really curious what the cause might be. Still, we're a small minority that's getting the corruption issue, so I assume there must be something specific in our PCs that interacts with WSL in such a catastrophic way. There would be way more open issues if this was a widespread problem.

At this point I've just given up on the sparseVhd option completely, and if I'm being honest, even if the issue is fixed in a future update, I still won't be activating the option. The consequences are way too annoying and disruptive, and I don't have any spare PCs to test on.

So as long as sparseVhd is not implemented as a default option, I'm fine with it taking time to resolve.

@BtbN

BtbN commented Aug 21, 2024

This is still an experimental feature for which you need to go out of your way to set a flag.
And then when your disk corrupts, you also need to notice it, and make the connection that this is the cause.

For me, on multiple systems, turning on the sparseVhd feature very reliably corrupts the filesystem of both guest and host, so I highly doubt it's system dependent.

@aont

aont commented Aug 21, 2024

Just FYI.

I was also facing this issue and gave up using sparseVhd.
In my experience, using Docker made the corruption happen more frequently.
Docker may not be the direct cause, but it could be a key to reproducing this issue.

@ChGen

ChGen commented Aug 21, 2024

Yes, I've noticed this issue while playing with docker and fstrim commands.

@BtbN

BtbN commented Aug 21, 2024

Docker probably just creates and deletes A LOT of files, so it gives the sparse stuff a lot more work to do and a lot more chances to make a mess.

@devilhyt

Same issue here.

I set the VHD to sparse, got filesystem read-only errors in WSL, fixed them with e2fsck. Then Win11 reported a hard drive issue, and after the automatic repair, I couldn’t boot into it anymore🥲.
