WSL filesystem keeps accumulating corrupted file and directory entries #1662

Closed
ebluestein opened this issue Feb 4, 2017 · 43 comments

Comments

@ebluestein

I keep ending up with inaccessible files and directories, usually after performing source control operations that delete or edit a bunch of files. I'm manipulating the directories from bash, and the files are all on my Windows file system under /mnt/c/Users/...

From the bash side, I see something like this:

ls -la
ls: cannot access 'contrib': No such file or directory
ls: cannot access 'hooks': No such file or directory
total 12
drwxrwxrwx 0 root root 0 Feb 3 17:48 .
drwxrwxrwx 0 root root 0 Feb 3 17:47 ..
d????????? ? ? ? ? ? contrib
drwxrwxrwx 0 root root 0 Feb 3 17:48 .git
d????????? ? ? ? ? ? hooks

On the Windows side, trying to navigate to the directory in Explorer returns access denied. Trying to view or change the permissions also results in access denied. Restarting LxssManager usually clears up the issue, but sometimes I've had to reboot. My IDE on Windows was trying to read one of these files and ended up hanging forever. One time when this happened, I saw the CreateFile call from the IDE fail with DELETE_PENDING in SysInternals process explorer, but that error does not always occur.

  • Your Windows build number: 15014.rs_prerelease.170115-1253
@Warblefly

I've seen this too on build 15025, haven't noticed it before. Exactly the same symptoms. After a couple of minutes, and several attempts to delete the inaccessible file both from bash and Windows Explorer, the file disappeared of its own accord.

Reverted to 15019 because of another problem (that's apparently been fixed for the next release).

@fpqc

fpqc commented Feb 5, 2017

Just close bash; I think it should clean these up by itself. IIRC @Manouchehri had this problem and they just disappeared after closing bash and maybe rebooting.

@ebluestein
Author

No, closing bash does not resolve the problem.

@Manouchehri
Contributor

Try rebooting.

@ebluestein
Author

The poison directory entries do not disappear on reboot either, but after rebooting and ensuring LXSS is not running, it is generally possible to delete them from the cmd side of the house.

@ebluestein
Author

Still seeing this on build 15048, it's extremely disruptive. Any update?

@aseering
Contributor

aseering commented Mar 9, 2017

For what it's worth, this has never happened to me. I've used DrvFs quite a bit on each of the builds listed above.

Do y'all have a reproducer? Ideally a script or series of commands that consistently (possibly after a few retries) results in a corrupt file.

@ebluestein
Author

Yes, 100% repro any time I try to delete a directory.

$ mkdir ./foo
$ rmdir ./foo
$ ls -la
ls: cannot access 'foo': No such file or directory
total ...
drwxrwxrwx 0 root root 512 Mar 8 22:01 .
drwxrwxrwx 0 root root 512 Mar 6 18:59 ..
d????????? ? ? ? ? ? foo

Same repro if I use rm -rf instead of rmdir.

Happy to collect any relevant logs.

@aseering
Contributor

aseering commented Mar 9, 2017

Hm... That works fine for me:

$ mkdir ./foo
$ rmdir ./foo
$ ls -la
total 4097
drwxrwxrwx 0 root root 512 Mar  9 09:36 .
drwxrwxrwx 0 root root 512 Mar  9 09:35 ..

I unfortunately don't know what logs would be relevant. Hopefully one of the WSL devs will chime in?

One theory: Do you happen to have any sort of third-party antivirus or disk-indexing tool installed (or any other program that might try to read on-disk content as soon as it's written)? If so, could you try either disabling it temporarily, or telling it to ignore some directory and running this test again in that directory, to see if that fixes the issue?

@ebluestein
Author

OK, I've narrowed down the issue: the file trees in which this repros are being watched by a file system watcher used by source control, which uses the inotify syscalls that WSL added a couple of months ago. My best guess is that the inotify implementation is hanging on to a reference to the underlying directory object after it's been deleted, causing it to linger until LxssManager is stopped.

@sunilmut
Member

sunilmut commented Mar 9, 2017

Adding @JasonLinMS to see if he can help here.

@ebluestein
Author

Thanks Sunil!

@JasonLinMS

Inotify on DrvFs does hang on to the watched file. When the file is deleted, it ends up in a "limbo state" where it still shows up in Explorer and gives an access denied when you try to open it, but the file should disappear and be cleaned up once the inotify watch on it is stopped (and all other open handles to the file are closed). Worst case, you would need to close Bash for the files to be cleaned up properly. The fact that you need to shut down LxssManager or even reboot is strange. Do you have any repro instructions for easily getting some files corrupted?
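
One way to spot entries in this limbo state from the bash side is to look for names that the directory listing still returns but that can no longer be stat'ed, which is what the d????????? rows in ls -la indicate. The following is only a sketch: find_limbo_entries is a hypothetical helper name, and it assumes file names contain no newlines.

```bash
# Print entries of DIR whose names are listed but which cannot be stat'ed.
# find_limbo_entries is a hypothetical helper, not part of WSL or any tool.
find_limbo_entries() {
    local dir="${1:-.}" name
    # ls -A prints the names found in the directory; its own errors are discarded.
    ls -A "$dir" 2>/dev/null | while IFS= read -r name; do
        # -e fails when stat fails; -L keeps dangling symlinks out of the report.
        if [ ! -e "$dir/$name" ] && [ ! -L "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
}
```

For example, find_limbo_entries /mnt/c/some/dir/path would print one path per stuck entry and nothing for a healthy directory.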

@ebluestein
Author

The file watcher I'm using is Watchman (https://facebook.github.io/watchman/)

My repro steps would be:

  1. Install Watchman
  2. Have watchman watch a directory ($ watchman watch-project /mnt/c/some/dir/path)
  3. Create a new directory under the watched directory ($ mkdir /mnt/c/some/dir/path/foo)
  4. Remove the new directory

This is a 100% repro for me.
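
The steps above can be sketched as a small script. This is an illustration under assumptions, not a verified reproducer: repro_rmdir_under_watch is a hypothetical name, the watch step is skipped when watchman is not on the PATH, and the check only looks at whether the deleted name is still listed.

```bash
# Returns 0 if the removed directory is really gone, 1 if its name is still
# listed in the watched root, 2 on setup failure.
# repro_rmdir_under_watch is a hypothetical helper name.
repro_rmdir_under_watch() {
    local root="$1" name="limbo-repro-$$"
    [ -d "$root" ] || return 2
    # Step 2: have watchman watch the directory (skipped if not installed).
    if command -v watchman >/dev/null 2>&1; then
        watchman watch-project "$root" >/dev/null || return 2
    fi
    # Steps 3 and 4: create a directory under the watched root, then remove it.
    mkdir "$root/$name" || return 2
    rmdir "$root/$name" || return 2
    # On a healthy filesystem the name is no longer listed.
    if ls -A "$root" 2>/dev/null | grep -qxF "$name"; then
        return 1
    fi
    return 0
}
```

Running repro_rmdir_under_watch /mnt/c/some/dir/path and checking $? gives a quick yes/no answer that is easy to paste into a report.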

I think DrvFs needs to let go of its handle to the watched file when the file is deleted; holding on to it isn't the behavior you'd see on Linux. These types of fs watchers don't typically exit, so the file would never be let go of. Exiting all instances of bash usually (but not always) gets it out of this state, but that is super painful because bootstrapping this filesystem watcher takes a really long time for a large directory (on the order of 10 minutes!), and I have to do that every time I perform an operation that deletes a directory.

Occasionally when I get into this state, and exit all instances of bash, when I try to start bash again I get error 0x8000005. In those cases I have to reboot to get WSL working again.

Thanks!

@ebluestein
Author

I just got into this state without watchman running actually. Exiting bash and stopping LxssManager did not resolve the issue, forcing me to reboot. Are there any logs I can collect for you?

@JasonLinMS

Do you mean that this issue still happened even without you running any file watching program?

@ebluestein
Author

Yes, that's correct - this time it happened without any file watching program.

@JasonLinMS

I tried to repro your issue with watchman, but couldn't unfortunately.
If I create a new directory (under the watched directory) and then delete it, the new directory does indeed hang out in explorer, until I close all my Bash windows. This part is expected - it is a limitation of NTFS that we are working to resolve.

@JasonLinMS

Just throwing ideas out here, have you ever tried re-installing your entire Bash installation?

@ebluestein
Author

Now I'm wondering if I'm falsely blaming Watchman (seemed like too much of a coincidence that it reproed for dirs under the watch root, but not elsewhere).

I have not tried reinstalling Bash, might be worth a shot.

@JasonLinMS

Well, the issue you are seeing is definitely due to Bash hanging on to file handles, and inotify does open a lot of handles. If you are seeing this issue so often, it may be worth tar-ing up all your files in /home, then doing a fresh re-install of Bash. But I can't guarantee that your issue will be resolved.
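
A minimal sketch of the backup step, assuming GNU tar is available inside Bash. backup_dir and the archive path in the usage note are hypothetical; the archive should live outside the Linux root filesystem, since a full reinstall may remove it.

```bash
# Archive a directory so it can be restored after a reinstall.
# backup_dir is a hypothetical helper; SRC is the directory to save and
# ARCHIVE is where the compressed tarball is written.
backup_dir() {
    local src="${1%/}" archive="$2"
    [ -d "$src" ] || return 2
    # -C switches to the parent so paths in the archive start at the
    # directory name (for example home/...), not at the filesystem root.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")"
}
```

For example, backup_dir /home /mnt/c/Users/.../wsl-home.tar.gz, followed later by tar -xzpf on the same archive with -C / to restore it.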

@ebluestein
Author

Fair enough, thanks for your help!

@JasonLinMS

No problem, hopefully it can resolve this issue. The interesting thing is that only an extremely small number of users have reported issues similar to this.

@lephyrus

lephyrus commented Mar 17, 2017

@JasonLinMS For what it's worth, I'm hitting this as well - "this" being the issue of open file handles after DrvFs file manipulations, which lead to inaccessible and undeletable files. The "manipulation" in my case is doing a gradle build and then cleaning it. I'm on build 15048.

In contrast to the inotify-related issue, WSL sees no relevant open handles (using lsof), and as reported by @ebluestein, the problem persists even after closing all WSL windows. From the Windows side, the SYSTEM process (PID 4) is reported to still hold open handles on the affected files.
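
For anyone who wants to cross-check the lsof result from inside WSL, here is a rough sketch that scans the /proc/<pid>/fd symlinks instead. It assumes /proc exposes open descriptors as it does on Linux, open_handles_under is a hypothetical helper name, and it can only see handles held by Linux processes, not ones held by the Windows SYSTEM process.

```bash
# Print every open file descriptor whose target lies under DIR.
# open_handles_under is a hypothetical helper; descriptors of processes
# owned by other users are silently skipped unless run as root.
open_handles_under() {
    local dir fd target
    dir=$(readlink -f "$1") || return 2
    for fd in /proc/[0-9]*/fd/*; do
        # Each fd entry is a symlink to the opened path.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            # Also matches unlinked files, shown as "path (deleted)".
            "$dir"|"$dir"/*) printf '%s -> %s\n' "$fd" "$target" ;;
        esac
    done
}
```

An empty result from open_handles_under /mnt/c/some/dir/path would be consistent with the observation that nothing on the WSL side is holding the files.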

So far, only a reboot solves the issue for me. I'd never tried restarting LxssManager before, but now that I have, it actually hangs in the "stopping" state. Rebooting in the middle of your work is pretty frustrating. Is there anything we can try or do to help solve this?

@JasonLinMS

@lephyrus Please help answer the following: Have you been using any inotify-based file watchers when this issue repros? Also, do you have any third-party file system related drivers/filters/apps installed or any third-party anti-virus software installed? Have you reinstalled WSL recently, or have you been using the same WSL installation across many Insider builds?

@lephyrus

@JasonLinMS I'm afraid I can't repro the problem currently, so I can't really comment with certainty. I'm pretty sure there was no watcher running. As I said, the effects of that looked different and I could solve them "inside" WSL without needing to reboot. I used to run Novell Filr (cloud sync), which actually caused BSODs in connection with WSL; I've reported that issue and have not run it since. I'm not aware of having installed any other possibly relevant software and am only using Windows Defender. I've used this WSL installation across 2 or 3 slow ring builds.

Not being able to repro reliably, I can't really make clear-cut statements... sorry.

@francesco1119

Here we go: I created a folder named support with mkdir support, then removed it with rm -R support, and hell

@joelebeau

Just to add more info to this - I ran into this problem with a directory in /mnt/d/ and after about 30 minutes of troubleshooting, I realized that I had it added as a project folder in Atom (running at the time). I closed Atom and it then allowed me to remove the directory.

@mathiscode

@joelebeau - Thank you! Removing the project folder from Atom also resolved this issue in my case.

@atif-hussain

I am getting the same error. I renamed a file, MPC.cpp, to MPC_cpp as a backup (i.e. it was not in use by any program), and now neither of them can be deleted. The file permissions show as '-?????????' and all file properties in the listing show as '?'. What do I do?

@ebluestein
Author

Reboot.

@GRX

GRX commented May 15, 2018

Is there an alternative for this by now?
Been working perfectly for a week, now it keeps occurring on every session...

@tara-raj

Please try this on Windows Build 1803 and reopen if you run into the same issue

@williscool

I'm on version

OS Name	Microsoft Windows 10 Home
Version	10.0.17134 Build 17134

and this is still an issue.

It's really crappy to have to constantly restart my computer because some program (e.g. Android Studio) won't work properly because it can't delete a directory... that has nothing in it.

@williscool

williscool commented Nov 12, 2018

@tara-raj please help. I am working on a React Native app and using Android Studio.

between watchman for react-native file indexing and android studio building the android app I have to restart my computer LITERALLY EVERY TIME I WORK ON THE APP.

Multiple times :(

One or the other is bound to hang or leave some phantom deleted file

UPDATE!:

Running watchman in the foreground and immediately killing it if Android Studio starts being unable to delete things seems to consistently mitigate this.

Still would be nice to not have to do that.

@pkit

pkit commented May 2, 2019

Err, how to reopen it? Still happens on the newest fully patched builds.
Literally only reboot helps. I see no processes holding locks or accessing files in sysinternals process explorer. But files/dirs are still locked and cannot be accessed!

@pkit

pkit commented May 2, 2019

@JasonLinMS

  • So have you been using any inotify-based file watchers when this issue repros?

Yes, WebStorm, but the issue remains after WebStorm and all bash windows are closed.
The only thing that helps is reboot, after reboot the files are no longer found anywhere.
Literally the only things that access these files are: WebStorm, hyper.io -> wsl bash, that's it
Ah, and sometimes they are accessed by "Windows search indexer" (never remember the correct name for this one).

  • Also, do you have any third-party file system related drivers/filters/apps installed or any third-party anti-virus software installed?

No

  • Have you reinstalled WSL recently, or have you been using the same WSL installation across many Insider builds.

Recent fresh install.

Once more: Sysinternals Process Explorer can not find any process that holds these files open.

@ashenwgt

ashenwgt commented Jun 9, 2019

Sometimes the OS needs a reboot to clean up corrupted files (e.g. the scenario above, when the link count drops to zero).

Detailed explanation here: https://superuser.com/a/1441014/629535

@nevyn

nevyn commented Nov 7, 2019

Hey, I'm also seeing this frequently. I'm using GitHub Desktop, Visual Studio and Visual Studio Code. If I touch stuff in both the VSCode WSL instance and Visual Studio, things tend to go wonky, even though I'm not (afaik) using any file watchers. The result for me is undeletable folders, from either Windows or WSL.

@abalter

abalter commented Dec 25, 2019

Still a problem for me.

@francesco1119

@LiamKarlMitchell

LiamKarlMitchell commented Sep 23, 2020

I'm on WSL 2 and still hit this issue sometimes, on two different PCs.
I can work around it by moving the parent directory to /tmp and recreating it... but it's a pain in the ass.
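
A rough sketch of that workaround, for reference. It is a guess at the procedure rather than a tested fix: recreate_parent is a hypothetical helper name, it assumes the parent directory can still be renamed, and any entry that cannot be stat'ed is left behind in the stash directory.

```bash
# Move PARENT aside into a stash under /tmp, recreate it, and move back
# every entry that can still be stat'ed. Prints the stash location.
# recreate_parent is a hypothetical helper name.
recreate_parent() {
    local parent="${1%/}" stash name
    stash=$(mktemp -d "${TMPDIR:-/tmp}/stuck-parent.XXXXXX") || return 2
    mv "$parent" "$stash/old" || return 2
    mkdir "$parent" || return 2
    ls -A "$stash/old" 2>/dev/null | while IFS= read -r name; do
        # Entries that fail both tests are the stuck ones; skip them.
        if [ -e "$stash/old/$name" ] || [ -L "$stash/old/$name" ]; then
            mv "$stash/old/$name" "$parent/"
        fi
    done
    printf '%s\n' "$stash/old"
}
```

The printed stash path can be removed later, for instance after a reboot, once whatever was holding the stuck entries has let go.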

Files in WSL file system, accessing with editor and gitkraken over network mapped drive if that helps at all.

@lygstate

Any way to fix it in WSL 1?
