DrvFs: compiler reports file missing, but it is present #2712

Closed
Warblefly opened this issue Dec 1, 2017 · 60 comments
@Warblefly

On build 17046, while compiling the GCC cross-compiler, it bailed out with a "No such file or directory" error when referencing builtins.def — but the file is present. Could this be a variant of #2448 or #2464?

The error:

g++ -fno-PIE -c   -g -O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings   -DHAVE_CONFIG_H -I. -I. -I/mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/src/gcc-7.2.0/gcc -I/mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/src/gcc-7.2.0/gcc/. -I/mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/src/gcc-7.2.0/gcc/../include -I/mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/src/gcc-7.2.0/gcc/../libcpp/include -I/mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/bld/gcc/./gmp -I/mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/src/gcc-7.2.0/gmp -I/mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/bld/gcc/./mpfr/src -I/mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/src/gcc-7.2.0/mpfr/src -I/mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/src/gcc-7.2.0/mpc/src  -I/mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/src/gcc-7.2.0/gcc/../libdecnumber -I/mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/src/gcc-7.2.0/gcc/../libdecnumber/bid -I../libdecnumber -I/mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/src/gcc-7.2.0/gcc/../libbacktrace -I/mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/bld/gcc/./isl/include -I/mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/src/gcc-7.2.0/isl/include  -o tree-streamer-in.o -MT tree-streamer-in.o -MMD -MP -MF ./.deps/tree-streamer-in.TPo /mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/src/gcc-7.2.0/gcc/tree-streamer-in.c
In file included from /mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/src/gcc-7.2.0/gcc/tree.h:23:0,
                 from /mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/src/gcc-7.2.0/gcc/tree-streamer.c:27:
/mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/src/gcc-7.2.0/gcc/tree-core.h:213:24: fatal error: /mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/src/gcc-7.2.0/gcc/builtins.def: No such file or directory
compilation terminated.
Makefile:1099: recipe for target 'tree-streamer.o' failed
make[1]: *** [tree-streamer.o] Error 1
make[1]: *** Waiting for unfinished jobs....

But the file reported as missing is present.

ls -l /mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/src/gcc-7.2.0/gcc/builtins.def
-rwxrwxrwx 1 root root 83663 Dec  1 00:38 /mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/src/gcc-7.2.0/gcc/builtins.def
@Warblefly
Author

After a clean re-start, the compilation has failed again, exactly as above, but with a different file:

/mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/src/gcc-7.2.0/gcc/builtins.def:1001:29: fatal error: chkp-builtins.def: No such file or directory

But the file is alive and well, where it is supposed to be.

@Warblefly
Author

After a further clean re-start, the same compilation fails in a different place. A file appears to be missing, but it is safely in the right place when found with ls:

In file included from /mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/src/gcc-7.2.0/libgfortran/libgfortran.h:44:0,
                 from /mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/src/gcc-7.2.0/libgfortran/generated/pack_c16.c:26:
/mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/bld/gcc/gcc/include/stddef.h:1:15: fatal error: stddef.h: No such file or directory
 #include_next <stddef.h>
               ^~~~~~~~~~

but:

$ ls -l /mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/bld/gcc/gcc/include/stddef.h
-rwxrwxrwx 1 root root 14165 Dec  1 23:08 /mnt/e/Users/john/Documents/ffmpeg/MultimediaTools-mingw-w64/sandbox/bld/gcc/gcc/include/stddef.h

@nutcasev15

@Warblefly
The workaround posted in this link fixes this problem temporarily.
#2448 (comment)
It worked for me. I'm on build 16299.64. Need to see if the cumulative patch of 16299.98 fixes this.

@Warblefly
Author

Thank you for replying. I'm afraid this workaround, which I've already tried, is unsuccessful in build 17046.

It is a slightly different problem: the file is occasionally simply "not found"; this isn't an EINVAL return.
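The difference between the two failure modes can be checked by looking at `errno` right after the failing call. The following is only a minimal sketch: the helper name `open_errno` is hypothetical and does not come from the build or from WSL itself. It reports which error `open(2)` returned for a path, so that `ENOENT` ("No such file or directory") can be told apart from the `EINVAL` case discussed in #2448.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to open `path` read-only.
   Returns 0 on success, otherwise the errno value set by open(2),
   e.g. ENOENT for "No such file or directory" or EINVAL. */
int open_errno(const char *path)
{
	int fd = open(path, O_RDONLY);
	if (fd == -1)
		return errno;	/* read errno immediately, before any other call */
	close(fd);
	return 0;
}
```

Calling `open_errno()` in a loop on a file that `ls` shows as present would, under the behavior described here, be expected to return `ENOENT` only occasionally rather than on every attempt.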

@therealkenc
Collaborator

#2484 (sic).

@Warblefly
Author

Well spotted, @therealkenc.

@nutcasev15

@Warblefly Seems like you got all the compilation issues one can get. Lel
Sorry to hear it doesn't fix your problem. My theory is that it's a fallback, as the command states: it remounts the drive with a previous version of the DrvFs driver. Since I'm on 1709 (16299.98), the fallback driver for me is the 1607 driver module, which works great.

Maybe for you, since you are beyond 1709, the fallback is the faulty one present in 16299, because it's the current stable one.

Anyway my issue was this one: #2464 (comment)

I just posted in related threads to see if someone knew about it because there was no specific issue for it until now.

@therealkenc The cumulative update to 16299.98 fixes this issue, but it seems performance has taken a hit. Kernel compile went from ~240 with ccache to ~350 with ccache.

@Warblefly
Author

Thanks for the info @nutcasev15 — indeed, whether or not the DrvFs filesystem is mounted with, or without, the "fallback" workaround given some time ago, the fault occurs. It makes the job impossible, unfortunately.

@AnneTheAgile

AnneTheAgile commented Dec 6, 2017

1. @Warblefly are you on the latest, per #2448 ("yarn install fails inside /mnt with EINVAL for lstat during step 3")? From that issue: "The November 30th, 2017 Cumulative Update (KB4051963) for the Fall Creator's Update contains the fix for this issue [2448]. Please install this update, and let us know if you encounter any further issues. To verify if you have installed the update, your OS build number should be 16299.98 or later."

2. Cross-reference, possibly related: RoliSoft/WSL-Distribution-Switcher#69

@Warblefly
Author

Warblefly commented Dec 6, 2017

@AnneTheAgile Thank you for asking these questions. I am happy to offer this information:

(1) The build is 17046.rs_prerelease.171118-1403.
(2) It is difficult to say precisely at this point, but there was an earlier build on which my compilation did work correctly and repeatedly (other than the occasional instance of "fork" not working as reported in #2469)

@DHowett-MSFT

A few members of my team (myself, @rajsesh, @yiyang-msft) are seeing this as well, on builds in the 1705x-1706x range.

@Warblefly
Author

Correction: I mis-typed the build. Here, it's 17046.

@heldchen

this can be "reproduced" fairly "consistently" in my setup in several completely different use cases, see (duplicate) issue @Warblefly linked above.

@DavidCizek

This issue can be reproduced in my environment during GCC crosscompiling (Windows version 17063). Version 17040 works fine.

@heldchen

still an issue in 17074

@nstrelow

Please fix this, it is sooo annoying

@sunilmut
Member

Does anyone have a targeted repro for this issue? I can try with @Warblefly's project, but compiling that seems to be a long process.

@heldchen

heldchen commented Jan 18, 2018

unfortunately the errors (in my cases) only happen when there are a lot of file actions going on, i.e. during long compile/render processes.

@sunilmut
Member

No matter which angle you look at this issue from, it looks like a race condition. It will be a difficult one to chase. But if someone can provide a specific repro here that triggers with a good degree of reliability, it will be very helpful.

I tried @Warblefly's project, but it has a lot of dependencies and the instructions in the repo do not cover all of them. Chasing each one of them down seems time-consuming. I am trying other projects to see if I can get a local repro.

@Warblefly
Author

Thank you for checking @sunilmut — just updated the repo from my live, functioning tree. I'm now compiling it under Ubuntu under Hyper-V Manager and it is perfect. But under Ubuntu Bash on Windows, the failures at random points continue. I don't test Ubuntu Bash on Windows any more; not enough time.

@DavidCizek

DavidCizek commented Feb 14, 2018

The issue still exists in 17093. I tried to create a simple program that simulates this behavior. On DrvFs it reproduces the problem; on VolFS it works without errors.

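#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define THREAD_NUMBER	16

/* Even-numbered threads keep opening a file that exists; odd-numbered
   threads keep trying to open a file in a directory that does not exist. */
void *thread(void *arg)
{
	long num = (long)arg;
	const char *fileName = (num % 2) == 0
			? "/mnt/d/Linux/1/Test.c"		// existing file
			: "/mnt/d/Linux/2/Test1.c";		// nonexisting file in nonexisting directory

	while (1)
	{
		int handle = open(fileName, O_RDONLY);
		if (handle == -1)
		{
			if ((num % 2) == 0)		// existing file: this open should never fail
			{
				int err = errno;	// save errno before printf can change it
				printf("Existing file open error %d (%s) !!!!\n", err, strerror(err));
			}

			continue;
		}

		close(handle);
	}

	return NULL;	// not reached
}

int main(void)
{
	pthread_t t[THREAD_NUMBER];
	long i;

	for (i = 0; i < THREAD_NUMBER; ++i)
	{
		if (pthread_create(&t[i], NULL, thread, (void *)i) != 0)
		{
			fprintf(stderr, "Thread creation error\n");
			return 1;
		}
	}

	// The workers never exit; block here instead of spinning on sleep(0).
	for (i = 0; i < THREAD_NUMBER; ++i)
	{
		pthread_join(t[i], NULL);
	}

	return 0;
}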

@heldchen

in my scenarios, the error also happens on VolFS, albeit not as often as on DrvFS

@HinTak

HinTak commented Feb 26, 2018

I think it is a problem with make -j X?

@SvenGroot
Member

We believe we have found the cause of this issue, and are working on a fix. We're trying to get this fix into the upcoming Spring Update as well (probably as an update after release).

@SvenGroot added the bug label Mar 23, 2018
@heldchen

heldchen commented May 25, 2018

@therealkenc as said in a few comments before and shown in the above strace, the issue still exists for unlink actions when there is a lot of I/O and concurrency. the test case only uses open actions, which I can confirm are fixed for me as well since 17655.

@therealkenc
Collaborator

Repetition is not constructive.

@heldchen

heldchen commented May 25, 2018

just wanted to be helpful by pointing out that the test case only tests one part of the problem, or one problem in a family of problems, and that closing this issue might have been premature.

but before I start "repeating" myself again: where do you want me to continue with my unlink-while-heavy-concurrency-i/o: new issue? #2780? here?

@Brian-Perkins

@heldchen - the ESPIPE error you are seeing looks like it is on stdin (fd=0) so is likely unrelated. If you can get a repro where one of the other file system commands fails it would be great if you could share (we can route depending on what the error is, and if there is already an existing issue covering it). Unlink on /mnt/x (DrvFS) has had a few reported issues but has been improving over time.
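For context, `ESPIPE` is the error `lseek(2)` reports when a descriptor is not seekable, such as a pipe, which is why it can show up on stdin without any filesystem problem. Here is a minimal sketch of how to observe this; the helper name `seek_errno` is hypothetical and is not taken from the strace output referred to above.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: query the current offset of `fd` without moving it.
   Returns 0 if the descriptor is seekable, otherwise the errno value set
   by lseek(2) (ESPIPE for pipes, FIFOs and sockets). */
int seek_errno(int fd)
{
	if (lseek(fd, 0, SEEK_CUR) == (off_t)-1)
		return errno;
	return 0;
}
```

In a process whose stdin is a pipe (for example one started as `cat file | ./prog`), `seek_errno(0)` is expected to return `ESPIPE`.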

@heldchen

thanks @Brian-Perkins for confirming the ESPIPE is not relevant.

turns out @therealkenc was right to close this issue after all. I spent some more hours logging all threaded unlink-actions using strace and comparing them. it looks like the Magento framework uses a file-based, self-rolled locking mechanism to let one thread acquire a lock. unfortunately they did not account for multiple threads trying to acquire the lock at the same time, so in some rare circumstances more than one thread would think that it owns the lock. in these cases, according to my strace logs, the chances are actually quite high that two threads try to release their lock by unlinking the file at virtually the same time. when that happens, PHP's file stats cache seems to momentarily report outdated info to the losing thread, thus resulting in the "errors" I've been observing.
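To illustrate the kind of race described here, the following is a sketch of a file-based lock that avoids it. This is not Magento's code: the function names `lock_try_acquire` and `lock_release` are hypothetical, and the sketch assumes a local POSIX filesystem where `open(2)` with `O_CREAT | O_EXCL` is atomic. The idea is that creating the lock file and checking that it did not already exist happen in one step, so two callers cannot both believe they own the lock.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper. Returns 1 if this caller created the lock file
   (and so owns the lock), 0 if someone else already holds it, and -1 on
   any other error. */
int lock_try_acquire(const char *lock_path)
{
	/* O_CREAT | O_EXCL fails with EEXIST if the file is already there,
	   so "check" and "create" cannot be interleaved by another caller. */
	int fd = open(lock_path, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd == -1)
		return errno == EEXIST ? 0 : -1;
	close(fd);
	return 1;
}

/* Hypothetical helper. Only the owner should call this. Returns 0 on
   success, or -1 with errno set (ENOENT if the file is already gone). */
int lock_release(const char *lock_path)
{
	return unlink(lock_path);
}
```

With a check-then-create scheme, two callers can both "acquire" the lock, and the second `unlink` then fails with `ENOENT`. That is one way a missing-file error can originate in application-level locking rather than in the filesystem.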

@therealkenc
Collaborator

I guess this manifests in 17134 (1803 RTM).

@sunilmut
Member

The rest of this bug should be fixed in 17677. Do let us know if this issue is completely resolved.

@jowadmax

@sunilmut thanks for the fix!
Will this bugfix get backported to 1803?

@sunilmut
Member

@jowadmax - Once we get some kind of confirmation (from the community) that the issue is fully resolved by the fixes, yes, the fix will be considered for backporting to 1803.

@nkokla

nkokla commented Jun 11, 2018

@sunilmut - Hello, and thanks for the fix!
I am also impacted by this issue after the 1803 update, and it is very restrictive for my job. I am not sure I can wait for the patch. Do you know if a fresh installation (reinstalling Windows) can solve this problem?

@pgroke-dt

@sunilmut Is there a way to install the insider build without permanently enabling preview builds? I might be able to convince my company to let me install one preview build so I can check if the fix works, but I sure don't want to get preview builds all the time.
(I'm seeing this problem many times during one build job, i.e. I have to re-start the build a dozen times or so until it finally succeeds. So if it's fixed, I should be able to tell with good accuracy.)

@jowadmax

@pgroke-dt you might be able to install the preview on a virtual machine and run your compile job for testing. You can use VirtualBox's shared directory feature to access your local files from the VM.

Unfortunately there are no ISOs available for the Insider Builds (afaik), so you'll need to enroll the Windows version that's on the virtual machine into the Insiders Program.

@WillsonHaw

Can confirm this issue is fixed for me on build 17692. Would love to have this backported to 1803 so I don't have to leave my work machine on an Insider build.

@pgroke-dt

@jowadmax Thanks for the tip, unfortunately I don't have enough space for another Windows installation in a VM on my system.

@pgroke-dt

@sunilmut
How are things looking regarding a back-port? It's still some months until 1809 RTM and then some months until it's tested and green-lighted by our IT, so a back-port to 1803 would really help us.

@Mike-Horwitz

@sunilmut A backport would be great. My 1803 is frustrating me.

@tstackhouse

tstackhouse commented Aug 20, 2018

@therealkenc Seeing this in 17134 (1803), confirming your suspicion. A backport or workaround would be wonderful as I'm unable/unwilling (requirement of Microsoft Account/non-system local account) to install one of the insider builds.

Update: Jumped through the hoops to get on insiders and installed build 17741. Confirmed fixed for me.

@therealkenc
Collaborator

Yeah the December OP reported 17046 (before 17134) so there was no real reason to expect it to work on the April Update. My marking it insidertransient for a few days was just "wrong" on my part; apologies. It was reported that 16299 is okay, fueling the incorrect assumption. At the time it was somewhat confusing as to what broke when, what was being fixed precisely, and in what release (#2484 still flaps in the wind uncategorized).

Beyond that, four asks for a backport is now three too many. I don't have any specific insight into the inner workings of the backport process (and I get the impression it is somewhat opaque to the WSL devs as well and not entirely in their control). But, given that it was fixed in late May, this is mid-August, and the Redstone 5 (1809) release is already starting to be locked down for October... if it were me, I wouldn't place a large bet on a QFE. Who knows. Could happen. Either way, rest assured all the devs get all the issue updates whether you ping them or not.
