Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Renaming and recreating the project directory causes Gradle to lose track of changes on Windows #17865

Closed
lptr opened this issue Jul 28, 2021 · 12 comments · Fixed by #17922
Closed
Assignees
Labels
a:bug affects-version:6.7 in:virtual-file-system VFS, file system watching, snapshots re:windows Issue related to using Gradle on Windows

Comments

@lptr
Copy link
Member

lptr commented Jul 28, 2021

In one build we've seen Gradle fail to compile a subproject (:build-agent-maven-test-distribution:compileJava) because it ended up compiling against an outdated version of classes for one of the dependency subprojects (:test-distribution-client).

This is a serious issue as it affects build reliability. Here's the result of a very long investigation about how that happened:

  • We ended up with the wrong classes because we loaded the results for :test-distribution-client:compileJava with the wrong cache key (3be04def081300037cfdc35044441cae) – (scan link).
  • The cache key matched the cache key (correctly) calculated by a previous build on the same daemon that was actually compiling against the earlier version of the source code.
  • In the failing build we used the same (but now incorrect) cache key because Gradle believed that the sources were unchanged compared to the previous build, but they were actually different.
  • This happened because we did not get notifications when the new version of the source code was checked out by Git.

So how did we miss these notifications?

It turns out that we've been using a TeamCity pluing called Build Files Cleaner (Swabra) in that specific build. The purpose of this plugin seems to be to restore build checkout directories to their original state before a new build is started. Swabra does two strange things.

First, on Windows, Swabra uses cmd.exe /c rd /s /q <directory> to delete some stuff. Doing so actually results in changes being reported via their short paths (stuff like build\classes\java\main\com\gradle\ENTERP~1\app\rpc\client\HTTPRP~1.CLA). This was a feature introduced in Windows 95 to support long file names on FAT, but apparently even files on NTFS have short names, and that's what is used to report changes to them if you use rd. (We might need to follow up on this, but it ended up being a red herring wrt this issue; see gradle/native-platform#289.)

Secondly, and this will be important, the previous build of the same project on the same machine failed to check out from Git. Most probably because there was a forced push to the PR or something. The assumption is that this circumstance resulted in Swabra losing track of things, and deleting the checkout directory:

17:29:20 Swabra
17:29:20   Checkout directory state is unknown. Need a clean checkout directory snapshot - forcing clean checkout
17:29:20 Clearing temporary directory: C:\tcagent1\temp\buildTmp
17:29:20 Failed to delete file: C:\tcagent1\temp\buildTmp\hsperfdata_tcagent1\10968
17:29:20 Failed to delete file: C:\tcagent1\temp\buildTmp\hsperfdata_tcagent1\15552
17:29:20 Failed to delete file: C:\tcagent1\temp\buildTmp\hsperfdata_tcagent1\4332
17:29:20 Unable to cleanup build temporary directory: C:\tcagent1\temp\buildTmp

https://builds.gradle.org/buildConfiguration/Enterprise_Master_Component_MavenTestDistributionExtension_CrossVersion_GroupRecentWindows/45017572?buildTab=log&focusLine=42&linesState=38.42

What happened here was that Swabra deleted the whole checkout directory (C:\tcagent1\work\8d3306ac5f39e07b), which is also the project root, and thus the directory Gradle was watching. Doing so should have been reported to Gradle, which should have invalidate the relevant parts of the VFS, but that clearly didn't happen. Why?

The first impression was that this might be related to the short paths being reported when deleting directories via cmd.exe /c rd ..., but it turns out that the root of the deletion is still reported via its long path, so that's not it. It turns out that Swabra uses a different way to delete the whole checkout directory in this case...

https://github.com/JetBrains/teamcity-swabra/blob/a6f46f3050c8fb7e71c5d40a871af19eb8152d45/agent/src/jetbrains/buildServer/swabra/Swabra.java#L453-L462

Long story short, Swabra ends up first renaming the checkout directory, and then deleting it.

Now, Windows curiously does not send any notification about the watched directory being renamed, even though it would send a notification if it was deleted. So Gradle stays ignorant of the fact that the project root was moved. Even more curiously, that's still not the cause of our issue, as when the directory is deleted, even in its new location, native-platform still reports the events using the original location. So we end up with Gradle learning about the original location being deleted. So then what?

Close inspection of the logs above shows that in the build in question, Swabra failed to delete some content in the renamed directory, and hence it did not end up deleting the relocated project directory. In turn, Gradle did not get the notification about the watched directory being deleted (correctly, as it hasn't been), and we kept watching it in its new location (C:\tcagent1\temp\buildTmp). However, Gradle keeps thinking that it is watching the project in the old location (C:\tcagent1\work\8d3306ac5f39e07b).

And then TeamCity re-creates the checkout directory (C:\tcagent1\work\8d3306ac5f39e07b) and populates it by checking out the code afresh. And while Gradle thinks it watches that location, there are no notifications being sent.

Later, when Gradle is invoked to build C:\tcagent1\work\8d3306ac5f39e07b again in the same daemon, it thinks whatever is remaining in its VFS is still valid, and does not detect that the source code for :test-distribution-client has been changed.

There.


cc: @gradle/execution, @paplorinc

@lptr
Copy link
Member Author

lptr commented Jul 28, 2021

One thing we could easily do to prevent this specific problem of Swabra renaming the watched directory is to prevent renaming. We can do it by removing FILE_SHARE_DELETE from the flags we give when we open the watched directory. This would prevent the watched directory from being deleted or renamed.

https://github.com/gradle/native-platform/blob/0f0b4440a027fe795a6d7304c498243b15f4fc8e/file-events/src/file-events/headers/win_fsnotifier.h#L19

This is what IDEs typically do, but it also means that Gradle daemons would need to be killed before their watched directories could be deleted.

As we can't observe the rename, I'm not sure what else we could do.

@lptr lptr self-assigned this Jul 28, 2021
@lptr
Copy link
Member Author

lptr commented Jul 28, 2021

We could also reach out to JetBrains to make Swabra delete the checkout directory in a better way.

@lptr
Copy link
Member Author

lptr commented Jul 28, 2021

Yet another option, just to have everything on the table, is to somehow not enable FSW on CI by default. This would obviously require Gradle to be aware somehow whether we are running on CI or not.

@lptr lptr changed the title File changes missed after TeamCity Swabra removes entire checkout directory on Windows Renaming and recreating the project directory causes Gradle to lose track of changes on Windows Jul 28, 2021
@lptr
Copy link
Member Author

lptr commented Jul 28, 2021

There's another thing we can do: we could (perhaps at the start of the build, or every ten minutes etc.) check if our watchers still point to where they are supposed to. We keep hold of a file handle that we can use to retrieve the full path of the watched directory, and match it against the registered location.

We could do something like FileWatcher.stopWatchingMovedLocations() that would stop watching anything that is not pointing to its originally registered location anymore.

I wonder how useful this would be for other operating systems. Probably not very.

@lptr
Copy link
Member Author

lptr commented Jul 28, 2021

We could also go crazy, and check the current location of the watched directory against the registered path every time we receive notifications. This would slow things down to some degree, though probably not in any significant way. It's a less leaky solution than checking the location at arbitrary times, and it does not require changing the native-platform file watching APIs. It also doesn't require the consumer of the API to be aware of this problem at all.

@wolfs
Copy link
Member

wolfs commented Jul 29, 2021

We could also go crazy, and check the current location of the watched directory against the registered path every time we receive notifications.

This has the problem of not detecting if somebody only moves the project directory, but doesn't delete/modify any files in it. Gradle would keep thinking that it watches the location of the project directory, but it actually doesn't.

@wolfs wolfs added the a:bug label Jul 29, 2021
@lptr
Copy link
Member Author

lptr commented Jul 29, 2021

It looks like other file watchers like Watchman also allow for FILE_SHARE_DELETE. This is another reason not to forbid it.

@lptr
Copy link
Member Author

lptr commented Jul 29, 2021

Yet another option would be to change the API of file watching in native-platform, exposing more of the underlying infrastructure. Specifically, we could expose each WatchPoint, and let them report relative paths on Windows at least. This way at least native-platform would stay consistent when watched directories have been moved around. It would still require some sort of way for Gradle to confirm that the watched directories are still where they were in the beginning.

@lptr
Copy link
Member Author

lptr commented Jul 29, 2021

I think I'd like to do the following:

  • have an API for the Windows-specific watcher where it specifically returns the current mapping from original path -> current path
  • Gradle should call this new API and unregister any paths and drop VFS where the current path is different to the original
  • optional: let native-platform reports the (absolute) paths correctly after a directory has been moved by always looking up the current path of the watch root when reporting changes
  • optional: add an API to prevent watch roots being deleted if necessary

@lptr
Copy link
Member Author

lptr commented Jul 29, 2021

I have filed https://youtrack.jetbrains.com/issue/TW-72497 to fix the short names thing with Swabra. The rename+delete thing is not something they should fix I think.

@lptr
Copy link
Member Author

lptr commented Aug 2, 2021

PR to add the necessary functionality in native-platform: gradle/native-platform#290

@lptr
Copy link
Member Author

lptr commented Aug 4, 2021

This is now fixed on release via #17922.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a:bug affects-version:6.7 in:virtual-file-system VFS, file system watching, snapshots re:windows Issue related to using Gradle on Windows
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants