Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows: handle remains open after watch-del command #764

Closed
derrickstolee opened this issue Nov 25, 2019 · 4 comments
Closed

Windows: handle remains open after watch-del command #764

derrickstolee opened this issue Nov 25, 2019 · 4 comments

Comments

@derrickstolee
Copy link

I'm using Git's fsmonitor tool with Watchman to speed up performance when there are many files. However, after using the repo, I'm unable to delete it without closing the watchman process (on Windows). There must be an open handle somewhere still.

I'm using the watch-del command to remove the watch, but the handle still is not closed.

$ watchman watch-list
{
    "version": "4.9.4",
    "roots": [
        "C:/_git/ForTests/src"
    ]
}

$ rm -rf ForTests
rm: cannot remove 'ForTests/src': Device or resource busy

$ watchman watch-del "C:/_git/ForTests/src"

$ watchman watch-list
{
    "version": "4.9.4",
    "roots": []
}

$ rm -rf ForTests
rm: cannot remove 'ForTests/src': Device or resource busy

$ handle.exe | grep -B 1 ForTests
cmd.exe pid: 24192 DESKTOP-LE3SIE3\stole
   40: File  (RW-)   C:\_git\ForTests\src
--
watchman.exe pid: 22960 DESKTOP-LE3SIE3\stole
   44: File  (RW-)   C:\_git\ForTests\src

(The "cmd.exe" process reported by handle.exe is the background runner that is running watchman.exe as a child.)

I'm struggling to read the code and connect the dots (especially because I've failed to build it locally). I see that FileDescriptor has a destructor that will clear the handle, and that should be part of the destructor for WinDirHandle (which extends watchman_dir_handle). What I don't see is any connection between the code in watchlist.cpp and those types.

If it helps, it is important to note that a query must be asked of watchman in order to trigger this failure. This Python test may demonstrate the bug:

@WatchmanTestCase.expand_matrix
class TestHandles(WatchmanTestCase.WatchmanTestCase):
    def checkOSApplicability(self):
        if sys.platform != "win32":
            self.skipTest("N/A unless Windows")

    def test_handles(self):

        root = self.mkdtemp()
        self.watchmanCommand("watch", root)
        self.touchRelative(root, "file")
        self.watchmanCommand("watch-del", root)
        os.rmdir(root)
        self.assertFalse(os.path.isdir("root"))

I'm happy to commit time to fix this issue, but I'll need a little guidance for where to look. My initial inspection has led me in circles so I figured I would write this issue while everything is fresh.

@wez
Copy link
Contributor

wez commented Nov 25, 2019

Thanks for reporting this and digging in! I'm technically on PTO and away from work this week so can't look too closely until I get back, but since you'd started poking around, I thought you'd be ok with some pointers in the mean time!

I'm confident that FileDescriptor is generally managing resources correctly as we'd likely have seen issues on the other platforms and see more than just a single handle in the handle.exe output you shared.

That makes me wonder if something at a higher level is holding on to things. https://github.com/facebook/watchman/blob/master/watcher/win32.cpp is where we do the windows specific watching stuff; I wonder if there is a thread that is stuck holding on to a shared_ptr that is transitively keeping the dir_handle alive? this code captures a reference to the watcher and attempts to lock something during its shutdown and is something that is worth checking first; getting a stack trace from all threads may be helpful!

When it comes to building, it is a bit awkward to grab the deps piecemeal, but we do have a helper script that will download and build all the relevant deps as well as watchman itself; you just need a recent version of visual studio and python to be installed and it will take care of the rest; it will likely take 20-30 minutes to build the dependencies on the first build run, depending on how powerful your system is.

python build/fbcode_builder/getdeps.py build --src-dir=. watchman

You can run the test suite locally after you've built things using this:

python build/fbcode_builder/getdeps.py test --src-dir=. watchman

derrickstolee added a commit to microsoft/scalar that referenced this issue Nov 26, 2019
… tests with Watchman on Windows

To get Watchman working in our functional tests on Windows, we need a few things:

1. A way to install Watchman in the `InstallScalar.bat` script.
2. A way to update our Watchman build to take new versions. Currently, we are pulling from their CI builds, **but should transition to using our own feed.**
3. A new verb, `scalar remove <dir>` to tell Watchman to stop watching a directory and then delete the folder. This is necessary on Windows or the test deletion will fail. It's also helpful for users to remove their enlistments. The `watchman watch-del` command is not enough (see #249) but adding `watchman shutdown-server` clears the handles. This second command can be removed after we figure out the problem with Watchman. See facebook/watchman#764 for more.

* [x] Document the `scalar remove` verb.
* [x] Unregister from service.
* [ ] Change our download location for Watchman to be an Azure Artifacts feed. (Delayed to #245)
@derrickstolee
Copy link
Author

Thanks for the pointer! I'll give that a shot. I'm about to be out until Dec 9 myself, so I'll come back to this later.

For now, I found that I'm able to hack this approach using watchman shutdown-server, which unblocks the deletion but also interrupts any current queries. It gets me unblocked for now, but I'll definitely come back to this soon.

derrickstolee added a commit to derrickstolee/git that referenced this issue Dec 9, 2019
The fsmonitor feature allows an external tool such as watchman to
monitor the working directory. The direct test
t7619-status-fsmonitor.sh provides some coverage, but it would be
better to run the entire test suite with watchman enabled. This
would provide more confidence that the feature is working as
intended.

When running the test suite in parallel with 'prove -j <N>', many
repos are created and deleted in parallel. When GIT_TEST_FSMONITOR
points to t/t7519/fsmonitor-watchman, this can lead to watchman
tracking many different folders, overloading its watch queue.

As a test script completes, we can tell watchman to stop watching
the directories inside the TRASH_DIRECTORY.

This is particularly important on Windows where watchman keeps an
open handle on the directories it watches, preventing them from
being deleted. There is currently a bug in watchman [1] where this
handle still is not closed, but if that is updated then these tests
can be run on Windows as well.

[1] facebook/watchman#764

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
gitster pushed a commit to git/git that referenced this issue Dec 10, 2019
The fsmonitor feature allows an external tool such as watchman to
monitor the working directory. The direct test
t7619-status-fsmonitor.sh provides some coverage, but it would be
better to run the entire test suite with watchman enabled. This
would provide more confidence that the feature is working as
intended.

When running the test suite in parallel with 'prove -j <N>', many
repos are created and deleted in parallel. When GIT_TEST_FSMONITOR
points to t/t7519/fsmonitor-watchman, this can lead to watchman
tracking many different folders, overloading its watch queue.

As a test script completes, we can tell watchman to stop watching
the directories inside the TRASH_DIRECTORY.

This is particularly important on Windows where watchman keeps an
open handle on the directories it watches, preventing them from
being deleted. There is currently a bug in watchman [1] where this
handle still is not closed, but if that is updated then these tests
can be run on Windows as well.

[1] facebook/watchman#764

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
@wez
Copy link
Contributor

wez commented Mar 23, 2020

I've been poking at this today and haven't been able to reproduce this issue :-/
Is it possible that one of the threads that owns a handle is blocked in the scenario
that triggers this for you?
Could you capture and share a stack trace from all of the threads in the process?

@wez
Copy link
Contributor

wez commented Apr 30, 2020

I'm going to consider this closed, but will be happy to re-open if we can get this to reproduce with a more recent build.

@wez wez closed this as completed Apr 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants