
how to debug hydration? #1667

Open
canahari opened this issue May 26, 2020 · 6 comments

Comments

@canahari

canahari commented May 26, 2020

We're considering switching to Git/GVFS in a very large project. Whenever we try any operation requiring hydration of many files (a build, or just opening a big solution), not just the command but the whole operating system hangs, and a hard reset is needed to restore operation. We think the hang is Windows Explorer/filesystem-related: some of the processes stay more or less operable, but we cannot browse files, type, or start new programs.
Hydration of a single file (e.g. opening a text file in the repo) works as expected.

We found that while this command always works in our repo:
gvfs prefetch --files '*'
this command always hangs:
gvfs prefetch --files '*' --hydrate

I've attached logs of the latter, started in verbose mode. The logs are from 2 separate runs. Please excuse the quality: with the computer hanging we weren't able to save the log or take a screenshot, so the log was photographed. In both cases, the hang occurred between the last 2 lines, well before the last line appeared.

We're willing to try to debug the issue, but we're not experienced in debugging filter drivers. If there's any way we can provide more information than this, we'll try to help.

log_2
log_3

@derrickstolee
Contributor

Hi @canahari. Sorry that you're having issues. Thanks for bringing this to our attention. The gvfs prefetch --files '*' --hydrate command spins up several worker threads, and you are seeing the message that the "find the blobs we need to download" step is finishing, but the "actually download the blobs" step is not reporting success. Further, with 547,000 missing objects and a batch size of 4,000, you should be seeing over 136 "DownloadObjects" lines (547,000 / 4,000 ≈ 137 batches). This means that the download thread is not able to finish. But you could also be correct that the filesystem interactions are causing a halt that prevents that thread from continuing.

Here are some basic questions:

  1. What version of Windows are you using?
  2. What version of VFS for Git are you using?
  3. What kind of drive are you using? Solid-state drives are so highly recommended that we don't typically investigate performance issues related to hard disk drives.

Our usual way to diagnose these issues is to have the user run gvfs diagnose and put the resulting .zip file on a network share. If the .zip for your repo isn't too big, perhaps you could email it to gvfsdogfood@service.microsoft.com.

Alternatively, you could place the .zip file in a private GitHub repo and give me access to read it. I'm just trying to make sure we can support you while also not making any potentially sensitive information completely public.

Finally, since you are hydrating the entire working directory, and it seems to have "only" 550 thousand files, perhaps you would have a better time trying Scalar? Here is a blog post describing some of the differences between VFS for Git and Scalar.

@canahari
Author

canahari commented May 27, 2020

Hello Derrick, thanks for your response. I'm sending the requested data by email.

About the scenario itself: we'll try out Scalar too, although we're generally not early adopters of anything :) Still, we want GVFS more, since generally a developer works with only a small part of this repository, and will not need a complete build most of the time.
But we do need to make sure the complete build works, and for that, literally the whole repo is needed. We tried the prefetch only for diagnostic purposes, after we experienced the hang during a plain build.

@derrickstolee
Contributor

I've got the logs, thanks!

Could you retry with Version 1.0.20112.1? I know it's marked "pre-release" but I'll fix that soon. This version is stable and has been shipped to the Windows team for a while.

I see that you have some strange filenames in C:\.gvfsCache\343da45616674b55b17dda968e1efabb\gitObjects\pack, probably due to temporary files being downloaded and then the process shutting down abruptly. Please delete that directory, then run the following:

  1. gvfs prefetch --commits
  2. gvfs prefetch --files '*'
  3. gvfs prefetch --files '*' --hydrate

These will (1) get the full set of commits and trees for the repo, which is important before we explore for missing blobs, (2) download all missing blobs to the shared object cache, then (3) actually hydrate the working directory, but without the Git object downloads.
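Put together as a single PowerShell-style sketch, run from inside the enlistment (the cache path is the one from your logs, so adjust it if yours differs):

  # Clear out the pack directory with the leftover temporary files
  Remove-Item -Recurse -Force C:\.gvfsCache\343da45616674b55b17dda968e1efabb\gitObjects\pack
  # (1) Commits and trees
  gvfs prefetch --commits
  # (2) All missing blobs, into the shared object cache
  gvfs prefetch --files '*'
  # (3) Hydrate the working directory; the objects are already local by this point
  gvfs prefetch --files '*' --hydrate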

Splitting the object downloads from the file hydration could help in identifying the problem. Thanks!

@derrickstolee
Contributor

I should also make some concrete responses about Scalar, based on your feedback:

About the scenario itself: we'll try out Scalar too, although we're generally not early adopters of anything :) Still, we want GVFS more, since generally a developer works with only a small part of this repository, and will not need a complete build most of the time.

Scalar does work for normal users, but it requires some planning in advance. Do users know which subdirectories they need ahead of time? Then they could scalar clone followed by git sparse-checkout add <dir1> <dir2> ... to get only the portions they need. While that manual step requires some training for users (or some investment in the engineering system to make this easy), it significantly reduces the complexity compared to the system that would be present with VFS for Git.
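As a rough sketch of that workflow from a PowerShell prompt (the URL and directory names below are just placeholders for your own):

  # Clone without populating the working directory beyond the root
  scalar clone https://dev.azure.com/yourorg/yourproject/_git/yourrepo
  cd yourrepo\src
  # Pull in only the subtrees this developer actually needs
  git sparse-checkout add dir1 dir2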

But we do need to make sure the complete build works, and for that, literally the whole repo is needed. We tried the prefetch only for diagnostic purposes, after we experienced the hang during a plain build.

VFS for Git is overly complicated to use in the "I need every file" scenario. Please try Scalar for that case, using scalar clone --full-repo <url>. You definitely won't run into the issue you're having here.

@canahari
Author

canahari commented May 27, 2020

  1. gvfs prefetch --commits
  2. gvfs prefetch --files '*'
  3. gvfs prefetch --files '*' --hydrate

I've done as you said: updated my GVFS, cleared the caches, cloned again, and ran the 3 commands. The first two ran without errors. (I'm attaching the logs, although I forgot to run the 1st command in verbose mode - I'll gladly repeat the experiment if you think the verbose log would be useful.) As I expected, the 3rd command hung. Of that result, I can only attach a photographed log again :)

git_logs_wo_hydrate.txt
hang

(I'd really like to try a little debugging as I'm a C#er myself, but your NuGet source does not seem to want to serve me. Lots of messages like...
2>D:\repositories\VFSForGit\GVFS\GVFS.Common\GVFS.Common.csproj : error NU1101: Unable to find package NuGet.Commands. No packages exist with this id in source(s): Dependencies
2>D:\repositories\VFSForGit\GVFS\GVFS.Common\GVFS.Common.csproj : error NU1101: Unable to find package Microsoft.Data.Sqlite. No packages exist with this id in source(s): Dependencies
2>D:\repositories\VFSForGit\GVFS\GVFS.Common\GVFS.Common.csproj : error NU1101: Unable to find package LibGit2Sharp.NativeBinaries. No packages exist with this id in source(s): Dependencies ... )

@derrickstolee
Contributor

(I'd really like to try a little debugging as I'm a C#er myself, ....

To do debugging, you are better off building from source and then working from there. Please see the instructions in the README. The tricky part is getting all of the dependencies installed, then running your built installer.
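Roughly, a sketch of the starting point (the README in the repo has the authoritative prerequisite and build steps):

  # Get the VFS for Git source
  git clone https://github.com/microsoft/VFSForGit.git
  cd VFSForGit
  # From here, follow the README: install the listed prerequisites, build the
  # solution, and install your locally built installer so that the gvfs
  # processes you attach a debugger to are your own build.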
