Status() is slow with a large number of untracked files #181
Comments
This is slow for even four files |
I was curious if anyone was thinking about this or working on it? Any ideas on what the path forward would be? |
I have the same issue at the moment. Running Update:
|
I experience the same. In fact I basically would like to do: |
This has been outstanding for quite a while, but I just wanted to reiterate how painful it is. We have to avoid go-git for checking status because our internal tools became unusable in our nodejs projects thanks to node_modules. |
On a relatively small node.js project, a single call to |
PLZ-807 Use command line git for getting worktree status Getting the worktree status using go-git is extremely slow and is a long standing known issue: go-git/go-git#181 Fall back to using command line git for this simple operation. plz-review-url: https://plz.review/review/4863
Large worktrees were causing significant delays in displaying the user interface. This was due to calculating the hash of files to determine the overall status of the worktree. Go has poor performance with SHA1 hashing, and too many files were being hashed unnecessarily. Together these caused some repositories to take well over 10 seconds to display the user interface. This is a known problem in worktree status and an issue already exists: go-git/go-git#181. Shelling out to call "git status" gave a significant performance increase, often into the sub-second range. A modified implementation was used, based on gitleaks/gitleaks#463. The variation tries to use "git status" and, if that fails, falls back to the original go-git implementation.
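For readers who want to try the same workaround, here is a minimal sketch of the "shell out first, fall back second" approach described in the commit message above. It is not the code from that commit or from gitleaks/gitleaks#463. The names `parseUntracked` and `untrackedFiles` and the `fallback` hook are hypothetical, and the sketch assumes a `git` binary is on the PATH. It only extracts untracked entries from `git status --porcelain` output.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// parseUntracked extracts untracked paths from `git status --porcelain` (v1)
// output, where every untracked entry is a line of the form "?? <path>".
func parseUntracked(porcelain []byte) []string {
	var paths []string
	sc := bufio.NewScanner(bytes.NewReader(porcelain))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "?? ") {
			paths = append(paths, strings.TrimPrefix(line, "?? "))
		}
	}
	return paths
}

// untrackedFiles tries the git CLI first and calls fallback only if that
// fails (git not installed, dir is not a repository, and so on).
// fallback is a hypothetical hook where a go-git based implementation
// could be plugged in.
func untrackedFiles(dir string, fallback func(dir string) ([]string, error)) ([]string, error) {
	out, err := exec.Command("git", "-C", dir, "status", "--porcelain").Output()
	if err != nil {
		return fallback(dir)
	}
	return parseUntracked(out), nil
}

func main() {
	dir := "."
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}
	files, err := untrackedFiles(dir, func(string) ([]string, error) {
		return nil, fmt.Errorf("git CLI failed and no fallback is configured")
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	for _, f := range files {
		fmt.Printf("Untracked file: %s\n", f)
	}
}
```

Two caveats with this sketch: by default git reports a wholly untracked directory as a single `dir/` entry rather than listing each file, and paths containing unusual characters are quoted in this output format, so a production version would likely want `-z` and a more careful parser.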
To help us keep things tidy and focus on the active tasks, we've introduced a stale bot to spot issues/PRs that haven't had any activity in a while. This particular issue hasn't had any updates or activity in the past 90 days, so it's been labeled as 'stale'. If it remains inactive for the next 30 days, it'll be automatically closed. We understand everyone's busy, but if this issue is still important to you, please feel free to add a comment or make an update to keep it active. Thanks for your understanding and cooperation! |
This still renders the Status() method unusably slow on repos with large numbers of ignored files (like nodejs/npm working directories). |
I started to analyse what causes slowdowns in the project I'm working on and found that this issue is the root cause. I also delved into go-git to figure out why it's slower than a regular I'm not sure what the best approach is to fix this. My current best guess is to change the My question would now be: Is the |
@codablock yes, unfortunately that is part of the public API and is currently one of the "blockers" for sha256 - so changes to it would target Please note that #825 introduces some performance improvements to this area, but is pending some additional comments/documentation before we can merge it. |
@pjbgf Thanks. Yeah that's what I assumed already. And I was not aware of #825, which will clearly also fix the underlying performance issue, but without the incompatible API change. I'll then wait for it to get merged/released instead of providing my own PR (I got it working locally already, so ping me if you still want to see it). |
Commenting to keep this thread active. |
@akshaybabloo Can you provide steps to reproduce what you are experiencing? Did you try a version that includes #825 (e.g. v5.12.0)? |
IMO, this is a good test of the performance of a large codebase: https://gitlab.com/gitlab-org/gitlab Repro steps: check it out and point the following code at it:
It's pretty slow (at least a few seconds) and seems to pick up ignored files (some in Can we not add an API to skip directories that are ignored? Is that existing functionality? |
Here is my example code @pjbgf, based on @michaelangeloio's, for https://gitlab.com/gitlab-org/gitlab:

package main

import (
	"fmt"
	"os"

	git5 "github.com/go-git/go-git/v5"
)

func main() {
	// Use the first argument as the repository path,
	// or fall back to the current working directory.
	var path string
	var err error
	if len(os.Args) > 1 {
		path = os.Args[1]
	} else {
		path, err = os.Getwd()
		if err != nil {
			panic(err)
		}
	}

	fmt.Printf("Checking for untracked files in %s\n", path)

	repo, err := git5.PlainOpen(path)
	if err != nil {
		panic(err)
	}
	wt, err := repo.Worktree()
	if err != nil {
		panic(err)
	}

	// This is the slow call discussed in this issue.
	status, err := wt.Status()
	if err != nil {
		panic(err)
	}

	for file := range status {
		if status.IsUntracked(file) {
			fmt.Printf("Untracked file: %s\n", file)
		}
	}
}

This shows (built it with
Using
Update Did
|
func (*Worktree) Status() is slow when there are a large number of untracked files, even if they are ignored by .gitignore. It is much slower than git status. I imported this issue from src-d/go-git#844. This still seems to be an issue. Any update?