feat: Allow disabling git_status for large/slow repos #921
For the time being we log and ignore. We don't have an error mechanism in place yet.
Ignored tests are tests that require the test environment to be a particular state (e.g. having a pre-installed dependency), or that modify the testing system in a non-temporary way (e.g. creating test files in In my opinion, I don't think that adding more configuration is a great solution for In a perfect world, we would experiment with using https://github.com/romkatv/gitstatus via FFI, but that may be a big undertaking. |
Just wanted to mention that there is an alternative way to use gitstatus in starship. You can Call Just an idea. This isn't necessarily easy but might be worth trying. It can result in a dramatic speedup: gitstatus is about 40 times faster than the libgit2 that starship is using. You can estimate whether the speedup is worth the trouble by trying the native gitstatus prompt or powerlevel10k in the same directory where starship is slow. |
@matchai even with gitstatus our return time is 3-4s (an improvement over the ~15s), but that is still slow enough that I would want to disable it for that particular directory. To put it in context, we have ~25 years of history and a LOT of files in the repo that I work on. Through various means you can get this down to 3-4 seconds, but that is still a lifetime to wait for the prompt to become active. FYI posh-git has a similar feature for a similar reason: https://github.com/dahlbyk/posh-git#customization-variables Personally I would love this feature, but understand if you don't want to take it on :) I am somewhat opposed to the timeout idea: I know this directory is going to cause problems and will always time out, so shouldn't starship just avoid spawning the extra process and waiting for a period of time before I can enter a command? I want my shell to be snappy, not sitting around for 1s waiting :) |
Modules are computed on parallel threads, so in most cases the duration of the wait shouldn't affect how long it takes for the prompt to render. I was thinking that a reasonable timeout would be 100ms, which is generally considered to be the upper bound of what feels immediate, but lowering it is also an option. |
I get a massive delay (~15s) rendering at the moment. Wouldn't the threads all have to join before rendering? I don’t think 100ms would be enough for the three main repos I work on, even with gitstatus as above. I can get some actual times for that though. |
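To make the timeout idea concrete, here is a minimal sketch of running one module's computation on its own thread and giving up after a deadline. It is not Starship's actual code: the function name `compute_with_timeout` is hypothetical, and only the Rust standard library is used.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: runs `compute` on its own thread and waits at most
/// `timeout` for the result. Returns `None` if the deadline passes first;
/// the worker thread keeps running in the background and its result is dropped.
fn compute_with_timeout<T, F>(compute: F, timeout: Duration) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (sender, receiver) = mpsc::channel();
    thread::spawn(move || {
        // Sending fails only if the receiver was dropped after a timeout,
        // in which case nobody is interested in the result any more.
        let _ = sender.send(compute());
    });
    // `recv_timeout` blocks until a value arrives or the timeout elapses.
    receiver.recv_timeout(timeout).ok()
}
```

Note that this caps the wait but does not remove it: for a repository that always exceeds the deadline, the prompt still pays the full timeout on every render, which is the objection raised in this thread.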
@jaredwy Let's try to figure out why fetching git status is taking so much time.
How did you measure this?
What's the output of
Are you benchmarking in a clean repository?
How long does
Do timings improve after
What OS are you using?
On which storage device is the local repository stored? HDD, SATA SSD, NVME SSD?
On which file system is the local repository stored? |
It’s slow because it’s a huge project, with git status taking a while even in a clean repo :) It’s a known problem with git for very large repos. There are mitigations for it, but they come with trade-offs (e.g. it won’t track new files). |
If it's not too much to ask, could you answer at least some of my questions? On my machine for Edit: Fixed a typo: 30 million files rather than 3 million. |
I’m not at my computer, but I doubt we are at 30 million; it’s going to be over a million though. OS is Windows and Mac. Storage is SSD and NVMe SSD. |
Thanks! If plain Granted, even if I wonder if fetching git status asynchronously would give you good UX. If you have a free moment, could you try https://github.com/romkatv/powerlevel10k to see if it's usable in your huge Git repository out of the box? This requires Zsh. If you aren't using Zsh but can spare a few minutes to test this, here's how:
git clone --depth=1 https://github.com/romkatv/powerlevel10k.git ~/powerlevel10k
echo 'source ~/powerlevel10k/powerlevel10k.zsh-theme' >> ~/.zshrc
If powerlevel10k is unusable out of the box in the Git repository of yours, type As for ignoring large repositories when computing Git status, I can share how this is done in gitstatus and powerlevel10k. Perhaps it'll help.
|
Is it worth me refining this pull request to match the last point in @romkatv's comment above? As mentioned, even switching to gitstatus would leave the repo pretty unusable. I don't really agree with the timeout solution, as it results in extra unneeded work. I know which repos are slow, so I would rather not wait the 100ms (or whatever value) for something I know is going to fail. As it stands starship is unusable for me: I can't wait 15 seconds on my main work repo for the prompt to become usable again, but I would rather not disable git status entirely for the smaller repos. There is certainly precedent for this in other systems, powerlevel10k and posh-git for example. I would love to be able to continue to use starship, so I'm happy to work on a solution that makes everyone happy. |
Perhaps I should clarify the intention of my previous comment. There are many potential solutions for the problem of supporting large Git repositories in Starship. Knowing which ones will and won't work in different circumstances is valuable. I've assembled the data you've already provided in this gist. Even though it has few specifics, it provides valuable insights that could inform the development of Starship. Specifically, it says that in one specific case My last comment is asking you to check other potential solutions to see if they would work if implemented in Starship.
Answering these questions by imagining the solutions is difficult and error prone. Implementing them in Starship before knowing whether they'll work is expensive. A more efficient approach is to try them in powerlevel10k where they are already implemented. This would allow Starship to build on experience from other projects -- adopt things that work well, and skip those that don't. |
If Starship were to call gitstatus_query asynchronously, would it work for you?
It would work, if prompts allow for async updates. Otherwise we will just block waiting for it to complete.
If Starship were to skip workdir scan for Git repositories with over 100000 files, would it work for you?
If Starship were to honor bash.showDirtyState, would it work for you?
I don't see how that relates to other shells? |
Fair enough. You have no obligation to help users of other shells. I respect that.
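The size-threshold question above can be sketched as a small decision function. This is only an illustration under stated assumptions: `ScanPlan`, `plan_scan` and the optional limit are hypothetical names, and how the index entry count is obtained is left out.

```rust
/// Hypothetical outcome of the size check.
#[derive(Debug, PartialEq)]
enum ScanPlan {
    /// Inspect the working tree for modified and untracked files.
    ScanWorkdir,
    /// Skip the working tree and report only cheap information.
    SkipWorkdir,
}

/// Decides whether to scan the working tree, given the number of entries in
/// the index and an optional user-configured upper limit (assumed setting).
fn plan_scan(index_entries: usize, max_index_entries: Option<usize>) -> ScanPlan {
    match max_index_entries {
        // Only repositories strictly above the limit skip the scan.
        Some(limit) if index_entries > limit => ScanPlan::SkipWorkdir,
        // No limit configured, or the repository is small enough.
        _ => ScanPlan::ScanWorkdir,
    }
}
```

With a limit of 100000, a repository with a million tracked files would skip the scan, while everything smaller keeps the full status.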
Your PR adds a configuration option to Starship. This is a different configuration option that could be added to Starship. The question is which one is better. Specifying the upper limit on repository size has the following advantages:
Starship can honor this configuration option, or it could define I'm mentioning it because this is another solution from a very popular project for the same problem you are facing. P.S. If you don't find this conversation productive, please tell me. I'll withdraw, no hard feelings. |
This is one of the features I am looking for. In the meantime I am using something similar to what the default git PS1 does: https://github.com/git/git/blob/master/contrib/completion/git-prompt.sh#L510 I have modified starship so that you can set bash.showDirtyState to true or false for each repo, or for all repos by setting "GIT_PS1_SHOWDIRTYSTATE".
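The git-prompt.sh convention referred to here can be sketched as a pure function. As I understand that script, dirty state is shown when GIT_PS1_SHOWDIRTYSTATE is set to a non-empty value, unless the repository sets bash.showDirtyState to false. The function name and the way the two values reach it are assumptions; this is not the commenter's actual patch.

```rust
/// Hypothetical helper mirroring the git-prompt.sh convention.
///
/// `env_value` is the value of GIT_PS1_SHOWDIRTYSTATE, if the variable is set.
/// `config_value` is the output of `git config --bool bash.showDirtyState`,
/// if the option is set for the repository.
fn show_dirty_state(env_value: Option<&str>, config_value: Option<&str>) -> bool {
    // The environment variable acts as the global switch: any non-empty value enables it.
    let enabled_globally = env_value.map_or(false, |value| !value.is_empty());
    // A per-repository "false" opts that repository out again.
    let disabled_for_repo = config_value.map_or(false, |value| value.trim() == "false");
    enabled_globally && !disabled_for_repo
}
```

A slow repository would then opt out with `git config bash.showDirtyState false`, while every other repository keeps its dirty-state indicator.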
|
Well, I use PowerShell as well, so I need it to work there at least :)
For me it comes with the disadvantage of having to go to that repo, run ls-files, get the count and update the setting. With my solution you just add the directory you know is slow.
The issue that I have with this approach is that it splits the config. I now have to set some options in starship and others in my git config. I would favour a solution that allows me to keep all my config in one spot, which would aid portability for me. E.g. I put all my source in the same directory on every machine, and I know that repo A will be slow. So even if I set up a new machine, I just have to pull down my starship config, clone the repo, and have it work. With this solution I would need to set a git config after each clone.
|
Hm... Don't you now have to go to that repo, perform I think we both understand each other's points. I believe that options that are used by other successful projects work very well. You believe that your PR works very well. Let's leave it at that. |
Given how much this has become a need as of late, I'd be open to seeing this solution through. 😊 |
@matchai happy to get this across the line. Which refinements would you particularly like to see? I'm happy to do the work. There has been a lot of discussion, so I'm just not sure which direction you would like it to go. |
I'd be inclined to experiment with the solution proposed by @romkatv, using |
Happy to wire that up. I just want to register my concern: this feels like it could be a bit surprising to users. Imagine starship is working fine, you do a fetch, and suddenly part of your prompt disappears. Without reading the docs, it isn't immediately obvious why you lost status. |
Yeah, that's not so great. 😕 It seems to me like the lesser of two evils, but I could be convinced otherwise. |
Not showing Git status when the size of a repository is above some threshold would indeed be surprising and unhelpful. I would recommend against this approach. In Powerlevel10k the following algorithm has worked great:
|
The contents of romkatv's gist from this comment, in case the gist ever goes offline later.
|
Since this PR was created we've had a lot of focus put into git performance: we've moved from using libgit2 to parsing git and using gitoxide, and we've introduced command timeouts for slow repositories. While it's possible the approach here might still be necessary in the end, I'm going to go ahead and close this for now as a way to reduce noise while we focus on improving our gitoxide perf. The discussions about interface and what's expected on this PR have been invaluable---thank you to everyone who contributed! |
Description
Some repos are very large, and running git status on them can take a while.
I am creating this as a draft PR as I had some questions and some remaining work, but wanted to get some early feedback.
What is the best way to fail when reading the config file? E.g. if a path isn't absolute, we should skip it. Do we fail loading, or do we warn and ignore?
I added tests and they run but they seem to be ignored. Is there a reason for that? Should I also add mine to be ignored?
I need to test this on Windows.
Motivation and Context
As above, the repo I work with at work is rather large and using git status is prohibitively slow. I still want status on repos where it is usable. This gives me the ability to turn it off just for the slow repos.
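A minimal sketch of the per-directory opt-out described here, assuming the config holds a list of directories for which git status is skipped. The function names are hypothetical and the option's real name and format are not shown in this PR text. Relative entries are logged and ignored, following the "log and ignore" answer in the conversation.

```rust
use std::path::{Path, PathBuf};

/// Keeps only the absolute entries from the (assumed) config list.
/// Relative entries are reported on stderr and ignored rather than failing the load.
fn usable_disabled_paths(configured: &[&str]) -> Vec<PathBuf> {
    configured
        .iter()
        .map(|entry| PathBuf::from(*entry))
        .filter(|path| {
            if path.is_absolute() {
                true
            } else {
                eprintln!("ignoring non-absolute path: {}", path.display());
                false
            }
        })
        .collect()
}

/// Returns true when `repo_root` is one of the disabled directories or lies inside one.
/// `Path::starts_with` compares whole components, so `/work/huge-other`
/// does not match `/work/huge`.
fn is_status_disabled(repo_root: &Path, disabled: &[PathBuf]) -> bool {
    disabled.iter().any(|prefix| repo_root.starts_with(prefix))
}
```

Because the check is a path comparison, a known-slow repository costs nothing: no process is spawned and no timeout is waited out. The example paths in any usage would be Unix-style; what counts as absolute differs on Windows.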