-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Seek for ignored files only in dirs that their content has changed. #5429
Conversation
Eagerly loading all the ignored files slows down the regular status query. Remember tracked files to not treat them as a candidate for ignored file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This solution will fire off a series of "git-status -- 'ignore-candidates'" which in total could add up to more than listing all files in one go. In the repos I have at home, the time for git-status with and without candidates is very close. It is probably an improvement, but it is hard to determine. The complexity increases with this change too.
Another approach to minimize the first time until update which is annoying could have been to fire an update without ignored files the first time, next time with ignored (then wait minutes until next as before).
A way to limit the candidate checking and memory consumption (the list can be huge) is to use --ignore=matching when listing and filter on directories. That requires Git 2.16 or so but can be backwards compatible. That is more important if listing all ignored, but could benefit this approach too. See #4256 for an issue of the ignore files.
I would like to test this by use some time before approving. Well implemented though, not really any comments on the code.
@jbialobr Proposal:
A simple alternative is to just skip --ignored for "requested" invocations so the first update is quicker. |
They run really fast, but they do not return individual files. We would have to change the logic that verifies if a file is an ignored one.
If you are talking about: if (IsAboutToUpdate())
{
return;
} then yes, it is there on purpose. The regular use case is that you work with none ignored files. In most cases the git status will return the same files that are candidates for ignored and there will be no need to call git status --ignored command.
You have to implement your own logic to check if the file matches a dir returned by the command. It will be much slower than checking if a file is in hashmap. It will not simplify the implementation but has a chance to work fast.
I prefer to focus on making it always fast.
I am not sure about what cancellation you are talking.
It is already the case. And the --ignored command only is called when after the regular git status command there are candidates left which should not be too often.
I am not satisfied with that, as I am experiencing lags after the initial load as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbialobr, I've taken a liberty to refactor your proposal extracting (what I perceive as) caching related implementations into own class. This way it is modular and can be unit tested.
Passing IsAboutToUpdate
as an arguments feels very dirty, but it is EOD here and I'm quite tired to think clearly.
private HashSet<string> _ignoredFiles = new HashSet<string>(); | ||
private ConcurrentDictionary<string, object> _ignoredFiles = new ConcurrentDictionary<string, object>(); | ||
private ConcurrentDictionary<string, object> _ignoredCandidates = new ConcurrentDictionary<string, object>(); | ||
private ConcurrentDictionary<string, object> _trackedSeen = new ConcurrentDictionary<string, object>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these 3 can be made readonly
@RussKie Thanks, it is for sure a right direction. The |
Perhaps pass `workingDir` in and instantiate the module?
It makes me rather uneasy passing behavioural components as input
parameters.
…On Sun, 16 Sep 2018 at 22:57, Janusz Białobrzewski ***@***.***> wrote:
@RussKie <https://github.com/RussKie> Thanks, it is for sure a right
direction. The UpdateCandidatesAsync method can be split into two
methods: one for updating from the git status result, and the other for
verifying candidates. This way there will be no need to pass
IsAboutToUpdate to the cache. What we should be passing, is the GitModule
captured inside the Update method. We have to keep the module we use in
sync with _statusCancellationToken.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#5429 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AEMyXuYxn2LRrof_sYUy9GzTnwC6ZGnIks5ubkq-gaJpZM4Wo3hs>
.
|
Perhaps GitModule should be passed to the constructor by value and the cache should be recreated after GitModule has changed? |
For --ignored==matching: That should be required as all ignored files could require GB of storage. See #4256 In situations where a build is running, there will be ignored files updated constantly. The detection of ignored files do not have to be perfect, but will limit how often git-status runs. Since 2.51, the update is throttled dynamically, to detect changes after manual changes quickly and not load too much at massive updates like when building. Note that WorkTreeChanged() will return directly if an update is already scheduled. In the current implementation, some further updates from FileSystemWatcher will slip through, but that should be changed. To just check "the first directories" is normally OK, we just want to limit the rates. Previously, the complete list were updated anyway, not so with this change. The change is not necessarily a problem, but the current implementation has some drawbacks. The previous implementation, starting with no ignored files: The previous implementation, starting with no ignored files: This PR, starting with no ignored files: Possibilities:
Will submit a PR with a few adjustments to this PR (I get an exception) |
Removing the ignored list is not a bad option, that decreases the load for me compared to me (still bringing the benefits). #4256 is resolved too. |
@gerhardol I am leaning towards removing the ignored list at all as you proposed. |
Based on gitextensions#5429 no ignore
Based on gitextensions#5429 no ignore
fixes gitextensions#5439 fixes gitextensions#4256 partly based on gitextensions#5429 The handling was improved in gitextensions#4327, gitextensions#4328, fixing the regression Those PRs limited the max load for git-status Changes: React faster on changing directory (1.0s->0.2s) Some tweaks to the checks, do not check if update is required if an update is already scheduled Do not read ignored files due to two problems: * Reading ignored takes longer times. Since this was done the first time, the first update was delayed. (Reading ignored could be done later, but this will still delay the next update). * Saving all seen ignored files require substantial memory, see gitextensions#4256. The list could be limited (see the issue), but that will increase the test time for ignored files As ignored files will now always trigger git-status may cause some more runs. However, the ignored file list was never complete, the list was better over time, so GE must be able to handle this load anyway.
fixes gitextensions#5439 fixes gitextensions#4256 partly based on gitextensions#5429 The handling was improved in gitextensions#4327, gitextensions#4328, fixing the regression Those PRs limited the max load for git-status Changes: React faster on changing directory (1.0s->0.2s) Some tweaks to the checks, do not check if update is required if an update is already scheduled Do not read ignored files due to two problems: * Reading ignored can takes significant time. Since this was done the first time, the first update was delayed. (Reading ignored could be done later, but this will still delay the next update). * Saving all seen ignored files require substantial memory, see gitextensions#4256. The list could be limited (see the issue), but that will increase the test time for ignored files As ignored files will now always trigger git-status may cause some more runs. However, the ignored file list was never complete, the list was better over time, so GE must be able to handle this load anyway.
Based on gitextensions#5429 no ignore
fixes gitextensions#5439 fixes gitextensions#4256 partly based on gitextensions#5429 The handling was improved in gitextensions#4327, gitextensions#4328, fixing the regression Those PRs limited the max load for git-status Changes: React faster on changing directory (1.0s->0.2s) Some tweaks to the checks, do not check if update is required if an update is already scheduled Do not read ignored files due to two problems: * Reading ignored can takes significant time. Since this was done the first time, the first update was delayed. (Reading ignored could be done later, but this will still delay the next update). * Saving all seen ignored files require substantial memory, see gitextensions#4256. The list could be limited (see the issue), but that will increase the test time for ignored files As ignored files will now always trigger git-status may cause some more runs. However, the ignored file list was never complete, the list was better over time, so GE must be able to handle this load anyway.
fixes gitextensions#5439 fixes gitextensions#4256 partly based on gitextensions#5429 The handling was improved in gitextensions#4327, gitextensions#4328, fixing the regression Those PRs limited the max load for git-status Changes: React faster on changing directory (1.0s->0.2s) Some tweaks to the checks, do not check if update is required if an update is already scheduled Do not read ignored files due to two problems: * Reading ignored can takes significant time. Since this was done the first time, the first update was delayed. (Reading ignored could be done later, but this will still delay the next update). * Saving all seen ignored files require substantial memory, see gitextensions#4256. The list could be limited (see the issue), but that will increase the test time for ignored files As ignored files will now always trigger git-status may cause some more runs. However, the ignored file list was never complete, the list was better over time, so GE must be able to handle this load anyway.
fixes gitextensions#5439 fixes gitextensions#4256 partly based on gitextensions#5429 The handling was improved in gitextensions#4327, gitextensions#4328, fixing the regression Those PRs limited the max load for git-status Changes: React faster on changing directory (1.0s->0.2s) Some tweaks to the checks, do not check if update is required if an update is already scheduled Do not read ignored files due to two problems: * Reading ignored can takes significant time. Since this was done the first time, the first update was delayed. (Reading ignored could be done later, but this will still delay the next update). * Saving all seen ignored files require substantial memory, see gitextensions#4256. The list could be limited (see the issue), but that will increase the test time for ignored files As ignored files will now always trigger git-status may cause some more runs. However, the ignored file list was never complete, the list was better over time, so GE must be able to handle this load anyway.
fixes gitextensions#5439 fixes gitextensions#4256 partly based on gitextensions#5429 The handling was improved in gitextensions#4327, gitextensions#4328, fixing the regression Those PRs limited the max load for git-status Changes: React faster on changing directory (1.0s->0.2s) Some tweaks to the checks, disable file system monitoring temporary if an update is already scheduled Do not read ignored files due to two problems: * Reading ignored can takes significant time. Since this was done the first time, the first update was delayed. (Reading ignored could be done later, but this will still delay the next update). * Saving all seen ignored files require substantial memory, see gitextensions#4256. The list could be limited (see the issue), but that will increase the test time for ignored files As ignored files will now always trigger git-status may cause some more runs. However, the ignored file list was never complete, the list was better over time, so GE must be able to handle this load anyway.
Eagerly loading all the ignored files slows down the regular status query.
When you have tens of thousands of ignored files it can take up to a minute to get git status completed
when run with ignored files included. Since we base the working dir and commit index info on this, it is a poor experience to have to wait that long for seeing changes to be reflected in the revisions grid.
Changes proposed in this pull request:
When not touching ignored files, gitext will not have to load the ignored files list at all.
What did I do to test the code and ensure quality:
Has been tested on (remove any that don't apply):