Support untracked cache with '--untracked-files=all' if configured #985

gitgitgadget · 2022-03-29T17:47:55Z

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Tao Klerks via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Tao Klerks <tao@klerks.biz> > > Untracked cache was originally designed to only work with > '--untracked-files=normal', but this causes performance issues for UI > tooling that wants to see "all" on a frequent basis. On the other hand, > the conditions that prevented applicability to the "all" mode no > longer seem to apply. There is a slight leap in logic flow before ", but this causes" that forces readers read the above twice and guess what is missing. I am guessing that ... was designed to only work with "--untracked-files=normal", and is bypassed when "--untracked-files=all" request, but this causes ... is what you meant to say. The claim in the last sentence "... no longer seem to apply" is thrown at the readers without much rationale. Either justify it more solidly or discard the claim? > The disqualification of untracked cache has a particularly significant > impact on Windows with the defaulted fscache, where the introduction > of fsmonitor can make the first and costly directory-iteration happen > in untracked file detection, single-threaded, rather than in > preload-index on multiple threads. Improving the performance of a > "normal" 'git status' run with fsmonitor can make > 'git status --untracked-files=all' perform much worse. The last sentence is a bit hard to parse and very hard to reason about. Do you mean to say "--untracked-files=all is slower when the untracked cache is in use, so the performance of 'git status' may be improved for '--untracked-files=normal' when the untracked cache is used, it hurts when 'git status --untracked-files=all' is run"? > To partially address this, align the supported directory flags for the > stored untracked cache data with the git config. If a user specifies > an '--untracked-files=' commandline parameter that does not align with > their 'status.showuntrackedfiles' config value, then the untracked > cache will be ignored - as it is for other unsupported situations like > when a pathspec is specified. > > If the previously stored flags no longer match the current > configuration, but the currently-applicable flags do match the current > configuration, then discard the previously stored untracked cache > data. Let me follow what actually happens in the patch aloud. > -static void new_untracked_cache(struct index_state *istate) > +static unsigned new_untracked_cache_flags(struct index_state *istate) > +{ > + struct repository *repo = istate->repo; > + char *val; > + > + /* > + * This logic is coordinated with the setting of these flags in > + * wt-status.c#wt_status_collect_untracked(), and the evaluation > + * of the config setting in commit.c#git_status_config() > + */ > + if (!repo_config_get_string(repo, "status.showuntrackedfiles", &val) && > + !strcmp(val, "all")) > + return 0; > + > + /* > + * The default, if "all" is not set, is "normal" - leading us here. > + * If the value is "none" then it really doesn't matter. > + */ > + return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES; > +} This _guesses_ the user preference from the configuration. > +static void new_untracked_cache(struct index_state *istate, int flags) > { > struct untracked_cache *uc = xcalloc(1, sizeof(*uc)); > strbuf_init(&uc->ident, 100); > uc->exclude_per_dir = ".gitignore"; > - /* should be the same flags used by git-status */ > - uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES; > + uc->dir_flags = flags >= 0 ? flags : new_untracked_cache_flags(istate); We use unsigned for the flag word unless there is a reason not to, but this function wants to use top-bit as a signal to "guess from config". The caller either dictates what bits to set, or we guess. And the created uc records the flags. > @@ -2762,11 +2782,11 @@ static void new_untracked_cache(struct index_state *istate) > void add_untracked_cache(struct index_state *istate) > { > if (!istate->untracked) { > - new_untracked_cache(istate); > + new_untracked_cache(istate, -1); We are newly creating, so "-1" (guess from config) may be the most appropriate (unless the caller of add_untracked_cache() already knows what operation it is using for, that is). > } else { > if (!ident_in_untracked(istate->untracked)) { Found an existing one but need to recreate. > free_untracked_cache(istate->untracked); > - new_untracked_cache(istate); > + new_untracked_cache(istate, -1); In this case, is it likely to give us a better guess to read the configuration, or copy from the existing untracked file? My gut feeling says it would be the latter, and if that is correct, then we might want + int old_flags = istate->untracked->dir_flags; free_untracked_cache(istate->untracked); - new_untracked_cache(istate); + new_untracked_cache(istate, old_flags); instead? I dunno. > @@ -2781,9 +2801,9 @@ void remove_untracked_cache(struct index_state *istate) > } > > static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir, > - int base_len, > - const struct pathspec *pathspec, > - struct index_state *istate) > + int base_len, > + const struct pathspec *pathspec, > + struct index_state *istate) > { > struct untracked_cache_dir *root; > static int untracked_cache_disabled = -1; Is this just re-indenting? Not very welcome, but OK. > @@ -2814,17 +2834,9 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d > if (base_len || (pathspec && pathspec->nr)) > return NULL; > > - /* Different set of flags may produce different results */ > - if (dir->flags != dir->untracked->dir_flags || > - /* > - * See treat_directory(), case index_nonexistent. Without > - * this flag, we may need to also cache .git file content > - * for the resolve_gitlink_ref() call, which we don't. > - */ > - !(dir->flags & DIR_SHOW_OTHER_DIRECTORIES) || > - /* We don't support collecting ignore files */ > - (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO | > - DIR_COLLECT_IGNORED))) > + /* We don't support collecting ignore files */ > + if (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO | > + DIR_COLLECT_IGNORED)) > return NULL; > > /* > @@ -2847,6 +2859,41 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d > return NULL; > } > > + /* > + * We don't support using or preparing the untracked cache if > + * the current effective flags don't match the configured > + * flags. > + */ Is that what we want? What happens when your user does this: - set showuntrackedfiles to "normal" - generate the untracked cache - set showuntrackedfiles to "all" And now the user wants to make a request that wants to see flags that are suitable for "normal". The best case would be for the untracked cache to know what flags were in use when it was generated, so that in the above sequence, even though the current value of configuration variable and the current request from the command line contradict each other, we can notice that the untracked cache data is suitable for the current request and the right thing happens. > + if (dir->flags != new_untracked_cache_flags(istate)) > + return NULL; But this only pays attention to what is in the configuration? It seems to me that it is being too pessimistic, but perhaps it is necessary for correctness somehow? Thanks.

On the Git mailing list, Tao Klerks wrote (reply to this):

the On Tue, Mar 29, 2022 at 7:43 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Tao Klerks via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: Tao Klerks <tao@klerks.biz> > > > > Untracked cache was originally designed to only work with > > '--untracked-files=normal', but this causes performance issues for UI > > tooling that wants to see "all" on a frequent basis. On the other hand, > > the conditions that prevented applicability to the "all" mode no > > longer seem to apply. > > There is a slight leap in logic flow before ", but this causes" that > forces readers read the above twice and guess what is missing. I am > guessing that > > ... was designed to only work with "--untracked-files=normal", > and is bypassed when "--untracked-files=all" request, but this > causes ... > > is what you meant to say. Yep, will fix. > > The claim in the last sentence "... no longer seem to apply" is > thrown at the readers without much rationale. Either justify it > more solidly or discard the claim? > I can add references to previous conversation threads around the topic, and add the word "experimentally". > > The disqualification of untracked cache has a particularly significant > > impact on Windows with the defaulted fscache, where the introduction > > of fsmonitor can make the first and costly directory-iteration happen > > in untracked file detection, single-threaded, rather than in > > preload-index on multiple threads. Improving the performance of a > > "normal" 'git status' run with fsmonitor can make > > 'git status --untracked-files=all' perform much worse. > > The last sentence is a bit hard to parse and very hard to reason > about. Do you mean to say "--untracked-files=all is slower when the > untracked cache is in use, so the performance of 'git status' may be > improved for '--untracked-files=normal' when the untracked cache is > used, it hurts when 'git status --untracked-files=all' is run"? Something like that. The case where untracked cache bypassing (or disabling) has a huge impact is on windows, and involves fscache and fsmonitor: * Untracked cache saves some work under qualifying conditions * If qualifying conditions are not met (eg when -uall), we do the work anyway as though untracked cache were disabled. * fscache optimises directory-iteration on windows, with an N-thread preload strategy. This is enabled by default, and improves windows "git status" performance transparently from "abominably slow" to just "slow". * fsmonitor, when combined with untracked cache, eliminates recursive directory-iteration in favor of using a persistent in-memory model of what's changed on disk (to compare to the index and untracked cache) * When fsmonitor is enabled and used, the fscache preload strategy is (obviously) disabled * When fsmonitor is enabled and untracked cache is disabled or bypassed, the otherwise-ubiquitous "fscache preload" does not happen, and the untracked file search (directory iteration) takes... forever. In other words: In windows, where fscache behavior is critical for large repos and defaulted-on, when fsmonitor is also enabled which generally further improves things but disqualifies fscache preload, the bypassing of untracked cache causes a huge performance impact. A "-uall" run can take a minute, where a "-unormal" run takes only a second, because the enumeration of untracked files without using the untracked cache data means a full directory iteration, not optimized by fscache. On the other hand, removing fsmonitor and untracked cache altogether, and leaving just the fscache optimization to do its job consistently, results in consistent middle-of-the-road performance in the order of 10 seconds in this example. Enabling fsmonitor in this situation causes -uall to perform pathologically badly, because of untracked cache bypassing (interacting with the fscache-preload-bypassing of fsmonitor). However, the benefits of dropping status time from 10 seconds to 1 second make fsmonitor very attractive. On the other hand, the pathological performance with -uall is a problem if you use GUIs that *require* -uall. Therefore, fixing untracked cache to be compatible with -uall is critical to large repos on windows where the ideal experience would be fscache working, fsmonitor working, untracked cache working, and GUIs able to use -uall without the user waiting a minute or more and sometimes timing out. This problem sounds like an edge-case when I describe the chain of interactions, but is actually very common among large-repo windows users. Windows users are typically GUI users. GUIs typically use -uall, as git's default "hide contents of untracked folders" behavior is obviously limiting. This patch doesn't actually change anything I've described by default - but it makes it *possible* to get good/consistent performance with -uall (with fscache, with fsmonitor, with untracked cache), *if* you configure status.showuntrackedfiles=all. Now, coming back to the paragraph in question; maybe, instead of trying to summarize this windows-fscache-fsmonitor-untrackedcache interaction, I should be a little more vague. Would this work? -- When 'git status' runs without using the untracked cache, on a large repo, on windows, with fsmonitor, it can run very slowly. This can make GUIs that need to use "-uall" (and therefore currently bypass untracked cache) unusable when fsmonitor is enabled, on such large repos. -- > Let me follow what actually happens in the patch aloud. > [...] > This _guesses_ the user preference from the configuration. > Yes - the proposal, in this patch, is that the untracked cache usability/compatibility be aligned with the configuration. We are adding an extra meaning to the "status.showuntrackedfiles" configuration, such that it not only indicates what you want to have happen when you don't specify "-u" to git status, but it *also* indicates what kind of status output the untracked cache should be targeting/supporting. > > +static void new_untracked_cache(struct index_state *istate, int flags) > > { > > struct untracked_cache *uc = xcalloc(1, sizeof(*uc)); > > strbuf_init(&uc->ident, 100); > > uc->exclude_per_dir = ".gitignore"; > > - /* should be the same flags used by git-status */ > > - uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES; > > + uc->dir_flags = flags >= 0 ? flags : new_untracked_cache_flags(istate); > > We use unsigned for the flag word unless there is a reason not to, > but this function wants to use top-bit as a signal to "guess from > config". The caller either dictates what bits to set, or we guess. > > And the created uc records the flags. > Yep - although the word "guess" may be slightly misleading here. The proposed semantics are such that if the existing untracked cache flags are inconsistent with the config, we *discard* the no-longer-consistent-with-config untracked cache; so it's not so much a guess as a mandate. > > @@ -2762,11 +2782,11 @@ static void new_untracked_cache(struct index_state *istate) > > void add_untracked_cache(struct index_state *istate) > > { > > if (!istate->untracked) { > > - new_untracked_cache(istate); > > + new_untracked_cache(istate, -1); > > We are newly creating, so "-1" (guess from config) may be the most > appropriate (unless the caller of add_untracked_cache() already > knows what operation it is using for, that is). > > > } else { > > if (!ident_in_untracked(istate->untracked)) { > > Found an existing one but need to recreate. > > > free_untracked_cache(istate->untracked); > > - new_untracked_cache(istate); > > + new_untracked_cache(istate, -1); > > In this case, is it likely to give us a better guess to read the > configuration, or copy from the existing untracked file? My gut > feeling says it would be the latter, and if that is correct, then > we might want > > + int old_flags = istate->untracked->dir_flags; > free_untracked_cache(istate->untracked); > - new_untracked_cache(istate); > + new_untracked_cache(istate, old_flags); > > instead? I dunno. > As I noted above, we later consider that an untracked cache with flags that are inconsistent with the current config is an invalid untracked cache, and discard it; so setting the flags based on config is the right thing (the only useful thing) to do. > > @@ -2781,9 +2801,9 @@ void remove_untracked_cache(struct index_state *istate) > > } > > > > static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir, > > - int base_len, > > - const struct pathspec *pathspec, > > - struct index_state *istate) > > + int base_len, > > + const struct pathspec *pathspec, > > + struct index_state *istate) > > { > > struct untracked_cache_dir *root; > > static int untracked_cache_disabled = -1; > > Is this just re-indenting? Not very welcome, but OK. > Will fix; I think there was a previous version in which these lines actually changed. > > @@ -2814,17 +2834,9 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d > > if (base_len || (pathspec && pathspec->nr)) > > return NULL; > > > > - /* Different set of flags may produce different results */ > > - if (dir->flags != dir->untracked->dir_flags || > > - /* > > - * See treat_directory(), case index_nonexistent. Without > > - * this flag, we may need to also cache .git file content > > - * for the resolve_gitlink_ref() call, which we don't. > > - */ > > - !(dir->flags & DIR_SHOW_OTHER_DIRECTORIES) || > > - /* We don't support collecting ignore files */ > > - (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO | > > - DIR_COLLECT_IGNORED))) > > + /* We don't support collecting ignore files */ > > + if (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO | > > + DIR_COLLECT_IGNORED)) > > return NULL; > > > > /* > > @@ -2847,6 +2859,41 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d > > return NULL; > > } > > > > + /* > > + * We don't support using or preparing the untracked cache if > > + * the current effective flags don't match the configured > > + * flags. > > + */ > > Is that what we want? What happens when your user does this: > > - set showuntrackedfiles to "normal" > - generate the untracked cache > - set showuntrackedfiles to "all" > > And now the user wants to make a request that wants to see flags > that are suitable for "normal". > > The best case would be for the untracked cache to know what flags > were in use when it was generated, so that in the above sequence, > even though the current value of configuration variable and the > current request from the command line contradict each other, we > can notice that the untracked cache data is suitable for the current > request and the right thing happens. > > > + if (dir->flags != new_untracked_cache_flags(istate)) > > + return NULL; > > But this only pays attention to what is in the configuration? It > seems to me that it is being too pessimistic, but perhaps it is > necessary for correctness somehow? The logic in the current patch is: * If the configuration doesn't match the request, ignore untracked cache completely (exit the UC logic) * If the configuration doesn't match the current untracked cache data, discard the existing untracked cache data (and later recreate it if everything else works out) I think your proposal is something like: * If the current untracked cache data doesn't match the request ** If the current untracked cache data doesn't match the configuration, then discard the existing untracked cache (and later recreate it if everything else works out) ** Otherwise, ignore untracked cache completely (exit the UC logic) * (otherwise, if current untracked cache data *does* match the request, assume it's good enough to keep using for now - this is the new behavior) I find the latter slightly harder to understand and explain, but I'm reasonably certain it addresses the case you identified above without compromising correctness in any other cases - it can only have a positive impact on performance. I'll have a go at making that change today.

gitgitgadget · 2022-03-29T16:53:54Z

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Tao Klerks via GitGitGadget" <gitgitgadget@gmail.com> writes: > +# Bypassing the untracked cache here is not desirable, but it expected > +# in the current implementation If that is the case, it is much more desirable to squash it into a single [2/2] patch so that the desirable working is documented (so future breakage can be caught), the reviewers can read what the intended behaviour is more easily (so we do not have to be confused by this one saying "expect success"), make it easier to cherry-pick the fix and test in the same patch elsewhere, and the existing breakage can easily be caught by applying only the test part of the patch. Thanks. > +test_expect_success 'untracked cache is bypassed with -uall' ' > + : >../trace.output && > + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ > + git status -uall --porcelain >../actual && > + iuc status -uall --porcelain >../status.iuc && > + test_cmp ../status_uall.expect ../status.iuc && > + test_cmp ../status_uall.expect ../actual && > + get_relevant_traces ../trace.output ../trace.relevant && > + cat >../trace.expect <<EOF && > + ....path: > +EOF > + test_cmp ../trace.expect ../trace.relevant > +' > + > +test_expect_success 'untracked cache remains after bypass' ' > + test-tool dump-untracked-cache >../actual && > + test_cmp ../dump.expect ../actual > +' > + > test_expect_success 'modify in root directory, one dir invalidation' ' > : >four && > test-tool chmtime =-240 four &&

On the Git mailing list, Tao Klerks wrote (reply to this):

On Tue, Mar 29, 2022 at 6:51 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Tao Klerks via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > +# Bypassing the untracked cache here is not desirable, but it expected > > +# in the current implementation > > If that is the case, it is much more desirable to squash it into a > single [2/2] patch so that the desirable working is documented (so > future breakage can be caught), the reviewers can read what the > intended behaviour is more easily (so we do not have to be confused > by this one saying "expect success"), make it easier to cherry-pick > the fix and test in the same patch elsewhere, and the existing > breakage can easily be caught by applying only the test part of the > patch. Sorry, I'm not completely sure whether my comment was misleading, or whether I'm misunderstanding your feedback. The test added here does not test "desirable" behavior from an end-user functional perspective, but it does test behavior that is working "as-designed" as of many years ago. The intent in adding this test is to ensure that if/when someone changes this behavior down the line, they be forced to understand the implications (eg: the current untracked cache structure does not store the same data for a -uall run as for a -unormal run, and so using the data from one in another would lead to output correctness errors). Importantly, the behavior that this test is exercising *is not changed* by the 2/2 "improvement" patch; this is why I separated it out - it's a change that I feel we should make either way, and could even be its own independent patch. It's in this series as it adds relevant context and I am later adding similar tests for the new/additional behavior. Given the (I believe) misunderstanding, a better comment would be something like: # Bypassing the untracked cache here is not desirable from an # end-user perspective, but is expected in the current implementation. # The untracked cache data stored for a -unormal run cannot be # correctly used in a -uall run - it would yield incorrect output. Does that make more sense?

On the Git mailing list, Junio C Hamano wrote (reply to this):

Tao Klerks <tao@klerks.biz> writes: > Sorry, I'm not completely sure whether my comment was misleading, or > whether I'm misunderstanding your feedback. > > The test added here does not test "desirable" behavior from an > end-user functional perspective, but it does test behavior that is > working "as-designed" as of many years ago. Is it "as-designed", or just "left buggy"? I somehow had an impression that you meant the latter, and it would be our aspirational goal to eventually fix it. And for such case, it would be better to write the test body to show what the command should do, which would make the test fail with today's Git, and mark it as test_expect_failure (which is not an ideal mechanism to prepare for a future fix, but that is what we have now and should use, until a better alternative being worked on is finished). But if it is "as-designed, some users may find it suboptimal or confusing or with any other negative adjectives, but it is too late to change now and more importantly it is unthinkable to change it because existing tools and users do depend on the current behaviour", then what you did is perfectly fine. > The intent in adding this test is to ensure that if/when someone > changes this behavior down the line, they be forced to understand the > implications. That, too. Thanks.

On the Git mailing list, Tao Klerks wrote (reply to this):

On Wed, Mar 30, 2022 at 6:39 PM Junio C Hamano <gitster@pobox.com> wrote: > > Tao Klerks <tao@klerks.biz> writes: > > > Sorry, I'm not completely sure whether my comment was misleading, or > > whether I'm misunderstanding your feedback. > > > > The test added here does not test "desirable" behavior from an > > end-user functional perspective, but it does test behavior that is > > working "as-designed" as of many years ago. > > Is it "as-designed", or just "left buggy"? > > I somehow had an impression that you meant the latter, and it would > be our aspirational goal to eventually fix it. And for such case, > it would be better to write the test body to show what the command > should do, which would make the test fail with today's Git, and mark > it as test_expect_failure (which is not an ideal mechanism to prepare > for a future fix, but that is what we have now and should use, until > a better alternative being worked on is finished). > > But if it is "as-designed, some users may find it suboptimal or > confusing or with any other negative adjectives, but it is too late > to change now and more importantly it is unthinkable to change it > because existing tools and users do depend on the current behaviour", > then what you did is perfectly fine. > Heh, the answer is in-between: there certainly is an aspiration that someone figure out how to have the single untracked cache structure safely support both -uall and -unormal modes with the same data... but it's not clear that this is possible, and it's certainly not planned. I don't feel comfortable saying "we should fix this", and that's not the intent of this test - its intent is to say "this is how it is (currently) designed to work. If you accidentally change it, be sure that you are changing the design, not just accidentally changing behavior and potentially/probably introducing a bug". I'll try with this updated comment, using the term "current design" to make the distinction: # Bypassing the untracked cache here is not desirable from an # end-user perspective, but is expected in the current design. # The untracked cache data stored for a -unormal run cannot be # correctly used in a -uall run - it would yield incorrect output.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Support untracked cache with '--untracked-files=all' if configured #985

Uh oh!

Diff view

Diff view

There are no files selected for viewing

gitgitgadget bot Mar 29, 2022

Uh oh!

gitgitgadget bot Mar 30, 2022

Uh oh!

gitgitgadget bot Mar 29, 2022

Uh oh!

gitgitgadget bot Mar 30, 2022

Uh oh!

gitgitgadget bot Mar 30, 2022

Uh oh!

gitgitgadget bot Mar 31, 2022

Uh oh!

-Original file line number
+Diff line change
@@ Expand Up @@
     	strbuf_addch(&uc->ident, 0);
     }
-    static void new_untracked_cache(struct index_state *istate)
+    static unsigned new_untracked_cache_flags(struct index_state *istate)
+    {
+    	struct repository *repo = istate->repo;
+    	char *val;
+    	/*
+    	 * This logic is coordinated with the setting of these flags in
+    	 * wt-status.c#wt_status_collect_untracked(), and the evaluation
+    	 * of the config setting in commit.c#git_status_config()
+    	 */
+    	if (!repo_config_get_string(repo, "status.showuntrackedfiles", &val) &&
+    	    !strcmp(val, "all"))
+    		return 0;
+    	/*
+    	 * The default, if "all" is not set, is "normal" - leading us here.
+    	 * If the value is "none" then it really doesn't matter.
+    	 */
+    	return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+    }
+    static void new_untracked_cache(struct index_state *istate, int flags)
     {
     	struct untracked_cache *uc = xcalloc(1, sizeof(*uc));
     	strbuf_init(&uc->ident, 100);
     	uc->exclude_per_dir = ".gitignore";
-    	/* should be the same flags used by git-status */
-    	uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+    	uc->dir_flags = flags >= 0 ? flags : new_untracked_cache_flags(istate);
     	set_untracked_ident(uc);
     	istate->untracked = uc;
     	istate->cache_changed |= UNTRACKED_CHANGED;
@@ Expand All @@
     void add_untracked_cache(struct index_state *istate)
     {
     	if (!istate->untracked) {
-    		new_untracked_cache(istate);
+    		new_untracked_cache(istate, -1);
     	} else {
     		if (!ident_in_untracked(istate->untracked)) {
     			free_untracked_cache(istate->untracked);
-    			new_untracked_cache(istate);
+    			new_untracked_cache(istate, -1);
     		}
     	}
     }
@@ Expand Down Expand Up @@
     	if (base_len || (pathspec && pathspec->nr))
     		return NULL;
-    	/* Different set of flags may produce different results */
-    	if (dir->flags != dir->untracked->dir_flags ||
-    	    /*
-    	     * See treat_directory(), case index_nonexistent. Without
-    	     * this flag, we may need to also cache .git file content
-    	     * for the resolve_gitlink_ref() call, which we don't.
-    	     */
-    	    !(dir->flags & DIR_SHOW_OTHER_DIRECTORIES) ||
-    	    /* We don't support collecting ignore files */
-    	    (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
-    			   DIR_COLLECT_IGNORED)))
+    	/* We don't support collecting ignore files */
+    	if (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
+    			DIR_COLLECT_IGNORED))
     		return NULL;
     	/*
@@ Expand All @@
     		return NULL;
     	}
+    	/*
+    	 * If the untracked structure we received does not have the same flags
+    	 * as requested in this run, we're going to need to either discard the
+    	 * existing structure (and potentially later recreate), or bypass the
+    	 * untracked cache mechanism for this run.
+    	 */
+    	if (dir->flags != dir->untracked->dir_flags) {
+    		/*
+    		 * If the untracked structure we received does not have the same flags
+    		 * as configured, then we need to reset / create a new "untracked"
+    		 * structure to match the new config.
+    		 *
+    		 * Keeping the saved and used untracked cache consistent with the
+    		 * configuration provides an opportunity for frequent users of
+    		 * "git status -uall" to leverage the untracked cache by aligning their
+    		 * configuration - setting "status.showuntrackedfiles" to "all" or
+    		 * "normal" as appropriate.
+    		 *
+    		 * Previously using -uall (or setting "status.showuntrackedfiles" to
+    		 * "all") was incompatible with untracked cache and *consistently*
+    		 * caused surprisingly bad performance (with fscache and fsmonitor
+    		 * enabled) on Windows.
+    		 *
+    		 * IMPROVEMENT OPPORTUNITY: If we reworked the untracked cache storage
+    		 * to not be as bound up with the desired output in a given run,
+    		 * and instead iterated through and stored enough information to
+    		 * correctly serve both "modes", then users could get peak performance
+    		 * with or without '-uall' regardless of their
+    		 * "status.showuntrackedfiles" config.
+    		 */
+    		if (dir->untracked->dir_flags != new_untracked_cache_flags(istate)) {
+    			free_untracked_cache(istate->untracked);
+    			new_untracked_cache(istate, dir->flags);
+    			dir->untracked = istate->untracked;
+    		}
+    		else {
+    			/*
+    			 * Current untracked cache data is consistent with config, but not
+    			 * usable in this request/run; just bypass untracked cache.
+    			 */
+    			return NULL;
+    		}
+    	}
     	if (!dir->untracked->root) {
     		/* Untracked cache existed but is not initialized; fix that */
     		FLEX_ALLOC_STR(dir->untracked->root, name, "");
@@ Expand Down @@

-Original file line number
+Diff line change
@@ Expand Up @@
     	test_cmp ../dump.expect ../actual
     '
+    cat >../status_uall.expect <<EOF &&
+    A  done/one
+    A  one
+    A  two
+    ?? dthree/three
+    ?? dtwo/two
+    ?? three
+    EOF
+    # Bypassing the untracked cache here is not desirable from an
+    # end-user perspective, but is expected in the current design.
+    # The untracked cache data stored for a -unormal run cannot be
+    # correctly used in a -uall run - it would yield incorrect output.
+    test_expect_success 'untracked cache is bypassed with -uall' '
+    	: >../trace.output &&
+    	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
+    	git status -uall --porcelain >../actual &&
+    	iuc status -uall --porcelain >../status.iuc &&
+    	test_cmp ../status_uall.expect ../status.iuc &&
+    	test_cmp ../status_uall.expect ../actual &&
+    	get_relevant_traces ../trace.output ../trace.relevant &&
+    	cat >../trace.expect <<EOF &&
+     ....path:
+    EOF
+    	test_cmp ../trace.expect ../trace.relevant
+    '
+    test_expect_success 'untracked cache remains after bypass' '
+    	test-tool dump-untracked-cache >../actual &&
+    	test_cmp ../dump.expect ../actual
+    '
+    test_expect_success 'if -uall is configured, untracked cache gets populated by default' '
+    	test_config status.showuntrackedfiles all &&
+    	: >../trace.output &&
+    	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
+    	git status --porcelain >../actual &&
+    	iuc status --porcelain >../status.iuc &&
+    	test_cmp ../status_uall.expect ../status.iuc &&
+    	test_cmp ../status_uall.expect ../actual &&
+    	get_relevant_traces ../trace.output ../trace.relevant &&
+    	cat >../trace.expect <<EOF &&
+     ....path:
+     ....node-creation:3
+     ....gitignore-invalidation:1
+     ....directory-invalidation:0
+     ....opendir:4
+    EOF
+    	test_cmp ../trace.expect ../trace.relevant
+    '
+    cat >../dump_uall.expect <<EOF &&
+    info/exclude $EMPTY_BLOB
+    core.excludesfile $ZERO_OID
+    exclude_per_dir .gitignore
+    flags 00000000
+    / $ZERO_OID recurse valid
+    three
+    /done/ $ZERO_OID recurse valid
+    /dthree/ $ZERO_OID recurse valid
+    three
+    /dtwo/ $ZERO_OID recurse valid
+    two
+    EOF
+    test_expect_success 'if -uall was configured, untracked cache is populated' '
+    	test-tool dump-untracked-cache >../actual &&
+    	test_cmp ../dump_uall.expect ../actual
+    '
+    test_expect_success 'if -uall is configured, untracked cache is used by default' '
+    	test_config status.showuntrackedfiles all &&
+    	: >../trace.output &&
+    	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
+    	git status --porcelain >../actual &&
+    	iuc status --porcelain >../status.iuc &&
+    	test_cmp ../status_uall.expect ../status.iuc &&
+    	test_cmp ../status_uall.expect ../actual &&
+    	get_relevant_traces ../trace.output ../trace.relevant &&
+    	cat >../trace.expect <<EOF &&
+     ....path:
+     ....node-creation:0
+     ....gitignore-invalidation:0
+     ....directory-invalidation:0
+     ....opendir:0
+    EOF
+    	test_cmp ../trace.expect ../trace.relevant
+    '
+    # Bypassing the untracked cache here is not desirable from an
+    # end-user perspective, but is expected in the current design.
+    # The untracked cache data stored for a -all run cannot be
+    # correctly used in a -unormal run - it would yield incorrect
+    # output.
+    test_expect_success 'if -uall is configured, untracked cache is bypassed with -unormal' '
+    	test_config status.showuntrackedfiles all &&
+    	: >../trace.output &&
+    	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
+    	git status -unormal --porcelain >../actual &&
+    	iuc status -unormal --porcelain >../status.iuc &&
+    	test_cmp ../status.expect ../status.iuc &&
+    	test_cmp ../status.expect ../actual &&
+    	get_relevant_traces ../trace.output ../trace.relevant &&
+    	cat >../trace.expect <<EOF &&
+     ....path:
+    EOF
+    	test_cmp ../trace.expect ../trace.relevant
+    '
+    test_expect_success 'repopulate untracked cache for -unormal' '
+    	git status --porcelain
+    '
     test_expect_success 'modify in root directory, one dir invalidation' '
     	: >four &&
     	test-tool chmtime =-240 four &&
@@ Expand Down @@

Support untracked cache with '--untracked-files=all' if configured #985

Uh oh!

Support untracked cache with '--untracked-files=all' if configured #985

Uh oh!

Uh oh!

Diff view

Diff view

There are no files selected for viewing

gitgitgadget bot Mar 29, 2022

Choose a reason for hiding this comment

Uh oh!

gitgitgadget bot Mar 30, 2022

Choose a reason for hiding this comment

Uh oh!

gitgitgadget bot Mar 29, 2022

Choose a reason for hiding this comment

Uh oh!

gitgitgadget bot Mar 30, 2022

Choose a reason for hiding this comment

Uh oh!

gitgitgadget bot Mar 30, 2022

Choose a reason for hiding this comment

Uh oh!

gitgitgadget bot Mar 31, 2022

Choose a reason for hiding this comment

Uh oh!