Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
88 changes: 72 additions & 16 deletions dir.c
Original file line number Diff line number Diff line change
Expand Up @@ -2747,13 +2747,33 @@ static void set_untracked_ident(struct untracked_cache *uc)
strbuf_addch(&uc->ident, 0);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Tao Klerks via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Tao Klerks <tao@klerks.biz>
>
> Untracked cache was originally designed to only work with
> '--untracked-files=normal', but this causes performance issues for UI
> tooling that wants to see "all" on a frequent basis. On the other hand,
> the conditions that prevented applicability to the "all" mode no
> longer seem to apply.

There is a slight leap in logic flow before ", but this causes" that
forces readers read the above twice and guess what is missing.  I am
guessing that

    ... was designed to only work with "--untracked-files=normal",
    and is bypassed when "--untracked-files=all" request, but this
    causes ...

is what you meant to say.

The claim in the last sentence "... no longer seem to apply" is
thrown at the readers without much rationale.  Either justify it
more solidly or discard the claim?

> The disqualification of untracked cache has a particularly significant
> impact on Windows with the defaulted fscache, where the introduction
> of fsmonitor can make the first and costly directory-iteration happen
> in untracked file detection, single-threaded, rather than in
> preload-index on multiple threads. Improving the performance of a
> "normal" 'git status' run with fsmonitor can make
> 'git status --untracked-files=all' perform much worse.

The last sentence is a bit hard to parse and very hard to reason
about.  Do you mean to say "--untracked-files=all is slower when the
untracked cache is in use, so the performance of 'git status' may be
improved for '--untracked-files=normal' when the untracked cache is
used, it hurts when 'git status --untracked-files=all' is run"?

> To partially address this, align the supported directory flags for the
> stored untracked cache data with the git config. If a user specifies
> an '--untracked-files=' commandline parameter that does not align with
> their 'status.showuntrackedfiles' config value, then the untracked
> cache will be ignored - as it is for other unsupported situations like
> when a pathspec is specified.
>
> If the previously stored flags no longer match the current
> configuration, but the currently-applicable flags do match the current
> configuration, then discard the previously stored untracked cache
> data.

Let me follow what actually happens in the patch aloud.

> -static void new_untracked_cache(struct index_state *istate)
> +static unsigned new_untracked_cache_flags(struct index_state *istate)
> +{
> +	struct repository *repo = istate->repo;
> +	char *val;
> +
> +	/*
> +	 * This logic is coordinated with the setting of these flags in
> +	 * wt-status.c#wt_status_collect_untracked(), and the evaluation
> +	 * of the config setting in commit.c#git_status_config()
> +	 */
> +	if (!repo_config_get_string(repo, "status.showuntrackedfiles", &val) &&
> +	    !strcmp(val, "all"))
> +		return 0;
> +
> +	/*
> +	 * The default, if "all" is not set, is "normal" - leading us here.
> +	 * If the value is "none" then it really doesn't matter.
> +	 */
> +	return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
> +}

This _guesses_ the user preference from the configuration.

> +static void new_untracked_cache(struct index_state *istate, int flags)
>  {
>  	struct untracked_cache *uc = xcalloc(1, sizeof(*uc));
>  	strbuf_init(&uc->ident, 100);
>  	uc->exclude_per_dir = ".gitignore";
> -	/* should be the same flags used by git-status */
> -	uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
> +	uc->dir_flags = flags >= 0 ? flags : new_untracked_cache_flags(istate);

We use unsigned for the flag word unless there is a reason not to,
but this function wants to use top-bit as a signal to "guess from
config".  The caller either dictates what bits to set, or we guess.

And the created uc records the flags.

> @@ -2762,11 +2782,11 @@ static void new_untracked_cache(struct index_state *istate)
>  void add_untracked_cache(struct index_state *istate)
>  {
>  	if (!istate->untracked) {
> -		new_untracked_cache(istate);
> +		new_untracked_cache(istate, -1);

We are newly creating, so "-1" (guess from config) may be the most
appropriate (unless the caller of add_untracked_cache() already
knows what operation it is using for, that is).

>  	} else {
>  		if (!ident_in_untracked(istate->untracked)) {

Found an existing one but need to recreate.

>  			free_untracked_cache(istate->untracked);
> -			new_untracked_cache(istate);
> +			new_untracked_cache(istate, -1);

In this case, is it likely to give us a better guess to read the
configuration, or copy from the existing untracked file?  My gut
feeling says it would be the latter, and if that is correct, then
we might want

+	      		int old_flags = istate->untracked->dir_flags;
 			free_untracked_cache(istate->untracked);
-			new_untracked_cache(istate);
+			new_untracked_cache(istate, old_flags);

instead?  I dunno.

> @@ -2781,9 +2801,9 @@ void remove_untracked_cache(struct index_state *istate)
>  }
>  
>  static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
> -						      int base_len,
> -						      const struct pathspec *pathspec,
> -						      struct index_state *istate)
> +							    int base_len,
> +							    const struct pathspec *pathspec,
> +							    struct index_state *istate)
>  {
>  	struct untracked_cache_dir *root;
>  	static int untracked_cache_disabled = -1;

Is this just re-indenting?  Not very welcome, but OK.

> @@ -2814,17 +2834,9 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
>  	if (base_len || (pathspec && pathspec->nr))
>  		return NULL;
>  
> -	/* Different set of flags may produce different results */
> -	if (dir->flags != dir->untracked->dir_flags ||
> -	    /*
> -	     * See treat_directory(), case index_nonexistent. Without
> -	     * this flag, we may need to also cache .git file content
> -	     * for the resolve_gitlink_ref() call, which we don't.
> -	     */
> -	    !(dir->flags & DIR_SHOW_OTHER_DIRECTORIES) ||
> -	    /* We don't support collecting ignore files */
> -	    (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
> -			   DIR_COLLECT_IGNORED)))
> +	/* We don't support collecting ignore files */
> +	if (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
> +			DIR_COLLECT_IGNORED))
>  		return NULL;
>  
>  	/*
> @@ -2847,6 +2859,41 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
>  		return NULL;
>  	}
>  
> +	/*
> +	 * We don't support using or preparing the untracked cache if
> +	 * the current effective flags don't match the configured
> +	 * flags.
> +	 */

Is that what we want?  What happens when your user does this:

    - set showuntrackedfiles to "normal"
    - generate the untracked cache
    - set showuntrackedfiles to "all"

And now the user wants to make a request that wants to see flags
that are suitable for "normal".

The best case would be for the untracked cache to know what flags
were in use when it was generated, so that in the above sequence,
even though the current value of configuration variable and the
current request from the command line contradict each other, we
can notice that the untracked cache data is suitable for the current
request and the right thing happens.

> +	if (dir->flags != new_untracked_cache_flags(istate))
> +		return NULL;

But this only pays attention to what is in the configuration?  It
seems to me that it is being too pessimistic, but perhaps it is
necessary for correctness somehow?

Thanks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Tao Klerks wrote (reply to this):

 the

On Tue, Mar 29, 2022 at 7:43 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Tao Klerks via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Tao Klerks <tao@klerks.biz>
> >
> > Untracked cache was originally designed to only work with
> > '--untracked-files=normal', but this causes performance issues for UI
> > tooling that wants to see "all" on a frequent basis. On the other hand,
> > the conditions that prevented applicability to the "all" mode no
> > longer seem to apply.
>
> There is a slight leap in logic flow before ", but this causes" that
> forces readers read the above twice and guess what is missing.  I am
> guessing that
>
>     ... was designed to only work with "--untracked-files=normal",
>     and is bypassed when "--untracked-files=all" request, but this
>     causes ...
>
> is what you meant to say.

Yep, will fix.

>
> The claim in the last sentence "... no longer seem to apply" is
> thrown at the readers without much rationale.  Either justify it
> more solidly or discard the claim?
>

I can add references to previous conversation threads around the
topic, and add the word "experimentally".

> > The disqualification of untracked cache has a particularly significant
> > impact on Windows with the defaulted fscache, where the introduction
> > of fsmonitor can make the first and costly directory-iteration happen
> > in untracked file detection, single-threaded, rather than in
> > preload-index on multiple threads. Improving the performance of a
> > "normal" 'git status' run with fsmonitor can make
> > 'git status --untracked-files=all' perform much worse.
>
> The last sentence is a bit hard to parse and very hard to reason
> about.  Do you mean to say "--untracked-files=all is slower when the
> untracked cache is in use, so the performance of 'git status' may be
> improved for '--untracked-files=normal' when the untracked cache is
> used, it hurts when 'git status --untracked-files=all' is run"?

Something like that. The case where untracked cache bypassing (or
disabling) has a huge impact is on windows, and involves fscache and
fsmonitor:
* Untracked cache saves some work under qualifying conditions
* If qualifying conditions are not met (eg when -uall), we do the work
anyway as though untracked cache were disabled.
* fscache optimises directory-iteration on windows, with an N-thread
preload strategy. This is enabled by default, and improves windows
"git status" performance transparently from "abominably slow" to just
"slow".
* fsmonitor, when combined with untracked cache, eliminates recursive
directory-iteration in favor of using a persistent in-memory model of
what's changed on disk (to compare to the index and untracked cache)
* When fsmonitor is enabled and used, the fscache preload strategy is
(obviously) disabled
* When fsmonitor is enabled and untracked cache is disabled or
bypassed, the otherwise-ubiquitous "fscache preload" does not happen,
and the untracked file search (directory iteration) takes... forever.

In other words: In windows, where fscache behavior is critical for
large repos and defaulted-on, when fsmonitor is also enabled which
generally further improves things but disqualifies fscache preload,
the bypassing of untracked cache causes a huge performance impact. A
"-uall" run can take a minute, where a "-unormal" run takes only a
second, because the enumeration of untracked files without using the
untracked cache data means a full directory iteration, not optimized
by fscache. On the other hand, removing fsmonitor and untracked cache
altogether, and leaving just the fscache optimization to do its job
consistently, results in consistent middle-of-the-road performance in
the order of 10 seconds in this example.

Enabling fsmonitor in this situation causes -uall to perform
pathologically badly, because of untracked cache bypassing
(interacting with the fscache-preload-bypassing of fsmonitor).
However, the benefits of dropping status time from 10 seconds to 1
second make fsmonitor very attractive. On the other hand, the
pathological performance with -uall is a problem if you use GUIs that
*require* -uall. Therefore, fixing untracked cache to be compatible
with -uall is critical to large repos on windows where the ideal
experience would be fscache working, fsmonitor working, untracked
cache working, and GUIs able to use -uall without the user waiting a
minute or more and sometimes timing out.

This problem sounds like an edge-case when I describe the chain of
interactions, but is actually very common among large-repo windows
users. Windows users are typically GUI users. GUIs typically use
-uall, as git's default "hide contents of untracked folders" behavior
is obviously limiting.

This patch doesn't actually change anything I've described by default
- but it makes it *possible* to get good/consistent performance with
-uall (with fscache, with fsmonitor, with untracked cache), *if* you
configure status.showuntrackedfiles=all.

Now, coming back to the paragraph in question; maybe, instead of
trying to summarize this windows-fscache-fsmonitor-untrackedcache
interaction, I should be a little more vague. Would this work?

--
When 'git status' runs without using the untracked cache, on a large
repo, on windows, with fsmonitor, it can run very slowly. This can
make GUIs that need to use "-uall" (and therefore currently bypass
untracked cache) unusable when fsmonitor is enabled, on such large
repos.
--


> Let me follow what actually happens in the patch aloud.
>
[...]
> This _guesses_ the user preference from the configuration.
>

Yes - the proposal, in this patch, is that the untracked cache
usability/compatibility be aligned with the configuration. We are
adding an extra meaning to the "status.showuntrackedfiles"
configuration, such that it not only indicates what you want to have
happen when you don't specify "-u" to git status, but it *also*
indicates what kind of status output the untracked cache should be
targeting/supporting.

> > +static void new_untracked_cache(struct index_state *istate, int flags)
> >  {
> >       struct untracked_cache *uc = xcalloc(1, sizeof(*uc));
> >       strbuf_init(&uc->ident, 100);
> >       uc->exclude_per_dir = ".gitignore";
> > -     /* should be the same flags used by git-status */
> > -     uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
> > +     uc->dir_flags = flags >= 0 ? flags : new_untracked_cache_flags(istate);
>
> We use unsigned for the flag word unless there is a reason not to,
> but this function wants to use top-bit as a signal to "guess from
> config".  The caller either dictates what bits to set, or we guess.
>
> And the created uc records the flags.
>

Yep - although the word "guess" may be slightly misleading here. The
proposed semantics are such that if the existing untracked cache flags
are inconsistent with the config, we *discard* the
no-longer-consistent-with-config untracked cache; so it's not so much
a guess as a mandate.

> > @@ -2762,11 +2782,11 @@ static void new_untracked_cache(struct index_state *istate)
> >  void add_untracked_cache(struct index_state *istate)
> >  {
> >       if (!istate->untracked) {
> > -             new_untracked_cache(istate);
> > +             new_untracked_cache(istate, -1);
>
> We are newly creating, so "-1" (guess from config) may be the most
> appropriate (unless the caller of add_untracked_cache() already
> knows what operation it is using for, that is).
>
> >       } else {
> >               if (!ident_in_untracked(istate->untracked)) {
>
> Found an existing one but need to recreate.
>
> >                       free_untracked_cache(istate->untracked);
> > -                     new_untracked_cache(istate);
> > +                     new_untracked_cache(istate, -1);
>
> In this case, is it likely to give us a better guess to read the
> configuration, or copy from the existing untracked file?  My gut
> feeling says it would be the latter, and if that is correct, then
> we might want
>
> +                       int old_flags = istate->untracked->dir_flags;
>                         free_untracked_cache(istate->untracked);
> -                       new_untracked_cache(istate);
> +                       new_untracked_cache(istate, old_flags);
>
> instead?  I dunno.
>

As I noted above, we later consider that an untracked cache with flags
that are inconsistent with the current config is an invalid untracked
cache, and discard it; so setting the flags based on config is the
right thing (the only useful thing) to do.


> > @@ -2781,9 +2801,9 @@ void remove_untracked_cache(struct index_state *istate)
> >  }
> >
> >  static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
> > -                                                   int base_len,
> > -                                                   const struct pathspec *pathspec,
> > -                                                   struct index_state *istate)
> > +                                                         int base_len,
> > +                                                         const struct pathspec *pathspec,
> > +                                                         struct index_state *istate)
> >  {
> >       struct untracked_cache_dir *root;
> >       static int untracked_cache_disabled = -1;
>
> Is this just re-indenting?  Not very welcome, but OK.
>

Will fix; I think there was a previous version in which these lines
actually changed.

> > @@ -2814,17 +2834,9 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
> >       if (base_len || (pathspec && pathspec->nr))
> >               return NULL;
> >
> > -     /* Different set of flags may produce different results */
> > -     if (dir->flags != dir->untracked->dir_flags ||
> > -         /*
> > -          * See treat_directory(), case index_nonexistent. Without
> > -          * this flag, we may need to also cache .git file content
> > -          * for the resolve_gitlink_ref() call, which we don't.
> > -          */
> > -         !(dir->flags & DIR_SHOW_OTHER_DIRECTORIES) ||
> > -         /* We don't support collecting ignore files */
> > -         (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
> > -                        DIR_COLLECT_IGNORED)))
> > +     /* We don't support collecting ignore files */
> > +     if (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
> > +                     DIR_COLLECT_IGNORED))
> >               return NULL;
> >
> >       /*
> > @@ -2847,6 +2859,41 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
> >               return NULL;
> >       }
> >
> > +     /*
> > +      * We don't support using or preparing the untracked cache if
> > +      * the current effective flags don't match the configured
> > +      * flags.
> > +      */
>
> Is that what we want?  What happens when your user does this:
>
>     - set showuntrackedfiles to "normal"
>     - generate the untracked cache
>     - set showuntrackedfiles to "all"
>
> And now the user wants to make a request that wants to see flags
> that are suitable for "normal".
>
> The best case would be for the untracked cache to know what flags
> were in use when it was generated, so that in the above sequence,
> even though the current value of configuration variable and the
> current request from the command line contradict each other, we
> can notice that the untracked cache data is suitable for the current
> request and the right thing happens.
>
> > +     if (dir->flags != new_untracked_cache_flags(istate))
> > +             return NULL;
>
> But this only pays attention to what is in the configuration?  It
> seems to me that it is being too pessimistic, but perhaps it is
> necessary for correctness somehow?

The logic in the current patch is:
* If the configuration doesn't match the request, ignore untracked
cache completely (exit the UC logic)
* If the configuration doesn't match the current untracked cache data,
discard the existing untracked cache data (and later recreate it if
everything else works out)

I think your proposal is something like:
* If the current untracked cache data doesn't match the request
** If the current untracked cache data doesn't match the
configuration, then discard the existing untracked cache (and later
recreate it if everything else works out)
** Otherwise, ignore untracked cache completely (exit the UC logic)
* (otherwise, if current untracked cache data *does* match the
request, assume it's good enough to keep using for now - this is the
new behavior)

I find the latter slightly harder to understand and explain, but I'm
reasonably certain it addresses the case you identified above without
compromising correctness in any other cases - it can only have a
positive impact on performance. I'll have a go at making that change
today.

}

static void new_untracked_cache(struct index_state *istate)
static unsigned new_untracked_cache_flags(struct index_state *istate)
{
struct repository *repo = istate->repo;
char *val;

/*
* This logic is coordinated with the setting of these flags in
* wt-status.c#wt_status_collect_untracked(), and the evaluation
* of the config setting in commit.c#git_status_config()
*/
if (!repo_config_get_string(repo, "status.showuntrackedfiles", &val) &&
!strcmp(val, "all"))
return 0;

/*
* The default, if "all" is not set, is "normal" - leading us here.
* If the value is "none" then it really doesn't matter.
*/
return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
}

static void new_untracked_cache(struct index_state *istate, int flags)
{
struct untracked_cache *uc = xcalloc(1, sizeof(*uc));
strbuf_init(&uc->ident, 100);
uc->exclude_per_dir = ".gitignore";
/* should be the same flags used by git-status */
uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
uc->dir_flags = flags >= 0 ? flags : new_untracked_cache_flags(istate);
set_untracked_ident(uc);
istate->untracked = uc;
istate->cache_changed |= UNTRACKED_CHANGED;
Expand All @@ -2762,11 +2782,11 @@ static void new_untracked_cache(struct index_state *istate)
void add_untracked_cache(struct index_state *istate)
{
if (!istate->untracked) {
new_untracked_cache(istate);
new_untracked_cache(istate, -1);
} else {
if (!ident_in_untracked(istate->untracked)) {
free_untracked_cache(istate->untracked);
new_untracked_cache(istate);
new_untracked_cache(istate, -1);
}
}
}
Expand Down Expand Up @@ -2814,17 +2834,9 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
if (base_len || (pathspec && pathspec->nr))
return NULL;

/* Different set of flags may produce different results */
if (dir->flags != dir->untracked->dir_flags ||
/*
* See treat_directory(), case index_nonexistent. Without
* this flag, we may need to also cache .git file content
* for the resolve_gitlink_ref() call, which we don't.
*/
!(dir->flags & DIR_SHOW_OTHER_DIRECTORIES) ||
/* We don't support collecting ignore files */
(dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
DIR_COLLECT_IGNORED)))
/* We don't support collecting ignore files */
if (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
DIR_COLLECT_IGNORED))
return NULL;

/*
Expand All @@ -2847,6 +2859,50 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
return NULL;
}

/*
* If the untracked structure we received does not have the same flags
* as requested in this run, we're going to need to either discard the
* existing structure (and potentially later recreate), or bypass the
* untracked cache mechanism for this run.
*/
if (dir->flags != dir->untracked->dir_flags) {
/*
* If the untracked structure we received does not have the same flags
* as configured, then we need to reset / create a new "untracked"
* structure to match the new config.
*
* Keeping the saved and used untracked cache consistent with the
* configuration provides an opportunity for frequent users of
* "git status -uall" to leverage the untracked cache by aligning their
* configuration - setting "status.showuntrackedfiles" to "all" or
* "normal" as appropriate.
*
* Previously using -uall (or setting "status.showuntrackedfiles" to
* "all") was incompatible with untracked cache and *consistently*
* caused surprisingly bad performance (with fscache and fsmonitor
* enabled) on Windows.
*
* IMPROVEMENT OPPORTUNITY: If we reworked the untracked cache storage
* to not be as bound up with the desired output in a given run,
* and instead iterated through and stored enough information to
* correctly serve both "modes", then users could get peak performance
* with or without '-uall' regardless of their
* "status.showuntrackedfiles" config.
*/
if (dir->untracked->dir_flags != new_untracked_cache_flags(istate)) {
free_untracked_cache(istate->untracked);
new_untracked_cache(istate, dir->flags);
dir->untracked = istate->untracked;
}
else {
/*
* Current untracked cache data is consistent with config, but not
* usable in this request/run; just bypass untracked cache.
*/
return NULL;
}
}

if (!dir->untracked->root) {
/* Untracked cache existed but is not initialized; fix that */
FLEX_ALLOC_STR(dir->untracked->root, name, "");
Expand Down
113 changes: 113 additions & 0 deletions t/t7063-status-untracked-cache.sh
Original file line number Diff line number Diff line change
Expand Up @@ -190,6 +190,119 @@ test_expect_success 'untracked cache after second status' '
test_cmp ../dump.expect ../actual
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Tao Klerks via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +# Bypassing the untracked cache here is not desirable, but it expected
> +# in the current implementation

If that is the case, it is much more desirable to squash it into a
single [2/2] patch so that the desirable working is documented (so
future breakage can be caught), the reviewers can read what the
intended behaviour is more easily (so we do not have to be confused
by this one saying "expect success"), make it easier to cherry-pick
the fix and test in the same patch elsewhere, and the existing
breakage can easily be caught by applying only the test part of the
patch.

Thanks.

> +test_expect_success 'untracked cache is bypassed with -uall' '
> +	: >../trace.output &&
> +	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
> +	git status -uall --porcelain >../actual &&
> +	iuc status -uall --porcelain >../status.iuc &&
> +	test_cmp ../status_uall.expect ../status.iuc &&
> +	test_cmp ../status_uall.expect ../actual &&
> +	get_relevant_traces ../trace.output ../trace.relevant &&
> +	cat >../trace.expect <<EOF &&
> + ....path:
> +EOF
> +	test_cmp ../trace.expect ../trace.relevant
> +'
> +
> +test_expect_success 'untracked cache remains after bypass' '
> +	test-tool dump-untracked-cache >../actual &&
> +	test_cmp ../dump.expect ../actual
> +'
> +
>  test_expect_success 'modify in root directory, one dir invalidation' '
>  	: >four &&
>  	test-tool chmtime =-240 four &&

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Tao Klerks wrote (reply to this):

On Tue, Mar 29, 2022 at 6:51 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Tao Klerks via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > +# Bypassing the untracked cache here is not desirable, but it expected
> > +# in the current implementation
>
> If that is the case, it is much more desirable to squash it into a
> single [2/2] patch so that the desirable working is documented (so
> future breakage can be caught), the reviewers can read what the
> intended behaviour is more easily (so we do not have to be confused
> by this one saying "expect success"), make it easier to cherry-pick
> the fix and test in the same patch elsewhere, and the existing
> breakage can easily be caught by applying only the test part of the
> patch.

Sorry, I'm not completely sure whether my comment was misleading, or
whether I'm misunderstanding your feedback.

The test added here does not test "desirable" behavior from an
end-user functional perspective, but it does test behavior that is
working "as-designed" as of many years ago.

The intent in adding this test is to ensure that if/when someone
changes this behavior down the line, they be forced to understand the
implications (eg: the current untracked cache structure does not store
the same data for a -uall run as for a -unormal run, and so using the
data from one in another would lead to output correctness errors).

Importantly, the behavior that this test is exercising *is not
changed* by the 2/2 "improvement" patch; this is why I separated it
out - it's a change that I feel we should make either way, and could
even be its own independent patch. It's in this series as it adds
relevant context and I am later adding similar tests for the
new/additional behavior.

Given the (I believe) misunderstanding, a better comment would be
something like:

# Bypassing the untracked cache here is not desirable from an
# end-user perspective, but is expected in the current implementation.
# The untracked cache data stored for a -unormal run cannot be
# correctly used in a -uall run - it would yield incorrect output.

Does that make more sense?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Tao Klerks <tao@klerks.biz> writes:

> Sorry, I'm not completely sure whether my comment was misleading, or
> whether I'm misunderstanding your feedback.
>
> The test added here does not test "desirable" behavior from an
> end-user functional perspective, but it does test behavior that is
> working "as-designed" as of many years ago.

Is it "as-designed", or just "left buggy"?

I somehow had an impression that you meant the latter, and it would
be our aspirational goal to eventually fix it.  And for such case,
it would be better to write the test body to show what the command 
should do, which would make the test fail with today's Git, and mark
it as test_expect_failure (which is not an ideal mechanism to prepare
for a future fix, but that is what we have now and should use, until
a better alternative being worked on is finished).

But if it is "as-designed, some users may find it suboptimal or
confusing or with any other negative adjectives, but it is too late
to change now and more importantly it is unthinkable to change it
because existing tools and users do depend on the current behaviour",
then what you did is perfectly fine.

> The intent in adding this test is to ensure that if/when someone
> changes this behavior down the line, they be forced to understand the
> implications.

That, too.

Thanks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Tao Klerks wrote (reply to this):

On Wed, Mar 30, 2022 at 6:39 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Tao Klerks <tao@klerks.biz> writes:
>
> > Sorry, I'm not completely sure whether my comment was misleading, or
> > whether I'm misunderstanding your feedback.
> >
> > The test added here does not test "desirable" behavior from an
> > end-user functional perspective, but it does test behavior that is
> > working "as-designed" as of many years ago.
>
> Is it "as-designed", or just "left buggy"?
>
> I somehow had an impression that you meant the latter, and it would
> be our aspirational goal to eventually fix it.  And for such case,
> it would be better to write the test body to show what the command
> should do, which would make the test fail with today's Git, and mark
> it as test_expect_failure (which is not an ideal mechanism to prepare
> for a future fix, but that is what we have now and should use, until
> a better alternative being worked on is finished).
>
> But if it is "as-designed, some users may find it suboptimal or
> confusing or with any other negative adjectives, but it is too late
> to change now and more importantly it is unthinkable to change it
> because existing tools and users do depend on the current behaviour",
> then what you did is perfectly fine.
>

Heh, the answer is in-between: there certainly is an aspiration that
someone figure out how to have the single untracked cache structure
safely support both -uall and -unormal modes with the same data... but
it's not clear that this is possible, and it's certainly not planned.

I don't feel comfortable saying "we should fix this", and that's not
the intent of this test - its intent is to say "this is how it is
(currently) designed to work. If you accidentally change it, be sure
that you are changing the design, not just accidentally changing
behavior and potentially/probably introducing a bug".

I'll try with this updated comment, using the term "current design" to
make the distinction:

# Bypassing the untracked cache here is not desirable from an
# end-user perspective, but is expected in the current design.
# The untracked cache data stored for a -unormal run cannot be
# correctly used in a -uall run - it would yield incorrect output.

'

cat >../status_uall.expect <<EOF &&
A done/one
A one
A two
?? dthree/three
?? dtwo/two
?? three
EOF

# Bypassing the untracked cache here is not desirable from an
# end-user perspective, but is expected in the current design.
# The untracked cache data stored for a -unormal run cannot be
# correctly used in a -uall run - it would yield incorrect output.
test_expect_success 'untracked cache is bypassed with -uall' '
: >../trace.output &&
GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
git status -uall --porcelain >../actual &&
iuc status -uall --porcelain >../status.iuc &&
test_cmp ../status_uall.expect ../status.iuc &&
test_cmp ../status_uall.expect ../actual &&
get_relevant_traces ../trace.output ../trace.relevant &&
cat >../trace.expect <<EOF &&
....path:
EOF
test_cmp ../trace.expect ../trace.relevant
'

test_expect_success 'untracked cache remains after bypass' '
test-tool dump-untracked-cache >../actual &&
test_cmp ../dump.expect ../actual
'

test_expect_success 'if -uall is configured, untracked cache gets populated by default' '
test_config status.showuntrackedfiles all &&
: >../trace.output &&
GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
git status --porcelain >../actual &&
iuc status --porcelain >../status.iuc &&
test_cmp ../status_uall.expect ../status.iuc &&
test_cmp ../status_uall.expect ../actual &&
get_relevant_traces ../trace.output ../trace.relevant &&
cat >../trace.expect <<EOF &&
....path:
....node-creation:3
....gitignore-invalidation:1
....directory-invalidation:0
....opendir:4
EOF
test_cmp ../trace.expect ../trace.relevant
'

cat >../dump_uall.expect <<EOF &&
info/exclude $EMPTY_BLOB
core.excludesfile $ZERO_OID
exclude_per_dir .gitignore
flags 00000000
/ $ZERO_OID recurse valid
three
/done/ $ZERO_OID recurse valid
/dthree/ $ZERO_OID recurse valid
three
/dtwo/ $ZERO_OID recurse valid
two
EOF

test_expect_success 'if -uall was configured, untracked cache is populated' '
test-tool dump-untracked-cache >../actual &&
test_cmp ../dump_uall.expect ../actual
'

test_expect_success 'if -uall is configured, untracked cache is used by default' '
test_config status.showuntrackedfiles all &&
: >../trace.output &&
GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
git status --porcelain >../actual &&
iuc status --porcelain >../status.iuc &&
test_cmp ../status_uall.expect ../status.iuc &&
test_cmp ../status_uall.expect ../actual &&
get_relevant_traces ../trace.output ../trace.relevant &&
cat >../trace.expect <<EOF &&
....path:
....node-creation:0
....gitignore-invalidation:0
....directory-invalidation:0
....opendir:0
EOF
test_cmp ../trace.expect ../trace.relevant
'

# Bypassing the untracked cache here is not desirable from an
# end-user perspective, but is expected in the current design.
# The untracked cache data stored for a -all run cannot be
# correctly used in a -unormal run - it would yield incorrect
# output.
test_expect_success 'if -uall is configured, untracked cache is bypassed with -unormal' '
test_config status.showuntrackedfiles all &&
: >../trace.output &&
GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
git status -unormal --porcelain >../actual &&
iuc status -unormal --porcelain >../status.iuc &&
test_cmp ../status.expect ../status.iuc &&
test_cmp ../status.expect ../actual &&
get_relevant_traces ../trace.output ../trace.relevant &&
cat >../trace.expect <<EOF &&
....path:
EOF
test_cmp ../trace.expect ../trace.relevant
'

test_expect_success 'repopulate untracked cache for -unormal' '
git status --porcelain
'

test_expect_success 'modify in root directory, one dir invalidation' '
: >four &&
test-tool chmtime =-240 four &&
Expand Down