Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Non-critical]: sparse index vs split index fixes #1119

Closed
wants to merge 3 commits into from

Conversation

dscho
Copy link
Member

@dscho dscho commented Jan 19, 2022

We noticed split/sparse index-related issues while rebasing Microsoft's fork of Git. These fixes are necessary for that fork's test suite to pass, but they might not be super critical to get into upstream v2.35.0 (especially this close to -rc2). However, the team felt that the decision should be left to the Git maintainer whether to take these patches into v2.35.0 or not.

cc: Elijah Newren newren@gmail.com

In 6e77352 (sparse-index: convert from full to sparse, 2021-03-30),
we introduced initial support for a sparse index, and were careful to
avoid converting to a sparse index in the presence of a split index.

However, when we _just_ read a freshly-initialized index, it might not
contain a split index even if _writing_ it will add one by virtue of
being asked for via the `GIT_TEST_SPLIT_INDEX` variable.

We did not notice any problems with checking _only_ for `split_index`
(and not `GIT_TEST_SPLIT_INDEX`) right until both
`vd/sparse-sparsity-fix-on-read` _and_ `vd/sparse-reset` were merged.

Those two topics' interplay triggers a bug in conjunction with running
t1091.15 when `GIT_TEST_SPLIT_INDEX=true` in the following way:
`vd/sparse-sparsity-fix-on-read` ensures that the index is made sparse
right after reading, and `vd/sparse-reset` ensures that the index is
made non-sparse again unless running in the `--soft` mode. Since the
split index feature is incompatible with the sparse index feature, we
see a symptom like this:

	fatal: position for replacement 4 exceeds base index size 4

Let's fix this by avoiding the conversion to a sparse index when
`GIT_TEST_SPLIT_INDEX=true`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In 61feddc (tests: disable GIT_TEST_SPLIT_INDEX for sparse index
tests, 2021-08-26), it was already called out that the split index
feature is incompatible with the sparse index feature, and its commit
message wondered aloud whether more checks would be required to ensure
that the split index and sparse index features aren't enabled at the
same time.

We are about to introduce such additional checks, and indeed, t1091
would utterly fail with them. Therefore, let's preemptively disable the
split index for the entirety of t1091.

This partially reverts above-mentioned patch because it covered only one
test case whereas we want to cover the entire test script.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
... at least for now. So let's error out if we are even trying to
initialize the split index when the index is sparse, or when trying to
write the split index extension for a sparse index.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@dscho
Copy link
Member Author

dscho commented Jan 19, 2022

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 19, 2022

Submitted as pull.1119.git.1642613379.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-1119/dscho/sparse-index-vs-split-index-v1

To fetch this version to local tag pr-1119/dscho/sparse-index-vs-split-index-v1:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-1119/dscho/sparse-index-vs-split-index-v1

@dscho dscho self-assigned this Jan 19, 2022
@@ -3009,6 +3009,9 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
!is_null_oid(&istate->split_index->base_oid)) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> ... at least for now. So let's error out if we are even trying to
> initialize the split index when the index is sparse, or when trying to
> write the split index extension for a sparse index.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  read-cache.c  | 3 +++
>  split-index.c | 3 +++
>  2 files changed, 6 insertions(+)

Good.  Will queue.


> diff --git a/read-cache.c b/read-cache.c
> index cbe73f14e5e..a932e01fc7a 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -3009,6 +3009,9 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
>  	    !is_null_oid(&istate->split_index->base_oid)) {
>  		struct strbuf sb = STRBUF_INIT;
>  
> +		if (istate->sparse_index)
> +			die(_("cannot write split index for a sparse index"));
> +
>  		err = write_link_extension(&sb, istate) < 0 ||
>  			write_index_ext_header(f, eoie_c, CACHE_EXT_LINK,
>  					       sb.len) < 0;
> diff --git a/split-index.c b/split-index.c
> index 8e52e891c3b..9d0ccc30d00 100644
> --- a/split-index.c
> +++ b/split-index.c
> @@ -5,6 +5,9 @@
>  struct split_index *init_split_index(struct index_state *istate)
>  {
>  	if (!istate->split_index) {
> +		if (istate->sparse_index)
> +			die(_("cannot use split index with a sparse index"));
> +
>  		CALLOC_ARRAY(istate->split_index, 1);
>  		istate->split_index->refcount = 1;
>  	}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Fri, 21 Jan 2022, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > ... at least for now. So let's error out if we are even trying to
> > initialize the split index when the index is sparse, or when trying to
> > write the split index extension for a sparse index.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> >  read-cache.c  | 3 +++
> >  split-index.c | 3 +++
> >  2 files changed, 6 insertions(+)
>
> Good.  Will queue.

Thank you!
Dscho

@@ -136,7 +136,7 @@ static int is_sparse_index_allowed(struct index_state *istate, int flags)
/*
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> In 6e773527b6b (sparse-index: convert from full to sparse, 2021-03-30),
> we introduced initial support for a sparse index, and were careful to
> avoid converting to a sparse index in the presence of a split index.
>
> However, when we _just_ read a freshly-initialized index, it might not
> contain a split index even if _writing_ it will add one by virtue of
> being asked for via the `GIT_TEST_SPLIT_INDEX` variable.
>
> We did not notice any problems with checking _only_ for `split_index`
> (and not `GIT_TEST_SPLIT_INDEX`) right until both
> `vd/sparse-sparsity-fix-on-read` _and_ `vd/sparse-reset` were merged.
>
> Those two topics' interplay triggers a bug in conjunction with running
> t1091.15 when `GIT_TEST_SPLIT_INDEX=true` in the following way:
> `vd/sparse-sparsity-fix-on-read` ensures that the index is made sparse
> right after reading, and `vd/sparse-reset` ensures that the index is
> made non-sparse again unless running in the `--soft` mode. Since the
> split index feature is incompatible with the sparse index feature, we
> see a symptom like this:
>
> 	fatal: position for replacement 4 exceeds base index size 4
>
> Let's fix this by avoiding the conversion to a sparse index when
> `GIT_TEST_SPLIT_INDEX=true`.

Does [2/3] allow you to sidestep that issue entirely by skipping
1091 altogether?

There are 4 hits of "if (istate->split_index" in the codebase, and
this patch makes me wonder why it is suffice to patch only one of
them.

I also wondered why we test both split_index and environment
separately, instead of splitting the index very early when the
environment variable is set, so that the rest of the runtime does
not have to worry about the environment, but is the reason why such
an approach was not taken was because GIT_TEST_SPLIT_INDEX can later
allow the index to be splitted, even if istate->split_index is still
NULL right now when this function is called?

If that is the reason, it leads to another question.  Even if we
ignore GIT_TEST_SPLIT_INDEX and concentrate only on real workload,
if the in-core index can be NULL when this function is called and
then later can become split, there still must be somebody who
notices that sparse index is unallowed when (or after) such a split
happens, no?  If there is no such code, then this patch would not be
the whole fix and there needs more change to do so, no?

> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  sparse-index.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sparse-index.c b/sparse-index.c
> index a1d505d50e9..08f54747bb4 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -136,7 +136,7 @@ static int is_sparse_index_allowed(struct index_state *istate, int flags)
>  		/*
>  		 * The sparse index is not (yet) integrated with a split index.
>  		 */
> -		if (istate->split_index)
> +		if (istate->split_index || git_env_bool("GIT_TEST_SPLIT_INDEX", 0))
>  			return 0;
>  		/*
>  		 * The GIT_TEST_SPARSE_INDEX environment variable triggers the

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Fri, 21 Jan 2022, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > In 6e773527b6b (sparse-index: convert from full to sparse, 2021-03-30),
> > we introduced initial support for a sparse index, and were careful to
> > avoid converting to a sparse index in the presence of a split index.
> >
> > However, when we _just_ read a freshly-initialized index, it might not
> > contain a split index even if _writing_ it will add one by virtue of
> > being asked for via the `GIT_TEST_SPLIT_INDEX` variable.
> >
> > We did not notice any problems with checking _only_ for `split_index`
> > (and not `GIT_TEST_SPLIT_INDEX`) right until both
> > `vd/sparse-sparsity-fix-on-read` _and_ `vd/sparse-reset` were merged.
> >
> > Those two topics' interplay triggers a bug in conjunction with running
> > t1091.15 when `GIT_TEST_SPLIT_INDEX=true` in the following way:
> > `vd/sparse-sparsity-fix-on-read` ensures that the index is made sparse
> > right after reading, and `vd/sparse-reset` ensures that the index is
> > made non-sparse again unless running in the `--soft` mode. Since the
> > split index feature is incompatible with the sparse index feature, we
> > see a symptom like this:
> >
> > 	fatal: position for replacement 4 exceeds base index size 4
> >
> > Let's fix this by avoiding the conversion to a sparse index when
> > `GIT_TEST_SPLIT_INDEX=true`.
>
> Does [2/3] allow you to sidestep that issue entirely by skipping
> 1091 altogether?

You are right that reverting this patch after applying the next patch
_still_ works around the issue. That's because currently only t1091
exposes the bug where `GIT_TEST_SPLIT_INDEX` writes a split index that was
initially non-split (because it was just created).

My intention here was to avoid any problems in case _another_ test script
triggers the same condition. After all, we do have an entire CI job that
forces `GIT_TEST_SPLIT_INDEX=1`, and in microsoft/git we enable sparse
index by default. The entire test suite is therefore surface for the bug
to show.

> There are 4 hits of "if (istate->split_index" in the codebase, and
> this patch makes me wonder why it is suffice to patch only one of
> them.

I tried to catch only those locations where we run the danger of trying to
make a split index also sparse, or a sparse index also split.

> I also wondered why we test both split_index and environment
> separately, instead of splitting the index very early when the
> environment variable is set, so that the rest of the runtime does
> not have to worry about the environment, but is the reason why such
> an approach was not taken was because GIT_TEST_SPLIT_INDEX can later
> allow the index to be splitted, even if istate->split_index is still
> NULL right now when this function is called?

Indeed. A freshly created index will never be split, even if
`GIT_TEST_SPLIT_INDEX` is set. Only when we _write_ this to disk will it
be turned into a split index. At this point, we might have mistakenly
converted the index into a sparse one, though, and this here patch wants
to prevent that from happening.

> If that is the reason, it leads to another question.  Even if we
> ignore GIT_TEST_SPLIT_INDEX and concentrate only on real workload,
> if the in-core index can be NULL when this function is called and
> then later can become split, there still must be somebody who
> notices that sparse index is unallowed when (or after) such a split
> happens, no?  If there is no such code, then this patch would not be
> the whole fix and there needs more change to do so, no?

Indeed. Hence 3/3 which tries specifically to prevent the user from
turning an already-sparse index into a split index. You will note that
3/3 calls die() instead BUG() in such a case because it would constitute a
pilot error, not a bug in Git's code.

Ciao,
Dscho

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 22, 2022

This branch is now known as js/sparse-vs-split-index.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 22, 2022

This patch series was integrated into seen via git@8cdc332.

@gitgitgadget gitgitgadget bot added the seen label Jan 22, 2022
@gitgitgadget
Copy link

gitgitgadget bot commented Jan 22, 2022

There was a status update in the "New Topics" section about the branch js/sparse-vs-split-index on the Git mailing list:

Mark in various places in the code that the sparse index and the
split index features are mutually incompatible.

source: <pull.1119.git.1642613379.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 22, 2022

On the Git mailing list, Elijah Newren wrote (reply to this):

On Fri, Jan 21, 2022 at 11:53 AM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> We noticed split/sparse index-related issues while rebasing Microsoft's fork
> of Git. These fixes are necessary for that fork's test suite to pass, but
> they might not be super critical to get into upstream v2.35.0 (especially
> this close to -rc2). However, the team felt that the decision should be left
> to the Git maintainer whether to take these patches into v2.35.0 or not.

Thanks for digging into these and putting in some guardrails to
prevent similar issues.  I hit similar things with some of my changes
and had to fight t1091 to get it to pass in CI under
GIT_TEST_SPLIT_INDEX=1.  I don't recall seeing the specific error you
mention in patch 1, but maybe I just overlooked it.  I ended up
finding little workarounds such as disabling sparse-checkouts at the
end of various tests before new ones I was adding, and being extra
careful to not leave the sparse-index selected.  I probably should
have dug further, but didn't.  Thanks for digging in.

I've read over the patch series; it looks good to me:

Reviewed-by: Elijah Newren <newren@gmail.com>


>
> Johannes Schindelin (3):
>   sparse-index: sparse index is disallowed when split index is active
>   t1091: disable split index
>   split-index: it really is incompatible with the sparse index
>
>  read-cache.c                       |  3 ++
>  sparse-index.c                     |  2 +-
>  split-index.c                      |  3 ++
>  t/t1091-sparse-checkout-builtin.sh | 54 ++++++++++++++----------------
>  4 files changed, 33 insertions(+), 29 deletions(-)
>
>
> base-commit: af4e5f569bc89f356eb34a9373d7f82aca6faa8a
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1119%2Fdscho%2Fsparse-index-vs-split-index-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1119/dscho/sparse-index-vs-split-index-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1119
> --
> gitgitgadget

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 22, 2022

User Elijah Newren <newren@gmail.com> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 22, 2022

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Elijah,

On Fri, 21 Jan 2022, Elijah Newren wrote:

> On Fri, Jan 21, 2022 at 11:53 AM Johannes Schindelin via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > We noticed split/sparse index-related issues while rebasing Microsoft's fork
> > of Git. These fixes are necessary for that fork's test suite to pass, but
> > they might not be super critical to get into upstream v2.35.0 (especially
> > this close to -rc2). However, the team felt that the decision should be left
> > to the Git maintainer whether to take these patches into v2.35.0 or not.
>
> Thanks for digging into these and putting in some guardrails to
> prevent similar issues.  I hit similar things with some of my changes
> and had to fight t1091 to get it to pass in CI under
> GIT_TEST_SPLIT_INDEX=1.  I don't recall seeing the specific error you
> mention in patch 1, but maybe I just overlooked it.  I ended up
> finding little workarounds such as disabling sparse-checkouts at the
> end of various tests before new ones I was adding, and being extra
> careful to not leave the sparse-index selected.  I probably should
> have dug further, but didn't.  Thanks for digging in.

Seeing how much time it took to properly diagnose and fix these issues, I
do not fault you ;-)

> I've read over the patch series; it looks good to me:
>
> Reviewed-by: Elijah Newren <newren@gmail.com>

Thank you!
Dscho

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 24, 2022

This patch series was integrated into seen via git@2bea118.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 24, 2022

This patch series was integrated into seen via git@bc2f00a.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 24, 2022

This patch series was integrated into seen via git@b5c4f4a.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 24, 2022

This patch series was integrated into seen via git@d05dc72.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 24, 2022

There was a status update in the "Cooking" section about the branch js/sparse-vs-split-index on the Git mailing list:

Mark in various places in the code that the sparse index and the
split index features are mutually incompatible.

Will merge to 'next'.
source: <pull.1119.git.1642613379.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 26, 2022

This patch series was integrated into seen via git@8cd9af2.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 26, 2022

This patch series was integrated into seen via git@f8160a6.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 27, 2022

This patch series was integrated into seen via git@bf0f820.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 27, 2022

This patch series was integrated into next via git@7443487.

@gitgitgadget gitgitgadget bot added the next label Jan 27, 2022
@gitgitgadget
Copy link

gitgitgadget bot commented Jan 29, 2022

This patch series was integrated into seen via git@04b94c3.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 2, 2022

This patch series was integrated into seen via git@e115833.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 3, 2022

This patch series was integrated into seen via git@25c8b6c.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 4, 2022

There was a status update in the "Cooking" section about the branch js/sparse-vs-split-index on the Git mailing list:

Mark in various places in the code that the sparse index and the
split index features are mutually incompatible.

Will merge to 'master'.
source: <pull.1119.git.1642613379.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 4, 2022

This patch series was integrated into seen via git@93c2591.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 5, 2022

This patch series was integrated into seen via git@80542ac.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 8, 2022

This patch series was integrated into seen via git@3c99780.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 9, 2022

This patch series was integrated into seen via git@1b82b93.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 9, 2022

This patch series was integrated into next via git@1b82b93.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 9, 2022

This patch series was integrated into master via git@1b82b93.

@gitgitgadget gitgitgadget bot added the master label Feb 9, 2022
@gitgitgadget gitgitgadget bot closed this Feb 9, 2022
@gitgitgadget
Copy link

gitgitgadget bot commented Feb 9, 2022

Closed via 1b82b93.

@dscho dscho deleted the sparse-index-vs-split-index branch February 10, 2022 06:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant