New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Non-critical]: sparse index vs split index fixes #1119
Conversation
In 6e77352 (sparse-index: convert from full to sparse, 2021-03-30), we introduced initial support for a sparse index, and were careful to avoid converting to a sparse index in the presence of a split index. However, when we _just_ read a freshly-initialized index, it might not contain a split index even if _writing_ it will add one by virtue of being asked for via the `GIT_TEST_SPLIT_INDEX` variable. We did not notice any problems with checking _only_ for `split_index` (and not `GIT_TEST_SPLIT_INDEX`) right until both `vd/sparse-sparsity-fix-on-read` _and_ `vd/sparse-reset` were merged. Those two topics' interplay triggers a bug in conjunction with running t1091.15 when `GIT_TEST_SPLIT_INDEX=true` in the following way: `vd/sparse-sparsity-fix-on-read` ensures that the index is made sparse right after reading, and `vd/sparse-reset` ensures that the index is made non-sparse again unless running in the `--soft` mode. Since the split index feature is incompatible with the sparse index feature, we see a symptom like this: fatal: position for replacement 4 exceeds base index size 4 Let's fix this by avoiding the conversion to a sparse index when `GIT_TEST_SPLIT_INDEX=true`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In 61feddc (tests: disable GIT_TEST_SPLIT_INDEX for sparse index tests, 2021-08-26), it was already called out that the split index feature is incompatible with the sparse index feature, and its commit message wondered aloud whether more checks would be required to ensure that the split index and sparse index features aren't enabled at the same time. We are about to introduce such additional checks, and indeed, t1091 would utterly fail with them. Therefore, let's preemptively disable the split index for the entirety of t1091. This partially reverts above-mentioned patch because it covered only one test case whereas we want to cover the entire test script. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
... at least for now. So let's error out if we are even trying to initialize the split index when the index is sparse, or when trying to write the split index extension for a sparse index. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
/submit |
Submitted as pull.1119.git.1642613379.gitgitgadget@gmail.com To fetch this version into
To fetch this version to local tag
|
@@ -3009,6 +3009,9 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, | |||
!is_null_oid(&istate->split_index->base_oid)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> ... at least for now. So let's error out if we are even trying to
> initialize the split index when the index is sparse, or when trying to
> write the split index extension for a sparse index.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
> read-cache.c | 3 +++
> split-index.c | 3 +++
> 2 files changed, 6 insertions(+)
Good. Will queue.
> diff --git a/read-cache.c b/read-cache.c
> index cbe73f14e5e..a932e01fc7a 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -3009,6 +3009,9 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
> !is_null_oid(&istate->split_index->base_oid)) {
> struct strbuf sb = STRBUF_INIT;
>
> + if (istate->sparse_index)
> + die(_("cannot write split index for a sparse index"));
> +
> err = write_link_extension(&sb, istate) < 0 ||
> write_index_ext_header(f, eoie_c, CACHE_EXT_LINK,
> sb.len) < 0;
> diff --git a/split-index.c b/split-index.c
> index 8e52e891c3b..9d0ccc30d00 100644
> --- a/split-index.c
> +++ b/split-index.c
> @@ -5,6 +5,9 @@
> struct split_index *init_split_index(struct index_state *istate)
> {
> if (!istate->split_index) {
> + if (istate->sparse_index)
> + die(_("cannot use split index with a sparse index"));
> +
> CALLOC_ARRAY(istate->split_index, 1);
> istate->split_index->refcount = 1;
> }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Johannes Schindelin wrote (reply to this):
Hi Junio,
On Fri, 21 Jan 2022, Junio C Hamano wrote:
> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > ... at least for now. So let's error out if we are even trying to
> > initialize the split index when the index is sparse, or when trying to
> > write the split index extension for a sparse index.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> > read-cache.c | 3 +++
> > split-index.c | 3 +++
> > 2 files changed, 6 insertions(+)
>
> Good. Will queue.
Thank you!
Dscho
@@ -136,7 +136,7 @@ static int is_sparse_index_allowed(struct index_state *istate, int flags) | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> In 6e773527b6b (sparse-index: convert from full to sparse, 2021-03-30),
> we introduced initial support for a sparse index, and were careful to
> avoid converting to a sparse index in the presence of a split index.
>
> However, when we _just_ read a freshly-initialized index, it might not
> contain a split index even if _writing_ it will add one by virtue of
> being asked for via the `GIT_TEST_SPLIT_INDEX` variable.
>
> We did not notice any problems with checking _only_ for `split_index`
> (and not `GIT_TEST_SPLIT_INDEX`) right until both
> `vd/sparse-sparsity-fix-on-read` _and_ `vd/sparse-reset` were merged.
>
> Those two topics' interplay triggers a bug in conjunction with running
> t1091.15 when `GIT_TEST_SPLIT_INDEX=true` in the following way:
> `vd/sparse-sparsity-fix-on-read` ensures that the index is made sparse
> right after reading, and `vd/sparse-reset` ensures that the index is
> made non-sparse again unless running in the `--soft` mode. Since the
> split index feature is incompatible with the sparse index feature, we
> see a symptom like this:
>
> fatal: position for replacement 4 exceeds base index size 4
>
> Let's fix this by avoiding the conversion to a sparse index when
> `GIT_TEST_SPLIT_INDEX=true`.
Does [2/3] allow you to sidestep that issue entirely by skipping
1091 altogether?
There are 4 hits of "if (istate->split_index" in the codebase, and
this patch makes me wonder why it is suffice to patch only one of
them.
I also wondered why we test both split_index and environment
separately, instead of splitting the index very early when the
environment variable is set, so that the rest of the runtime does
not have to worry about the environment, but is the reason why such
an approach was not taken was because GIT_TEST_SPLIT_INDEX can later
allow the index to be splitted, even if istate->split_index is still
NULL right now when this function is called?
If that is the reason, it leads to another question. Even if we
ignore GIT_TEST_SPLIT_INDEX and concentrate only on real workload,
if the in-core index can be NULL when this function is called and
then later can become split, there still must be somebody who
notices that sparse index is unallowed when (or after) such a split
happens, no? If there is no such code, then this patch would not be
the whole fix and there needs more change to do so, no?
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
> sparse-index.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sparse-index.c b/sparse-index.c
> index a1d505d50e9..08f54747bb4 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -136,7 +136,7 @@ static int is_sparse_index_allowed(struct index_state *istate, int flags)
> /*
> * The sparse index is not (yet) integrated with a split index.
> */
> - if (istate->split_index)
> + if (istate->split_index || git_env_bool("GIT_TEST_SPLIT_INDEX", 0))
> return 0;
> /*
> * The GIT_TEST_SPARSE_INDEX environment variable triggers the
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Johannes Schindelin wrote (reply to this):
Hi Junio,
On Fri, 21 Jan 2022, Junio C Hamano wrote:
> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > In 6e773527b6b (sparse-index: convert from full to sparse, 2021-03-30),
> > we introduced initial support for a sparse index, and were careful to
> > avoid converting to a sparse index in the presence of a split index.
> >
> > However, when we _just_ read a freshly-initialized index, it might not
> > contain a split index even if _writing_ it will add one by virtue of
> > being asked for via the `GIT_TEST_SPLIT_INDEX` variable.
> >
> > We did not notice any problems with checking _only_ for `split_index`
> > (and not `GIT_TEST_SPLIT_INDEX`) right until both
> > `vd/sparse-sparsity-fix-on-read` _and_ `vd/sparse-reset` were merged.
> >
> > Those two topics' interplay triggers a bug in conjunction with running
> > t1091.15 when `GIT_TEST_SPLIT_INDEX=true` in the following way:
> > `vd/sparse-sparsity-fix-on-read` ensures that the index is made sparse
> > right after reading, and `vd/sparse-reset` ensures that the index is
> > made non-sparse again unless running in the `--soft` mode. Since the
> > split index feature is incompatible with the sparse index feature, we
> > see a symptom like this:
> >
> > fatal: position for replacement 4 exceeds base index size 4
> >
> > Let's fix this by avoiding the conversion to a sparse index when
> > `GIT_TEST_SPLIT_INDEX=true`.
>
> Does [2/3] allow you to sidestep that issue entirely by skipping
> 1091 altogether?
You are right that reverting this patch after applying the next patch
_still_ works around the issue. That's because currently only t1091
exposes the bug where `GIT_TEST_SPLIT_INDEX` writes a split index that was
initially non-split (because it was just created).
My intention here was to avoid any problems in case _another_ test script
triggers the same condition. After all, we do have an entire CI job that
forces `GIT_TEST_SPLIT_INDEX=1`, and in microsoft/git we enable sparse
index by default. The entire test suite is therefore surface for the bug
to show.
> There are 4 hits of "if (istate->split_index" in the codebase, and
> this patch makes me wonder why it is suffice to patch only one of
> them.
I tried to catch only those locations where we run the danger of trying to
make a split index also sparse, or a sparse index also split.
> I also wondered why we test both split_index and environment
> separately, instead of splitting the index very early when the
> environment variable is set, so that the rest of the runtime does
> not have to worry about the environment, but is the reason why such
> an approach was not taken was because GIT_TEST_SPLIT_INDEX can later
> allow the index to be splitted, even if istate->split_index is still
> NULL right now when this function is called?
Indeed. A freshly created index will never be split, even if
`GIT_TEST_SPLIT_INDEX` is set. Only when we _write_ this to disk will it
be turned into a split index. At this point, we might have mistakenly
converted the index into a sparse one, though, and this here patch wants
to prevent that from happening.
> If that is the reason, it leads to another question. Even if we
> ignore GIT_TEST_SPLIT_INDEX and concentrate only on real workload,
> if the in-core index can be NULL when this function is called and
> then later can become split, there still must be somebody who
> notices that sparse index is unallowed when (or after) such a split
> happens, no? If there is no such code, then this patch would not be
> the whole fix and there needs more change to do so, no?
Indeed. Hence 3/3 which tries specifically to prevent the user from
turning an already-sparse index into a split index. You will note that
3/3 calls die() instead BUG() in such a case because it would constitute a
pilot error, not a bug in Git's code.
Ciao,
Dscho
This branch is now known as |
This patch series was integrated into seen via git@8cdc332. |
There was a status update in the "New Topics" section about the branch Mark in various places in the code that the sparse index and the split index features are mutually incompatible. source: <pull.1119.git.1642613379.gitgitgadget@gmail.com> |
On the Git mailing list, Elijah Newren wrote (reply to this):
|
User |
On the Git mailing list, Johannes Schindelin wrote (reply to this):
|
This patch series was integrated into seen via git@2bea118. |
This patch series was integrated into seen via git@bc2f00a. |
This patch series was integrated into seen via git@b5c4f4a. |
This patch series was integrated into seen via git@d05dc72. |
There was a status update in the "Cooking" section about the branch Mark in various places in the code that the sparse index and the split index features are mutually incompatible. Will merge to 'next'. source: <pull.1119.git.1642613379.gitgitgadget@gmail.com> |
This patch series was integrated into seen via git@8cd9af2. |
This patch series was integrated into seen via git@f8160a6. |
This patch series was integrated into seen via git@bf0f820. |
This patch series was integrated into next via git@7443487. |
This patch series was integrated into seen via git@04b94c3. |
This patch series was integrated into seen via git@e115833. |
This patch series was integrated into seen via git@25c8b6c. |
There was a status update in the "Cooking" section about the branch Mark in various places in the code that the sparse index and the split index features are mutually incompatible. Will merge to 'master'. source: <pull.1119.git.1642613379.gitgitgadget@gmail.com> |
This patch series was integrated into seen via git@93c2591. |
This patch series was integrated into seen via git@80542ac. |
This patch series was integrated into seen via git@3c99780. |
This patch series was integrated into seen via git@1b82b93. |
This patch series was integrated into next via git@1b82b93. |
This patch series was integrated into master via git@1b82b93. |
Closed via 1b82b93. |
We noticed split/sparse index-related issues while rebasing Microsoft's fork of Git. These fixes are necessary for that fork's test suite to pass, but they might not be super critical to get into upstream v2.35.0 (especially this close to -rc2). However, the team felt that the decision should be left to the Git maintainer whether to take these patches into v2.35.0 or not.
cc: Elijah Newren newren@gmail.com