Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion fsmonitor.h
Original file line number Diff line number Diff line change
Expand Up @@ -86,7 +86,7 @@ static inline void mark_fsmonitor_valid(struct index_state *istate, struct cache
!(ce->ce_flags & CE_FSMONITOR_VALID)) {
if (S_ISGITLINK(ce->ce_mode))
return;
istate->cache_changed = 1;
istate->cache_changed |= FSMONITOR_CHANGED;
ce->ce_flags |= CE_FSMONITOR_VALID;
trace_printf_key(&trace_fsmonitor, "mark_fsmonitor_clean '%s'", ce->name);
}
Expand Down
49 changes: 32 additions & 17 deletions read-cache.c
Original file line number Diff line number Diff line change
Expand Up @@ -2868,6 +2868,16 @@ static int record_ieot(void)
return !git_config_get_index_threads(&val) && val != 1;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> When a split-index is in effect, the `$GIT_DIR/index` file needs to
> contain a "link" extension that contains all the information about the
> split-index, including the information about the shared index.
> ...
> Let's stop zeroing out the `base_oid` to indicate that the "link"
> extension should not be written.

Nicely explained.

> One might be tempted to simply call `discard_split_index()` instead,
> under the assumption that Git decided to write a non-split index and
> therefore the the `split_index` structure might no longer be wanted.

"the the".

> +enum strip_extensions {
> +	WRITE_ALL_EXTENSIONS = 0,
> +	STRIP_ALL_EXTENSIONS = 1,
> +	STRIP_LINK_EXTENSION_ONLY = 2
> +};

We do not need to spell out the specific values for this enum; the
users' (i.e. the callers of do_write_index()) sole requirement is
for these symbols to have different values.

Also do we envision that (1) we would need to keep STRIP_LINK_ONLY
to be with the largest value among the enum values, or (2) we would
never add new value to the set?  Otherwise let's end the last one
with a trailing comma.

Looking at the way strip_extensions variable is used in
do_write_index(), an alternative design might be to make it a set of
bits (e.g. unsigned write_extension) and give one bit to each
extension.  But such a clean-up is better left outside the topic, I
would imagine, as we do not have any need to skip an arbitrary set
of extensions right now.

> +/*
> + * Write the Git index into a `.lock` file
> + *
> + * If `strip_link_extension` is non-zero, avoid writing any "link" extension
> + * (used by the split-index feature).
> + */

Not exposing "enum strip_extensions" to the caller of this function,
like this patch does, is probably a very safe and sensible thing to
do.  We do not have a reason to allow its callers to (perhaps
mistakenly) pass STRIP_ALL_EXTENSIONS to it.

>  static int do_write_locked_index(struct index_state *istate, struct lock_file *lock,
> -				 unsigned flags)
> +				 unsigned flags, int strip_link_extension)
>  {
>  	int ret;
>  	int was_full = istate->sparse_index == INDEX_EXPANDED;
> @@ -3185,7 +3197,7 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
>  	 */
>  	trace2_region_enter_printf("index", "do_write_index", the_repository,
>  				   "%s", get_lock_file_path(lock));
> -	ret = do_write_index(istate, lock->tempfile, 0, flags);
> +	ret = do_write_index(istate, lock->tempfile, strip_link_extension ? STRIP_LINK_EXTENSION_ONLY : 0, flags);
>  	trace2_region_leave_printf("index", "do_write_index", the_repository,
>  				   "%s", get_lock_file_path(lock));
>  

OK.

Very nicely done.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff Hostetler wrote (reply to this):

On 3/22/23 5:24 PM, Junio C Hamano wrote:
> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
> >> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>>
>> When a split-index is in effect, the `$GIT_DIR/index` file needs to
>> contain a "link" extension that contains all the information about the
>> split-index, including the information about the shared index.
>> ...
>> Let's stop zeroing out the `base_oid` to indicate that the "link"
>> extension should not be written.
> > Nicely explained.
> >> One might be tempted to simply call `discard_split_index()` instead,
>> under the assumption that Git decided to write a non-split index and
>> therefore the the `split_index` structure might no longer be wanted.
> > "the the".
> >> +enum strip_extensions {
>> +	WRITE_ALL_EXTENSIONS = 0,
>> +	STRIP_ALL_EXTENSIONS = 1,
>> +	STRIP_LINK_EXTENSION_ONLY = 2
>> +};
> > We do not need to spell out the specific values for this enum; the
> users' (i.e. the callers of do_write_index()) sole requirement is
> for these symbols to have different values.

There are several calls to do_write_locked_index() that pass 0 or 1
as the new final arg.  If we update them to use these enum values,
then we don't need integer values here.

> > Also do we envision that (1) we would need to keep STRIP_LINK_ONLY
> to be with the largest value among the enum values, or (2) we would
> never add new value to the set?  Otherwise let's end the last one
> with a trailing comma.
> > Looking at the way strip_extensions variable is used in
> do_write_index(), an alternative design might be to make it a set of
> bits (e.g. unsigned write_extension) and give one bit to each
> extension.  But such a clean-up is better left outside the topic, I
> would imagine, as we do not have any need to skip an arbitrary set
> of extensions right now.

Agreed, I thought about suggesting a set of bits too, but right now
we only need to strip all of them or just this one.

> >> +/*
>> + * Write the Git index into a `.lock` file
>> + *
>> + * If `strip_link_extension` is non-zero, avoid writing any "link" extension
>> + * (used by the split-index feature).
>> + */
> > Not exposing "enum strip_extensions" to the caller of this function,
> like this patch does, is probably a very safe and sensible thing to
> do.  We do not have a reason to allow its callers to (perhaps
> mistakenly) pass STRIP_ALL_EXTENSIONS to it.
> >>   static int do_write_locked_index(struct index_state *istate, struct lock_file *lock,
>> -				 unsigned flags)
>> +				 unsigned flags, int strip_link_extension)
>>   {
>>   	int ret;
>>   	int was_full = istate->sparse_index == INDEX_EXPANDED;
>> @@ -3185,7 +3197,7 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
>>   	 */
>>   	trace2_region_enter_printf("index", "do_write_index", the_repository,
>>   				   "%s", get_lock_file_path(lock));
>> -	ret = do_write_index(istate, lock->tempfile, 0, flags);
>> +	ret = do_write_index(istate, lock->tempfile, strip_link_extension ? STRIP_LINK_EXTENSION_ONLY : 0, flags);

In the else of the ?: operator, could we use the WRITE_ALL_EXTENSIONS
instead of 0?

>>   	trace2_region_leave_printf("index", "do_write_index", the_repository,
>>   				   "%s", get_lock_file_path(lock));
>>   > > OK.
> > Very nicely done.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jeff Hostetler <git@jeffhostetler.com> writes:

>>> +enum strip_extensions {
>>> +	WRITE_ALL_EXTENSIONS = 0,
>>> +	STRIP_ALL_EXTENSIONS = 1,
>>> +	STRIP_LINK_EXTENSION_ONLY = 2
>>> +};
>> We do not need to spell out the specific values for this enum; the
>> users' (i.e. the callers of do_write_index()) sole requirement is
>> for these symbols to have different values.
>
> There are several calls to do_write_locked_index() that pass 0 or 1
> as the new final arg.  If we update them to use these enum values,
> then we don't need integer values here.

Good eyes.  Yes, the new caller that selectively passes
STRIP_LINK_EXTENSION_ONLY should pass WRITE_ALL_EXTENSIONS, not 0,
on the other side of ?: as you pointed out.

Thanks.

}

enum write_extensions {
WRITE_NO_EXTENSION = 0,
WRITE_SPLIT_INDEX_EXTENSION = 1<<0,
WRITE_CACHE_TREE_EXTENSION = 1<<1,
WRITE_RESOLVE_UNDO_EXTENSION = 1<<2,
WRITE_UNTRACKED_CACHE_EXTENSION = 1<<3,
WRITE_FSMONITOR_EXTENSION = 1<<4,
};
#define WRITE_ALL_EXTENSIONS ((enum write_extensions)-1)

/*
* On success, `tempfile` is closed. If it is the temporary file
* of a `struct lock_file`, we will therefore effectively perform
Expand All @@ -2876,7 +2886,7 @@ static int record_ieot(void)
* rely on it.
*/
static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
int strip_extensions, unsigned flags)
enum write_extensions write_extensions, unsigned flags)
{
uint64_t start = getnanotime();
struct hashfile *f;
Expand Down Expand Up @@ -3045,8 +3055,8 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
return -1;
}

if (!strip_extensions && istate->split_index &&
!is_null_oid(&istate->split_index->base_oid)) {
if (write_extensions & WRITE_SPLIT_INDEX_EXTENSION &&
istate->split_index) {
struct strbuf sb = STRBUF_INIT;

if (istate->sparse_index)
Expand All @@ -3060,7 +3070,8 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
if (err)
return -1;
}
if (!strip_extensions && !drop_cache_tree && istate->cache_tree) {
if (write_extensions & WRITE_CACHE_TREE_EXTENSION &&
!drop_cache_tree && istate->cache_tree) {
struct strbuf sb = STRBUF_INIT;

cache_tree_write(&sb, istate->cache_tree);
Expand All @@ -3070,7 +3081,8 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
if (err)
return -1;
}
if (!strip_extensions && istate->resolve_undo) {
if (write_extensions & WRITE_RESOLVE_UNDO_EXTENSION &&
istate->resolve_undo) {
struct strbuf sb = STRBUF_INIT;

resolve_undo_write(&sb, istate->resolve_undo);
Expand All @@ -3081,7 +3093,8 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
if (err)
return -1;
}
if (!strip_extensions && istate->untracked) {
if (write_extensions & WRITE_UNTRACKED_CACHE_EXTENSION &&
istate->untracked) {
struct strbuf sb = STRBUF_INIT;

write_untracked_extension(&sb, istate->untracked);
Expand All @@ -3092,7 +3105,8 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
if (err)
return -1;
}
if (!strip_extensions && istate->fsmonitor_last_update) {
if (write_extensions & WRITE_FSMONITOR_EXTENSION &&
istate->fsmonitor_last_update) {
struct strbuf sb = STRBUF_INIT;

write_fsmonitor_extension(&sb, istate);
Expand Down Expand Up @@ -3166,8 +3180,10 @@ static int commit_locked_index(struct lock_file *lk)
return commit_lock_file(lk);
}

static int do_write_locked_index(struct index_state *istate, struct lock_file *lock,
unsigned flags)
static int do_write_locked_index(struct index_state *istate,
struct lock_file *lock,
unsigned flags,
enum write_extensions write_extensions)
{
int ret;
int was_full = istate->sparse_index == INDEX_EXPANDED;
Expand All @@ -3185,7 +3201,7 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
*/
trace2_region_enter_printf("index", "do_write_index", the_repository,
"%s", get_lock_file_path(lock));
ret = do_write_index(istate, lock->tempfile, 0, flags);
ret = do_write_index(istate, lock->tempfile, write_extensions, flags);
trace2_region_leave_printf("index", "do_write_index", the_repository,
"%s", get_lock_file_path(lock));

Expand Down Expand Up @@ -3214,7 +3230,7 @@ static int write_split_index(struct index_state *istate,
{
int ret;
prepare_to_write_split_index(istate);
ret = do_write_locked_index(istate, lock, flags);
ret = do_write_locked_index(istate, lock, flags, WRITE_ALL_EXTENSIONS);
finish_writing_split_index(istate);
return ret;
}
Expand Down Expand Up @@ -3289,7 +3305,7 @@ static int write_shared_index(struct index_state *istate,

trace2_region_enter_printf("index", "shared/do_write_index",
the_repository, "%s", get_tempfile_path(*temp));
ret = do_write_index(si->base, *temp, 1, flags);
ret = do_write_index(si->base, *temp, WRITE_NO_EXTENSION, flags);
trace2_region_leave_printf("index", "shared/do_write_index",
the_repository, "%s", get_tempfile_path(*temp));

Expand Down Expand Up @@ -3366,9 +3382,8 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
if ((!si && !test_split_index_env) ||
alternate_index_output ||
(istate->cache_changed & ~EXTMASK)) {
if (si)
oidclr(&si->base_oid);
ret = do_write_locked_index(istate, lock, flags);
ret = do_write_locked_index(istate, lock, flags,
~WRITE_SPLIT_INDEX_EXTENSION);
goto out;
}

Expand All @@ -3394,8 +3409,8 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
/* Same initial permissions as the main .git/index file */
temp = mks_tempfile_sm(git_path("sharedindex_XXXXXX"), 0, 0666);
if (!temp) {
oidclr(&si->base_oid);
ret = do_write_locked_index(istate, lock, flags);
ret = do_write_locked_index(istate, lock, flags,
~WRITE_SPLIT_INDEX_EXTENSION);
goto out;
}
ret = write_shared_index(istate, &temp, flags);
Expand Down
37 changes: 37 additions & 0 deletions t/t7527-builtin-fsmonitor.sh
Original file line number Diff line number Diff line change
Expand Up @@ -1003,4 +1003,41 @@ test_expect_success !UNICODE_COMPOSITION_SENSITIVE 'Unicode nfc/nfd' '
egrep "^event: nfd/d_${utf8_nfc}/?$" ./unicode.trace
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff Hostetler wrote (reply to this):

On 3/22/23 12:00 PM, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> > This commit adds a new test case that demonstrates a bug in the
> split-index code that is triggered under certain circumstances when the
> FSMonitor is enabled, and its symptom manifests in the form of one of
> the following error messages:
> >      BUG: fsmonitor.c:20: fsmonitor_dirty has more entries than the index (2 > 1)
> >      BUG: unpack-trees.c:776: pos <n> doesn't point to the first entry of <dir>/ in index
> >      error: invalid path ''
>      error: The following untracked working tree files would be overwritten by reset:
>              initial.t
> > Which of these error messages appears depends on timing-dependent
> conditions.
> > Technically the root cause lies with a bug in the split-index code that
> has nothing to do with FSMonitor, but for the sake of this new test case
> it was the easiest way to trigger the bug.
> > The bug is this: Under specific conditions, Git needs to skip writing
> the "link" extension (which is the index extension containing the
> information pertaining to the split-index). To do that, the `base_oid`
> attribute of the `split_index` structure in the in-memory index is
> zeroed out, and `do_write_index()` specifically checks for a "null"
> `base_oid` to understand that the "link" extension should not be
> written. However, this violates the consistency of the in-memory index
> structure, but that does not cause problems in most cases because the
> process exits without using the in-memory index structure anymore,
> anyway.
> > But: _When_ the in-memory index is still used (which is the case e.g. in
> `git rebase`), subsequent writes of `the_index` are at risk of writing
> out a bogus index file, one that _should_ have a "link" extension but
> does not. In many cases, the `SPLIT_INDEX_ORDERED` flag _happens_ to be
> set for subsequent writes, forcing the shared index to be written, which
> re-initializes `base_oid` to a non-bogus state, and all is good.
> > When it is _not_ set, however, all kinds of mayhem ensue, resulting in
> above-mentioned error messages, and often enough putting worktrees in a
> totally broken state where the only recourse is to manually delete the
> `index` and the `index.lock` files and then call `git reset` manually.
> Not something to ask users to do.
> > The reason why it is comparatively easy to trigger the bug with
> FSMonitor is that there is _another_ bug in the FSMonitor code:
> `mark_fsmonitor_valid()` sets `cache_changed` to 1, i.e. treating that
> variable as a Boolean. But it is a bit field, and 1 happens to be the
> `SOMETHING_CHANGED` bit that forces the "link" extension to be skipped
> when writing the index, among other things.
> > "Comparatively easy" is a relative term in this context, for sure. The
> essence of how the new test case triggers the bug is as following:
> > 1. The `git rebase` invocation will first reset the worktree to
>     a commit that contains only the `one.t` file, and then execute a
>     rebase script that starts with the following commands (commit hashes
>     skipped):
> >     label onto
> >     reset initial
>     pick two
>     label two
> >     reset two
>     pick three
>     [...]
> > 2. Before executing the `label` command, a split index is written, as
>     well as the shared index.
> > 3. The `reset initial` command in the rebase script writes out a new
>     split index but skips writing the shared index, as intended.
> > 4. The `pick two` command updates the worktree and refreshes the index,
>     marking the `two.t` entry as valid via the FSMonitor, which sets the
>     `SOMETHING_CHANGED` bit in `cache_changed`, which in turn causes the
>     `base_oid` attribute to be zeroed out and a full (non-split) index
>     to be written (making sure _not_ to write the "link" extension).
> > 5. Now, the `reset two` command will leave the worktree alone, but
>     still write out a new split index, not writing the shared index
>     (because `base_oid` is still zeroed out, and there is no index entry
>     update requiring it to be written, either).
> > 6. When it is turn to run `pick three`, the index is read, but it is
>     too short: It only contains a single entry when there should be two,
>     because the "link" extension is missing from the written-out index
>     file.
> > There are three bugs at play, actually, which will be fixed over the
> course of the next commits:
> > - The `base_oid` attribute should not be zeroed out to indicate when
>    the "link" extension should not be written, as it puts the in-memory
>    index structure into an inconsistent state.
> > - The FSMonitor should not overwrite bits in `cache_changed`.
> > - The `unpack_trees()` function tries to reuse the `split_index`
>    structure from the source index, if any, but does not propagate the
>    `SPLIT_INDEX_ORDERED` flag.
> > While a fix for the second bug would let this test case pass, there are
> other conditions where the `SOMETHING_CHANGED` bit is set. Therefore,
> the bug that most crucially needs to be fixed is the first one.
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

Very well said.  Thank you!!!

> ---
>   t/t7527-builtin-fsmonitor.sh | 37 ++++++++++++++++++++++++++++++++++++
>   1 file changed, 37 insertions(+)
> > diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
> index d419085379c..cbafdd69602 100755
> --- a/t/t7527-builtin-fsmonitor.sh
> +++ b/t/t7527-builtin-fsmonitor.sh
> @@ -1003,4 +1003,41 @@ test_expect_success !UNICODE_COMPOSITION_SENSITIVE 'Unicode nfc/nfd' '
>   	egrep "^event: nfd/d_${utf8_nfc}/?$" ./unicode.trace
>   '
>   > +test_expect_failure 'split-index and FSMonitor work well together' '
> +	git init split-index &&
> +	test_when_finished "git -C \"$PWD/split-index\" \
> +		fsmonitor--daemon stop" &&
> +	(
> +		cd split-index &&
> +		git config core.splitIndex true &&
> +		# force split-index in most cases
> +		git config splitIndex.maxPercentChange 99 &&
> +		git config core.fsmonitor true &&
> +
> +		# Create the following commit topology:
> +		#
> +		# *   merge three
> +		# |\
> +		# | * three
> +		# * | merge two
> +		# |\|
> +		# | * two
> +		# * | one
> +		# |/
> +		# * 5a5efd7 initial
> +
> +		test_commit initial &&
> +		test_commit two &&
> +		test_commit three &&
> +		git reset --hard initial &&
> +		test_commit one &&
> +		test_tick &&
> +		git merge two &&
> +		test_tick &&
> +		git merge three &&
> +
> +		git rebase --force-rebase -r one
> +	)
> +'
> +
>   test_done

'

test_expect_success 'split-index and FSMonitor work well together' '
git init split-index &&
test_when_finished "git -C \"$PWD/split-index\" \
fsmonitor--daemon stop" &&
(
cd split-index &&
git config core.splitIndex true &&
# force split-index in most cases
git config splitIndex.maxPercentChange 99 &&
git config core.fsmonitor true &&

# Create the following commit topology:
#
# * merge three
# |\
# | * three
# * | merge two
# |\|
# | * two
# * | one
# |/
# * 5a5efd7 initial

test_commit initial &&
test_commit two &&
test_commit three &&
git reset --hard initial &&
test_commit one &&
test_tick &&
git merge two &&
test_tick &&
git merge three &&

git rebase --force-rebase -r one
)
'

test_done
2 changes: 2 additions & 0 deletions unpack-trees.c
Original file line number Diff line number Diff line change
Expand Up @@ -1916,6 +1916,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
* create a new one.
*/
o->result.split_index = o->src_index->split_index;
if (o->src_index->cache_changed & SPLIT_INDEX_ORDERED)
o->result.cache_changed |= SPLIT_INDEX_ORDERED;
o->result.split_index->refcount++;
} else {
o->result.split_index = init_split_index(&o->result);
Expand Down