Skip to content

stash: reuse cached index entries in --patch temporary index#2306

Open
adamchainz wants to merge 1 commit into
git:masterfrom
adamchainz:aj/optimize-stash-patch
Open

stash: reuse cached index entries in --patch temporary index#2306
adamchainz wants to merge 1 commit into
git:masterfrom
adamchainz:aj/optimize-stash-patch

Conversation

@adamchainz
Copy link
Copy Markdown
Contributor

CC: Thomas Gummerer t.gummerer@gmail.com, Elijah Newren newren@gmail.com, Phillip Wood phillip.wood@dunelm.org.uk, Victoria Dye vdye@github.com

`git stash -p` prepares the interactive selection by creating a
temporary index at HEAD, switching `GIT_INDEX_FILE` to it, and then
running the `add -p` machinery.

That temporary index was created by running `git read-tree HEAD`.  The
resulting index had no useful cached stat data or fsmonitor-valid bits
from the real index.  When `run_add_p()` refreshed that temporary index
before showing the first prompt, it could end up lstat(2)-ing every
tracked file, even in a repository where `git diff` and `git restore -p`
can use fsmonitor to avoid that work.

Create the temporary index in-process instead.  Use `unpack_trees()` to
reset the real index contents to HEAD while writing the result to the
temporary index path.  For paths whose index entries already match HEAD,
`oneway_merge()` reuses the existing cache entries, preserving their
cached stat data and `CE_FSMONITOR_VALID` state.

This makes the refresh performed by `run_add_p()` behave like the one
used by `git restore -p`: unchanged paths can be skipped via fsmonitor
instead of being scanned again.

In a 206k file repository with `core.fsmonitor` enabled and a one-line
change in one file, time to first prompt dropped from 34.774 seconds to
0.659 seconds.

Signed-off-by: Adam Johnson <me@adamj.eu>
@adamchainz adamchainz force-pushed the aj/optimize-stash-patch branch from d8438ee to b228160 Compare May 19, 2026 12:15
@adamchainz
Copy link
Copy Markdown
Contributor Author

/preview

@gitgitgadget-git
Copy link
Copy Markdown

Preview email sent as pull.2306.git.git.1779194003106.gitgitgadget@gmail.com

@adamchainz
Copy link
Copy Markdown
Contributor Author

/submit

@gitgitgadget-git
Copy link
Copy Markdown

Submitted as pull.2306.git.git.1779194605735.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-git-2306/adamchainz/aj/optimize-stash-patch-v1

To fetch this version to local tag pr-git-2306/adamchainz/aj/optimize-stash-patch-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-git-2306/adamchainz/aj/optimize-stash-patch-v1

@gitgitgadget-git
Copy link
Copy Markdown

This patch series was integrated into seen via 15bd84f.

@gitgitgadget-git
Copy link
Copy Markdown

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

"Adam Johnson via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Adam Johnson <me@adamj.eu>
>
> `git stash -p` prepares the interactive selection by creating a
> temporary index at HEAD, switching `GIT_INDEX_FILE` to it, and then
> running the `add -p` machinery.
>
> That temporary index was created by running `git read-tree HEAD`.  The
> resulting index had no useful cached stat data or fsmonitor-valid bits
> from the real index.  When `run_add_p()` refreshed that temporary index
> before showing the first prompt, it could end up lstat(2)-ing every
> tracked file, even in a repository where `git diff` and `git restore -p`
> can use fsmonitor to avoid that work.
>
> Create the temporary index in-process instead.  Use `unpack_trees()` to
> reset the real index contents to HEAD while writing the result to the
> temporary index path.  For paths whose index entries already match HEAD,
> `oneway_merge()` reuses the existing cache entries, preserving their
> cached stat data and `CE_FSMONITOR_VALID` state.

Clever.  As the fsmonitor_valid bit is in-core only, updating the
index in-process would be an obvious and probably the only sensible
way to preserve it.

I however have to wonder if simply replacing the external process
invocation with "git read-tree -m HEAD" (i.e., oneway merge) gives
a similar speed-up.

> This makes the refresh performed by `run_add_p()` behave like the one
> used by `git restore -p`: unchanged paths can be skipped via fsmonitor
> instead of being scanned again.
>
> In a 206k file repository with `core.fsmonitor` enabled and a one-line
> change in one file, time to first prompt dropped from 34.774 seconds to
> 0.659 seconds.

Interesting.

> diff --git a/t/t3904-stash-patch.sh b/t/t3904-stash-patch.sh
> index 90a4ff2c10..4b3241c8cd 100755
> --- a/t/t3904-stash-patch.sh
> +++ b/t/t3904-stash-patch.sh
> @@ -84,6 +84,24 @@ test_expect_success 'none of this moved HEAD' '
>  	verify_saved_head
>  '
>  
> +test_expect_success 'stash -p with unmodified tracked files present' '
> +	git reset --hard &&
> +	echo line1 >alpha &&
> +	echo line1 >beta &&
> +	git add alpha beta &&
> +	git commit -m "add alpha and beta" &&
> +	echo line2 >>alpha &&
> +	echo y | git stash -p &&
> +	echo line1 >expect &&
> +	test_cmp expect alpha &&
> +	test_cmp expect beta &&
> +	git stash pop &&
> +	printf "line1\nline2\n" >expect &&
> +	test_cmp expect alpha &&
> +	echo line1 >expect &&
> +	test_cmp expect beta
> +'

What I read from the proposed log message is that the change is
purely about performance and should not change any behaviour.  Why
do we need a new test in t/t3904?  I would not have surprised if we
saw a new test in t/perf/, though.

Thanks.

@gitgitgadget-git
Copy link
Copy Markdown

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

"Adam Johnson via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  2 files changed, 83 insertions(+), 6 deletions(-)
>
> diff --git a/builtin/stash.c b/builtin/stash.c
> index 32dbc97b47..48189cb9f7 100644
> --- a/builtin/stash.c
> +++ b/builtin/stash.c
> @@ -372,6 +372,57 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
>  	return 0;
>  }
>  
> +static int create_index_from_tree(const struct object_id *tree_id,
> +				  const char *index_path)
> +{
> +	int nr_trees = 1;
> +	int ret = 0;
> +	struct unpack_trees_options opts;
> +	struct tree_desc t[MAX_UNPACK_TREES];
> +	struct tree *tree;
> +	struct index_state dst_istate = INDEX_STATE_INIT(the_repository);
> +	struct lock_file lock_file = LOCK_INIT;
> +
> +	repo_read_index_preload(the_repository, NULL, 0);
> +	if (refresh_index(the_repository->index, REFRESH_QUIET, NULL, NULL, NULL))
> +		return -1;

Is this "non-zero return from refresh_index() leads to a failure"
intended?  The old "git read-tree HEAD" wouldn't have cared if the
original index were unmerged, for example, but with this update, we
will see an immediate failure.  There are other conditions that
refresh_index() flips its local variable has_errors on, which leads
to its non-zero return.

Since "git stash -p" is almost always invoked when the user has
unstaged modifications, I am not sure allowing refresh_index() to
notice and barf is what we want here.

@gitgitgadget-git
Copy link
Copy Markdown

This patch series was integrated into seen via f9c94c1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant