Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sparse index: integrate with git stash #428

Merged
merged 6 commits into from Sep 21, 2021
Merged

Sparse index: integrate with git stash #428

merged 6 commits into from Sep 21, 2021

Conversation

vdye
Copy link
Collaborator

@vdye vdye commented Sep 20, 2021

Changes

  • bugfix for the reset with wildcard pathspec test in t1092
    • pathspec was being resolved before the git command was executed, the * needed to be escaped
  • bugfix for t/perf/run (courtesy of @derrickstolee)
  • new performance & ensure_not_expanded tests for git stash
  • minor refactor to merge_recursive_generic to allow a custom merge function
  • using the change to merge_recursive_generic, swapped out the merge function used in git stash apply/git stash pop
  • the usual changes to repo settings to enable sparse index for cmd_stash

Performance

Measuring this via performance tests yielded some confusing results. When p2000 was run on just this branch, there was a clear improvement in sparse vs full index (5 timing trials per test):

Test                                             this tree      
----------------------------------------------------------------
2000.2: git stash && git stash pop (full-v3)     4.77(2.87+1.42)
2000.3: git stash && git stash pop (full-v4)     5.21(3.15+1.48)
2000.4: git stash && git stash pop (sparse-v3)   1.83(0.41+2.08)
2000.5: git stash && git stash pop (sparse-v4)   1.75(0.43+2.05)

However, when run comparing different commits (also with 5 trials each), there was basically no difference in the last column (testing the same as above):

Test                                             f28fc0188d495     18844c9f657              de87f1d2d702            52ac0489e8fb            HEAD                  
------------------------------------------------------------------------------------------------------------------------------------------------------------------
2000.2: git stash && git stash pop (full-v3)     4.93(3.21+1.42)   5.02(3.28+1.48) +1.8%    4.86(3.13+1.45) -1.4%   4.82(3.12+1.44) -2.2%   5.24(3.38+1.55) +6.3% 
2000.3: git stash && git stash pop (full-v4)     4.25(2.83+1.23)   4.91(3.14+1.45) +15.5%   4.47(2.97+1.27) +5.2%   4.61(2.97+1.37) +8.5%   4.85(3.10+1.44) +14.1%
2000.4: git stash && git stash pop (sparse-v3)   8.86(5.92+2.76)   8.92(6.08+2.69) +0.7%    8.58(5.76+2.63) -3.2%   8.25(5.59+2.49) -6.9%   5.08(2.87+2.83) -42.7%
2000.5: git stash && git stash pop (sparse-v4)   8.05(5.47+2.49)   9.40(6.30+2.85) +16.8%   8.72(5.87+2.68) +8.3%   8.38(5.66+2.52) +4.1%   5.00(2.82+2.86) -37.9%

The above tested commits are:

  • merge git reset with sparse index
  • merge git checkout-index with sparse index
  • merge git update-index with sparse index
  • merge git read-tree with sparse index
  • the tip of this branch

The t1092 tests confirm that the index should not be expanded (and a manual check of the trace2 results from running the commands in the p2000 test do not show the index being expanded), so I'm not sure what's going on there. In any event, the index does not appear to be expanded, so in practice that should probably match up with what happens in the first set of performance test results (I think).

@vdye vdye self-assigned this Sep 20, 2021
@derrickstolee
Copy link
Collaborator

Test                                             this tree      
----------------------------------------------------------------
2000.2: git stash && git stash pop (full-v3)     4.77(2.87+1.42)
2000.3: git stash && git stash pop (full-v4)     5.21(3.15+1.48)
2000.4: git stash && git stash pop (sparse-v3)   1.83(0.41+2.08)
2000.5: git stash && git stash pop (sparse-v4)   1.75(0.43+2.05)

However, when run comparing different commits (also with 5 trials each), there was basically no difference in the last column (testing the same as above):

Test                                             f28fc0188d495     18844c9f657              de87f1d2d702            52ac0489e8fb            HEAD                  
------------------------------------------------------------------------------------------------------------------------------------------------------------------
2000.2: git stash && git stash pop (full-v3)     4.93(3.21+1.42)   5.02(3.28+1.48) +1.8%    4.86(3.13+1.45) -1.4%   4.82(3.12+1.44) -2.2%   5.24(3.38+1.55) +6.3% 
2000.3: git stash && git stash pop (full-v4)     4.25(2.83+1.23)   4.91(3.14+1.45) +15.5%   4.47(2.97+1.27) +5.2%   4.61(2.97+1.37) +8.5%   4.85(3.10+1.44) +14.1%
2000.4: git stash && git stash pop (sparse-v3)   8.86(5.92+2.76)   8.92(6.08+2.69) +0.7%    8.58(5.76+2.63) -3.2%   8.25(5.59+2.49) -6.9%   5.08(2.87+2.83) -42.7%
2000.5: git stash && git stash pop (sparse-v4)   8.05(5.47+2.49)   9.40(6.30+2.85) +16.8%   8.72(5.87+2.68) +8.3%   8.38(5.66+2.52) +4.1%   5.00(2.82+2.86) -37.9%

I have reproduced this on my machine. Very strange. I'll look into it a bit more tomorrow.

Fix wildcard in pathspec reset test
@derrickstolee
Copy link
Collaborator

I have reproduced this on my machine. Very strange. I'll look into it a bit more tomorrow.

I re-ran the tests with GIT_TRACE2_PERF=<filename> and saw that the version string for each test was the same: v2.33.0.vfs.0.0. That's my installed version, not the built version. This means that somehow the wrong Git version is being executed. Some subcommands are running the built version, which explains why there is any change in the performance at all.

I'll start looking into why the ./run script is failing to execute the correct version.

@derrickstolee
Copy link
Collaborator

I have reproduced this on my machine. Very strange. I'll look into it a bit more tomorrow.

I re-ran the tests with GIT_TRACE2_PERF=<filename> and saw that the version string for each test was the same: v2.33.0.vfs.0.0. That's my installed version, not the built version. This means that somehow the wrong Git version is being executed. Some subcommands are running the built version, which explains why there is any change in the performance at all.

I'll start looking into why the ./run script is failing to execute the correct version.

False alarm: the build directories for ./run are not full Git repos, so they cannot determine the real version from commit history. This makes them default to the version name in GIT-VERSION-GEN. That leads to confusion as to why some subcommands had different versions. The mystery continues...

@derrickstolee
Copy link
Collaborator

I figured it out! It's a bug in run that incorrectly sets the GIT_TEST_INSTALLED directory. See 15e5dbf which I pushed as derrickstolee/git:sparse-index/stash.

derrickstolee and others added 5 commits September 21, 2021 10:02
The GIT_TEST_INSTALLED was moved from perf-lib.sh to run in df0f502
(perf-lib.sh: remove GIT_TEST_INSTALLED from perf-lib.sh, 2019-05-07)
and that included a change to how it inspected the existence of a
bin-wrappers directory. However, that included a typo that made the
match of bin-wrappers never work. Specifically, the assignment was

	mydir_abs_wrappers="$mydir_abs_wrappers/bin-wrappers"

which uses the same variable before it is initialized. By changing it to

	mydir_abs_wrappers="$mydir_abs/bin-wrappers"

We can correctly use the bin-wrappers directory.

This is critical to successfully computing performance of commands that
execute subcommands. The bin-wrappers ensure that the --exec-path is set
correctly.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Most of the functionality of `git stash` exists in already-sparse aware
subcommands, so enabling sparse index in `cmd_stash` allows the index to remain
sparse in most usages of `git stash`. Tests added to `t1092` reflect
commonly-used commands that are already sparse index-compatible.

Signed-off-by: Victoria Dye <vdye@github.com>
The `merge_recursive_generic` function is a wrapper that allows running a merge
on trees (rather than commits) by wrapping the trees as virtual commits. This
functionality is valuable not only for use with `merge_recursive`, but also
`merge_ort_recursive`. Having the merge function used by
`merge_recursive_generic` be an argument allows greater flexibility of usage.

Signed-off-by: Victoria Dye <vdye@github.com>
Because `merge_recursive` forces the index to always be expanded, switching to
`merge_ort_recursive` is needed to allow the index to remain sparse when
applying a stash.

Signed-off-by: Victoria Dye <vdye@github.com>
@derrickstolee
Copy link
Collaborator

derrickstolee commented Sep 21, 2021

Strangely, none of the other builtin changes seem to have had an affect, sadly:

Test                                             ms/vfs-2.33.0     f28fc0188d495            18844c9f657              de87f1d2d702             ms/vfs-2.33.0           sparse-index/stash    
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2000.2: git status (full-v3)                     0.20(0.18+0.04)   0.21(0.19+0.03) +5.0%    0.21(0.17+0.05) +5.0%    0.19(0.16+0.04) -5.0%    0.20(0.18+0.04) +0.0%   0.20(0.18+0.04) +0.0% 
2000.3: git status (full-v4)                     0.19(0.18+0.04)   0.20(0.18+0.04) +5.3%    0.19(0.16+0.05) +0.0%    0.21(0.17+0.06) +10.5%   0.19(0.18+0.04) +0.0%   0.21(0.18+0.05) +10.5%
2000.4: git status (sparse-v3)                   0.04(0.06+0.06)   0.04(0.06+0.05) +0.0%    0.05(0.05+0.08) +25.0%   0.05(0.06+0.06) +25.0%   0.04(0.06+0.06) +0.0%   0.05(0.05+0.07) +25.0%
2000.5: git status (sparse-v4)                   0.04(0.06+0.06)   0.04(0.04+0.06) +0.0%    0.04(0.07+0.06) +0.0%    0.05(0.06+0.06) +25.0%   0.04(0.06+0.06) +0.0%   0.05(0.08+0.04) +25.0%
2000.6: git stash && git stash pop (full-v3)     3.56(2.89+0.41)   3.64(2.93+0.44) +2.2%    3.59(2.86+0.43) +0.8%    3.61(2.89+0.46) +1.4%    3.56(2.89+0.41) +0.0%   3.57(2.87+0.44) +0.3% 
2000.7: git stash && git stash pop (full-v4)     3.33(2.83+0.41)   3.37(2.87+0.35) +1.2%    3.37(2.80+0.39) +1.2%    3.38(2.91+0.32) +1.5%    3.33(2.83+0.41) +0.0%   3.38(2.83+0.35) +1.5% 
2000.8: git stash && git stash pop (sparse-v3)   3.18(3.04+0.31)   3.67(3.51+0.33) +15.4%   3.20(3.06+0.31) +0.6%    3.23(3.13+0.27) +1.6%    3.18(3.04+0.31) +0.0%   0.33(0.32+0.21) -89.6%
2000.9: git stash && git stash pop (sparse-v4)   3.21(3.08+0.29)   3.66(3.47+0.35) +14.0%   3.19(3.09+0.28) -0.6%    3.25(3.09+0.32) +1.2%    3.21(3.08+0.29) +0.0%   0.31(0.35+0.21) -90.3%

But at least we are getting that huge performance benefit at the end.

Edit: That second ms/vfs-2.33.0 is a copy/paste error, so I duplicated that test.

@vdye
Copy link
Collaborator Author

vdye commented Sep 21, 2021

Strangely, none of the other builtin changes seem to have had an affect, sadly:

Test                                             ms/vfs-2.33.0     f28fc0188d495            18844c9f657              de87f1d2d702             ms/vfs-2.33.0           sparse-index/stash    
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2000.2: git status (full-v3)                     0.20(0.18+0.04)   0.21(0.19+0.03) +5.0%    0.21(0.17+0.05) +5.0%    0.19(0.16+0.04) -5.0%    0.20(0.18+0.04) +0.0%   0.20(0.18+0.04) +0.0% 
2000.3: git status (full-v4)                     0.19(0.18+0.04)   0.20(0.18+0.04) +5.3%    0.19(0.16+0.05) +0.0%    0.21(0.17+0.06) +10.5%   0.19(0.18+0.04) +0.0%   0.21(0.18+0.05) +10.5%
2000.4: git status (sparse-v3)                   0.04(0.06+0.06)   0.04(0.06+0.05) +0.0%    0.05(0.05+0.08) +25.0%   0.05(0.06+0.06) +25.0%   0.04(0.06+0.06) +0.0%   0.05(0.05+0.07) +25.0%
2000.5: git status (sparse-v4)                   0.04(0.06+0.06)   0.04(0.04+0.06) +0.0%    0.04(0.07+0.06) +0.0%    0.05(0.06+0.06) +25.0%   0.04(0.06+0.06) +0.0%   0.05(0.08+0.04) +25.0%
2000.6: git stash && git stash pop (full-v3)     3.56(2.89+0.41)   3.64(2.93+0.44) +2.2%    3.59(2.86+0.43) +0.8%    3.61(2.89+0.46) +1.4%    3.56(2.89+0.41) +0.0%   3.57(2.87+0.44) +0.3% 
2000.7: git stash && git stash pop (full-v4)     3.33(2.83+0.41)   3.37(2.87+0.35) +1.2%    3.37(2.80+0.39) +1.2%    3.38(2.91+0.32) +1.5%    3.33(2.83+0.41) +0.0%   3.38(2.83+0.35) +1.5% 
2000.8: git stash && git stash pop (sparse-v3)   3.18(3.04+0.31)   3.67(3.51+0.33) +15.4%   3.20(3.06+0.31) +0.6%    3.23(3.13+0.27) +1.6%    3.18(3.04+0.31) +0.0%   0.33(0.32+0.21) -89.6%
2000.9: git stash && git stash pop (sparse-v4)   3.21(3.08+0.29)   3.66(3.47+0.35) +14.0%   3.19(3.09+0.28) -0.6%    3.25(3.09+0.32) +1.2%    3.21(3.08+0.29) +0.0%   0.31(0.35+0.21) -90.3%

But at least we are getting that huge performance benefit at the end.

Edit: That second ms/vfs-2.33.0 is a copy/paste error, so I duplicated that test.

That is strange, and I confirmed it on my end as well. Digging into stash more, it looks like for the basic git stash && git stash pop case, only update-index and reset (which is the baseline case here) are used as subcommands. The other subcommands mostly come up in special cases (e.g., checkout-index in git stash -u when untracked files are present).

update-index was interesting though - it did see a significant speedup from full to sparse once it was integrated (~0.75s to ~0.04s), but stash didn't improve overall because it reads (and expands) the index immediately after the subcommand exits.

Copy link
Collaborator

@derrickstolee derrickstolee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like your commit ordering here. I left a note on the one about the function pointer, which is a great strategy for now. We will want to present that to the mailing list as something we are less sure about. They might want to do a fully-configurable approach, but they might also want to do a full replacement in one go. It's unclear.

Comment on lines 3811 to +3812
repo_hold_locked_index(opt->repo, &lock, LOCK_DIE_ON_ERROR);
clean = merge_recursive(opt, head_commit, next_commit, ca,
result);
clean = merge_fn(opt, head_commit, next_commit, ca, result);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is odd that this doesn't already allow the ORT strategy if configured to do so.

I went hunting for how this was configured in builtin/merge.c, but that seems like a very complicated procedure.

We could use something like the pull.twohead config here as a way to allow a user to modify the merge strategy. That gives us a safety valve in case the ORT strategy has a problem in one of these consumers.

The other approach could be to just use the ORT algorithm here no matter what. It's a big change, but it might improve things more rapidly.

After having all of these thoughts I think you've found a good middle ground: the function pointer presents a way to use ORT only when we really need it. This works especially well for our early adoption of sparse index in microsoft/git, but we should call special attention to this when upstreaming the work.

@@ -74,7 +74,7 @@ set_git_test_installed () {
mydir=$1

mydir_abs=$(cd $mydir && pwd)
mydir_abs_wrappers="$mydir_abs_wrappers/bin-wrappers"
mydir_abs_wrappers="$mydir_abs/bin-wrappers"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an FYI: I sent this upstream via gitgitgadget#1044, since it is so simple.

See https://lore.kernel.org/git/pull.1044.git.1632239172735.gitgitgadget@gmail.com/T/#u for the discussion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, should I remove it from this branch? Or will it disappear once vfs-2.33.0 is rebased?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will disappear during the rebase onto v2.34.0 (assuming it gets in quickly).

@vdye vdye merged commit c8c6379 into microsoft:vfs-2.33.0 Sep 21, 2021
@vdye vdye deleted the sparse-index/stash branch September 21, 2021 20:02
dscho pushed a commit that referenced this pull request Oct 30, 2021
Sparse index: integrate with `git stash`
derrickstolee pushed a commit that referenced this pull request Oct 30, 2021
Sparse index: integrate with `git stash`
derrickstolee pushed a commit that referenced this pull request Oct 31, 2021
Sparse index: integrate with `git stash`
derrickstolee pushed a commit that referenced this pull request Nov 4, 2021
Sparse index: integrate with `git stash`
derrickstolee pushed a commit that referenced this pull request Nov 4, 2021
Sparse index: integrate with `git stash`
derrickstolee pushed a commit that referenced this pull request Nov 10, 2021
Sparse index: integrate with `git stash`
derrickstolee pushed a commit that referenced this pull request Nov 15, 2021
Sparse index: integrate with `git stash`
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 12, 2022
Sparse index: integrate with `git stash`
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 19, 2022
Sparse index: integrate with `git stash`
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 20, 2022
Sparse index: integrate with `git stash`
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 20, 2022
Sparse index: integrate with `git stash`
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 20, 2022
Sparse index: integrate with `git stash`
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 20, 2022
Sparse index: integrate with `git stash`
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 20, 2022
Sparse index: integrate with `git stash`
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 20, 2022
Sparse index: integrate with `git stash`
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 25, 2022
Sparse index: integrate with `git stash`
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 25, 2022
Sparse index: integrate with `git stash`
dscho pushed a commit that referenced this pull request Feb 1, 2022
Sparse index: integrate with `git stash`
dscho pushed a commit that referenced this pull request Jun 17, 2022
Sparse index: integrate with `git stash`
dscho pushed a commit that referenced this pull request Jun 17, 2022
Sparse index: integrate with `git stash`
dscho pushed a commit that referenced this pull request Jun 17, 2022
Sparse index: integrate with `git stash`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants