Sparse index: integrate with the sparse-checkout builtin #421

derrickstolee

This integrates the sparse-checkout builtin with the sparse index. The tricky part here is that we need to partially expand the index when we are modifying the sparse-checkout definition.

Note that we modify the pattern list in a careful way: we create a struct pattern_list in-memory in builtin/sparse-checkout.c then apply those patterns to the index before writing the patterns to the sparse-checkout file. The update_sparsity() method does the work to assign the SKIP_WORKTREE bit appropriately, but this doesn't work if the files that are within the new sparse-checkout cone are still hidden behind a sparse directory.

The new expand_to_pattern_list() method does the hard work of expanding the sparse directories that are now within the new patterns. This expands only as far as needed, possibly creating new sparse directory entries.

This method does not contract existing files to sparse directories, largely because of the check for ignored files that runs as we delete those directories. The clean_tracked_sparse_directories() method is called after update_sparsity(), but we need to read the A/B/.gitignore file (or lack thereof) before we can delete A/B/. If we convert to sparse too quickly, then we lose this information and cause a full expansion.

Most of the correctness is handled by existing tests in t1092, but I add checks for ensure_not_expanded in some hopefully interesting cases.

As for performance, git sparse-checkout set can be slow if it needs to move a lot of files. However, no-op git sparse-checkout set (i.e. set the sparse-checkout cone to only include files at root, and do this on repeat) has these performance results on Linux in a monorepo with 2+ million files at HEAD:

Benchmark #1: baseline
  Time (mean ± σ):     10.465 s ±  0.018 s    [User: 9.885 s, System: 0.573 s]
  Range (min … max):   10.450 s … 10.497 s    5 runs
 
Benchmark #2: new code
  Time (mean ± σ):      68.9 ms ±   2.9 ms    [User: 45.8 ms, System: 17.1 ms]
  Range (min … max):    63.4 ms …  74.0 ms    41 runs
 
Summary
  'new code' ran
  151.89 ± 6.30 times faster than 'baseline'


@vdye left a comment

My comments pretty much amount to documentation change requests, but I did have some questions about the implications of the updated sparse_index_mode on other commands.

In order to allow modifying the sparse-checkout cone using a sparse
index without expanding to a full one, we need to be able to replace
sparse directory entries with their contained files and subdirectories
so other code paths can discover those cache entries and write the
corresponding files to disk before committing the index.

We already have logic in ensure_full_index() that expands the index
entries, so we will use that as our base. Create
expand_to_pattern_list() which takes a pattern list, but for now mostly
ignores it. The current implementation is only correct when the pattern
list is NULL as that does the same as ensure_full_index(). In fact,
ensure_full_index() is converted to a shim over
expand_to_pattern_list().

A future update will actually implement expand_to_pattern_list() to its
full capabilities. For now, it is created and documented. We also start
using doc-style comments in sparse-index.h.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
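
The shim relationship described in this commit message can be pictured with a minimal sketch. The `toy_` types and functions below are hypothetical stand-ins invented for illustration, not Git's real `struct index_state` or its real function bodies; the only point shown is that the "ensure full" entry point forwards to the pattern-aware one with a NULL pattern list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not Git's real types. */
struct toy_pattern_list;        /* opaque: contents are not needed here */

struct toy_shim_index {
	int has_sparse_dirs;    /* nonzero while sparse directory entries remain */
	int expand_calls;       /* counts expansions, for illustration only */
};

/*
 * Assumed behavior for this stage of the series: the pattern list is
 * accepted but ignored, so every call behaves like a full expansion.
 */
static void toy_expand_to_pattern_list(struct toy_shim_index *istate,
				       struct toy_pattern_list *pl)
{
	assert(istate);
	(void)pl;               /* not consulted yet */
	if (!istate->has_sparse_dirs)
		return;         /* already full: nothing to do */
	istate->has_sparse_dirs = 0;
	istate->expand_calls++;
}

/* The "ensure full" entry point becomes a thin shim over the call above. */
static void toy_ensure_full_index(struct toy_shim_index *istate)
{
	toy_expand_to_pattern_list(istate, NULL);
}
```

Calling `toy_ensure_full_index()` twice on the same index expands once and then returns early, which is the behavior a caller would expect from an "ensure" function.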
When matching against a generic pattern list, the 'basename' is
important for some patterns. However, it and the 'dtype' parameter are
irrelevant for cone mode sparse-checkout patterns. If we know that we
are working with cone mode patterns from the start, then we can speed up
the pattern check slightly by not computing the 'basename'.

In many existing consumers, the 'basename' is already known from
context, but some new consumers compute it on demand. A future
change will add more calls that do not have the 'basename' from context
and would need to compute it for many cache entries in a tight loop.
Avoid this problem by creating the new
path_matches_cone_mode_pattern_list() method.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
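
To illustrate why cone mode matching needs only the full path, here is a minimal sketch. It assumes a simplified cone made of recursively included directories and is not the body of path_matches_cone_mode_pattern_list(); `toy_cone` and `toy_path_in_cone()` are hypothetical names, and the function is meant for file paths only.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical, simplified cone: a list of recursively included
 * directories, each written with a trailing slash, e.g. "A/B/C/".
 */
struct toy_cone {
	const char **recursive_dirs;
	size_t nr;
};

/*
 * Returns 1 if the file at 'path' is inside the cone, 0 otherwise.
 * Only the full path is inspected; no basename or dtype is computed.
 */
static int toy_path_in_cone(const struct toy_cone *cone, const char *path)
{
	const char *slash = strrchr(path, '/');
	size_t dirlen, i;

	assert(cone);
	/* Files at the root are always part of a cone-mode checkout. */
	if (!slash)
		return 1;
	dirlen = (size_t)(slash - path) + 1;    /* includes the final '/' */

	for (i = 0; i < cone->nr; i++) {
		const char *dir = cone->recursive_dirs[i];
		size_t len = strlen(dir);

		/* Recursive match: the path lives somewhere below 'dir'. */
		if (!strncmp(path, dir, len))
			return 1;
		/* Parent match: the file sits directly in an ancestor of 'dir'. */
		if (dirlen <= len && !strncmp(path, dir, dirlen))
			return 1;
	}
	return 0;
}
```

With a cone containing only `A/B/C/`, this sketch accepts `A/B/C/file.c` and `A/file.c` but rejects `A/B/D/file.c`, using prefix comparisons alone.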
A future change will present a temporary, in-memory mode where the index
can both contain sparse directory entries but also not be completely
collapsed to the smallest possible sparse directories. This will be
necessary for modifying the sparse-checkout definition while using a
sparse index.

For now, convert the single-bit member 'sparse_index' in 'struct
index_state' to be an 'enum sparse_index_mode' with three modes:

* COMPLETELY_FULL (0): No sparse directories exist.

* COMPLETELY_SPARSE (1): Sparse directories may exist. Files outside the
  sparse-checkout cone are reduced to sparse directory entries whenever
  possible.

* PARTIALLY_SPARSE (2): Sparse directories may exist. Some file entries
  outside the sparse-checkout cone may exist. Running
  convert_to_sparse() may further reduce those files to sparse directory
  entries.

The main reason to store this extra information is to allow
convert_to_sparse() to short-circuit when the index is already in
COMPLETELY_SPARSE mode but to actually do the necessary work when in
PARTIALLY_SPARSE mode.

The PARTIALLY_SPARSE mode will be used in an upcoming change.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
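
The three modes and the short-circuit they enable can be sketched as follows. The enum names and values come from the commit message above; `toy_mode_index` and `toy_convert_to_sparse()` are hypothetical simplifications, and the real convert_to_sparse() may check further conditions that are omitted here.

```c
#include <assert.h>

/* Mode names and values as listed in the commit message. */
enum sparse_index_mode {
	COMPLETELY_FULL = 0,    /* no sparse directories exist */
	COMPLETELY_SPARSE = 1,  /* collapsed as far as possible */
	PARTIALLY_SPARSE = 2    /* sparse dirs may exist, more collapsing possible */
};

/* Hypothetical stand-in for the relevant part of 'struct index_state'. */
struct toy_mode_index {
	enum sparse_index_mode sparse_index;
	int collapse_passes;    /* counts real work, for illustration only */
};

/* Returns 1 if a collapse pass ran, 0 if the call short-circuited. */
static int toy_convert_to_sparse(struct toy_mode_index *istate)
{
	assert(istate);
	/* Already minimal: skip the expensive walk entirely. */
	if (istate->sparse_index == COMPLETELY_SPARSE)
		return 0;
	/* FULL or PARTIALLY_SPARSE: real code would collapse entries here. */
	istate->collapse_passes++;
	istate->sparse_index = COMPLETELY_SPARSE;
	return 1;
}
```

A single bit could not tell the PARTIALLY_SPARSE case apart from COMPLETELY_SPARSE, so the early return would either skip necessary work or never trigger.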
Given a 'struct cache_tree', it may be beneficial to navigate directly
to the node within it that corresponds to a given path name. Create
cache_tree_find_path() for this purpose. It returns NULL when no such
path exists.

The implementation is adapted from do_invalidate_path() which does a
similar search but also modifies the nodes it finds along the way.

This new method is not currently used, but will be in an upcoming
change.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
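
A read-only path lookup of this kind can be sketched on a toy tree. `toy_tree` and `toy_tree_find_path()` are hypothetical; Git's real 'struct cache_tree' stores its children differently, so this only shows the component-by-component descent and the NULL result for a missing path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tree node; one path component per node. */
struct toy_tree {
	const char *name;               /* component name, no slashes */
	struct toy_tree **children;
	size_t nr_children;
};

/*
 * Walk 'path' one component at a time without modifying any node.
 * Returns the matching node, or NULL when no such path exists.
 */
static struct toy_tree *toy_tree_find_path(struct toy_tree *root,
					   const char *path)
{
	struct toy_tree *node = root;

	while (node) {
		const char *slash;
		struct toy_tree *next = NULL;
		size_t len, i;

		while (*path == '/')    /* skip separators and trailing slash */
			path++;
		if (!*path)
			break;          /* consumed the whole path */

		slash = strchr(path, '/');
		len = slash ? (size_t)(slash - path) : strlen(path);

		for (i = 0; i < node->nr_children; i++) {
			const char *name = node->children[i]->name;
			if (strlen(name) == len && !strncmp(name, path, len)) {
				next = node->children[i];
				break;
			}
		}
		node = next;            /* NULL ends the loop: path not found */
		path += len;
	}
	return node;
}
```

An empty path returns the root itself, and a trailing slash as in `A/B/` is tolerated, which is convenient when the caller holds a sparse directory name.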
When the --no-sparse-index option is supplied, the sparse-checkout
builtin should explicitly ask to expand a sparse index to a full one.
This is currently done implicitly due to the command_requires_full_index
protection, but that will be removed in an upcoming change.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
The expand_to_pattern_list() method expands sparse directory entries
to their list of contained files when either the pattern list is NULL or
the directory is contained in the new pattern list's cone mode patterns.

It is possible that the pattern list has a recursive match with a
directory 'A/B/C/' and so an existing sparse directory 'A/B/' would need
to be expanded. If there exists a directory 'A/B/D/', then that
directory should not be expanded and instead we can create a sparse
directory.

To implement this, we plug into the add_path_to_index() callback for the
call to read_tree_at(). Since we now need access to both the index we
are writing and the pattern list we are comparing, create a 'struct
modify_index_context' to use as a data transfer object. It is important
that we use the given pattern list since we will use this pattern list
to change the sparse-checkout patterns and cannot use
istate->sparse_checkout_patterns.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
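
The 'A/B/C/' versus 'A/B/D/' example above can be written as a small decision function. This is a sketch under the assumption of a cone with a single recursive directory; `toy_action` and `toy_sparse_dir_action()` are hypothetical and stand in for the check made inside the add_path_to_index() callback.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_action {
	TOY_KEEP_SPARSE,        /* leave (or create) a sparse directory entry */
	TOY_EXPAND              /* replace the directory with its contents */
};

/*
 * Decide what to do with sparse directory 'dir' given one recursive
 * cone directory 'cone_dir'. Both end in '/'. A NULL 'cone_dir' models
 * a NULL pattern list, which means "expand everything".
 */
static enum toy_action toy_sparse_dir_action(const char *dir,
					     const char *cone_dir)
{
	size_t dlen, clen;

	assert(dir);
	if (!cone_dir)
		return TOY_EXPAND;

	dlen = strlen(dir);
	clen = strlen(cone_dir);

	/* 'dir' is the cone directory or lives below it. */
	if (dlen >= clen && !strncmp(dir, cone_dir, clen))
		return TOY_EXPAND;
	/* 'dir' is an ancestor of the cone directory: open it to reach the cone. */
	if (dlen < clen && !strncmp(dir, cone_dir, dlen))
		return TOY_EXPAND;
	/* A sibling or unrelated directory stays collapsed. */
	return TOY_KEEP_SPARSE;
}
```

For the cone `A/B/C/`, the sketch expands `A/B/` and `A/B/C/` and keeps `A/B/D/` sparse, matching the scenario in the commit message.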
To complete the implementation of expand_to_pattern_list(), we need to
detect when a sparse directory entry should remain sparse. This avoids a
full expansion, so we now need to use the PARTIALLY_SPARSE mode to
indicate this state.

There still are no callers to this method, but we will add one in the
next change.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
When modifying the sparse-checkout definition, the sparse-checkout
builtin calls update_sparsity() to modify the SKIP_WORKTREE bits of all
cache entries in the index. Before, we needed the index to be fully
expanded to ensure that it contained every file path matching the new
patterns.

Insert a call to reset_sparse_directories() that expands sparse
directories that are within the new pattern list, but only far enough
that every necessary file path now exists as a cache entry. The
remaining logic within update_sparsity() will modify the SKIP_WORKTREE
bits appropriately.

This allows us to disable command_requires_full_index within the
sparse-checkout builtin. Add tests that demonstrate that we are not
expanding to a full index unnecessarily.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
@derrickstolee derrickstolee merged commit f9255a5 into microsoft:vfs-2.33.0 Sep 7, 2021
derrickstolee added a commit that referenced this pull request Sep 13, 2021
One thing I forgot when talking about the sparse index is that we have a performance test: `t/perf/p2000-sparse-operations.sh`. This test wasn't helpful for commands like `git merge` that need a particular set of input, but it works for more read-only operations.

Here is a quick demonstration of how this performance test works so we could have a definitive measure of how your previous updates improved performance. 

To get these results, I ran the following command in `t/perf`:

```
 ./run 4bcd533 f9255a5 f28fc01 b713582 -- p2000-sparse-operations.sh
```

The short-shas correspond to the merge commits for these PRs:

* #410 
* #421 
* #417 
* #419

The test takes a copy of the Git repository and creates several copies within a nested directory hierarchy.


```
Test                                                   4bcd533       f9255a5              f28fc01              b713582           
-------------------------------------------------------------------------------------------------------------------------------------------------
2000.2: git status (full-v3)                           0.19(0.15+0.05)   0.19(0.16+0.05) +0.0%    0.20(0.18+0.03) +5.3%    0.19(0.17+0.04) +0.0% 
2000.3: git status (full-v4)                           0.20(0.18+0.04)   0.19(0.15+0.06) -5.0%    0.21(0.18+0.05) +5.0%    0.18(0.18+0.02) -10.0%
2000.4: git status (sparse-v3)                         0.04(0.04+0.04)   0.05(0.07+0.04) +25.0%   0.04(0.04+0.05) +0.0%    0.04(0.06+0.04) +0.0% 
2000.5: git status (sparse-v4)                         0.04(0.03+0.06)   0.04(0.05+0.05) +0.0%    0.05(0.05+0.04) +25.0%   0.05(0.06+0.04) +25.0%
2000.6: git add -A (full-v3)                           0.36(0.29+0.05)   0.38(0.28+0.07) +5.6%    0.36(0.31+0.05) +0.0%    0.37(0.31+0.05) +2.8% 
2000.7: git add -A (full-v4)                           0.34(0.27+0.06)   0.34(0.29+0.05) +0.0%    0.34(0.29+0.04) +0.0%    0.35(0.28+0.06) +2.9% 
2000.8: git add -A (sparse-v3)                         0.06(0.07+0.04)   0.06(0.05+0.06) +0.0%    0.06(0.09+0.01) +0.0%    0.06(0.08+0.03) +0.0% 
2000.9: git add -A (sparse-v4)                         0.05(0.05+0.04)   0.05(0.05+0.07) +0.0%    0.05(0.04+0.06) +0.0%    0.06(0.06+0.05) +20.0%
2000.10: git add . (full-v3)                           0.38(0.31+0.05)   0.37(0.29+0.06) -2.6%    0.37(0.30+0.07) -2.6%    0.37(0.29+0.06) -2.6% 
2000.11: git add . (full-v4)                           0.35(0.31+0.04)   0.35(0.29+0.07) +0.0%    0.35(0.29+0.05) +0.0%    0.34(0.29+0.06) -2.9% 
2000.12: git add . (sparse-v3)                         0.06(0.06+0.05)   0.06(0.05+0.06) +0.0%    0.06(0.07+0.05) +0.0%    0.06(0.09+0.03) +0.0% 
2000.13: git add . (sparse-v4)                         0.06(0.06+0.06)   0.06(0.07+0.04) +0.0%    0.05(0.06+0.05) -16.7%   0.05(0.05+0.07) -16.7%
2000.14: git commit -a -m A (full-v3)                  0.48(0.37+0.08)   0.45(0.36+0.08) -6.2%    0.45(0.35+0.09) -6.2%    0.44(0.36+0.07) -8.3% 
2000.15: git commit -a -m A (full-v4)                  0.45(0.40+0.06)   0.43(0.34+0.07) -4.4%    0.45(0.37+0.06) +0.0%    0.42(0.36+0.05) -6.7% 
2000.16: git commit -a -m A (sparse-v3)                0.05(0.05+0.06)   0.05(0.05+0.03) +0.0%    0.05(0.06+0.06) +0.0%    0.05(0.04+0.06) +0.0% 
2000.17: git commit -a -m A (sparse-v4)                0.05(0.06+0.03)   0.05(0.06+0.04) +0.0%    0.06(0.07+0.05) +20.0%   0.05(0.04+0.06) +0.0% 
2000.18: git checkout -f - (full-v3)                   0.55(0.43+0.08)   0.54(0.46+0.05) -1.8%    0.55(0.46+0.07) +0.0%    0.54(0.40+0.10) -1.8% 
2000.19: git checkout -f - (full-v4)                   0.55(0.41+0.09)   0.50(0.40+0.09) -9.1%    0.51(0.46+0.05) -7.3%    0.51(0.44+0.06) -7.3% 
2000.20: git checkout -f - (sparse-v3)                 0.06(0.09+0.03)   0.06(0.08+0.03) +0.0%    0.06(0.06+0.05) +0.0%    0.07(0.09+0.03) +16.7%
2000.21: git checkout -f - (sparse-v4)                 0.06(0.08+0.04)   0.05(0.07+0.05) -16.7%   0.05(0.07+0.04) -16.7%   0.06(0.09+0.03) +0.0% 
```

All of the above were already integrated.

```
2000.22: git reset (full-v3)                           0.41(0.32+0.06)   0.40(0.31+0.06) -2.4%    0.41(0.33+0.05) +0.0%    0.42(0.34+0.04) +2.4% 
2000.23: git reset (full-v4)                           0.37(0.32+0.05)   0.35(0.30+0.05) -5.4%    0.37(0.30+0.05) +0.0%    0.35(0.31+0.03) -5.4% 
2000.24: git reset (sparse-v3)                         0.68(0.65+0.05)   0.55(0.52+0.04) -19.1%   0.04(0.05+0.04) -94.1%   0.04(0.05+0.04) -94.1%
2000.25: git reset (sparse-v4)                         0.70(0.65+0.05)   0.54(0.50+0.06) -22.9%   0.04(0.07+0.01) -94.3%   0.03(0.05+0.05) -95.7%
2000.26: git reset --hard (full-v3)                    0.54(0.43+0.07)   0.53(0.43+0.06) -1.9%    0.55(0.46+0.05) +1.9%    0.55(0.44+0.06) +1.9% 
2000.27: git reset --hard (full-v4)                    0.50(0.45+0.03)   0.50(0.43+0.05) +0.0%    0.49(0.41+0.06) -2.0%    0.50(0.42+0.05) +0.0% 
2000.28: git reset --hard (sparse-v3)                  0.83(0.76+0.06)   0.68(0.62+0.05) -18.1%   0.07(0.05+0.02) -91.6%   0.07(0.05+0.02) -91.6%
2000.29: git reset --hard (sparse-v4)                  0.80(0.75+0.05)   0.69(0.62+0.06) -13.8%   0.07(0.04+0.02) -91.2%   0.07(0.04+0.03) -91.2%
```

As expected, `git reset [--hard]` improves with the sparse index integration, but remains constant across the full index case.

```
2000.30: git update-index --add --remove (full-v3)     0.03(0.01+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.02+0.01) +0.0%    0.03(0.01+0.01) +0.0% 
2000.31: git update-index --add --remove (full-v4)     0.03(0.02+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.03+0.00) +0.0%    0.03(0.02+0.01) +0.0% 
2000.32: git update-index --add --remove (sparse-v3)   0.57(0.54+0.02)   0.43(0.42+0.00) -24.6%   0.44(0.41+0.03) -22.8%   0.44(0.42+0.01) -22.8%
2000.33: git update-index --add --remove (sparse-v4)   0.56(0.52+0.04)   0.43(0.42+0.01) -23.2%   0.44(0.42+0.02) -21.4%   0.42(0.41+0.01) -25.0%
```

These do not change significantly because #423 is not merged.

```
2000.34: git diff (full-v3)                            0.07(0.05+0.03)   0.06(0.05+0.03) -14.3%   0.07(0.05+0.03) +0.0%    0.06(0.05+0.03) -14.3%
2000.35: git diff (full-v4)                            0.06(0.05+0.03)   0.06(0.05+0.02) +0.0%    0.06(0.05+0.02) +0.0%    0.06(0.06+0.02) +0.0% 
2000.36: git diff (sparse-v3)                          0.25(0.23+0.03)   0.17(0.17+0.02) -32.0%   0.18(0.18+0.02) -28.0%   0.01(0.03+0.03) -96.0%
2000.37: git diff (sparse-v4)                          0.25(0.22+0.05)   0.16(0.16+0.01) -36.0%   0.18(0.15+0.04) -28.0%   0.01(0.04+0.02) -96.0%
2000.38: git diff --staged (full-v3)                   0.03(0.01+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.02+0.01) +0.0%    0.03(0.02+0.00) +0.0% 
2000.39: git diff --staged (full-v4)                   0.04(0.03+0.01)   0.03(0.02+0.01) -25.0%   0.03(0.03+0.00) -25.0%   0.03(0.03+0.00) -25.0%
2000.40: git diff --staged (sparse-v3)                 0.21(0.19+0.01)   0.15(0.13+0.01) -28.6%   0.15(0.14+0.01) -28.6%   0.01(0.01+0.00) -95.2%
2000.41: git diff --staged (sparse-v4)                 0.22(0.21+0.01)   0.14(0.11+0.03) -36.4%   0.15(0.13+0.02) -31.8%   0.01(0.01+0.00) -95.5%
```

The `git diff` improvements are measurable.

```
2000.42: git sparse-checkout reapply (full-v3)         0.63(0.54+0.05)   0.56(0.48+0.04) -11.1%   0.57(0.48+0.03) -9.5%    0.59(0.48+0.05) -6.3% 
2000.43: git sparse-checkout reapply (full-v4)         0.60(0.54+0.02)   0.51(0.46+0.03) -15.0%   0.54(0.48+0.02) -10.0%   0.50(0.44+0.04) -16.7%
2000.44: git sparse-checkout reapply (sparse-v3)       0.91(0.86+0.05)   0.05(0.05+0.00) -94.5%   0.06(0.05+0.01) -93.4%   0.06(0.06+0.00) -93.4%
2000.45: git sparse-checkout reapply (sparse-v4)       0.92(0.88+0.04)   0.05(0.05+0.00) -94.6%   0.05(0.05+0.01) -94.6%   0.05(0.04+0.01) -94.6%
```

Finally, the `git sparse-checkout` measurements are also present.

This test script is particularly valuable when contributing changes upstream. It can be good to start by adding the lines to the performance test in an early commit, then demonstrating the performance change by copying the necessary lines from the output table into your commit message.
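
When copying lines from the table, it helps to know how to read the percentage columns. The values shown appear consistent with a relative change in elapsed time against the first column (`4bcd533`). A minimal sketch of that arithmetic follows; `toy_percent_change()` is a hypothetical helper, not part of the perf framework, and the table's own figures may have been computed from unrounded timings.

```c
#include <assert.h>

/*
 * Relative change of 'now' against 'base', in percent.
 * Negative values mean the newer revision is faster.
 */
static double toy_percent_change(double base, double now)
{
	assert(base > 0.0);
	return (now - base) / base * 100.0;
}
```

For example, `git sparse-checkout reapply (sparse-v3)` goes from 0.91 s to 0.05 s, and (0.05 - 0.91) / 0.91 is roughly -94.5%, matching the table entry.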
dscho pushed a commit that referenced this pull request Oct 30, 2021
dscho pushed a commit that referenced this pull request Oct 30, 2021
derrickstolee added a commit that referenced this pull request Oct 30, 2021
derrickstolee added a commit that referenced this pull request Oct 30, 2021
```

All of the commands above were already integrated with the sparse index.

```
2000.22: git reset (full-v3)                           0.41(0.32+0.06)   0.40(0.31+0.06) -2.4%    0.41(0.33+0.05) +0.0%    0.42(0.34+0.04) +2.4% 
2000.23: git reset (full-v4)                           0.37(0.32+0.05)   0.35(0.30+0.05) -5.4%    0.37(0.30+0.05) +0.0%    0.35(0.31+0.03) -5.4% 
2000.24: git reset (sparse-v3)                         0.68(0.65+0.05)   0.55(0.52+0.04) -19.1%   0.04(0.05+0.04) -94.1%   0.04(0.05+0.04) -94.1%
2000.25: git reset (sparse-v4)                         0.70(0.65+0.05)   0.54(0.50+0.06) -22.9%   0.04(0.07+0.01) -94.3%   0.03(0.05+0.05) -95.7%
2000.26: git reset --hard (full-v3)                    0.54(0.43+0.07)   0.53(0.43+0.06) -1.9%    0.55(0.46+0.05) +1.9%    0.55(0.44+0.06) +1.9% 
2000.27: git reset --hard (full-v4)                    0.50(0.45+0.03)   0.50(0.43+0.05) +0.0%    0.49(0.41+0.06) -2.0%    0.50(0.42+0.05) +0.0% 
2000.28: git reset --hard (sparse-v3)                  0.83(0.76+0.06)   0.68(0.62+0.05) -18.1%   0.07(0.05+0.02) -91.6%   0.07(0.05+0.02) -91.6%
2000.29: git reset --hard (sparse-v4)                  0.80(0.75+0.05)   0.69(0.62+0.06) -13.8%   0.07(0.04+0.02) -91.2%   0.07(0.04+0.03) -91.2%
```

As expected, `git reset [--hard]` improves with the sparse index integration (for example, `git reset (sparse-v3)` drops from 0.68s to 0.04s, or -94.1%), but remains constant in the full index case.
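To make the percentages in these tables easier to check, here is a minimal sketch of how one cell can be read and compared. It assumes each cell follows the `elapsed(user+sys)` layout in seconds and that the percentage is the change in elapsed time relative to the first column; the helper names `parse_cell` and `percent_change` are hypothetical and not part of the perf suite.

```python
import re

# One timing cell, e.g. "0.68(0.65+0.05)": assumed to be elapsed(user+sys).
CELL = re.compile(r"(\d+\.\d+)\((\d+\.\d+)\+(\d+\.\d+)\)")


def parse_cell(cell):
    """Split a timing cell into (elapsed, user, system) floats, in seconds."""
    match = CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized timing cell: {cell!r}")
    elapsed, user, system = (float(group) for group in match.groups())
    return elapsed, user, system


def percent_change(baseline_cell, new_cell):
    """Change in elapsed time versus the baseline column, in percent."""
    baseline, _, _ = parse_cell(baseline_cell)
    new, _, _ = parse_cell(new_cell)
    return round((new - baseline) / baseline * 100, 1)


# `git reset (sparse-v3)`: 4bcd533 versus f28fc01
print(percent_change("0.68(0.65+0.05)", "0.04(0.05+0.04)"))
```

With the `git reset (sparse-v3)` row, this reproduces the -94.1% shown in the table.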

```
2000.30: git update-index --add --remove (full-v3)     0.03(0.01+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.02+0.01) +0.0%    0.03(0.01+0.01) +0.0% 
2000.31: git update-index --add --remove (full-v4)     0.03(0.02+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.03+0.00) +0.0%    0.03(0.02+0.01) +0.0% 
2000.32: git update-index --add --remove (sparse-v3)   0.57(0.54+0.02)   0.43(0.42+0.00) -24.6%   0.44(0.41+0.03) -22.8%   0.44(0.42+0.01) -22.8%
2000.33: git update-index --add --remove (sparse-v4)   0.56(0.52+0.04)   0.43(0.42+0.01) -23.2%   0.44(0.42+0.02) -21.4%   0.42(0.41+0.01) -25.0%
```

These do not change significantly because #423 is not merged.

```
2000.34: git diff (full-v3)                            0.07(0.05+0.03)   0.06(0.05+0.03) -14.3%   0.07(0.05+0.03) +0.0%    0.06(0.05+0.03) -14.3%
2000.35: git diff (full-v4)                            0.06(0.05+0.03)   0.06(0.05+0.02) +0.0%    0.06(0.05+0.02) +0.0%    0.06(0.06+0.02) +0.0% 
2000.36: git diff (sparse-v3)                          0.25(0.23+0.03)   0.17(0.17+0.02) -32.0%   0.18(0.18+0.02) -28.0%   0.01(0.03+0.03) -96.0%
2000.37: git diff (sparse-v4)                          0.25(0.22+0.05)   0.16(0.16+0.01) -36.0%   0.18(0.15+0.04) -28.0%   0.01(0.04+0.02) -96.0%
2000.38: git diff --staged (full-v3)                   0.03(0.01+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.02+0.01) +0.0%    0.03(0.02+0.00) +0.0% 
2000.39: git diff --staged (full-v4)                   0.04(0.03+0.01)   0.03(0.02+0.01) -25.0%   0.03(0.03+0.00) -25.0%   0.03(0.03+0.00) -25.0%
2000.40: git diff --staged (sparse-v3)                 0.21(0.19+0.01)   0.15(0.13+0.01) -28.6%   0.15(0.14+0.01) -28.6%   0.01(0.01+0.00) -95.2%
2000.41: git diff --staged (sparse-v4)                 0.22(0.21+0.01)   0.14(0.11+0.03) -36.4%   0.15(0.13+0.02) -31.8%   0.01(0.01+0.00) -95.5%
```

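The `git diff` improvements are measurable: at b713582, the sparse index cases drop from 0.25s to 0.01s (-96.0%) for `git diff` and from 0.21s or 0.22s to 0.01s (about -95%) for `git diff --staged`.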

```
2000.42: git sparse-checkout reapply (full-v3)         0.63(0.54+0.05)   0.56(0.48+0.04) -11.1%   0.57(0.48+0.03) -9.5%    0.59(0.48+0.05) -6.3% 
2000.43: git sparse-checkout reapply (full-v4)         0.60(0.54+0.02)   0.51(0.46+0.03) -15.0%   0.54(0.48+0.02) -10.0%   0.50(0.44+0.04) -16.7%
2000.44: git sparse-checkout reapply (sparse-v3)       0.91(0.86+0.05)   0.05(0.05+0.00) -94.5%   0.06(0.05+0.01) -93.4%   0.06(0.06+0.00) -93.4%
2000.45: git sparse-checkout reapply (sparse-v4)       0.92(0.88+0.04)   0.05(0.05+0.00) -94.6%   0.05(0.05+0.01) -94.6%   0.05(0.04+0.01) -94.6%
```

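Finally, the `git sparse-checkout` measurements are also present: `git sparse-checkout reapply` in the sparse index cases drops from about 0.9s to 0.05s (about -94%) starting with f9255a5.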

This test script is particularly valuable when contributing changes upstream. It can be good to start by adding the lines to the performance test in an early commit, then demonstrating the performance change by copying the necessary lines from the output table into your commit message.
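As a rough illustration of "copying the necessary lines", the sketch below pulls the header and the rows for one command out of a saved copy of the output table. The function name `rows_for_command` is hypothetical, and the sketch assumes the table layout shown above (a title line, a dashed separator, then one row per test).

```python
def rows_for_command(table_text, command):
    """Return the two header lines plus every row for exactly `command`."""
    lines = table_text.splitlines()
    header = lines[:2]  # column titles and the dashed separator
    # Test names look like "2000.24: git reset (sparse-v3)", so matching on
    # ": <command> (" keeps `git reset` apart from `git reset --hard`.
    rows = [line for line in lines[2:] if f": {command} (" in line]
    return header + rows


sample = """Test                                  4bcd533           f28fc01
-------------------------------------------------------------------------------
2000.24: git reset (sparse-v3)        0.68(0.65+0.05)   0.04(0.05+0.04) -94.1%
2000.28: git reset --hard (sparse-v3) 0.83(0.76+0.06)   0.07(0.05+0.02) -91.6%
2000.36: git diff (sparse-v3)         0.25(0.23+0.03)   0.18(0.18+0.02) -28.0%"""

print("\n".join(rows_for_command(sample, "git reset")))
```

The `sample` table here is abbreviated from the results above to two commit columns purely to keep the example short.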
derrickstolee added a commit that referenced this pull request Oct 31, 2021
…ckout` builtin

This integrates the `sparse-checkout` builtin with the sparse index. The tricky part here is that we need to partially expand the index when we are modifying the sparse-checkout definition.

Note that we modify the pattern list in a careful way: we create a `struct pattern_list` in memory in `builtin/sparse-checkout.c`, then apply those patterns to the index before writing the patterns to the sparse-checkout file. The `update_sparsity()` method does the work to assign the `SKIP_WORKTREE` bit appropriately, but this doesn't work if the files that are within the new sparse-checkout cone are still hidden behind a sparse directory.

The new `expand_to_pattern_list()` method does the hard work of expanding the sparse directories that are now within the new patterns. This expands only as far as needed, possibly creating new sparse directory entries.

This method does not contract existing files to sparse directories, largely because of the check for ignored files as we delete those directories. The `clean_tracked_sparse_directories()` method is called after `update_sparsity()`, but we need to read the `A/B/.gitignore` file (or lack thereof) before we can delete `A/B/`. If we convert to sparse too quickly, then we lose this information and cause a full expansion.
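The partial expansion described above can be pictured with a toy model. This is only a sketch under simplifying assumptions, not Git's implementation: the index is a list of path strings where a trailing `/` marks a sparse directory entry, the tree is a nested dict (`None` for a file), the cone is a set of recursively included directories, and every function name is hypothetical. Like the method described above, it only expands and never contracts.

```python
def lookup(tree, path):
    """Walk the nested dict `tree` down to the directory at `path`."""
    node = tree
    for part in path.split("/"):
        node = node[part]
    return node


def needs_expansion(dir_path, cone):
    """True if `dir_path` is inside a cone directory or is a parent of one."""
    for cone_dir in cone:
        inside = dir_path == cone_dir or dir_path.startswith(cone_dir + "/")
        parent = cone_dir.startswith(dir_path + "/")
        if inside or parent:
            return True
    return False


def expand_dir(dir_path, subtree, cone):
    """Yield entries for `dir_path`, keeping out-of-cone subdirectories sparse."""
    for name, child in sorted(subtree.items()):
        path = f"{dir_path}/{name}"
        if child is None:
            yield path  # a file directly inside an expanded directory
        elif needs_expansion(path, cone):
            yield from expand_dir(path, child, cone)
        else:
            yield path + "/"  # a new sparse directory entry


def expand_to_cone(index, tree, cone):
    """Expand only the sparse directory entries that the new cone touches."""
    result = []
    for entry in index:
        if entry.endswith("/") and needs_expansion(entry[:-1], cone):
            dir_path = entry[:-1]
            result.extend(expand_dir(dir_path, lookup(tree, dir_path), cone))
        else:
            result.append(entry)  # files and untouched sparse dirs stay as-is
    return result


tree = {
    "a": None,
    "A": {"x": None, "B": {"y": None, "D": {"z": None}}, "C": {"w": None}},
    "E": {"v": None},
}
index = ["A/", "E/", "a"]
print(expand_to_cone(index, tree, {"A/B"}))
```

In this example the sparse entry `A/` is expanded because the new cone contains `A/B`, which produces a new sparse entry `A/C/`, while `E/` is left untouched.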

Most of the correctness is handled by existing tests in `t1092`, but I add checks for `ensure_not_expanded` in some hopefully interesting cases.

As for performance, `git sparse-checkout set` can be slow if it needs to move a lot of files. However, no-op `git sparse-checkout set` (i.e. set the sparse-checkout cone to only include files at root, and do this on repeat) has these performance results on Linux in a monorepo with 2+ million files at `HEAD`:

```
Benchmark #1: baseline
  Time (mean ± σ):     10.465 s ±  0.018 s    [User: 9.885 s, System: 0.573 s]
  Range (min … max):   10.450 s … 10.497 s    5 runs
 
Benchmark #2: new code
  Time (mean ± σ):      68.9 ms ±   2.9 ms    [User: 45.8 ms, System: 17.1 ms]
  Range (min … max):    63.4 ms …  74.0 ms    41 runs
 
Summary
  'new code' ran
  151.89 ± 6.30 times faster than 'baseline'
```
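As a quick sanity check on the summary line, the speedup is the ratio of the two reported means. The sketch below only recomputes that ratio from the rounded values printed above; the function name `speedup` is hypothetical, and the uncertainty term is not reproduced here.

```python
def speedup(baseline_seconds, new_seconds):
    """How many times faster the new code is, based on mean wall-clock time."""
    return baseline_seconds / new_seconds


# Means reported above: 10.465 s for the baseline, 68.9 ms for the new code.
ratio = speedup(10.465, 68.9 / 1000)
print(f"{ratio:.2f} times faster")
```

This agrees with the reported factor of roughly 151.89.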
derrickstolee added a commit that referenced this pull request Nov 4, 2021
One thing I forgot when talking about the sparse index is that we have a performance test: `t/perf/p2000-sparse-operations.sh`. This test wasn't helpful for commands like `git merge` that need a particular set of input, but it works for more read-only operations.

Here is a quick demonstration of how this performance test works, so we can have a definitive measure of how your previous updates improved performance.

To get these results, I ran the following command in `t/perf`:

```
 ./run 4bcd533 f9255a5 f28fc01 b713582 -- p2000-sparse-operations.sh
```

The short SHAs correspond to the merge commits for these PRs:

* #410 
* #421 
* #417 
* #419

The test takes a copy of the Git repository and creates several copies within a nested directory hierarchy.


```
Test                                                   4bcd533       f9255a5              f28fc01              b713582           
-------------------------------------------------------------------------------------------------------------------------------------------------
2000.2: git status (full-v3)                           0.19(0.15+0.05)   0.19(0.16+0.05) +0.0%    0.20(0.18+0.03) +5.3%    0.19(0.17+0.04) +0.0% 
2000.3: git status (full-v4)                           0.20(0.18+0.04)   0.19(0.15+0.06) -5.0%    0.21(0.18+0.05) +5.0%    0.18(0.18+0.02) -10.0%
2000.4: git status (sparse-v3)                         0.04(0.04+0.04)   0.05(0.07+0.04) +25.0%   0.04(0.04+0.05) +0.0%    0.04(0.06+0.04) +0.0% 
2000.5: git status (sparse-v4)                         0.04(0.03+0.06)   0.04(0.05+0.05) +0.0%    0.05(0.05+0.04) +25.0%   0.05(0.06+0.04) +25.0%
2000.6: git add -A (full-v3)                           0.36(0.29+0.05)   0.38(0.28+0.07) +5.6%    0.36(0.31+0.05) +0.0%    0.37(0.31+0.05) +2.8% 
2000.7: git add -A (full-v4)                           0.34(0.27+0.06)   0.34(0.29+0.05) +0.0%    0.34(0.29+0.04) +0.0%    0.35(0.28+0.06) +2.9% 
2000.8: git add -A (sparse-v3)                         0.06(0.07+0.04)   0.06(0.05+0.06) +0.0%    0.06(0.09+0.01) +0.0%    0.06(0.08+0.03) +0.0% 
2000.9: git add -A (sparse-v4)                         0.05(0.05+0.04)   0.05(0.05+0.07) +0.0%    0.05(0.04+0.06) +0.0%    0.06(0.06+0.05) +20.0%
2000.10: git add . (full-v3)                           0.38(0.31+0.05)   0.37(0.29+0.06) -2.6%    0.37(0.30+0.07) -2.6%    0.37(0.29+0.06) -2.6% 
2000.11: git add . (full-v4)                           0.35(0.31+0.04)   0.35(0.29+0.07) +0.0%    0.35(0.29+0.05) +0.0%    0.34(0.29+0.06) -2.9% 
2000.12: git add . (sparse-v3)                         0.06(0.06+0.05)   0.06(0.05+0.06) +0.0%    0.06(0.07+0.05) +0.0%    0.06(0.09+0.03) +0.0% 
2000.13: git add . (sparse-v4)                         0.06(0.06+0.06)   0.06(0.07+0.04) +0.0%    0.05(0.06+0.05) -16.7%   0.05(0.05+0.07) -16.7%
2000.14: git commit -a -m A (full-v3)                  0.48(0.37+0.08)   0.45(0.36+0.08) -6.2%    0.45(0.35+0.09) -6.2%    0.44(0.36+0.07) -8.3% 
2000.15: git commit -a -m A (full-v4)                  0.45(0.40+0.06)   0.43(0.34+0.07) -4.4%    0.45(0.37+0.06) +0.0%    0.42(0.36+0.05) -6.7% 
2000.16: git commit -a -m A (sparse-v3)                0.05(0.05+0.06)   0.05(0.05+0.03) +0.0%    0.05(0.06+0.06) +0.0%    0.05(0.04+0.06) +0.0% 
2000.17: git commit -a -m A (sparse-v4)                0.05(0.06+0.03)   0.05(0.06+0.04) +0.0%    0.06(0.07+0.05) +20.0%   0.05(0.04+0.06) +0.0% 
2000.18: git checkout -f - (full-v3)                   0.55(0.43+0.08)   0.54(0.46+0.05) -1.8%    0.55(0.46+0.07) +0.0%    0.54(0.40+0.10) -1.8% 
2000.19: git checkout -f - (full-v4)                   0.55(0.41+0.09)   0.50(0.40+0.09) -9.1%    0.51(0.46+0.05) -7.3%    0.51(0.44+0.06) -7.3% 
2000.20: git checkout -f - (sparse-v3)                 0.06(0.09+0.03)   0.06(0.08+0.03) +0.0%    0.06(0.06+0.05) +0.0%    0.07(0.09+0.03) +16.7%
2000.21: git checkout -f - (sparse-v4)                 0.06(0.08+0.04)   0.05(0.07+0.05) -16.7%   0.05(0.07+0.04) -16.7%   0.06(0.09+0.03) +0.0% 
```

All of the commands above were already integrated with the sparse index, so their timings stay within noise across the four commits.

```
2000.22: git reset (full-v3)                           0.41(0.32+0.06)   0.40(0.31+0.06) -2.4%    0.41(0.33+0.05) +0.0%    0.42(0.34+0.04) +2.4% 
2000.23: git reset (full-v4)                           0.37(0.32+0.05)   0.35(0.30+0.05) -5.4%    0.37(0.30+0.05) +0.0%    0.35(0.31+0.03) -5.4% 
2000.24: git reset (sparse-v3)                         0.68(0.65+0.05)   0.55(0.52+0.04) -19.1%   0.04(0.05+0.04) -94.1%   0.04(0.05+0.04) -94.1%
2000.25: git reset (sparse-v4)                         0.70(0.65+0.05)   0.54(0.50+0.06) -22.9%   0.04(0.07+0.01) -94.3%   0.03(0.05+0.05) -95.7%
2000.26: git reset --hard (full-v3)                    0.54(0.43+0.07)   0.53(0.43+0.06) -1.9%    0.55(0.46+0.05) +1.9%    0.55(0.44+0.06) +1.9% 
2000.27: git reset --hard (full-v4)                    0.50(0.45+0.03)   0.50(0.43+0.05) +0.0%    0.49(0.41+0.06) -2.0%    0.50(0.42+0.05) +0.0% 
2000.28: git reset --hard (sparse-v3)                  0.83(0.76+0.06)   0.68(0.62+0.05) -18.1%   0.07(0.05+0.02) -91.6%   0.07(0.05+0.02) -91.6%
2000.29: git reset --hard (sparse-v4)                  0.80(0.75+0.05)   0.69(0.62+0.06) -13.8%   0.07(0.04+0.02) -91.2%   0.07(0.04+0.03) -91.2%
```

As expected, `git reset [--hard]` improves with the sparse index integration: the sparse cases drop from between 0.68 s and 0.83 s to at most 0.07 s (over 90%) at f28fc01, while the full index cases remain constant.
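
The percentage columns can be reproduced from the elapsed times. As a minimal sketch, assuming each cell follows the `elapsed(user+sys)` layout seen in the table and that `awk` is available, the hypothetical helper `pct_change` (not part of `t/perf`) compares two cells:

```shell
#!/bin/sh
# Hypothetical helper (not part of t/perf): print the relative change
# between two table cells of the form "elapsed(user+sys)".
pct_change () {
	awk -v base="$1" -v new="$2" 'BEGIN {
		# Keep only the elapsed time in front of the parenthesis.
		sub(/\(.*/, "", base)
		sub(/\(.*/, "", new)
		printf "%+.1f%%\n", (new - base) / base * 100
	}'
}

# git reset (sparse-v3): 4bcd533 versus f28fc01
pct_change "0.68(0.65+0.05)" "0.04(0.05+0.04)"   # prints -94.1%
```

Only the elapsed time is compared here, which matches the percentages shown in the rows above; the user and system times are ignored.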

```
2000.30: git update-index --add --remove (full-v3)     0.03(0.01+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.02+0.01) +0.0%    0.03(0.01+0.01) +0.0% 
2000.31: git update-index --add --remove (full-v4)     0.03(0.02+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.03+0.00) +0.0%    0.03(0.02+0.01) +0.0% 
2000.32: git update-index --add --remove (sparse-v3)   0.57(0.54+0.02)   0.43(0.42+0.00) -24.6%   0.44(0.41+0.03) -22.8%   0.44(0.42+0.01) -22.8%
2000.33: git update-index --add --remove (sparse-v4)   0.56(0.52+0.04)   0.43(0.42+0.01) -23.2%   0.44(0.42+0.02) -21.4%   0.42(0.41+0.01) -25.0%
```

These do not change significantly (the sparse cases improve by only about 20% to 25%) because #423 is not merged.

```
2000.34: git diff (full-v3)                            0.07(0.05+0.03)   0.06(0.05+0.03) -14.3%   0.07(0.05+0.03) +0.0%    0.06(0.05+0.03) -14.3%
2000.35: git diff (full-v4)                            0.06(0.05+0.03)   0.06(0.05+0.02) +0.0%    0.06(0.05+0.02) +0.0%    0.06(0.06+0.02) +0.0% 
2000.36: git diff (sparse-v3)                          0.25(0.23+0.03)   0.17(0.17+0.02) -32.0%   0.18(0.18+0.02) -28.0%   0.01(0.03+0.03) -96.0%
2000.37: git diff (sparse-v4)                          0.25(0.22+0.05)   0.16(0.16+0.01) -36.0%   0.18(0.15+0.04) -28.0%   0.01(0.04+0.02) -96.0%
2000.38: git diff --staged (full-v3)                   0.03(0.01+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.02+0.01) +0.0%    0.03(0.02+0.00) +0.0% 
2000.39: git diff --staged (full-v4)                   0.04(0.03+0.01)   0.03(0.02+0.01) -25.0%   0.03(0.03+0.00) -25.0%   0.03(0.03+0.00) -25.0%
2000.40: git diff --staged (sparse-v3)                 0.21(0.19+0.01)   0.15(0.13+0.01) -28.6%   0.15(0.14+0.01) -28.6%   0.01(0.01+0.00) -95.2%
2000.41: git diff --staged (sparse-v4)                 0.22(0.21+0.01)   0.14(0.11+0.03) -36.4%   0.15(0.13+0.02) -31.8%   0.01(0.01+0.00) -95.5%
```

The `git diff` improvements are measurable: the sparse cases drop from between 0.21 s and 0.25 s to 0.01 s (about 95%) at b713582.

```
2000.42: git sparse-checkout reapply (full-v3)         0.63(0.54+0.05)   0.56(0.48+0.04) -11.1%   0.57(0.48+0.03) -9.5%    0.59(0.48+0.05) -6.3% 
2000.43: git sparse-checkout reapply (full-v4)         0.60(0.54+0.02)   0.51(0.46+0.03) -15.0%   0.54(0.48+0.02) -10.0%   0.50(0.44+0.04) -16.7%
2000.44: git sparse-checkout reapply (sparse-v3)       0.91(0.86+0.05)   0.05(0.05+0.00) -94.5%   0.06(0.05+0.01) -93.4%   0.06(0.06+0.00) -93.4%
2000.45: git sparse-checkout reapply (sparse-v4)       0.92(0.88+0.04)   0.05(0.05+0.00) -94.6%   0.05(0.05+0.01) -94.6%   0.05(0.04+0.01) -94.6%
```

Finally, the `git sparse-checkout reapply` measurements are also present: the sparse cases drop from about 0.9 s to about 0.05 s (roughly 94%) at f9255a5.

This test script is particularly valuable when contributing changes upstream. A good approach is to add the relevant lines to the performance test in an early commit, then demonstrate the performance change by copying the necessary lines from the output table into your commit message.
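
One way to copy the relevant rows, sketched here under the assumption that the result table was saved to a file (the file name `perf-output.txt` and the helper `perf_rows` are hypothetical), is to keep the two header lines plus every row that matches a pattern:

```shell
#!/bin/sh
# Hypothetical helper: print the header line, the separator line, and
# every row of a saved t/perf result table that matches an extended regex.
perf_rows () {
	awk -v pat="$2" 'NR <= 2 || $0 ~ pat' "$1"
}

# Example: keep only the sparse-checkout rows for a commit message.
# perf_rows perf-output.txt 'git sparse-checkout'
```

Keeping the header and separator with the selected rows means the excerpt still shows which commit each column belongs to.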
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 25, 2022
…parse-checkout` builtin

ldennington pushed a commit to ldennington/git that referenced this pull request Jan 25, 2022
…tests

dscho pushed a commit that referenced this pull request Feb 1, 2022
…ckout` builtin

dscho pushed a commit that referenced this pull request Feb 1, 2022
dscho pushed a commit that referenced this pull request Jun 17, 2022
…ckout` builtin

dscho pushed a commit that referenced this pull request Jun 17, 2022
One thing I forgot when talking about the sparse index is that we have a performance test: `t/perf/p2000-sparse-operations.sh`. This test wasn't helpful for commands like `git merge` that need a particular set of input, but it works for more read-only operations.

Here is a quick demonstration of how this performance test works, so we can have a definitive measure of how your previous updates improved performance.

To get these results, I ran the following command in `t/perf`:

```
 ./run 4bcd533 f9255a5 f28fc01 b713582 -- p2000-sparse-operations.sh
```

The short SHAs correspond to the merge commits for these PRs:

* #410 
* #421 
* #417 
* #419

The test takes a copy of the Git repository and creates several copies within a nested directory hierarchy.


```
Test                                                   4bcd533           f9255a5                  f28fc01                  b713582
-------------------------------------------------------------------------------------------------------------------------------------------------
2000.2: git status (full-v3)                           0.19(0.15+0.05)   0.19(0.16+0.05) +0.0%    0.20(0.18+0.03) +5.3%    0.19(0.17+0.04) +0.0% 
2000.3: git status (full-v4)                           0.20(0.18+0.04)   0.19(0.15+0.06) -5.0%    0.21(0.18+0.05) +5.0%    0.18(0.18+0.02) -10.0%
2000.4: git status (sparse-v3)                         0.04(0.04+0.04)   0.05(0.07+0.04) +25.0%   0.04(0.04+0.05) +0.0%    0.04(0.06+0.04) +0.0% 
2000.5: git status (sparse-v4)                         0.04(0.03+0.06)   0.04(0.05+0.05) +0.0%    0.05(0.05+0.04) +25.0%   0.05(0.06+0.04) +25.0%
2000.6: git add -A (full-v3)                           0.36(0.29+0.05)   0.38(0.28+0.07) +5.6%    0.36(0.31+0.05) +0.0%    0.37(0.31+0.05) +2.8% 
2000.7: git add -A (full-v4)                           0.34(0.27+0.06)   0.34(0.29+0.05) +0.0%    0.34(0.29+0.04) +0.0%    0.35(0.28+0.06) +2.9% 
2000.8: git add -A (sparse-v3)                         0.06(0.07+0.04)   0.06(0.05+0.06) +0.0%    0.06(0.09+0.01) +0.0%    0.06(0.08+0.03) +0.0% 
2000.9: git add -A (sparse-v4)                         0.05(0.05+0.04)   0.05(0.05+0.07) +0.0%    0.05(0.04+0.06) +0.0%    0.06(0.06+0.05) +20.0%
2000.10: git add . (full-v3)                           0.38(0.31+0.05)   0.37(0.29+0.06) -2.6%    0.37(0.30+0.07) -2.6%    0.37(0.29+0.06) -2.6% 
2000.11: git add . (full-v4)                           0.35(0.31+0.04)   0.35(0.29+0.07) +0.0%    0.35(0.29+0.05) +0.0%    0.34(0.29+0.06) -2.9% 
2000.12: git add . (sparse-v3)                         0.06(0.06+0.05)   0.06(0.05+0.06) +0.0%    0.06(0.07+0.05) +0.0%    0.06(0.09+0.03) +0.0% 
2000.13: git add . (sparse-v4)                         0.06(0.06+0.06)   0.06(0.07+0.04) +0.0%    0.05(0.06+0.05) -16.7%   0.05(0.05+0.07) -16.7%
2000.14: git commit -a -m A (full-v3)                  0.48(0.37+0.08)   0.45(0.36+0.08) -6.2%    0.45(0.35+0.09) -6.2%    0.44(0.36+0.07) -8.3% 
2000.15: git commit -a -m A (full-v4)                  0.45(0.40+0.06)   0.43(0.34+0.07) -4.4%    0.45(0.37+0.06) +0.0%    0.42(0.36+0.05) -6.7% 
2000.16: git commit -a -m A (sparse-v3)                0.05(0.05+0.06)   0.05(0.05+0.03) +0.0%    0.05(0.06+0.06) +0.0%    0.05(0.04+0.06) +0.0% 
2000.17: git commit -a -m A (sparse-v4)                0.05(0.06+0.03)   0.05(0.06+0.04) +0.0%    0.06(0.07+0.05) +20.0%   0.05(0.04+0.06) +0.0% 
2000.18: git checkout -f - (full-v3)                   0.55(0.43+0.08)   0.54(0.46+0.05) -1.8%    0.55(0.46+0.07) +0.0%    0.54(0.40+0.10) -1.8% 
2000.19: git checkout -f - (full-v4)                   0.55(0.41+0.09)   0.50(0.40+0.09) -9.1%    0.51(0.46+0.05) -7.3%    0.51(0.44+0.06) -7.3% 
2000.20: git checkout -f - (sparse-v3)                 0.06(0.09+0.03)   0.06(0.08+0.03) +0.0%    0.06(0.06+0.05) +0.0%    0.07(0.09+0.03) +16.7%
2000.21: git checkout -f - (sparse-v4)                 0.06(0.08+0.04)   0.05(0.07+0.05) -16.7%   0.05(0.07+0.04) -16.7%   0.06(0.09+0.03) +0.0% 
```

All of the commands above were already integrated with the sparse index.

```
2000.22: git reset (full-v3)                           0.41(0.32+0.06)   0.40(0.31+0.06) -2.4%    0.41(0.33+0.05) +0.0%    0.42(0.34+0.04) +2.4% 
2000.23: git reset (full-v4)                           0.37(0.32+0.05)   0.35(0.30+0.05) -5.4%    0.37(0.30+0.05) +0.0%    0.35(0.31+0.03) -5.4% 
2000.24: git reset (sparse-v3)                         0.68(0.65+0.05)   0.55(0.52+0.04) -19.1%   0.04(0.05+0.04) -94.1%   0.04(0.05+0.04) -94.1%
2000.25: git reset (sparse-v4)                         0.70(0.65+0.05)   0.54(0.50+0.06) -22.9%   0.04(0.07+0.01) -94.3%   0.03(0.05+0.05) -95.7%
2000.26: git reset --hard (full-v3)                    0.54(0.43+0.07)   0.53(0.43+0.06) -1.9%    0.55(0.46+0.05) +1.9%    0.55(0.44+0.06) +1.9% 
2000.27: git reset --hard (full-v4)                    0.50(0.45+0.03)   0.50(0.43+0.05) +0.0%    0.49(0.41+0.06) -2.0%    0.50(0.42+0.05) +0.0% 
2000.28: git reset --hard (sparse-v3)                  0.83(0.76+0.06)   0.68(0.62+0.05) -18.1%   0.07(0.05+0.02) -91.6%   0.07(0.05+0.02) -91.6%
2000.29: git reset --hard (sparse-v4)                  0.80(0.75+0.05)   0.69(0.62+0.06) -13.8%   0.07(0.04+0.02) -91.2%   0.07(0.04+0.03) -91.2%
```

As expected, `git reset [--hard]` gets more than 90% faster in the sparse index cases once it is integrated (starting at f28fc01), while the full index cases stay essentially constant.

```
2000.30: git update-index --add --remove (full-v3)     0.03(0.01+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.02+0.01) +0.0%    0.03(0.01+0.01) +0.0% 
2000.31: git update-index --add --remove (full-v4)     0.03(0.02+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.03+0.00) +0.0%    0.03(0.02+0.01) +0.0% 
2000.32: git update-index --add --remove (sparse-v3)   0.57(0.54+0.02)   0.43(0.42+0.00) -24.6%   0.44(0.41+0.03) -22.8%   0.44(0.42+0.01) -22.8%
2000.33: git update-index --add --remove (sparse-v4)   0.56(0.52+0.04)   0.43(0.42+0.01) -23.2%   0.44(0.42+0.02) -21.4%   0.42(0.41+0.01) -25.0%
```

The sparse `git update-index` cases improve by only 21-25%, without the larger drop seen for other commands, because #423 is not merged.

```
2000.34: git diff (full-v3)                            0.07(0.05+0.03)   0.06(0.05+0.03) -14.3%   0.07(0.05+0.03) +0.0%    0.06(0.05+0.03) -14.3%
2000.35: git diff (full-v4)                            0.06(0.05+0.03)   0.06(0.05+0.02) +0.0%    0.06(0.05+0.02) +0.0%    0.06(0.06+0.02) +0.0% 
2000.36: git diff (sparse-v3)                          0.25(0.23+0.03)   0.17(0.17+0.02) -32.0%   0.18(0.18+0.02) -28.0%   0.01(0.03+0.03) -96.0%
2000.37: git diff (sparse-v4)                          0.25(0.22+0.05)   0.16(0.16+0.01) -36.0%   0.18(0.15+0.04) -28.0%   0.01(0.04+0.02) -96.0%
2000.38: git diff --staged (full-v3)                   0.03(0.01+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.02+0.01) +0.0%    0.03(0.02+0.00) +0.0% 
2000.39: git diff --staged (full-v4)                   0.04(0.03+0.01)   0.03(0.02+0.01) -25.0%   0.03(0.03+0.00) -25.0%   0.03(0.03+0.00) -25.0%
2000.40: git diff --staged (sparse-v3)                 0.21(0.19+0.01)   0.15(0.13+0.01) -28.6%   0.15(0.14+0.01) -28.6%   0.01(0.01+0.00) -95.2%
2000.41: git diff --staged (sparse-v4)                 0.22(0.21+0.01)   0.14(0.11+0.03) -36.4%   0.15(0.13+0.02) -31.8%   0.01(0.01+0.00) -95.5%
```

The `git diff` improvements are substantial: the sparse index cases end up 95-96% faster at b713582.

```
2000.42: git sparse-checkout reapply (full-v3)         0.63(0.54+0.05)   0.56(0.48+0.04) -11.1%   0.57(0.48+0.03) -9.5%    0.59(0.48+0.05) -6.3% 
2000.43: git sparse-checkout reapply (full-v4)         0.60(0.54+0.02)   0.51(0.46+0.03) -15.0%   0.54(0.48+0.02) -10.0%   0.50(0.44+0.04) -16.7%
2000.44: git sparse-checkout reapply (sparse-v3)       0.91(0.86+0.05)   0.05(0.05+0.00) -94.5%   0.06(0.05+0.01) -93.4%   0.06(0.06+0.00) -93.4%
2000.45: git sparse-checkout reapply (sparse-v4)       0.92(0.88+0.04)   0.05(0.05+0.00) -94.6%   0.05(0.05+0.01) -94.6%   0.05(0.04+0.01) -94.6%
```

Finally, the `git sparse-checkout reapply` measurements show the sparse index cases getting 93-95% faster starting at f9255a5.

This test script is particularly valuable when contributing changes upstream. It can be good to start by adding the lines to the performance test in an early commit, then demonstrating the performance change by copying the necessary lines from the output table into your commit message.
dscho pushed a commit that referenced this pull request Jun 17, 2022
…ckout` builtin

This integrates the `sparse-checkout` builtin with the sparse index. The tricky part here is that we need to partially expand the index when we are modifying the sparse-checkout definition.

Note that we modify the pattern list in a careful way: we create a `struct pattern_list` in memory in `builtin/sparse-checkout.c`, then apply those patterns to the index before writing them to the sparse-checkout file. The `update_sparsity()` method does the work to assign the `SKIP_WORKTREE` bit appropriately, but this doesn't work if the files that are within the new sparse-checkout cone are still hidden behind a sparse directory.

The new `expand_to_pattern_list()` method does the hard work of expanding the sparse directories that are now within the new patterns. This expands only as far as needed, possibly creating new sparse directory entries.
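
To make "expands only as far as needed" concrete, here is a toy model in Python. It is not Git's `expand_to_pattern_list()`: the `TREE` contents, the function names, and the trailing-slash convention for directory entries are all assumptions made for illustration. It assumes cone-mode semantics, where files directly inside an ancestor of a cone directory are included and sibling directories outside the cone can stay collapsed.

```python
# Toy model only: this is NOT Git's expand_to_pattern_list().
# Directory entries end with "/"; everything else is a file.
TREE = {
    "A/": ["a.txt", "B/", "D/"],
    "A/B/": ["b.txt"],
    "A/D/": ["d.txt"],
    "C/": ["c.txt"],
}


def expand_fully(directory):
    """Return every file path below `directory`."""
    files = []
    for child in TREE[directory]:
        path = directory + child
        if path.endswith("/"):
            files.extend(expand_fully(path))
        else:
            files.append(path)
    return files


def expand_toy_index(index, cone_dirs):
    """Expand sparse directory entries only as far as `cone_dirs` requires.

    Every directory in `cone_dirs` must end with "/".
    """
    result = []
    for entry in index:
        if not entry.endswith("/"):
            result.append(entry)  # plain file entry: keep as-is
        elif any(entry.startswith(cone) for cone in cone_dirs):
            result.extend(expand_fully(entry))  # inside the cone: list all files
        elif any(cone.startswith(entry) for cone in cone_dirs):
            # Ancestor of a cone directory: open one level and recurse, so
            # siblings outside the cone become new sparse directory entries.
            children = [entry + child for child in TREE[entry]]
            result.extend(expand_toy_index(children, cone_dirs))
        else:
            result.append(entry)  # outside the cone: stays collapsed
    return sorted(result)


index = ["A/", "C/", "README"]
print(expand_toy_index(index, ["A/B/"]))
```

With the cone set to `A/B/`, the sparse directory `A/` is opened, `A/B/` is expanded to its files, and `A/D/` appears as a new sparse directory entry, while `C/` is never touched.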

This method does not contract existing files to sparse directories, largely because of the check for ignored files that runs as we delete those directories. The `clean_tracked_sparse_directories()` method is called after `update_sparsity()`, but we need to read the `A/B/.gitignore` file (or lack thereof) before we can delete `A/B/`. If we convert to sparse too quickly, then we lose this information and cause a full expansion.

Most of the correctness is handled by existing tests in `t1092`, but I add `ensure_not_expanded` checks in some hopefully interesting cases.

As for performance, `git sparse-checkout set` can be slow if it needs to move a lot of files. However, a no-op `git sparse-checkout set` (i.e. repeatedly setting the sparse-checkout cone to include only the files at root) has these performance results on Linux in a monorepo with 2+ million files at `HEAD`:

```
Benchmark #1: baseline
  Time (mean ± σ):     10.465 s ±  0.018 s    [User: 9.885 s, System: 0.573 s]
  Range (min … max):   10.450 s … 10.497 s    5 runs
 
Benchmark #2: new code
  Time (mean ± σ):      68.9 ms ±   2.9 ms    [User: 45.8 ms, System: 17.1 ms]
  Range (min … max):    63.4 ms …  74.0 ms    41 runs
 
Summary
  'new code' ran
  151.89 ± 6.30 times faster than 'baseline'
```
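
As a sanity check on the summary line, the ratio can be recomputed from the two reported means. The sketch below assumes the uncertainty follows first-order error propagation of the two standard deviations; the helper name `speedup` is hypothetical, and because the inputs are the rounded values printed above, the spread only lands near the reported ± 6.30.

```python
import math


def speedup(base_mean, base_sd, new_mean, new_sd):
    """Return (ratio, spread) for base/new, using relative-error propagation."""
    ratio = base_mean / new_mean
    relative = math.sqrt((base_sd / base_mean) ** 2 + (new_sd / new_mean) ** 2)
    return ratio, ratio * relative


# Means and standard deviations from the benchmark output, in seconds.
ratio, spread = speedup(10.465, 0.018, 0.0689, 0.0029)
print(f"{ratio:.2f} ± {spread:.2f} times faster")
```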
dscho pushed a commit that referenced this pull request Jun 17, 2022
One thing I forgot when talking about the sparse index is that we have a performance test: `t/perf/p2000-sparse-operations.sh`. This test isn't helpful for commands like `git merge` that need a particular set of inputs, but it works well for more read-only operations.

Here is a quick demonstration of how this performance test works, so we can have a definitive measure of how your previous updates improved performance.

To get these results, I ran the following command in `t/perf`:

```
./run 4bcd533 f9255a5 f28fc01 b713582 -- p2000-sparse-operations.sh
```

The short SHAs correspond to the merge commits for these PRs:

* #410 
* #421 
* #417 
* #419

The test takes a copy of the Git repository and creates several copies within a nested directory hierarchy.


```
Test                                                   4bcd533           f9255a5                  f28fc01                  b713582
-------------------------------------------------------------------------------------------------------------------------------------------------
2000.2: git status (full-v3)                           0.19(0.15+0.05)   0.19(0.16+0.05) +0.0%    0.20(0.18+0.03) +5.3%    0.19(0.17+0.04) +0.0% 
2000.3: git status (full-v4)                           0.20(0.18+0.04)   0.19(0.15+0.06) -5.0%    0.21(0.18+0.05) +5.0%    0.18(0.18+0.02) -10.0%
2000.4: git status (sparse-v3)                         0.04(0.04+0.04)   0.05(0.07+0.04) +25.0%   0.04(0.04+0.05) +0.0%    0.04(0.06+0.04) +0.0% 
2000.5: git status (sparse-v4)                         0.04(0.03+0.06)   0.04(0.05+0.05) +0.0%    0.05(0.05+0.04) +25.0%   0.05(0.06+0.04) +25.0%
2000.6: git add -A (full-v3)                           0.36(0.29+0.05)   0.38(0.28+0.07) +5.6%    0.36(0.31+0.05) +0.0%    0.37(0.31+0.05) +2.8% 
2000.7: git add -A (full-v4)                           0.34(0.27+0.06)   0.34(0.29+0.05) +0.0%    0.34(0.29+0.04) +0.0%    0.35(0.28+0.06) +2.9% 
2000.8: git add -A (sparse-v3)                         0.06(0.07+0.04)   0.06(0.05+0.06) +0.0%    0.06(0.09+0.01) +0.0%    0.06(0.08+0.03) +0.0% 
2000.9: git add -A (sparse-v4)                         0.05(0.05+0.04)   0.05(0.05+0.07) +0.0%    0.05(0.04+0.06) +0.0%    0.06(0.06+0.05) +20.0%
2000.10: git add . (full-v3)                           0.38(0.31+0.05)   0.37(0.29+0.06) -2.6%    0.37(0.30+0.07) -2.6%    0.37(0.29+0.06) -2.6% 
2000.11: git add . (full-v4)                           0.35(0.31+0.04)   0.35(0.29+0.07) +0.0%    0.35(0.29+0.05) +0.0%    0.34(0.29+0.06) -2.9% 
2000.12: git add . (sparse-v3)                         0.06(0.06+0.05)   0.06(0.05+0.06) +0.0%    0.06(0.07+0.05) +0.0%    0.06(0.09+0.03) +0.0% 
2000.13: git add . (sparse-v4)                         0.06(0.06+0.06)   0.06(0.07+0.04) +0.0%    0.05(0.06+0.05) -16.7%   0.05(0.05+0.07) -16.7%
2000.14: git commit -a -m A (full-v3)                  0.48(0.37+0.08)   0.45(0.36+0.08) -6.2%    0.45(0.35+0.09) -6.2%    0.44(0.36+0.07) -8.3% 
2000.15: git commit -a -m A (full-v4)                  0.45(0.40+0.06)   0.43(0.34+0.07) -4.4%    0.45(0.37+0.06) +0.0%    0.42(0.36+0.05) -6.7% 
2000.16: git commit -a -m A (sparse-v3)                0.05(0.05+0.06)   0.05(0.05+0.03) +0.0%    0.05(0.06+0.06) +0.0%    0.05(0.04+0.06) +0.0% 
2000.17: git commit -a -m A (sparse-v4)                0.05(0.06+0.03)   0.05(0.06+0.04) +0.0%    0.06(0.07+0.05) +20.0%   0.05(0.04+0.06) +0.0% 
2000.18: git checkout -f - (full-v3)                   0.55(0.43+0.08)   0.54(0.46+0.05) -1.8%    0.55(0.46+0.07) +0.0%    0.54(0.40+0.10) -1.8% 
2000.19: git checkout -f - (full-v4)                   0.55(0.41+0.09)   0.50(0.40+0.09) -9.1%    0.51(0.46+0.05) -7.3%    0.51(0.44+0.06) -7.3% 
2000.20: git checkout -f - (sparse-v3)                 0.06(0.09+0.03)   0.06(0.08+0.03) +0.0%    0.06(0.06+0.05) +0.0%    0.07(0.09+0.03) +16.7%
2000.21: git checkout -f - (sparse-v4)                 0.06(0.08+0.04)   0.05(0.07+0.05) -16.7%   0.05(0.07+0.04) -16.7%   0.06(0.09+0.03) +0.0% 
```

All of the commands above were already integrated with the sparse index.

```
2000.22: git reset (full-v3)                           0.41(0.32+0.06)   0.40(0.31+0.06) -2.4%    0.41(0.33+0.05) +0.0%    0.42(0.34+0.04) +2.4% 
2000.23: git reset (full-v4)                           0.37(0.32+0.05)   0.35(0.30+0.05) -5.4%    0.37(0.30+0.05) +0.0%    0.35(0.31+0.03) -5.4% 
2000.24: git reset (sparse-v3)                         0.68(0.65+0.05)   0.55(0.52+0.04) -19.1%   0.04(0.05+0.04) -94.1%   0.04(0.05+0.04) -94.1%
2000.25: git reset (sparse-v4)                         0.70(0.65+0.05)   0.54(0.50+0.06) -22.9%   0.04(0.07+0.01) -94.3%   0.03(0.05+0.05) -95.7%
2000.26: git reset --hard (full-v3)                    0.54(0.43+0.07)   0.53(0.43+0.06) -1.9%    0.55(0.46+0.05) +1.9%    0.55(0.44+0.06) +1.9% 
2000.27: git reset --hard (full-v4)                    0.50(0.45+0.03)   0.50(0.43+0.05) +0.0%    0.49(0.41+0.06) -2.0%    0.50(0.42+0.05) +0.0% 
2000.28: git reset --hard (sparse-v3)                  0.83(0.76+0.06)   0.68(0.62+0.05) -18.1%   0.07(0.05+0.02) -91.6%   0.07(0.05+0.02) -91.6%
2000.29: git reset --hard (sparse-v4)                  0.80(0.75+0.05)   0.69(0.62+0.06) -13.8%   0.07(0.04+0.02) -91.2%   0.07(0.04+0.03) -91.2%
```

As expected, `git reset [--hard]` gets more than 90% faster in the sparse index cases once it is integrated (starting at f28fc01), while the full index cases stay essentially constant.
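
The percentages in these tables can be reproduced from the timings. The sketch below assumes each cell is elapsed time followed by (user+system) time in seconds, and that each percentage compares a column's elapsed time against the first column; the helper name `relative_changes` is hypothetical.

```python
import re

# Assumed cell layout: elapsed(user+system), all in seconds.
CELL = re.compile(r"(\d+\.\d+)\((\d+\.\d+)\+(\d+\.\d+)\)")


def relative_changes(row):
    """Percent change of each later column's elapsed time vs. the first column."""
    elapsed = [float(match.group(1)) for match in CELL.finditer(row)]
    baseline = elapsed[0]
    return [round((t - baseline) / baseline * 100, 1) for t in elapsed[1:]]


row = (
    "2000.24: git reset (sparse-v3)   0.68(0.65+0.05)   "
    "0.55(0.52+0.04) -19.1%   0.04(0.05+0.04) -94.1%   0.04(0.05+0.04) -94.1%"
)
print(relative_changes(row))  # [-19.1, -94.1, -94.1]
```

The recomputed values match the percentages printed in the row, which supports reading every column as a comparison against 4bcd533 rather than against its immediate predecessor.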

```
2000.30: git update-index --add --remove (full-v3)     0.03(0.01+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.02+0.01) +0.0%    0.03(0.01+0.01) +0.0% 
2000.31: git update-index --add --remove (full-v4)     0.03(0.02+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.03+0.00) +0.0%    0.03(0.02+0.01) +0.0% 
2000.32: git update-index --add --remove (sparse-v3)   0.57(0.54+0.02)   0.43(0.42+0.00) -24.6%   0.44(0.41+0.03) -22.8%   0.44(0.42+0.01) -22.8%
2000.33: git update-index --add --remove (sparse-v4)   0.56(0.52+0.04)   0.43(0.42+0.01) -23.2%   0.44(0.42+0.02) -21.4%   0.42(0.41+0.01) -25.0%
```

The sparse `git update-index` cases improve by only 21-25%, without the larger drop seen for other commands, because #423 is not merged.

```
2000.34: git diff (full-v3)                            0.07(0.05+0.03)   0.06(0.05+0.03) -14.3%   0.07(0.05+0.03) +0.0%    0.06(0.05+0.03) -14.3%
2000.35: git diff (full-v4)                            0.06(0.05+0.03)   0.06(0.05+0.02) +0.0%    0.06(0.05+0.02) +0.0%    0.06(0.06+0.02) +0.0% 
2000.36: git diff (sparse-v3)                          0.25(0.23+0.03)   0.17(0.17+0.02) -32.0%   0.18(0.18+0.02) -28.0%   0.01(0.03+0.03) -96.0%
2000.37: git diff (sparse-v4)                          0.25(0.22+0.05)   0.16(0.16+0.01) -36.0%   0.18(0.15+0.04) -28.0%   0.01(0.04+0.02) -96.0%
2000.38: git diff --staged (full-v3)                   0.03(0.01+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.02+0.01) +0.0%    0.03(0.02+0.00) +0.0% 
2000.39: git diff --staged (full-v4)                   0.04(0.03+0.01)   0.03(0.02+0.01) -25.0%   0.03(0.03+0.00) -25.0%   0.03(0.03+0.00) -25.0%
2000.40: git diff --staged (sparse-v3)                 0.21(0.19+0.01)   0.15(0.13+0.01) -28.6%   0.15(0.14+0.01) -28.6%   0.01(0.01+0.00) -95.2%
2000.41: git diff --staged (sparse-v4)                 0.22(0.21+0.01)   0.14(0.11+0.03) -36.4%   0.15(0.13+0.02) -31.8%   0.01(0.01+0.00) -95.5%
```

The `git diff` improvements are substantial: the sparse index cases end up 95-96% faster at b713582.

```
2000.42: git sparse-checkout reapply (full-v3)         0.63(0.54+0.05)   0.56(0.48+0.04) -11.1%   0.57(0.48+0.03) -9.5%    0.59(0.48+0.05) -6.3% 
2000.43: git sparse-checkout reapply (full-v4)         0.60(0.54+0.02)   0.51(0.46+0.03) -15.0%   0.54(0.48+0.02) -10.0%   0.50(0.44+0.04) -16.7%
2000.44: git sparse-checkout reapply (sparse-v3)       0.91(0.86+0.05)   0.05(0.05+0.00) -94.5%   0.06(0.05+0.01) -93.4%   0.06(0.06+0.00) -93.4%
2000.45: git sparse-checkout reapply (sparse-v4)       0.92(0.88+0.04)   0.05(0.05+0.00) -94.6%   0.05(0.05+0.01) -94.6%   0.05(0.04+0.01) -94.6%
```

Finally, the `git sparse-checkout reapply` measurements show the sparse index cases getting 93-95% faster starting at f9255a5.

This test script is particularly valuable when contributing changes upstream. It can be good to start by adding the lines to the performance test in an early commit, then demonstrating the performance change by copying the necessary lines from the output table into your commit message.
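
One possible way to pull "the necessary lines" out of a saved output table is sketched below. The helper `select_rows` and the shortened `TABLE` are hypothetical; the sketch assumes the first two lines are the header and separator, and that the test name is separated from the timings by at least two spaces.

```python
def select_rows(table, needle):
    """Keep the header, the separator, and rows whose test name contains `needle`."""
    lines = table.splitlines()
    header, separator, rows = lines[0], lines[1], lines[2:]
    # The test name is everything before the first run of two spaces.
    kept = [row for row in rows if needle in row.split("  ")[0]]
    return "\n".join([header, separator] + kept)


# Shortened two-column excerpt of the table above, for illustration only.
TABLE = """\
Test                                    4bcd533           f9255a5
---------------------------------------------------------------------------------
2000.24: git reset (sparse-v3)          0.68(0.65+0.05)   0.55(0.52+0.04) -19.1%
2000.28: git reset --hard (sparse-v3)   0.83(0.76+0.06)   0.68(0.62+0.05) -18.1%
2000.36: git diff (sparse-v3)           0.25(0.23+0.03)   0.17(0.17+0.02) -32.0%
"""

print(select_rows(TABLE, "git reset"))
```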
dscho pushed a commit that referenced this pull request Jun 17, 2022
One thing I forgot when talking about the sparse index is that we have a performance test: `t/perf/p2000-sparse-operations.sh`. This test wasn't helpful for commands like `git merge` that need a particular set of input, but work for more read-only operations.

Here is a quick demonstration of how this performance test works so we could have a definitive measure of how your previous updates improved performance. 

To get these results, I ran the following command in `t/perf`:

```
 ./run 4bcd533 f9255a5 f28fc01 b713582 -- p2000-sparse-operations.sh
```

The short-shas correspond to the merge commits for these PRs:

* #410 
* #421 
* #417 
* #419

The test takes a copy of the Git repository and creates several copies within a nested directory heirarchy.


```
Test                                                   4bcd533       f9255a5              f28fc01              b713582           
-------------------------------------------------------------------------------------------------------------------------------------------------
2000.2: git status (full-v3)                           0.19(0.15+0.05)   0.19(0.16+0.05) +0.0%    0.20(0.18+0.03) +5.3%    0.19(0.17+0.04) +0.0% 
2000.3: git status (full-v4)                           0.20(0.18+0.04)   0.19(0.15+0.06) -5.0%    0.21(0.18+0.05) +5.0%    0.18(0.18+0.02) -10.0%
2000.4: git status (sparse-v3)                         0.04(0.04+0.04)   0.05(0.07+0.04) +25.0%   0.04(0.04+0.05) +0.0%    0.04(0.06+0.04) +0.0% 
2000.5: git status (sparse-v4)                         0.04(0.03+0.06)   0.04(0.05+0.05) +0.0%    0.05(0.05+0.04) +25.0%   0.05(0.06+0.04) +25.0%
2000.6: git add -A (full-v3)                           0.36(0.29+0.05)   0.38(0.28+0.07) +5.6%    0.36(0.31+0.05) +0.0%    0.37(0.31+0.05) +2.8% 
2000.7: git add -A (full-v4)                           0.34(0.27+0.06)   0.34(0.29+0.05) +0.0%    0.34(0.29+0.04) +0.0%    0.35(0.28+0.06) +2.9% 
2000.8: git add -A (sparse-v3)                         0.06(0.07+0.04)   0.06(0.05+0.06) +0.0%    0.06(0.09+0.01) +0.0%    0.06(0.08+0.03) +0.0% 
2000.9: git add -A (sparse-v4)                         0.05(0.05+0.04)   0.05(0.05+0.07) +0.0%    0.05(0.04+0.06) +0.0%    0.06(0.06+0.05) +20.0%
2000.10: git add . (full-v3)                           0.38(0.31+0.05)   0.37(0.29+0.06) -2.6%    0.37(0.30+0.07) -2.6%    0.37(0.29+0.06) -2.6% 
2000.11: git add . (full-v4)                           0.35(0.31+0.04)   0.35(0.29+0.07) +0.0%    0.35(0.29+0.05) +0.0%    0.34(0.29+0.06) -2.9% 
2000.12: git add . (sparse-v3)                         0.06(0.06+0.05)   0.06(0.05+0.06) +0.0%    0.06(0.07+0.05) +0.0%    0.06(0.09+0.03) +0.0% 
2000.13: git add . (sparse-v4)                         0.06(0.06+0.06)   0.06(0.07+0.04) +0.0%    0.05(0.06+0.05) -16.7%   0.05(0.05+0.07) -16.7%
2000.14: git commit -a -m A (full-v3)                  0.48(0.37+0.08)   0.45(0.36+0.08) -6.2%    0.45(0.35+0.09) -6.2%    0.44(0.36+0.07) -8.3% 
2000.15: git commit -a -m A (full-v4)                  0.45(0.40+0.06)   0.43(0.34+0.07) -4.4%    0.45(0.37+0.06) +0.0%    0.42(0.36+0.05) -6.7% 
2000.16: git commit -a -m A (sparse-v3)                0.05(0.05+0.06)   0.05(0.05+0.03) +0.0%    0.05(0.06+0.06) +0.0%    0.05(0.04+0.06) +0.0% 
2000.17: git commit -a -m A (sparse-v4)                0.05(0.06+0.03)   0.05(0.06+0.04) +0.0%    0.06(0.07+0.05) +20.0%   0.05(0.04+0.06) +0.0% 
2000.18: git checkout -f - (full-v3)                   0.55(0.43+0.08)   0.54(0.46+0.05) -1.8%    0.55(0.46+0.07) +0.0%    0.54(0.40+0.10) -1.8% 
2000.19: git checkout -f - (full-v4)                   0.55(0.41+0.09)   0.50(0.40+0.09) -9.1%    0.51(0.46+0.05) -7.3%    0.51(0.44+0.06) -7.3% 
2000.20: git checkout -f - (sparse-v3)                 0.06(0.09+0.03)   0.06(0.08+0.03) +0.0%    0.06(0.06+0.05) +0.0%    0.07(0.09+0.03) +16.7%
2000.21: git checkout -f - (sparse-v4)                 0.06(0.08+0.04)   0.05(0.07+0.05) -16.7%   0.05(0.07+0.04) -16.7%   0.06(0.09+0.03) +0.0% 
```

All of the above were already integrated.

```
2000.22: git reset (full-v3)                           0.41(0.32+0.06)   0.40(0.31+0.06) -2.4%    0.41(0.33+0.05) +0.0%    0.42(0.34+0.04) +2.4% 
2000.23: git reset (full-v4)                           0.37(0.32+0.05)   0.35(0.30+0.05) -5.4%    0.37(0.30+0.05) +0.0%    0.35(0.31+0.03) -5.4% 
2000.24: git reset (sparse-v3)                         0.68(0.65+0.05)   0.55(0.52+0.04) -19.1%   0.04(0.05+0.04) -94.1%   0.04(0.05+0.04) -94.1%
2000.25: git reset (sparse-v4)                         0.70(0.65+0.05)   0.54(0.50+0.06) -22.9%   0.04(0.07+0.01) -94.3%   0.03(0.05+0.05) -95.7%
2000.26: git reset --hard (full-v3)                    0.54(0.43+0.07)   0.53(0.43+0.06) -1.9%    0.55(0.46+0.05) +1.9%    0.55(0.44+0.06) +1.9% 
2000.27: git reset --hard (full-v4)                    0.50(0.45+0.03)   0.50(0.43+0.05) +0.0%    0.49(0.41+0.06) -2.0%    0.50(0.42+0.05) +0.0% 
2000.28: git reset --hard (sparse-v3)                  0.83(0.76+0.06)   0.68(0.62+0.05) -18.1%   0.07(0.05+0.02) -91.6%   0.07(0.05+0.02) -91.6%
2000.29: git reset --hard (sparse-v4)                  0.80(0.75+0.05)   0.69(0.62+0.06) -13.8%   0.07(0.04+0.02) -91.2%   0.07(0.04+0.03) -91.2%
```

As expected, `git reset [--hard]` improves with the sparse index integration, but remains constant across the full index case.
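
To make the percentage columns easier to read, here is a minimal sketch of how they can be recomputed from the timing cells. It assumes each cell has the layout `elapsed(user+sys)` in seconds and that the percentage compares elapsed time against the first column; the helper names `elapsed` and `percent_change` are made up for this illustration and are not part of the perf suite.

```python
import re

# Assumed cell layout: elapsed(user+sys), all values in seconds.
CELL = re.compile(r"(\d+\.\d+)\((\d+\.\d+)\+(\d+\.\d+)\)")

def elapsed(cell):
    """Return the elapsed seconds from a cell such as '0.68(0.65+0.05)'."""
    match = CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized timing cell: {cell!r}")
    return float(match.group(1))

def percent_change(baseline_cell, new_cell):
    """Relative change in elapsed time versus the baseline column, in percent."""
    base = elapsed(baseline_cell)
    return round((elapsed(new_cell) - base) / base * 100, 1)

# Row 2000.24 (git reset, sparse-v3) from the table above.
print(percent_change("0.68(0.65+0.05)", "0.55(0.52+0.04)"))  # -19.1
print(percent_change("0.68(0.65+0.05)", "0.04(0.05+0.04)"))  # -94.1
```

With timings this small, a change of 0.01s in a 0.05s measurement already shows up as 20%, which is why the rows that were integrated earlier swing between -16.7% and +25.0% without indicating a real change.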

```
2000.30: git update-index --add --remove (full-v3)     0.03(0.01+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.02+0.01) +0.0%    0.03(0.01+0.01) +0.0% 
2000.31: git update-index --add --remove (full-v4)     0.03(0.02+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.03+0.00) +0.0%    0.03(0.02+0.01) +0.0% 
2000.32: git update-index --add --remove (sparse-v3)   0.57(0.54+0.02)   0.43(0.42+0.00) -24.6%   0.44(0.41+0.03) -22.8%   0.44(0.42+0.01) -22.8%
2000.33: git update-index --add --remove (sparse-v4)   0.56(0.52+0.04)   0.43(0.42+0.01) -23.2%   0.44(0.42+0.02) -21.4%   0.42(0.41+0.01) -25.0%
```

These do not change significantly because #423 is not merged.

```
2000.34: git diff (full-v3)                            0.07(0.05+0.03)   0.06(0.05+0.03) -14.3%   0.07(0.05+0.03) +0.0%    0.06(0.05+0.03) -14.3%
2000.35: git diff (full-v4)                            0.06(0.05+0.03)   0.06(0.05+0.02) +0.0%    0.06(0.05+0.02) +0.0%    0.06(0.06+0.02) +0.0% 
2000.36: git diff (sparse-v3)                          0.25(0.23+0.03)   0.17(0.17+0.02) -32.0%   0.18(0.18+0.02) -28.0%   0.01(0.03+0.03) -96.0%
2000.37: git diff (sparse-v4)                          0.25(0.22+0.05)   0.16(0.16+0.01) -36.0%   0.18(0.15+0.04) -28.0%   0.01(0.04+0.02) -96.0%
2000.38: git diff --staged (full-v3)                   0.03(0.01+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.02+0.01) +0.0%    0.03(0.02+0.00) +0.0% 
2000.39: git diff --staged (full-v4)                   0.04(0.03+0.01)   0.03(0.02+0.01) -25.0%   0.03(0.03+0.00) -25.0%   0.03(0.03+0.00) -25.0%
2000.40: git diff --staged (sparse-v3)                 0.21(0.19+0.01)   0.15(0.13+0.01) -28.6%   0.15(0.14+0.01) -28.6%   0.01(0.01+0.00) -95.2%
2000.41: git diff --staged (sparse-v4)                 0.22(0.21+0.01)   0.14(0.11+0.03) -36.4%   0.15(0.13+0.02) -31.8%   0.01(0.01+0.00) -95.5%
```

The `git diff` improvements are measurable.

```
2000.42: git sparse-checkout reapply (full-v3)         0.63(0.54+0.05)   0.56(0.48+0.04) -11.1%   0.57(0.48+0.03) -9.5%    0.59(0.48+0.05) -6.3% 
2000.43: git sparse-checkout reapply (full-v4)         0.60(0.54+0.02)   0.51(0.46+0.03) -15.0%   0.54(0.48+0.02) -10.0%   0.50(0.44+0.04) -16.7%
2000.44: git sparse-checkout reapply (sparse-v3)       0.91(0.86+0.05)   0.05(0.05+0.00) -94.5%   0.06(0.05+0.01) -93.4%   0.06(0.06+0.00) -93.4%
2000.45: git sparse-checkout reapply (sparse-v4)       0.92(0.88+0.04)   0.05(0.05+0.00) -94.6%   0.05(0.05+0.01) -94.6%   0.05(0.04+0.01) -94.6%
```

Finally, the `git sparse-checkout` measurements are also present.

This test script is particularly valuable when contributing changes upstream. It can be good to start by adding the lines to the performance test in an early commit, then demonstrating the performance change by copying the necessary lines from the output table into your commit message.
dscho pushed a commit that referenced this pull request Jun 17, 2022
…ckout` builtin

This integrates the `sparse-checkout` builtin with the sparse index. The tricky part here is that we need to partially expand the index when we are modifying the sparse-checkout definition.

Note that we modify the pattern list in a careful way: we create a `struct pattern_list` in-memory in `builtin/sparse-checkout.c` then apply those patterns to the index before writing the patterns to the sparse-checkout file. The `update_sparsity()` method does the work to assign the `SKIP_WORKTREE` bit appropriately, but this doesn't work if the files that are within the new sparse-checkout cone are still hidden behind a sparse directory.

The new `expand_to_pattern_list()` method does the hard work of expanding the sparse directories that are now within the new patterns. This expands only as far as needed, possibly creating new sparse directory entries.
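
As a rough illustration of "expands only as far as needed", the following toy model computes which entries a cone-mode sparse index would hold for a given set of cone directories. It is not Git's C implementation: the nested-dict tree, the `TREE` sample, and the functions `list_files` and `sparse_index` are all hypothetical names invented for this sketch.

```python
# Toy model only; this is NOT how Git implements expand_to_pattern_list().
# A tree is a nested dict: a directory maps to a dict, a file maps to None.
TREE = {
    "A": {"B": {"f": None}, "C": {"g": None}, "a": None},
    "D": {"h": None},
    "r": None,
}

def list_files(tree, prefix=""):
    """Yield every file path below a tree, fully expanded."""
    for name, child in sorted(tree.items()):
        if child is None:
            yield prefix + name
        else:
            yield from list_files(child, prefix + name + "/")

def sparse_index(tree, cone_dirs, prefix=""):
    """Return index entries; a trailing '/' marks a sparse directory entry."""
    entries = []
    for name, child in sorted(tree.items()):
        if child is None:
            entries.append(prefix + name)  # file in an expanded directory
            continue
        path = prefix + name + "/"
        if any(path.startswith(cone) for cone in cone_dirs):
            # Inside the cone: every file below is listed individually.
            entries.extend(list_files(child, path))
        elif any(cone.startswith(path) for cone in cone_dirs):
            # Ancestor of a cone directory: expand one level and recurse.
            entries.extend(sparse_index(child, cone_dirs, path))
        else:
            # Outside the cone: keep the directory collapsed.
            entries.append(path)
    return entries

print(sparse_index(TREE, []))        # ['A/', 'D/', 'r']
print(sparse_index(TREE, ["A/B/"]))  # ['A/B/f', 'A/C/', 'A/a', 'D/', 'r']
```

Adding `A/B/` to the cone expands the sparse directory `A/` but leaves `D/` collapsed, and it introduces a new sparse directory entry `A/C/` for the sibling that stays outside the cone.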

This method does not contract existing files to sparse directories, largely because of the check for ignored files that runs as we delete those directories. The `clean_tracked_sparse_directories()` method is called after `update_sparsity()`, but we need to read the `A/B/.gitignore` file (or lack thereof) before we can delete `A/B/`. If we convert to sparse too quickly, then we lose this information and cause a full expansion.

Most of the correctness is handled by existing tests in `t1092`, but I add checks for `ensure_not_expanded` in some hopefully interesting cases.

As for performance, `git sparse-checkout set` can be slow if it needs to move a lot of files. However, no-op `git sparse-checkout set` (i.e. set the sparse-checkout cone to only include files at root, and do this on repeat) has these performance results on Linux in a monorepo with 2+ million files at `HEAD`:

```
Benchmark #1: baseline
  Time (mean ± σ):     10.465 s ±  0.018 s    [User: 9.885 s, System: 0.573 s]
  Range (min … max):   10.450 s … 10.497 s    5 runs
 
Benchmark #2: new code
  Time (mean ± σ):      68.9 ms ±   2.9 ms    [User: 45.8 ms, System: 17.1 ms]
  Range (min … max):    63.4 ms …  74.0 ms    41 runs
 
Summary
  'new code' ran
  151.89 ± 6.30 times faster than 'baseline'
```
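
The summary line can be sanity-checked from the two means. The sketch below recomputes the speedup ratio and, as an assumption about how the benchmark tool derives its `±` value, propagates the two standard deviations to first order (relative errors added in quadrature). The function name `speedup` is hypothetical.

```python
import math

def speedup(base_mean, base_sd, new_mean, new_sd):
    """Return (ratio, spread) for base/new, with first-order error propagation."""
    ratio = base_mean / new_mean
    # Assumption: relative uncertainties combine in quadrature.
    relative = math.sqrt((base_sd / base_mean) ** 2 + (new_sd / new_mean) ** 2)
    return ratio, ratio * relative

# Values from the benchmark output above, converted to milliseconds.
ratio, spread = speedup(10465.0, 18.0, 68.9, 2.9)
print(f"{ratio:.2f} ± {spread:.2f}")
```

The ratio reproduces the reported 151.89. The spread lands near the reported 6.30 rather than matching it exactly, presumably because the inputs used here are the rounded values shown in the output.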
dscho pushed a commit that referenced this pull request Jun 17, 2022
One thing I forgot when talking about the sparse index is that we have a performance test: `t/perf/p2000-sparse-operations.sh`. This test wasn't helpful for commands like `git merge` that need a particular set of input, but it works for more read-only operations.

Here is a quick demonstration of how this performance test works, so that we have a definitive measure of how your previous updates improved performance.

To get these results, I ran the following command in `t/perf`:

```
 ./run 4bcd533 f9255a5 f28fc01 b713582 -- p2000-sparse-operations.sh
```

The short-shas correspond to the merge commits for these PRs:

* #410 
* #421 
* #417 
* #419

The test takes a copy of the Git repository and creates several copies within a nested directory hierarchy.


```
Test                                                   4bcd533           f9255a5                  f28fc01                  b713582
-------------------------------------------------------------------------------------------------------------------------------------------------
2000.2: git status (full-v3)                           0.19(0.15+0.05)   0.19(0.16+0.05) +0.0%    0.20(0.18+0.03) +5.3%    0.19(0.17+0.04) +0.0% 
2000.3: git status (full-v4)                           0.20(0.18+0.04)   0.19(0.15+0.06) -5.0%    0.21(0.18+0.05) +5.0%    0.18(0.18+0.02) -10.0%
2000.4: git status (sparse-v3)                         0.04(0.04+0.04)   0.05(0.07+0.04) +25.0%   0.04(0.04+0.05) +0.0%    0.04(0.06+0.04) +0.0% 
2000.5: git status (sparse-v4)                         0.04(0.03+0.06)   0.04(0.05+0.05) +0.0%    0.05(0.05+0.04) +25.0%   0.05(0.06+0.04) +25.0%
2000.6: git add -A (full-v3)                           0.36(0.29+0.05)   0.38(0.28+0.07) +5.6%    0.36(0.31+0.05) +0.0%    0.37(0.31+0.05) +2.8% 
2000.7: git add -A (full-v4)                           0.34(0.27+0.06)   0.34(0.29+0.05) +0.0%    0.34(0.29+0.04) +0.0%    0.35(0.28+0.06) +2.9% 
2000.8: git add -A (sparse-v3)                         0.06(0.07+0.04)   0.06(0.05+0.06) +0.0%    0.06(0.09+0.01) +0.0%    0.06(0.08+0.03) +0.0% 
2000.9: git add -A (sparse-v4)                         0.05(0.05+0.04)   0.05(0.05+0.07) +0.0%    0.05(0.04+0.06) +0.0%    0.06(0.06+0.05) +20.0%
2000.10: git add . (full-v3)                           0.38(0.31+0.05)   0.37(0.29+0.06) -2.6%    0.37(0.30+0.07) -2.6%    0.37(0.29+0.06) -2.6% 
2000.11: git add . (full-v4)                           0.35(0.31+0.04)   0.35(0.29+0.07) +0.0%    0.35(0.29+0.05) +0.0%    0.34(0.29+0.06) -2.9% 
2000.12: git add . (sparse-v3)                         0.06(0.06+0.05)   0.06(0.05+0.06) +0.0%    0.06(0.07+0.05) +0.0%    0.06(0.09+0.03) +0.0% 
2000.13: git add . (sparse-v4)                         0.06(0.06+0.06)   0.06(0.07+0.04) +0.0%    0.05(0.06+0.05) -16.7%   0.05(0.05+0.07) -16.7%
2000.14: git commit -a -m A (full-v3)                  0.48(0.37+0.08)   0.45(0.36+0.08) -6.2%    0.45(0.35+0.09) -6.2%    0.44(0.36+0.07) -8.3% 
2000.15: git commit -a -m A (full-v4)                  0.45(0.40+0.06)   0.43(0.34+0.07) -4.4%    0.45(0.37+0.06) +0.0%    0.42(0.36+0.05) -6.7% 
2000.16: git commit -a -m A (sparse-v3)                0.05(0.05+0.06)   0.05(0.05+0.03) +0.0%    0.05(0.06+0.06) +0.0%    0.05(0.04+0.06) +0.0% 
2000.17: git commit -a -m A (sparse-v4)                0.05(0.06+0.03)   0.05(0.06+0.04) +0.0%    0.06(0.07+0.05) +20.0%   0.05(0.04+0.06) +0.0% 
2000.18: git checkout -f - (full-v3)                   0.55(0.43+0.08)   0.54(0.46+0.05) -1.8%    0.55(0.46+0.07) +0.0%    0.54(0.40+0.10) -1.8% 
2000.19: git checkout -f - (full-v4)                   0.55(0.41+0.09)   0.50(0.40+0.09) -9.1%    0.51(0.46+0.05) -7.3%    0.51(0.44+0.06) -7.3% 
2000.20: git checkout -f - (sparse-v3)                 0.06(0.09+0.03)   0.06(0.08+0.03) +0.0%    0.06(0.06+0.05) +0.0%    0.07(0.09+0.03) +16.7%
2000.21: git checkout -f - (sparse-v4)                 0.06(0.08+0.04)   0.05(0.07+0.05) -16.7%   0.05(0.07+0.04) -16.7%   0.06(0.09+0.03) +0.0% 
```

All of the above were already integrated.

```
2000.22: git reset (full-v3)                           0.41(0.32+0.06)   0.40(0.31+0.06) -2.4%    0.41(0.33+0.05) +0.0%    0.42(0.34+0.04) +2.4% 
2000.23: git reset (full-v4)                           0.37(0.32+0.05)   0.35(0.30+0.05) -5.4%    0.37(0.30+0.05) +0.0%    0.35(0.31+0.03) -5.4% 
2000.24: git reset (sparse-v3)                         0.68(0.65+0.05)   0.55(0.52+0.04) -19.1%   0.04(0.05+0.04) -94.1%   0.04(0.05+0.04) -94.1%
2000.25: git reset (sparse-v4)                         0.70(0.65+0.05)   0.54(0.50+0.06) -22.9%   0.04(0.07+0.01) -94.3%   0.03(0.05+0.05) -95.7%
2000.26: git reset --hard (full-v3)                    0.54(0.43+0.07)   0.53(0.43+0.06) -1.9%    0.55(0.46+0.05) +1.9%    0.55(0.44+0.06) +1.9% 
2000.27: git reset --hard (full-v4)                    0.50(0.45+0.03)   0.50(0.43+0.05) +0.0%    0.49(0.41+0.06) -2.0%    0.50(0.42+0.05) +0.0% 
2000.28: git reset --hard (sparse-v3)                  0.83(0.76+0.06)   0.68(0.62+0.05) -18.1%   0.07(0.05+0.02) -91.6%   0.07(0.05+0.02) -91.6%
2000.29: git reset --hard (sparse-v4)                  0.80(0.75+0.05)   0.69(0.62+0.06) -13.8%   0.07(0.04+0.02) -91.2%   0.07(0.04+0.03) -91.2%
```

As expected, `git reset [--hard]` improves with the sparse index integration, but remains constant across the full index case.

```
2000.30: git update-index --add --remove (full-v3)     0.03(0.01+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.02+0.01) +0.0%    0.03(0.01+0.01) +0.0% 
2000.31: git update-index --add --remove (full-v4)     0.03(0.02+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.03+0.00) +0.0%    0.03(0.02+0.01) +0.0% 
2000.32: git update-index --add --remove (sparse-v3)   0.57(0.54+0.02)   0.43(0.42+0.00) -24.6%   0.44(0.41+0.03) -22.8%   0.44(0.42+0.01) -22.8%
2000.33: git update-index --add --remove (sparse-v4)   0.56(0.52+0.04)   0.43(0.42+0.01) -23.2%   0.44(0.42+0.02) -21.4%   0.42(0.41+0.01) -25.0%
```

These do not change significantly because #423 is not merged.

```
2000.34: git diff (full-v3)                            0.07(0.05+0.03)   0.06(0.05+0.03) -14.3%   0.07(0.05+0.03) +0.0%    0.06(0.05+0.03) -14.3%
2000.35: git diff (full-v4)                            0.06(0.05+0.03)   0.06(0.05+0.02) +0.0%    0.06(0.05+0.02) +0.0%    0.06(0.06+0.02) +0.0% 
2000.36: git diff (sparse-v3)                          0.25(0.23+0.03)   0.17(0.17+0.02) -32.0%   0.18(0.18+0.02) -28.0%   0.01(0.03+0.03) -96.0%
2000.37: git diff (sparse-v4)                          0.25(0.22+0.05)   0.16(0.16+0.01) -36.0%   0.18(0.15+0.04) -28.0%   0.01(0.04+0.02) -96.0%
2000.38: git diff --staged (full-v3)                   0.03(0.01+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.02+0.01) +0.0%    0.03(0.02+0.00) +0.0% 
2000.39: git diff --staged (full-v4)                   0.04(0.03+0.01)   0.03(0.02+0.01) -25.0%   0.03(0.03+0.00) -25.0%   0.03(0.03+0.00) -25.0%
2000.40: git diff --staged (sparse-v3)                 0.21(0.19+0.01)   0.15(0.13+0.01) -28.6%   0.15(0.14+0.01) -28.6%   0.01(0.01+0.00) -95.2%
2000.41: git diff --staged (sparse-v4)                 0.22(0.21+0.01)   0.14(0.11+0.03) -36.4%   0.15(0.13+0.02) -31.8%   0.01(0.01+0.00) -95.5%
```

The `git diff` improvements are measurable.

```
2000.42: git sparse-checkout reapply (full-v3)         0.63(0.54+0.05)   0.56(0.48+0.04) -11.1%   0.57(0.48+0.03) -9.5%    0.59(0.48+0.05) -6.3% 
2000.43: git sparse-checkout reapply (full-v4)         0.60(0.54+0.02)   0.51(0.46+0.03) -15.0%   0.54(0.48+0.02) -10.0%   0.50(0.44+0.04) -16.7%
2000.44: git sparse-checkout reapply (sparse-v3)       0.91(0.86+0.05)   0.05(0.05+0.00) -94.5%   0.06(0.05+0.01) -93.4%   0.06(0.06+0.00) -93.4%
2000.45: git sparse-checkout reapply (sparse-v4)       0.92(0.88+0.04)   0.05(0.05+0.00) -94.6%   0.05(0.05+0.01) -94.6%   0.05(0.04+0.01) -94.6%
```

Finally, the `git sparse-checkout` measurements are also present.

This test script is particularly valuable when contributing changes upstream. It can be good to start by adding the lines to the performance test in an early commit, then demonstrating the performance change by copying the necessary lines from the output table into your commit message.
dscho pushed a commit that referenced this pull request Jun 17, 2022
One thing I forgot when talking about the sparse index is that we have a performance test: `t/perf/p2000-sparse-operations.sh`. This test wasn't helpful for commands like `git merge` that need a particular set of input, but it works for more read-only operations.

Here is a quick demonstration of how this performance test works, so that we have a definitive measure of how your previous updates improved performance.

To get these results, I ran the following command in `t/perf`:

```
 ./run 4bcd533 f9255a5 f28fc01 b713582 -- p2000-sparse-operations.sh
```

The short-shas correspond to the merge commits for these PRs:

* #410 
* #421 
* #417 
* #419

The test takes a copy of the Git repository and creates several copies within a nested directory hierarchy.


```
Test                                                   4bcd533           f9255a5                  f28fc01                  b713582
-------------------------------------------------------------------------------------------------------------------------------------------------
2000.2: git status (full-v3)                           0.19(0.15+0.05)   0.19(0.16+0.05) +0.0%    0.20(0.18+0.03) +5.3%    0.19(0.17+0.04) +0.0% 
2000.3: git status (full-v4)                           0.20(0.18+0.04)   0.19(0.15+0.06) -5.0%    0.21(0.18+0.05) +5.0%    0.18(0.18+0.02) -10.0%
2000.4: git status (sparse-v3)                         0.04(0.04+0.04)   0.05(0.07+0.04) +25.0%   0.04(0.04+0.05) +0.0%    0.04(0.06+0.04) +0.0% 
2000.5: git status (sparse-v4)                         0.04(0.03+0.06)   0.04(0.05+0.05) +0.0%    0.05(0.05+0.04) +25.0%   0.05(0.06+0.04) +25.0%
2000.6: git add -A (full-v3)                           0.36(0.29+0.05)   0.38(0.28+0.07) +5.6%    0.36(0.31+0.05) +0.0%    0.37(0.31+0.05) +2.8% 
2000.7: git add -A (full-v4)                           0.34(0.27+0.06)   0.34(0.29+0.05) +0.0%    0.34(0.29+0.04) +0.0%    0.35(0.28+0.06) +2.9% 
2000.8: git add -A (sparse-v3)                         0.06(0.07+0.04)   0.06(0.05+0.06) +0.0%    0.06(0.09+0.01) +0.0%    0.06(0.08+0.03) +0.0% 
2000.9: git add -A (sparse-v4)                         0.05(0.05+0.04)   0.05(0.05+0.07) +0.0%    0.05(0.04+0.06) +0.0%    0.06(0.06+0.05) +20.0%
2000.10: git add . (full-v3)                           0.38(0.31+0.05)   0.37(0.29+0.06) -2.6%    0.37(0.30+0.07) -2.6%    0.37(0.29+0.06) -2.6% 
2000.11: git add . (full-v4)                           0.35(0.31+0.04)   0.35(0.29+0.07) +0.0%    0.35(0.29+0.05) +0.0%    0.34(0.29+0.06) -2.9% 
2000.12: git add . (sparse-v3)                         0.06(0.06+0.05)   0.06(0.05+0.06) +0.0%    0.06(0.07+0.05) +0.0%    0.06(0.09+0.03) +0.0% 
2000.13: git add . (sparse-v4)                         0.06(0.06+0.06)   0.06(0.07+0.04) +0.0%    0.05(0.06+0.05) -16.7%   0.05(0.05+0.07) -16.7%
2000.14: git commit -a -m A (full-v3)                  0.48(0.37+0.08)   0.45(0.36+0.08) -6.2%    0.45(0.35+0.09) -6.2%    0.44(0.36+0.07) -8.3% 
2000.15: git commit -a -m A (full-v4)                  0.45(0.40+0.06)   0.43(0.34+0.07) -4.4%    0.45(0.37+0.06) +0.0%    0.42(0.36+0.05) -6.7% 
2000.16: git commit -a -m A (sparse-v3)                0.05(0.05+0.06)   0.05(0.05+0.03) +0.0%    0.05(0.06+0.06) +0.0%    0.05(0.04+0.06) +0.0% 
2000.17: git commit -a -m A (sparse-v4)                0.05(0.06+0.03)   0.05(0.06+0.04) +0.0%    0.06(0.07+0.05) +20.0%   0.05(0.04+0.06) +0.0% 
2000.18: git checkout -f - (full-v3)                   0.55(0.43+0.08)   0.54(0.46+0.05) -1.8%    0.55(0.46+0.07) +0.0%    0.54(0.40+0.10) -1.8% 
2000.19: git checkout -f - (full-v4)                   0.55(0.41+0.09)   0.50(0.40+0.09) -9.1%    0.51(0.46+0.05) -7.3%    0.51(0.44+0.06) -7.3% 
2000.20: git checkout -f - (sparse-v3)                 0.06(0.09+0.03)   0.06(0.08+0.03) +0.0%    0.06(0.06+0.05) +0.0%    0.07(0.09+0.03) +16.7%
2000.21: git checkout -f - (sparse-v4)                 0.06(0.08+0.04)   0.05(0.07+0.05) -16.7%   0.05(0.07+0.04) -16.7%   0.06(0.09+0.03) +0.0% 
```

All of the above were already integrated.

```
2000.22: git reset (full-v3)                           0.41(0.32+0.06)   0.40(0.31+0.06) -2.4%    0.41(0.33+0.05) +0.0%    0.42(0.34+0.04) +2.4% 
2000.23: git reset (full-v4)                           0.37(0.32+0.05)   0.35(0.30+0.05) -5.4%    0.37(0.30+0.05) +0.0%    0.35(0.31+0.03) -5.4% 
2000.24: git reset (sparse-v3)                         0.68(0.65+0.05)   0.55(0.52+0.04) -19.1%   0.04(0.05+0.04) -94.1%   0.04(0.05+0.04) -94.1%
2000.25: git reset (sparse-v4)                         0.70(0.65+0.05)   0.54(0.50+0.06) -22.9%   0.04(0.07+0.01) -94.3%   0.03(0.05+0.05) -95.7%
2000.26: git reset --hard (full-v3)                    0.54(0.43+0.07)   0.53(0.43+0.06) -1.9%    0.55(0.46+0.05) +1.9%    0.55(0.44+0.06) +1.9% 
2000.27: git reset --hard (full-v4)                    0.50(0.45+0.03)   0.50(0.43+0.05) +0.0%    0.49(0.41+0.06) -2.0%    0.50(0.42+0.05) +0.0% 
2000.28: git reset --hard (sparse-v3)                  0.83(0.76+0.06)   0.68(0.62+0.05) -18.1%   0.07(0.05+0.02) -91.6%   0.07(0.05+0.02) -91.6%
2000.29: git reset --hard (sparse-v4)                  0.80(0.75+0.05)   0.69(0.62+0.06) -13.8%   0.07(0.04+0.02) -91.2%   0.07(0.04+0.03) -91.2%
```

As expected, `git reset [--hard]` improves with the sparse index integration, but remains constant across the full index case.

```
2000.30: git update-index --add --remove (full-v3)     0.03(0.01+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.02+0.01) +0.0%    0.03(0.01+0.01) +0.0% 
2000.31: git update-index --add --remove (full-v4)     0.03(0.02+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.03+0.00) +0.0%    0.03(0.02+0.01) +0.0% 
2000.32: git update-index --add --remove (sparse-v3)   0.57(0.54+0.02)   0.43(0.42+0.00) -24.6%   0.44(0.41+0.03) -22.8%   0.44(0.42+0.01) -22.8%
2000.33: git update-index --add --remove (sparse-v4)   0.56(0.52+0.04)   0.43(0.42+0.01) -23.2%   0.44(0.42+0.02) -21.4%   0.42(0.41+0.01) -25.0%
```

These do not change significantly because #423 is not merged.

```
2000.34: git diff (full-v3)                            0.07(0.05+0.03)   0.06(0.05+0.03) -14.3%   0.07(0.05+0.03) +0.0%    0.06(0.05+0.03) -14.3%
2000.35: git diff (full-v4)                            0.06(0.05+0.03)   0.06(0.05+0.02) +0.0%    0.06(0.05+0.02) +0.0%    0.06(0.06+0.02) +0.0% 
2000.36: git diff (sparse-v3)                          0.25(0.23+0.03)   0.17(0.17+0.02) -32.0%   0.18(0.18+0.02) -28.0%   0.01(0.03+0.03) -96.0%
2000.37: git diff (sparse-v4)                          0.25(0.22+0.05)   0.16(0.16+0.01) -36.0%   0.18(0.15+0.04) -28.0%   0.01(0.04+0.02) -96.0%
2000.38: git diff --staged (full-v3)                   0.03(0.01+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.02+0.01) +0.0%    0.03(0.02+0.00) +0.0% 
2000.39: git diff --staged (full-v4)                   0.04(0.03+0.01)   0.03(0.02+0.01) -25.0%   0.03(0.03+0.00) -25.0%   0.03(0.03+0.00) -25.0%
2000.40: git diff --staged (sparse-v3)                 0.21(0.19+0.01)   0.15(0.13+0.01) -28.6%   0.15(0.14+0.01) -28.6%   0.01(0.01+0.00) -95.2%
2000.41: git diff --staged (sparse-v4)                 0.22(0.21+0.01)   0.14(0.11+0.03) -36.4%   0.15(0.13+0.02) -31.8%   0.01(0.01+0.00) -95.5%
```

The `git diff` improvements are measurable.

```
2000.42: git sparse-checkout reapply (full-v3)         0.63(0.54+0.05)   0.56(0.48+0.04) -11.1%   0.57(0.48+0.03) -9.5%    0.59(0.48+0.05) -6.3% 
2000.43: git sparse-checkout reapply (full-v4)         0.60(0.54+0.02)   0.51(0.46+0.03) -15.0%   0.54(0.48+0.02) -10.0%   0.50(0.44+0.04) -16.7%
2000.44: git sparse-checkout reapply (sparse-v3)       0.91(0.86+0.05)   0.05(0.05+0.00) -94.5%   0.06(0.05+0.01) -93.4%   0.06(0.06+0.00) -93.4%
2000.45: git sparse-checkout reapply (sparse-v4)       0.92(0.88+0.04)   0.05(0.05+0.00) -94.6%   0.05(0.05+0.01) -94.6%   0.05(0.04+0.01) -94.6%
```

Finally, the `git sparse-checkout` measurements are also present.

This test script is particularly valuable when contributing changes upstream. It can be good to start by adding the lines to the performance test in an early commit, then demonstrating the performance change by copying the necessary lines from the output table into your commit message.
dscho pushed a commit that referenced this pull request Jun 18, 2022
…ckout` builtin

This integrates the `sparse-checkout` builtin with the sparse index. The tricky part here is that we need to partially expand the index when we are modifying the sparse-checkout definition.

Note that we modify the pattern list in a careful way: we create a `struct pattern_list` in-memory in `builtin/sparse-checkout.c` then apply those patterns to the index before writing the patterns to the sparse-checkout file. The `update_sparsity()` method does the work to assign the `SKIP_WORKTREE` bit appropriately, but this doesn't work if the files that are within the new sparse-checkout cone are still hidden behind a sparse directory.

The new `expand_to_pattern_list()` method does the hard work of expanding the sparse directories that are now within the new patterns. This expands only as far as needed, possibly creating new sparse directory entries.
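
To make "expands only as far as needed" concrete, here is a toy model in Python. It is not the C implementation of `expand_to_pattern_list()`: the function names `needs_expansion` and `expand_to_cone`, the flat list of paths, and the trailing-slash convention for sparse directory entries are all assumptions made for illustration.

```python
def needs_expansion(sparse_dir, cone_dirs):
    """A sparse directory must be opened if it contains a cone directory
    or is itself inside one. Both arguments use a trailing '/'."""
    return any(c.startswith(sparse_dir) or sparse_dir.startswith(c)
               for c in cone_dirs)


def expand_to_cone(index, head_files, cone_dirs):
    """Toy partial expansion: replace a sparse directory entry by its
    immediate children only when the new cone requires it."""
    result = []
    pending = list(index)
    while pending:
        entry = pending.pop()
        if not entry.endswith("/") or not needs_expansion(entry, cone_dirs):
            # Plain file, or a sparse directory outside the new cone.
            result.append(entry)
            continue
        children = set()
        for path in head_files:
            if path.startswith(entry):
                rest = path[len(entry):]
                name, sep, _ = rest.partition("/")
                # Subdirectories come back as new sparse directory entries.
                children.add(entry + name + sep)
        pending.extend(children)
    return sorted(result)


head_files = ["A/B/b.txt", "A/C/c.txt", "A/a.txt", "D/d.txt", "root.txt"]
sparse_index = ["A/", "D/", "root.txt"]
print(expand_to_cone(sparse_index, head_files, {"A/B/"}))
```

In this sketch, moving the cone to `A/B/` opens `A/` and `A/B/`, while `A/C/` appears as a new sparse directory entry and `D/` stays collapsed.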

This method does not contract existing files to sparse directories. A big reason is the check for ignored files as we delete those directories. The `clean_tracked_sparse_directories()` method is called after `update_sparsity()`, but we need to read the `A/B/.gitignore` file (or lack thereof) before we can delete `A/B/`. If we convert to sparse too quickly, then we lose this information and cause a full expansion.

Most of the correctness is handled by existing tests in `t1092`, but I add checks for `ensure_not_expanded` in some hopefully interesting cases.

As for performance, `git sparse-checkout set` can be slow if it needs to move a lot of files. However, no-op `git sparse-checkout set` (i.e. set the sparse-checkout cone to only include files at root, and do this on repeat) has these performance results on Linux in a monorepo with 2+ million files at `HEAD`:

```
Benchmark #1: baseline
  Time (mean ± σ):     10.465 s ±  0.018 s    [User: 9.885 s, System: 0.573 s]
  Range (min … max):   10.450 s … 10.497 s    5 runs
 
Benchmark #2: new code
  Time (mean ± σ):      68.9 ms ±   2.9 ms    [User: 45.8 ms, System: 17.1 ms]
  Range (min … max):    63.4 ms …  74.0 ms    41 runs
 
Summary
  'new code' ran
  151.89 ± 6.30 times faster than 'baseline'
```
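
As a sanity check on the summary line, the speedup is simply the ratio of the two reported means. The snippet below is a small sketch using only the numbers printed above; the variable names are illustrative.

```python
baseline_mean_s = 10.465   # "baseline" mean, in seconds
new_mean_s = 68.9e-3       # "new code" mean, 68.9 ms converted to seconds

speedup = baseline_mean_s / new_mean_s
print(f"{speedup:.2f} times faster")
```

The reported uncertainty cannot be recovered exactly from the rounded values shown in the output, so it is left out of this sketch.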
dscho pushed a commit that referenced this pull request Jun 18, 2022
One thing I forgot when talking about the sparse index is that we have a performance test: `t/perf/p2000-sparse-operations.sh`. This test wasn't helpful for commands like `git merge` that need a particular set of input, but it works for more read-only operations.

Here is a quick demonstration of how this performance test works, so we can have a definitive measure of how your previous updates improved performance.

To get these results, I ran the following command in `t/perf`:

```
 ./run 4bcd533 f9255a5 f28fc01 b713582 -- p2000-sparse-operations.sh
```

The short-shas correspond to the merge commits for these PRs:

* #410 
* #421 
* #417 
* #419

The test takes a copy of the Git repository and creates several copies within a nested directory hierarchy.


```
Test                                                   4bcd533       f9255a5              f28fc01              b713582           
-------------------------------------------------------------------------------------------------------------------------------------------------
2000.2: git status (full-v3)                           0.19(0.15+0.05)   0.19(0.16+0.05) +0.0%    0.20(0.18+0.03) +5.3%    0.19(0.17+0.04) +0.0% 
2000.3: git status (full-v4)                           0.20(0.18+0.04)   0.19(0.15+0.06) -5.0%    0.21(0.18+0.05) +5.0%    0.18(0.18+0.02) -10.0%
2000.4: git status (sparse-v3)                         0.04(0.04+0.04)   0.05(0.07+0.04) +25.0%   0.04(0.04+0.05) +0.0%    0.04(0.06+0.04) +0.0% 
2000.5: git status (sparse-v4)                         0.04(0.03+0.06)   0.04(0.05+0.05) +0.0%    0.05(0.05+0.04) +25.0%   0.05(0.06+0.04) +25.0%
2000.6: git add -A (full-v3)                           0.36(0.29+0.05)   0.38(0.28+0.07) +5.6%    0.36(0.31+0.05) +0.0%    0.37(0.31+0.05) +2.8% 
2000.7: git add -A (full-v4)                           0.34(0.27+0.06)   0.34(0.29+0.05) +0.0%    0.34(0.29+0.04) +0.0%    0.35(0.28+0.06) +2.9% 
2000.8: git add -A (sparse-v3)                         0.06(0.07+0.04)   0.06(0.05+0.06) +0.0%    0.06(0.09+0.01) +0.0%    0.06(0.08+0.03) +0.0% 
2000.9: git add -A (sparse-v4)                         0.05(0.05+0.04)   0.05(0.05+0.07) +0.0%    0.05(0.04+0.06) +0.0%    0.06(0.06+0.05) +20.0%
2000.10: git add . (full-v3)                           0.38(0.31+0.05)   0.37(0.29+0.06) -2.6%    0.37(0.30+0.07) -2.6%    0.37(0.29+0.06) -2.6% 
2000.11: git add . (full-v4)                           0.35(0.31+0.04)   0.35(0.29+0.07) +0.0%    0.35(0.29+0.05) +0.0%    0.34(0.29+0.06) -2.9% 
2000.12: git add . (sparse-v3)                         0.06(0.06+0.05)   0.06(0.05+0.06) +0.0%    0.06(0.07+0.05) +0.0%    0.06(0.09+0.03) +0.0% 
2000.13: git add . (sparse-v4)                         0.06(0.06+0.06)   0.06(0.07+0.04) +0.0%    0.05(0.06+0.05) -16.7%   0.05(0.05+0.07) -16.7%
2000.14: git commit -a -m A (full-v3)                  0.48(0.37+0.08)   0.45(0.36+0.08) -6.2%    0.45(0.35+0.09) -6.2%    0.44(0.36+0.07) -8.3% 
2000.15: git commit -a -m A (full-v4)                  0.45(0.40+0.06)   0.43(0.34+0.07) -4.4%    0.45(0.37+0.06) +0.0%    0.42(0.36+0.05) -6.7% 
2000.16: git commit -a -m A (sparse-v3)                0.05(0.05+0.06)   0.05(0.05+0.03) +0.0%    0.05(0.06+0.06) +0.0%    0.05(0.04+0.06) +0.0% 
2000.17: git commit -a -m A (sparse-v4)                0.05(0.06+0.03)   0.05(0.06+0.04) +0.0%    0.06(0.07+0.05) +20.0%   0.05(0.04+0.06) +0.0% 
2000.18: git checkout -f - (full-v3)                   0.55(0.43+0.08)   0.54(0.46+0.05) -1.8%    0.55(0.46+0.07) +0.0%    0.54(0.40+0.10) -1.8% 
2000.19: git checkout -f - (full-v4)                   0.55(0.41+0.09)   0.50(0.40+0.09) -9.1%    0.51(0.46+0.05) -7.3%    0.51(0.44+0.06) -7.3% 
2000.20: git checkout -f - (sparse-v3)                 0.06(0.09+0.03)   0.06(0.08+0.03) +0.0%    0.06(0.06+0.05) +0.0%    0.07(0.09+0.03) +16.7%
2000.21: git checkout -f - (sparse-v4)                 0.06(0.08+0.04)   0.05(0.07+0.05) -16.7%   0.05(0.07+0.04) -16.7%   0.06(0.09+0.03) +0.0% 
```

All of the above were already integrated.

```
2000.22: git reset (full-v3)                           0.41(0.32+0.06)   0.40(0.31+0.06) -2.4%    0.41(0.33+0.05) +0.0%    0.42(0.34+0.04) +2.4% 
2000.23: git reset (full-v4)                           0.37(0.32+0.05)   0.35(0.30+0.05) -5.4%    0.37(0.30+0.05) +0.0%    0.35(0.31+0.03) -5.4% 
2000.24: git reset (sparse-v3)                         0.68(0.65+0.05)   0.55(0.52+0.04) -19.1%   0.04(0.05+0.04) -94.1%   0.04(0.05+0.04) -94.1%
2000.25: git reset (sparse-v4)                         0.70(0.65+0.05)   0.54(0.50+0.06) -22.9%   0.04(0.07+0.01) -94.3%   0.03(0.05+0.05) -95.7%
2000.26: git reset --hard (full-v3)                    0.54(0.43+0.07)   0.53(0.43+0.06) -1.9%    0.55(0.46+0.05) +1.9%    0.55(0.44+0.06) +1.9% 
2000.27: git reset --hard (full-v4)                    0.50(0.45+0.03)   0.50(0.43+0.05) +0.0%    0.49(0.41+0.06) -2.0%    0.50(0.42+0.05) +0.0% 
2000.28: git reset --hard (sparse-v3)                  0.83(0.76+0.06)   0.68(0.62+0.05) -18.1%   0.07(0.05+0.02) -91.6%   0.07(0.05+0.02) -91.6%
2000.29: git reset --hard (sparse-v4)                  0.80(0.75+0.05)   0.69(0.62+0.06) -13.8%   0.07(0.04+0.02) -91.2%   0.07(0.04+0.03) -91.2%
```

As expected, `git reset [--hard]` improves with the sparse index integration, but remains constant across the full index case.
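
The percentage columns in these tables are consistent with a change relative to the first column (`4bcd533`). A minimal sketch of that arithmetic, with a hypothetical helper name `relative_change` and values taken from the `git reset (sparse-v3)` row:

```python
def relative_change(baseline, candidate):
    """Percent change of `candidate` relative to `baseline` (elapsed seconds)."""
    return (candidate - baseline) / baseline * 100.0


# git reset (sparse-v3): 0.68 at 4bcd533, 0.55 at f9255a5, 0.04 at f28fc01
print(f"{relative_change(0.68, 0.55):+.1f}%")
print(f"{relative_change(0.68, 0.04):+.1f}%")
```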

```
2000.30: git update-index --add --remove (full-v3)     0.03(0.01+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.02+0.01) +0.0%    0.03(0.01+0.01) +0.0% 
2000.31: git update-index --add --remove (full-v4)     0.03(0.02+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.03+0.00) +0.0%    0.03(0.02+0.01) +0.0% 
2000.32: git update-index --add --remove (sparse-v3)   0.57(0.54+0.02)   0.43(0.42+0.00) -24.6%   0.44(0.41+0.03) -22.8%   0.44(0.42+0.01) -22.8%
2000.33: git update-index --add --remove (sparse-v4)   0.56(0.52+0.04)   0.43(0.42+0.01) -23.2%   0.44(0.42+0.02) -21.4%   0.42(0.41+0.01) -25.0%
```

These do not change significantly because #423 is not merged.

```
2000.34: git diff (full-v3)                            0.07(0.05+0.03)   0.06(0.05+0.03) -14.3%   0.07(0.05+0.03) +0.0%    0.06(0.05+0.03) -14.3%
2000.35: git diff (full-v4)                            0.06(0.05+0.03)   0.06(0.05+0.02) +0.0%    0.06(0.05+0.02) +0.0%    0.06(0.06+0.02) +0.0% 
2000.36: git diff (sparse-v3)                          0.25(0.23+0.03)   0.17(0.17+0.02) -32.0%   0.18(0.18+0.02) -28.0%   0.01(0.03+0.03) -96.0%
2000.37: git diff (sparse-v4)                          0.25(0.22+0.05)   0.16(0.16+0.01) -36.0%   0.18(0.15+0.04) -28.0%   0.01(0.04+0.02) -96.0%
2000.38: git diff --staged (full-v3)                   0.03(0.01+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.02+0.01) +0.0%    0.03(0.02+0.00) +0.0% 
2000.39: git diff --staged (full-v4)                   0.04(0.03+0.01)   0.03(0.02+0.01) -25.0%   0.03(0.03+0.00) -25.0%   0.03(0.03+0.00) -25.0%
2000.40: git diff --staged (sparse-v3)                 0.21(0.19+0.01)   0.15(0.13+0.01) -28.6%   0.15(0.14+0.01) -28.6%   0.01(0.01+0.00) -95.2%
2000.41: git diff --staged (sparse-v4)                 0.22(0.21+0.01)   0.14(0.11+0.03) -36.4%   0.15(0.13+0.02) -31.8%   0.01(0.01+0.00) -95.5%
```

The `git diff` improvements are measurable.

```
2000.42: git sparse-checkout reapply (full-v3)         0.63(0.54+0.05)   0.56(0.48+0.04) -11.1%   0.57(0.48+0.03) -9.5%    0.59(0.48+0.05) -6.3% 
2000.43: git sparse-checkout reapply (full-v4)         0.60(0.54+0.02)   0.51(0.46+0.03) -15.0%   0.54(0.48+0.02) -10.0%   0.50(0.44+0.04) -16.7%
2000.44: git sparse-checkout reapply (sparse-v3)       0.91(0.86+0.05)   0.05(0.05+0.00) -94.5%   0.06(0.05+0.01) -93.4%   0.06(0.06+0.00) -93.4%
2000.45: git sparse-checkout reapply (sparse-v4)       0.92(0.88+0.04)   0.05(0.05+0.00) -94.6%   0.05(0.05+0.01) -94.6%   0.05(0.04+0.01) -94.6%
```

Finally, the `git sparse-checkout` measurements are also present.

This test script is particularly valuable when contributing changes upstream. It can be good to start by adding the lines to the performance test in an early commit, then demonstrating the performance change by copying the necessary lines from the output table into your commit message.
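
Picking the relevant rows out of the table for a commit message can be done by hand; the snippet below is only an illustrative sketch of that filtering step. The helper `select_rows` is hypothetical and assumes rows shaped like `<number>: <command> (<mode>)   <timings>`, as in the tables above.

```python
def select_rows(table_text, command):
    """Return the table rows whose test name is exactly `command`."""
    rows = []
    for line in table_text.splitlines():
        _number, sep, rest = line.partition(": ")
        # Require " (" right after the command so that "git reset"
        # does not also match "git reset --hard".
        if sep and rest.startswith(command + " ("):
            rows.append(line)
    return rows


sample = """\
2000.24: git reset (sparse-v3)          0.68(0.65+0.05)   0.55(0.52+0.04) -19.1%
2000.28: git reset --hard (sparse-v3)   0.83(0.76+0.06)   0.68(0.62+0.05) -18.1%
2000.36: git diff (sparse-v3)           0.25(0.23+0.03)   0.17(0.17+0.02) -32.0%
"""
for row in select_rows(sample, "git reset"):
    print(row)
```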
dscho pushed a commit that referenced this pull request Jun 22, 2022
…ckout` builtin

dscho pushed a commit that referenced this pull request Jun 22, 2022
dscho pushed a commit that referenced this pull request Jun 27, 2022
…ckout` builtin

dscho pushed a commit that referenced this pull request Jun 27, 2022
One thing I forgot when talking about the sparse index is that we have a performance test: `t/perf/p2000-sparse-operations.sh`. This test wasn't helpful for commands like `git merge` that need a particular set of input, but it works for more read-only operations.

Here is a quick demonstration of how this performance test works, so we can have a definitive measure of how your previous updates improved performance.

To get these results, I ran the following command in `t/perf`:

```
 ./run 4bcd533 f9255a5 f28fc01 b713582 -- p2000-sparse-operations.sh
```

The short-shas correspond to the merge commits for these PRs:

* #410 
* #421 
* #417 
* #419

The test takes a copy of the Git repository and creates several copies within a nested directory hierarchy.


```
Test                                                   4bcd533       f9255a5              f28fc01              b713582           
-------------------------------------------------------------------------------------------------------------------------------------------------
2000.2: git status (full-v3)                           0.19(0.15+0.05)   0.19(0.16+0.05) +0.0%    0.20(0.18+0.03) +5.3%    0.19(0.17+0.04) +0.0% 
2000.3: git status (full-v4)                           0.20(0.18+0.04)   0.19(0.15+0.06) -5.0%    0.21(0.18+0.05) +5.0%    0.18(0.18+0.02) -10.0%
2000.4: git status (sparse-v3)                         0.04(0.04+0.04)   0.05(0.07+0.04) +25.0%   0.04(0.04+0.05) +0.0%    0.04(0.06+0.04) +0.0% 
2000.5: git status (sparse-v4)                         0.04(0.03+0.06)   0.04(0.05+0.05) +0.0%    0.05(0.05+0.04) +25.0%   0.05(0.06+0.04) +25.0%
2000.6: git add -A (full-v3)                           0.36(0.29+0.05)   0.38(0.28+0.07) +5.6%    0.36(0.31+0.05) +0.0%    0.37(0.31+0.05) +2.8% 
2000.7: git add -A (full-v4)                           0.34(0.27+0.06)   0.34(0.29+0.05) +0.0%    0.34(0.29+0.04) +0.0%    0.35(0.28+0.06) +2.9% 
2000.8: git add -A (sparse-v3)                         0.06(0.07+0.04)   0.06(0.05+0.06) +0.0%    0.06(0.09+0.01) +0.0%    0.06(0.08+0.03) +0.0% 
2000.9: git add -A (sparse-v4)                         0.05(0.05+0.04)   0.05(0.05+0.07) +0.0%    0.05(0.04+0.06) +0.0%    0.06(0.06+0.05) +20.0%
2000.10: git add . (full-v3)                           0.38(0.31+0.05)   0.37(0.29+0.06) -2.6%    0.37(0.30+0.07) -2.6%    0.37(0.29+0.06) -2.6% 
2000.11: git add . (full-v4)                           0.35(0.31+0.04)   0.35(0.29+0.07) +0.0%    0.35(0.29+0.05) +0.0%    0.34(0.29+0.06) -2.9% 
2000.12: git add . (sparse-v3)                         0.06(0.06+0.05)   0.06(0.05+0.06) +0.0%    0.06(0.07+0.05) +0.0%    0.06(0.09+0.03) +0.0% 
2000.13: git add . (sparse-v4)                         0.06(0.06+0.06)   0.06(0.07+0.04) +0.0%    0.05(0.06+0.05) -16.7%   0.05(0.05+0.07) -16.7%
2000.14: git commit -a -m A (full-v3)                  0.48(0.37+0.08)   0.45(0.36+0.08) -6.2%    0.45(0.35+0.09) -6.2%    0.44(0.36+0.07) -8.3% 
2000.15: git commit -a -m A (full-v4)                  0.45(0.40+0.06)   0.43(0.34+0.07) -4.4%    0.45(0.37+0.06) +0.0%    0.42(0.36+0.05) -6.7% 
2000.16: git commit -a -m A (sparse-v3)                0.05(0.05+0.06)   0.05(0.05+0.03) +0.0%    0.05(0.06+0.06) +0.0%    0.05(0.04+0.06) +0.0% 
2000.17: git commit -a -m A (sparse-v4)                0.05(0.06+0.03)   0.05(0.06+0.04) +0.0%    0.06(0.07+0.05) +20.0%   0.05(0.04+0.06) +0.0% 
2000.18: git checkout -f - (full-v3)                   0.55(0.43+0.08)   0.54(0.46+0.05) -1.8%    0.55(0.46+0.07) +0.0%    0.54(0.40+0.10) -1.8% 
2000.19: git checkout -f - (full-v4)                   0.55(0.41+0.09)   0.50(0.40+0.09) -9.1%    0.51(0.46+0.05) -7.3%    0.51(0.44+0.06) -7.3% 
2000.20: git checkout -f - (sparse-v3)                 0.06(0.09+0.03)   0.06(0.08+0.03) +0.0%    0.06(0.06+0.05) +0.0%    0.07(0.09+0.03) +16.7%
2000.21: git checkout -f - (sparse-v4)                 0.06(0.08+0.04)   0.05(0.07+0.05) -16.7%   0.05(0.07+0.04) -16.7%   0.06(0.09+0.03) +0.0% 
```

All of the above were already integrated.

```
2000.22: git reset (full-v3)                           0.41(0.32+0.06)   0.40(0.31+0.06) -2.4%    0.41(0.33+0.05) +0.0%    0.42(0.34+0.04) +2.4% 
2000.23: git reset (full-v4)                           0.37(0.32+0.05)   0.35(0.30+0.05) -5.4%    0.37(0.30+0.05) +0.0%    0.35(0.31+0.03) -5.4% 
2000.24: git reset (sparse-v3)                         0.68(0.65+0.05)   0.55(0.52+0.04) -19.1%   0.04(0.05+0.04) -94.1%   0.04(0.05+0.04) -94.1%
2000.25: git reset (sparse-v4)                         0.70(0.65+0.05)   0.54(0.50+0.06) -22.9%   0.04(0.07+0.01) -94.3%   0.03(0.05+0.05) -95.7%
2000.26: git reset --hard (full-v3)                    0.54(0.43+0.07)   0.53(0.43+0.06) -1.9%    0.55(0.46+0.05) +1.9%    0.55(0.44+0.06) +1.9% 
2000.27: git reset --hard (full-v4)                    0.50(0.45+0.03)   0.50(0.43+0.05) +0.0%    0.49(0.41+0.06) -2.0%    0.50(0.42+0.05) +0.0% 
2000.28: git reset --hard (sparse-v3)                  0.83(0.76+0.06)   0.68(0.62+0.05) -18.1%   0.07(0.05+0.02) -91.6%   0.07(0.05+0.02) -91.6%
2000.29: git reset --hard (sparse-v4)                  0.80(0.75+0.05)   0.69(0.62+0.06) -13.8%   0.07(0.04+0.02) -91.2%   0.07(0.04+0.03) -91.2%
```

As expected, `git reset [--hard]` improves with the sparse index integration, but remains constant across the full index case.

```
2000.30: git update-index --add --remove (full-v3)     0.03(0.01+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.02+0.01) +0.0%    0.03(0.01+0.01) +0.0% 
2000.31: git update-index --add --remove (full-v4)     0.03(0.02+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.03+0.00) +0.0%    0.03(0.02+0.01) +0.0% 
2000.32: git update-index --add --remove (sparse-v3)   0.57(0.54+0.02)   0.43(0.42+0.00) -24.6%   0.44(0.41+0.03) -22.8%   0.44(0.42+0.01) -22.8%
2000.33: git update-index --add --remove (sparse-v4)   0.56(0.52+0.04)   0.43(0.42+0.01) -23.2%   0.44(0.42+0.02) -21.4%   0.42(0.41+0.01) -25.0%
```

These do not change significantly because #423 is not merged.

```
2000.34: git diff (full-v3)                            0.07(0.05+0.03)   0.06(0.05+0.03) -14.3%   0.07(0.05+0.03) +0.0%    0.06(0.05+0.03) -14.3%
2000.35: git diff (full-v4)                            0.06(0.05+0.03)   0.06(0.05+0.02) +0.0%    0.06(0.05+0.02) +0.0%    0.06(0.06+0.02) +0.0% 
2000.36: git diff (sparse-v3)                          0.25(0.23+0.03)   0.17(0.17+0.02) -32.0%   0.18(0.18+0.02) -28.0%   0.01(0.03+0.03) -96.0%
2000.37: git diff (sparse-v4)                          0.25(0.22+0.05)   0.16(0.16+0.01) -36.0%   0.18(0.15+0.04) -28.0%   0.01(0.04+0.02) -96.0%
2000.38: git diff --staged (full-v3)                   0.03(0.01+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.02+0.01) +0.0%    0.03(0.02+0.00) +0.0% 
2000.39: git diff --staged (full-v4)                   0.04(0.03+0.01)   0.03(0.02+0.01) -25.0%   0.03(0.03+0.00) -25.0%   0.03(0.03+0.00) -25.0%
2000.40: git diff --staged (sparse-v3)                 0.21(0.19+0.01)   0.15(0.13+0.01) -28.6%   0.15(0.14+0.01) -28.6%   0.01(0.01+0.00) -95.2%
2000.41: git diff --staged (sparse-v4)                 0.22(0.21+0.01)   0.14(0.11+0.03) -36.4%   0.15(0.13+0.02) -31.8%   0.01(0.01+0.00) -95.5%
```

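The `git diff` improvements are measurable: in the sparse cases, `git diff` drops from 0.25 s to 0.01 s (-96.0%) and `git diff --staged` from about 0.21 s to 0.01 s (about -95%) at `b713582`.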

```
2000.42: git sparse-checkout reapply (full-v3)         0.63(0.54+0.05)   0.56(0.48+0.04) -11.1%   0.57(0.48+0.03) -9.5%    0.59(0.48+0.05) -6.3% 
2000.43: git sparse-checkout reapply (full-v4)         0.60(0.54+0.02)   0.51(0.46+0.03) -15.0%   0.54(0.48+0.02) -10.0%   0.50(0.44+0.04) -16.7%
2000.44: git sparse-checkout reapply (sparse-v3)       0.91(0.86+0.05)   0.05(0.05+0.00) -94.5%   0.06(0.05+0.01) -93.4%   0.06(0.06+0.00) -93.4%
2000.45: git sparse-checkout reapply (sparse-v4)       0.92(0.88+0.04)   0.05(0.05+0.00) -94.6%   0.05(0.05+0.01) -94.6%   0.05(0.04+0.01) -94.6%
```

Finally, the `git sparse-checkout` measurements are also present.

This test script is particularly valuable when contributing changes upstream. A good approach is to add the relevant lines to the performance test in an early commit, then demonstrate the performance change by copying the necessary lines from the output table into your commit message.
derrickstolee added a commit that referenced this pull request Jun 27, 2022
…ckout` builtin

This integrates the `sparse-checkout` builtin with the sparse index. The tricky part here is that we need to partially expand the index when we are modifying the sparse-checkout definition.

Note that we modify the pattern list in a careful way: we create a `struct pattern_list` in memory in `builtin/sparse-checkout.c`, then apply those patterns to the index before writing the patterns to the sparse-checkout file. The `update_sparsity()` method does the work to assign the `SKIP_WORKTREE` bit appropriately, but this doesn't work if the files that are within the new sparse-checkout cone are still hidden behind a sparse directory.

The new `expand_to_pattern_list()` method does the hard work of expanding the sparse directories that are now within the new patterns. This expands only as far as needed, possibly creating new sparse directory entries.

This method does not contract existing files to sparse directories, largely because of the check for ignored files as we delete those directories. The `clean_tracked_sparse_directories()` method is called after `update_sparsity()`, but we need to read the `A/B/.gitignore` file (or lack thereof) before we can delete `A/B/`. If we convert to sparse too quickly, then we lose this information and cause a full expansion.

Most of the correctness is handled by existing tests in `t1092`, but I add checks for `ensure_not_expanded` in some hopefully interesting cases.

As for performance, `git sparse-checkout set` can be slow if it needs to move a lot of files. However, a no-op `git sparse-checkout set` (i.e. setting the sparse-checkout cone to only include files at root, and doing this repeatedly) has these performance results on Linux in a monorepo with 2+ million files at `HEAD`:

```
Benchmark #1: baseline
  Time (mean ± σ):     10.465 s ±  0.018 s    [User: 9.885 s, System: 0.573 s]
  Range (min … max):   10.450 s … 10.497 s    5 runs
 
Benchmark #2: new code
  Time (mean ± σ):      68.9 ms ±   2.9 ms    [User: 45.8 ms, System: 17.1 ms]
  Range (min … max):    63.4 ms …  74.0 ms    41 runs
 
Summary
  'new code' ran
  151.89 ± 6.30 times faster than 'baseline'
```
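The summary ratio can be sanity-checked from the two reported means. This is only a sketch of the arithmetic: the function name `speedup` is made up for illustration, and it uses the rounded means printed above, so it says nothing about how the `± 6.30` uncertainty was derived.

```shell
# Mean wall-clock times copied from the benchmark output above.
baseline_s=10.465   # baseline mean, in seconds
new_ms=68.9         # new code mean, in milliseconds

# speedup is a hypothetical helper: baseline (s) divided by new time (ms -> s).
speedup () {
	awk -v base="$1" -v new_ms="$2" \
		'BEGIN { printf "%.2f\n", base / (new_ms / 1000) }'
}

speedup "$baseline_s" "$new_ms"   # prints 151.89
```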
derrickstolee added a commit that referenced this pull request Jun 27, 2022
One thing I forgot when talking about the sparse index is that we have a performance test: `t/perf/p2000-sparse-operations.sh`. This test wasn't helpful for commands like `git merge` that need a particular set of input, but it works for more read-only operations.

Here is a quick demonstration of how this performance test works, so we can have a definitive measure of how your previous updates improved performance.

To get these results, I ran the following command in `t/perf`:

```
 ./run 4bcd533 f9255a5 f28fc01 b713582 -- p2000-sparse-operations.sh
```

The short SHAs correspond to the merge commits for these PRs:

* #410 
* #421 
* #417 
* #419

The test takes a copy of the Git repository and creates several copies within a nested directory hierarchy.


```
Test                                                   4bcd533           f9255a5                  f28fc01                  b713582
-------------------------------------------------------------------------------------------------------------------------------------------------
2000.2: git status (full-v3)                           0.19(0.15+0.05)   0.19(0.16+0.05) +0.0%    0.20(0.18+0.03) +5.3%    0.19(0.17+0.04) +0.0% 
2000.3: git status (full-v4)                           0.20(0.18+0.04)   0.19(0.15+0.06) -5.0%    0.21(0.18+0.05) +5.0%    0.18(0.18+0.02) -10.0%
2000.4: git status (sparse-v3)                         0.04(0.04+0.04)   0.05(0.07+0.04) +25.0%   0.04(0.04+0.05) +0.0%    0.04(0.06+0.04) +0.0% 
2000.5: git status (sparse-v4)                         0.04(0.03+0.06)   0.04(0.05+0.05) +0.0%    0.05(0.05+0.04) +25.0%   0.05(0.06+0.04) +25.0%
2000.6: git add -A (full-v3)                           0.36(0.29+0.05)   0.38(0.28+0.07) +5.6%    0.36(0.31+0.05) +0.0%    0.37(0.31+0.05) +2.8% 
2000.7: git add -A (full-v4)                           0.34(0.27+0.06)   0.34(0.29+0.05) +0.0%    0.34(0.29+0.04) +0.0%    0.35(0.28+0.06) +2.9% 
2000.8: git add -A (sparse-v3)                         0.06(0.07+0.04)   0.06(0.05+0.06) +0.0%    0.06(0.09+0.01) +0.0%    0.06(0.08+0.03) +0.0% 
2000.9: git add -A (sparse-v4)                         0.05(0.05+0.04)   0.05(0.05+0.07) +0.0%    0.05(0.04+0.06) +0.0%    0.06(0.06+0.05) +20.0%
2000.10: git add . (full-v3)                           0.38(0.31+0.05)   0.37(0.29+0.06) -2.6%    0.37(0.30+0.07) -2.6%    0.37(0.29+0.06) -2.6% 
2000.11: git add . (full-v4)                           0.35(0.31+0.04)   0.35(0.29+0.07) +0.0%    0.35(0.29+0.05) +0.0%    0.34(0.29+0.06) -2.9% 
2000.12: git add . (sparse-v3)                         0.06(0.06+0.05)   0.06(0.05+0.06) +0.0%    0.06(0.07+0.05) +0.0%    0.06(0.09+0.03) +0.0% 
2000.13: git add . (sparse-v4)                         0.06(0.06+0.06)   0.06(0.07+0.04) +0.0%    0.05(0.06+0.05) -16.7%   0.05(0.05+0.07) -16.7%
2000.14: git commit -a -m A (full-v3)                  0.48(0.37+0.08)   0.45(0.36+0.08) -6.2%    0.45(0.35+0.09) -6.2%    0.44(0.36+0.07) -8.3% 
2000.15: git commit -a -m A (full-v4)                  0.45(0.40+0.06)   0.43(0.34+0.07) -4.4%    0.45(0.37+0.06) +0.0%    0.42(0.36+0.05) -6.7% 
2000.16: git commit -a -m A (sparse-v3)                0.05(0.05+0.06)   0.05(0.05+0.03) +0.0%    0.05(0.06+0.06) +0.0%    0.05(0.04+0.06) +0.0% 
2000.17: git commit -a -m A (sparse-v4)                0.05(0.06+0.03)   0.05(0.06+0.04) +0.0%    0.06(0.07+0.05) +20.0%   0.05(0.04+0.06) +0.0% 
2000.18: git checkout -f - (full-v3)                   0.55(0.43+0.08)   0.54(0.46+0.05) -1.8%    0.55(0.46+0.07) +0.0%    0.54(0.40+0.10) -1.8% 
2000.19: git checkout -f - (full-v4)                   0.55(0.41+0.09)   0.50(0.40+0.09) -9.1%    0.51(0.46+0.05) -7.3%    0.51(0.44+0.06) -7.3% 
2000.20: git checkout -f - (sparse-v3)                 0.06(0.09+0.03)   0.06(0.08+0.03) +0.0%    0.06(0.06+0.05) +0.0%    0.07(0.09+0.03) +16.7%
2000.21: git checkout -f - (sparse-v4)                 0.06(0.08+0.04)   0.05(0.07+0.05) -16.7%   0.05(0.07+0.04) -16.7%   0.06(0.09+0.03) +0.0% 
```

All of the commands above were already integrated with the sparse index.

```
2000.22: git reset (full-v3)                           0.41(0.32+0.06)   0.40(0.31+0.06) -2.4%    0.41(0.33+0.05) +0.0%    0.42(0.34+0.04) +2.4% 
2000.23: git reset (full-v4)                           0.37(0.32+0.05)   0.35(0.30+0.05) -5.4%    0.37(0.30+0.05) +0.0%    0.35(0.31+0.03) -5.4% 
2000.24: git reset (sparse-v3)                         0.68(0.65+0.05)   0.55(0.52+0.04) -19.1%   0.04(0.05+0.04) -94.1%   0.04(0.05+0.04) -94.1%
2000.25: git reset (sparse-v4)                         0.70(0.65+0.05)   0.54(0.50+0.06) -22.9%   0.04(0.07+0.01) -94.3%   0.03(0.05+0.05) -95.7%
2000.26: git reset --hard (full-v3)                    0.54(0.43+0.07)   0.53(0.43+0.06) -1.9%    0.55(0.46+0.05) +1.9%    0.55(0.44+0.06) +1.9% 
2000.27: git reset --hard (full-v4)                    0.50(0.45+0.03)   0.50(0.43+0.05) +0.0%    0.49(0.41+0.06) -2.0%    0.50(0.42+0.05) +0.0% 
2000.28: git reset --hard (sparse-v3)                  0.83(0.76+0.06)   0.68(0.62+0.05) -18.1%   0.07(0.05+0.02) -91.6%   0.07(0.05+0.02) -91.6%
2000.29: git reset --hard (sparse-v4)                  0.80(0.75+0.05)   0.69(0.62+0.06) -13.8%   0.07(0.04+0.02) -91.2%   0.07(0.04+0.03) -91.2%
```

As expected, `git reset [--hard]` improves with the sparse index integration but remains essentially constant in the full index cases.
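The percentage columns appear to be relative to the first revision (`4bcd533`). A minimal sketch of that calculation, assuming the change is computed as `(new - base) / base` from the elapsed times; `pct_change` is a hypothetical helper, and the perf tooling may work from unrounded values, so small differences from the printed column are possible:

```shell
# pct_change BASE NEW: signed relative change of NEW against BASE, in percent.
pct_change () {
	awk -v base="$1" -v new="$2" \
		'BEGIN { printf "%+.1f%%\n", (new - base) / base * 100 }'
}

# Elapsed times for "git reset (sparse-v3)" taken from the table above.
pct_change 0.68 0.55   # 4bcd533 -> f9255a5, prints -19.1%
pct_change 0.68 0.04   # 4bcd533 -> f28fc01, prints -94.1%
```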

```
2000.30: git update-index --add --remove (full-v3)     0.03(0.01+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.02+0.01) +0.0%    0.03(0.01+0.01) +0.0% 
2000.31: git update-index --add --remove (full-v4)     0.03(0.02+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.03+0.00) +0.0%    0.03(0.02+0.01) +0.0% 
2000.32: git update-index --add --remove (sparse-v3)   0.57(0.54+0.02)   0.43(0.42+0.00) -24.6%   0.44(0.41+0.03) -22.8%   0.44(0.42+0.01) -22.8%
2000.33: git update-index --add --remove (sparse-v4)   0.56(0.52+0.04)   0.43(0.42+0.01) -23.2%   0.44(0.42+0.02) -21.4%   0.42(0.41+0.01) -25.0%
```

These do not change significantly because #423 is not merged.

```
2000.34: git diff (full-v3)                            0.07(0.05+0.03)   0.06(0.05+0.03) -14.3%   0.07(0.05+0.03) +0.0%    0.06(0.05+0.03) -14.3%
2000.35: git diff (full-v4)                            0.06(0.05+0.03)   0.06(0.05+0.02) +0.0%    0.06(0.05+0.02) +0.0%    0.06(0.06+0.02) +0.0% 
2000.36: git diff (sparse-v3)                          0.25(0.23+0.03)   0.17(0.17+0.02) -32.0%   0.18(0.18+0.02) -28.0%   0.01(0.03+0.03) -96.0%
2000.37: git diff (sparse-v4)                          0.25(0.22+0.05)   0.16(0.16+0.01) -36.0%   0.18(0.15+0.04) -28.0%   0.01(0.04+0.02) -96.0%
2000.38: git diff --staged (full-v3)                   0.03(0.01+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.02+0.01) +0.0%    0.03(0.02+0.00) +0.0% 
2000.39: git diff --staged (full-v4)                   0.04(0.03+0.01)   0.03(0.02+0.01) -25.0%   0.03(0.03+0.00) -25.0%   0.03(0.03+0.00) -25.0%
2000.40: git diff --staged (sparse-v3)                 0.21(0.19+0.01)   0.15(0.13+0.01) -28.6%   0.15(0.14+0.01) -28.6%   0.01(0.01+0.00) -95.2%
2000.41: git diff --staged (sparse-v4)                 0.22(0.21+0.01)   0.14(0.11+0.03) -36.4%   0.15(0.13+0.02) -31.8%   0.01(0.01+0.00) -95.5%
```

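The `git diff` improvements are measurable: in the sparse cases, `git diff` drops from 0.25 s to 0.01 s (-96.0%) and `git diff --staged` from about 0.21 s to 0.01 s (about -95%) at `b713582`.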

```
2000.42: git sparse-checkout reapply (full-v3)         0.63(0.54+0.05)   0.56(0.48+0.04) -11.1%   0.57(0.48+0.03) -9.5%    0.59(0.48+0.05) -6.3% 
2000.43: git sparse-checkout reapply (full-v4)         0.60(0.54+0.02)   0.51(0.46+0.03) -15.0%   0.54(0.48+0.02) -10.0%   0.50(0.44+0.04) -16.7%
2000.44: git sparse-checkout reapply (sparse-v3)       0.91(0.86+0.05)   0.05(0.05+0.00) -94.5%   0.06(0.05+0.01) -93.4%   0.06(0.06+0.00) -93.4%
2000.45: git sparse-checkout reapply (sparse-v4)       0.92(0.88+0.04)   0.05(0.05+0.00) -94.6%   0.05(0.05+0.01) -94.6%   0.05(0.04+0.01) -94.6%
```

Finally, the `git sparse-checkout` measurements are also present.

This test script is particularly valuable when contributing changes upstream. A good approach is to add the relevant lines to the performance test in an early commit, then demonstrate the performance change by copying the necessary lines from the output table into your commit message.
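To illustrate why one added line in the performance test yields four rows in the table, here is a stand-in sketch. It assumes the script's helper is named `test_perf_on_all` and loops over the four repository variants seen in the table labels; the real helper times the command in each copy, whereas this stand-in only prints the row labels.

```shell
# Stand-in for the helper assumed to exist in p2000-sparse-operations.sh.
# It does NOT time anything; it only shows the labels one line expands to.
test_perf_on_all () {
	command="$*"
	for repo in full-v3 full-v4 sparse-v3 sparse-v4
	do
		printf '%s (%s)\n' "$command" "$repo"
	done
}

# One line added to the test corresponds to four rows in the output table.
test_perf_on_all git sparse-checkout reapply
```

Adding such a line in an early commit means the later commits in the series can quote just the affected rows.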
dscho pushed a commit that referenced this pull request Jul 12, 2022
…ckout` builtin

dscho pushed a commit that referenced this pull request Jul 12, 2022
derrickstolee added a commit that referenced this pull request Aug 31, 2022
…ckout` builtin

derrickstolee added a commit that referenced this pull request Aug 31, 2022
One thing I forgot when talking about the sparse index is that we have a performance test: `t/perf/p2000-sparse-operations.sh`. This test wasn't helpful for commands like `git merge` that need a particular set of input, but work for more read-only operations.

Here is a quick demonstration of how this performance test works so we could have a definitive measure of how your previous updates improved performance. 

To get these results, I ran the following command in `t/perf`:

```
 ./run 4bcd533 f9255a5 f28fc01 b713582 -- p2000-sparse-operations.sh
```

The short-shas correspond to the merge commits for these PRs:

* #410 
* #421 
* #417 
* #419

The test takes a copy of the Git repository and creates several copies within a nested directory heirarchy.


```
Test                                                   4bcd533       f9255a5              f28fc01              b713582           
-------------------------------------------------------------------------------------------------------------------------------------------------
2000.2: git status (full-v3)                           0.19(0.15+0.05)   0.19(0.16+0.05) +0.0%    0.20(0.18+0.03) +5.3%    0.19(0.17+0.04) +0.0% 
2000.3: git status (full-v4)                           0.20(0.18+0.04)   0.19(0.15+0.06) -5.0%    0.21(0.18+0.05) +5.0%    0.18(0.18+0.02) -10.0%
2000.4: git status (sparse-v3)                         0.04(0.04+0.04)   0.05(0.07+0.04) +25.0%   0.04(0.04+0.05) +0.0%    0.04(0.06+0.04) +0.0% 
2000.5: git status (sparse-v4)                         0.04(0.03+0.06)   0.04(0.05+0.05) +0.0%    0.05(0.05+0.04) +25.0%   0.05(0.06+0.04) +25.0%
2000.6: git add -A (full-v3)                           0.36(0.29+0.05)   0.38(0.28+0.07) +5.6%    0.36(0.31+0.05) +0.0%    0.37(0.31+0.05) +2.8% 
2000.7: git add -A (full-v4)                           0.34(0.27+0.06)   0.34(0.29+0.05) +0.0%    0.34(0.29+0.04) +0.0%    0.35(0.28+0.06) +2.9% 
2000.8: git add -A (sparse-v3)                         0.06(0.07+0.04)   0.06(0.05+0.06) +0.0%    0.06(0.09+0.01) +0.0%    0.06(0.08+0.03) +0.0% 
2000.9: git add -A (sparse-v4)                         0.05(0.05+0.04)   0.05(0.05+0.07) +0.0%    0.05(0.04+0.06) +0.0%    0.06(0.06+0.05) +20.0%
2000.10: git add . (full-v3)                           0.38(0.31+0.05)   0.37(0.29+0.06) -2.6%    0.37(0.30+0.07) -2.6%    0.37(0.29+0.06) -2.6% 
2000.11: git add . (full-v4)                           0.35(0.31+0.04)   0.35(0.29+0.07) +0.0%    0.35(0.29+0.05) +0.0%    0.34(0.29+0.06) -2.9% 
2000.12: git add . (sparse-v3)                         0.06(0.06+0.05)   0.06(0.05+0.06) +0.0%    0.06(0.07+0.05) +0.0%    0.06(0.09+0.03) +0.0% 
2000.13: git add . (sparse-v4)                         0.06(0.06+0.06)   0.06(0.07+0.04) +0.0%    0.05(0.06+0.05) -16.7%   0.05(0.05+0.07) -16.7%
2000.14: git commit -a -m A (full-v3)                  0.48(0.37+0.08)   0.45(0.36+0.08) -6.2%    0.45(0.35+0.09) -6.2%    0.44(0.36+0.07) -8.3% 
2000.15: git commit -a -m A (full-v4)                  0.45(0.40+0.06)   0.43(0.34+0.07) -4.4%    0.45(0.37+0.06) +0.0%    0.42(0.36+0.05) -6.7% 
2000.16: git commit -a -m A (sparse-v3)                0.05(0.05+0.06)   0.05(0.05+0.03) +0.0%    0.05(0.06+0.06) +0.0%    0.05(0.04+0.06) +0.0% 
2000.17: git commit -a -m A (sparse-v4)                0.05(0.06+0.03)   0.05(0.06+0.04) +0.0%    0.06(0.07+0.05) +20.0%   0.05(0.04+0.06) +0.0% 
2000.18: git checkout -f - (full-v3)                   0.55(0.43+0.08)   0.54(0.46+0.05) -1.8%    0.55(0.46+0.07) +0.0%    0.54(0.40+0.10) -1.8% 
2000.19: git checkout -f - (full-v4)                   0.55(0.41+0.09)   0.50(0.40+0.09) -9.1%    0.51(0.46+0.05) -7.3%    0.51(0.44+0.06) -7.3% 
2000.20: git checkout -f - (sparse-v3)                 0.06(0.09+0.03)   0.06(0.08+0.03) +0.0%    0.06(0.06+0.05) +0.0%    0.07(0.09+0.03) +16.7%
2000.21: git checkout -f - (sparse-v4)                 0.06(0.08+0.04)   0.05(0.07+0.05) -16.7%   0.05(0.07+0.04) -16.7%   0.06(0.09+0.03) +0.0% 
```

All of the above were already integrated.

```
2000.22: git reset (full-v3)                           0.41(0.32+0.06)   0.40(0.31+0.06) -2.4%    0.41(0.33+0.05) +0.0%    0.42(0.34+0.04) +2.4% 
2000.23: git reset (full-v4)                           0.37(0.32+0.05)   0.35(0.30+0.05) -5.4%    0.37(0.30+0.05) +0.0%    0.35(0.31+0.03) -5.4% 
2000.24: git reset (sparse-v3)                         0.68(0.65+0.05)   0.55(0.52+0.04) -19.1%   0.04(0.05+0.04) -94.1%   0.04(0.05+0.04) -94.1%
2000.25: git reset (sparse-v4)                         0.70(0.65+0.05)   0.54(0.50+0.06) -22.9%   0.04(0.07+0.01) -94.3%   0.03(0.05+0.05) -95.7%
2000.26: git reset --hard (full-v3)                    0.54(0.43+0.07)   0.53(0.43+0.06) -1.9%    0.55(0.46+0.05) +1.9%    0.55(0.44+0.06) +1.9% 
2000.27: git reset --hard (full-v4)                    0.50(0.45+0.03)   0.50(0.43+0.05) +0.0%    0.49(0.41+0.06) -2.0%    0.50(0.42+0.05) +0.0% 
2000.28: git reset --hard (sparse-v3)                  0.83(0.76+0.06)   0.68(0.62+0.05) -18.1%   0.07(0.05+0.02) -91.6%   0.07(0.05+0.02) -91.6%
2000.29: git reset --hard (sparse-v4)                  0.80(0.75+0.05)   0.69(0.62+0.06) -13.8%   0.07(0.04+0.02) -91.2%   0.07(0.04+0.03) -91.2%
```

As expected, `git reset [--hard]` improves with the sparse index integration, but remains constant across the full index case.

```
2000.30: git update-index --add --remove (full-v3)     0.03(0.01+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.02+0.01) +0.0%    0.03(0.01+0.01) +0.0% 
2000.31: git update-index --add --remove (full-v4)     0.03(0.02+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.03+0.00) +0.0%    0.03(0.02+0.01) +0.0% 
2000.32: git update-index --add --remove (sparse-v3)   0.57(0.54+0.02)   0.43(0.42+0.00) -24.6%   0.44(0.41+0.03) -22.8%   0.44(0.42+0.01) -22.8%
2000.33: git update-index --add --remove (sparse-v4)   0.56(0.52+0.04)   0.43(0.42+0.01) -23.2%   0.44(0.42+0.02) -21.4%   0.42(0.41+0.01) -25.0%
```

These do not change significantly because #423 is not merged.

```
2000.34: git diff (full-v3)                            0.07(0.05+0.03)   0.06(0.05+0.03) -14.3%   0.07(0.05+0.03) +0.0%    0.06(0.05+0.03) -14.3%
2000.35: git diff (full-v4)                            0.06(0.05+0.03)   0.06(0.05+0.02) +0.0%    0.06(0.05+0.02) +0.0%    0.06(0.06+0.02) +0.0% 
2000.36: git diff (sparse-v3)                          0.25(0.23+0.03)   0.17(0.17+0.02) -32.0%   0.18(0.18+0.02) -28.0%   0.01(0.03+0.03) -96.0%
2000.37: git diff (sparse-v4)                          0.25(0.22+0.05)   0.16(0.16+0.01) -36.0%   0.18(0.15+0.04) -28.0%   0.01(0.04+0.02) -96.0%
2000.38: git diff --staged (full-v3)                   0.03(0.01+0.01)   0.03(0.02+0.01) +0.0%    0.03(0.02+0.01) +0.0%    0.03(0.02+0.00) +0.0% 
2000.39: git diff --staged (full-v4)                   0.04(0.03+0.01)   0.03(0.02+0.01) -25.0%   0.03(0.03+0.00) -25.0%   0.03(0.03+0.00) -25.0%
2000.40: git diff --staged (sparse-v3)                 0.21(0.19+0.01)   0.15(0.13+0.01) -28.6%   0.15(0.14+0.01) -28.6%   0.01(0.01+0.00) -95.2%
2000.41: git diff --staged (sparse-v4)                 0.22(0.21+0.01)   0.14(0.11+0.03) -36.4%   0.15(0.13+0.02) -31.8%   0.01(0.01+0.00) -95.5%
```

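The `git diff` improvements are measurable: the sparse-index rows drop by about 95% in the final column (for example, from 0.25s to 0.01s for `git diff`).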

```
2000.42: git sparse-checkout reapply (full-v3)         0.63(0.54+0.05)   0.56(0.48+0.04) -11.1%   0.57(0.48+0.03) -9.5%    0.59(0.48+0.05) -6.3% 
2000.43: git sparse-checkout reapply (full-v4)         0.60(0.54+0.02)   0.51(0.46+0.03) -15.0%   0.54(0.48+0.02) -10.0%   0.50(0.44+0.04) -16.7%
2000.44: git sparse-checkout reapply (sparse-v3)       0.91(0.86+0.05)   0.05(0.05+0.00) -94.5%   0.06(0.05+0.01) -93.4%   0.06(0.06+0.00) -93.4%
2000.45: git sparse-checkout reapply (sparse-v4)       0.92(0.88+0.04)   0.05(0.05+0.00) -94.6%   0.05(0.05+0.01) -94.6%   0.05(0.04+0.01) -94.6%
```

Finally, the `git sparse-checkout reapply` measurements are also present; the sparse-index rows drop from about 0.9s to about 0.05s, a reduction of roughly 94%.

This test script is particularly valuable when contributing changes upstream. It can be good to start by adding the lines to the performance test in an early commit, then demonstrating the performance change by copying the necessary lines from the output table into your commit message.
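As a rough illustration of the copying step, the sketch below pulls only the rows for one command out of a saved output table so they can be pasted into a commit message. The helper name `rows_for` and the sample text are hypothetical; the only assumption about the table is that each row label looks like `NNNN.NN: <command> (<repo>)`.

```python
def rows_for(table, command):
    """Return the table rows whose label is exactly `command`.

    Matching on ': <command> (' keeps 'git reset' from also
    selecting the 'git reset --hard' rows.
    """
    needle = f": {command} ("
    return [line.rstrip() for line in table.splitlines() if needle in line]


sample = """\
2000.24: git reset (sparse-v3)          0.68(0.65+0.05)   0.55(0.52+0.04) -19.1%
2000.28: git reset --hard (sparse-v3)   0.83(0.76+0.06)   0.68(0.62+0.05) -18.1%
2000.36: git diff (sparse-v3)           0.25(0.23+0.03)   0.17(0.17+0.02) -32.0%
"""

for line in rows_for(sample, "git reset"):
    print(line)
```

Trailing whitespace is stripped so the pasted rows do not trip whitespace checks in the commit message.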