Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sparse Index: Design, Format, Tests #883

Conversation

derrickstolee
Copy link

@derrickstolee derrickstolee commented Feb 23, 2021

Here is the first full patch series submission coming out of the sparse-index RFC [1].

[1] https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@gmail.com/

I won't waste too much space here, because PATCH 1 includes a sizeable design document that describes the feature, the reasoning behind it, and my plan for getting this implemented widely throughout the codebase.

There are some new things here that were not in the RFC:

  • Design doc and format updates. (Patch 1)
  • Performance test script. (Patches 2 and 20)

Notably missing in this series from the RFC:

  • The mega-patch inserting ensure_full_index() throughout the codebase. That will be a follow-up series to this one.
  • The integrations with git status and git add to demonstrate the improved performance. Those will also appear in their own series later.

I plan to keep my latest work in this area in my 'sparse-index/wip' branch [2]. It includes all of the work from the RFC right now, updated with the work from this series.

[2] https://github.com/derrickstolee/git/tree/sparse-index/wip

Updates in V5

This version is updated to use an index extension instead of a repository format extension. Thanks, Szeder! This one change affects the range-diff quite a bit, so please review those changes carefully.

In particular: git sparse-checkout init --cone --sparse-index now sets a new index.sparse config option as an indicator that we should attempt writing the index in sparse form.

Updates in V4

  • Rebased onto the latest copy of ab/read-tree.
  • Updated the design document as per Junio's comments.
  • Updated the submodule handling in the performance test.
  • Followed up on some other review from Ævar, mostly style or commit message things.

Updates in V3

For this version, I took Ævar's latest patches and applied them to v2.31.0 and rebased this series on top. It uses his new "read_tree_at()" helper and the associated changes to the function pointer type.

  • Fixed more typos. Thanks Martin and Elijah!
  • Updated the test_sparse_match() macro to use "$@" instead of $*
  • Added a test that git sparse-checkout init --no-sparse-index rewrites the index to be full.

Updates in V2

  • Various typos and awkward grammar is fixed.
  • Cleaned up unnecessary commands in p2000-sparse-operations.sh
  • Added a comment to the sparse_index member of struct index_state.
  • Used tree_type, commit_type, and blob_type in test-read-cache.c.

Thanks,
-Stolee

Cc: newren@gmail.com
Cc: gitster@pobox.com
Cc: pclouds@gmail.com
Cc: jrnieder@gmail.com
cc: Martin Ågren martin.agren@gmail.com
cc: Derrick Stolee stolee@gmail.com
cc: SZEDER Gábor szeder.dev@gmail.com
cc: Ævar Arnfjörð Bjarmason avarab@gmail.com

@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 23, 2021

Submitted as pull.883.git.1614111270.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-883/derrickstolee/sparse-index/format-v1

To fetch this version to local tag pr-883/derrickstolee/sparse-index/format-v1:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-883/derrickstolee/sparse-index/format-v1

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 24, 2021

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> Here is the first full patch series submission coming out of the
> sparse-index RFC [1].

Wahoo!  I'll be reading these over the next few days.

> [1]
> https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@gmail.com/
>
> I won't waste too much space here, because PATCH 1 includes a sizeable
> design document that describes the feature, the reasoning behind it, and my
> plan for getting this implemented widely throughout the codebase.
>
> There are some new things here that were not in the RFC:
>
>  * Design doc and format updates. (Patch 1)
>  * Performance test script. (Patches 2 and 20)
>
> Notably missing in this series from the RFC:
>
>  * The mega-patch inserting ensure_full_index() throughout the codebase.
>    That will be a follow-up series to this one.
>  * The integrations with git status and git add to demonstrate the improved
>    performance. Those will also appear in their own series later.
>
> I plan to keep my latest work in this area in my 'sparse-index/wip' branch
> [2]. It includes all of the work from the RFC right now, updated with the
> work from this series.
>
> [2] https://github.com/derrickstolee/git/tree/sparse-index/wip
>
> Thanks, -Stolee
>
> Derrick Stolee (20):
>   sparse-index: design doc and format update
>   t/perf: add performance test for sparse operations
>   t1092: clean up script quoting
>   sparse-index: add guard to ensure full index
>   sparse-index: implement ensure_full_index()
>   t1092: compare sparse-checkout to sparse-index
>   test-read-cache: print cache entries with --table
>   test-tool: don't force full index
>   unpack-trees: ensure full index
>   sparse-checkout: hold pattern list in index
>   sparse-index: convert from full to sparse
>   submodule: sparse-index should not collapse links
>   unpack-trees: allow sparse directories
>   sparse-index: check index conversion happens
>   sparse-index: create extension for compatibility
>   sparse-checkout: toggle sparse index from builtin
>   sparse-checkout: disable sparse-index
>   cache-tree: integrate with sparse directory entries
>   sparse-index: loose integration with cache_tree_verify()
>   p2000: add sparse-index repos
>
>  Documentation/config/extensions.txt      |   7 +
>  Documentation/git-sparse-checkout.txt    |  14 ++
>  Documentation/technical/index-format.txt |   7 +
>  Documentation/technical/sparse-index.txt | 167 +++++++++++++
>  Makefile                                 |   1 +
>  builtin/sparse-checkout.c                |  44 +++-
>  cache-tree.c                             |  40 ++++
>  cache.h                                  |  12 +-
>  read-cache.c                             |  35 ++-
>  repo-settings.c                          |  15 ++
>  repository.c                             |  11 +-
>  repository.h                             |   3 +
>  setup.c                                  |   3 +
>  sparse-index.c                           | 290 +++++++++++++++++++++++
>  sparse-index.h                           |  11 +
>  t/README                                 |   3 +
>  t/helper/test-read-cache.c               |  61 ++++-
>  t/perf/p2000-sparse-operations.sh        | 104 ++++++++
>  t/t1091-sparse-checkout-builtin.sh       |  13 +
>  t/t1092-sparse-checkout-compatibility.sh | 136 +++++++++--
>  unpack-trees.c                           |  16 +-
>  21 files changed, 953 insertions(+), 40 deletions(-)
>  create mode 100644 Documentation/technical/sparse-index.txt
>  create mode 100644 sparse-index.c
>  create mode 100644 sparse-index.h
>  create mode 100755 t/perf/p2000-sparse-operations.sh
>
>
> base-commit: 966e671106b2fd38301e7c344c754fd118d0bb07
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-883%2Fderrickstolee%2Fsparse-index%2Fformat-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-883/derrickstolee/sparse-index/format-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/883
> --
> gitgitgadget

@@ -44,6 +44,13 @@ Git index format
localization, no special casing of directory separator '/'). Entries
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> This begins a long effort to update the index format to allow sparse
> directory entries. This should result in a significant improvement to
> Git commands when HEAD contains millions of files, but the user has
> selected many fewer files to keep in their sparse-checkout definition.
>
> Currently, the index format is only updated in the presence of
> extensions.sparseIndex instead of increasing a file format version
> number. This is temporary, and index v5 is part of the plan for future
> work in this area.
>
> The design document details many of the reasons for embarking on this
> work, and also the plan for completing it safely.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/technical/index-format.txt |   7 +
>  Documentation/technical/sparse-index.txt | 167 +++++++++++++++++++++++
>  2 files changed, 174 insertions(+)
>  create mode 100644 Documentation/technical/sparse-index.txt
>
> diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
> index b633482b1bdf..387126582556 100644
> --- a/Documentation/technical/index-format.txt
> +++ b/Documentation/technical/index-format.txt
> @@ -44,6 +44,13 @@ Git index format
>    localization, no special casing of directory separator '/'). Entries
>    with the same name are sorted by their stage field.
>
> +  An index entry typically represents a file. However, if sparse-checkout
> +  is enabled in cone mode (`core.sparseCheckoutCone` is enabled) and the
> +  `extensions.sparseIndex` extension is enabled, then the index may
> +  contain entries for directories outside of the sparse-checkout definition.
> +  These entries have mode `0040000`, include the `SKIP_WORKTREE` bit, and
> +  the path ends in a directory separator.
> +
>    32-bit ctime seconds, the last time a file's metadata changed
>      this is stat(2) data
>
> diff --git a/Documentation/technical/sparse-index.txt b/Documentation/technical/sparse-index.txt
> new file mode 100644
> index 000000000000..9070836f0655
> --- /dev/null
> +++ b/Documentation/technical/sparse-index.txt
> @@ -0,0 +1,167 @@
> +Git Sparse-Index Design Document
> +================================
> +
> +The sparse-checkout feature allows users to focus a working directory on
> +a subset of the files at HEAD. The cone mode patterns, enabled by
> +`core.sparseCheckoutCone`, allow for very fast pattern matching to
> +discover which files at HEAD belong in the sparse-checkout cone.
> +
> +Three important scale dimensions for a Git worktree are:
> +
> +* `HEAD`: How many files are present at `HEAD`?
> +
> +* Populated: How many files are within the sparse-checkout cone.
> +
> +* Modified: How many files has the user modified in the working directory?
> +
> +We will use big-O notation -- O(X) -- to denote how expensive certain
> +operations are in terms of these dimensions.
> +
> +These dimensions are ordered by their magnitude: users (typically) modify
> +fewer files than are populated, and we can only populate files at `HEAD`.
> +These dimensions are also ordered by how expensive they are per item: it
> +is expensive to detect a modified file than it is to write one that we
> +know must be populated; changing `HEAD` only really requires updating the
> +index.
> +
> +Problems occur if there is an extreme imbalance in these dimensions. For
> +example, if `HEAD` contains millions of paths but the populated set has
> +only tens of thousands, then commands like `git status` and `git add` can
> +be dominated by operations that require O(`HEAD`) operations instead of
> +O(Populated). Primarily, the cost is in parsing and rewriting the index,
> +which is filled primarily with files at `HEAD` that are marked with the
> +`SKIP_WORKTREE` bit.
> +
> +The sparse-index intends to take these commands that read and modify the
> +index from O(`HEAD`) to O(Populated). To do this, we need to modify the
> +index format in a significant way: add "sparse directory" entries.
> +
> +With cone mode patterns, it is possible to detect when an entire
> +directory will have its contents outside of the sparse-checkout definition.
> +Instead of listing all of the files it contains as individual entries, a
> +sparse-index contains an entry with the directory name, referencing the
> +object ID of the tree at `HEAD` and marked with the `SKIP_WORKTREE` bit.
> +If we need to discover the details for paths within that directory, we
> +can parse trees to find that list.
> +
> +This addition of sparse-directory entries violates expectations about the

Violates current expectations, yes.  Documentation tends to live a
long time, and I suspect that 2-3 years from now reading this sentence
might be jarring since we'll have modified the code to have an updated
set of expectations.  Maybe a simple "As of time of writing, ..." at
the beginning of the sentence here?  Or maybe I'm just being overly
worried...

> +index format and its in-memory data structure. There are many consumers in
> +the codebase that expect to iterate through all of the index entries and
> +see only files. In addition, they expect to see all files at `HEAD`. One
> +way to handle this is to parse trees to replace a sparse-directory entry
> +with all of the files within that tree as the index is loaded. However,
> +parsing trees is slower than parsing the index format, so that is a slower
> +operation than if we left the index alone.
> +
> +The implementation plan below follows four phases to slowly integrate with
> +the sparse-index. The intention is to incrementally update Git commands to
> +interact safely with the sparse-index without significant slowdowns. This
> +may not always be possible, but the hope is that the primary commands that
> +users need in their daily work are dramatically improved.
> +
> +Phase I: Format and initial speedups
> +------------------------------------
> +
> +During this phase, Git learns to enable the sparse-index and safely parse
> +one. Protections are put in place so that every consumer of the in-memory
> +data structure can operate with its current assumption of every file at
> +`HEAD`.
> +
> +At first, every index parse will expand the sparse-directory entries into
> +the full list of paths at `HEAD`. This will be slower in all cases. The
> +only noticable change in behavior will be that the serialized index file

noticeable

> +contains sparse-directory entries.
> +
> +To start, we use a new repository extension, `extensions.sparseIndex`, to
> +allow inserting sparse-directory entries into indexes with file format
> +versions 2, 3, and 4. This prevents Git versions that do not understand
> +the sparse-index from operating on one, but it also prevents other
> +operations that do not use the index at all. A new format, index v5, will
> +be introduced that includes sparse-directory entries by default. It might
> +also introduce other features that have been considered for improving the
> +index, as well.
> +
> +Next, consumers of the index will be guarded against operating on a
> +sparse-index by inserting calls to `ensure_full_index()` or
> +`expand_index_to_path()`. After these guards are in place, we can begin
> +leaving sparse-directory entries in the in-memory index structure.
> +
> +Even after inserting these guards, we will keep expanding sparse-indexes
> +for most Git commands using the `command_requires_full_index` repository
> +setting. This setting will be on by default and disabled one builtin at a
> +time until we have sufficient confidence that all of the index operations
> +are properly guarded.
> +
> +To complete this phase, the commands `git status` and `git add` will be
> +integrated with the sparse-index so that they operate with O(Populated)
> +performance. They will be carefully tested for operations within and
> +outside the sparse-checkout definition.

Good plan so far, but there's something else bugging me a little here.
One thing we noticed with our usage of `sparse-checkout` is that
although unimportant _tracked_ files go away, leftover build files and
other untracked files stick around.  So, although 'git status'
shouldn't have to check the tracked files anymore, it is still going
to have to look at each of the *untracked* files and compare to
.gitignore files in order to correctly classify each file as ignored
or just plain untracked.  Our `sparsify` tool has for a long time
tried to warn about such files when changing the sparsity
patterns/modules and had an --remove-old-ignores option for clearing
out ignored files that are found within directories that are sparse
(Meaning the directories where all files under them are marked
SKIP_WORKTREE.). I was never sure whether a warning was enough, or if
pushing that option more made sense, but about a month ago my
colleagues made the tool just auto-invoke that option from other
sparsify invocations.  To my knowledge, there have been no complaints
about that being automatically turned on; but there were
complaints/confusion before about the directories being left around.
(Of course, non-ignored files are still left around by that option.)

I'm worried that since sparse-checkout doesn't do anything to help
with all these untracked/ignored files, we might not get all the
performance improvements we want from a `git status` with sparse
directories.  We'll be dropping from walking O(2*HEAD) files (1 source
+ 1 object file) down to O(HEAD) files (just the object files) rather
than actually getting down to O(Populated).

> +
> +Phase II: Careful integrations
> +------------------------------
> +
> +This phase focuses on ensuring that all index extensions and APIs work
> +well with a sparse-index. This requires significant increases to our test
> +coverage, especially for operations that interact with the working
> +directory outside of the sparse-checkout definition. Some of these
> +behaviors may not be the desirable ones, such as some tests already
> +marked for failure in `t1092-sparse-checkout-compatibility.sh`.
> +
> +The index extensions that may require special integrations are:
> +
> +* FS Monitor
> +* Untracked cache
> +
> +While integrating with these features, we should look for patterns that
> +might lead to better APIs for interacting with the index. Coalescing
> +common usage patterns into an API call can reduce the number of places
> +where sparse-directories need to be handled carefully.

Makes sense.

> +Phase III: Important command speedups
> +-------------------------------------
> +
> +At this point, the patterns for testing and implementing sparse-directory
> +logic should be relatively stable. This phase focuses on updating some of
> +the most common builtins that use the index to operate as O(Populated).
> +Here is a potential list of commands that could be valuable to integrate
> +at this point:
> +
> +* `git commit`
> +* `git checkout`
> +* `git merge`
> +* `git rebase`
> +
> +Along with `git status` and `git add`, these commands cover the majority
> +of users' interactions with the working directory.

Sounds like a good plan as well.

I hope we get to make this specific to the merge-ort backend.  It
localizes the index-related code to (a) a call to unpack_trees()
called from checkout-like code (which would probably automatically be
handled by your updates to git checkout), and (b) a single function
named record_conflicted_index_entries().  I feel it should be pretty
easy to update.

In contrast, the idea of attempting to update merge-recursive with
this kind of change sounds overwhelming.

>  In addition, we can
> +integrate with these commands:
> +
> +* `git grep`
> +* `git rm`
> +
> +These have been proposed as some whose behavior could change when in a
> +repo with a sparse-checkout definition. It would be good to include this
> +behavior automatically when using a sparse-index. Some clarity is needed
> +to make the behavior switch clear to the user.

Is this leftover from before recent events?  I think this portion of
the document should just be stricken.

I argued about how these were buggy as-is due SKIP_WORKTREE always
having been an incomplete implementation of an idea at [1], but didn't
hear a further response from you.  I'm curious if you disagreed with
my reasoning, or it just slipped through the cracks in a busy schedule
and this portion of the document was leftover from before.  In my
opinion, both commands are just buggy and should be fixed for general
sparse-checkout usage cases, not just for sparse-index.

As for git grep, it has options for searching the working tree
(default) OR searching the index (--cached) OR searching an old commit
(passing a REVISION).  But never some combination or more than one of
these.  The fact that it combined some in the cases of SKIP_WORKTREE
entries looks entirely like a bug to me.  For the same reasons I
argued that --untracked and --cached are incompatible[2], we shouldn't
be combining results from searching the working tree and searching the
index.  Luckily, this fix has already been submitted[3] and picked up
in mt/grep-sparse-checkout and is marked in the cooking emails as
"Will merge to next".

As for git rm, I'll quote from my email to Matheus:

"""As far as the longer term discussion about making git rm configurable...
_If_ it comes up again in the future, I will argue that if git rm
should have configuration to delete paths outside the sparsity
specification, then git add should have configuration to add paths
outside the sparsity specification that happen to be present despite
being SKIP_WORKTREE, that git diff with no revision arguments (nor
--cached) should have configuration to diff against paths that are
SKIP_WORKTREE but happen to be present, that git status should have
configuration to report on changes to paths that are SKIP_WORKTREE but
happen to be present, that git checkout should have configuration to
write files to the working tree despite matching sparsity paths, etc.
And I'll argue that you do ALL of those or you're being inconsistent.
I hope that people see these are actually all the same request and
that it is horribly inconsistent to do some of these and not others,
and that at least by the time I get to mentioning checkout that they
realize it's a crazy request.  We should just tell users to extend
their sparsity if they want the working copy (and commands that
interact with the working copy) to handle the additional paths.  Maybe
I'm just really biased, but I don't see how this makes sense.  I would
argue more about it, but no one has responded.  My plan was to just
fix the default behavior, and then see if anyone ever actually cared
enough to come back and ask for more configurability."""

Also, for rm, Matheus has already submitted the fix[4], though at
Junio's request he separated out some fixes for git-add as a separate
preliminary series[5] and then will resubmit the other `add` and `rm`
fixes.

[1] https://lore.kernel.org/git/CABPp-BHwNoVnooqDFPAsZxBT9aR5Dwk5D9sDRCvYSb8akxAJgA@mail.gmail.com/
[2] https://lore.kernel.org/git/xmqqtuql0yfp.fsf@gitster.c.googlers.com/
[3] https://lore.kernel.org/git/5f3f7ac77039d41d1692ceae4b0c5df3bb45b74a.1612901326.git.matheus.bernardino@usp.br/
[4] https://lore.kernel.org/git/61a77cd5f45ba02c7dff4b7932abdebb17c1667f.1613593946.git.matheus.bernardino@usp.br/
[5] https://lore.kernel.org/git/cover.1614037664.git.matheus.bernardino@usp.br/

Anyway, that's a long way of saying I think this section of your
document is already obsolete.  (Which is a good thing -- less work to
do to get sparse-index working.  Thanks, Matheus!).

> +This phase is the first where parallel work might be possible without too
> +much conflicts between topics.
> +
> +Phase IV: The long tail
> +-----------------------
> +
> +This last phase is less a "phase" and more "the new normal" after all of
> +the previous work.
> +
> +To start, the `command_requires_full_index` option could be removed in
> +favor of expanding only when hitting an API guard.
> +
> +There are many Git commands that could use special attention to operate as
> +O(Populated), while some might be so rare that it is acceptable to leave
> +them with additional overhead when a sparse-index is present.
> +
> +Here are some commands that might be useful to update:
> +
> +* `git sparse-checkout set`
> +* `git am`
> +* `git clean`
> +* `git stash`

Oh, man, git stash is definitely in need of work.  It's still a
minimalistic transliteration of shell to C, complete with lots of
process forking and piping output between various low-level commands.
It might be interesting to rewrite this in terms of the merge
machinery, though its separate stashing of staged stuff, unstaged
stuff, and possibly untracked stuff means that there is a sequence of
two or three merges needed and interesting failure handling to do if
those merges fail, especially if the user uses --index.  But I
digress...


Anyway, overall, very nicely written and planned out.  Thanks for
taking the time to write this all up.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 2/23/2021 8:13 PM, Elijah Newren wrote:
> On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:>> +This addition of sparse-directory entries violates expectations about the
> 
> Violates current expectations, yes.  Documentation tends to live a
> long time, and I suspect that 2-3 years from now reading this sentence
> might be jarring since we'll have modified the code to have an updated
> set of expectations.  Maybe a simple "As of time of writing, ..." at
> the beginning of the sentence here?  Or maybe I'm just being overly
> worried...

I was hoping that the phrase "this addition of" places this statement in
a moment of time where sparse-directory entries didn't exist, but now they
will. I'm open to alternatives and will give this some thought.

>> +To complete this phase, the commands `git status` and `git add` will be
>> +integrated with the sparse-index so that they operate with O(Populated)
>> +performance. They will be carefully tested for operations within and
>> +outside the sparse-checkout definition.
> 
> Good plan so far, but there's something else bugging me a little here.
> One thing we noticed with our usage of `sparse-checkout` is that
> although unimportant _tracked_ files go away, leftover build files and
> other untracked files stick around.  So, although 'git status'
> shouldn't have to check the tracked files anymore, it is still going
> to have to look at each of the *untracked* files and compare to
> .gitignore files in order to correctly classify each file as ignored
> or just plain untracked.  Our `sparsify` tool has for a long time
> tried to warn about such files when changing the sparsity
> patterns/modules and had an --remove-old-ignores option for clearing
> out ignored files that are found within directories that are sparse
> (Meaning the directories where all files under them are marked
> SKIP_WORKTREE.). I was never sure whether a warning was enough, or if
> pushing that option more made sense, but about a month ago my
> colleagues made the tool just auto-invoke that option from other
> sparsify invocations.  To my knowledge, there have been no complaints
> about that being automatically turned on; but there were
> complaints/confusion before about the directories being left around.
> (Of course, non-ignored files are still left around by that option.)
> 
> I'm worried that since sparse-checkout doesn't do anything to help
> with all these untracked/ignored files, we might not get all the
> performance improvements we want from a `git status` with sparse
> directories.  We'll be dropping from walking O(2*HEAD) files (1 source
> + 1 object file) down to O(HEAD) files (just the object files) rather
> than actually getting down to O(Populated).

This definitely seems like a valuable _enhancement_ to sparse-checkout
that could occur in parallel.

For a workaround in the moment: is "git clean -xdf" sufficient to help
these users?

>> +Phase III: Important command speedups
>> +-------------------------------------
>> +
>> +At this point, the patterns for testing and implementing sparse-directory
>> +logic should be relatively stable. This phase focuses on updating some of
>> +the most common builtins that use the index to operate as O(Populated).
>> +Here is a potential list of commands that could be valuable to integrate
>> +at this point:
>> +
>> +* `git commit`
>> +* `git checkout`
>> +* `git merge`
>> +* `git rebase`
>> +
>> +Along with `git status` and `git add`, these commands cover the majority
>> +of users' interactions with the working directory.
> 
> Sounds like a good plan as well.
> 
> I hope we get to make this specific to the merge-ort backend.  It
> localizes the index-related code to (a) a call to unpack_trees()
> called from checkout-like code (which would probably automatically be
> handled by your updates to git checkout), and (b) a single function
> named record_conflicted_index_entries().  I feel it should be pretty
> easy to update.
> 
> In contrast, the idea of attempting to update merge-recursive with
> this kind of change sounds overwhelming.

Yes, I'm hoping to eventually say "if you are in a sparse-checkout, then
you should use ORT by default." Then, if someone opts to do merge-recursive
instead, then they pay the index expansion cost.

While this is very clear in my head, it might be worth mentioning that
explicitly here.

>>  In addition, we can
>> +integrate with these commands:
>> +
>> +* `git grep`
>> +* `git rm`
>> +
>> +These have been proposed as some whose behavior could change when in a
>> +repo with a sparse-checkout definition. It would be good to include this
>> +behavior automatically when using a sparse-index. Some clarity is needed
>> +to make the behavior switch clear to the user.
> 
> Is this leftover from before recent events?  I think this portion of
> the document should just be stricken.
> 
> I argued about how these were buggy as-is due SKIP_WORKTREE always
> having been an incomplete implementation of an idea at [1], but didn't
> hear a further response from you.  I'm curious if you disagreed with
> my reasoning, or it just slipped through the cracks in a busy schedule
> and this portion of the document was leftover from before.  In my
> opinion, both commands are just buggy and should be fixed for general
> sparse-checkout usage cases, not just for sparse-index.

This is definitely a case of "I've been too busy to read those topics
in detail." I figured that there was something going on that was relevant
to the sparse-checkout and would affect the sparse-index, but I shelved
it in my mind until I had space to think about it directly.

> Anyway, that's a long way of saying I think this section of your
> document is already obsolete.  (Which is a good thing -- less work to
> do to get sparse-index working.  Thanks, Matheus!).

Thank you for your summary, which helps a lot. Thanks, Matheus, too!
If those fixes already include coverage for the behavior, then I'll see
if they also translate to sparse-index tests easily.

I feel like a lot of these later contributions will be more about adding
tests than actually writing a lot of code.

>> +This phase is the first where parallel work might be possible without too
>> +much conflicts between topics.
>> +
>> +Phase IV: The long tail
>> +-----------------------
>> +
>> +This last phase is less a "phase" and more "the new normal" after all of
>> +the previous work.
>> +
>> +To start, the `command_requires_full_index` option could be removed in
>> +favor of expanding only when hitting an API guard.
>> +
>> +There are many Git commands that could use special attention to operate as
>> +O(Populated), while some might be so rare that it is acceptable to leave
>> +them with additional overhead when a sparse-index is present.
>> +
>> +Here are some commands that might be useful to update:
>> +
>> +* `git sparse-checkout set`
>> +* `git am`
>> +* `git clean`
>> +* `git stash`
> 
> Oh, man, git stash is definitely in need of work.  It's still a
> minimalistic transliteration of shell to C, complete with lots of
> process forking and piping output between various low-level commands.
> It might be interesting to rewrite this in terms of the merge
> machinery, though its separate stashing of staged stuff, unstaged
> stuff, and possibly untracked stuff means that there is a sequence of
> two or three merges needed and interesting failure handling to do if
> those merges fail, especially if the user uses --index.  But I
> digress...

I would prefer to leave 'git stash' alone, but it's used by enough
people that I need to care about it eventually.

> Anyway, overall, very nicely written and planned out.  Thanks for
> taking the time to write this all up.

Thanks for your detailed comments!
-Stolee
 

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Thu, Feb 25, 2021 at 7:29 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 2/23/2021 8:13 PM, Elijah Newren wrote:
> > On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:>> +This addition of sparse-directory entries violates expectations about the
> >
> > Violates current expectations, yes.  Documentation tends to live a
> > long time, and I suspect that 2-3 years from now reading this sentence
> > might be jarring since we'll have modified the code to have an updated
> > set of expectations.  Maybe a simple "As of time of writing, ..." at
> > the beginning of the sentence here?  Or maybe I'm just being overly
> > worried...
>
> I was hoping that the phrase "this addition of" places this statement in
> a moment of time where sparse-directory entries didn't exist, but now they
> will. I'm open to alternatives and will give this some thought.

I already listed my only suggestion -- adding a "As of time of
writing," at the beginning.  I'm totally open to other
proposals/suggestions, and it's admittedly a minor point so you can
feel free to just ignore it if we can't come up with wording everyone
likes.

>
> >> +To complete this phase, the commands `git status` and `git add` will be
> >> +integrated with the sparse-index so that they operate with O(Populated)
> >> +performance. They will be carefully tested for operations within and
> >> +outside the sparse-checkout definition.
> >
> > Good plan so far, but there's something else bugging me a little here.
> > One thing we noticed with our usage of `sparse-checkout` is that
> > although unimportant _tracked_ files go away, leftover build files and
> > other untracked files stick around.  So, although 'git status'
> > shouldn't have to check the tracked files anymore, it is still going
> > to have to look at each of the *untracked* files and compare to
> > .gitignore files in order to correctly classify each file as ignored
> > or just plain untracked.  Our `sparsify` tool has for a long time
> > tried to warn about such files when changing the sparsity
> > patterns/modules and had an --remove-old-ignores option for clearing
> > out ignored files that are found within directories that are sparse
> > (Meaning the directories where all files under them are marked
> > SKIP_WORKTREE.). I was never sure whether a warning was enough, or if
> > pushing that option more made sense, but about a month ago my
> > colleagues made the tool just auto-invoke that option from other
> > sparsify invocations.  To my knowledge, there have been no complaints
> > about that being automatically turned on; but there were
> > complaints/confusion before about the directories being left around.
> > (Of course, non-ignored files are still left around by that option.)
> >
> > I'm worried that since sparse-checkout doesn't do anything to help
> > with all these untracked/ignored files, we might not get all the
> > performance improvements we want from a `git status` with sparse
> > directories.  We'll be dropping from walking O(2*HEAD) files (1 source
> > + 1 object file) down to O(HEAD) files (just the object files) rather
> > than actually getting down to O(Populated).
>
> This definitely seems like a valuable _enhancement_ to sparse-checkout
> that could occur in parallel.

Yes, indeed.  Your discussion of performance just reminded me of it,
and since this idea might be important in order to drive the costs
down to O(populated) in practice, I thought I'd mention it.

> For a workaround in the moment: is "git clean -xdf" sufficient to help
> these users?

Not really; that wouldn't remove the ignored stuff (build files) under
sparsified directories which is the point.  (Builds build everything
over here; once you sparsify you have leftover build files from
projects you now don't care about.)  If you convert it to "git clean
-Xdf" then you're closer, but that wouldn't just remove builds info
from sparse projects, it'd force users to rebuild all the stuff
they're interested in.

It's close though; what's wanted is basically a special flag that runs
"git clean -Xf <long list of sparsified directories>", without the
user having to specify 300 directories.

However, for now, since I've got a 'sparsify' script anyway (needed
for determining inter-module dependencies and certain directories that
always need to be present, etc.), it just has a flag for running "git
clean -Xf <long list of sparsified directories>" since it has logic to
compute what all those directories are anyway.

> >> +Phase III: Important command speedups
> >> +-------------------------------------
> >> +
> >> +At this point, the patterns for testing and implementing sparse-directory
> >> +logic should be relatively stable. This phase focuses on updating some of
> >> +the most common builtins that use the index to operate as O(Populated).
> >> +Here is a potential list of commands that could be valuable to integrate
> >> +at this point:
> >> +
> >> +* `git commit`
> >> +* `git checkout`
> >> +* `git merge`
> >> +* `git rebase`
> >> +
> >> +Along with `git status` and `git add`, these commands cover the majority
> >> +of users' interactions with the working directory.
> >
> > Sounds like a good plan as well.
> >
> > I hope we get to make this specific to the merge-ort backend.  It
> > localizes the index-related code to (a) a call to unpack_trees()
> > called from checkout-like code (which would probably automatically be
> > handled by your updates to git checkout), and (b) a single function
> > named record_conflicted_index_entries().  I feel it should be pretty
> > easy to update.
> >
> > In contrast, the idea of attempting to update merge-recursive with
> > this kind of change sounds overwhelming.
>
> Yes, I'm hoping to eventually say "if you are in a sparse-checkout, then
> you should use ORT by default." Then, if someone opts to do merge-recursive
> instead, then they pay the index expansion cost.
>
> While this is very clear in my head, it might be worth mentioning that
> explicitly here.

:+1:

> >>  In addition, we can
> >> +integrate with these commands:
> >> +
> >> +* `git grep`
> >> +* `git rm`
> >> +
> >> +These have been proposed as some whose behavior could change when in a
> >> +repo with a sparse-checkout definition. It would be good to include this
> >> +behavior automatically when using a sparse-index. Some clarity is needed
> >> +to make the behavior switch clear to the user.
> >
> > Is this leftover from before recent events?  I think this portion of
> > the document should just be stricken.
> >
> > I argued about how these were buggy as-is due SKIP_WORKTREE always
> > having been an incomplete implementation of an idea at [1], but didn't
> > hear a further response from you.  I'm curious if you disagreed with
> > my reasoning, or it just slipped through the cracks in a busy schedule
> > and this portion of the document was leftover from before.  In my
> > opinion, both commands are just buggy and should be fixed for general
> > sparse-checkout usage cases, not just for sparse-index.
>
> This is definitely a case of "I've been too busy to read those topics
> in detail." I figured that there was something going on that was relevant
> to the sparse-checkout and would affect the sparse-index, but I shelved
> it in my mind until I had space to think about it directly.
>
> > Anyway, that's a long way of saying I think this section of your
> > document is already obsolete.  (Which is a good thing -- less work to
> > do to get sparse-index working.  Thanks, Matheus!).
>
> Thank you for your summary, which helps a lot. Thanks, Matheus, too!
> If those fixes already include coverage for the behavior, then I'll see
> if they also translate to sparse-index tests easily.
>
> I feel like a lot of these later contributions will be more about adding
> tests than actually writing a lot of code.
>
> >> +This phase is the first where parallel work might be possible without too
> >> +much conflicts between topics.
> >> +
> >> +Phase IV: The long tail
> >> +-----------------------
> >> +
> >> +This last phase is less a "phase" and more "the new normal" after all of
> >> +the previous work.
> >> +
> >> +To start, the `command_requires_full_index` option could be removed in
> >> +favor of expanding only when hitting an API guard.
> >> +
> >> +There are many Git commands that could use special attention to operate as
> >> +O(Populated), while some might be so rare that it is acceptable to leave
> >> +them with additional overhead when a sparse-index is present.
> >> +
> >> +Here are some commands that might be useful to update:
> >> +
> >> +* `git sparse-checkout set`
> >> +* `git am`
> >> +* `git clean`
> >> +* `git stash`
> >
> > Oh, man, git stash is definitely in need of work.  It's still a
> > minimalistic transliteration of shell to C, complete with lots of
> > process forking and piping output between various low-level commands.
> > It might be interesting to rewrite this in terms of the merge
> > machinery, though its separate stashing of staged stuff, unstaged
> > stuff, and possibly untracked stuff means that there is a sequence of
> > two or three merges needed and interesting failure handling to do if
> > those merges fail, especially if the user uses --index.  But I
> > digress...
>
> I would prefer to leave 'git stash' alone, but it's used by enough
> people that I need to care about it eventually.

Oh, it can definitely come later.  And I agree about the desirability
of touching that code; I was avoiding it for a long time, but there
was one important sparse-checkout-related bug recently[1] so I've
already been forced to touch it once.  That might mean I'm
(eventually) on the hook to make it sparse-index friendly, especially
since it might involve using merge-ort to do so...

[1] https://lore.kernel.org/git/pull.919.git.git.1605891222.gitgitgadget@gmail.com/

> > Anyway, overall, very nicely written and planned out.  Thanks for
> > taking the time to write this all up.
>
> Thanks for your detailed comments!
> -Stolee

@@ -0,0 +1,87 @@
#!/bin/sh
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Create a test script that takes the default performance test (the Git
> codebase) and multiplies it by 256 using four layers of duplicated
> trees of width four. This results in nearly one million blob entries in
> the index. Then, we can clone this repository with sparse-checkout
> patterns that demonstrate four copies of the initial repository. Each
> clone will use a different index format or mode so peformance can be
> tested across the different options.
>
> Note that the initial repo is stripped of submodules before doing the
> copies. This preserves the expected data shape of the sparse index,
> because directories containing submodules are not collapsed to a sparse
> directory entry.
>
> Run a few Git commands on these clones, especially those that use the
> index (status, add, commit).
>
> Here are the results on my Linux machine:
>
> Test
> --------------------------------------------------------------
> 2000.2: git status (full-index-v3)             0.37(0.30+0.09)
> 2000.3: git status (full-index-v4)             0.39(0.32+0.10)
> 2000.4: git add -A (full-index-v3)             1.42(1.06+0.20)
> 2000.5: git add -A (full-index-v4)             1.26(0.98+0.16)
> 2000.6: git add . (full-index-v3)              1.40(1.04+0.18)
> 2000.7: git add . (full-index-v4)              1.26(0.98+0.17)
> 2000.8: git commit -a -m A (full-index-v3)     1.42(1.11+0.16)
> 2000.9: git commit -a -m A (full-index-v4)     1.33(1.08+0.16)
>
> It is perhaps noteworthy that there is an improvement when using index
> version 4. This is because the v3 index uses 108 MiB while the v4
> index uses 80 MiB. Since the repeated portions of the directories are
> very short (f3/f1/f2, for example) this ratio is less pronounced than in
> similarly-sized real repositories.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/perf/p2000-sparse-operations.sh | 87 +++++++++++++++++++++++++++++++
>  1 file changed, 87 insertions(+)
>  create mode 100755 t/perf/p2000-sparse-operations.sh
>
> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
> new file mode 100755
> index 000000000000..52597683376e
> --- /dev/null
> +++ b/t/perf/p2000-sparse-operations.sh
> @@ -0,0 +1,87 @@
> +#!/bin/sh
> +
> +test_description="test performance of Git operations using the index"
> +
> +. ./perf-lib.sh
> +
> +test_perf_default_repo
> +
> +SPARSE_CONE=f2/f4/f1
> +
> +test_expect_success 'setup repo and indexes' '
> +       git reset --hard HEAD &&
> +       # Remove submodules from the example repo, because our
> +       # duplication of the entire repo creates an unlikly data shape.
> +       git config --file .gitmodules --get-regexp "submodule.*.path" >modules &&
> +       rm -f .gitmodules &&
> +       git add .gitmodules &&

Why not `git rm [-f] .gitmodules` instead of these two commands?  Is
there something special about .gitmodules that requires this special
handling?

> +       for module in $(awk "{print \$2}" modules)
> +       do
> +               git rm $module || return 1
> +       done &&
> +       git add . &&

What does the `git add .` do?  I don't see any changes there weren't
already git-add'ed or git-rm'ed.

> +       git commit -m "remove submodules" &&
> +
> +       echo bogus >a &&
> +       cp a b &&
> +       git add a b &&
> +       git commit -m "level 0" &&
> +       BLOB=$(git rev-parse HEAD:a) &&
> +       OLD_COMMIT=$(git rev-parse HEAD) &&
> +       OLD_TREE=$(git rev-parse HEAD^{tree}) &&
> +
> +       for i in $(test_seq 1 4)
> +       do
> +               cat >in <<-EOF &&
> +                       100755 blob $BLOB       a
> +                       040000 tree $OLD_TREE   f1
> +                       040000 tree $OLD_TREE   f2
> +                       040000 tree $OLD_TREE   f3
> +                       040000 tree $OLD_TREE   f4
> +               EOF
> +               NEW_TREE=$(git mktree <in) &&
> +               NEW_COMMIT=$(git commit-tree $NEW_TREE -p $OLD_COMMIT -m "level $i") &&
> +               OLD_TREE=$NEW_TREE &&
> +               OLD_COMMIT=$NEW_COMMIT || return 1
> +       done &&
> +
> +       git sparse-checkout init --cone &&
> +       git branch -f wide $OLD_COMMIT &&
> +       git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . full-index-v3 &&
> +       (
> +               cd full-index-v3 &&
> +               git sparse-checkout init --cone &&
> +               git sparse-checkout set $SPARSE_CONE &&
> +               git config index.version 3 &&
> +               git update-index --index-version=3
> +       ) &&
> +       git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . full-index-v4 &&
> +       (
> +               cd full-index-v4 &&
> +               git sparse-checkout init --cone &&
> +               git sparse-checkout set $SPARSE_CONE &&
> +               git config index.version 4 &&
> +               git update-index --index-version=4
> +       )
> +'
> +
> +test_perf_on_all () {
> +       command="$@"
> +       for repo in full-index-v3 full-index-v4
> +       do
> +               test_perf "$command ($repo)" "
> +                       (
> +                               cd $repo &&
> +                               echo >>$SPARSE_CONE/a &&
> +                               $command
> +                       )
> +               "
> +       done
> +}
> +
> +test_perf_on_all git status
> +test_perf_on_all git add -A
> +test_perf_on_all git add .
> +test_perf_on_all git commit -a -m A
> +
> +test_done
> --
> gitgitgadget

Other than the two minor questions, the rest looks good to me.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 2/23/2021 9:30 PM, Elijah Newren wrote:
> On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> +test_expect_success 'setup repo and indexes' '
> +       git reset --hard HEAD &&
> +       # Remove submodules from the example repo, because our
> +       # duplication of the entire repo creates an unlikly data shape.
> +       git config --file .gitmodules --get-regexp "submodule.*.path" >modules &&
> +       rm -f .gitmodules &&
> +       git add .gitmodules &&
> Why not `git rm [-f] .gitmodules` instead of these two commands?  Is
> there something special about .gitmodules that requires this special
> handling?

No, I'm just being sloppy. Will clean up.

>> +       for module in $(awk "{print \$2}" modules)
>> +       do
>> +               git rm $module || return 1
>> +       done &&
>> +       git add . &&
> What does the `git add .` do?  I don't see any changes there weren't
> already git-add'ed or git-rm'ed.

Same here. Thanks.

-Stolee

@@ -980,6 +980,7 @@ LIB_OBJS += setup.o
LIB_OBJS += shallow.o
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Upcoming changes will introduce modifications to the index format that
> allow sparse directories. It will be useful to have a mechanism for
> converting those sparse index files into full indexes by walking the
> tree at those sparse directories. Name this method ensure_full_index()
> as it will guarantee that the index is fully expanded.
>
> This method is not implemented yet, and instead we focus on the
> scaffolding to declare it and call it at the appropriate time.
>
> Add a 'command_requires_full_index' member to struct repo_settings. This
> will be an indicator that we need the index in full mode to do certain
> index operations. This starts as being true for every command, then we
> will set it to false as some commands integrate with sparse indexes.
>
> If 'command_requires_full_index' is true, then we will immediately
> expand a sparse index to a full one upon reading from disk. This
> suffices for now, but we will want to add more callers to
> ensure_full_index() later.

Same as 01/27 of your RFC series; looks good.

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Makefile        |  1 +
>  repo-settings.c |  8 ++++++++
>  repository.c    | 11 ++++++++++-
>  repository.h    |  2 ++
>  sparse-index.c  |  8 ++++++++
>  sparse-index.h  |  7 +++++++
>  6 files changed, 36 insertions(+), 1 deletion(-)
>  create mode 100644 sparse-index.c
>  create mode 100644 sparse-index.h
>
> diff --git a/Makefile b/Makefile
> index 5a239cac20e3..3bf61699238d 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -980,6 +980,7 @@ LIB_OBJS += setup.o
>  LIB_OBJS += shallow.o
>  LIB_OBJS += sideband.o
>  LIB_OBJS += sigchain.o
> +LIB_OBJS += sparse-index.o
>  LIB_OBJS += split-index.o
>  LIB_OBJS += stable-qsort.o
>  LIB_OBJS += strbuf.o
> diff --git a/repo-settings.c b/repo-settings.c
> index f7fff0f5ab83..d63569e4041e 100644
> --- a/repo-settings.c
> +++ b/repo-settings.c
> @@ -77,4 +77,12 @@ void prepare_repo_settings(struct repository *r)
>                 UPDATE_DEFAULT_BOOL(r->settings.core_untracked_cache, UNTRACKED_CACHE_KEEP);
>
>         UPDATE_DEFAULT_BOOL(r->settings.fetch_negotiation_algorithm, FETCH_NEGOTIATION_DEFAULT);
> +
> +       /*
> +        * This setting guards all index reads to require a full index
> +        * over a sparse index. After suitable guards are placed in the
> +        * codebase around uses of the index, this setting will be
> +        * removed.
> +        */
> +       r->settings.command_requires_full_index = 1;
>  }
> diff --git a/repository.c b/repository.c
> index c98298acd017..a8acae002f71 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -10,6 +10,7 @@
>  #include "object.h"
>  #include "lockfile.h"
>  #include "submodule-config.h"
> +#include "sparse-index.h"
>
>  /* The main repository */
>  static struct repository the_repo;
> @@ -261,6 +262,8 @@ void repo_clear(struct repository *repo)
>
>  int repo_read_index(struct repository *repo)
>  {
> +       int res;
> +
>         if (!repo->index)
>                 repo->index = xcalloc(1, sizeof(*repo->index));
>
> @@ -270,7 +273,13 @@ int repo_read_index(struct repository *repo)
>         else if (repo->index->repo != repo)
>                 BUG("repo's index should point back at itself");
>
> -       return read_index_from(repo->index, repo->index_file, repo->gitdir);
> +       res = read_index_from(repo->index, repo->index_file, repo->gitdir);
> +
> +       prepare_repo_settings(repo);
> +       if (repo->settings.command_requires_full_index)
> +               ensure_full_index(repo->index);
> +
> +       return res;
>  }
>
>  int repo_hold_locked_index(struct repository *repo,
> diff --git a/repository.h b/repository.h
> index b385ca3c94b6..e06a23015697 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -41,6 +41,8 @@ struct repo_settings {
>         enum fetch_negotiation_setting fetch_negotiation_algorithm;
>
>         int core_multi_pack_index;
> +
> +       unsigned command_requires_full_index:1;
>  };
>
>  struct repository {
> diff --git a/sparse-index.c b/sparse-index.c
> new file mode 100644
> index 000000000000..82183ead563b
> --- /dev/null
> +++ b/sparse-index.c
> @@ -0,0 +1,8 @@
> +#include "cache.h"
> +#include "repository.h"
> +#include "sparse-index.h"
> +
> +void ensure_full_index(struct index_state *istate)
> +{
> +       /* intentionally left blank */
> +}
> diff --git a/sparse-index.h b/sparse-index.h
> new file mode 100644
> index 000000000000..09a20d036c46
> --- /dev/null
> +++ b/sparse-index.h
> @@ -0,0 +1,7 @@
> +#ifndef SPARSE_INDEX_H__
> +#define SPARSE_INDEX_H__
> +
> +struct index_state;
> +void ensure_full_index(struct index_state *istate);
> +
> +#endif
> --
> gitgitgadget
>

@@ -204,6 +204,8 @@ struct cache_entry {
#error "CE_EXTENDED_FLAGS out of range"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> We will mark an in-memory index_state as having sparse directory entries
> with the sparse_index bit. These currently cannot exist, but we will add
> a mechanism for collapsing a full index to a sparse one in a later
> change. That will happen at write time, so we must first allow parsing
> the format before writing it.
>
> Commands or methods that require a full index in order to operate can
> call ensure_full_index() to expand that index in-memory. This requires
> parsing trees using that index's repository.
>
> Sparse directory entries have a specific 'ce_mode' value. The macro
> S_ISSPARSEDIR(ce->ce_mode) can check if a cache_entry 'ce' has this type.
> This ce_mode is not possible with the existing index formats, so we don't
> also verify all properties of a sparse-directory entry, which are:
>
>  1. ce->ce_mode == 0040000
>  2. ce->flags & CE_SKIP_WORKTREE is true
>  3. ce->name[ce->namelen - 1] == '/' (ends in dir separator)
>  4. ce->oid references a tree object.
>
> These are all semi-enforced in ensure_full_index() to some extent. Any
> deviation will cause a warning at minimum or a failure in the worst
> case.

Thanks for spelling these all out; looks good.

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  cache.h        |  7 +++-
>  read-cache.c   |  9 +++++
>  sparse-index.c | 95 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 109 insertions(+), 2 deletions(-)
>
> diff --git a/cache.h b/cache.h
> index d92814961405..1336c8d7435e 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -204,6 +204,8 @@ struct cache_entry {
>  #error "CE_EXTENDED_FLAGS out of range"
>  #endif
>
> +#define S_ISSPARSEDIR(m) ((m) == S_IFDIR)

Much nicer, thanks.  :-)

> +
>  /* Forward structure decls */
>  struct pathspec;
>  struct child_process;
> @@ -319,7 +321,8 @@ struct index_state {
>                  drop_cache_tree : 1,
>                  updated_workdir : 1,
>                  updated_skipworktree : 1,
> -                fsmonitor_has_run_once : 1;
> +                fsmonitor_has_run_once : 1,
> +                sparse_index : 1;
>         struct hashmap name_hash;
>         struct hashmap dir_hash;
>         struct object_id oid;
> @@ -722,6 +725,8 @@ int read_index_from(struct index_state *, const char *path,
>                     const char *gitdir);
>  int is_index_unborn(struct index_state *);
>
> +void ensure_full_index(struct index_state *istate);
> +
>  /* For use with `write_locked_index()`. */
>  #define COMMIT_LOCK            (1 << 0)
>  #define SKIP_IF_UNCHANGED      (1 << 1)
> diff --git a/read-cache.c b/read-cache.c
> index 29144cf879e7..97dbf2434f30 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -101,6 +101,9 @@ static const char *alternate_index_output;
>
>  static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
>  {
> +       if (S_ISSPARSEDIR(ce->ce_mode))
> +               istate->sparse_index = 1;

A very minor question -- someone who sees "sparse_index" could
probably easily think either "index with missing entries, due to
having a SKIP_WORKTREE directory instead" or perhaps "index when using
the sparse-checkout feature, i.e. it has some SKIP_WORKTREE entries in
it".  From the code here, clearly the former is your intent.  I wonder
if it'd help to have a small comment near the declaration of
sparse_index to mention its intent.

> +
>         istate->cache[nr] = ce;
>         add_name_hash(istate, ce);
>  }
> @@ -2255,6 +2258,12 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
>         trace2_data_intmax("index", the_repository, "read/cache_nr",
>                            istate->cache_nr);
>
> +       if (!istate->repo)
> +               istate->repo = the_repository;
> +       prepare_repo_settings(istate->repo);
> +       if (istate->repo->settings.command_requires_full_index)
> +               ensure_full_index(istate);
> +
>         return istate->cache_nr;
>
>  unmap:
> diff --git a/sparse-index.c b/sparse-index.c
> index 82183ead563b..316cb949b74b 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -1,8 +1,101 @@
>  #include "cache.h"
>  #include "repository.h"
>  #include "sparse-index.h"
> +#include "tree.h"
> +#include "pathspec.h"
> +#include "trace2.h"
> +
> +static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
> +{
> +       ALLOC_GROW(istate->cache, nr + 1, istate->cache_alloc);
> +
> +       istate->cache[nr] = ce;
> +       add_name_hash(istate, ce);
> +}
> +
> +static int add_path_to_index(const struct object_id *oid,
> +                               struct strbuf *base, const char *path,
> +                               unsigned int mode, int stage, void *context)
> +{
> +       struct index_state *istate = (struct index_state *)context;
> +       struct cache_entry *ce;
> +       size_t len = base->len;
> +
> +       if (S_ISDIR(mode))
> +               return READ_TREE_RECURSIVE;
> +
> +       strbuf_addstr(base, path);
> +
> +       ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0);
> +       ce->ce_flags |= CE_SKIP_WORKTREE;
> +       set_index_entry(istate, istate->cache_nr++, ce);
> +
> +       strbuf_setlen(base, len);
> +       return 0;
> +}
>
>  void ensure_full_index(struct index_state *istate)
>  {
> -       /* intentionally left blank */
> +       int i;
> +       struct index_state *full;
> +
> +       if (!istate || !istate->sparse_index)
> +               return;
> +
> +       if (!istate->repo)
> +               istate->repo = the_repository;
> +
> +       trace2_region_enter("index", "ensure_full_index", istate->repo);
> +
> +       /* initialize basics of new index */
> +       full = xcalloc(1, sizeof(struct index_state));
> +       memcpy(full, istate, sizeof(struct index_state));
> +
> +       /* then change the necessary things */
> +       full->sparse_index = 0;
> +       full->cache_alloc = (3 * istate->cache_alloc) / 2;
> +       full->cache_nr = 0;
> +       ALLOC_ARRAY(full->cache, full->cache_alloc);
> +
> +       for (i = 0; i < istate->cache_nr; i++) {
> +               struct cache_entry *ce = istate->cache[i];
> +               struct tree *tree;
> +               struct pathspec ps;
> +
> +               if (!S_ISSPARSEDIR(ce->ce_mode)) {
> +                       set_index_entry(full, full->cache_nr++, ce);
> +                       continue;
> +               }
> +               if (!(ce->ce_flags & CE_SKIP_WORKTREE))
> +                       warning(_("index entry is a directory, but not sparse (%08x)"),
> +                               ce->ce_flags);
> +
> +               /* recursively walk into cd->name */
> +               tree = lookup_tree(istate->repo, &ce->oid);
> +
> +               memset(&ps, 0, sizeof(ps));
> +               ps.recursive = 1;
> +               ps.has_wildcard = 1;
> +               ps.max_depth = -1;
> +
> +               read_tree_recursive(istate->repo, tree,
> +                                   ce->name, strlen(ce->name),
> +                                   0, &ps,
> +                                   add_path_to_index, full);
> +
> +               /* free directory entries. full entries are re-used */
> +               discard_cache_entry(ce);
> +       }
> +
> +       /* Copy back into original index. */
> +       memcpy(&istate->name_hash, &full->name_hash, sizeof(full->name_hash));
> +       istate->sparse_index = 0;
> +       free(istate->cache);

Thanks for fixing that leak that from the RFC series.

> +       istate->cache = full->cache;
> +       istate->cache_nr = full->cache_nr;
> +       istate->cache_alloc = full->cache_alloc;
> +
> +       free(full);
> +
> +       trace2_region_leave("index", "ensure_full_index", istate->repo);
>  }
> --
> gitgitgadget

Looks good to me.

@@ -45,6 +45,20 @@ To avoid interfering with other worktrees, it first enables the
When `--cone` is provided, the `core.sparseCheckoutCone` setting is
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Martin Ågren wrote (reply to this):

On Wed, 24 Feb 2021 at 00:57, Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> +that is not completely understood by other tools. Enabling sparse index
> +enables the `extensions.spareseIndex` config value, which might cause

s/sparese/sparse

> +other tools to stop working with your repository. If you have trouble with
> +this compatibility, then run `git sparse-checkout sparse-index disable` to
> +remove this config and rewrite your index to not be sparse.

While I'm commenting on this..:

There are several "layers" here, for lack of a better term. "Enabling foo
enables bar which may cause baz. If you fail due to baz, try dropping
bar by dropping foo." If I remove any mention of the config variable from
your text, I get the following.

 Enabling sparse index might cause other tools to stop working with your
 repository. If you have trouble with this compatibility, then run `git
 sparse-checkout sparse-index disable` to rewrite your index to not be
 sparse.

I'm tempted to suggest such a rewrite to relieve readers of knowing of
the middle step, which you could say is more of an implementation
detail. But if we think that the symptoms / error messages might involve
"extensions.sparseIndex" or, as would be the case with an older Git
installation,

  fatal: unknown repository extensions found:
          sparseindex

maybe there is some value in mentioning the config item by name. Just
thinking out loud, really, and I don't have any strong opinion. I only
came here to point out the typo in the docs.

Martin

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 2/24/2021 2:11 PM, Martin Ågren wrote:
> On Wed, 24 Feb 2021 at 00:57, Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> +that is not completely understood by other tools. Enabling sparse index
>> +enables the `extensions.spareseIndex` config value, which might cause
> 
> s/sparese/sparse

Thanks!

 
>> +other tools to stop working with your repository. If you have trouble with
>> +this compatibility, then run `git sparse-checkout sparse-index disable` to
>> +remove this config and rewrite your index to not be sparse.
> 
> While I'm commenting on this..:
> 
> There are several "layers" here, for lack of a better term. "Enabling foo
> enables bar which may cause baz. If you fail due to baz, try dropping
> bar by dropping foo." If I remove any mention of the config variable from
> your text, I get the following.
> 
>  Enabling sparse index might cause other tools to stop working with your
>  repository. If you have trouble with this compatibility, then run `git
>  sparse-checkout sparse-index disable` to rewrite your index to not be
>  sparse.
> 
> I'm tempted to suggest such a rewrite to relieve readers of knowing of
> the middle step, which you could say is more of an implementation
> detail. But if we think that the symptoms / error messages might involve
> "extensions.sparseIndex" or, as would be the case with an older Git
> installation,
> 
>   fatal: unknown repository extensions found:
>           sparseindex
> 
> maybe there is some value in mentioning the config item by name. Just
> thinking out loud, really, and I don't have any strong opinion. I only
> came here to point out the typo in the docs.
 
I agree that the layers are confusing. We could rearrange and have
a similar flow to what you recommend by mentioning the extension at
the end:

**WARNING:** Using a sparse index requires modifying the index in a way
that is not completely understood by other tools. If you have trouble with
this compatibility, then run `git sparse-checkout sparse-index disable` to
rewrite your index to not be sparse. Older versions of Git will not
understand the `sparseIndex` repository extension and may fail to interact
with your repository until it is disabled.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Mar 9, 2021 at 12:52 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 2/24/2021 2:11 PM, Martin Ågren wrote:
> > On Wed, 24 Feb 2021 at 00:57, Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >> +that is not completely understood by other tools. Enabling sparse index
> >> +enables the `extensions.spareseIndex` config value, which might cause
> >
> > s/sparese/sparse
>
> Thanks!
>
>
> >> +other tools to stop working with your repository. If you have trouble with
> >> +this compatibility, then run `git sparse-checkout sparse-index disable` to
> >> +remove this config and rewrite your index to not be sparse.
> >
> > While I'm commenting on this..:
> >
> > There are several "layers" here, for lack of a better term. "Enabling foo
> > enables bar which may cause baz. If you fail due to baz, try dropping
> > bar by dropping foo." If I remove any mention of the config variable from
> > your text, I get the following.
> >
> >  Enabling sparse index might cause other tools to stop working with your
> >  repository. If you have trouble with this compatibility, then run `git
> >  sparse-checkout sparse-index disable` to rewrite your index to not be
> >  sparse.
> >
> > I'm tempted to suggest such a rewrite to relieve readers of knowing of
> > the middle step, which you could say is more of an implementation
> > detail. But if we think that the symptoms / error messages might involve
> > "extensions.sparseIndex" or, as would be the case with an older Git
> > installation,
> >
> >   fatal: unknown repository extensions found:
> >           sparseindex
> >
> > maybe there is some value in mentioning the config item by name. Just
> > thinking out loud, really, and I don't have any strong opinion. I only
> > came here to point out the typo in the docs.
>
> I agree that the layers are confusing. We could rearrange and have
> a similar flow to what you recommend by mentioning the extension at
> the end:
>
> **WARNING:** Using a sparse index requires modifying the index in a way
> that is not completely understood by other tools. If you have trouble with
> this compatibility, then run `git sparse-checkout sparse-index disable` to
> rewrite your index to not be sparse. Older versions of Git will not
> understand the `sparseIndex` repository extension and may fail to interact
> with your repository until it is disabled.
>
> Thanks,
> -Stolee

This looks pretty good to me, but could we change the first sentence
to read "...modifying the index in a way that may not yet be
understood by external tools." ?  I'm worried "other tools" might make
people worry about different builtin commands (e.g. fast-export, log).
I also prefer "may" and "yet" because I suspect most external tools
(e.g. git filter-repo just to name a personal example) won't need to
read an index format and will thus be unaffected, and any tools that
do read the index format will probably eventually learn how to work
with the new format.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 3/9/2021 4:03 PM, Elijah Newren wrote:
> On Tue, Mar 9, 2021 at 12:52 PM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 2/24/2021 2:11 PM, Martin Ågren wrote:
>>> There are several "layers" here, for lack of a better term. "Enabling foo
>>> enables bar which may cause baz. If you fail due to baz, try dropping
>>> bar by dropping foo." If I remove any mention of the config variable from
>>> your text, I get the following.
>>>
>>>  Enabling sparse index might cause other tools to stop working with your
>>>  repository. If you have trouble with this compatibility, then run `git
>>>  sparse-checkout sparse-index disable` to rewrite your index to not be
>>>  sparse.
>>>
>>> I'm tempted to suggest such a rewrite to relieve readers of knowing of
>>> the middle step, which you could say is more of an implementation
>>> detail. But if we think that the symptoms / error messages might involve
>>> "extensions.sparseIndex" or, as would be the case with an older Git
>>> installation,
>>>
>>>   fatal: unknown repository extensions found:
>>>           sparseindex
>>>
>>> maybe there is some value in mentioning the config item by name. Just
>>> thinking out loud, really, and I don't have any strong opinion. I only
>>> came here to point out the typo in the docs.
>>
>> I agree that the layers are confusing. We could rearrange and have
>> a similar flow to what you recommend by mentioning the extension at
>> the end:
>>
>> **WARNING:** Using a sparse index requires modifying the index in a way
>> that is not completely understood by other tools. If you have trouble with
>> this compatibility, then run `git sparse-checkout sparse-index disable` to
>> rewrite your index to not be sparse. Older versions of Git will not
>> understand the `sparseIndex` repository extension and may fail to interact
>> with your repository until it is disabled.
>>
>> Thanks,
>> -Stolee
> 
> This looks pretty good to me, but could we change the first sentence
> to read "...modifying the index in a way that may not yet be
> understood by external tools." ?  I'm worried "other tools" might make
> people worry about different builtin commands (e.g. fast-export, log).
> I also prefer "may" and "yet" because I suspect most external tools
> (e.g. git filter-repo just to name a personal example) won't need to
> read an index format and will thus be unaffected, and any tools that
> do read the index format will probably eventually learn how to work
> with the new format.

I can make the change, but I do want to point out that the current
use of a repository extension _does_ mean that tools that (correctly)
interact with a Git repository should fail even if they don't try to
access the index file. This is only something to make this work until
we introduce a new index file format version and then can drop the
extension.

"git filter-repo" _should_ be safe because it's really just shelling
to Git, right? I'm more concerned about tools like libgit2.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Mar 9, 2021 at 1:10 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 3/9/2021 4:03 PM, Elijah Newren wrote:
> > On Tue, Mar 9, 2021 at 12:52 PM Derrick Stolee <stolee@gmail.com> wrote:
> >>
> >> On 2/24/2021 2:11 PM, Martin Ågren wrote:
> >>> There are several "layers" here, for lack of a better term. "Enabling foo
> >>> enables bar which may cause baz. If you fail due to baz, try dropping
> >>> bar by dropping foo." If I remove any mention of the config variable from
> >>> your text, I get the following.
> >>>
> >>>  Enabling sparse index might cause other tools to stop working with your
> >>>  repository. If you have trouble with this compatibility, then run `git
> >>>  sparse-checkout sparse-index disable` to rewrite your index to not be
> >>>  sparse.
> >>>
> >>> I'm tempted to suggest such a rewrite to relieve readers of knowing of
> >>> the middle step, which you could say is more of an implementation
> >>> detail. But if we think that the symptoms / error messages might involve
> >>> "extensions.sparseIndex" or, as would be the case with an older Git
> >>> installation,
> >>>
> >>>   fatal: unknown repository extensions found:
> >>>           sparseindex
> >>>
> >>> maybe there is some value in mentioning the config item by name. Just
> >>> thinking out loud, really, and I don't have any strong opinion. I only
> >>> came here to point out the typo in the docs.
> >>
> >> I agree that the layers are confusing. We could rearrange and have
> >> a similar flow to what you recommend by mentioning the extension at
> >> the end:
> >>
> >> **WARNING:** Using a sparse index requires modifying the index in a way
> >> that is not completely understood by other tools. If you have trouble with
> >> this compatibility, then run `git sparse-checkout sparse-index disable` to
> >> rewrite your index to not be sparse. Older versions of Git will not
> >> understand the `sparseIndex` repository extension and may fail to interact
> >> with your repository until it is disabled.
> >>
> >> Thanks,
> >> -Stolee
> >
> > This looks pretty good to me, but could we change the first sentence
> > to read "...modifying the index in a way that may not yet be
> > understood by external tools." ?  I'm worried "other tools" might make
> > people worry about different builtin commands (e.g. fast-export, log).
> > I also prefer "may" and "yet" because I suspect most external tools
> > (e.g. git filter-repo just to name a personal example) won't need to
> > read an index format and will thus be unaffected, and any tools that
> > do read the index format will probably eventually learn how to work
> > with the new format.
>
> I can make the change, but I do want to point out that the current
> use of a repository extension _does_ mean that tools that (correctly)
> interact with a Git repository should fail even if they don't try to
> access the index file. This is only something to make this work until
> we introduce a new index file format version and then can drop the
> extension.

Good point, though...

> "git filter-repo" _should_ be safe because it's really just shelling
> to Git, right? I'm more concerned about tools like libgit2.

Yes, libgit2 and jgit and similar tools are clearly going to be
affected and deeply.  Those are of concern, but I suspect most users
when they see "external tools" will be thinking of the large multitude
of scripts out there that just shell out to git under the hood to
provide some higher level wrapper of some sort.  And anything that
operates that way won't be affected directly by the repository
extension.

I'm not sure I'd even mark things that shell out to git as _should_ be
safe.  In general, scripts can make all kinds of assumptions on
interpreting output, and I suspect some of those may become
invalidated by this new feature.  We have a recent guidepost that's
very close to home on that too -- git stash had *3* different bugs in
it once sparse-checkouts were introduced, based on the fact that it
was designed as a just-shell-out-to-low-level-git-commands script and
it made assumptions on how those commands worked together.  See
https://lore.kernel.org/git/ccfedc7140dbf63ba26a15f93bd3885180b26517.1606861519.git.gitgitgadget@gmail.com/.
Sure git-stash is a builtin (supposedly, anyway), but external tools
can make similar logical jumps.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Martin Ågren wrote (reply to this):

On Tue, 9 Mar 2021 at 21:52, Derrick Stolee <stolee@gmail.com> wrote:
>
> I agree that the layers are confusing. We could rearrange and have
> a similar flow to what you recommend by mentioning the extension at
> the end:
>
> **WARNING:** Using a sparse index requires modifying the index in a way
> that is not completely understood by other tools. If you have trouble with
> this compatibility, then run `git sparse-checkout sparse-index disable` to
> rewrite your index to not be sparse. Older versions of Git will not
> understand the `sparseIndex` repository extension and may fail to interact
> with your repository until it is disabled.

I like it. I find this easier to read than the previous version. That
said, is `git sparse-index sparse-checkout disable` really the way to do
this? I don't see a "sparse-index" subcommand of git-sparse-checkout.
... Hmm, no, after building and installing your patches, I get

  $ git sparse-checkout sparse-index disable
  usage: git sparse-checkout (init|list|set|add|reapply|disable) <options>

Should that be `git sparse-checkout init --no-sparse-index`? I just
tried that on a fresh, empty repo. It seems to work in the sense that it
drops the config item. I'm guessing re-initing a sparse checkout is a
safe and sane thing to do?

I don't find any tests for this. If re-initing should be ok and in
particular if it should allow toggling the use of sparse index, it might
be good having a test. At a minimum to see that the command passes and
that the config item goes away? And check that the actual index is
rewritten back to the "old" format? (Sorry if you have that already and
I'm just bad at finding it.)

Martin

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 3/14/2021 4:08 PM, Martin Ågren wrote:
> On Tue, 9 Mar 2021 at 21:52, Derrick Stolee <stolee@gmail.com> wrote:
>>
>> I agree that the layers are confusing. We could rearrange and have
>> a similar flow to what you recommend by mentioning the extension at
>> the end:
>>
>> **WARNING:** Using a sparse index requires modifying the index in a way
>> that is not completely understood by other tools. If you have trouble with
>> this compatibility, then run `git sparse-checkout sparse-index disable` to
>> rewrite your index to not be sparse. Older versions of Git will not
>> understand the `sparseIndex` repository extension and may fail to interact
>> with your repository until it is disabled.
> 
> I like it. I find this easier to read than the previous version. That
> said, is `git sparse-index sparse-checkout disable` really the way to do
> this? I don't see a "sparse-index" subcommand of git-sparse-checkout.
> ... Hmm, no, after building and installing your patches, I get
> 
>   $ git sparse-checkout sparse-index disable
>   usage: git sparse-checkout (init|list|set|add|reapply|disable) <options>
> 
> Should that be `git sparse-checkout init --no-sparse-index`? I just
> tried that on a fresh, empty repo. It seems to work in the sense that it
> drops the config item. I'm guessing re-initing a sparse checkout is a
> safe and sane thing to do?

Yes! Sorry I missed updating this instance when changing the
design. Your suggestion is indeed the proper way to disable the
sparse-index.
 
> I don't find any tests for this. If re-initing should be ok and in
> particular if it should allow toggling the use of sparse index, it might
> be good having a test. At a minimum to see that the command passes and
> that the config item goes away? And check that the actual index is
> rewritten back to the "old" format? (Sorry if you have that already and
> I'm just bad at finding it.)

We have tests already that 'git sparse-checkout init' will preserve
existing sparse-checkout patterns.

I should definitely have a test to ensure that '--no-sparse-index'
rewrites the index to be a full one. Thanks!

-Stolee

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 24, 2021

User Martin Ågren <martin.agren@gmail.com> has been added to the cc: list.

@@ -439,6 +439,9 @@ and "sha256".
GIT_TEST_WRITE_REV_INDEX=<boolean>, when true enables the
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Add a new 'sparse-index' repo alongside the 'full-checkout' and
> 'sparse-checkout' repos in t1092-sparse-checkout-compatibility.sh. Also
> add run_on_sparse and test_sparse_match helpers. These helpers will be
> used when the sparse index is implemented.
>
> Add GIT_TEST_SPARSE_INDEX environment variable to enable the
> sparse-index by default. This will be intended to use across the entire
> test suite, except that it will only affect cases where the
> sparse-checkout feature is enabled.

This last sentence was a bit awkward to read.  "will be intended to
use" -> "is intended to be used"?

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/README                                 |  3 +++
>  t/t1092-sparse-checkout-compatibility.sh | 24 ++++++++++++++++++++----
>  2 files changed, 23 insertions(+), 4 deletions(-)
>
> diff --git a/t/README b/t/README
> index 593d4a4e270c..b98bc563aab5 100644
> --- a/t/README
> +++ b/t/README
> @@ -439,6 +439,9 @@ and "sha256".
>  GIT_TEST_WRITE_REV_INDEX=<boolean>, when true enables the
>  'pack.writeReverseIndex' setting.
>
> +GIT_TEST_SPARSE_INDEX=<boolean>, when true enables index writes to use the
> +sparse-index format by default.
> +
>  Naming Tests
>  ------------
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 3725d3997e70..71d6f9e4c014 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -7,6 +7,7 @@ test_description='compare full workdir to sparse workdir'
>  test_expect_success 'setup' '
>         git init initial-repo &&
>         (
> +               GIT_TEST_SPARSE_INDEX=0 &&
>                 cd initial-repo &&
>                 echo a >a &&
>                 echo "after deep" >e &&
> @@ -87,23 +88,32 @@ init_repos () {
>
>         cp -r initial-repo sparse-checkout &&
>         git -C sparse-checkout reset --hard &&
> -       git -C sparse-checkout sparse-checkout init --cone &&
> +
> +       cp -r initial-repo sparse-index &&
> +       git -C sparse-index reset --hard &&
>
>         # initialize sparse-checkout definitions
> -       git -C sparse-checkout sparse-checkout set deep
> +       git -C sparse-checkout sparse-checkout init --cone &&
> +       git -C sparse-checkout sparse-checkout set deep &&
> +       GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout init --cone &&
> +       GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep
>  }
>
>  run_on_sparse () {
>         (
>                 cd sparse-checkout &&
> -               "$@" >../sparse-checkout-out 2>../sparse-checkout-err
> +               GIT_TEST_SPARSE_INDEX=0 "$@" >../sparse-checkout-out 2>../sparse-checkout-err
> +       ) &&
> +       (
> +               cd sparse-index &&
> +               GIT_TEST_SPARSE_INDEX=1 "$@" >../sparse-index-out 2>../sparse-index-err
>         )
>  }
>
>  run_on_all () {
>         (
>                 cd full-checkout &&
> -               "$@" >../full-checkout-out 2>../full-checkout-err
> +               GIT_TEST_SPARSE_INDEX=0 "$@" >../full-checkout-out 2>../full-checkout-err
>         ) &&
>         run_on_sparse "$@"
>  }
> @@ -114,6 +124,12 @@ test_all_match () {
>         test_cmp full-checkout-err sparse-checkout-err
>  }
>
> +test_sparse_match () {
> +       run_on_sparse $* &&

Should this be
   run_on_sparse "$@"
in order to allow arguments with spaces?

> +       test_cmp sparse-checkout-out sparse-index-out &&
> +       test_cmp sparse-checkout-err sparse-index-err
> +}
> +
>  test_expect_success 'status with options' '
>         init_repos &&
>         test_all_match git status --porcelain=v2 &&
> --
> gitgitgadget

Other than those minor comments, looks good to me.

@@ -2,35 +2,65 @@
#include "cache.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> This table is helpful for discovering data in the index to ensure it is
> being written correctly, especially as we build and test the
> sparse-index. This table includes an output format similar to 'git
> ls-tree', but should not be compared to that directly. The biggest
> reasons are that 'git ls-tree' includes a tree entry for every
> subdirectory, even those that would not appear as a sparse directory in
> a sparse-index. Further, 'git ls-tree' does not use a trailing directory
> separator for its tree rows.
>
> This does not print the stat() information for the blobs. That could be
> added in a future change with another option. The tests that are added
> in the next few changes care only about the object types and IDs.
>
> To make the option parsing slightly more robust, wrap the string
> comparisons in a loop adapted from test-dir-iterator.c.
>
> Care must be taken with the final check for the 'cnt' variable. We
> continue the expectation that the numerical value is the final argument.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/helper/test-read-cache.c | 50 ++++++++++++++++++++++++++++++--------
>  1 file changed, 40 insertions(+), 10 deletions(-)
>
> diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c
> index 244977a29bdf..e4c3492f7d3e 100644
> --- a/t/helper/test-read-cache.c
> +++ b/t/helper/test-read-cache.c
> @@ -2,35 +2,65 @@
>  #include "cache.h"
>  #include "config.h"
>
> +static void print_cache_entry(struct cache_entry *ce)
> +{
> +       printf("%06o ", ce->ce_mode & 0777777);

This constant is curious.  I think it's because you want to strip off
the special in-memory bits of the ce_mode where git stores extra data,
which would be everything beyond the first 16 bits (as noted in a
comment near the beginning of cache.h).  But here you keep the first
18 bits.  Is CE_UPDATE and CE_REMOVE just 0 in the cases you've viewed
so this works (but you really should use 0177777 or 0xFFFF), or am I
off in my guess of what you're trying to do and you do want to see
those two flags?

It also seems surprising to me that this constant isn't already
defined somewhere in cache.h or as some variant of S_IFMT, though I'm
not finding it.

> +
> +       if (S_ISSPARSEDIR(ce->ce_mode))
> +               printf("tree ");
> +       else if (S_ISGITLINK(ce->ce_mode))
> +               printf("commit ");
> +       else
> +               printf("blob ");

Perhaps make use of the tree_type, commit_type, and blob_type global constants?

> +
> +       printf("%s\t%s\n",
> +              oid_to_hex(&ce->oid),
> +              ce->name);
> +}
> +
> +static void print_cache(struct index_state *cache)
> +{
> +       int i;
> +       for (i = 0; i < the_index.cache_nr; i++)
> +               print_cache_entry(the_index.cache[i]);

Why are you passing cache as a parameter, then ignoring it and using the_index?

> +}
> +
>  int cmd__read_cache(int argc, const char **argv)
>  {
> +       struct repository *r = the_repository;
>         int i, cnt = 1;
>         const char *name = NULL;
> +       int table = 0;
>
> -       if (argc > 1 && skip_prefix(argv[1], "--print-and-refresh=", &name)) {
> -               argc--;
> -               argv++;
> +       for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) {
> +               if (skip_prefix(*argv, "--print-and-refresh=", &name))
> +                       continue;
> +               if (!strcmp(*argv, "--table"))
> +                       table = 1;
>         }
>
> -       if (argc == 2)
> -               cnt = strtol(argv[1], NULL, 0);
> +       if (argc == 1)
> +               cnt = strtol(argv[0], NULL, 0);
>         setup_git_directory();
>         git_config(git_default_config, NULL);
> +
>         for (i = 0; i < cnt; i++) {
> -               read_cache();
> +               repo_read_index(r);
>                 if (name) {
>                         int pos;
>
> -                       refresh_index(&the_index, REFRESH_QUIET,
> +                       refresh_index(r->index, REFRESH_QUIET,
>                                       NULL, NULL, NULL);
> -                       pos = index_name_pos(&the_index, name, strlen(name));
> +                       pos = index_name_pos(r->index, name, strlen(name));
>                         if (pos < 0)
>                                 die("%s not in index", name);
>                         printf("%s is%s up to date\n", name,
> -                              ce_uptodate(the_index.cache[pos]) ? "" : " not");
> +                              ce_uptodate(r->index->cache[pos]) ? "" : " not");
>                         write_file(name, "%d\n", i);
>                 }
> -               discard_cache();
> +               if (table)
> +                       print_cache(r->index);
> +               discard_index(r->index);
>         }
>         return 0;
>  }
> --
> gitgitgadget

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 2/25/2021 2:02 AM, Elijah Newren wrote:
> On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> This table is helpful for discovering data in the index to ensure it is
>> being written correctly, especially as we build and test the
>> sparse-index. This table includes an output format similar to 'git
>> ls-tree', but should not be compared to that directly. The biggest
>> reasons are that 'git ls-tree' includes a tree entry for every
>> subdirectory, even those that would not appear as a sparse directory in
>> a sparse-index. Further, 'git ls-tree' does not use a trailing directory
>> separator for its tree rows.
>>
>> This does not print the stat() information for the blobs. That could be
>> added in a future change with another option. The tests that are added
>> in the next few changes care only about the object types and IDs.
>>
>> To make the option parsing slightly more robust, wrap the string
>> comparisons in a loop adapted from test-dir-iterator.c.
>>
>> Care must be taken with the final check for the 'cnt' variable. We
>> continue the expectation that the numerical value is the final argument.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  t/helper/test-read-cache.c | 50 ++++++++++++++++++++++++++++++--------
>>  1 file changed, 40 insertions(+), 10 deletions(-)
>>
>> diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c
>> index 244977a29bdf..e4c3492f7d3e 100644
>> --- a/t/helper/test-read-cache.c
>> +++ b/t/helper/test-read-cache.c
>> @@ -2,35 +2,65 @@
>>  #include "cache.h"
>>  #include "config.h"
>>
>> +static void print_cache_entry(struct cache_entry *ce)
>> +{
>> +       printf("%06o ", ce->ce_mode & 0777777);
> 
> This constant is curious.  I think it's because you want to strip off
> the special in-memory bits of the ce_mode where git stores extra data,
> which would be everything beyond the first 16 bits (as noted in a
> comment near the beginning of cache.h).  But here you keep the first
> 18 bits.  Is CE_UPDATE and CE_REMOVE just 0 in the cases you've viewed
> so this works (but you really should use 0177777 or 0xFFFF), or am I
> off in my guess of what you're trying to do and you do want to see
> those two flags?

You're right, 0177777 is what I want. I'm focusing only on the
file permissions bits that are reported by ls-tree.

> It also seems surprising to me that this constant isn't already
> defined somewhere in cache.h or as some variant of S_IFMT, though I'm
> not finding it.

I'm not, either.

>> +
>> +       if (S_ISSPARSEDIR(ce->ce_mode))
>> +               printf("tree ");
>> +       else if (S_ISGITLINK(ce->ce_mode))
>> +               printf("commit ");
>> +       else
>> +               printf("blob ");
> 
> Perhaps make use of the tree_type, commit_type, and blob_type global constants?

Today I Learned...

>> +
>> +       printf("%s\t%s\n",
>> +              oid_to_hex(&ce->oid),
>> +              ce->name);
>> +}
>> +
>> +static void print_cache(struct index_state *cache)
>> +{
>> +       int i;
>> +       for (i = 0; i < the_index.cache_nr; i++)
>> +               print_cache_entry(the_index.cache[i]);
> 
> Why are you passing cache as a parameter, then ignoring it and using the_index?

Good catch!

Thanks,
-Stolee

@@ -110,6 +110,8 @@ static int update_working_directory(struct pattern_list *pl)
if (is_index_unborn(r->index))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> As we modify the sparse-checkout definition, we perform index operations
> on a pattern_list that only exists in-memory. This allows easy backing
> out in case the index update fails.
>
> However, if the index write itself cares about the sparse-checkout
> pattern set, we need access to that in-memory copy. Place a pointer to
> a 'struct pattern_list' in the index so we can access this on-demand.
> This will be used in the next change which uses the sparse-checkout
> definition to filter out directories that are outsie the sparse cone.

Looks like you still have the "outsie" typo.  ;-)

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/sparse-checkout.c | 17 ++++++++++-------
>  cache.h                   |  2 ++
>  2 files changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index 2306a9ad98e0..e00b82af727b 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -110,6 +110,8 @@ static int update_working_directory(struct pattern_list *pl)
>         if (is_index_unborn(r->index))
>                 return UPDATE_SPARSITY_SUCCESS;
>
> +       r->index->sparse_checkout_patterns = pl;
> +
>         memset(&o, 0, sizeof(o));
>         o.verbose_update = isatty(2);
>         o.update = 1;
> @@ -138,6 +140,7 @@ static int update_working_directory(struct pattern_list *pl)
>         else
>                 rollback_lock_file(&lock_file);
>
> +       r->index->sparse_checkout_patterns = NULL;
>         return result;
>  }
>
> @@ -517,19 +520,18 @@ static int modify_pattern_list(int argc, const char **argv, enum modify_type m)
>  {
>         int result;
>         int changed_config = 0;
> -       struct pattern_list pl;
> -       memset(&pl, 0, sizeof(pl));
> +       struct pattern_list *pl = xcalloc(1, sizeof(*pl));
>
>         switch (m) {
>         case ADD:
>                 if (core_sparse_checkout_cone)
> -                       add_patterns_cone_mode(argc, argv, &pl);
> +                       add_patterns_cone_mode(argc, argv, pl);
>                 else
> -                       add_patterns_literal(argc, argv, &pl);
> +                       add_patterns_literal(argc, argv, pl);
>                 break;
>
>         case REPLACE:
> -               add_patterns_from_input(&pl, argc, argv);
> +               add_patterns_from_input(pl, argc, argv);
>                 break;
>         }
>
> @@ -539,12 +541,13 @@ static int modify_pattern_list(int argc, const char **argv, enum modify_type m)
>                 changed_config = 1;
>         }
>
> -       result = write_patterns_and_update(&pl);
> +       result = write_patterns_and_update(pl);
>
>         if (result && changed_config)
>                 set_config(MODE_NO_PATTERNS);
>
> -       clear_pattern_list(&pl);
> +       clear_pattern_list(pl);
> +       free(pl);
>         return result;
>  }
>
> diff --git a/cache.h b/cache.h
> index 1336c8d7435e..d75b352f38d3 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -307,6 +307,7 @@ static inline unsigned int canon_mode(unsigned int mode)
>  struct split_index;
>  struct untracked_cache;
>  struct progress;
> +struct pattern_list;
>
>  struct index_state {
>         struct cache_entry **cache;
> @@ -332,6 +333,7 @@ struct index_state {
>         struct mem_pool *ce_mem_pool;
>         struct progress *progress;
>         struct repository *repo;
> +       struct pattern_list *sparse_checkout_patterns;
>  };
>
>  /* Name hashing */
> --
> gitgitgadget
>

@@ -6,6 +6,7 @@
#include "object-store.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> If we have a full index, then we can convert it to a sparse index by
> replacing directories outside of the sparse cone with sparse directory
> entries. The convert_to_sparse() method does this, when the situation is
> appropriate.
>
> For now, we avoid converting the index to a sparse index if:
>
>  1. the index is split.
>  2. the index is already sparse.
>  3. sparse-checkout is disabled.
>  4. sparse-checkout does not use cone mode.
>
> Finally, we currently limit the conversion to when the
> GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git
> config will be added in a later change.
>
> The trickiest thing about this conversion is that we might not be able
> to mark a directory as a sparse directory just because it is outside the
> sparse cone. There might be unmerged files within that directory, so we
> need to look for those. Also, if there is some strange reason why a file
> is not marked with CE_SKIP_WORKTREE, then we should give up on
> converting that directory. There is still hope that some of its
> subdirectories might be able to convert to sparse, so we keep looking
> deeper.
>
> The conversion process is assisted by the cache-tree extension. This is
> calculated from the full index if it does not already exist. We then
> abandon the cache-tree as it no longer applies to the newly-sparse
> index. Thus, this cache-tree will be recalculated in every
> sparse-full-sparse round-trip until we integrate the cache-tree
> extension with the sparse index.
>
> Some Git commands use the index after writing it. For example, 'git add'
> will update the index, then write it to disk, then read its entries to
> report information. To keep the in-memory index in a full state after
> writing, we re-expand it to a full one after the write. This is wasteful
> for commands that only write the index and do not read from it again,
> but that is only the case until we make those commands "sparse aware."
>
> We can compare the behavior of the sparse-index in
> t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1
> when operating on the 'sparse-index' repo. We can also compare the two
> sparse repos directly, such as comparing their indexes (when expanded to
> full in the case of the 'sparse-index' repo). We also verify that the
> index is actually populated with sparse directory entries.
>
> The 'checkout and reset (mixed)' test is marked for failure when
> comparing a sparse repo to a full repo, but we can compare the two
> sparse-checkout cases directly to ensure that we are not changing the
> behavior when using a sparse index.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  cache-tree.c                             |   3 +
>  cache.h                                  |   2 +
>  read-cache.c                             |  26 ++++-
>  sparse-index.c                           | 139 +++++++++++++++++++++++
>  sparse-index.h                           |   1 +
>  t/t1092-sparse-checkout-compatibility.sh |  61 +++++++++-
>  6 files changed, 227 insertions(+), 5 deletions(-)
>
> diff --git a/cache-tree.c b/cache-tree.c
> index 2fb483d3c083..5f07a39e501e 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -6,6 +6,7 @@
>  #include "object-store.h"
>  #include "replace-object.h"
>  #include "promisor-remote.h"
> +#include "sparse-index.h"
>
>  #ifndef DEBUG_CACHE_TREE
>  #define DEBUG_CACHE_TREE 0
> @@ -442,6 +443,8 @@ int cache_tree_update(struct index_state *istate, int flags)
>         if (i)
>                 return i;
>
> +       ensure_full_index(istate);
> +
>         if (!istate->cache_tree)
>                 istate->cache_tree = cache_tree();
>
> diff --git a/cache.h b/cache.h
> index d75b352f38d3..e8b7d3b4fb33 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -251,6 +251,8 @@ static inline unsigned int create_ce_mode(unsigned int mode)
>  {
>         if (S_ISLNK(mode))
>                 return S_IFLNK;
> +       if (mode == S_IFDIR)
> +               return S_IFDIR;
>         if (S_ISDIR(mode) || S_ISGITLINK(mode))
>                 return S_IFGITLINK;
>         return S_IFREG | ce_permissions(mode);
> diff --git a/read-cache.c b/read-cache.c
> index 97dbf2434f30..67acbf202f4e 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -25,6 +25,7 @@
>  #include "fsmonitor.h"
>  #include "thread-utils.h"
>  #include "progress.h"
> +#include "sparse-index.h"
>
>  /* Mask for the name length in ce_flags in the on-disk index */
>
> @@ -1002,8 +1003,14 @@ int verify_path(const char *path, unsigned mode)
>
>                         c = *path++;
>                         if ((c == '.' && !verify_dotfile(path, mode)) ||
> -                           is_dir_sep(c) || c == '\0')
> +                           is_dir_sep(c))
>                                 return 0;
> +                       /*
> +                        * allow terminating directory separators for
> +                        * sparse directory enries.

enries -> entries

> +                        */
> +                       if (c == '\0')
> +                               return S_ISDIR(mode);

Yaay, much simpler (than the RFC version).

>                 } else if (c == '\\' && protect_ntfs) {
>                         if (is_ntfs_dotgit(path))
>                                 return 0;
> @@ -3061,6 +3068,14 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
>                                  unsigned flags)
>  {
>         int ret;
> +       int was_full = !istate->sparse_index;
> +
> +       ret = convert_to_sparse(istate);
> +
> +       if (ret) {
> +               warning(_("failed to convert to a sparse-index"));
> +               return ret;
> +       }
>
>         /*
>          * TODO trace2: replace "the_repository" with the actual repo instance
> @@ -3072,6 +3087,9 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
>         trace2_region_leave_printf("index", "do_write_index", the_repository,
>                                    "%s", get_lock_file_path(lock));
>
> +       if (was_full)
> +               ensure_full_index(istate);
> +
>         if (ret)
>                 return ret;
>         if (flags & COMMIT_LOCK)
> @@ -3162,9 +3180,10 @@ static int write_shared_index(struct index_state *istate,
>                               struct tempfile **temp)
>  {
>         struct split_index *si = istate->split_index;
> -       int ret;
> +       int ret, was_full = !istate->sparse_index;
>
>         move_cache_to_base_index(istate);
> +       convert_to_sparse(istate);
>
>         trace2_region_enter_printf("index", "shared/do_write_index",
>                                    the_repository, "%s", get_tempfile_path(*temp));
> @@ -3172,6 +3191,9 @@ static int write_shared_index(struct index_state *istate,
>         trace2_region_leave_printf("index", "shared/do_write_index",
>                                    the_repository, "%s", get_tempfile_path(*temp));
>
> +       if (was_full)
> +               ensure_full_index(istate);
> +
>         if (ret)
>                 return ret;
>         ret = adjust_shared_perm(get_tempfile_path(*temp));
> diff --git a/sparse-index.c b/sparse-index.c
> index 316cb949b74b..cb1f85635fbc 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -4,6 +4,145 @@
>  #include "tree.h"
>  #include "pathspec.h"
>  #include "trace2.h"
> +#include "cache-tree.h"
> +#include "config.h"
> +#include "dir.h"
> +#include "fsmonitor.h"
> +
> +static struct cache_entry *construct_sparse_dir_entry(
> +                               struct index_state *istate,
> +                               const char *sparse_dir,
> +                               struct cache_tree *tree)
> +{
> +       struct cache_entry *de;
> +
> +       de = make_cache_entry(istate, S_IFDIR, &tree->oid, sparse_dir, 0, 0);
> +
> +       de->ce_flags |= CE_SKIP_WORKTREE;
> +       return de;
> +}
> +
> +/*
> + * Returns the number of entries "inserted" into the index.
> + */
> +static int convert_to_sparse_rec(struct index_state *istate,
> +                                int num_converted,
> +                                int start, int end,
> +                                const char *ct_path, size_t ct_pathlen,
> +                                struct cache_tree *ct)
> +{
> +       int i, can_convert = 1;
> +       int start_converted = num_converted;
> +       enum pattern_match_result match;
> +       int dtype;
> +       struct strbuf child_path = STRBUF_INIT;
> +       struct pattern_list *pl = istate->sparse_checkout_patterns;
> +
> +       /*
> +        * Is the current path outside of the sparse cone?
> +        * Then check if the region can be replaced by a sparse
> +        * directory entry (everything is sparse and merged).
> +        */
> +       match = path_matches_pattern_list(ct_path, ct_pathlen,
> +                                         NULL, &dtype, pl, istate);
> +       if (match != NOT_MATCHED)
> +               can_convert = 0;

Not sure if you saw my comments on the flow control at
https://lore.kernel.org/git/CABPp-BE9wPwmC0=pA4p1_QSRDHrO8RzqfJQdE2NxYZsYL_Rcig@mail.gmail.com/
(the typos elsewhere seem to still be present).  If you saw it and
decided against it, that's fine, just wanted the idea to at least be
floated.

> +
> +       for (i = start; can_convert && i < end; i++) {
> +               struct cache_entry *ce = istate->cache[i];
> +
> +               if (ce_stage(ce) ||
> +                   !(ce->ce_flags & CE_SKIP_WORKTREE))
> +                       can_convert = 0;
> +       }
> +
> +       if (can_convert) {
> +               struct cache_entry *se;
> +               se = construct_sparse_dir_entry(istate, ct_path, ct);
> +
> +               istate->cache[num_converted++] = se;
> +               return 1;
> +       }
> +
> +       for (i = start; i < end; ) {
> +               int count, span, pos = -1;
> +               const char *base, *slash;
> +               struct cache_entry *ce = istate->cache[i];
> +
> +               /*
> +                * Detect if this is a normal entry oustide of any subtree

s/oustide/outside/

> +                * entry.
> +                */
> +               base = ce->name + ct_pathlen;
> +               slash = strchr(base, '/');
> +
> +               if (slash)
> +                       pos = cache_tree_subtree_pos(ct, base, slash - base);
> +
> +               if (pos < 0) {
> +                       istate->cache[num_converted++] = ce;
> +                       i++;
> +                       continue;
> +               }
> +
> +               strbuf_setlen(&child_path, 0);
> +               strbuf_add(&child_path, ce->name, slash - ce->name + 1);
> +
> +               span = ct->down[pos]->cache_tree->entry_count;
> +               count = convert_to_sparse_rec(istate,
> +                                             num_converted, i, i + span,
> +                                             child_path.buf, child_path.len,
> +                                             ct->down[pos]->cache_tree);
> +               num_converted += count;
> +               i += span;
> +       }
> +
> +       strbuf_release(&child_path);
> +       return num_converted - start_converted;
> +}
> +
> +int convert_to_sparse(struct index_state *istate)
> +{
> +       if (istate->split_index || istate->sparse_index ||
> +           !core_apply_sparse_checkout || !core_sparse_checkout_cone)
> +               return 0;
> +
> +       /*
> +        * For now, only create a sparse index with the
> +        * GIT_TEST_SPARSE_INDEX environment variable. We will relax
> +        * this once we have a proper way to opt-in (and later still,
> +        * opt-out).
> +        */
> +       if (!git_env_bool("GIT_TEST_SPARSE_INDEX", 0))
> +               return 0;
> +
> +       if (!istate->sparse_checkout_patterns) {
> +               istate->sparse_checkout_patterns = xcalloc(1, sizeof(struct pattern_list));
> +               if (get_sparse_checkout_patterns(istate->sparse_checkout_patterns) < 0)
> +                       return 0;
> +       }
> +
> +       if (!istate->sparse_checkout_patterns->use_cone_patterns) {
> +               warning(_("attempting to use sparse-index without cone mode"));
> +               return -1;
> +       }
> +
> +       if (cache_tree_update(istate, 0)) {
> +               warning(_("unable to update cache-tree, staying full"));
> +               return -1;
> +       }
> +
> +       remove_fsmonitor(istate);
> +
> +       trace2_region_enter("index", "convert_to_sparse", istate->repo);
> +       istate->cache_nr = convert_to_sparse_rec(istate,
> +                                                0, 0, istate->cache_nr,
> +                                                "", 0, istate->cache_tree);
> +       istate->drop_cache_tree = 1;
> +       istate->sparse_index = 1;
> +       trace2_region_leave("index", "convert_to_sparse", istate->repo);
> +       return 0;
> +}
>
>  static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
>  {
> diff --git a/sparse-index.h b/sparse-index.h
> index 09a20d036c46..64380e121d80 100644
> --- a/sparse-index.h
> +++ b/sparse-index.h
> @@ -3,5 +3,6 @@
>
>  struct index_state;
>  void ensure_full_index(struct index_state *istate);
> +int convert_to_sparse(struct index_state *istate);
>
>  #endif
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 4d789fe86b9d..ca87033d30b0 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -2,6 +2,9 @@
>
>  test_description='compare full workdir to sparse workdir'
>
> +GIT_TEST_CHECK_CACHE_TREE=0

Same question as I posted for the RFC series:

Why do you need to set this?  I vaguely remember needing to mess with
this when working with sparse checkouts because it did weird stuff but
I don't remember details.  But since your patch touches cache_trees, it
seems weird to show up without explanation.

> +GIT_TEST_SPLIT_INDEX=0
> +
>  . ./test-lib.sh
>
>  test_expect_success 'setup' '
> @@ -121,15 +124,49 @@ run_on_all () {
>  test_all_match () {
>         run_on_all "$@" &&
>         test_cmp full-checkout-out sparse-checkout-out &&
> -       test_cmp full-checkout-err sparse-checkout-err
> +       test_cmp full-checkout-out sparse-index-out &&
> +       test_cmp full-checkout-err sparse-checkout-err &&
> +       test_cmp full-checkout-err sparse-index-err
>  }
>
>  test_sparse_match () {
> -       run_on_sparse $* &&
> +       run_on_sparse "$@" &&
>         test_cmp sparse-checkout-out sparse-index-out &&
>         test_cmp sparse-checkout-err sparse-index-err
>  }
>
> +test_expect_success 'sparse-index contents' '
> +       init_repos &&
> +
> +       test-tool -C sparse-index read-cache --table >cache &&
> +       for dir in folder1 folder2 x
> +       do
> +               TREE=$(git -C sparse-index rev-parse HEAD:$dir) &&
> +               grep "040000 tree $TREE $dir/" cache \
> +                       || return 1
> +       done &&

Thanks for making the output look more like ls-tree output; it's
easier to parse that way, at least for me.

> +
> +       GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set folder1 &&
> +
> +       test-tool -C sparse-index read-cache --table >cache &&
> +       for dir in deep folder2 x
> +       do
> +               TREE=$(git -C sparse-index rev-parse HEAD:$dir) &&
> +               grep "040000 tree $TREE $dir/" cache \
> +                       || return 1
> +       done &&
> +
> +       GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep/deeper1 &&
> +
> +       test-tool -C sparse-index read-cache --table >cache &&
> +       for dir in deep/deeper2 folder1 folder2 x
> +       do
> +               TREE=$(git -C sparse-index rev-parse HEAD:$dir) &&
> +               grep "040000 tree $TREE $dir/" cache \
> +                       || return 1
> +       done
> +'
> +
>  test_expect_success 'expanded in-memory index matches full index' '
>         init_repos &&
>         test_sparse_match test-tool read-cache --expand --table
> @@ -137,6 +174,7 @@ test_expect_success 'expanded in-memory index matches full index' '
>
>  test_expect_success 'status with options' '
>         init_repos &&
> +       test_sparse_match ls &&
>         test_all_match git status --porcelain=v2 &&
>         test_all_match git status --porcelain=v2 -z -u &&
>         test_all_match git status --porcelain=v2 -uno &&
> @@ -273,6 +311,17 @@ test_expect_failure 'checkout and reset (mixed)' '
>         test_all_match git reset update-folder2
>  '
>
> +# Ensure that sparse-index behaves identically to
> +# sparse-checkout with a full index.
> +test_expect_success 'checkout and reset (mixed) [sparse]' '
> +       init_repos &&
> +
> +       test_sparse_match git checkout -b reset-test update-deep &&
> +       test_sparse_match git reset deepest &&
> +       test_sparse_match git reset update-folder1 &&
> +       test_sparse_match git reset update-folder2
> +'
> +
>  test_expect_success 'merge' '
>         init_repos &&
>
> @@ -309,14 +358,20 @@ test_expect_success 'clean' '
>         test_all_match git status --porcelain=v2 &&
>         test_all_match git clean -f &&
>         test_all_match git status --porcelain=v2 &&
> +       test_sparse_match ls &&
> +       test_sparse_match ls folder1 &&
>
>         test_all_match git clean -xf &&
>         test_all_match git status --porcelain=v2 &&
> +       test_sparse_match ls &&
> +       test_sparse_match ls folder1 &&
>
>         test_all_match git clean -xdf &&
>         test_all_match git status --porcelain=v2 &&
> +       test_sparse_match ls &&
> +       test_sparse_match ls folder1 &&
>
> -       test_path_is_dir sparse-checkout/folder1
> +       test_sparse_match test_path_is_dir folder1
>  '
>
>  test_done
> --
> gitgitgadget

I mostly read over the range-diff since it was much shorter.  You've
addressed a number of questions/comments I had on the RFC version, but
there's still some I didn't see a response to so I reposted them here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 2/25/2021 2:33 AM, Elijah Newren wrote:
> On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:

>> +                       /*
>> +                        * allow terminating directory separators for
>> +                        * sparse directory enries.
> 
> enries -> entries

Thanks.

>> +                        */
>> +                       if (c == '\0')
>> +                               return S_ISDIR(mode);
> 
> Yaay, much simpler (than the RFC version).

>> +       /*
>> +        * Is the current path outside of the sparse cone?
>> +        * Then check if the region can be replaced by a sparse
>> +        * directory entry (everything is sparse and merged).
>> +        */
>> +       match = path_matches_pattern_list(ct_path, ct_pathlen,
>> +                                         NULL, &dtype, pl, istate);
>> +       if (match != NOT_MATCHED)
>> +               can_convert = 0;
> 
> Not sure if you saw my comments on the flow control at
> https://lore.kernel.org/git/CABPp-BE9wPwmC0=pA4p1_QSRDHrO8RzqfJQdE2NxYZsYL_Rcig@mail.gmail.com/
> (the typos elsewhere seem to still be present).  If you saw it and
> decided against it, that's fine, just wanted the idea to at least be
> floated.

Sorry for dropping this one. I _did_ decide against it, and
primarily because the "if (can_convert)" condition contains
a return statement. I like to use 'gotos' for blocks that
will eventually be entered by all paths through the code,
such as "goto cleanup;" but here I find the "can_convert"
check to be clearer.

>> +               /*
>> +                * Detect if this is a normal entry oustide of any subtree
> 
> s/oustide/outside/

Got it.

>> +test_expect_success 'sparse-index contents' '
>> +       init_repos &&
>> +
>> +       test-tool -C sparse-index read-cache --table >cache &&
>> +       for dir in folder1 folder2 x
>> +       do
>> +               TREE=$(git -C sparse-index rev-parse HEAD:$dir) &&
>> +               grep "040000 tree $TREE $dir/" cache \
>> +                       || return 1
>> +       done &&
> 
> Thanks for making the output look more like ls-tree output; it's
> easier to parse that way, at least for me.

Excellent.
 
> I mostly read over the range-diff since it was much shorter.  You've
> addressed a number of questions/comments I had on the RFC version, but
> there's still some I didn't see a response to so I reposted them here.
 
Thanks for being diligent!
-Stolee

@@ -746,9 +746,12 @@ static int index_pos_by_traverse_info(struct name_entry *names,
strbuf_make_traverse_path(&name, info, names->path, names->pathlen);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The index_pos_by_traverse_info() currently throws a BUG() when a
> directory entry exists exactly in the index. We need to consider that it
> is possible to have a directory in a sparse index as long as that entry
> is itself marked with the skip-worktree bit.
>
> The negation of the 'pos' variable must be conditioned to only when it
> starts as negative. This is identical behavior as before when the index
> is full.

Same comment on the second paragraph as I made in the RFC series --
https://lore.kernel.org/git/CABPp-BGPJgA4guWHVm3AVS=hM0fTixUpRvJe5i9NnHT-3QJMfw@mail.gmail.com/.
I apologize if I'm repeating stuff you chose to not change, but I
didn't see a response and given the three typos left in previous
patches, I'm unsure whether it was unaddressed on purpose or on
accident.

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  unpack-trees.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 4dd99219073a..b324eec2a5d1 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -746,9 +746,12 @@ static int index_pos_by_traverse_info(struct name_entry *names,
>         strbuf_make_traverse_path(&name, info, names->path, names->pathlen);
>         strbuf_addch(&name, '/');
>         pos = index_name_pos(o->src_index, name.buf, name.len);
> -       if (pos >= 0)
> -               BUG("This is a directory and should not exist in index");
> -       pos = -pos - 1;
> +       if (pos >= 0) {
> +               if (!o->src_index->sparse_index ||
> +                   !(o->src_index->cache[pos]->ce_flags & CE_SKIP_WORKTREE))
> +                       BUG("This is a directory and should not exist in index");
> +       } else
> +               pos = -pos - 1;
>         if (pos >= o->src_index->cache_nr ||
>             !starts_with(o->src_index->cache[pos]->name, name.buf) ||
>             (pos > 0 && starts_with(o->src_index->cache[pos-1]->name, name.buf)))
> --
> gitgitgadget
>

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 2/25/2021 2:40 AM, Elijah Newren wrote:
> On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> The index_pos_by_traverse_info() currently throws a BUG() when a
>> directory entry exists exactly in the index. We need to consider that it
>> is possible to have a directory in a sparse index as long as that entry
>> is itself marked with the skip-worktree bit.
>>
>> The negation of the 'pos' variable must be conditioned to only when it
>> starts as negative. This is identical behavior as before when the index
>> is full.
> 
> Same comment on the second paragraph as I made in the RFC series --
> https://lore.kernel.org/git/CABPp-BGPJgA4guWHVm3AVS=hM0fTixUpRvJe5i9NnHT-3QJMfw@mail.gmail.com/.
> I apologize if I'm repeating stuff you chose to not change, but I
> didn't see a response and given the three typos left in previous
> patches, I'm unsure whether it was unaddressed on purpose or on
> accident.

Yes, I dropped this one. How about this?

    The 'pos' variable is assigned a negative value if an exact match is not
    found. Since a directory name can be an exact match, it is no longer an
    error to have a nonnegative 'pos' value.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Mar 9, 2021 at 1:35 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 2/25/2021 2:40 AM, Elijah Newren wrote:
> > On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >>
> >> From: Derrick Stolee <dstolee@microsoft.com>
> >>
> >> The index_pos_by_traverse_info() currently throws a BUG() when a
> >> directory entry exists exactly in the index. We need to consider that it
> >> is possible to have a directory in a sparse index as long as that entry
> >> is itself marked with the skip-worktree bit.
> >>
> >> The negation of the 'pos' variable must be conditioned to only when it
> >> starts as negative. This is identical behavior as before when the index
> >> is full.
> >
> > Same comment on the second paragraph as I made in the RFC series --
> > https://lore.kernel.org/git/CABPp-BGPJgA4guWHVm3AVS=hM0fTixUpRvJe5i9NnHT-3QJMfw@mail.gmail.com/.
> > I apologize if I'm repeating stuff you chose to not change, but I
> > didn't see a response and given the three typos left in previous
> > patches, I'm unsure whether it was unaddressed on purpose or on
> > accident.
>
> Yes, I dropped this one. How about this?
>
>     The 'pos' variable is assigned a negative value if an exact match is not
>     found. Since a directory name can be an exact match, it is no longer an
>     error to have a nonnegative 'pos' value.

I like it!

@@ -6,3 +6,10 @@ extensions.objectFormat::
Note that this setting should only be set by linkgit:git-init[1] or
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Previously, we enabled the sparse index format only using
> GIT_TEST_SPARSE_INDEX=1. This is not a feasible direction for users to
> actually select this mode. Further, sparse directory entries are not
> understood by the index formats as advertised.
>
> We _could_ add a new index version that explicitly adds these
> capabilities, but there are nuances to index formats 2, 3, and 4 that
> are still valuable to select as options. For now, create a repo
> extension, "extensions.sparseIndex", that specifies that the tool
> reading this repository must understand sparse directory entries.

This commit is unchanged from the RFC series, but given your comments
in the design document about how you do intend to create an index
format v5 now, do you want to reference that here?

>
> This change only encodes the extension and enables it when
> GIT_TEST_SPARSE_INDEX=1. Later, we will add a more user-friendly CLI
> mechanism.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/config/extensions.txt |  7 ++++++
>  cache.h                             |  1 +
>  repo-settings.c                     |  7 ++++++
>  repository.h                        |  3 ++-
>  setup.c                             |  3 +++
>  sparse-index.c                      | 38 +++++++++++++++++++++++++----
>  6 files changed, 53 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/config/extensions.txt b/Documentation/config/extensions.txt
> index 4e23d73cdcad..5c86b3648732 100644
> --- a/Documentation/config/extensions.txt
> +++ b/Documentation/config/extensions.txt
> @@ -6,3 +6,10 @@ extensions.objectFormat::
>  Note that this setting should only be set by linkgit:git-init[1] or
>  linkgit:git-clone[1].  Trying to change it after initialization will not
>  work and will produce hard-to-diagnose issues.
> +
> +extensions.sparseIndex::
> +       When combined with `core.sparseCheckout=true` and
> +       `core.sparseCheckoutCone=true`, the index may contain entries
> +       corresponding to directories outside of the sparse-checkout
> +       definition. Versions of Git that do not understand this extension
> +       do not expect directory entries in the index.

I had a wording suggestion for this paragraph in the RFC series --
https://lore.kernel.org/git/CABPp-BFEJE82k4VgkR=Jf7V7sZxZzo2pHMfAGshhi9_vV6iK0w@mail.gmail.com/.
Let me know if you just decided to leave it out so I don't bug you
about stuff you already considered.

> diff --git a/cache.h b/cache.h
> index e8b7d3b4fb33..eea61fba7568 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1053,6 +1053,7 @@ struct repository_format {
>         int worktree_config;
>         int is_bare;
>         int hash_algo;
> +       int sparse_index;
>         char *work_tree;
>         struct string_list unknown_extensions;
>         struct string_list v1_only_extensions;
> diff --git a/repo-settings.c b/repo-settings.c
> index d63569e4041e..9677d50f9238 100644
> --- a/repo-settings.c
> +++ b/repo-settings.c
> @@ -85,4 +85,11 @@ void prepare_repo_settings(struct repository *r)
>          * removed.
>          */
>         r->settings.command_requires_full_index = 1;
> +
> +       /*
> +        * Initialize this as off.
> +        */
> +       r->settings.sparse_index = 0;
> +       if (!repo_config_get_bool(r, "extensions.sparseindex", &value) && value)
> +               r->settings.sparse_index = 1;
>  }
> diff --git a/repository.h b/repository.h
> index e06a23015697..a45f7520fd9e 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -42,7 +42,8 @@ struct repo_settings {
>
>         int core_multi_pack_index;
>
> -       unsigned command_requires_full_index:1;
> +       unsigned command_requires_full_index:1,
> +                sparse_index:1;
>  };
>
>  struct repository {
> diff --git a/setup.c b/setup.c
> index c04cd25a30df..cd8394564613 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -500,6 +500,9 @@ static enum extension_result handle_extension(const char *var,
>                         return error("invalid value for 'extensions.objectformat'");
>                 data->hash_algo = format;
>                 return EXTENSION_OK;
> +       } else if (!strcmp(ext, "sparseindex")) {
> +               data->sparse_index = 1;
> +               return EXTENSION_OK;
>         }
>         return EXTENSION_UNKNOWN;
>  }
> diff --git a/sparse-index.c b/sparse-index.c
> index 14029fafc750..97b0d0c57857 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -102,19 +102,47 @@ static int convert_to_sparse_rec(struct index_state *istate,
>         return num_converted - start_converted;
>  }
>
> +static int enable_sparse_index(struct repository *repo)
> +{
> +       const char *config_path = repo_git_path(repo, "config.worktree");
> +
> +       if (upgrade_repository_format(1) < 0) {
> +               warning(_("unable to upgrade repository format to enable sparse-index"));
> +               return -1;
> +       }
> +       git_config_set_in_file_gently(config_path,
> +                                     "extensions.sparseIndex",
> +                                     "true");
> +
> +       prepare_repo_settings(repo);
> +       repo->settings.sparse_index = 1;
> +       return 0;
> +}
> +
>  int convert_to_sparse(struct index_state *istate)
>  {
>         if (istate->split_index || istate->sparse_index ||
>             !core_apply_sparse_checkout || !core_sparse_checkout_cone)
>                 return 0;
>
> +       if (!istate->repo)
> +               istate->repo = the_repository;
> +
> +       /*
> +        * The GIT_TEST_SPARSE_INDEX environment variable triggers the
> +        * extensions.sparseIndex config variable to be on.
> +        */
> +       if (git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) {
> +               int err = enable_sparse_index(istate->repo);
> +               if (err < 0)
> +                       return err;
> +       }
> +
>         /*
> -        * For now, only create a sparse index with the
> -        * GIT_TEST_SPARSE_INDEX environment variable. We will relax
> -        * this once we have a proper way to opt-in (and later still,
> -        * opt-out).
> +        * Only convert to sparse if extensions.sparseIndex is set.
>          */
> -       if (!git_env_bool("GIT_TEST_SPARSE_INDEX", 0))
> +       prepare_repo_settings(istate->repo);
> +       if (!istate->repo->settings.sparse_index)
>                 return 0;
>
>         if (!istate->sparse_checkout_patterns) {
> --
> gitgitgadget

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 2/25/2021 2:45 AM, Elijah Newren wrote:
> On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> Previously, we enabled the sparse index format only using
>> GIT_TEST_SPARSE_INDEX=1. This is not a feasible direction for users to
>> actually select this mode. Further, sparse directory entries are not
>> understood by the index formats as advertised.
>>
>> We _could_ add a new index version that explicitly adds these
>> capabilities, but there are nuances to index formats 2, 3, and 4 that
>> are still valuable to select as options. For now, create a repo
>> extension, "extensions.sparseIndex", that specifies that the tool
>> reading this repository must understand sparse directory entries.
> 
> This commit is unchanged from the RFC series, but given your comments
> in the design document about how you do intend to create an index
> format v5 now, do you want to reference that here?

I'll insert detail about v5.
 
>> +extensions.sparseIndex::
>> +       When combined with `core.sparseCheckout=true` and
>> +       `core.sparseCheckoutCone=true`, the index may contain entries
>> +       corresponding to directories outside of the sparse-checkout
>> +       definition. Versions of Git that do not understand this extension
>> +       do not expect directory entries in the index.
> 
> I had a wording suggestion for this paragraph in the RFC series --
> https://lore.kernel.org/git/CABPp-BFEJE82k4VgkR=Jf7V7sZxZzo2pHMfAGshhi9_vV6iK0w@mail.gmail.com/.
> Let me know if you just decided to leave it out so I don't bug you
> about stuff you already considered.

I'll take your suggestion, thanks.

-Stolee

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 25, 2021

User Derrick Stolee <stolee@gmail.com> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 26, 2021

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Feb 23, 2021 at 3:49 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > Here is the first full patch series submission coming out of the
> > sparse-index RFC [1].
>
> Wahoo!  I'll be reading these over the next few days.

I finally finished the last five patches today, and didn't spot
anything on those to comment on.

Overall, I find the series well constructed, motivated, and explained.
I've left various comments on individual patches, but they're mostly
all minor things that should be easy to cleanup and/or just comment
on.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 27, 2021

This branch is now known as ds/sparse-index-wip.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 27, 2021

This patch series was integrated into seen via git@84da985.

@gitgitgadget gitgitgadget bot added the seen label Feb 27, 2021
@@ -14,6 +14,7 @@
#include "unpack-trees.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, SZEDER Gábor wrote (reply to this):

On Tue, Feb 23, 2021 at 08:14:26PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> We use 'git sparse-checkout init --cone --sparse-index' to toggle the
> sparse-index feature. It makes sense to also disable it when running
> 'git sparse-checkout disable'. This is particularly important because it
> removes the extensions.sparseIndex config option, allowing other tools
> to use this Git repository again.
> 
> This does mean that 'git sparse-checkout init' will not re-enable the
> sparse-index feature, even if it was previously enabled.
> 
> While testing this feature, I noticed that the sparse-index was not
> being written on the first run, but by a second. This was caught by the
> call to 'test-tool read-cache --table'. This requires adjusting some
> assignments to core_apply_sparse_checkout and pl.use_cone_patterns in
> the sparse_checkout_init() logic.
> 
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/sparse-checkout.c          | 10 +++++++++-
>  t/t1091-sparse-checkout-builtin.sh | 13 +++++++++++++
>  2 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index ca63e2c64e95..585343fa1972 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -280,6 +280,9 @@ static int set_config(enum sparse_checkout_mode mode)
>  				      "core.sparseCheckoutCone",
>  				      mode == MODE_CONE_PATTERNS ? "true" : NULL);
>  
> +	if (mode == MODE_NO_PATTERNS)
> +		set_sparse_index_config(the_repository, 0);
> +
>  	return 0;
>  }
>  
> @@ -341,10 +344,11 @@ static int sparse_checkout_init(int argc, const char **argv)
>  		the_repository->index->updated_workdir = 1;
>  	}
>  
> +	core_apply_sparse_checkout = 1;
> +
>  	/* If we already have a sparse-checkout file, use it. */
>  	if (res >= 0) {
>  		free(sparse_filename);
> -		core_apply_sparse_checkout = 1;
>  		return update_working_directory(NULL);
>  	}
>  
> @@ -366,6 +370,7 @@ static int sparse_checkout_init(int argc, const char **argv)
>  	add_pattern(strbuf_detach(&pattern, NULL), empty_base, 0, &pl, 0);
>  	strbuf_addstr(&pattern, "!/*/");
>  	add_pattern(strbuf_detach(&pattern, NULL), empty_base, 0, &pl, 0);
> +	pl.use_cone_patterns = init_opts.cone_mode;
>  
>  	return write_patterns_and_update(&pl);
>  }
> @@ -632,6 +637,9 @@ static int sparse_checkout_disable(int argc, const char **argv)
>  	strbuf_addstr(&match_all, "/*");
>  	add_pattern(strbuf_detach(&match_all, NULL), empty_base, 0, &pl, 0);
>  
> +	prepare_repo_settings(the_repository);
> +	the_repository->settings.sparse_index = 0;
> +
>  	if (update_working_directory(&pl))
>  		die(_("error while refreshing working directory"));
>  
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index fc64e9ed99f4..ff1ad570a255 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -205,6 +205,19 @@ test_expect_success 'sparse-checkout disable' '
>  	check_files repo a deep folder1 folder2
>  '
>  
> +test_expect_success 'sparse-index enabled and disabled' '
> +	git -C repo sparse-checkout init --cone --sparse-index &&
> +	test_cmp_config -C repo true extensions.sparseIndex &&
> +	test-tool -C repo read-cache --table >cache &&
> +	grep " tree " cache &&
> +
> +	git -C repo sparse-checkout disable &&
> +	test-tool -C repo read-cache --table >cache &&
> +	! grep " tree " cache &&
> +	git -C repo config --list >config &&
> +	! grep extensions.sparseindex config
> +'

This test passes with GIT_TEST_SPLIT_INDEX=1 at the moment, because,
unfortunately, GIT_TEST_SPLIT_INDEX has been broken for the past two
years.  However, if I run it with my WIP fixes for that issue [1],
then it will fail:

  +git -C repo sparse-checkout init --cone --sparse-index
  +test_cmp_config -C repo true extensions.sparseIndex
  +test-tool -C repo read-cache --table
  +grep  tree  cache
  error: last command exited with $?=1
  not ok 16 - sparse-index enabled and disabled

https://travis-ci.com/github/szeder/git-cooking-topics-for-travis-ci/jobs/486702444#L2594

[1] Try to run it with:

      https://github.com/szeder/git split-index-fixes

    The code is, I believe, close to final, the commit messages,
    however, are far from being finished.


> +
>  test_expect_success 'cone mode: init and set' '
>  	git -C repo sparse-checkout init --cone &&
>  	git -C repo config --list >config &&
> -- 
> gitgitgadget
> 

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 2/27/2021 7:32 AM, SZEDER Gábor wrote:
> On Tue, Feb 23, 2021 at 08:14:26PM +0000, Derrick Stolee via GitGitGadget wrote:
>> +test_expect_success 'sparse-index enabled and disabled' '
>> +	git -C repo sparse-checkout init --cone --sparse-index &&
>> +	test_cmp_config -C repo true extensions.sparseIndex &&
>> +	test-tool -C repo read-cache --table >cache &&
>> +	grep " tree " cache &&
>> +
>> +	git -C repo sparse-checkout disable &&
>> +	test-tool -C repo read-cache --table >cache &&
>> +	! grep " tree " cache &&
>> +	git -C repo config --list >config &&
>> +	! grep extensions.sparseindex config
>> +'
> 
> This test passes with GIT_TEST_SPLIT_INDEX=1 at the moment, because,
> unfortunately, GIT_TEST_SPLIT_INDEX has been broken for the past two
> years.  However, if I run it with my WIP fixes for that issue [1],
> then it will fail:
> 
>   +git -C repo sparse-checkout init --cone --sparse-index
>   +test_cmp_config -C repo true extensions.sparseIndex
>   +test-tool -C repo read-cache --table
>   +grep  tree  cache
>   error: last command exited with $?=1
>   not ok 16 - sparse-index enabled and disabled
> 
> https://travis-ci.com/github/szeder/git-cooking-topics-for-travis-ci/jobs/486702444#L2594
> 
> [1] Try to run it with:
> 
>       https://github.com/szeder/git split-index-fixes
> 
>     The code is, I believe, close to final, the commit messages,
>     however, are far from being finished.

I'll keep that in mind. I should have added a variable
that disables GIT_TEST_SPLIT_INDEX for this test script,
since the sparse-index is (currently) incompatible with
the split-index. I bet that the test is failing because
it isn't actually writing the sparse-directory entry due
to that short-circuit check.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 3/9/2021 3:20 PM, Derrick Stolee wrote:
> On 2/27/2021 7:32 AM, SZEDER Gábor wrote:
>> On Tue, Feb 23, 2021 at 08:14:26PM +0000, Derrick Stolee via GitGitGadget wrote:
>>> +test_expect_success 'sparse-index enabled and disabled' '
>>> +	git -C repo sparse-checkout init --cone --sparse-index &&
>>> +	test_cmp_config -C repo true extensions.sparseIndex &&
>>> +	test-tool -C repo read-cache --table >cache &&
>>> +	grep " tree " cache &&
>>> +
>>> +	git -C repo sparse-checkout disable &&
>>> +	test-tool -C repo read-cache --table >cache &&
>>> +	! grep " tree " cache &&
>>> +	git -C repo config --list >config &&
>>> +	! grep extensions.sparseindex config
>>> +'
>>
>> This test passes with GIT_TEST_SPLIT_INDEX=1 at the moment, because,
>> unfortunately, GIT_TEST_SPLIT_INDEX has been broken for the past two
>> years.  However, if I run it with my WIP fixes for that issue [1],
>> then it will fail:
>>
>>   +git -C repo sparse-checkout init --cone --sparse-index
>>   +test_cmp_config -C repo true extensions.sparseIndex
>>   +test-tool -C repo read-cache --table
>>   +grep  tree  cache
>>   error: last command exited with $?=1
>>   not ok 16 - sparse-index enabled and disabled
>>
>> https://travis-ci.com/github/szeder/git-cooking-topics-for-travis-ci/jobs/486702444#L2594
>>
>> [1] Try to run it with:
>>
>>       https://github.com/szeder/git split-index-fixes
>>
>>     The code is, I believe, close to final, the commit messages,
>>     however, are far from being finished.
> 
> I'll keep that in mind. I should have added a variable
> that disables GIT_TEST_SPLIT_INDEX for this test script,
> since the sparse-index is (currently) incompatible with
> the split-index. I bet that the test is failing because
> it isn't actually writing the sparse-directory entry due
> to that short-circuit check.

The next version will include GIT_TEST_SPLIT_INDEX=0 at
the start and that will make it work with your branch.

Thanks,
-Stolee

The sparse index extension is used to signal that index writes should be
in sparse mode. This was only updated using GIT_TEST_SPARSE_INDEX=1.

Add a '--[no-]sparse-index' option to 'git sparse-checkout init' that
specifies if the sparse index should be used. It also updates the index
to use the correct format, either way. Add a warning in the
documentation that the use of a repository extension might reduce
compatibility with third-party tools. 'git sparse-checkout init' already
sets extension.worktreeConfig, which places most sparse-checkout users
outside of the scope of most third-party tools.

Update t1092-sparse-checkout-compatibility.sh to use this CLI instead of
GIT_TEST_SPARSE_INDEX=1.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
We use 'git sparse-checkout init --cone --sparse-index' to toggle the
sparse-index feature. It makes sense to also disable it when running
'git sparse-checkout disable'. This is particularly important because it
removes the extensions.sparseIndex config option, allowing other tools
to use this Git repository again.

This does mean that 'git sparse-checkout init' will not re-enable the
sparse-index feature, even if it was previously enabled.

While testing this feature, I noticed that the sparse-index was not
being written on the first run, but by a second. This was caught by the
call to 'test-tool read-cache --table'. This requires adjusting some
assignments to core_apply_sparse_checkout and pl.use_cone_patterns in
the sparse_checkout_init() logic.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
The cache-tree extension was previously disabled with sparse indexes.
However, the cache-tree is an important performance feature for commands
like 'git status' and 'git add'. Integrate it with sparse directory
entries.

When writing a sparse index, completely clear and recalculate the cache
tree. By starting from scratch, the only integration necessary is to
check if we hit a sparse directory entry and create a leaf of the
cache-tree that has an entry_count of one and no subtrees.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
The cache_tree_verify() method is run when GIT_TEST_CHECK_CACHE_TREE
is enabled, which it is by default in the test suite. The logic must
be adjusted for the presence of these directory entries.

For now, leave the test as a simple check for whether the directory
entry is sparse. Do not go any further until needed.

This allows us to re-enable GIT_TEST_CHECK_CACHE_TREE in
t1092-sparse-checkout-compatibility.sh. Further,
p2000-sparse-operations.sh uses the test suite and hence this is enabled
for all tests. We need to integrate with it before we run our
performance tests with a sparse-index.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
p2000-sparse-operations.sh compares different Git commands in
repositories with many files at HEAD but using sparse-checkout to focus
on a small portion of those files.

Add extra copies of the repository that use the sparse-index format so
we can track how that affects the performance of different commands.

At this point in time, the sparse-index is 100% overhead from the CPU
front, and this is measurable in these tests:

Test
---------------------------------------------------------------
2000.2: git status (full-index-v3)              0.59(0.51+0.12)
2000.3: git status (full-index-v4)              0.59(0.52+0.11)
2000.4: git status (sparse-index-v3)            1.40(1.32+0.12)
2000.5: git status (sparse-index-v4)            1.41(1.36+0.08)
2000.6: git add -A (full-index-v3)              2.32(1.97+0.19)
2000.7: git add -A (full-index-v4)              2.17(1.92+0.14)
2000.8: git add -A (sparse-index-v3)            2.31(2.21+0.15)
2000.9: git add -A (sparse-index-v4)            2.30(2.20+0.13)
2000.10: git add . (full-index-v3)              2.39(2.02+0.20)
2000.11: git add . (full-index-v4)              2.20(1.94+0.16)
2000.12: git add . (sparse-index-v3)            2.36(2.27+0.12)
2000.13: git add . (sparse-index-v4)            2.33(2.21+0.16)
2000.14: git commit -a -m A (full-index-v3)     2.47(2.12+0.20)
2000.15: git commit -a -m A (full-index-v4)     2.26(2.00+0.17)
2000.16: git commit -a -m A (sparse-index-v3)   3.01(2.92+0.16)
2000.17: git commit -a -m A (sparse-index-v4)   3.01(2.94+0.15)

Note that there is very little difference between the v3 and v4 index
formats when the sparse-index is enabled. This is primarily due to the
fact that the relative file sizes are the same, and the command time is
mostly taken up by parsing tree objects to expand the sparse index into
a full one.

With the current file layout, the index file sizes are given by this
table:

       |  full index | sparse index |
       +-------------+--------------+
    v3 |     108 MiB |      1.6 MiB |
    v4 |      80 MiB |      1.2 MiB |

Future updates will improve the performance of Git commands when the
index is sparse.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
@gitgitgadget
Copy link

gitgitgadget bot commented Mar 30, 2021

This patch series was integrated into seen via git@5c7be45.

@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 30, 2021

Submitted as pull.883.v5.git.1617109864.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-883/derrickstolee/sparse-index/format-v5

To fetch this version to local tag pr-883/derrickstolee/sparse-index/format-v5:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-883/derrickstolee/sparse-index/format-v5

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 30, 2021

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

>      @@ repo-settings.c: void prepare_repo_settings(struct repository *r)
>       +	 * Initialize this as off.
>       +	 */
>       +	r->settings.sparse_index = 0;
>      -+	if (!repo_config_get_bool(r, "extensions.sparseindex", &value) && value)
>      ++	if (!repo_config_get_bool(r, "index.sparse", &value) && value)
>       +		r->settings.sparse_index = 1;
>        }

It would be helpful to have a way for the repository owner to say
"Even if the version of Git may be capable of handling 'sdir'
extension, and my checkout uses sparse-cone settings, I do not want
to use it", and the other way around, i.e. "Even if my checkout
currently does not use sparse-cone settings, do use 'sdir'
extension".  But for that, .sparse_index member may need to be
tristate (i.e. forbidden, enable-if-needed, use-even-unneeded)?

We have a similar setting in index.version; I believe we always
auto-demote 3 down to 2 when extended flags are not used, and
I think "always auto-demote" would be sufficient (iow,
"use-even-unneeded" may not be necessary, even though that might
help debugging).

Thanks.

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 30, 2021

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 3/30/2021 4:11 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>>      @@ repo-settings.c: void prepare_repo_settings(struct repository *r)
>>       +	 * Initialize this as off.
>>       +	 */
>>       +	r->settings.sparse_index = 0;
>>      -+	if (!repo_config_get_bool(r, "extensions.sparseindex", &value) && value)
>>      ++	if (!repo_config_get_bool(r, "index.sparse", &value) && value)
>>       +		r->settings.sparse_index = 1;
>>        }
> 
> It would be helpful to have a way for the repository owner to say
> "Even if the version of Git may be capable of handling 'sdir'
> extension, and my checkout uses sparse-cone settings, I do not want
> to use it", and the other way around, i.e. "Even if my checkout
> currently does not use sparse-cone settings, do use 'sdir'
> extension".  But for that, .sparse_index member may need to be
> tristate (i.e. forbidden, enable-if-needed, use-even-unneeded)?

I believe as presented, index.sparse=false will prevent the sdir
extension from being used. If index.sparse=true, then it will only
be used if sparse-checkout is enabled in cone mode.

I don't see the value in using the 'sdir' extension when not using
sparse-checkout in cone mode (and hence there are no sparse directory
entries in the index). What am I missing?

> We have a similar setting in index.version; I believe we always
> auto-demote 3 down to 2 when extended flags are not used, and
> I think "always auto-demote" would be sufficient (iow,
> "use-even-unneeded" may not be necessary, even though that might
> help debugging).

Yes, the same is happening here: we auto-demote to not use 'sdir'
if it the other settings are not configured as well.

There is the rare scenario where these things all occur:

1. index.sparse = true
2. core.sparseCheckout = true
3. core.sparseCheckoutCone = true
4. Every path in the index matches the cone-mode patterns.

In this case, convert_to_sparse() is called and the istate->sparse
bit is set, telling do_write_index() to add the 'sdir' extension.

This seems like a rare occurrence. Is it still worth adding logic
to avoid 'sdir' when these are all true?

Thanks,
-Stolee

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 30, 2021

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <stolee@gmail.com> writes:

>> We have a similar setting in index.version; I believe we always
>> auto-demote 3 down to 2 when extended flags are not used, and
>> I think "always auto-demote" would be sufficient (iow,
>> "use-even-unneeded" may not be necessary, even though that might
>> help debugging).
>
> Yes, the same is happening here: we auto-demote to not use 'sdir'
> if it the other settings are not configured as well.
>
> There is the rare scenario where these things all occur:
> ...
> This seems like a rare occurrence. Is it still worth adding logic
> to avoid 'sdir' when these are all true?

You'd be the primary one who will be debugging the system while and
after this goes through the stabilization effort, so whichever you
find is more convenient is good enough for us, I guess.

Thanks.

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 30, 2021

This patch series was integrated into seen via git@087b27f.

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 31, 2021

There was a status update about the branch ds/sparse-index on the Git mailing list:

Both in-core and on-disk index has been updated to optionally omit
individual entries and replace them with the tree object that
corresponds to the directory that contains them when the "cone"
mode of sparse checkout is in use.

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 31, 2021

This patch series was integrated into seen via git@e925d45.

@gitgitgadget
Copy link

gitgitgadget bot commented Apr 1, 2021

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Mar 30, 2021 at 6:11 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> Here is the first full patch series submission coming out of the
> sparse-index RFC [1].
>
> [1]
> https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@gmail.com/
>
> I won't waste too much space here, because PATCH 1 includes a sizeable
> design document that describes the feature, the reasoning behind it, and my
> plan for getting this implemented widely throughout the codebase.
>
> There are some new things here that were not in the RFC:
>
>  * Design doc and format updates. (Patch 1)
>  * Performance test script. (Patches 2 and 20)
>
> Notably missing in this series from the RFC:
>
>  * The mega-patch inserting ensure_full_index() throughout the codebase.
>    That will be a follow-up series to this one.
>  * The integrations with git status and git add to demonstrate the improved
>    performance. Those will also appear in their own series later.
>
> I plan to keep my latest work in this area in my 'sparse-index/wip' branch
> [2]. It includes all of the work from the RFC right now, updated with the
> work from this series.
>
> [2] https://github.com/derrickstolee/git/tree/sparse-index/wip
>
>
> Updates in V5
> =============
>
> This version is updated to use an index extension instead of a repository
> format extension. Thanks, Szeder! This one change affects the range-diff
> quite a bit, so please review those changes carefully.
>
> In particular: git sparse-checkout init --cone --sparse-index now sets a new
> index.sparse config option as an indicator that we should attempt writing
> the index in sparse form.
>
>
> Updates in V4
> =============
>
>  * Rebased onto the latest copy of ab/read-tree.
>  * Updated the design document as per Junio's comments.
>  * Updated the submodule handling in the performance test.
>  * Followed up on some other review from Ævar, mostly style or commit
>    message things.
>
>
> Updates in V3
> =============
>
> For this version, I took Ævar's latest patches and applied them to v2.31.0
> and rebased this series on top. It uses his new "read_tree_at()" helper and
> the associated changes to the function pointer type.
>
>  * Fixed more typos. Thanks Martin and Elijah!
>  * Updated the test_sparse_match() macro to use "$@" instead of $*
>  * Added a test that git sparse-checkout init --no-sparse-index rewrites the
>    index to be full.
>
>
> Updates in V2
> =============
>
>  * Various typos and awkward grammar is fixed.
>  * Cleaned up unnecessary commands in p2000-sparse-operations.sh
>  * Added a comment to the sparse_index member of struct index_state.
>  * Used tree_type, commit_type, and blob_type in test-read-cache.c.
>
> Thanks, -Stolee
>
> Derrick Stolee (21):
>   sparse-index: design doc and format update
>   t/perf: add performance test for sparse operations
>   t1092: clean up script quoting
>   sparse-index: add guard to ensure full index
>   sparse-index: implement ensure_full_index()
>   t1092: compare sparse-checkout to sparse-index
>   test-read-cache: print cache entries with --table
>   test-tool: don't force full index
>   unpack-trees: ensure full index
>   sparse-checkout: hold pattern list in index
>   sparse-index: add 'sdir' index extension
>   sparse-index: convert from full to sparse
>   submodule: sparse-index should not collapse links
>   unpack-trees: allow sparse directories
>   sparse-index: check index conversion happens
>   sparse-index: add index.sparse config option
>   sparse-checkout: toggle sparse index from builtin
>   sparse-checkout: disable sparse-index
>   cache-tree: integrate with sparse directory entries
>   sparse-index: loose integration with cache_tree_verify()
>   p2000: add sparse-index repos
>
>  Documentation/config/index.txt           |   5 +
>  Documentation/git-sparse-checkout.txt    |  14 ++
>  Documentation/technical/index-format.txt |  19 ++
>  Documentation/technical/sparse-index.txt | 175 ++++++++++++++
>  Makefile                                 |   1 +
>  builtin/sparse-checkout.c                |  44 +++-
>  cache-tree.c                             |  40 ++++
>  cache.h                                  |  18 +-
>  read-cache.c                             |  44 +++-
>  repo-settings.c                          |  15 ++
>  repository.c                             |  11 +-
>  repository.h                             |   3 +
>  sparse-index.c                           | 285 +++++++++++++++++++++++
>  sparse-index.h                           |  11 +
>  t/README                                 |   3 +
>  t/helper/test-read-cache.c               |  66 +++++-
>  t/perf/p2000-sparse-operations.sh        | 101 ++++++++
>  t/t1091-sparse-checkout-builtin.sh       |  13 ++
>  t/t1092-sparse-checkout-compatibility.sh | 143 ++++++++++--
>  unpack-trees.c                           |  17 +-
>  20 files changed, 988 insertions(+), 40 deletions(-)
>  create mode 100644 Documentation/technical/sparse-index.txt
>  create mode 100644 sparse-index.c
>  create mode 100644 sparse-index.h
>  create mode 100755 t/perf/p2000-sparse-operations.sh
>
>
> base-commit: 47957485b3b731a7860e0554d2bd12c0dce1c75a
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-883%2Fderrickstolee%2Fsparse-index%2Fformat-v5
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-883/derrickstolee/sparse-index/format-v5
> Pull-Request: https://github.com/gitgitgadget/git/pull/883
>
> Range-diff vs v4:
>
>   1:  6426a5c60e53 !  1:  7b600d536c6e sparse-index: design doc and format update
>      @@ Documentation/technical/sparse-index.txt (new)
>       +The only noticeable change in behavior will be that the serialized index
>       +file contains sparse-directory entries.
>       +
>      -+To start, we use a new repository extension, `extensions.sparseIndex`, to
>      -+allow inserting sparse-directory entries into indexes with file format
>      ++To start, we use a new required index extension, `sdir`, to allow
>      ++inserting sparse-directory entries into indexes with file format
>       +versions 2, 3, and 4. This prevents Git versions that do not understand
>      -+the sparse-index from operating on one, but it also prevents other
>      -+operations that do not use the index at all. A new format, index v5, will
>      -+be introduced that includes sparse-directory entries by default. It might
>      -+also introduce other features that have been considered for improving the
>      ++the sparse-index from operating on one, while allowing tools that do not
>      ++understand the sparse-index to operate on repositories as long as they do
>      ++not interact with the index. A new format, index v5, will be introduced
>      ++that includes sparse-directory entries by default. It might also
>      ++introduce other features that have been considered for improving the
>       +index, as well.
>       +
>       +Next, consumers of the index will be guarded against operating on a
>   2:  7eabc1d0586c =  2:  202253ec82f3 t/perf: add performance test for sparse operations
>   3:  c9e21d78ecba =  3:  437a0f144e57 t1092: clean up script quoting
>   4:  03cdde756563 =  4:  b7e1bf5c55a7 sparse-index: add guard to ensure full index
>   5:  6b3b6d86385d =  5:  e41d55d2cca9 sparse-index: implement ensure_full_index()
>   6:  7f67adba0498 =  6:  7bfbfbd17321 t1092: compare sparse-checkout to sparse-index
>   7:  7ebd9570b1ad =  7:  a1b8135c0fc8 test-read-cache: print cache entries with --table
>   8:  db7bbd06dbcc =  8:  dd84a2a9121b test-tool: don't force full index
>   9:  3ddd5e794b5e =  9:  b276d2ed5323 unpack-trees: ensure full index
>  10:  7308c87697f1 = 10:  c3651e26dc3a sparse-checkout: hold pattern list in index
>   -:  ------------ > 11:  f926cf8b2e01 sparse-index: add 'sdir' index extension
>  11:  7c10d653ca6b = 12:  c870ae5e8749 sparse-index: convert from full to sparse
>  12:  6db36f33e960 = 13:  bcf0da959ef3 submodule: sparse-index should not collapse links
>  13:  d24bd3348d98 = 14:  7191b48237de unpack-trees: allow sparse directories
>  14:  08d9f5f3c0d1 = 15:  57be9b4a728b sparse-index: check index conversion happens
>  15:  6f38cef196b0 ! 16:  c22b4111e49e sparse-index: create extension for compatibility
>      @@ Metadata
>       Author: Derrick Stolee <dstolee@microsoft.com>
>
>        ## Commit message ##
>      -    sparse-index: create extension for compatibility
>      +    sparse-index: add index.sparse config option
>
>      -    Previously, we enabled the sparse index format only using
>      -    GIT_TEST_SPARSE_INDEX=1. This is not a feasible direction for users to
>      -    actually select this mode. Further, sparse directory entries are not
>      -    understood by the index formats as advertised.
>      -
>      -    We _could_ add a new index version that explicitly adds these
>      -    capabilities, but there are nuances to index formats 2, 3, and 4 that
>      -    are still valuable to select as options. Until we add index format
>      -    version 5, create a repo extension, "extensions.sparseIndex", that
>      -    specifies that the tool reading this repository must understand sparse
>      -    directory entries.
>      -
>      -    This change only encodes the extension and enables it when
>      -    GIT_TEST_SPARSE_INDEX=1. Later, we will add a more user-friendly CLI
>      -    mechanism.
>      +    When enabled, this config option signals that index writes should
>      +    attempt to use sparse-directory entries.
>
>           Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>
>      - ## Documentation/config/extensions.txt ##
>      -@@ Documentation/config/extensions.txt: extensions.objectFormat::
>      - Note that this setting should only be set by linkgit:git-init[1] or
>      - linkgit:git-clone[1].  Trying to change it after initialization will not
>      - work and will produce hard-to-diagnose issues.
>      + ## Documentation/config/index.txt ##
>      +@@ Documentation/config/index.txt: index.recordOffsetTable::
>      +  Defaults to 'true' if index.threads has been explicitly enabled,
>      +  'false' otherwise.
>      +
>      ++index.sparse::
>      ++ When enabled, write the index using sparse-directory entries. This
>      ++ has no effect unless `core.sparseCheckout` and
>      ++ `core.sparseCheckoutCone` are both enabled. Defaults to 'false'.
>       +
>      -+extensions.sparseIndex::
>      -+ When combined with `core.sparseCheckout=true` and
>      -+ `core.sparseCheckoutCone=true`, the index may contain entries
>      -+ corresponding to directories outside of the sparse-checkout
>      -+ definition in lieu of containing each path under such directories.
>      -+ Versions of Git that do not understand this extension do not
>      -+ expect directory entries in the index.
>      + index.threads::
>      +  Specifies the number of threads to spawn when loading the index.
>      +  This is meant to reduce index load time on multiprocessor machines.
>
>        ## cache.h ##
>       @@ cache.h: struct repository_format {
>      @@ repo-settings.c: void prepare_repo_settings(struct repository *r)
>       +  * Initialize this as off.
>       +  */
>       + r->settings.sparse_index = 0;
>      -+ if (!repo_config_get_bool(r, "extensions.sparseindex", &value) && value)
>      ++ if (!repo_config_get_bool(r, "index.sparse", &value) && value)
>       +         r->settings.sparse_index = 1;
>        }
>
>      @@ repository.h: struct repo_settings {
>
>        struct repository {
>
>      - ## setup.c ##
>      -@@ setup.c: static enum extension_result handle_extension(const char *var,
>      -                  return error("invalid value for 'extensions.objectformat'");
>      -          data->hash_algo = format;
>      -          return EXTENSION_OK;
>      -+ } else if (!strcmp(ext, "sparseindex")) {
>      -+         data->sparse_index = 1;
>      -+         return EXTENSION_OK;
>      -  }
>      -  return EXTENSION_UNKNOWN;
>      - }
>      -
>        ## sparse-index.c ##
>       @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate,
>         return num_converted - start_converted;
>      @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate,
>       +{
>       + const char *config_path = repo_git_path(repo, "config.worktree");
>       +
>      -+ if (upgrade_repository_format(1) < 0) {
>      -+         warning(_("unable to upgrade repository format to enable sparse-index"));
>      -+         return -1;
>      -+ }
>       + git_config_set_in_file_gently(config_path,
>      -+                               "extensions.sparseIndex",
>      ++                               "index.sparse",
>       +                               "true");
>       +
>       + prepare_repo_settings(repo);
>      @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate,
>       +
>       + /*
>       +  * The GIT_TEST_SPARSE_INDEX environment variable triggers the
>      -+  * extensions.sparseIndex config variable to be on.
>      ++  * index.sparse config variable to be on.
>       +  */
>       + if (git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) {
>       +         int err = enable_sparse_index(istate->repo);
>      @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate,
>       -  * GIT_TEST_SPARSE_INDEX environment variable. We will relax
>       -  * this once we have a proper way to opt-in (and later still,
>       -  * opt-out).
>      -+  * Only convert to sparse if extensions.sparseIndex is set.
>      ++  * Only convert to sparse if index.sparse is set.
>          */
>       - if (!git_env_bool("GIT_TEST_SPARSE_INDEX", 0))
>       + prepare_repo_settings(istate->repo);
>  16:  923081e7e079 ! 17:  75fe9b0f57da sparse-checkout: toggle sparse index from builtin
>      @@ Documentation/git-sparse-checkout.txt: To avoid interfering with other worktrees
>       +that is not completely understood by external tools. If you have trouble
>       +with this compatibility, then run `git sparse-checkout init --no-sparse-index`
>       +to rewrite your index to not be sparse. Older versions of Git will not
>      -+understand the `sparseIndex` repository extension and may fail to interact
>      -+with your repository until it is disabled.
>      ++understand the sparse directory entries index extension and may fail to
>      ++interact with your repository until it is disabled.
>
>        'set'::
>         Write a set of patterns to the sparse-checkout file, as given as
>      @@ builtin/sparse-checkout.c: static int sparse_checkout_init(int argc, const char
>
>        ## sparse-index.c ##
>       @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate,
>      +  return num_converted - start_converted;
>      + }
>
>      - static int enable_sparse_index(struct repository *repo)
>      +-static int enable_sparse_index(struct repository *repo)
>      ++static int set_index_sparse_config(struct repository *repo, int enable)
>        {
>       - const char *config_path = repo_git_path(repo, "config.worktree");
>      -+ int res;
>      -
>      -  if (upgrade_repository_format(1) < 0) {
>      -          warning(_("unable to upgrade repository format to enable sparse-index"));
>      -          return -1;
>      -  }
>      +-
>       - git_config_set_in_file_gently(config_path,
>      --                               "extensions.sparseIndex",
>      +-                               "index.sparse",
>       -                               "true");
>      -+ res = git_config_set_gently("extensions.sparseindex", "true");
>      ++ int res;
>      ++ char *config_path = repo_git_path(repo, "config.worktree");
>      ++ res = git_config_set_in_file_gently(config_path,
>      ++                                     "index.sparse",
>      ++                                     enable ? "true" : NULL);
>      ++ free(config_path);
>
>         prepare_repo_settings(repo);
>         repo->settings.sparse_index = 1;
>      @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate,
>       +
>       +int set_sparse_index_config(struct repository *repo, int enable)
>       +{
>      -+ int res;
>      -+
>      -+ if (enable)
>      -+         return enable_sparse_index(repo);
>      -+
>      -+ /* Don't downgrade repository format, just remove the extension. */
>      -+ res = git_config_set_gently("extensions.sparseindex", NULL);
>      ++ int res = set_index_sparse_config(repo, enable);
>       +
>       + prepare_repo_settings(repo);
>      -+ repo->settings.sparse_index = 0;
>      ++ repo->settings.sparse_index = enable;
>       + return res;
>        }
>
>      @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate,
>             !core_apply_sparse_checkout || !core_sparse_checkout_cone)
>                 return 0;
>       @@ sparse-index.c: int convert_to_sparse(struct index_state *istate)
>      -          istate->repo = the_repository;
>      -
>      -  /*
>      --  * The GIT_TEST_SPARSE_INDEX environment variable triggers the
>      --  * extensions.sparseIndex config variable to be on.
>      -+  * If GIT_TEST_SPARSE_INDEX=1, then trigger extensions.sparseIndex
>      -+  * to be fully enabled. If GIT_TEST_SPARSE_INDEX=0 (set explicitly),
>      -+  * then purposefully disable the setting.
>      +   * The GIT_TEST_SPARSE_INDEX environment variable triggers the
>      +   * index.sparse config variable to be on.
>          */
>       - if (git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) {
>       -         int err = enable_sparse_index(istate->repo);
>      @@ sparse-index.c: int convert_to_sparse(struct index_state *istate)
>       +         set_sparse_index_config(istate->repo, test_env);
>
>         /*
>      -   * Only convert to sparse if extensions.sparseIndex is set.
>      +   * Only convert to sparse if index.sparse is set.
>
>        ## sparse-index.h ##
>       @@ sparse-index.h: struct index_state;
>      @@ t/t1092-sparse-checkout-compatibility.sh: init_repos () {
>       - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout init --cone &&
>       - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep
>       + git -C sparse-index sparse-checkout init --cone --sparse-index &&
>      -+ test_cmp_config -C sparse-index true extensions.sparseindex &&
>      ++ test_cmp_config -C sparse-index true index.sparse &&
>       + git -C sparse-index sparse-checkout set deep
>        }
>
>  17:  6f1ad72c390d ! 18:  7f55a232e647 sparse-checkout: disable sparse-index
>      @@ t/t1091-sparse-checkout-builtin.sh: test_expect_success 'sparse-checkout disable
>
>       +test_expect_success 'sparse-index enabled and disabled' '
>       + git -C repo sparse-checkout init --cone --sparse-index &&
>      -+ test_cmp_config -C repo true extensions.sparseIndex &&
>      ++ test_cmp_config -C repo true index.sparse &&
>       + test-tool -C repo read-cache --table >cache &&
>       + grep " tree " cache &&
>       +
>      @@ t/t1091-sparse-checkout-builtin.sh: test_expect_success 'sparse-checkout disable
>       + test-tool -C repo read-cache --table >cache &&
>       + ! grep " tree " cache &&
>       + git -C repo config --list >config &&
>      -+ ! grep extensions.sparseindex config
>      ++ ! grep index.sparse config
>       +'
>       +
>        test_expect_success 'cone mode: init and set' '
>  18:  bd94e6b7d089 = 19:  365901809d9d cache-tree: integrate with sparse directory entries
>  19:  e7190376b806 = 20:  9b068c458898 sparse-index: loose integration with cache_tree_verify()
>  20:  bcf0a58eb38c = 21:  66602733cc95 p2000: add sparse-index repos

I've read through the range-diff and individually read through the new
patch 11.  Perhaps unsurprisingly since you addressed all my feedback
by about round 3, I didn't find any problems with this new version.
Looks good to me.

@gitgitgadget
Copy link

gitgitgadget bot commented Apr 1, 2021

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Mar 30, 2021 at 2:31 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 3/30/2021 4:11 PM, Junio C Hamano wrote:
> > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> >>      @@ repo-settings.c: void prepare_repo_settings(struct repository *r)
> >>       +       * Initialize this as off.
> >>       +       */
> >>       +      r->settings.sparse_index = 0;
> >>      -+      if (!repo_config_get_bool(r, "extensions.sparseindex", &value) && value)
> >>      ++      if (!repo_config_get_bool(r, "index.sparse", &value) && value)
> >>       +              r->settings.sparse_index = 1;
> >>        }
> >
> > It would be helpful to have a way for the repository owner to say
> > "Even if the version of Git may be capable of handling 'sdir'
> > extension, and my checkout uses sparse-cone settings, I do not want
> > to use it", and the other way around, i.e. "Even if my checkout
> > currently does not use sparse-cone settings, do use 'sdir'
> > extension".  But for that, .sparse_index member may need to be
> > tristate (i.e. forbidden, enable-if-needed, use-even-unneeded)?
>
> I believe as presented, index.sparse=false will prevent the sdir
> extension from being used. If index.sparse=true, then it will only
> be used if sparse-checkout is enabled in cone mode.
>
> I don't see the value in using the 'sdir' extension when not using
> sparse-checkout in cone mode (and hence there are no sparse directory
> entries in the index). What am I missing?
>
> > We have a similar setting in index.version; I believe we always
> > auto-demote 3 down to 2 when extended flags are not used, and
> > I think "always auto-demote" would be sufficient (iow,
> > "use-even-unneeded" may not be necessary, even though that might
> > help debugging).
>
> Yes, the same is happening here: we auto-demote to not use 'sdir'
> if it the other settings are not configured as well.
>
> There is the rare scenario where these things all occur:
>
> 1. index.sparse = true
> 2. core.sparseCheckout = true
> 3. core.sparseCheckoutCone = true
> 4. Every path in the index matches the cone-mode patterns.
>
> In this case, convert_to_sparse() is called and the istate->sparse
> bit is set, telling do_write_index() to add the 'sdir' extension.
>
> This seems like a rare occurrence. Is it still worth adding logic
> to avoid 'sdir' when these are all true?

I'd agree that this would be very rare; probably indicative of someone
either having a bug in their sparsity patterns or making a simplistic
testcase to see how things operate.

@gitgitgadget
Copy link

gitgitgadget bot commented Apr 2, 2021

This patch series was integrated into seen via git@af532ec.

@gitgitgadget
Copy link

gitgitgadget bot commented Apr 2, 2021

This patch series was integrated into seen via git@5dfbdfe.

@gitgitgadget
Copy link

gitgitgadget bot commented Apr 4, 2021

This patch series was integrated into seen via git@deb5ab4.

@gitgitgadget
Copy link

gitgitgadget bot commented Apr 6, 2021

This patch series was integrated into seen via git@e85d777.

@gitgitgadget
Copy link

gitgitgadget bot commented Apr 6, 2021

This patch series was integrated into seen via git@aad4fc8.

@gitgitgadget
Copy link

gitgitgadget bot commented Apr 6, 2021

This patch series was integrated into seen via git@b7392cc.

@gitgitgadget
Copy link

gitgitgadget bot commented Apr 6, 2021

There was a status update about the branch ds/sparse-index on the Git mailing list:

Both in-core and on-disk index has been updated to optionally omit
individual entries and replace them with the tree object that
corresponds to the directory that contains them when the "cone"
mode of sparse checkout is in use.

@gitgitgadget
Copy link

gitgitgadget bot commented Apr 7, 2021

This patch series was integrated into seen via git@72b7234.

@gitgitgadget
Copy link

gitgitgadget bot commented Apr 8, 2021

This patch series was integrated into seen via git@22ed3a9.

@gitgitgadget
Copy link

gitgitgadget bot commented Apr 8, 2021

This patch series was integrated into next via git@f1290a7.

@gitgitgadget gitgitgadget bot added the next label Apr 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant