Skip to content

pr-1009/derrickstolee/sparse-index/ignored-files-v5

We launched an experimental release [1] of the sparse-index feature to our
internal users. We immediately discovered a problem due to the isolated way
in which we tested the sparse index: we were never building the project and
changing our sparse-checkout definition.

[1] https://github.com/microsoft/git/releases/tag/v2.32.0.vfs.0.102.exp

Users who ran a build in one project and then moved to another still had
build artifacts in their worktree that lived inside the old directories.
Since the files are marked by the .gitignore patterns, these files were not
removed by the 'git sparse-checkout set' command. However, they make the
sparse-index unusable because every 'git status' command needs to expand the
sparse-directory entries in order to see if the files are tracked or not.
This made the first experimental release actually slower for all users
because of this cost.

The solution we shipped to these customers was to change the way our fork
handles these ignored files. Specifically, instead of Git completely
ignoring the files, we changed Git to understand that with cone-mode
sparse-checkout patterns, the users is asking for entire directories to be
removed from the worktree. The link [1] included earlier has this change.

I believe that this is a reasonable expectation, though I recognize that it
might look like breaking the expectations of how .gitignore files work.

Since feedback demonstrated that this is a desired behavior, v2 includes
this behavior for all "cone mode" repositories.

I'm interested in the community's thoughts about this change, as it seems
like one that we should make carefully and intentionally.

While the rewrite of the t7519 test seems unrelated, it is required to avoid
a test failure with this change that deletes files outside of the cone. By
moving the test into repositories not at $TRASH_DIRECTORY, we gain more
control over the repository structure.

Updates in V5
=============

 * Updated the locality of a cache_entry pointer.

 * Rephrased a comment.

 * Removed the patch adding a config option.

 * I tried, but failed, to create a scenario where the call to
   cache_tree_update() causes a test to fail. I still think this is valuable
   as defensive programming for the reasons mentioned in the patch, which is
   why I didn't remove them here.

Updates in V4
=============

 * Fixed an issue with the split index.

 * The helper methods are used more consistently.

 * The helper method path_in_cone_mode_sparse_checkout() is introduced.

 * Commit messages are edited for clarity.

 * A new config option is added to disable the behavior being added in this
   series.

 * I split the commit that involves cache_tree_update(). I have not yet
   succeeded in creating tests to demonstrate why this is required outside
   of needing it in the Scalar functional tests, which includes a version of
   partial clone. I will continue to investigate recreating this scenario in
   the Git test suite, but I wanted to send this version out to get feedback
   on the things that have changed.

Update in V3
============

 * As promised [2], the helper methods are fixed to work with non-cone-mode
   patterns. A later series will use them to their fullest potential
   (changing git add, git rm, and git mv when interacting with sparse
   entries).

[2]
https://lore.kernel.org/git/bac76c72-955d-1ade-4ecf-778ffc45f297@gmail.com/

Updates in V2
=============

 * This version correctly leaves untracked files alone. If untracked files
   are found, then the directory is left as-is, in case those ignored files
   are important to the user's work resolving those untracked files.

 * This behavior is now enabled by core.sparseCheckoutCone=true.

 * To use a sparse index as an in-memory data structure even when
   index.sparse is disabled, a new patch is included to modify the prototype
   of convert_to_sparse() to include a flags parameter.

 * A few cleanup patches that I was collecting based on feedback from the
   experimental release and intending for my next series were necessary for
   this implementation.

 * Cleaned up the tests (no NEEDSWORK) and the remainders of a previous
   implementation that used run_subcommand().

Thanks, -Stolee

Derrick Stolee (9):
  t7519: rewrite sparse index test
  sparse-index: silently return when not using cone-mode patterns
  unpack-trees: fix nested sparse-dir search
  sparse-index: silently return when cache tree fails
  sparse-index: use WRITE_TREE_MISSING_OK
  sparse-checkout: create helper methods
  attr: be careful about sparse directories
  sparse-index: add SPARSE_INDEX_MEMORY_ONLY flag
  sparse-checkout: clear tracked sparse dirs

 Documentation/git-sparse-checkout.txt | 10 +++
 attr.c                                | 15 +++++
 builtin/add.c                         |  7 +-
 builtin/sparse-checkout.c             | 94 +++++++++++++++++++++++++++
 dir.c                                 | 52 +++++++++++++++
 dir.h                                 |  8 +++
 read-cache.c                          |  4 +-
 sparse-index.c                        | 76 ++++++++++++----------
 sparse-index.h                        |  3 +-
 t/t1091-sparse-checkout-builtin.sh    | 59 +++++++++++++++++
 t/t7519-status-fsmonitor.sh           | 38 ++++++-----
 unpack-trees.c                        |  8 ++-
 12 files changed, 312 insertions(+), 62 deletions(-)

base-commit: 80b8d6c56b8a5f5db1d5c2a0159fd808e8a7fc4f

Submitted-As: https://lore.kernel.org/git/pull.1009.v5.git.1631065353.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.1009.git.1627579637.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.1009.v2.git.1628625013.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.1009.v3.git.1629206602.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.1009.v4.git.1629841904.gitgitgadget@gmail.com
Assets 2