Skip to content
Permalink

Comparing changes

Choose two branches to see what’s changed or to start a new pull request. If you need to, you can also or learn more about diff comparisons.

Open a pull request

Create a new pull request by comparing changes across two branches. If you need to, you can also . Learn more about diff comparisons here.
base repository: git/git
base: 670b81a890388c60b7032a4f5b879f2ece8c4558
Choose a base ref
...
head repository: git/git
compare: e5ca291076a8a936283bb2c57433c4393d3f80c2
Choose a head ref
  • 16 commits
  • 10 files changed
  • 1 contributor

Commits on Jul 14, 2021

  1. sparse-index: skip indexes with unmerged entries

    The sparse-index format is designed to be compatible with merge
    conflicts, even those outside the sparse-checkout definition. The reason
    is that when converting a full index to a sparse one, a cache entry with
    nonzero stage will not be collapsed into a sparse directory entry.
    
    However, this behavior was not tested, and a different behavior within
    convert_to_sparse() fails in this scenario. Specifically,
    cache_tree_update() will fail when unmerged entries exist.
    convert_to_sparse_rec() uses the cache-tree data to recursively walk the
    tree structure, but also to compute the OIDs used in the
    sparse-directory entries.
    
    Add an index scan to convert_to_sparse() that will detect if these merge
    conflict entries exist and skip the conversion before trying to update
    the cache-tree. This is marked as NEEDSWORK because this can be removed
    with a suitable update to cache_tree_update() or a similar method that
    can construct a cache-tree with invalid nodes, but still allow creating
    the nodes necessary for creating sparse directory entries.
    
    It is possible that in the future we will not need to make such an
    update, since if we do not expand a sparse-index into a full one, this
    conversion does not need to happen. Thus, this can be deferred until the
    merge machinery is made to integrate with the sparse-index.
    
    Reviewed-by: Elijah Newren <newren@gmail.com>
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jul 14, 2021
    Copy the full SHA
    fc6609d View commit details
    Browse the repository at this point in the history
  2. sparse-index: include EXTENDED flag when expanding

    When creating a full index from a sparse one, we create cache entries
    for every blob within a given sparse directory entry. These are
    correctly marked with the CE_SKIP_WORKTREE flag, but the CE_EXTENDED
    flag is not included. The CE_EXTENDED flag would exist if we loaded a
    full index from disk with these entries marked with CE_SKIP_WORKTREE, so
    we can add the flag here to be consistent. This allows us to directly
    compare the flags present in cache entries when testing the sparse-index
    feature, but has no significance to its correctness in the user-facing
    functionality.
    
    Reviewed-by: Elijah Newren <newren@gmail.com>
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jul 14, 2021
    Copy the full SHA
    4741077 View commit details
    Browse the repository at this point in the history
  3. t1092: replace incorrect 'echo' with 'cat'

    This fixes the test data shape to be as expected, allowing rename
    detection to work properly now that the 'larger-content' file actually
    has meaningful lines.
    
    Reviewed-by: Elijah Newren <newren@gmail.com>
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jul 14, 2021
    Copy the full SHA
    3d814b5 View commit details
    Browse the repository at this point in the history
  4. t1092: expand repository data shape

    As more features integrate with the sparse-index feature, more and more
    special cases arise that require different data shapes within the tree
    structure of the repository in order to demonstrate those cases.
    
    Add several interesting special cases all at once instead of sprinkling
    them across several commits. The interesting cases being added here are:
    
    * Add sparse-directory entries on both sides of directories within the
      sparse-checkout definition.
    
    * Add directories outside the sparse-checkout definition who have only
      one entry and are the first entry of a directory with multiple
      entries.
    
    * Add filenames adjacent to a sparse directory entry that sort before
      and after the trailing slash.
    
    Later tests will take advantage of these shapes, but they also deepen
    the tests that already exist.
    
    Reviewed-by: Elijah Newren <newren@gmail.com>
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jul 14, 2021
    Copy the full SHA
    e669ffb View commit details
    Browse the repository at this point in the history
  5. t1092: add tests for status/add and sparse files

    Before moving to update 'git status' and 'git add' to work with sparse
    indexes, add an explicit test that ensures the sparse-index works the
    same as a normal sparse-checkout when the worktree contains directories
    and files outside of the sparse cone.
    
    Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
    not in the sparse cone. When 'folder1/a' is modified, the file is not
    shown as modified and adding it will fail. This is new behavior as of
    a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
    2021-04-08). Before that change, these adds would be silently ignored.
    
    Untracked files are fine: adding new files both with 'git add .' and
    'git add folder1/' works just as in a full checkout. This may not be
    entirely desirable, but we are not intending to change behavior at the
    moment, only document it. A future change could alter the behavior to
    be more sensible, and this test could be modified to satisfy the new
    expected behavior.
    
    Reviewed-by: Elijah Newren <newren@gmail.com>
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jul 14, 2021
    Copy the full SHA
    bf26c06 View commit details
    Browse the repository at this point in the history
  6. unpack-trees: preserve cache_bottom

    The cache_bottom member of 'struct unpack_trees_options' is used to
    track the range of index entries corresponding to a node of the cache
    tree. While recursing with traverse_by_cache_tree(), this value is
    preserved on the call stack using a local and then restored as that
    method returns.
    
    The mark_ce_used() method normally modifies the cache_bottom member when
    it refers to the marked cache entry. However, sparse directory entries
    are stored as nodes in the cache-tree data structure as of 2de37c5
    (cache-tree: integrate with sparse directory entries, 2021-03-30). Thus,
    the cache_bottom will be modified as the cache-tree walk advances. Do
    not update it as well within mark_ce_used().
    
    Reviewed-by: Elijah Newren <newren@gmail.com>
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jul 14, 2021
    Copy the full SHA
    17a1bb5 View commit details
    Browse the repository at this point in the history
  7. unpack-trees: compare sparse directories correctly

    As we further integrate the sparse-index into unpack-trees, we need to
    ensure that we compare sparse directory entries correctly with other
    entries. This affects searching for an exact path as well as sorting
    index entries.
    
    Sparse directory entries contain the trailing directory separator. This
    is important for the sorting, in particular. Thus, within
    do_compare_entry() we stop using S_IFREG in all cases, since sparse
    directories should use S_IFDIR to indicate that the comparison should
    treat the entry name as a dirctory.
    
    Within compare_entry(), it first calls do_compare_entry() to check the
    leading portion of the name. When the input path is a directory name, we
    could match exactly already. Thus, we should return 0 if we have an
    exact string match on a sparse directory entry. The final check is a
    length comparison between the strings.
    
    Reviewed-by: Elijah Newren <newren@gmail.com>
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jul 14, 2021
    Copy the full SHA
    cd807a5 View commit details
    Browse the repository at this point in the history
  8. unpack-trees: rename unpack_nondirectories()

    In the next change, we will use this method to unpack a sparse directory
    entry, so change the name to unpack_single_entry() so these entries
    apply. The new name reflects that we will not recurse into trees in
    order to resolve the conflicts.
    
    Reviewed-by: Elijah Newren <newren@gmail.com>
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jul 14, 2021
    Copy the full SHA
    bd6a3fd View commit details
    Browse the repository at this point in the history
  9. unpack-trees: unpack sparse directory entries

    During unpack_callback(), index entries are compared against tree
    entries. These are matched according to names and types. One goal is to
    decide if we should recurse into subtrees or simply operate on one index
    entry.
    
    In the case of a sparse-directory entry, we do not want to recurse into
    that subtree and instead simply compare the trees. In some cases, we
    might want to perform a merge operation on the entry, such as during
    'git checkout <commit>' which wants to replace a sparse tree entry with
    the tree for that path at the target commit. We extend the logic within
    unpack_single_entry() to create a sparse-directory entry in this case,
    and then that is sent to call_unpack_fn().
    
    There are some subtleties in this process. For instance, we need to
    update find_cache_entry() to allow finding a sparse-directory entry that
    exactly matches a given path. Use the new helper method
    sparse_dir_matches_path() for this. We also need to ignore conflict
    markers in the case that the entries correspond to directories and we
    already have a sparse directory entry.
    
    Reviewed-by: Elijah Newren <newren@gmail.com>
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jul 14, 2021
    Copy the full SHA
    523506d View commit details
    Browse the repository at this point in the history
  10. dir.c: accept a directory as part of cone-mode patterns

    When we have sparse directory entries in the index, we want to compare
    that directory against sparse-checkout patterns. Those pattern matching
    algorithms are built expecting a file path, not a directory path. This
    is especially important in the "cone mode" patterns which will match
    files that exist within the "parent directories" as well as the
    recursive directory matches.
    
    If path_matches_pattern_list() is given a directory, we can add a fake
    filename ("-") to the directory and get the same results as before,
    assuming we are in cone mode. Since sparse index requires cone mode
    patterns, this is an acceptable assumption.
    
    Reviewed-by: Elijah Newren <newren@gmail.com>
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jul 14, 2021
    Copy the full SHA
    69bdbdb View commit details
    Browse the repository at this point in the history
  11. diff-lib: handle index diffs with sparse dirs

    While comparing an index to a tree, we may see a sparse directory entry.
    In this case, we should compare that portion of the tree to the tree
    represented by that entry. This could include a new tree which needs to
    be expanded to a full list of added files. It could also include an
    existing tree, in which case all of the changes inside are important to
    describe, including the modifications, additions, and deletions. Note
    that the case where the tree has a path and the index does not remains
    identical to before: the lack of a cache entry is the same with a sparse
    index.
    
    Use diff_tree_oid() appropriately to compute the diff.
    
    Reviewed-by: Elijah Newren <newren@gmail.com>
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jul 14, 2021
    Copy the full SHA
    9eb00af View commit details
    Browse the repository at this point in the history
  12. status: skip sparse-checkout percentage with sparse-index

    'git status' began reporting a percentage of populated paths when
    sparse-checkout is enabled in 051df3c (wt-status: show sparse
    checkout status as well, 2020-07-18). This percentage is incorrect when
    the index has sparse directories. It would also be expensive to
    calculate as we would need to parse trees to count the total number of
    possible paths.
    
    Avoid the expensive computation by simplifying the output to only report
    that a sparse checkout exists, without the percentage.
    
    This change is the reason we use 'git status --porcelain=v2' in
    t1092-sparse-checkout-compatibility.sh. We don't want to ensure that
    this message is equal across both modes, but instead just the important
    information about staged, modified, and untracked files are compared.
    
    Reviewed-by: Elijah Newren <newren@gmail.com>
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jul 14, 2021
    Copy the full SHA
    bf48e5a View commit details
    Browse the repository at this point in the history
  13. status: use sparse-index throughout

    By testing 'git -c core.fsmonitor= status -uno', we can check for the
    simplest index operations that can be made sparse-aware. The necessary
    implementation details are already integrated with sparse-checkout, so
    modify command_requires_full_index to be zero for cmd_status().
    
    In refresh_index(), we loop through the index entries to refresh their
    stat() information. However, sparse directories have no stat()
    information to populate. Ignore these entries.
    
    This allows 'git status' to no longer expand a sparse index to a full
    one. This is further tested by dropping the "-uno" option and adding an
    untracked file into the worktree.
    
    The performance test p2000-sparse-checkout-operations.sh demonstrates
    these improvements:
    
    Test                                  HEAD~1           HEAD
    -----------------------------------------------------------------------------
    2000.2: git status (full-index-v3)    0.31(0.30+0.05)  0.31(0.29+0.06) +0.0%
    2000.3: git status (full-index-v4)    0.31(0.29+0.07)  0.34(0.30+0.08) +9.7%
    2000.4: git status (sparse-index-v3)  2.35(2.28+0.10)  0.04(0.04+0.05) -98.3%
    2000.5: git status (sparse-index-v4)  2.35(2.24+0.15)  0.05(0.04+0.06) -97.9%
    
    Note that since HEAD~1 was expanding the sparse index by parsing trees,
    it was artificially slower than the full index case. Thus, the 98%
    improvement is misleading, and instead we should celebrate the 0.34s to
    0.05s improvement of 85%. This is more indicative of the peformance
    gains we are expecting by using a sparse index.
    
    Note: we are dropping the assignment of core.fsmonitor here. This is not
    necessary for the test script as we are not altering the config any
    other way. Correct integration with FS Monitor will be validated in
    later changes.
    
    Reviewed-by: Elijah Newren <newren@gmail.com>
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jul 14, 2021
    Copy the full SHA
    d76723e View commit details
    Browse the repository at this point in the history
  14. wt-status: expand added sparse directory entries

    It is difficult, but possible, to get into a state where we intend to
    add a directory that is outside of the sparse-checkout definition. Add a
    test to t1092-sparse-checkout-compatibility.sh that demonstrates this
    using a combination of 'git reset --mixed' and 'git checkout --orphan'.
    
    This test failed before because the output of 'git status
    --porcelain=v2' would not match on the lines for folder1/:
    
    * The sparse-checkout repo (with a full index) would output each path
      name that is intended to be added.
    
    * The sparse-index repo would only output that "folder1/" is staged for
      addition.
    
    The status should report the full list of files to be added, and so this
    sparse-directory entry should be expanded to a full list when reaching
    it inside the wt_status_collect_changes_initial() method. Use
    read_tree_at() to assist.
    
    Somehow, this loop over the cache entries was not guarded by
    ensure_full_index() as intended.
    
    Reviewed-by: Elijah Newren <newren@gmail.com>
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jul 14, 2021
    Copy the full SHA
    fe0d576 View commit details
    Browse the repository at this point in the history
  15. fsmonitor: integrate with sparse index

    If we need to expand a sparse-index into a full one, then the FS Monitor
    bitmap is going to be incorrect. Ensure that we start fresh at such an
    event.
    
    While this is currently a performance drawback, the eventual hope of the
    sparse-index feature is that these expansions will be rare and hence we
    will be able to keep the FS Monitor data accurate across multiple Git
    commands.
    
    These tests are added to demonstrate that the behavior is the same
    across a full index and a sparse index, but also that file modifications
    to a tracked directory outside of the sparse cone will trigger
    ensure_full_index().
    
    Reviewed-by: Elijah Newren <newren@gmail.com>
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jul 14, 2021
    Copy the full SHA
    f8fe49e View commit details
    Browse the repository at this point in the history
  16. t1092: document bad sparse-checkout behavior

    There are several situations where a repository with sparse-checkout
    enabled will act differently than a normal repository, and in ways that
    are not intentional. The test t1092-sparse-checkout-compatibility.sh
    documents some of these deviations, but a casual reader might think
    these are intentional behavior changes.
    
    Add comments on these tests that make it clear that these behaviors
    should be updated. Using 'NEEDSWORK' helps contributors find that these
    are potential areas for improvement.
    
    Helped-by: Elijah Newren <newren@gmail.com>
    Reviewed-by: Elijah Newren <newren@gmail.com>
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jul 14, 2021
    Copy the full SHA
    e5ca291 View commit details
    Browse the repository at this point in the history