Skip to content
Permalink

Comparing changes

Choose two branches to see what’s changed or to start a new pull request. If you need to, you can also or learn more about diff comparisons.

Open a pull request

Create a new pull request by comparing changes across two branches. If you need to, you can also . Learn more about diff comparisons here.
base repository: git/git
base: e5ca291076a8a936283bb2c57433c4393d3f80c2
Choose a base ref
...
head repository: git/git
compare: e05cdb17e89a2d3257533d47350b3138bfce8737
Choose a head ref
  • 7 commits
  • 7 files changed
  • 1 contributor

Commits on Jul 14, 2021

  1. p2000: add 'git checkout -' test and decrease depth

    As we increase our list of commands to test in
    p2000-sparse-operations.sh, we will want to have a slightly smaller test
    repository. Reduce the size by a factor of four by reducing the depth of
    the step that creates a big index around a moderately-sized repository.
    
    Also add a step to run 'git checkout -' on repeat. This requires having
    a previous location in the reflog, so add that to the initialization
    steps.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jul 14, 2021
    Copy the full SHA
    0d53d19 View commit details
    Browse the repository at this point in the history
  2. p2000: compress repo names

    By using shorter names for the test repos, we will get a slightly more
    compressed performance summary without comprimising clarity.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jul 14, 2021
    Copy the full SHA
    11042ab View commit details
    Browse the repository at this point in the history
  3. commit: integrate with sparse-index

    Update 'git commit' to allow using the sparse-index in memory without
    expanding to a full one. The only place that had an ensure_full_index()
    call was in cache_tree_update(). The recursive algorithm for
    update_one() was already updated in 2de37c5 (cache-tree: integrate
    with sparse directory entries, 2021-03-03) to handle sparse directory
    entries in the index.
    
    Most of this change involves testing different command-line options that
    allow specifying which on-disk changes should be included in the commit.
    This includes no options (only take currently-staged changes), -a (take
    all tracked changes), and --include (take a list of specific changes).
    To simplify testing that these options do not expand the index, update
    the test that previously verified that 'git status' does not expand the
    index with a helper method, ensure_not_expanded().
    
    This allows 'git commit' to operate much faster when the sparse-checkout
    cone is much smaller than the full list of files at HEAD.
    
    Here are the relevant lines from p2000-sparse-operations.sh:
    
    Test                                      HEAD~1           HEAD
    ----------------------------------------------------------------------------------
    2000.14: git commit -a -m A (full-v3)     0.35(0.26+0.06)  0.36(0.28+0.07) +2.9%
    2000.15: git commit -a -m A (full-v4)     0.32(0.26+0.05)  0.34(0.28+0.06) +6.3%
    2000.16: git commit -a -m A (sparse-v3)   0.63(0.59+0.06)  0.04(0.05+0.05) -93.7%
    2000.17: git commit -a -m A (sparse-v4)   0.64(0.59+0.08)  0.04(0.04+0.04) -93.8%
    
    It is important to compare the full-index case to the sparse-index case,
    so the improvement for index version v4 is actually an 88% improvement in
    this synthetic example.
    
    In a real repository with over two million files at HEAD and 60,000
    files in the sparse-checkout definition, the time for 'git commit -a'
    went from 2.61 seconds to 134ms. I compared this to the result if the
    index only contained the paths in the sparse-checkout definition and
    found the theoretical optimum to be 120ms, so the out-of-cone paths only
    add a 12% overhead.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jul 14, 2021
    Copy the full SHA
    daa1ace View commit details
    Browse the repository at this point in the history
  4. sparse-index: recompute cache-tree

    When some commands run with command_requires_full_index=1, then the
    index can get in a state where the in-memory cache tree is actually
    equal to the sparse index's cache tree instead of the full one.
    
    This results in incorrect entry_count values. By clearing the cache
    tree before converting to sparse, we avoid this issue.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jul 14, 2021
    Copy the full SHA
    f934f1b View commit details
    Browse the repository at this point in the history
  5. checkout: stop expanding sparse indexes

    Previous changes did the necessary improvements to unpack-trees.c and
    diff-lib.c in order to modify a sparse index based on its comparision
    with a tree. The only remaining work is to remove some
    ensure_full_index() calls and add tests that verify that the index is
    not expanded in our interesting cases. Include 'switch' and 'restore' in
    these tests, as they share a base implementation with 'checkout'.
    
    Here are the relevant performance results from
    p2000-sparse-operations.sh:
    
    Test                                     HEAD~1           HEAD
    --------------------------------------------------------------------------------
    2000.18: git checkout -f - (full-v3)     0.49(0.43+0.03)  0.47(0.39+0.05) -4.1%
    2000.19: git checkout -f - (full-v4)     0.45(0.37+0.06)  0.42(0.37+0.05) -6.7%
    2000.20: git checkout -f - (sparse-v3)   0.76(0.71+0.07)  0.04(0.03+0.04) -94.7%
    2000.21: git checkout -f - (sparse-v4)   0.75(0.72+0.04)  0.05(0.06+0.04) -93.3%
    
    It is important to compare the full index case to the sparse index case,
    as the previous results for the sparse index were inflated by the index
    expansion. For index v4, this is an 88% improvement.
    
    On an internal repository with over two million paths at HEAD and a
    sparse-checkout definition containing ~60,000 of those paths, 'git
    checkout' went from 3.5s to 297ms with this change. The theoretical
    optimum where only those ~60,000 paths exist was 275ms, so the extra
    sparse directory entries contribute a 22ms overhead.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jul 14, 2021
    Copy the full SHA
    1ba5f45 View commit details
    Browse the repository at this point in the history

Commits on Jul 20, 2021

  1. t1092: document bad 'git checkout' behavior

    Add new branches to the test repo that demonstrate directory/file
    conflicts in different ways. Since the directory 'folder1/' has
    adjacent files 'folder1-', 'folder1.txt', and 'folder10' it causes
    searches for 'folder1/' to land in a different place in the index than a
    search for 'folder1'. This causes a change in behavior when working with
    the df-conflict-1 and df-conflict-2 branches, whose only difference is
    that the first uses 'folder1' as the conflict and the other uses
    'folder2' which does not have these adjacent files.
    
    We can extend two tests that compare the behavior across different 'git
    checkout' commands, and we see already that the behavior will be
    different in some cases and not in others. The difference between the
    two test loops is that one uses 'git reset --hard' between iterations.
    
    Further, we isolate the behavior of creating a staged change within a
    directory and then checking out a branch where that directory is
    replaced with a file. A full checkout behaves differently across these
    two cases, while a sparse-checkout cone behaves consistently. In both
    cases, the behavior is wrong. In one case, the staged change is dropped
    entirely. The other case the staged change is kept, replacing the file
    at that location, but none of the other files in the directory are kept.
    
    Likely, the correct behavior in this case is to reject the checkout and
    report the conflict, leaving HEAD in its previous location. None of the
    cases behave this way currently. Use comments to demonstrate that the
    tested behavior is only a documentation of the current, incorrect
    behavior to ensure we do not _accidentally_ change it. Instead, we would
    prefer to change it on purpose with a future change.
    
    At this point, the sparse-index does not handle these 'git checkout'
    commands correctly. Or rather, it _does_ reject the 'git checkout' when
    we have the staged change, but for the wrong reason. It also rejects the
    'git checkout' commands when there is no staged change and we want to
    replace a directory with a file. A fix for that unstaged case will
    follow in the next change, but that will make the sparse-index agree
    with the full checkout case in these documented incorrect behaviors.
    
    Helped-by: Elijah Newren <newren@gmail.com>
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jul 20, 2021
    Copy the full SHA
    70569fa View commit details
    Browse the repository at this point in the history
  2. unpack-trees: resolve sparse-directory/file conflicts

    When running unpack_trees() with a sparse index, we attempt to operate
    on the index without expanding the sparse directory entries. Thus, we
    operate by manipulating entire directories and passing them to the
    unpack function. In the case of the 'git checkout' command, this is the
    twoway_merge() function.
    
    There are several cases in twoway_merge() that handle different
    situations. One new one to add is the case of a directory/file conflict
    where the directory is sparse. Before the sparse index, such a conflict
    would appear as a list of file additions and deletions. Now,
    twoway_merge() initializes 'current', 'oldtree', and 'newtree' from
    src[0], src[1], and src[2], then sets 'oldtree' to NULL because it is
    equal to the df_conflict_entry. The way to determine that we have a
    directory/file conflict is to test that 'current' and 'newtree' disagree
    on being sparse directory entries.
    
    When we are in this case, we want to resolve the situation by calling
    merged_entry(). This allows replacing the 'current' entry with the
    'newtree' entry. This is important for cases where we want to run 'git
    checkout' across the conflict and have the new HEAD represent the new
    file type at that path. The first NEEDSWORK comment dropped in t1092
    demonstrates this necessary behavior.
    
    However, we still are in a confusing state when 'current' corresponds to
    a staged change within a sparse directory that is not present at HEAD.
    This should be atypical, because it requires adding a change outside of
    the sparse-checkout cone, but it is possible. Since we are unable to
    determine that this is a staged change within twoway_merge(), we cannot
    add a case to reject the merge at this point. I believe this is due to
    the use of df_conflict_entry in the place of 'oldtree' instead of using
    the valud at HEAD, which would provide some perspective to this
    decision. Any change that would allow this differentiation for staged
    entries would need to involve information further up in unpack_trees().
    
    That work should be done, sometime, because we are further confusing the
    behavior of a directory/file conflict when staging a change in the
    directory. The two cases 'checkout behaves oddly with df-conflict-?' in
    t1092 demonstrate that even without a sparse-checkout, Git is not
    consistent in its behavior. Neither of the two options seems correct,
    either. This change makes the sparse-index behave differently than the
    typcial sparse-checkout case, but it does match the full checkout
    behavior in the df-conflict-2 case.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jul 20, 2021
    Copy the full SHA
    e05cdb1 View commit details
    Browse the repository at this point in the history