Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add range-diff, a tbdiff lookalike #1

Closed
wants to merge 21 commits into from

Conversation

dscho
Copy link
Member

@dscho dscho commented Jun 5, 2018

The incredibly useful git-tbdiff tool to compare patch series (say, to see what changed between two iterations sent to the Git mailing list) is slightly less useful for this developer due to the fact that it requires the hungarian and numpy Python packages which are for some reason really hard to build in MSYS2. So hard that I even had to give up, because it was simply easier to re-implement the whole shebang as a builtin command.

The project at https://github.com/trast/tbdiff seems to be dormant, anyway. Funny (and true) story: I looked at the open Pull Requests to see how active that project is, only to find to my surprise that I had submitted one in August 2015, and that it was still unanswered let alone merged.

While at it, I forward-ported AEvar's patch to force --decorate=no because git -p tbdiff would fail otherwise.

Side note: I work on implementing range-diff not only to make life easier for reviewers who have to suffer through v2, v3, ... of my patch series, but also to verify my changes before submitting a new iteration. And also, maybe even more importantly, I plan to use it to verify my merging-rebases of Git for
Windows (for which I previously used to redirect the pre-rebase/post-rebase diffs vs upstream and then compare them using git diff --no-index). And of course any interested person can see what changes were necessary e.g. in the merging-rebase of Git for Windows onto v2.17.0 by running a command like:

        base=^{/Start.the.merging-rebase}
        tag=v2.17.0.windows.1
        pre=$tag$base^2
        git range-diff $pre$base..$pre $tag$base..$tag

The command uses what it calls the "dual color mode" (can be disabled via --no-dual-color) which helps identifying what actually changed: it prefixes lines with a - (and red background) that correspond to the first commit range, and with a + (and green background) that correspond to the second range. The rest of the lines will be colored according to the original diffs.

Changes since v4:

  • Fixed a typo in the commit message of "range-diff: add tests" that was introduced in v4.
  • White-space fixes.
  • Fixed the length of the first header underline in the man page.
  • Changed the preprocessor guard in linear-assignment.h to reflect the new name (instead of the old name, which was hungarian.h).
  • Likewise, changed the preprocessor guards in range-diff.h to hide the history of the thrice-renamed command.
  • Fixed indentation in the completion.
  • Instead of trying to paper over white-space error handling that does not apply to "diffs of diffs", dual color mode now simply disables all white-space warnings.
  • When showing the "single arg must be symmetric range" error message, git range-diff now also shows the usage.
  • Adjusted the commit message of "range-diff: adjust the output of the commit pairs" to avoid the surprise of the reviewer when onelines are printed all of a sudden, too.
  • "range-diff: adjust the output of the commit pairs" is now using a simpler way to print onelines.
  • We are now sandwiching the diff_opt_parse() loop between two parse_options(), to make sure that we caught all options, and that the -- separator is handled.
  • Adjusted the lookup_commit_reference() call to the newest master (it now takes a the_repository parameter).

Changes since v3:

  • The cover letter was adjusted to reflect the new reality (the command is called range-diff now, not branch-diff, and --dual-color is the default).
  • The documentation was adjusted a bit more in the patch that makes --dual-color the default.
  • Clarified the calculation of the cost matrix, as per Stefan Beller's request.
  • The man page now spells out that merge commits are ignored in the commit ranges (not merges per se).
  • The code in linear-assignment.c was adjusted to use the SWAP() macro.
  • The commit message of the patch introducing the first rudimentary implementation no longer talks about the "Hungarian" algorithm, but about the "linear assignment algorithm" instead.
  • A bogus indentation change was backed out from the patch introducing the first rudimentary implementation.
  • Instead of merely warning about missing .. in the 2-parameter invocation, we now exit with the error message.
  • The diff_opt_parse() function is allowed to return a value larger than 1, indicating that more than just one command-line parameter was parsed. We now advance by the indicated value instead of always advancing exactly 1 (which is still correct much of the time).
  • A lengthy if...else if...else if...else was simplified (from a logical point of view) by reordering it.
  • The unnecessarily static variable dashes was turned into a local variable of the caller.
  • The commit message talking about the new man page still referred to git branch --diff, which has been fixed.
  • A forgotten t7910 reference was changed to t3206.
  • An unbalanced double-tick was fixed in the man page.
  • Fixed grammar both of the commit message and the description of the --no-dual-color option.
  • To fix the build, a blank man page is now introduced together with the new range-diff command, even if it is populated for real only at a later patch (i.e. at the same time as before).
  • The headaches Junio fears would be incurred by that simple workaround to avoid bogus white-space error reporting are fended off: a more complex patch is now in place that adds (and uses) a new white-space flag. Sadly, as is all too common when Junio "encourages" me to replace a simple workaround by something "proper", it caused all kinds of headaches to get this right, so I am rather less certain that the "proper" fix will cause us less headaches than the simple workaround would have done. But whatever.
  • The dual color mode now also dims the changes that are exclusively in the first specified commit range, and uses bold face on the changes exclusively in the second one. This matches the intuition when using range-diff to compare an older iteration of a patch series to a newer one: the changes from the previous iteration that were replaced by new ones "fade", while the changes that replace them are "shiny new".

Changes since v2:

  • Right-aligned the patch numbers in the commit pairs.
  • Used ALLOC_ARRAY() in hungarian.c instead of xmalloc(sizeof()*size).
  • Changed compute_assignment()s return type from int to void, as it always succeeds.
  • Changed the Hungarian Algorithm to use an integer cost matrix.
  • Changed the --creation-weight option to --creation-factor where is an integer.
  • Retitled 1/19 and 2/19 to better conform with the current conventions, as pointed out (and suggested) by Junio.
  • Shut up Coverity, and at the same time avoided passing the unnecessary i and j parameters to output_pair_header().
  • Removed support for the --no-patches option: we inherit diff_options' support for -s already (and much more).
  • Removed the ugly _INV enum values, and introduced a beautiful GIT_COLOR_REVERSE instead. This way, whatever the user configured as color.diff.new (or .old) will be used in reverse in the dual color mode.
  • Instead of overriding the fragment header color, the dual color mode will now reverse the "outer" fragment headers, too.
  • Turned the stand-alone branch-diff command into the --diff option of git branch. Adjusted pretty much all commit messages to account for this. This change should no longer be visible: see below.
  • Pretty much re-wrote the completion, to support the new --diff mode of git-branch. See below: it was reverted for range-diff.
  • Renamed t7910 to t3206, to be closer to the git-branch tests.
  • Ensured that git_diff_ui_config() gets called, and therefore color.diff.* respected.
  • Avoided leaking four_spaces.
  • Fixed a declaration in a for (;;) statement (which Junio had as a fixup! that I almost missed).
  • Renamed branch --diff, which had been renamed from branch-diff (which was picked to avoid re-using tbdiff) to range-diff.
  • Renamed hungarian.c and its header to linear-assignment.c
  • Made --dual-color the default, and changed it to still auto-detect whether color should be used rather than forcing it

@gitgitgadget gitgitgadget deleted a comment from gitgitgadget bot Jun 5, 2018
@gitgitgadget
Copy link

gitgitgadget bot commented Jun 7, 2018

An error occurred while submitting:

Error: Branch f8c4e30c63bbfa7efec44ff0f6d0404326723d35 is not rebased to upstream/master

@gitgitgadget gitgitgadget deleted a comment from gitgitgadget bot Jun 7, 2018
@gitgitgadget gitgitgadget deleted a comment from gitgitgadget bot Jun 7, 2018
@gitgitgadget gitgitgadget deleted a comment from gitgitgadget bot Jun 7, 2018
@gitgitgadget gitgitgadget deleted a comment from gitgitgadget bot Jun 7, 2018
@gitgitgadget gitgitgadget deleted a comment from gitgitgadget bot Jun 7, 2018
@gitgitgadget gitgitgadget deleted a comment from gitgitgadget bot Jun 7, 2018
@gitgitgadget gitgitgadget deleted a comment from gitgitgadget bot Jun 7, 2018
@gitgitgadget gitgitgadget deleted a comment from gitgitgadget bot Jun 8, 2018
@gitgitgadget gitgitgadget deleted a comment from gitgitgadget bot Jun 8, 2018
@gitgitgadget gitgitgadget deleted a comment from gitgitgadget bot Jun 8, 2018
dscho pushed a commit that referenced this pull request Jun 20, 2018
Change "fetch" to treat "+" in refspecs (aka --force) to mean we
should clobber a local tag of the same name.

This changes the long-standing behavior of "fetch" added in
853a369 ("[PATCH] Multi-head fetch.", 2005-08-20), before this
change all tag fetches effectively had --force enabled. The original
rationale in that change was:

    > Tags need not be pointing at commits so there is no way to
    > guarantee "fast-forward" anyway.

That comment and the rest of the history of "fetch" shows that the
"+" (--force) part of refpecs was only conceived for branch updates,
while tags have accepted any changes from upstream unconditionally and
clobbered the local tag object. Changing this behavior has been
discussed as early as 2011[1].

I the current behavior doesn't make sense, it easily results in local
tags accidentally being clobbered. Ideally we'd namespace our tags
per-remote, but as with my 97716d2 ("fetch: add a --prune-tags
option and fetch.pruneTags config", 2018-02-09) it's easier to work
around the current implementation than to fix the root cause, so this
implements suggestion #1 from [1], "fetch" now only clobbers the tag
if either "+" is provided as part of the refspec, or if "--force" is
provided on the command-line.

This also makes it nicely symmetrical with how "tag" itself
works. We'll now refuse to clobber any existing tags unless "--force"
is supplied, whether that clobbering would happen by clobbering a
local tag with "tag", or by fetching it from the remote with "fetch".

It's still not at all nicely symmetrical with how "git push" works, as
discussed in the updated pull-fetch-param.txt documentation, but this
change brings them more into line with one another. I don't think
there's any reason "fetch" couldn't fully converge with the behavior
used by "push", but that's a topic for another change.

One of the tests added in 31b808a ("clone --single: limit the fetch
refspec to fetched branch", 2012-09-20) is being changed to use
--force where a clone would clobber a tag. This changes nothing about
the existing behavior of the test.

1. https://public-inbox.org/git/20111123221658.GA22313@sigill.intra.peff.net/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
@dscho dscho force-pushed the branch-diff branch 2 times, most recently from 79ce337 to 4a68b95 Compare June 30, 2018 21:36
@dscho
Copy link
Member Author

dscho commented Jul 3, 2018

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 3, 2018

An error occurred while submitting:

Error: git tag -F - -a pr-1/dscho/branch-diff-v3 4a68b95 failed: 128,
fatal: tag 'pr-1/dscho/branch-diff-v3' already exists

@dscho
Copy link
Member Author

dscho commented Jul 3, 2018

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 3, 2018

Submitted as pull.1.v3.git.gitgitgadget@gmail.com

@dscho dscho force-pushed the branch-diff branch 3 times, most recently from ebf3fea to d4e27c6 Compare July 9, 2018 20:39
@dscho dscho force-pushed the branch-diff branch 4 times, most recently from 5a2cf0c to d8498fb Compare July 21, 2018 21:40
@dscho
Copy link
Member Author

dscho commented Jul 21, 2018

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 21, 2018

Submitted as pull.1.v4.git.gitgitgadget@gmail.com

dscho pushed a commit that referenced this pull request Jun 1, 2023
Signed-off-by: Teng Long <dyroneteng@gmail.com>
Reviewed-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Reviewed-by: 依云 <lilydjwg@gmail.com>
Reviewed-by: pan93412 <pan93412@gmail.com>
Reviewed0by: Fangyi Zhou <me@fangyi.io>
dscho pushed a commit that referenced this pull request Jun 1, 2023
* 'tl/zh_CN_2.41.0_rnd1' of github.com:dyrone/git:
  l10n: zh_CN: Git 2.41.0 round #1
dscho pushed a commit that referenced this pull request Jun 1, 2023
l10n-2.41.0-2

* tag 'l10n-2.41.0-2' of https://github.com/git-l10n/git-po:
  l10n: zh_TW.po: Git 2.41.0
  l10n: sv.po: Update Swedish translation (5515t0f0u)
  l10n: Update Catalan translation
  l10n: Update German translation
  l10n: po-id for 2.41 (round 1)
  l10n: Update Catalan translation
  l10n: tr: Update Turkish translations for 2.41.0
  l10n: fr.po v2.41.0 rnd2
  l10n: fr.po v2.41.0 rnd1
  l10n: fr: fix translation of stash save help
  l10n: zh_CN: Git 2.41.0 round #1
  l10n: bg.po: Updated Bulgarian translation (5515t)
  l10n: update uk localization
  l10n: uk: remove stale lines
  l10n: uk: add initial translation
  l10n: TEAMS: Update pt_PT repo link
dscho pushed a commit that referenced this pull request Jul 27, 2023
When "update-index --unresolve $path" cannot find the resolve-undo
record for the path the user requested to unresolve, it stuffs the
blobs from HEAD and MERGE_HEAD to stage #2 and stage #3 as a
fallback.  For this reason, the operation does not even start unless
both "HEAD" and "MERGE_HEAD" exist.

This is suboptimal in a few ways:

 * It does not recreate stage #1.  Even though it is a correct
   design decision not to do so (because it is impossible to
   recreate in general cases, without knowing how we got there,
   including what merge strategy was used), it is much less useful
   not to have that information in the index.

 * It limits the "unresolve" operation only during a conflicted "git
   merge" and nothing else.  Other operations like "rebase",
   "cherry-pick", and "switch -m" may result in conflicts, and the
   user may want to unresolve the conflict that they incorrectly
   resolved in order to redo the resolution, but the fallback would
   not kick in.

 * Most importantly, the entire "unresolve" operation is disabled
   after a conflicted merge is committed and MERGE_HEAD is removed,
   even though the index has perfectly usable resolve-undo records.

By lazily reading the HEAD and MERGE_HEAD only when we need to go to
the fallback codepath, we will allow cases where resolve-undo
records are available (which is 100% of the time, unless the user is
reading from an index file created by Git more than 10 years ago) to
proceed even after a conflicted merge was committed, during other
mergy operations that do not use MERGE_HEAD, or after the result of
such mergy operations has been committed.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
dscho pushed a commit that referenced this pull request Jul 31, 2023
When "update-index --unresolve $path" cannot find the resolve-undo
record for the path the user requested to unresolve, it stuffs the
blobs from HEAD and MERGE_HEAD to stage #2 and stage #3 as a
fallback.  For this reason, the operation does not even start unless
both "HEAD" and "MERGE_HEAD" exist.

This is suboptimal in a few ways:

 * It does not recreate stage #1.  Even though it is a correct
   design decision not to do so (because it is impossible to
   recreate in general cases, without knowing how we got there,
   including what merge strategy was used), it is much less useful
   not to have that information in the index.

 * It limits the "unresolve" operation only during a conflicted "git
   merge" and nothing else.  Other operations like "rebase",
   "cherry-pick", and "switch -m" may result in conflicts, and the
   user may want to unresolve the conflict that they incorrectly
   resolved in order to redo the resolution, but the fallback would
   not kick in.

 * Most importantly, the entire "unresolve" operation is disabled
   after a conflicted merge is committed and MERGE_HEAD is removed,
   even though the index has perfectly usable resolve-undo records.

By lazily reading the HEAD and MERGE_HEAD only when we need to go to
the fallback codepath, we will allow cases where resolve-undo
records are available (which is 100% of the time, unless the user is
reading from an index file created by Git more than 10 years ago) to
proceed even after a conflicted merge was committed, during other
mergy operations that do not use MERGE_HEAD, or after the result of
such mergy operations has been committed.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
dscho pushed a commit that referenced this pull request Aug 24, 2023
When t5583-push-branches.sh was originally introduced via 425b4d7
(push: introduce '--branches' option, 2023-05-06), it was not leak-free.
In fact, the test did not even run correctly until 022fbb6 (t5583:
fix shebang line, 2023-05-12), but after applying that patch, we see a
failure at t5583.8:

    ==2529087==ERROR: LeakSanitizer: detected memory leaks

    Direct leak of 384 byte(s) in 1 object(s) allocated from:
        #0 0x7fb536330986 in __interceptor_realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
        #1 0x55e07606cbf9 in xrealloc wrapper.c:140
        #2 0x55e075fb6cb3 in prio_queue_put prio-queue.c:42
        #3 0x55e075ec81cb in get_reachable_subset commit-reach.c:917
        #4 0x55e075fe9cce in add_missing_tags remote.c:1518
        #5 0x55e075fea1e4 in match_push_refs remote.c:1665
        #6 0x55e076050a8e in transport_push transport.c:1378
        #7 0x55e075e2eb74 in push_with_options builtin/push.c:401
        #8 0x55e075e2edb0 in do_push builtin/push.c:458
        #9 0x55e075e2ff7a in cmd_push builtin/push.c:702
        #10 0x55e075d8aaf0 in run_builtin git.c:452
        #11 0x55e075d8af08 in handle_builtin git.c:706
        #12 0x55e075d8b12c in run_argv git.c:770
        #13 0x55e075d8b6a0 in cmd_main git.c:905
        #14 0x55e075e81f07 in main common-main.c:60
        #15 0x7fb5360ab6c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
        #16 0x7fb5360ab784 in __libc_start_main_impl ../csu/libc-start.c:360
        #17 0x55e075d88f40 in _start (git+0x1ff40) (BuildId: 38ad998b85a535e786129979443630d025ec2453)

    SUMMARY: LeakSanitizer: 384 byte(s) leaked in 1 allocation(s).

This leak was addressed independently via 68b5117 (commit-reach: fix
memory leak in get_reachable_subset(), 2023-06-03), which makes t5583
leak-free.

But t5583 was not in the tree when 68b5117 was written, and the two
only met after the latter was merged back in via 693bde4 (Merge
branch 'mh/commit-reach-get-reachable-plug-leak', 2023-06-20).

At that point, t5583 was leak-free. Let's mark it as such accordingly.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
dscho pushed a commit that referenced this pull request Aug 29, 2023
When t5583-push-branches.sh was originally introduced via 425b4d7
(push: introduce '--branches' option, 2023-05-06), it was not leak-free.
In fact, the test did not even run correctly until 022fbb6 (t5583:
fix shebang line, 2023-05-12), but after applying that patch, we see a
failure at t5583.8:

    ==2529087==ERROR: LeakSanitizer: detected memory leaks

    Direct leak of 384 byte(s) in 1 object(s) allocated from:
        #0 0x7fb536330986 in __interceptor_realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
        #1 0x55e07606cbf9 in xrealloc wrapper.c:140
        #2 0x55e075fb6cb3 in prio_queue_put prio-queue.c:42
        #3 0x55e075ec81cb in get_reachable_subset commit-reach.c:917
        #4 0x55e075fe9cce in add_missing_tags remote.c:1518
        #5 0x55e075fea1e4 in match_push_refs remote.c:1665
        #6 0x55e076050a8e in transport_push transport.c:1378
        #7 0x55e075e2eb74 in push_with_options builtin/push.c:401
        #8 0x55e075e2edb0 in do_push builtin/push.c:458
        #9 0x55e075e2ff7a in cmd_push builtin/push.c:702
        #10 0x55e075d8aaf0 in run_builtin git.c:452
        #11 0x55e075d8af08 in handle_builtin git.c:706
        #12 0x55e075d8b12c in run_argv git.c:770
        #13 0x55e075d8b6a0 in cmd_main git.c:905
        #14 0x55e075e81f07 in main common-main.c:60
        #15 0x7fb5360ab6c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
        #16 0x7fb5360ab784 in __libc_start_main_impl ../csu/libc-start.c:360
        #17 0x55e075d88f40 in _start (git+0x1ff40) (BuildId: 38ad998b85a535e786129979443630d025ec2453)

    SUMMARY: LeakSanitizer: 384 byte(s) leaked in 1 allocation(s).

This leak was addressed independently via 68b5117 (commit-reach: fix
memory leak in get_reachable_subset(), 2023-06-03), which makes t5583
leak-free.

But t5583 was not in the tree when 68b5117 was written, and the two
only met after the latter was merged back in via 693bde4 (Merge
branch 'mh/commit-reach-get-reachable-plug-leak', 2023-06-20).

At that point, t5583 was leak-free. Let's mark it as such accordingly.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitgitgadget bot pushed a commit that referenced this pull request Sep 15, 2023
gitgitgadget bot pushed a commit that referenced this pull request Sep 15, 2023
gitgitgadget bot pushed a commit that referenced this pull request Sep 15, 2023
gitgitgadget bot pushed a commit that referenced this pull request Sep 15, 2023
gitgitgadget bot pushed a commit that referenced this pull request Sep 15, 2023
gitgitgadget bot pushed a commit that referenced this pull request Sep 15, 2023
gitgitgadget bot pushed a commit that referenced this pull request Sep 15, 2023
gitgitgadget bot pushed a commit that referenced this pull request Sep 15, 2023
gitgitgadget bot pushed a commit that referenced this pull request Sep 15, 2023
gitgitgadget bot pushed a commit that referenced this pull request Sep 15, 2023
gitgitgadget bot pushed a commit that referenced this pull request Oct 4, 2023
gitgitgadget bot pushed a commit that referenced this pull request Nov 2, 2023
gitgitgadget bot pushed a commit that referenced this pull request Nov 24, 2023
It is tempting to think of "files and directories" of the current
directory as valid inputs to the add and set subcommands of git
sparse-checkout.  However, in non-cone mode, they often aren't and using
them as potential completions leads to *many* forms of confusion:

Issue #1. It provides the *wrong* files and directories.

For
    git sparse-checkout add
we always want to add files and directories not currently in our sparse
checkout, which means we want file and directories not currently present
in the current working tree.  Providing the files and directories
currently present is thus always wrong.

For
    git sparse-checkout set
we have a similar problem except in the subset of cases where we are
trying to narrow our checkout to a strict subset of what we already
have.  That is not a very common scenario, especially since it often
does not even happen to be true for the first use of the command; for
years we required users to create a sparse-checkout via
    git sparse-checkout init
    git sparse-checkout set <args...>
(or use a clone option that did the init step for you at clone time).
The init command creates a minimal sparse-checkout with just the
top-level directory present, meaning the set command has to be used to
expand the checkout.  Thus, only in a special and perhaps unusual cases
would any of the suggestions from normal file and directory completion
be appropriate.

Issue #2: Suggesting patterns that lead to warnings is unfriendly.

If the user specifies any regular file and omits the leading '/', then
the sparse-checkout command will warn the user that their command is
problematic and suggest they use a leading slash instead.

Issue #3: Completion gets confused by leading '/', and provides wrong paths.

Users often want to anchor their patterns to the toplevel of the
repository, especially when listing individual files.  There are a
number of reasons for this, but notably even sparse-checkout encourages
them to do so (as noted above).  However, if users do so (via adding a
leading '/' to their pattern), then bash completion will interpret the
leading slash not as a request for a path at the toplevel of the
repository, but as a request for a path at the root of the filesytem.
That means at best that completion cannot help with such paths, and if
it does find any completions, they are almost guaranteed to be wrong.

Issue #4: Suggesting invalid patterns from subdirectories is unfriendly.

There is no per-directory equivalent to .gitignore with
sparse-checkouts.  There is only a single worktree-global
$GIT_DIR/info/sparse-checkout file.  As such, paths to files must be
specified relative to the toplevel of a repository.  Providing
suggestions of paths that are relative to the current working directory,
as bash completion defaults to, is wrong when the current working
directory is not the worktree toplevel directory.

Issue #5: Paths with special characters will be interpreted incorrectly

The entries in the sparse-checkout file are patterns, not paths.  While
most paths also qualify as patterns (though even in such cases it would
be better for users to not use them directly but prefix them with a
leading '/'), there are a variety of special characters that would need
special escaping beyond the normal shell escaping: '*', '?', '\', '[',
']', and any leading '#' or '!'.  If completion suggests any such paths,
users will likely expect them to be treated as an exact path rather than
as a pattern that might match some number of files other than 1.

Because of the combination of the above issues, turn completion off for
the `set` and `add` subcommands of `sparse-checkout` when in non-cone
mode, but leave a NEEDSWORK comment specifying what could theoretically
be done if someone wanted to provide completion rules that were more
helpful than harmful.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitgitgadget bot pushed a commit that referenced this pull request Nov 27, 2023
It is tempting to think of "files and directories" of the current
directory as valid inputs to the add and set subcommands of git
sparse-checkout.  However, in non-cone mode, they often aren't and using
them as potential completions leads to *many* forms of confusion:

Issue #1. It provides the *wrong* files and directories.

For
    git sparse-checkout add
we always want to add files and directories not currently in our sparse
checkout, which means we want file and directories not currently present
in the current working tree.  Providing the files and directories
currently present is thus always wrong.

For
    git sparse-checkout set
we have a similar problem except in the subset of cases where we are
trying to narrow our checkout to a strict subset of what we already
have.  That is not a very common scenario, especially since it often
does not even happen to be true for the first use of the command; for
years we required users to create a sparse-checkout via
    git sparse-checkout init
    git sparse-checkout set <args...>
(or use a clone option that did the init step for you at clone time).
The init command creates a minimal sparse-checkout with just the
top-level directory present, meaning the set command has to be used to
expand the checkout.  Thus, only in a special and perhaps unusual cases
would any of the suggestions from normal file and directory completion
be appropriate.

Issue #2: Suggesting patterns that lead to warnings is unfriendly.

If the user specifies any regular file and omits the leading '/', then
the sparse-checkout command will warn the user that their command is
problematic and suggest they use a leading slash instead.

Issue #3: Completion gets confused by leading '/', and provides wrong paths.

Users often want to anchor their patterns to the toplevel of the
repository, especially when listing individual files.  There are a
number of reasons for this, but notably even sparse-checkout encourages
them to do so (as noted above).  However, if users do so (via adding a
leading '/' to their pattern), then bash completion will interpret the
leading slash not as a request for a path at the toplevel of the
repository, but as a request for a path at the root of the filesytem.
That means at best that completion cannot help with such paths, and if
it does find any completions, they are almost guaranteed to be wrong.

Issue #4: Suggesting invalid patterns from subdirectories is unfriendly.

There is no per-directory equivalent to .gitignore with
sparse-checkouts.  There is only a single worktree-global
$GIT_DIR/info/sparse-checkout file.  As such, paths to files must be
specified relative to the toplevel of a repository.  Providing
suggestions of paths that are relative to the current working directory,
as bash completion defaults to, is wrong when the current working
directory is not the worktree toplevel directory.

Issue #5: Paths with special characters will be interpreted incorrectly

The entries in the sparse-checkout file are patterns, not paths.  While
most paths also qualify as patterns (though even in such cases it would
be better for users to not use them directly but prefix them with a
leading '/'), there are a variety of special characters that would need
special escaping beyond the normal shell escaping: '*', '?', '\', '[',
']', and any leading '#' or '!'.  If completion suggests any such paths,
users will likely expect them to be treated as an exact path rather than
as a pattern that might match some number of files other than 1.

Because of the combination of the above issues, turn completion off for
the `set` and `add` subcommands of `sparse-checkout` when in non-cone
mode, but leave a NEEDSWORK comment specifying what could theoretically
be done if someone wanted to provide completion rules that were more
helpful than harmful.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitgitgadget bot pushed a commit that referenced this pull request Dec 9, 2023
It is tempting to think of "files and directories" of the current
directory as valid inputs to the add and set subcommands of git
sparse-checkout.  However, in non-cone mode, they often aren't and using
them as potential completions leads to *many* forms of confusion:

Issue #1. It provides the *wrong* files and directories.

For
    git sparse-checkout add
we always want to add files and directories not currently in our sparse
checkout, which means we want file and directories not currently present
in the current working tree.  Providing the files and directories
currently present is thus always wrong.

For
    git sparse-checkout set
we have a similar problem except in the subset of cases where we are
trying to narrow our checkout to a strict subset of what we already
have.  That is not a very common scenario, especially since it often
does not even happen to be true for the first use of the command; for
years we required users to create a sparse-checkout via
    git sparse-checkout init
    git sparse-checkout set <args...>
(or use a clone option that did the init step for you at clone time).
The init command creates a minimal sparse-checkout with just the
top-level directory present, meaning the set command has to be used to
expand the checkout.  Thus, only in a special and perhaps unusual cases
would any of the suggestions from normal file and directory completion
be appropriate.

Issue #2: Suggesting patterns that lead to warnings is unfriendly.

If the user specifies any regular file and omits the leading '/', then
the sparse-checkout command will warn the user that their command is
problematic and suggest they use a leading slash instead.

Issue #3: Completion gets confused by leading '/', and provides wrong paths.

Users often want to anchor their patterns to the toplevel of the
repository, especially when listing individual files.  There are a
number of reasons for this, but notably even sparse-checkout encourages
them to do so (as noted above).  However, if users do so (via adding a
leading '/' to their pattern), then bash completion will interpret the
leading slash not as a request for a path at the toplevel of the
repository, but as a request for a path at the root of the filesytem.
That means at best that completion cannot help with such paths, and if
it does find any completions, they are almost guaranteed to be wrong.

Issue #4: Suggesting invalid patterns from subdirectories is unfriendly.

There is no per-directory equivalent to .gitignore with
sparse-checkouts.  There is only a single worktree-global
$GIT_DIR/info/sparse-checkout file.  As such, paths to files must be
specified relative to the toplevel of a repository.  Providing
suggestions of paths that are relative to the current working directory,
as bash completion defaults to, is wrong when the current working
directory is not the worktree toplevel directory.

Issue #5: Paths with special characters will be interpreted incorrectly

The entries in the sparse-checkout file are patterns, not paths.  While
most paths also qualify as patterns (though even in such cases it would
be better for users to not use them directly but prefix them with a
leading '/'), there are a variety of special characters that would need
special escaping beyond the normal shell escaping: '*', '?', '\', '[',
']', and any leading '#' or '!'.  If completion suggests any such paths,
users will likely expect them to be treated as an exact path rather than
as a pattern that might match some number of files other than 1.

However, despite the first four issues, we can note that _if_ users are
using tab completion, then they are probably trying to specify a path in
the index.  As such, we transform their argument into a top-level-rooted
pattern that matches such a file.  For example, if they type:
   git sparse-checkout add Make<TAB>
we could "complete" to
   git sparse-checkout add /Makefile
or, if they ran from the Documentation/technical/ subdirectory:
   git sparse-checkout add m<TAB>
we could "complete" it to:
   git sparse-checkout add /Documentation/technical/multi-pack-index.txt
Note in both cases I use "complete" in quotes, because we actually add
characters both before and after the argument in question, so we are
kind of abusing "bash completions" to be "bash completions AND
beginnings".

The fifth issue is a bit stickier, especially when you consider that we
not only need to deal with escaping issues because of special meanings
of patterns in sparse-checkout & gitignore files, but also that we need
to consider escaping issues due to ls-files needing to sometimes quote
or escape characters, and because the shell needs to escape some
characters.  The multiple interacting forms of escaping could get ugly;
this patch makes no attempt to do so and simply documents that we
decided to not deal with those corner cases for now but at least get the
common cases right.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitgitgadget bot pushed a commit that referenced this pull request Dec 9, 2023
gitgitgadget bot pushed a commit that referenced this pull request Dec 15, 2023
When reusing objects from a pack, we keep track of a set of one or more
`reused_chunk`s, corresponding to sections of one or more object(s) from
a source pack that we are reusing. Each chunk contains two pieces of
information:

  - the offset of the first object in the source pack (relative to the
    beginning of the source pack)
  - the difference between that offset, and the corresponding offset in
    the pack we're generating

The purpose of keeping track of these is so that we can patch an
OFS_DELTAs that cross over a section of the reuse pack that we didn't
take.

For instance, consider a hypothetical pack as shown below:

                                                (chunk #2)
                                                __________...
                                               /
                                              /
      +--------+---------+-------------------+---------+
  ... | <base> | <other> |      (unused)     | <delta> | ...
      +--------+---------+-------------------+---------+
       \                /
        \______________/
           (chunk #1)

Suppose that we are sending objects "base", "other", and "delta", and
that the "delta" object is stored as an OFS_DELTA, and that its base is
"base". If we don't send any objects in the "(unused)" range, we can't
copy the delta'd object directly, since its delta offset includes a
range of the pack that we didn't copy, so we have to account for that
difference when patching and reassembling the delta.

In order to compute this value correctly, we need to know not only where
we are in the packfile we're assembling (with `hashfile_total(f)`) but
also the position of the first byte of the packfile that we are
currently reusing. Currently, this works just fine, since when reusing
only a single pack those two values are always identical (because
verbatim reuse is the first thing pack-objects does when enabled after
writing the pack header).

But when reusing multiple packs which have one or more gaps, we'll need
to account for these two values diverging.

Together, these two allow us to compute the reused chunk's offset
difference relative to the start of the reused pack, as desired.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitgitgadget bot pushed a commit that referenced this pull request Jan 3, 2024
gitgitgadget bot pushed a commit that referenced this pull request Jan 5, 2024
The t5309 script triggers a racy false positive with SANITIZE=leak on a
multi-core system. Running with "--stress --run=6" usually fails within
10 seconds or so for me, complaining with something like:

    + git index-pack --fix-thin --stdin
    fatal: REF_DELTA at offset 46 already resolved (duplicate base 01d7713666f4de822776c7622c10f1b07de280dc?)

    =================================================================
    ==3904583==ERROR: LeakSanitizer: detected memory leaks

    Direct leak of 32 byte(s) in 1 object(s) allocated from:
        #0 0x7fa790d01986 in __interceptor_realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
        #1 0x7fa790add769 in __pthread_getattr_np nptl/pthread_getattr_np.c:180
        #2 0x7fa790d117c5 in __sanitizer::GetThreadStackTopAndBottom(bool, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:150
        #3 0x7fa790d11957 in __sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:598
        #4 0x7fa790d03fe8 in __lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType) ../../../../src/libsanitizer/lsan/lsan_posix.cpp:51
        #5 0x7fa790d013fd in __lsan_thread_start_func ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:440
        #6 0x7fa790adc3eb in start_thread nptl/pthread_create.c:444
        #7 0x7fa790b5ca5b in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

    SUMMARY: LeakSanitizer: 32 byte(s) leaked in 1 allocation(s).
    Aborted

What happens is this:

  0. We construct a bogus pack with a duplicate object in it and trigger
     index-pack.

  1. We spawn a bunch of worker threads to resolve deltas (on my system
     it is 16 threads).

  2. One of the threads sees the duplicate object and bails by calling
     exit(), taking down all of the threads. This is expected and is the
     point of the test.

  3. At the time exit() is called, we may still be spawning threads from
     the main process via pthread_create(). LSan hooks thread creation
     to update its book-keeping; it has to know where each thread's
     stack is (so it can find entry points for reachable memory). So it
     calls pthread_getattr_np() to get information about the new thread.
     That may allocate memory that must be freed with a matching call to
     pthread_attr_destroy(). Probably LSan does that immediately, but
     if you're unlucky enough, the exit() will happen while it's between
     those two calls, and the allocated pthread_attr_t appears as a
     leak.

This isn't a real leak. It's not even in our code, but rather in the
LSan instrumentation code. So we could just ignore it. But the false
positive can cause people to waste time tracking it down.

It's possibly something that LSan could protect against (e.g., cover the
getattr/destroy pair with a mutex, and then in the final post-exit()
check for leaks try to take the same mutex). But I don't know enough
about LSan to say if that's a reasonable approach or not (or if my
analysis is even completely correct).

In the meantime, it's pretty easy to avoid the race by making creation
of the worker threads "atomic". That is, we'll spawn all of them before
letting any of them start to work. That's easy to do because we already
have a work_lock() mutex for handing out that work. If the main process
takes it, then all of the threads will immediately block until we've
finished spawning and released it.

This shouldn't make any practical difference for non-LSan runs. The
thread spawning is quick, and could happen before any worker thread gets
scheduled anyway.

Probably other spots that use threads are subject to the same issues.
But since we have to manually insert locking (and since this really is
kind of a hack), let's not bother with them unless somebody experiences
a similar racy false-positive in practice.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitgitgadget bot pushed a commit that referenced this pull request Feb 3, 2024
gitgitgadget bot pushed a commit that referenced this pull request Mar 4, 2024
gitgitgadget bot pushed a commit that referenced this pull request Apr 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants