Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
Commits on Apr 7, 2014
  1. tree-diff: rework diff_tree() to generate diffs for multiparent cases…

    Kirill Smelkov authored committed
    … as well
    
    Previously diff_tree(), which is now named ll_diff_tree_sha1(), was
    generating diff_filepair(s) for two trees t1 and t2, and that was
    usually used for a commit as t1=HEAD~, and t2=HEAD - i.e. to see changes
    a commit introduces.
    
    In Git, however, we have fundamentally built flexibility in that a
    commit can have many parents - 1 for a plain commit, 2 for a simple merge,
    but also more than 2 for merging several heads at once.
    
    For merges there is a so called combine-diff, which shows diff, a merge
    introduces by itself, omitting changes done by any parent. That works
    through first finding paths, that are different to all parents, and then
    showing generalized diff, with separate columns for +/- for each parent.
    The code lives in combine-diff.c .
    
    There is an impedance mismatch, however, in that a commit could
    generally have any number of parents, and that while diffing trees, we
    divide cases for 2-tree diffs and more-than-2-tree diffs. I mean there
    is no special casing for multiple parents commits in e.g.
    revision-walker .
    
    That impedance mismatch *hurts* *performance* *badly* for generating
    combined diffs - in "combine-diff: optimize combine_diff_path
    sets intersection" I've already removed some slowness from it, but from
    the timings provided there, it could be seen, that combined diffs still
    cost more than an order of magnitude more cpu time, compared to diff for
    usual commits, and that would only be an optimistic estimate, if we take
    into account that for e.g. linux.git there is only one merge for several
    dozens of plain commits.
    
    That slowness comes from the fact that currently, while generating
    combined diff, a lot of time is spent computing diff(commit,commit^2)
    just to only then intersect that huge diff to almost small set of files
    from diff(commit,commit^1).
    
    That's because at present, to compute combine-diff, for first finding
    paths, that "every parent touches", we use the following combine-diff
    property/definition:
    
    D(A,P1...Pn) = D(A,P1) ^ ... ^ D(A,Pn)      (w.r.t. paths)
    
    where
    
    D(A,P1...Pn) is combined diff between commit A, and parents Pi
    
    and
    
    D(A,Pi) is usual two-tree diff Pi..A
    
    So if any of that D(A,Pi) is huge, tracting 1 n-parent combine-diff as n
    1-parent diffs and intersecting results will be slow.
    
    And usually, for linux.git and other topic-based workflows, that
    D(A,P2) is huge, because, if merge-base of A and P2, is several dozens
    of merges (from A, via first parent) below, that D(A,P2) will be diffing
    sum of merges from several subsystems to 1 subsystem.
    
    The solution is to avoid computing n 1-parent diffs, and to find
    changed-to-all-parents paths via scanning A's and all Pi's trees
    simultaneously, at each step comparing their entries, and based on that
    comparison, populate paths result, and deduce we could *skip*
    *recursing* into subdirectories, if at least for 1 parent, sha1 of that
    dir tree is the same as in A. That would save us from doing significant
    amount of needless work.
    
    Such approach is very similar to what diff_tree() does, only there we
    deal with scanning only 2 trees simultaneously, and for n+1 tree, the
    logic is a bit more complex:
    
    D(T,P1...Pn) calculation scheme
    -------------------------------
    
    D(T,P1...Pn) = D(T,P1) ^ ... ^ D(T,Pn)	(regarding resulting paths set)
    
        D(T,Pj)		- diff between T..Pj
        D(T,P1...Pn)	- combined diff from T to parents P1,...,Pn
    
    We start from all trees, which are sorted, and compare their entries in
    lock-step:
    
         T     P1       Pn
         -     -        -
        |t|   |p1|     |pn|
        |-|   |--| ... |--|      imin = argmin(p1...pn)
        | |   |  |     |  |
        |-|   |--|     |--|
        |.|   |. |     |. |
         .     .        .
         .     .        .
    
    at any time there could be 3 cases:
    
        1)  t < p[imin];
        2)  t > p[imin];
        3)  t = p[imin].
    
    Schematic deduction of what every case means, and what to do, follows:
    
    1)  t < p[imin]  ->  ∀j t ∉ Pj  ->  "+t" ∈ D(T,Pj)  ->  D += "+t";  t↓
    
    2)  t > p[imin]
    
        2.1) ∃j: pj > p[imin]  ->  "-p[imin]" ∉ D(T,Pj)  ->  D += ø;  ∀ pi=p[imin]  pi↓
        2.2) ∀i  pi = p[imin]  ->  pi ∉ T  ->  "-pi" ∈ D(T,Pi)  ->  D += "-p[imin]";  ∀i pi↓
    
    3)  t = p[imin]
    
        3.1) ∃j: pj > p[imin]  ->  "+t" ∈ D(T,Pj)  ->  only pi=p[imin] remains to investigate
        3.2) pi = p[imin]  ->  investigate δ(t,pi)
         |
         |
         v
    
        3.1+3.2) looking at δ(t,pi) ∀i: pi=p[imin] - if all != ø  ->
    
                          ⎧δ(t,pi)  - if pi=p[imin]
                 ->  D += ⎨
                          ⎩"+t"     - if pi>p[imin]
    
        in any case t↓  ∀ pi=p[imin]  pi↓
    
    ~
    
    For comparison, here is how diff_tree() works:
    
    D(A,B) calculation scheme
    -------------------------
    
        A     B
        -     -
       |a|   |b|    a < b   ->  a ∉ B   ->   D(A,B) +=  +a    a↓
       |-|   |-|    a > b   ->  b ∉ A   ->   D(A,B) +=  -b    b↓
       | |   | |    a = b   ->  investigate δ(a,b)            a↓ b↓
       |-|   |-|
       |.|   |.|
        .     .
        .     .
    
    ~~~~~~~~
    
    This patch generalizes diff tree-walker to work with arbitrary number of
    parents as described above - i.e. now there is a resulting tree t, and
    some parents trees tp[i] i=[0..nparent). The generalization builds on
    the fact that usual diff
    
    D(A,B)
    
    is by definition the same as combined diff
    
    D(A,[B]),
    
    so if we could rework the code for common case and make it be not slower
    for nparent=1 case, usual diff(t1,t2) generation will not be slower, and
    multiparent diff tree-walker would greatly benefit generating
    combine-diff.
    
    What we do is as follows:
    
    1) diff tree-walker ll_diff_tree_sha1() is internally reworked to be
       a paths generator (new name diff_tree_paths()), with each generated path
       being `struct combine_diff_path` with info for path, new sha1,mode and for
       every parent which sha1,mode it was in it.
    
    2) From that info, we can still generate usual diff queue with
       struct diff_filepairs, via "exporting" generated
       combine_diff_path, if we know we run for nparent=1 case.
       (see emit_diff() which is now named emit_diff_first_parent_only())
    
    3) In order for diff_can_quit_early(), which checks
    
           DIFF_OPT_TST(opt, HAS_CHANGES))
    
       to work, that exporting have to be happening not in bulk, but
       incrementally, one diff path at a time.
    
       For such consumers, there is a new callback in diff_options
       introduced:
    
           ->pathchange(opt, struct combine_diff_path *)
    
       which, if set to !NULL, is called for every generated path.
    
       (see new compat ll_diff_tree_sha1() wrapper around new paths
        generator for setup)
    
    4) The paths generation itself, is reworked from previous
       ll_diff_tree_sha1() code according to "D(A,P1...Pn) calculation
       scheme" provided above:
    
       On the start we allocate [nparent] arrays in place what was
       earlier just for one parent tree.
    
       then we just generalize loops, and comparison according to the
       algorithm.
    
    Some notes(*):
    
    1) alloca(), for small arrays, is used for "runs not slower for
       nparent=1 case than before" goal - if we change it to xmalloc()/free()
       the timings get ~1% worse. For alloca() we use just-introduced
       xalloca/xalloca_free compatibility wrappers, so it should not be a
       portability problem.
    
    2) For every parent tree, we need to keep a tag, whether entry from that
       parent equals to entry from minimal parent. For performance reasons I'm
       keeping that tag in entry's mode field in unused bit - see S_IFXMIN_NEQ.
       Not doing so, we'd need to alloca another [nparent] array, which hurts
       performance.
    
    3) For emitted paths, memory could be reused, if we know the path was
       processed via callback and will not be needed later. We use efficient
       hand-made realloc-style path_appendnew(), that saves us from ~1-1.5%
       of potential additional slowdown.
    
    4) goto(s) are used in several places, as the code executes a little bit
       faster with lowered register pressure.
    
    Also
    
    - we should now check for FIND_COPIES_HARDER not only when two entries
      names are the same, and their hashes are equal, but also for a case,
      when a path was removed from some of all parents having it.
    
      The reason is, if we don't, that path won't be emitted at all (see
      "a > xi" case), and we'll just skip it, and FIND_COPIES_HARDER wants
      all paths - with diff or without - to be emitted, to be later analyzed
      for being copies sources.
    
      The new check is only necessary for nparent >1, as for nparent=1 case
      xmin_eqtotal always =1 =nparent, and a path is always added to diff as
      removal.
    
    ~~~~~~~~
    
    Timings for
    
        # without -c, i.e. testing only nparent=1 case
        `git log --raw --no-abbrev --no-renames`
    
    before and after the patch are as follows:
    
                    navy.git        linux.git v3.10..v3.11
    
        before      0.611s          1.889s
        after       0.619s          1.907s
        slowdown    1.3%            0.9%
    
    This timings show we did no harm to usual diff(tree1,tree2) generation.
    From the table we can see that we actually did ~1% slowdown, but I think
    I've "earned" that 1% in the previous patch ("tree-diff: reuse base
    str(buf) memory on sub-tree recursion", HEAD~~) so for nparent=1 case,
    net timings stays approximately the same.
    
    The output also stayed the same.
    
    (*) If we revert 1)-4) to more usual techniques, for nparent=1 case,
        we'll get ~2-2.5% of additional slowdown, which I've tried to avoid, as
       "do no harm for nparent=1 case" rule.
    
    For linux.git, combined diff will run an order of magnitude faster and
    appropriate timings will be provided in the next commit, as we'll be
    taking advantage of the new diff tree-walker for combined-diff
    generation there.
    
    P.S. and combined diff is not some exotic/for-play-only stuff - for
    example for a program I write to represent Git archives as readonly
    filesystem, there is initial scan with
    
        `git log --reverse --raw --no-abbrev --no-renames -c`
    
    to extract log of what was created/changed when, as a result building a
    map
    
        {}  sha1    ->  in which commit (and date) a content was added
    
    that `-c` means also show combined diff for merges, and without them, if
    a merge is non-trivial (merges changes from two parents with both having
    separate changes to a file), or an evil one, the map will not be full,
    i.e. some valid sha1 would be absent from it.
    
    That case was my initial motivation for combined diffs speedup.
    
    Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits on Mar 27, 2014
  1. tree-diff: reuse base str(buf) memory on sub-tree recursion

    Kirill Smelkov authored committed
    Instead of allocating it all the time for every subtree in
    ll_diff_tree_sha1, let's allocate it once in diff_tree_sha1, and then
    all callee just use it in stacking style, without memory allocations.
    
    This should be faster, and for me this change gives the following
    slight speedups for
    
        git log --raw --no-abbrev --no-renames --format='%H'
    
                    navy.git    linux.git v3.10..v3.11
    
        before      0.618s      1.903s
        after       0.611s      1.889s
        speedup     1.1%        0.7%
    
    Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
  2. tree-diff: no need to call "full" diff_tree_sha1 from show_path()

    Kirill Smelkov authored committed
    As described in previous commit, when recursing into sub-trees, we can
    use lower-level tree walker, since its interface is now sha1 based.
    
    The change is ok, because diff_tree_sha1() only invokes
    ll_diff_tree_sha1(), and also, if base is empty, try_to_follow_renames().
    But base is not empty here, as we have added a path and '/' before
    recursing.
    
    Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
  3. tree-diff: rework diff_tree interface to be sha1 based

    Kirill Smelkov authored committed
    In the next commit this will allow to reduce intermediate calls, when
    recursing into subtrees - at that stage we know only subtree sha1, and
    it is natural for tree walker to start from that phase. For now we do
    
        diff_tree
            show_path
                diff_tree_sha1
                    diff_tree
                        ...
    
    and the change will allow to reduce it to
    
        diff_tree
            show_path
                diff_tree
    
    Also, it will allow to omit allocating strbuf for each subtree, and just
    reuse the common strbuf via playing with its len.
    
    The above-mentioned improvements go in the next 2 patches.
    
    The downside is that try_to_follow_renames(), if active, we cause
    re-reading of 2 initial trees, which was negligible based on my timings,
    and which is outweighed cogently by the upsides.
    
    NOTE To keep with the current interface and semantics, I needed to
    rename the function from diff_tree() to diff_tree_sha1(). As
    diff_tree_sha1() was already used, and the function we are talking here
    is its more low-level helper, let's use convention for prefixing
    such helpers with "ll_". So the final renaming is
    
        diff_tree() -> ll_diff_tree_sha1()
    
    Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits on Mar 26, 2014
  1. tree-diff: diff_tree() should now be static

    Kirill Smelkov authored committed
    We reworked all its users to use the functionality through
    diff_tree_sha1 variant in recent patches (see "tree-diff: allow
    diff_tree_sha1 to accept NULL sha1" and what comes next).
    
    diff_tree() is now not used outside tree-diff.c - make it static.
    
    Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
  2. tree-diff: remove special-case diff-emitting code for empty-tree cases

    Kirill Smelkov authored committed
    While walking trees, we iterate their entries from lowest to highest in
    sort order, so empty tree means all entries were already went over.
    
    If we artificially assign +infinity value to such tree "entry", it will
    go after all usual entries, and through the usual driver loop we will be
    taking the same actions, which were hand-coded for special cases, i.e.
    
        t1 empty, t2 non-empty
            pathcmp(+∞, t2) -> +1
            show_path(/*t1=*/NULL, t2);     /* = t1 > t2 case in main loop */
    
        t1 non-empty, t2-empty
            pathcmp(t1, +∞) -> -1
            show_path(t1, /*t2=*/NULL);     /* = t1 < t2 case in main loop */
    
    In other words when we have t1 and t2, we return a sign that tells the
    caller to indicate the "earlier" one to be emitted, and by returning the
    sign that causes the non-empty side to be emitted, we will automatically
    cause the entries from the remaining side to be emitted, without
    attempting to touch the empty side at all.  We can teach
    tree_entry_pathcmp() to pretend that an empty tree has an element that
    sorts after anything else to achieve this.
    
    Right now we never go to when compared tree descriptors are both
    infinity, as this condition is checked in the loop beginning as
    finishing criteria, but will do so in the future, when there will be
    several parents iterated simultaneously, and some pair of them would run
    to the end.
    
    Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits on Mar 20, 2014
  1. tree-diff: simplify tree_entry_pathcmp

    Kirill Smelkov authored committed
    Since an earlier "Finally switch over tree descriptors to contain a
    pre-parsed entry", we can safely access all tree_desc->entry fields
    directly instead of first "extracting" them through
    tree_entry_extract.
    
    Use it. The code generated stays the same - only it now visually looks
    cleaner.
    
    Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
  2. tree-diff: show_path prototype is not needed anymore

    Kirill Smelkov authored committed
    We moved all action-taking code below show_path() in recent HEAD~~
    (tree-diff: move all action-taking code out of compare_tree_entry).
    
    Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
  3. tree-diff: rename compare_tree_entry -> tree_entry_pathcmp

    Kirill Smelkov authored committed
    Since previous commit, this function does not compare entry hashes, and
    mode are compared fully outside of it. So what it does is compare entry
    names and DIR bit in modes. Reflect this in its name.
    
    Add documentation stating the semantics, and move the note about
    files/dirs comparison to it.
    
    Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
  4. tree-diff: move all action-taking code out of compare_tree_entry()

    Kirill Smelkov authored committed
    - let it do only comparison.
    
    This way the code is cleaner and more structured - cmp function only
    compares, and the driver takes action based on comparison result.
    
    There should be no change in performance, as effectively, we just move
    if series from on place into another, and merge it to was-already-there
    same switch/if, so the result is maybe a little bit faster.
    
    Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
  5. tree-diff: don't assume compare_tree_entry() returns -1,0,1

    Kirill Smelkov authored committed
    It does, but we'll be reworking it in the next patch after it won't, and
    besides it is better to stick to standard
    strcmp/memcmp/base_name_compare/etc... convention, where comparison
    function returns <0, =0, >0
    
    Regarding performance, comparing for <0, =0, >0 should be a little bit
    faster, than switch, because it is just 1 test-without-immediate
    instruction and then up to 3 conditional branches, and in switch you
    have up to 3 tests with immediate and up to 3 conditional branches.
    
    No worry, that update_tree_entry(t2) is duplicated for =0 and >0 - it
    will be good after we'll be adding support for multiparent walker and
    will stay that way.
    
    =0 case goes first, because it happens more often in real diffs - i.e.
    paths are the same.
    
    Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
  6. tree-diff: consolidate code for emitting diffs and recursion in one p…

    Kirill Smelkov authored committed
    …lace
    
    Currently both compare_tree_entry() and show_entry() invoke opt diff
    callbacks (opt->add_remove() and opt->change()), and also they both have
    code which decides whether to recurse into sub-tree, and whether to emit
    a tree as separate entry if DIFF_OPT_TREE_IN_RECURSIVE is set.
    
    I.e. we have code duplication and logic scattered on two places.
    
    Let's consolidate it - all diff emiting code and recurion logic moves
    to show_entry, which is now named as show_path, because it shows diff
    for a path, based on up to two tree entries, with actual diff emitting
    code being kept in new helper emit_diff() for clarity.
    
    What we have as the result, is that compare_tree_entry is now free from
    code with logic for diff generation, and also performance is not
    affected as timings for
    
        `git log --raw --no-abbrev --no-renames`
    
    for navy.git and `linux.git v3.10..v3.11`, just like in previous patch,
    stay the same.
    
    Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits on Mar 4, 2014
  1. tree-diff: show_tree() is not needed

    Kirill Smelkov authored committed
    We don't need special code for showing added/removed subtree, because we
    can do the same via diff_tree_sha1, just passing NULL for absent tree.
    
    And compared to show_tree(), which was calling show_entry() for every
    tree entry, that would lead to the same show_entry() callings:
    
        show_tree(t):
            for e in t.entries:
                show_entry(e)
    
        diff_tree_sha1(NULL, new):  /* the same applies to (old, NULL) */
            diff_tree(t1=NULL, t2)
                ...
                if (!t1->size)
                    show_entry(t2)
                ...
    
    and possible overhead is negligible, since after the patch, timing for
    
        `git log --raw --no-abbrev --no-renames`
    
    for navy.git and `linux.git v3.10..v3.11` is practically the same.
    
    So let's say goodbye to show_tree() - it removes some code, but also,
    and what is important, consolidates more code for showing/recursing into
    trees into one place.
    
    Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits on Feb 24, 2014
  1. tree-diff: no need to pass match to skip_uninteresting()

    Kirill Smelkov authored committed
    It is neither used there as input, nor the output written through it, is
    used outside.
    
    Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
  2. tree-diff: no need to manually verify that there is no mode change fo…

    Kirill Smelkov authored committed
    …r a path
    
    Because if there is, such two tree entries would never be compared as
    equal - the code in base_name_compare() explicitly compares modes, if
    there is a change for dir bit, even for equal paths, entries would
    compare as different.
    
    The code I'm removing here is from 2005 April 262e82b (Fix diff-tree
    recursion), which pre-dates base_name_compare() introduction in 958ba6c
    (Introduce "base_name_compare()" helper function) by a month.
    
    Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits on Feb 5, 2014
  1. tree-diff: convert diff_root_tree_sha1() to just call diff_tree_sha1 …

    Kirill Smelkov authored committed
    …with old=NULL
    
    Now since diff_tree_sha1 understands NULL for both old and new, we could
    indicate an empty tree for root commit by providing just NULL for old
    sha1.
    
    Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
  2. tree-diff: allow diff_tree_sha1 to accept NULL sha1

    Kirill Smelkov authored committed
    which would mean that corresponding tree - old or new - is empty.
    
    As followup patches will show, that functionality was already needed in
    several places of Git codebase, but there, we were preparing empty
    tree_desc objects by hand, with some code duplication.
    
    For handling sha1 = NULL case, let's reuse fill_tree_descriptor() which
    returns just empty tree_desc in that case.
    
    Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits on Oct 28, 2013
  1. Nguyễn Thái Ngọc Duy

    pathspec: stop --*-pathspecs impact on internal parse_pathspec() uses

    pclouds authored committed
    Normally parse_pathspec() is used on command line arguments where it
    can do fancy thing like parsing magic on each argument or adding magic
    for all pathspecs based on --*-pathspecs options.
    
    There's another use of parse_pathspec(), where pathspec is needed, but
    the input is known to be pure paths. In this case we usually don't
    want --*-pathspecs to interfere. And we definitely do not want to
    parse magic in these paths, regardless of --literal-pathspecs.
    
    Add new flag PATHSPEC_LITERAL_PATH for this purpose. When it's set,
    --*-pathspecs are ignored, no magic is parsed. And if the caller
    allows PATHSPEC_LITERAL (i.e. the next calls can take literal magic),
    then PATHSPEC_LITERAL will be set.
    
    This fixes cases where git chokes when GIT_*_PATHSPECS are set because
    parse_pathspec() indicates it won't take any magic. But
    GIT_*_PATHSPECS add them anyway. These are
    
       export GIT_LITERAL_PATHSPECS=1
       git blame -- something
       git log --follow something
       git log --merge
    
    "git ls-files --with-tree=path" (aka parse_pathspec() in
    overlay_tree_on_cache()) is safe because the input is empty, and
    producing one pathspec due to PATHSPEC_PREFER_CWD does not take any
    magic into account.
    
    Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
    Acked-by: Jeff King <peff@peff.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits on Jul 15, 2013
  1. Nguyễn Thái Ngọc Duy

    pathspec: support :(literal) syntax for noglob pathspec

    pclouds authored committed
    Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
  2. Nguyễn Thái Ngọc Duy

    tree-diff: remove the use of pathspec's raw[] in follow-rename codepath

    pclouds authored committed
    Put a checkpoint to guard unsupported pathspec features in future.
    
    Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
  3. Nguyễn Thái Ngọc Duy

    remove init_pathspec() in favor of parse_pathspec()

    pclouds authored committed
    While at there, move free_pathspec() to pathspec.c
    
    Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
  4. Nguyễn Thái Ngọc Duy

    remove diff_tree_{setup,release}_paths

    pclouds authored committed
    Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
  5. Nguyễn Thái Ngọc Duy

    guard against new pathspec magic in pathspec matching code

    pclouds authored committed
    GUARD_PATHSPEC() marks pathspec-sensitive code, basically all those
    that touch anything in 'struct pathspec' except fields "nr" and
    "original". GUARD_PATHSPEC() is not supposed to fail. It's mainly to
    help the designers catch unsupported codepaths.
    
    Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
  6. Nguyễn Thái Ngọc Duy

    parse_pathspec: add special flag for max_depth feature

    pclouds authored committed
    match_pathspec_depth() and tree_entry_interesting() check max_depth
    field in order to support "git grep --max-depth". The feature
    activation is tied to "recursive" field, which led to some unwanted
    activation, e.g. 5c8eeb8 (diff-index: enable recursive pathspec
    matching in unpack_trees - 2012-01-15).
    
    This patch decouples the activation from "recursive" field, puts it in
    "magic" field instead. This makes sure that only "git grep" can
    activate this feature. And because parse_pathspec knows when the
    feature is not used, it does not need to sort pathspec (required for
    max_depth to work correctly). A small win for non-grep cases.
    
    Even though a new magic flag is introduced, no magic syntax is. The
    magic can be only enabled by parse_pathspec() caller. We might someday
    want to support ":(maxdepth:10)src." It all depends on actual use
    cases.
    
    max_depth feature cannot be enabled via init_pathspec() anymore. But
    that's ok because init_pathspec() is on its way to /dev/null.
    
    Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits on Aug 27, 2012
  1. Merge branch 'jk/maint-null-in-trees'

    authored
    We do not want a link to 0{40} object stored anywhere in our objects.
    
    * jk/maint-null-in-trees:
      fsck: detect null sha1 in tree entries
      do not write null sha1s to on-disk index
      diff: do not use null sha1 as a sentinel value
Commits on Aug 3, 2012
  1. trast

    diff_setup_done(): return void

    trast authored committed
    diff_setup_done() has historically returned an error code, but lost
    the last nonzero return in 943d5b7 (allow diff.renamelimit to be set
    regardless of -M/-C, 2006-08-09).  The callers were in a pretty
    confused state: some actually checked for the return code, and some
    did not.
    
    Let it return void, and patch all callers to take this into account.
    This conveniently also gets rid of a handful of different(!) error
    messages that could never be triggered anyway.
    
    Note that the function can still die().
    
    Signed-off-by: Thomas Rast <trast@student.ethz.ch>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits on Jul 29, 2012
  1. Jeff King

    diff: do not use null sha1 as a sentinel value

    peff authored committed
    The diff code represents paths using the diff_filespec
    struct. This struct has a sha1 to represent the sha1 of the
    content at that path, as well as a sha1_valid member which
    indicates whether its sha1 field is actually useful. If
    sha1_valid is not true, then the filespec represents a
    working tree file (e.g., for the no-index case, or for when
    the index is not up-to-date).
    
    The diff_filespec is only used internally, though. At the
    interfaces to the diff subsystem, callers feed the sha1
    directly, and we create a diff_filespec from it. It's at
    that point that we look at the sha1 and decide whether it is
    valid or not; callers may pass the null sha1 as a sentinel
    value to indicate that it is not.
    
    We should not typically see the null sha1 coming from any
    other source (e.g., in the index itself, or from a tree).
    However, a corrupt tree might have a null sha1, which would
    cause "diff --patch" to accidentally diff the working tree
    version of a file instead of treating it as a blob.
    
    This patch extends the edges of the diff interface to accept
    a "sha1_valid" flag whenever we accept a sha1, and to use
    that flag when creating a filespec. In some cases, this
    means passing the flag through several layers, making the
    code change larger than would be desirable.
    
    One alternative would be to simply die() upon seeing
    corrupted trees with null sha1s. However, this fix more
    directly addresses the problem (while bogus sha1s in a tree
    are probably a bad thing, it is really the sentinel
    confusion sending us down the wrong code path that is what
    makes it devastating). And it means that git is more capable
    of examining and debugging these corrupted trees. For
    example, you can still "diff --raw" such a tree to find out
    when the bogus entry was introduced; you just cannot do a
    "--patch" diff (just as you could not with any other
    corrupted tree, as we do not have any content to diff).
    
    Signed-off-by: Jeff King <peff@peff.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits on Dec 16, 2011
  1. Jeff King

    use custom rename score during --follow

    peff authored committed
    If you provide a custom rename score on the command line,
    like:
    
      git log -M50 --follow foo.c
    
    it is completely ignored, and there is no way to --follow
    with a looser rename score. Instead, let's use the same
    rename score that will be used for generating diffs. This is
    convenient, and mirrors what we do with the break-score.
    
    You can see an example of it being useful in git.git:
    
      $ git log --oneline --summary --follow \
    	    Documentation/technical/api-string-list.txt
      86d4b52 string-list: Add API to remove an item from an unsorted list
      1d2f80f string_list: Fix argument order for string_list_append
      e242148 string-list: add unsorted_string_list_lookup()
      0dda1d1 Fix two leftovers from path_list->string_list
      c455c87 Rename path_list to string_list
       create mode 100644 Documentation/technical/api-string-list.txt
    
      $ git log --oneline --summary -M40 --follow \
    	  Documentation/technical/api-string-list.txt
      86d4b52 string-list: Add API to remove an item from an unsorted list
      1d2f80f string_list: Fix argument order for string_list_append
      e242148 string-list: add unsorted_string_list_lookup()
      0dda1d1 Fix two leftovers from path_list->string_list
      c455c87 Rename path_list to string_list
       rename Documentation/technical/{api-path-list.txt => api-string-list.txt} (47%)
      328a475 path-list documentation: document all functions and data structures
      530e741 Start preparing the API documents.
       create mode 100644 Documentation/technical/api-path-list.txt
    
    You could have two separate rename scores, one for following
    and one for diff. But almost nobody is going to want that,
    and it would just be unnecessarily confusing. Besides which,
    we re-use the diff results from try_to_follow_renames for
    the actual diff output, which means having them as separate
    scores is actively wrong. E.g., with the current code, you
    get:
    
      $ git log --oneline --diff-filter=R --name-status \
                -M90 --follow git.spec.in
      27dedf0 GIT 0.99.9j aka 1.0rc3
      R084    git-core.spec.in        git.spec.in
      f85639c Rename the RPM from "git" to "git-core"
      R098    git.spec.in     git-core.spec.in
    
    The first one should not be considered a rename by the -M
    score we gave, but we print it anyway, since we blindly
    re-use the diff information from the follow (which uses the
    default score). So this could also be considered simply a
    bug-fix, as with the current code "-M" is completely ignored
    when using "--follow".
    
    Signed-off-by: Jeff King <peff@peff.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits on Oct 27, 2011
  1. Nguyễn Thái Ngọc Duy

    tree_entry_interesting(): give meaningful names to return values

    pclouds authored committed
    It is a basic code hygiene to avoid magic constants that are unnamed.
    Besides, this helps extending the value later on for "interesting, but
    cannot decide if the entry truely matches yet" (ie. prefix matches)
    
    Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
  2. Nguyễn Thái Ngọc Duy

    tree-walk.c: do not leak internal structure in tree_entry_len()

    pclouds authored committed
    tree_entry_len() does not simply take two random arguments and return
    a tree length. The two pointers must point to a tree item structure,
    or struct name_entry. Passing random pointers will return incorrect
    value.
    
    Force callers to pass struct name_entry instead of two pointers (with
    hope that they don't manually construct struct name_entry themselves)
    
    Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits on Jun 6, 2011
  1. Merge branch 'jk/diff-not-so-quick'

    authored
    * jk/diff-not-so-quick:
      diff: futureproof "stop feeding the backend early" logic
      diff_tree: disable QUICK optimization with diff filter
    
    Conflicts:
    	diff.c
Commits on May 31, 2011
  1. diff: futureproof "stop feeding the backend early" logic

    authored
    Refactor the "do not stop feeding the backend early" logic into a small
    helper function and use it in both run_diff_files() and diff_tree() that
    has the stop-early optimization. We may later add other types of diffcore
    transformation that require to look at the whole result like diff-filter
    does, and having the logic in a single place is essential for longer term
    maintainability.
    
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
  2. Jeff King

    diff_tree: disable QUICK optimization with diff filter

    peff authored committed
    We stop looking for changes early with QUICK, so our diff
    queue contains only a subset of the changes. However, we
    don't apply diff filters until later; it will appear at that
    point as though there are no changes matching our filter,
    when in reality we simply didn't keep looking for changes
    long enough.
    
    Commit 2cfe8a6 (diff --quiet: disable optimization when
    --diff-filter=X is used, 2011-03-16) fixes this in some
    cases by disabling the optimization when a filter is
    present. However, it only tweaked run_diff_files, missing
    the similar case in diff_tree. Thus the fix worked only for
    diffing the working tree and index, but not between trees.
    
    Noticed by Yasushi SHOJI.
    
    Signed-off-by: Jeff King <peff@peff.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits on May 6, 2011
  1. Merge branch 'nd/struct-pathspec'

    authored
    * nd/struct-pathspec:
      pathspec: rename per-item field has_wildcard to use_wildcard
      Improve tree_entry_interesting() handling code
      Convert read_tree{,_recursive} to support struct pathspec
      Reimplement read_tree_recursive() using tree_entry_interesting()
Commits on Mar 25, 2011
  1. Nguyễn Thái Ngọc Duy

    Improve tree_entry_interesting() handling code

    pclouds authored committed
    t_e_i() can return -1 or 2 to early shortcut a search. Current code
    may use up to two variables to handle it. One for saving return value
    from t_e_i temporarily, one for saving return code 2.
    
    The second variable is not needed. If we make sure the first variable
    does not change until the next t_e_i() call, then we can do something
    like this:
    
    int ret = 0;
    
    while (...) {
    	if (ret != 2) {
    		ret = t_e_i();
    		if (ret < 0) /* no longer interesting */
    			break;
    		if (ret == 0) /* skip this round */
    			continue;
    	}
    	/* ret > 0, interesting */
    }
    
    Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
Something went wrong with that request. Please try again.