
Commit 2bbc86f

merge-ort: fix corner case in recursive submodule/directory conflict handling
At GitHub, a few repositories were triggering errors of the form:

    git: merge-ort.c:3037: process_renames: Assertion `newinfo && !newinfo->merged.clean' failed.
    Aborted (core dumped)

While these may look similar to both a562d90 (merge-ort: fix failing
merges in special corner case, 2025-11-03) and f6ecb60 (merge-ort: fix
directory rename on top of source of other rename/delete, 2025-08-06),
the cause is different and in this case the problem is not an
over-conservative assertion, but a bug before the assertion where we
did not update all relevant state appropriately.

It sadly took me a really long time to figure out how to get a simple
reproducer for this one. It doesn't really have that many moving
parts, but there are multiple pieces of background information needed
to understand it.

First of all, when we have two files added at the same path, merge-ort
does a two-way merge of those files. If we have two directories added
at the same path, we basically do the same thing (taking the union of
files, and two-way merging files with the same name). But two-way
merging requires components of the same type. We can't merge the
contents of a regular file with a directory, or with a symlink, or
with a submodule. Nor can any of those other types be merged with each
other, e.g. merging a submodule with a directory is a bad idea. When
two paths have the same name but their types do not match, merge-ort
is forced to move one of them to an alternate filename (using the
unique_path() function).

Second, if two commits being merged have more than one merge-base,
merge-ort will merge the merge-bases to create a virtual merge-base,
and use that as the base commit.

Third, one of the really important optimizations in merge-ort is
trivial tree-level resolution (roughly meaning merging trees without
recursing into them). This optimization has some nuance to it that is
important to the current bug, and to understand it, it helps to first
look at the high-level overview of how merge-ort runs; there are
basically three high-level functions that the work is divided between:

    collect_merge_info() - walks the top-level trees getting individual
                           paths of interest
    detect_renames()     - detect renames between paths in order to
                           match up paths for three-way merging
    process_entries()    - does a few things of interest:
                           * three-way merging of files,
                           * other special handling (e.g. adjusting
                             paths with conflicting types to avoid
                             path collisions)
                           * as it finishes handling all the files
                             within a subdirectory, writes out a new
                             tree object for that directory

If it were not for renames, we could just always do tree-level merging
whenever the tree on at least one side was unmodified. Unfortunately,
we need to recurse into trees to determine whether there are renames.
However, we can also do tree-level merging so long as there aren't any
*relevant* renames (another merge-ort optimization), which we can
determine without recursing into trees.

We would also be able to do tree-level merging if we somehow a priori
knew what renames existed, by only recursing into the trees which we
could otherwise trivially merge if they contained files involved in
renames. That might not seem useful, because we need to find out the
renames and we have to recurse into trees to do so, but when you find
out that the process_entries() step is more computationally expensive
than the collect_merge_info() step, it yields an interesting strategy:

    * run collect_merge_info()
    * run detect_renames()
    * cache the renames
    * restart -- rerun collect_merge_info(), using the cached renames
      to only recurse into the needed trees
    * we already have the renames cached so no need to re-detect
    * run process_entries() on the reduced list of paths

which was implemented back in 7bee6c1 (merge-ort: avoid recursing into
directories when we don't need to, 2021-07-16). Crucially, this
restarting only occurs if the number of paths we could skip recursing
into exceeds the number we still need to recurse into by some safety
factor (wanted_factor in handle_deferred_entries()); forgetting this
fact is a great way to repeatedly fail to create a minimal testcase
for several days and go down alternate wrong paths.

Now, I earlier summarized this optimization as "merging trees without
recursing into them", but this optimization does not require that all
three sides of history have a directory at a given path. So long as
the tree on one side matches the tree in the base version, we can
decide to resolve in favor of whatever the other side of history has
at that path -- be it a directory, a file, a submodule, or a symlink.
Unfortunately, the code in question didn't fully realize this, and was
written assuming the base version and both sides would have a
directory at the given path, as can be seen by the
"ci->filemask == 0" comment in resolve_trivial_directory_merge() that
was added as part of 7bee6c1 (merge-ort: avoid recursing into
directories when we don't need to, 2021-07-16). A few additional lines
of code are needed to handle cases where we have something other than
a directory on the other side of history.
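
The restart condition described above can be sketched as a small predicate. This is only an illustrative model: the function name, the parameter names, and the exact comparison are assumptions made here, and the real check lives in handle_deferred_entries() and may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the "cache renames and restart" decision.
 *
 *   skippable: paths under trees we would no longer need to recurse into
 *   needed:    paths we must still visit on the second pass
 *   factor:    safety factor (plays the role of wanted_factor; the value
 *              is chosen by the caller here, not taken from merge-ort.c)
 */
static bool worth_restarting(unsigned skippable, unsigned needed,
			     unsigned factor)
{
	assert(factor > 0);

	/* Nothing to skip means a restart cannot save any work. */
	if (!skippable)
		return false;

	/* Widen before multiplying so large path counts cannot overflow. */
	return (unsigned long long)skippable >=
	       (unsigned long long)factor * needed;
}
```

With an assumed factor of 3, worth_restarting(5, 1, 3) is true while worth_restarting(2, 1, 3) is false, which illustrates why a directory holding too few files never reaches the restart path and therefore never exercises the bug.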
But, knowing that resolve_trivial_directory_merge() doesn't have
sufficient state updating logic doesn't show us how to trigger a bug
without combining with the other bits of information we provided
above. Here's a relevant testcase:

    * branches A & B
    * commit A1: adds "folder" as a directory with files tracked under
      it
    * commit B1: adds "folder" as a submodule
    * commit A2: merges B1 into A1, keeping "folder" as a directory
      (and in fact, with no changes to "folder" since A1), discarding
      the submodule
    * commit B2: merges A1 into B1, keeping "folder" as a submodule
      (and in fact, with no changes to "folder" since B1), discarding
      the directory

Here, if we try to merge A2 & B2, the logic proceeds as follows:

    * we have multiple merge-bases: A1 & B1. So we have to merge those
      to get a virtual merge base.
    * due to "folder" as a directory and "folder" as a submodule, the
      path collision logic triggers and renames "folder" as a
      submodule to "folder~Temporary merge branch 2" so we can keep it
      alongside "folder" as a directory.
    * we now have a virtual merge base (containing both "folder"
      directory and a "folder~Temporary merge branch 2" submodule) and
      can now do the outer merge
    * in the first step of the outer merge, we attempt to defer
      recursing into folder/ as a directory, but find we need to for
      rename detection.
    * in rename detection, we note that
      "folder~Temporary merge branch 2" has the same hash as "folder"
      as a submodule in B2, which means we have an exact rename.
    * after rename detection, we discover no path in folder/ is needed
      for renames, and so we can cache renames and restart.
    * after restarting, we avoid recursing into "folder/" and realize
      we can resolve it trivially since it hasn't been modified. The
      resolution removes "folder/", leaving us only "folder" as a
      submodule from commit B2.
    * after this point, we should have a rename/delete conflict on
      "folder~Temporary merge branch 2" -> "folder", but our marking
      of the merge of "folder" as clean broke our ability to handle
      that and in fact triggers an assertion in process_renames().

When there was a df_conflict (directory/"file" conflict, where "file"
could be submodule or regular file or symlink), ensure
resolve_trivial_directory_merge() handles it properly. In particular:

    * do not preemptively mark the path as cleanly merged if the
      remaining path is a file; allow it to be processed in
      process_entries() later to determine if it was clean
    * clear the parts of dirmask or filemask corresponding to the
      matching sides of history, since we are resolving those away
    * clear the df_conflict bit afterwards; since we cleared away the
      two matching sides and only have one side left, that one side
      can't have a directory/file conflict with itself.

Also add the above minimal testcase showcasing this bug to t6422,
**with a sufficient number of paths under the folder/ directory to
actually trigger it**. (I wish I could have all those days back from
all the wrong paths I went down due to not having enough files under
that directory...)

I know this commit has a very high ratio of lines in the commit
message to lines of comments, and a relatively high ratio of comments
to actual code, but given how long it took me to track down, on the
off chance that we ever need to further modify this logic, I wanted it
thoroughly documented for future me and for whatever other poor soul
might end up needing to read this commit message.

Signed-off-by: Elijah Newren <newren@gmail.com>
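
The alternate filename in the walkthrough, "folder~Temporary merge branch 2", comes from the path collision handling. The sketch below shows one way such a name could be built; it is not the unique_path() implementation. The helper names, the flat array standing in for the set of known paths, the flattening of '/' in the branch name, and the numeric suffix on collision are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the merge machinery's set of known paths. */
static int path_exists(const char *const *paths, size_t n,
		       const char *candidate)
{
	for (size_t i = 0; i < n; i++)
		if (!strcmp(paths[i], candidate))
			return 1;
	return 0;
}

/*
 * Write "<path>~<branch>" into out, replacing '/' in the branch part with
 * '_' so the result stays a single path component. If that name is already
 * taken, try "<path>~<branch>_0", "_1", and so on.
 * Returns 0 on success, -1 if out is too small.
 */
static int alternate_path(char *out, size_t outlen,
			  const char *const *paths, size_t n,
			  const char *path, const char *branch)
{
	int written;
	size_t base_len;

	assert(out && path && branch);

	written = snprintf(out, outlen, "%s~%s", path, branch);
	if (written < 0 || (size_t)written >= outlen)
		return -1;
	base_len = (size_t)written;

	/* Only the branch part (after "<path>~") is flattened. */
	for (char *p = out + strlen(path) + 1; *p; p++)
		if (*p == '/')
			*p = '_';

	/* Overwrite the suffix in place until the name is unused. */
	for (int suffix = 0; path_exists(paths, n, out); suffix++) {
		written = snprintf(out + base_len, outlen - base_len,
				   "_%d", suffix);
		if (written < 0 || (size_t)written >= outlen - base_len)
			return -1;
	}
	return 0;
}
```

Given a known-path set containing only "folder", calling alternate_path() with path "folder" and branch "Temporary merge branch 2" yields "folder~Temporary merge branch 2", the name that later becomes the source of the exact rename in the outer merge.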
1 parent b31ab93 commit 2bbc86f

2 files changed, 120 insertions(+), 1 deletion(-)

merge-ort.c

Lines changed: 34 additions & 1 deletion
@@ -1502,11 +1502,44 @@ static void resolve_trivial_directory_merge(struct conflict_info *ci, int side)
 	VERIFY_CI(ci);
 	assert((side == 1 && ci->match_mask == 5) ||
 	       (side == 2 && ci->match_mask == 3));
+
+	/*
+	 * Since ci->stages[0] matches ci->stages[3-side], resolve merge in
+	 * favor of ci->stages[side].
+	 */
 	oidcpy(&ci->merged.result.oid, &ci->stages[side].oid);
 	ci->merged.result.mode = ci->stages[side].mode;
 	ci->merged.is_null = is_null_oid(&ci->stages[side].oid);
+
+	/*
+	 * Because we resolved in favor of "side", we are no longer
+	 * considering the paths which matched (i.e. had the same hash) any
+	 * more.  Strip the matching paths from both dirmask & filemask.
+	 * Another consequence of merging in favor of side is that we can no
+	 * longer have a directory/file conflict either..but there's a slight
+	 * nuance we consider before clearing it.
+	 *
+	 * In most cases, resolving in favor of the other side means there's
+	 * no conflict at all, but if we had a directory/file conflict to
+	 * start, and the directory is resolved away, the remaining file could
+	 * still be part of a rename.  If the remaining file is part of a
+	 * rename, then it may also be part of a rename conflict (e.g.
+	 * rename/delete or rename/rename(1to2)), so we can't
+	 * mark it as a clean merge if we started with a directory/file
+	 * conflict and still have a file left.
+	 *
+	 * In contrast, if we started with a directory/file conflict and
+	 * still have a directory left, no file under that directory can be
+	 * part of a rename, otherwise we would have had to recurse into the
+	 * directory and would have never ended up within
+	 * resolve_trivial_directory_merge() for that directory.
+	 */
+	ci->dirmask &= (~ci->match_mask);
+	ci->filemask &= (~ci->match_mask);
+	assert(!ci->filemask || !ci->dirmask);
 	ci->match_mask = 0;
-	ci->merged.clean = 1; /* (ci->filemask == 0); */
+	ci->merged.clean = !ci->df_conflict || ci->dirmask;
+	ci->df_conflict = 0;
 }
 
 static int handle_deferred_entries(struct merge_options *opt,
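
The state update in this hunk can be traced on the failing case with a small standalone model. The struct below is a hypothetical, trimmed-down stand-in for struct conflict_info that carries only the fields the hunk touches; the bit layout (bit 0 = base, bit 1 = side 1, bit 2 = side 2) is inferred from the existing assertion on match_mask, and the concrete mask values are an assumption based on the walkthrough in the commit message.

```c
#include <assert.h>

/* Hypothetical stand-in for the conflict_info fields used by the hunk. */
struct mask_state {
	unsigned dirmask;     /* bit 0 = base, bit 1 = side 1, bit 2 = side 2 */
	unsigned filemask;    /* same bit layout, for non-directory entries */
	unsigned match_mask;  /* which of the three versions are identical */
	unsigned df_conflict; /* directory/file conflict seen at this path */
	unsigned clean;       /* models ci->merged.clean */
};

/* Mirrors only the mask and flag updates of the patched function. */
static void resolve_trivial(struct mask_state *s, int side)
{
	assert((side == 1 && s->match_mask == 5) ||
	       (side == 2 && s->match_mask == 3));

	/* Drop the two versions that matched each other. */
	s->dirmask &= ~s->match_mask;
	s->filemask &= ~s->match_mask;
	assert(!s->filemask || !s->dirmask);
	s->match_mask = 0;

	/* Clean unless a directory/file conflict leaves a file behind. */
	s->clean = !s->df_conflict || s->dirmask;
	s->df_conflict = 0;
}
```

For the walkthrough case (base and side 1 hold "folder" as a directory, side 2 holds it as a submodule) the input is dirmask 3, filemask 4, match_mask 3, df_conflict 1, resolved with side 2. Afterwards dirmask is 0, filemask is 4, and clean is 0, so the path is left for process_entries() rather than being marked clean as the old code did. For a plain unmodified directory (dirmask 7, filemask 0, match_mask 3, no df_conflict) the result is still clean.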

t/t6422-merge-rename-corner-cases.sh

Lines changed: 86 additions & 0 deletions
@@ -1439,4 +1439,90 @@ test_expect_success 'rename/rename(1to2) with a binary file' '
 	)
 '
 
+# Testcase preliminary submodule/directory conflict and submodule rename
+#   Commit O: <empty, or additional irrelevant stuff>
+#   Commit A1: introduce "folder" (as a tree)
+#   Commit B1: introduce "folder" (as a submodule)
+#   Commit A2: merge B1 into A1, but keep folder as a tree
+#   Commit B2: merge A1 into B1, but keep folder as a submodule
+#   Merge A2 & B2
+test_setup_submodule_directory_preliminary_conflict () {
+	git init submodule_directory_preliminary_conflict &&
+	(
+		cd submodule_directory_preliminary_conflict &&
+
+		# Trying to do the A2 and B2 merges above is slightly more
+		# challenging with a local submodule (because checking out
+		# another commit has the submodule in the way).  Instead,
+		# first create the commits with the wrong parents but right
+		# trees, in the order A1, A2, B1, B2...
+		#
+		# Then go back and create new A2 & B2 with the correct
+		# parents and the same trees.
+
+		git commit --allow-empty -m orig &&
+
+		git branch A &&
+		git branch B &&
+
+		git checkout B &&
+		mkdir folder &&
+		echo A>folder/A &&
+		echo B>folder/B &&
+		echo C>folder/C &&
+		echo D>folder/D &&
+		echo E>folder/E &&
+		git add folder &&
+		git commit -m B1 &&
+
+		git commit --allow-empty -m B2 &&
+
+		git checkout A &&
+		git init folder &&
+		(
+			cd folder &&
+			>Z &&
+			>Y &&
+			git add Z Y &&
+			git commit -m "original submodule commit"
+		) &&
+		git add folder &&
+		git commit -m A1 &&
+
+		git commit --allow-empty -m A2 &&
+
+		NewA2=$(git commit-tree -p A^ -p B^ -m "Merge B into A" A^{tree}) &&
+		NewB2=$(git commit-tree -p B^ -p A^ -m "Merge A into B" B^{tree}) &&
+		git update-ref refs/heads/A $NewA2 &&
+		git update-ref refs/heads/B $NewB2
+	)
+}
+
+test_expect_success 'submodule/directory preliminary conflict' '
+	test_setup_submodule_directory_preliminary_conflict &&
+	(
+		cd submodule_directory_preliminary_conflict &&
+
+		git checkout A^0 &&
+
+		test_expect_code 1 git merge B^0 &&
+
+		# Make sure the index has the right number of entries
+		git ls-files -s >actual &&
+		test_line_count = 2 actual &&
+
+		# The "folder" as directory should have been resolved away
+		# as part of the merge.  The "folder" as submodule got
+		# renamed to "folder~Temporary merge branch 2" in the
+		# virtual merge base, resulting in a
+		#    "folder~Temporary merge branch 2" -> "folder"
+		# rename in the outermerge for the submodule, which then
+		# becomes part of a rename/delete conflict (because "folder"
+		# as a submodule was deleted in A2).
+		submod=$(git rev-parse A:folder) &&
+		printf "160000 $submod 1\tfolder\n160000 $submod 2\tfolder\n" >expect &&
+		test_cmp expect actual
+	)
+'
+
 test_done
