subtree: Fix handling of complex history #493

Open. Wants to merge 7 commits into base: maint
183 changes: 151 additions & 32 deletions contrib/subtree/git-subtree.sh
@@ -9,12 +9,15 @@ then
set -- -h

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Tom,

On Tue, 6 Oct 2020, Tom Clarkson via GitGitGadget wrote:

> @@ -796,20 +810,60 @@ cmd_add_commit () {
>  }
>
>  cmd_map () {
> -	oldrev="$1"
> -	newrev="$2"
>
> -	if test -z "$oldrev"
> +	if test -z "$1"

I'd like to keep the nice name. Maybe if it is `local`, there is no longer
a need to replace `$oldrev` by `$1`?

>  	then
>  		die "You must provide a revision to map"
>  	fi
>
> +	oldrev=$(git rev-parse --revs-only "$1") || exit $?
> +	newrev=
> +
> +	if test -n "$2"
> +	then
> +		newrev=$(git rev-parse --revs-only "$2") || exit $?
> +	fi
> +

Would it not make more sense to validate the parameters before calling
`cmd_map`?

In any case, this strikes me as a subject for a separate commit.

Thanks,
Dscho

P.S.: I'll have to stop reviewing here for the moment, not sure whether
I'll come back to it later today or maybe tomorrow.


>  	cache_setup || exit $?
>  	cache_set "$oldrev" "$newrev"
>
>  	say "Mapped $oldrev => $newrev"
>  }
>
> +cmd_ignore () {
> +	revs=$(git rev-parse $default --revs-only "$@") || exit $?
> +	ensure_single_rev $revs
> +
> +	say "Ignoring $revs"
> +
> +	cache_setup || exit $?
> +
> +	git rev-list $revs |
> +	while read rev
> +	do
> +		cache_set "$rev" ""
> +	done
> +
> +	echo "$revs" >>"$cachedir/processed"
> +}
> +
> +cmd_use () {
> +	revs=$(git rev-parse $default --revs-only "$@") || exit $?
> +	ensure_single_rev $revs
> +
> +	say "Using existing subtree $revs"
> +
> +	cache_setup || exit $?
> +
> +	git rev-list $revs |
> +	while read rev
> +	do
> +		cache_set "$rev" "$rev"
> +	done
> +
> +	echo "$revs" >>"$cachedir/processed"
> +}
> +
>  cmd_split () {
>  	debug "Splitting $dir..."
>  	cache_setup || exit $?
> @@ -827,7 +881,7 @@ cmd_split () {
>  		done
>  	fi
>
> -	unrevs="$(find_existing_splits "$dir" "$revs")"
> +	unrevs="$(find_existing_splits "$dir" "$revs") $(exclude_processed_refs)"
>
>  	mainline="$(find_mainline_ref "$dir" "$revs")"
>  	if test -n "$mainline"
> --
> gitgitgadget
>
>
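
The `local` suggestion above can be illustrated with a minimal sketch. It assumes a shell that supports `local` (not POSIX, but available in dash, bash and busybox ash, and already used elsewhere in `git-subtree.sh`). The function names `map_with_local` and `map_without_local` are hypothetical stand-ins for `cmd_map`, not code from the patch.

```shell
# Sketch only: shows why keeping the names $oldrev/$newrev is safe once they
# are declared `local`. Nothing here calls git.

oldrev="callers-value"

map_without_local () {
	# Assigning a global here overwrites the caller's $oldrev.
	oldrev="$1"
}

map_with_local () {
	# The nice names stay, but are scoped to this function.
	local oldrev newrev
	oldrev="$1"
	newrev="$2"
	echo "Mapped $oldrev => $newrev"
}

map_with_local aaa bbb
echo "after map_with_local: $oldrev"

map_without_local ccc
echo "after map_without_local: $oldrev"
```

After `map_with_local` the caller's `$oldrev` is unchanged, whereas `map_without_local` leaves it set to its argument, which is the kind of clobbering the review comments are trying to avoid.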


On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi,

On Tue, 6 Oct 2020, Tom Clarkson via GitGitGadget wrote:

> From: Tom Clarkson <tom@tqclarkson.com>
>
> Prevent a mainline commit without $dir being treated as a subtree
> commit and pulling in the entire mainline history. Any valid subtree
> commit will have only valid subtree commits as parents, which will be
> unchanged by check_parents.

I feel like this is only half the picture because I have a hard time
stitching these two sentences together.

After studying the code and your patch a bit, it appears to me that
`process_split_commit()` calls `check_parents()` first, which will call
`process_split_commit()` for all as yet unmapped parents. So basically, it
recurses until it finds a commit all of whose parents are already mapped,
then propagates that information all the way back.

Doesn't this risk serious issues, such as stack overflows, for long
commit histories?

> Signed-off-by: Tom Clarkson <tom@tqclarkson.com>
> ---
>  contrib/subtree/git-subtree.sh | 24 +++++++++++-------------
>  1 file changed, 11 insertions(+), 13 deletions(-)
>
> diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
> index e56621a986..fa6293b372 100755
> --- a/contrib/subtree/git-subtree.sh
> +++ b/contrib/subtree/git-subtree.sh
> @@ -224,8 +224,6 @@ cache_setup () {
>  	fi
>  	mkdir -p "$cachedir" ||
>  		die "Can't create new cachedir: $cachedir"
> -	mkdir -p "$cachedir/notree" ||
> -		die "Can't create new cachedir: $cachedir/notree"

It might make sense to talk about this a bit in the commit message.
Essentially, you are replacing the `notree/<rev>` files by mapping `<rev>`
to the empty string.

This makes me wonder, again, whether the file system layout of the cache
can hold up to the demands. If a main project were to merge a subtree
with, say, 10 million commits, wouldn't that mean that `git subtree` would
now fill one directory with 10 million files? I still cannot imagine that
this performs well.

>  	debug "Using cachedir: $cachedir" >&2
>  }
>
> @@ -255,18 +253,11 @@ check_parents () {
>  	local indent=$(($2 + 1))
>  	for miss in $missed
>  	do
> -		if ! test -r "$cachedir/notree/$miss"
> -		then
> -			debug "  unprocessed parent commit: $miss ($indent)"
> -			process_split_commit "$miss" "" "$indent"
> -		fi
> +		debug "  unprocessed parent commit: $miss ($indent)"
> +		process_split_commit "$miss" "" "$indent"

That makes sense to me, as the `missed` variable only contains as yet
unmapped commits, therefore we do not have to have an equivalent `test -r`
check.

Ciao,
Dscho

>  	done
>  }
>
> -set_notree () {
> -	echo "1" > "$cachedir/notree/$1"
> -}
> -
>  cache_set () {
>  	oldrev="$1"
>  	newrev="$2"
> @@ -719,11 +710,18 @@ process_split_commit () {
>  	# vs. a mainline commit?  Does it matter?
>  	if test -z "$tree"
>  	then
> -		set_notree "$rev"
>  		if test -n "$newparents"
>  		then
> -			cache_set "$rev" "$rev"
> +			if test "$newparents" = "$parents"
> +			then
> +				# if all parents were subtrees, this can be a subtree commit
> +				cache_set "$rev" "$rev"
> +			else
> +				# a mainline commit with tree missing is equivalent to the initial commit
> +				cache_set "$rev" ""
> +			fi
>  		else
> +			# no parents with valid subtree mappings means a commit prior to subtree add
>  			cache_set "$rev" ""
>  		fi
>  		return
> --
> gitgitgadget
>
>
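
The replacement of `notree/<rev>` files by an empty-string mapping can be sketched as follows. This is a simplified model, not the patch itself: a temporary directory stands in for `$GIT_DIR/subtree-cache/<prefix>`, the helpers are cut-down versions of `cache_set`, `cache_get` and `cache_miss`, and the revision names are made up.

```shell
# Sketch only: an empty mapping must be detected by file existence
# (cache_miss), because cache_get prints nothing for both "unset" and "".
cachedir=$(mktemp -d) || exit 1
trap 'rm -rf "$cachedir"' EXIT

cache_set () {
	echo "$2" >"$cachedir/$1"
}

cache_get () {
	# Prints the mapped value; prints nothing if unset or mapped to "".
	if test -r "$cachedir/$1"
	then
		cat "$cachedir/$1"
	fi
}

cache_miss () {
	# Prints the revision only if it has no cache entry at all.
	if ! test -r "$cachedir/$1"
	then
		echo "$1"
	fi
}

cache_set rev_a rev_a   # subtree commit, maps to itself
cache_set rev_b ""      # commit without the subtree dir: empty mapping

for rev in rev_a rev_b rev_c
do
	if test -z "$(cache_miss "$rev")"
	then
		echo "$rev: cached as '$(cache_get "$rev")'"
	else
		echo "$rev: not processed yet"
	fi
done
```

Here `rev_b` counts as processed even though its mapped value is empty, while `rev_c` is still reported as a miss. That distinction is what lets `check_parents` drop the separate `test -r "$cachedir/notree/$miss"` check.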

fi
OPTS_SPEC="\
git subtree add --prefix=<prefix> <commit>
git subtree add --prefix=<prefix> <repository> <ref>
git subtree merge --prefix=<prefix> <commit>
git subtree pull --prefix=<prefix> <repository> <ref>
git subtree push --prefix=<prefix> <repository> <ref>
git subtree split --prefix=<prefix> <commit>
git subtree add --prefix=<prefix> <commit>
git subtree add --prefix=<prefix> <repository> <ref>
git subtree merge --prefix=<prefix> <commit>
git subtree pull --prefix=<prefix> <repository> <ref>
git subtree push --prefix=<prefix> <repository> <ref>
git subtree split --prefix=<prefix> <commit>
git subtree map --prefix=<prefix> <mainline> <subtree>
git subtree ignore --prefix=<prefix> <commit>
git subtree use --prefix=<prefix> <commit>
--
h,help show the help
q quiet
@@ -27,6 +30,7 @@ b,branch= create a new branch from the split subtree
ignore-joins ignore prior --rejoin commits

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Tom,

On Tue, 6 Oct 2020, Tom Clarkson via GitGitGadget wrote:

> @@ -48,6 +49,7 @@ annotate=
>  squash=
>  message=
>  prefix=
> +clearcache=

It might be more consistent to call it `clear_cache` (i.e. with an
underscore), just like `ignore_joins`.

>
>  debug () {
>  	if test -n "$debug"
> @@ -131,6 +133,9 @@ do
>  	--no-rejoin)
>  		rejoin=
>  		;;
> +	--clear-cache)
> +		clearcache=1
> +		;;
>  	--ignore-joins)
>  		ignore_joins=1
>  		;;
> @@ -206,9 +211,13 @@ debug "opts: {$*}"
>  debug
>
>  cache_setup () {
> -	cachedir="$GIT_DIR/subtree-cache/$$"
> -	rm -rf "$cachedir" ||
> -		die "Can't delete old cachedir: $cachedir"
> +	cachedir="$GIT_DIR/subtree-cache/$prefix"

Excellent, the `prefix` should be "unique enough".

> +	if test -n "$clearcache"
> +	then
> +		debug "Clearing cache"
> +		rm -rf "$cachedir" ||
> +			die "Can't delete old cachedir: $cachedir"
> +	fi
>  	mkdir -p "$cachedir" ||
>  		die "Can't create new cachedir: $cachedir"
>  	mkdir -p "$cachedir/notree" ||
> @@ -266,6 +275,16 @@ cache_set () {
>  	echo "$newrev" >"$cachedir/$oldrev"
>  }
>
> +cache_set_if_unset () {
> +	oldrev="$1"
> +	newrev="$2"

`local`? ;-)

> +	if test -e "$cachedir/$oldrev"
> +	then
> +		return
> +	fi
> +	echo "$newrev" >"$cachedir/$oldrev"

So that directory contains commit mappings, a file for each mapped
revision.

Thinking back to patch 2/11, I am now no longer that sure that it makes
sense to fill it up with every commit in that commit range: performance
suffers when directories contain too many files.

For example, I had a case in the past where it took a minute just to
enumerate a directory, and even looking whether a file existed in that
directory was not exactly fun.

In any case, I would write it slightly shorter:

	test -e "$cachedir/$oldrev" ||
	echo "$newrev" >"$cachedir/$oldrev"

> +}
> +
>  rev_exists () {
>  	if git rev-parse "$1" >/dev/null 2>&1
>  	then
> @@ -375,13 +394,13 @@ find_existing_splits () {
>  			then
>  				# squash commits refer to a subtree
>  				debug "  Squash: $sq from $sub"
> -				cache_set "$sq" "$sub"
> +				cache_set_if_unset "$sq" "$sub"
>  			fi
>  			if test -n "$main" -a -n "$sub"
>  			then
>  				debug "  Prior: $main -> $sub"
> -				cache_set $main $sub
> -				cache_set $sub $sub
> +				cache_set_if_unset $main $sub
> +				cache_set_if_unset $sub $sub
>  				try_remove_previous "$main"
>  				try_remove_previous "$sub"
>  			fi
> @@ -688,6 +707,8 @@ process_split_commit () {
>  		if test -n "$newparents"
>  		then
>  			cache_set "$rev" "$rev"
> +		else
> +			cache_set "$rev" ""

Was this hunk intended to be snuck in here? I can understand the
s/cache_set/cache_set_if_unset/ changes, of course, but not this hunk.

>  		fi
>  		return
>  	fi
> @@ -785,7 +806,7 @@ cmd_split () {
>  			# the 'onto' history is already just the subdir, so
>  			# any parent we find there can be used verbatim
>  			debug "  cache: $rev"
> -			cache_set "$rev" "$rev"
> +			cache_set_if_unset "$rev" "$rev"
>  		done
>  	fi
>
> @@ -798,7 +819,7 @@ cmd_split () {
>  		git rev-list --topo-order --skip=1 $mainline |
>  		while read rev
>  		do
> -			cache_set "$rev" ""
> +			cache_set_if_unset "$rev" ""

Okay. A quite interesting question now would be: are there any callers of
`cache_set` left? If so, why?

Thanks,
Dscho

>  		done || exit $?
>  	fi
>
> --
> gitgitgadget
>
>
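
One possible answer to the directory-size concern is a two-level fan-out, similar in spirit to how `.git/objects` spreads loose objects over subdirectories. The sketch below is only an assumption about how that could look; the patch keeps a flat directory, and `cache_path` is a hypothetical helper. It also uses the shorter `test -e ... ||` form suggested above.

```shell
# Sketch only: store the mapping for abcdef... in $cachedir/ab/cdef...
# so that no single directory has to hold every mapped revision.
cachedir=$(mktemp -d) || exit 1
trap 'rm -rf "$cachedir"' EXIT

cache_path () {
	local rev rest head
	rev="$1"
	rest=${rev#??}          # everything after the first two characters
	head=${rev%"$rest"}     # the first two characters
	echo "$cachedir/$head/$rest"
}

cache_set_if_unset () {
	local path
	path=$(cache_path "$1")
	mkdir -p "${path%/*}" || return
	test -e "$path" ||
	echo "$2" >"$path"
}

cache_set_if_unset 1234abcd first
cache_set_if_unset 1234abcd second   # ignored: an entry already exists
cat "$(cache_path 1234abcd)"
```

The second call leaves the existing entry untouched, so the final `cat` prints `first`. Whether the extra `mkdir -p` per write is worth it would need measuring on a large repository.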

onto= try connecting new tree to an existing one
rejoin merge the new branch back into HEAD
clear-cache reset the subtree mapping cache
options for 'add', 'merge', and 'pull'
squash merge subtree changes as a single commit
"
@@ -48,6 +52,7 @@ annotate=
squash=
message=
prefix=
clearcache=

debug () {
if test -n "$debug"
@@ -131,6 +136,9 @@ do
--no-rejoin)
rejoin=
;;
--clear-cache)
clearcache=1
;;
--ignore-joins)
ignore_joins=1
;;
@@ -156,7 +164,7 @@ command="$1"
shift

case "$command" in
add|merge|pull)
add|merge|pull|map|ignore|use)
default=
;;
split|push)
@@ -187,7 +195,8 @@ dir="$(dirname "$prefix/.")"

if test "$command" != "pull" &&
test "$command" != "add" &&
test "$command" != "push"
test "$command" != "push" &&
test "$command" != "map"
then
revs=$(git rev-parse $default --revs-only "$@") || exit $?
dirs=$(git rev-parse --no-revs --no-flags "$@") || exit $?
@@ -206,13 +215,15 @@ debug "opts: {$*}"
debug

cache_setup () {
cachedir="$GIT_DIR/subtree-cache/$$"
rm -rf "$cachedir" ||
die "Can't delete old cachedir: $cachedir"
cachedir="$GIT_DIR/subtree-cache/$prefix"
if test -n "$clearcache"
then
debug "Clearing cache"
rm -rf "$cachedir" ||
die "Can't delete old cachedir: $cachedir"
fi
mkdir -p "$cachedir" ||
die "Can't create new cachedir: $cachedir"
mkdir -p "$cachedir/notree" ||
die "Can't create new cachedir: $cachedir/notree"
debug "Using cachedir: $cachedir" >&2
}

@@ -238,22 +249,15 @@ cache_miss () {
}

On the Git mailing list, Ed Maste wrote (reply to this):

On Tue, 6 Oct 2020 at 18:05, Tom Clarkson via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Tom Clarkson <tom@tqclarkson.com>
>
> Signed-off-by: Tom Clarkson <tom@tqclarkson.com>
Reviewed-by: Ed Maste <emaste@FreeBSD.org>


On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Tom,

On Tue, 6 Oct 2020, Tom Clarkson via GitGitGadget wrote:

> From: Tom Clarkson <tom@tqclarkson.com>
>
> Include recursion depth in debug logs so we can see when the recursion is
> getting out of hand.
>
> Making the cache handle null mappings correctly and adding older commits
> to the cache allows the recursive algorithm to terminate at any point on
> mainline rather than needing to reach either the add point or the initial
> commit.

Makes sense.

> diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
> index 9867718503..160bad95c1 100755
> --- a/contrib/subtree/git-subtree.sh
> +++ b/contrib/subtree/git-subtree.sh
> @@ -244,7 +244,7 @@ check_parents () {
>  	do
>  		if ! test -r "$cachedir/notree/$miss"
>  		then
> -			debug "  incorrect order: $miss"
> +			debug "  unprocessed parent commit: $miss ($indent)"

Without any context, it is hard to understand what the `$indent` variable
is supposed to mean, so it is unclear why we need to print it here.

I _guess_ it is the degree removed from the first-parent lineage?

In any case, it does not hurt here, so I trust that it is good to include
it in the debug output.

>  			process_split_commit "$miss" "" "$indent"
>  		fi
>  	done
> @@ -392,6 +392,24 @@ find_existing_splits () {
>  	done
>  }
>
> +find_mainline_ref () {
> +	debug "Looking for first split..."
> +	dir="$1"
> +	revs="$2"

The `git-subtree` script seems to rely on the `local` construct, using it
in plenty of other circumstances. How about using it here, too?

> +
> +	git log --reverse --grep="^git-subtree-dir: $dir/*\$" \
> +		--no-show-signature --pretty=format:'START %H%n%s%n%n%b%nEND%n' $revs |

Since all you are interested in is the `git-subtree-mainline:` trailer,
wouldn't a format like `%(trailers:key=git-subtree-mainline)` work better
than `START %H%n%s%n%n%b%nEND%n`?

See
https://git-scm.com/docs/git-log#Documentation/git-log.txt-emtrailersoptionsem
for more information about pretty formats.

BTW I am super unfamiliar with `git subtree`'s inner workings, and
therefore it would help me incredibly if the commit message talked a bit
about the commit message layout (with a particular eye on
`git-subtree-dir` and `git-subtree-mainline` which I guess are trailers
added by `git subtree`?)...

> +	while read a b junk
> +	do
> +		case "$a" in
> +		git-subtree-mainline:)
> +			echo "$b"
> +			return
> +			;;
> +		esac
> +	done
> +}
> +
>  copy_commit () {
>  	# We're going to set some environment vars here, so
>  	# do it in a subshell to get rid of them safely later
> @@ -646,9 +664,9 @@ process_split_commit () {
>
>  	progress "$revcount/$revmax ($createcount) [$extracount]"
>
> -	debug "Processing commit: $rev"
> +	debug "Processing commit: $rev ($indent)"
>  	exists=$(cache_get "$rev")
> -	if test -n "$exists"
> +	if test -z "$(cache_miss "$rev")"
>  	then
>  		debug "  prior: $exists"

I do not see the `exists` variable being used other than for the debug
statement. Maybe better something like this?

	debug "  prior found for $rev"

>  		return
> @@ -773,6 +791,17 @@ cmd_split () {
>
>  	unrevs="$(find_existing_splits "$dir" "$revs")"
>
> +	mainline="$(find_mainline_ref "$dir" "$revs")"
> +	if test -n "$mainline"
> +	then
> +		debug "Mainline $mainline predates subtree add"
> +		git rev-list --topo-order --skip=1 $mainline |
> +		while read rev
> +		do
> +			cache_set "$rev" ""

Ah, so they are not really "null mappings", but mapped to an empty string.
Makes sense. Maybe adjust the commit message?

> +		done || exit $?
> +	fi
> +
>  	# We can't restrict rev-list to only $dir here, because some of our
>  	# parents have the $dir contents the root, and those won't match.
>  	# (and rev-list --follow doesn't seem to solve this)
> --
> gitgitgadget
>
>
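
To make the commit message layout discussed above more concrete, here is a minimal sketch of the parsing loop from `find_mainline_ref`, fed with a made-up message instead of real `git log` output. The hashes and the function name `first_mainline` are illustrative assumptions. The `git-subtree-dir`, `git-subtree-mainline` and `git-subtree-split` lines follow the layout that `git subtree` writes into its add and rejoin commits.

```shell
# Sketch only: extract the first git-subtree-mainline value from text in
# the 'START %H%n%s%n%n%b%nEND%n' format used by the patch.
first_mainline () {
	local a b junk
	while read a b junk
	do
		case "$a" in
		git-subtree-mainline:)
			# $b is the first word after the key, i.e. the commit id.
			echo "$b"
			return
			;;
		esac
	done
}

first_mainline <<'EOF'
START 1111111111111111111111111111111111111111
Add 'lib/' from commit '2222222222222222222222222222222222222222'

git-subtree-dir: lib
git-subtree-mainline: 3333333333333333333333333333333333333333
git-subtree-split: 2222222222222222222222222222222222222222
END
EOF
```

Only the `git-subtree-mainline:` line is consumed; the `START`/`END` framing and the subject line are read and discarded, which is why a trailer-only pretty format would likely simplify the loop.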


check_parents () {
missed=$(cache_miss "$1")
missed=$(cache_miss $1)
local indent=$(($2 + 1))
for miss in $missed
do
if ! test -r "$cachedir/notree/$miss"
then
debug " incorrect order: $miss"
process_split_commit "$miss" "" "$indent"
fi
debug " unprocessed parent commit: $miss ($indent)"
process_split_commit "$miss" "" "$indent"
done
}

set_notree () {
echo "1" > "$cachedir/notree/$1"
}

cache_set () {
oldrev="$1"
newrev="$2"
@@ -266,6 +270,16 @@ cache_set () {
echo "$newrev" >"$cachedir/$oldrev"
}

cache_set_if_unset () {
oldrev="$1"
newrev="$2"
if test -e "$cachedir/$oldrev"
then
return
fi
echo "$newrev" >"$cachedir/$oldrev"
}

rev_exists () {
if git rev-parse "$1" >/dev/null 2>&1
then
@@ -375,13 +389,13 @@ find_existing_splits () {
then
# squash commits refer to a subtree
debug " Squash: $sq from $sub"
cache_set "$sq" "$sub"
cache_set_if_unset "$sq" "$sub"
fi
if test -n "$main" -a -n "$sub"
then
debug " Prior: $main -> $sub"
cache_set $main $sub
cache_set $sub $sub
cache_set_if_unset $main $sub
cache_set_if_unset $sub $sub
try_remove_previous "$main"
try_remove_previous "$sub"
fi
@@ -392,6 +406,36 @@ find_existing_splits () {
done
}

find_mainline_ref () {
debug "Looking for first split..."
dir="$1"
revs="$2"

git log --reverse --grep="^git-subtree-dir: $dir/*\$" \
--no-show-signature --pretty=format:'START %H%n%s%n%n%b%nEND%n' $revs |
while read a b junk
do
case "$a" in
git-subtree-mainline:)
echo "$b"
return
;;
esac
done
}

exclude_processed_refs () {
if test -r "$cachedir/processed"
then
cat "$cachedir/processed" |
while read rev
do
debug "read $rev"
echo "^$rev"
done
fi
}

copy_commit () {
# We're going to set some environment vars here, so
# do it in a subshell to get rid of them safely later
@@ -646,9 +690,9 @@ process_split_commit () {

progress "$revcount/$revmax ($createcount) [$extracount]"

debug "Processing commit: $rev"
debug "Processing commit: $rev ($indent)"
exists=$(cache_get "$rev")
if test -n "$exists"
if test -z "$(cache_miss "$rev")"
then
debug " prior: $exists"
return
@@ -666,10 +710,19 @@ process_split_commit () {
# vs. a mainline commit? Does it matter?
if test -z "$tree"
then
set_notree "$rev"
if test -n "$newparents"
then
cache_set "$rev" "$rev"
if test "$newparents" = "$parents"
then
# if all parents were subtrees, this can be a subtree commit
cache_set "$rev" "$rev"
else
# a mainline commit with tree missing is equivalent to the initial commit
cache_set "$rev" ""
fi
else
# no parents with valid subtree mappings means a commit prior to subtree add
cache_set "$rev" ""
fi
return
fi
@@ -754,6 +807,61 @@ cmd_add_commit () {
say "Added dir '$dir'"
}

cmd_map () {

if test -z "$1"
then
die "You must provide a revision to map"
fi

oldrev=$(git rev-parse --revs-only "$1") || exit $?
newrev=

if test -n "$2"
then
newrev=$(git rev-parse --revs-only "$2") || exit $?
fi

cache_setup || exit $?
cache_set "$oldrev" "$newrev"

say "Mapped $oldrev => $newrev"
}

cmd_ignore () {
revs=$(git rev-parse $default --revs-only "$@") || exit $?
ensure_single_rev $revs

say "Ignoring $revs"

cache_setup || exit $?

git rev-list $revs |
while read rev
do
cache_set "$rev" ""
done

echo "$revs" >>"$cachedir/processed"
}

cmd_use () {
revs=$(git rev-parse $default --revs-only "$@") || exit $?
ensure_single_rev $revs

say "Using existing subtree $revs"

cache_setup || exit $?

git rev-list $revs |
while read rev
do
cache_set "$rev" "$rev"
done

echo "$revs" >>"$cachedir/processed"
}

cmd_split () {
debug "Splitting $dir..."
cache_setup || exit $?
@@ -767,11 +875,22 @@ cmd_split () {
# the 'onto' history is already just the subdir, so
# any parent we find there can be used verbatim
debug " cache: $rev"
cache_set "$rev" "$rev"
cache_set_if_unset "$rev" "$rev"
done
fi

unrevs="$(find_existing_splits "$dir" "$revs")"
unrevs="$(find_existing_splits "$dir" "$revs") $(exclude_processed_refs)"

mainline="$(find_mainline_ref "$dir" "$revs")"
if test -n "$mainline"
then
debug "Mainline $mainline predates subtree add"
git rev-list --topo-order --skip=1 $mainline |
while read rev
do
cache_set_if_unset "$rev" ""
done || exit $?
fi

# We can't restrict rev-list to only $dir here, because some of our
# parents have the $dir contents the root, and those won't match.
24 changes: 24 additions & 0 deletions contrib/subtree/git-subtree.txt
@@ -52,6 +52,12 @@ useful elsewhere, you can extract its entire history and publish
that as its own git repository, without accidentally

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Tom,

On Tue, 6 Oct 2020, Tom Clarkson via GitGitGadget wrote:

> From: Tom Clarkson <tom@tqclarkson.com>
>
> Signed-off-by: Tom Clarkson <tom@tqclarkson.com>
> ---
>  contrib/subtree/git-subtree.txt | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
>
> diff --git a/contrib/subtree/git-subtree.txt b/contrib/subtree/git-subtree.txt
> index 352deda69d..a5a76e8ce6 100644
> --- a/contrib/subtree/git-subtree.txt
> +++ b/contrib/subtree/git-subtree.txt
> @@ -52,6 +52,12 @@ useful elsewhere, you can extract its entire history and publish
>  that as its own git repository, without accidentally
>  intermingling the history of your application project.
>
> +Although the relationship between subtree and mainline commits is stored

As far as I can see, this is the first time the term "mainline commit" is
used in that file, and it has not really been defined what you mean by that.
I *guess* you are referring to commits in the main project that did not
come from any subtree project.

Maybe this can be described without needing a new term?

Ciao,
Dscho

> +in regular git history, it is also cached between subtree runs. In most
> +cases this is merely a performance improvement, but for projects with
> +large and complex histories the cache can be manipulated directly
> +with the use, ignore and map commands.
> +
>  [TIP]
>  In order to keep your commit messages clean, we recommend that
>  people split their commits between the subtrees and the main
> @@ -120,6 +126,21 @@ and friends will work as expected.
>  Note that if you use '--squash' when you merge, you should usually not
>  just '--rejoin' when you split.
>
> +ignore::
> +	Mark a commit and all of its history as irrelevant to subtree split.
> +	In most cases this would be handled automatically based on metadata
> +	from subtree join commits. Intended for improving performance on
> +	extremely large repos and excluding complex history that turns out
> +	to be otherwise problematic.
> +
> +use::
> +	Mark a commit and all of its history as part of an existing subtree.
> +	In normal circumstances this would be handled based on the metadata
> +	from the subtree join commit. Similar to the --onto option of split.
> +
> +map::
> +	Manually override the normal output of split for a particular commit.
> +	Extreme flexibility for advanced troubleshooting purposes only.
>
>  OPTIONS
>  -------
> @@ -142,6 +163,9 @@ OPTIONS
>  	This option is only valid for add, merge and pull (unsure).
>  	Specify <message> as the commit message for the merge commit.
>
> +--clear-cache::
> +	Reset the subtree cache and recalculate all subtree mappings from the
> +	commit history
>
>  OPTIONS FOR add, merge, push, pull
>  ----------------------------------
> --
> gitgitgadget
>

intermingling the history of your application project.

Although the relationship between subtree and mainline commits is stored
in regular git history, it is also cached between subtree runs. In most
cases this is merely a performance improvement, but for projects with
large and complex histories the cache can be manipulated directly
with the use, ignore and map commands.

[TIP]
In order to keep your commit messages clean, we recommend that
people split their commits between the subtrees and the main
@@ -120,6 +126,21 @@ and friends will work as expected.
Note that if you use '--squash' when you merge, you should usually not
just '--rejoin' when you split.

ignore::
Mark a commit and all of its history as irrelevant to subtree split.
In most cases this would be handled automatically based on metadata
from subtree join commits. Intended for improving performance on
extremely large repos and excluding complex history that turns out
to be otherwise problematic.

use::
Mark a commit and all of its history as part of an existing subtree.
In normal circumstances this would be handled based on the metadata
from the subtree join commit. Similar to the --onto option of split.

map::
Manually override the normal output of split for a particular commit.
Extreme flexibility for advanced troubleshooting purposes only.

OPTIONS
-------
@@ -142,6 +163,9 @@ OPTIONS
This option is only valid for add, merge and pull (unsure).
Specify <message> as the commit message for the merge commit.

--clear-cache::
Reset the subtree cache and recalculate all subtree mappings from the
commit history

OPTIONS FOR add, merge, push, pull
----------------------------------