-
Notifications
You must be signed in to change notification settings - Fork 555
Prune empty commits #147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prune empty commits #147
Conversation
@javabrett unitl @rtyley merges this in can you provide a jar file i can try so i don't have to build it myself ? |
Edit: Updated link with rebased rebuild. |
@javabrett Thanks for the jar, I just used it to prune commits from our massive repo after some extensive BFG surgery and it worked a treat 👍 I still had to run |
@rtyley is this PR still being considered? |
52a2ae7
to
3cf762b
Compare
Rebased to current master resolving a small import conflict. Retested with latest Scala-SBT. |
I just tested this and it worked quite well. I deleted an entire subdirectory out of this project, and was left with a few hundred 'empty' commits from work within that subdirectory. This fork removed all of those empty commits. Except, merge commits that were only merging empty (now removed) commits in are still there. e.g., o main work 5 Using this fork I get commit 2 & 3 removed (yes) but commit 4 is still left there between 1 and 5. Ideally those would be detected and removed as well. Merge commits where the one ancestor was all empty. 😆 Update ... I was able to get around the above just using git-filter-branch in a second pass after bfg. It works fine, but is of course much slower than BFG. It takes about 40 minutes to run on this repo, vs bfg doing much more work in less than 2 minutes. $ git filter-branch --commit-filter "echo -n ${GIT_COMMIT}, >> ${map_file} ; git_commit_non_empty_tree "$@" | tee -a ${map_file}" removes those now pointless merge commits and provides yet another commit mapping file, which is easily joined with that from bfg to provide the final mapping. I think just running bfg again with --prune-empty (and no other dirt specified) would potentially do the trick here, but alas it says 'nothing to do, exiting'. So perhaps the PR here could be updated to have --prune-empty be considered as ... not nothing. Or as I originally asked, the isEmptyCommit function could be enhanced to detect these empty merges. Or, do nothing and have people asking for this fall down to a second pass with filter-branch. But thumbs up on this PR, it otherwise worked nicely. |
@rtyley This is one of the more popular PRs - pruning empty commits: those that BFG creates, and pre-existing ones. I could rebase this again and resolve the conflicts, but before doing-so I wanted to check whether this is likely to be mergeable (ever) - do you think is it a suitable enhancement for BFG mainline, or if not, what changes might get it merged? |
This feature removes commits that- after the cleaning process -contain *no* file-tree change when compared to their parent commit. This would be because the cleaning process has cleaned away whatever content it was that was _changing_ in the original commit. The option is off by default, it's activated by using the `--prune-empty-commits` flag, eg: $ bfg --delete-files foo --prune-empty-commits rtyley#27
...do we want to go this far!?
…o run with prune-empty-commits as its only cleaning-task.
3cf762b
to
850d967
Compare
Would love to know what the status is here. Any update in getting this PR merged? |
Seems like this repo is dying? No commits in 5 months. Even the jar posted above is a 404. |
@Nessworthy I rebuilt that jar from the rebased PR and updated the link above. |
Is it BFG or your PR that makes it real slow the longer it runs? |
Did you try the same runs on the GA version of BFG, without |
It's the only work I need BFG for, so no. |
My initial suggestion is to see if you can get some performance stats from JVM instrumentation, see if you can identify any hotspots. Or even take a few thread dumps. |
Forget it, seems to have been a memory problem. |
One caveat though. |
Urgh, the problem is more problematic. object-id-map.old-new.txt contains e. g.
which means it states that it mapped all those commits to that one new commit, which is wrong. Yet they are listed in the object-id-map.old-new.txt file as being mapped to the same commit. AND also very important, they are not removed. I still see |
Also some commits are twice now in the result, one rewritten, one not. |
Also some refs did not get rewritten.
but the ref is still pointing at This is probably also the cause why there are still non-rewritten commits in the result, as the refs were not updated properly |
The output says |
Ah, this seems to be caused by different refs with same name but different capitalisation it seems. |
It still says 3075 instead of 3077 rewritten refs, but I guess one of them is But the wrong entries in the Any chance you can fix this and provide a new build with that fix? |
|
Here you have a fully self-contained example in one line, just adapt the path to the JAR: mkdir foo && cd foo && git init && touch a b c d && git add a && git commit -m a && git add b && git commit -m b && git commit -m empty1 --allow-empty && git commit -m empty2 --allow-empty && git add c && git commit -m c && git add d && git commit -m d && git log --oneline && java -jar d:/Downloads/bfg-1.13.1-SNAPSHOT-prune-empty-commits-850d967.jar --prune-empty-commits --private . && git log --oneline && cat ..bfg-report/*/*/object-id-map.old-new.txt When I exeucted it, the relevant output was:
Which means So I'd either expect
or
or
The latter is inspired by output of commands like |
I'm having trouble seeing what is wrong with what is currently-logged - it seems to be working exactly as-designed. I'm also having trouble understanding how your proposed change improves things, but hopefully you can explain. The old->new mapping file is designed as a record of BFG removed and what it was replaced-by. For pruned-empty-commits I'll admit this is a little more subtle, because the commit it removed, but I claim this is just equivalent to just rolling/amending it into its nearest non-empty ancestor commit, which therefore replaces it. I can't think of any other "new" commit to log which better-describes what has happened when the commit was pruned. As far as Git is concerned this is exactly what has happened - the commit Your proposal suggests that it is more useful to record What is your script doing (reattaching notes)? Maybe it needs to change how it handles/parses this file. Can you explain in detail why it is a) terrible to mention the removed empty commit's replacement as its nearest non-empty ancestor and b) why better to provide no link at all. |
Well, because that is not the purpose of the file. What this file is technically useful for, is that you have a 1:1 mapping old-commit to new-commit, e. g. if you need a lookup table. In my case that's exactly what I used it for and what is recommended in one of the commits in #188. git notes copy --stdin < ..bfg-report/*/*/object-id-map.old-new.txt
cat ..bfg-report/*/*/object-id-map.old-new.txt | cut -d ' ' -f 1 | git notes remove --stdin
git notes prune this is a migration of a big old SVN repo with the KDE But if there is no correct 1:1 mapping, or rather if you cannot see in the file that a line is for the removal of a commit, you have no chance to do this correctly. So if the output would e. g. have been
the file could be used properly and still has the information about the nearest non-empty parent. And you would also have a chance to see actual errors while moving notes, as currently I get a whole bunch of errors because of this. Another example, with
then the result will be
with mapping file
All three that are mapped to How would I parse that file to do the proper work?
then it can be clearly handled easily an there is no information lost |
Is there a reason you aren't using https://github.com/jwiegley/git-scripts/blob/master/git-remove-empty-commits or advice from https://stackoverflow.com/questions/26683792/how-can-i-find-empty-git-commits ? Execution time? Maybe you could use those to pre-filter the mapping-file to remove pruned commits. |
Is https://github.com/jwiegley/git-scripts/blob/master/git-remove-empty-commits any better or faster than just doing If not, then yes, due to speed. Using format-branch is awefully slow when BFG can do it in 15-30 minutes for the 190_000 revisions repo. Why don't you think the mapping file should be enriched with that information? Why the need to have some additional slow filtering on the file based on the not-processed repository when bfg already has the information present and could provide it easily? |
Or if you don't want to change that file, how about a second file that lists the pruned commits one per line, that would also be sufficient. |
It needs more than an hour just to do the command from your second link and then I'd never know if this is really 100% the same BFG will do which already has the information. |
Ok, I learned enough Scala and trusted my beloved IntelliJ to even come up with a patch now. :-) diff --git a/bfg-library/src/main/scala/com/madgag/git/bfg/cleaner/ObjectIdCleaner.scala b/bfg-library/src/main/scala/com/madgag/git/bfg/cleaner/ObjectIdCleaner.scala
index 9e61007..d6f544e 100644
--- a/bfg-library/src/main/scala/com/madgag/git/bfg/cleaner/ObjectIdCleaner.scala
+++ b/bfg-library/src/main/scala/com/madgag/git/bfg/cleaner/ObjectIdCleaner.scala
@@ -20,7 +20,7 @@
package com.madgag.git.bfg.cleaner
-import com.madgag.collection.concurrent.ConcurrentMultiMap
+import com.madgag.collection.concurrent.{ConcurrentMultiMap, ConcurrentSet}
import com.madgag.git._
import com.madgag.git.bfg.GitUtil._
import com.madgag.git.bfg.cleaner.protection.{ProtectedObjectCensus, ProtectedObjectDirtReport}
@@ -64,6 +64,7 @@ class ObjectIdCleaner(config: ObjectIdCleaner.Config, objectDB: ObjectDatabase,
val changesByFilename = new ConcurrentMultiMap[FileName, (ObjectId, ObjectId)]
val deletionsByFilename = new ConcurrentMultiMap[FileName, ObjectId]
+ val prunedCommits = new ConcurrentSet[ObjectId]
// want to enforce that once any value is returned, it is 'good' and therefore an identity-mapped key as well
val memo: Memo[ObjectId, ObjectId] = MemoUtil.concurrentCleanerMemo(protectedObjectCensus.fixedObjectIds)
@@ -102,7 +103,10 @@ class ObjectIdCleaner(config: ObjectIdCleaner.Config, objectDB: ObjectDatabase,
val cleanedArcs = originalCommit.arcs cleanWith this
- if (config.pruneEmptyCommits && cleanedArcs.isEmptyCommit) cleanedArcs.parents.headOption.getOrElse(ObjectId.zeroId()) else {
+ if (config.pruneEmptyCommits && cleanedArcs.isEmptyCommit) {
+ prunedCommits += commitId
+ cleanedArcs.parents.headOption.getOrElse(ObjectId.zeroId())
+ } else {
val kit = new CommitNodeCleaner.Kit(threadLocalResources, originalRevCommit, originalCommit, cleanedArcs, apply)
val updatedCommitNode = commitNodeCleaner.fixer(kit)(originalCommit.node)
val updatedCommit = Commit(updatedCommitNode, cleanedArcs)
diff --git a/bfg-library/src/main/scala/com/madgag/git/bfg/cleaner/Reporter.scala b/bfg-library/src/main/scala/com/madgag/git/bfg/cleaner/Reporter.scala
index 691b44d..d894f52 100644
--- a/bfg-library/src/main/scala/com/madgag/git/bfg/cleaner/Reporter.scala
+++ b/bfg-library/src/main/scala/com/madgag/git/bfg/cleaner/Reporter.scala
@@ -242,9 +242,16 @@ class CLIReporter(repo: Repository) extends Reporter {
case (filename, oldIds) => (filename, Text.abbreviate(oldIds.map(oldId => oldId.shortName + oldId.sizeOpt.map(size => s" (${ByteSize.format(size)})").mkString), "...").mkString(", "))
} { oldId => Seq(oldId.name, oldId.sizeOpt.mkString) }
- println(s"\n\nIn total, ${changedIds.size} object ids were changed. Full details are logged here:\n\n\t${reportsDir.path}")
+ val prunedCommits = objectIdCleaner.prunedCommits
+ println(s"\n\n")
+ if (prunedCommits.nonEmpty) {
+ println(s"In total, ${prunedCommits.size} empty commits were pruned.")
+ }
+ println(s"In total, ${changedIds.size} object ids were changed. Full details are logged here:\n\n\t${reportsDir.path}")
- mapFile.writeStrings(SortedMap[AnyObjectId, ObjectId](changedIds.toSeq: _*).view.map { case (o,n) => s"${o.name} ${n.name}"}, "\n")
+ mapFile.writeStrings(SortedMap[AnyObjectId, ObjectId](changedIds.toSeq: _*).view.map {
+ case (o,n) => if (prunedCommits.contains(o)) s"${o.name} 0000000000000000000000000000000000000000 ${n.name}" else s"${o.name} ${n.name}"
+ }, "\n")
cacheStatsFile.writeStrings(objectIdCleaner.stats().seq.map(_.toString()), "\n")
|
Nice work! If I could be so bold as to suggest that in-addition to the gzipped-patch, you might like to fork either the master BFG repo or my fork, and put your changes on a branch. That will make it much easier for that version to be selected either for merge or by others that need your change. |
Sure @javabrett, I made a PR to your PR branch :-) |
@javabrett do you consider applying my PR to your PR branch, or rather not? :-) |
@Vampire I appreciate you putting this on a branch, so that if/when this PR is considered, all options and feedback are readily available. However note that a) I'm not a member on |
I'm clear about a and b and didn't assume either. |
FYI, I've built @javabrett 's branch, rebased against the latest UPDATE: See #147 (comment) for JAR file with the patched build of BFG 1.13.1 p.s. Pity it's been two years w/o conclusive merge. Must be quite demotivating for @javabrett |
very appreciate feature. @rtyley any chance to get this merged? |
@javabrett I have used the patched JAR you posted on a very largish Git repo (after having pruned JAR files out of it) and it worked like a charm. Thanks alot! |
Hello all, I have created an experimental version that supports empty commits that have multiple parents (e.g. merge commits): https://github.com/SeekingMeaning/bfg-repo-cleaner/raw/prune-empty-commits-built/bfg/target/bfg-1.13.3-SNAPSHOT-prune-empty-commits-built-13a7243-dirty.jar |
rtyley#147 Squashed commit of the following: commit 850d967 Author: Brett Randall <javabrett@gmail.com> Date: Tue Feb 6 20:39:47 2018 +1100 Updated --prune-empty-commits test: specs2 -> scalatest. commit c008b83 Author: Brett Randall <javabrett@gmail.com> Date: Mon May 16 09:17:33 2016 +1000 Consider --prune-empty-commits option as work on-its-own, allow BFG to run with prune-empty-commits as its only cleaning-task. commit ea4c8a2 Author: Brett Randall <javabrett@gmail.com> Date: Fri May 13 23:00:31 2016 +1000 API updates to bring this up to master 8abe03c 1.12.13-SNAPSHOT. commit 56c4cfe Author: Martin Dengler <martin@martindengler.com> Date: Tue Dec 22 14:08:39 2015 -0600 Prune empty commits test typo fix commit 8b6366d Author: Roberto Tyley <roberto.tyley@gmail.com> Date: Fri May 9 09:11:54 2014 +0100 Add nasty nasty code to address pruning the initial commit... ...do we want to go this far!? commit 1caf6f1 Author: Roberto Tyley <roberto.tyley@gmail.com> Date: Sat May 10 13:01:54 2014 +0100 Prune empty commits test commit 2f866b5 Author: Roberto Tyley <roberto.tyley@gmail.com> Date: Sun Apr 6 23:11:14 2014 +0100 Add the option to prune empty commits (issue rtyley#27) This feature removes commits that- after the cleaning process -contain *no* file-tree change when compared to their parent commit. This would be because the cleaning process has cleaned away whatever content it was that was _changing_ in the original commit. The option is off by default, it's activated by using the `--prune-empty-commits` flag, eg: $ bfg --delete-files foo --prune-empty-commits rtyley#27
@javabrett hey do you think you can rebase against master and fix the conflicting file in this PR? Also, if you have already built a jar file that includes your PR's changes, can you point me to the link to download that jar? |
@takanuva15 Here is mine that I mentioned in #147 (comment) It is the BFG 1.13.1 patched for bfg-1.13.1-SNAPSHOT-prune-empty-commits-850d967.jar.zip (12 MB) |
|
@fireundubh The commit was reverted in 0d80de6, which is present in v1.14.0 @mloskot Thanks for giving a zip-file link with your built jar. Do you know if it's possible to rebase this PR with the latest changes in master easily? (I haven't worked with Scala before) |
No idea. I've never developed anything in Scala/Java really. In 2018, I had cloned, rebased, built the thing and it just worked. |
These are Roberto's initial changes (for #27), the fixes from Martin (#121), and I have added a commit to bring to compiling and passing-tests with latest master 8abe03c .
I tested this with and without the new option - it looks like it does a sterling job of removing existing empty-commits, and those made empty by BFG-cleaning.