Prune empty commits #147

Closed
javabrett wants to merge 7 commits

javabrett
Contributor

These are Roberto's initial changes (for #27), the fixes from Martin (#121), and I have added a commit to bring it up to compiling and passing tests against the latest master 8abe03c.

I tested this with and without the new option - it looks like it does a sterling job of removing existing empty-commits, and those made empty by BFG-cleaning.

@donnib

donnib commented Jun 8, 2016

@javabrett until @rtyley merges this in, can you provide a jar file I can try so I don't have to build it myself?

@javabrett
Contributor Author

javabrett commented Jun 9, 2016

https://github.com/javabrett/bfg-repo-cleaner/raw/prune-empty-commits-built/bfg/target/bfg-1.13.1-SNAPSHOT-prune-empty-commits-850d967.jar

Edit: Updated link with rebased rebuild.

@clembou

clembou commented Jul 29, 2016

@javabrett Thanks for the jar, I just used it to prune commits from our massive repo after some extensive BFG surgery and it worked a treat 👍

I still had to run git-filter-branch because BFG wasn't quite able to do what I needed, but since I pruned a bunch of commits first using your build, it ran much faster: it took 5 hours instead of 4 days 😄

@wyaeld

wyaeld commented Jan 17, 2017

@rtyley is this PR still being considered?

javabrett force-pushed the prune-empty-commits branch from 52a2ae7 to 3cf762b on January 17, 2017 23:38
@javabrett
Contributor Author

javabrett commented Jan 17, 2017

Rebased to current master resolving a small import conflict. Retested with latest Scala-SBT.

@jwnewman12

jwnewman12 commented Aug 8, 2017

I just tested this and it worked quite well. I deleted an entire subdirectory out of this project, and was left with a few hundred 'empty' commits from work within that subdirectory. This fork removed all of those empty commits.

Except, merge commits that were only merging empty (now removed) commits in are still there. e.g.,

o main work 5
o Merge subdir branch into master 4
|\
| o more work on subdir branch 3
| o work on subdir branch 2
|/
o main work base 1

Using this fork I get commits 2 & 3 removed (yes), but commit 4 is still left there between 1 and 5. Ideally those would be detected and removed as well, i.e. merge commits where one side of the merge consisted entirely of now-removed empty commits. 😆

Update ...

I was able to get around the above just using git-filter-branch in a second pass after bfg. It works fine, but is of course much slower than BFG. It takes about 40 minutes to run on this repo, vs bfg doing much more work in less than 2 minutes.

$ git filter-branch --commit-filter "echo -n \${GIT_COMMIT}, >> ${map_file} ; git_commit_non_empty_tree \"\$@\" | tee -a ${map_file}"

removes those now pointless merge commits and provides yet another commit mapping file, which is easily joined with that from bfg to provide the final mapping.

I think just running bfg again with --prune-empty (and no other dirt specified) would potentially do the trick here, but alas it says 'nothing to do, exiting'. So perhaps the PR here could be updated to have --prune-empty be considered as ... not nothing. Or as I originally asked, the isEmptyCommit function could be enhanced to detect these empty merges. Or, do nothing and have people asking for this fall down to a second pass with filter-branch.

But thumbs up on this PR, it otherwise worked nicely.
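The "easily joined" step for the two mapping files can be sketched roughly as follows. This is only an illustration under assumptions: the function name `compose_maps` is made up here, BFG's file is taken to hold space-separated `old new` pairs, and the filter-branch file is taken to hold comma-separated `old,new` pairs as written by the command above.

```shell
# compose_maps BFG_MAP FILTER_BRANCH_MAP   (hypothetical helper, not part of BFG)
# BFG_MAP lines:           "<original> <after-bfg>"   (space-separated)
# FILTER_BRANCH_MAP lines: "<after-bfg>,<final>"      (comma-separated)
# Prints "<original> <final>" pairs, sorted by original id.
compose_maps() {
  awk '
    # First file (filter-branch map): remember what each commit became.
    NR == FNR { split($0, pair, ","); second[pair[1]] = pair[2]; next }
    # Second file (BFG map): follow each mapping through the second pass.
    {
      final = ($2 in second) ? second[$2] : $2
      print $1, final
      seen[$2] = 1
    }
    END {
      # Commits BFG left untouched but filter-branch rewrote.
      for (id in second)
        if (!(id in seen) && id != second[id]) print id, second[id]
    }
  ' "$2" "$1" | sort
}
```

Something like `compose_maps object-id-map.old-new.txt "${map_file}" > final-map.txt` would then give one file covering both passes.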

@javabrett
Contributor Author

@rtyley This is one of the more popular PRs - pruning empty commits: those that BFG creates, and pre-existing ones.

I could rebase this again and resolve the conflicts, but before doing so I wanted to check whether this is likely to be mergeable (ever). Do you think it is a suitable enhancement for BFG mainline, or if not, what changes might get it merged?

rtyley and others added 7 commits February 6, 2018 20:40
This feature removes commits that- after the cleaning process -contain *no*
file-tree change when compared to their parent commit. This would be
because the cleaning process has cleaned away whatever content it was that
was _changing_ in the original commit.

The option is off by default, it's activated by using the
`--prune-empty-commits` flag, eg:

$ bfg --delete-files foo --prune-empty-commits

rtyley#27
…o run with prune-empty-commits as its only cleaning-task.
javabrett force-pushed the prune-empty-commits branch from 3cf762b to 850d967 on February 6, 2018 10:01
@reggi

reggi commented Apr 26, 2018

Would love to know what the status is here. Any update in getting this PR merged?

@Nessworthy

Seems like this repo is dying? No commits in 5 months. Even the jar posted above is a 404.

@javabrett
Contributor Author

@Nessworthy I rebuilt that jar from the rebased PR and updated the link above.

@Vampire

Vampire commented Jun 30, 2018

Is it BFG or your PR that makes it real slow the longer it runs?
I tested with a small 2_146 commits repo first with just --prune-empty-commits --private and it was blazingly fast, finishing in 6 seconds.
Then I started on our main repo that has 362_572 commits according to progress display 17 hours ago.
It currently is at commit 157_624 and needs several minutes per commit.

@javabrett
Contributor Author

javabrett commented Jul 1, 2018

> Is it BFG or your PR that makes it real slow the longer it runs?

Did you try the same runs on the GA version of BFG, without --prune-empty-commits? Or is that the only work you need BFG to do?

@Vampire

Vampire commented Jul 1, 2018

It's the only work I need BFG for, so no.

@javabrett
Contributor Author

My initial suggestion is to see if you can get some performance stats from JVM instrumentation, see if you can identify any hotspots. Or even take a few thread dumps.

@Vampire

Vampire commented Jul 2, 2018

Forget it, seems to have been a memory problem.
The system where it slowed down, after the first 40 percent or so had been done in a couple of minutes, was a VM server with only 2 GiB of total RAM.
When I executed it locally on my machine, it went through in about 15 minutes.

@Vampire

Vampire commented Jul 2, 2018

One caveat though.
After the cleaning, I copy over the notes with the help of the generated object-id-map.old-new.txt file.
But it seems the left-out commits are also mentioned there, I get many "failed to copy notes from ... to ...".

@Vampire

Vampire commented Jul 2, 2018

Urgh, the problem is more serious than I thought.

object-id-map.old-new.txt contains e. g.

0a7c8f290b0bf07bb9598afb5539eae047029d58 f99bcd68fb481806f9b1e72f0049f6a35eaa004b
2247be2137aeacb0a168d8802f52313e37610b7d f99bcd68fb481806f9b1e72f0049f6a35eaa004b
53908fd7087fc25a2014bde193d251d9220b7bff f99bcd68fb481806f9b1e72f0049f6a35eaa004b
55b291b867e2c55ebd90faba9efde5cd98885894 f99bcd68fb481806f9b1e72f0049f6a35eaa004b
6b2269659ee97bc3f6af1b14a63fe7342439d730 f99bcd68fb481806f9b1e72f0049f6a35eaa004b
6dbb2dd8d9ca46841709e03a2dd7b781421dec04 f99bcd68fb481806f9b1e72f0049f6a35eaa004b
8246b83e3eacb424139588a5c3208e4e7407fc8b f99bcd68fb481806f9b1e72f0049f6a35eaa004b
87e37f755b81166607d93d92a1435aee0a261128 f99bcd68fb481806f9b1e72f0049f6a35eaa004b
a3e68fb32b58c2405433c3cc16f6bccb6be2f8b9 f99bcd68fb481806f9b1e72f0049f6a35eaa004b
d351aab97b8a2b1d26f47a4b0bb39931599d8133 f99bcd68fb481806f9b1e72f0049f6a35eaa004b
e22b1fe7520838da2a17fa5b29bede137fdc8de5 f99bcd68fb481806f9b1e72f0049f6a35eaa004b
e3f4dc794f345173e17189bb8eb4674f49f6da37 f99bcd68fb481806f9b1e72f0049f6a35eaa004b
f504f15f389d67a044a06b5b23c0a9a210185198 f99bcd68fb481806f9b1e72f0049f6a35eaa004b
fda5495ab6d80524c0c960aa7f62684d19bb8aa9 f99bcd68fb481806f9b1e72f0049f6a35eaa004b

which means it states that it mapped all those commits to that one new commit, which is wrong.
f99bcd68fb481806f9b1e72f0049f6a35eaa004b is the parent of e22b1fe7520838da2a17fa5b29bede137fdc8de5 and the others are children, grand-children and so on of that one. All of those are empty commits, so should have been removed.

Yet they are listed in the object-id-map.old-new.txt file as being mapped to the same commit.

AND also very important, they are not removed. I still see e22b1fe7520838da2a17fa5b29bede137fdc8de5 and the others in the result, which is not expected after I asked BFG to remove the empty commits.
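One way to spot this pattern in a mapping file is to count how many old ids share each new id. The following is a minimal sketch, assuming the two-column `old new` layout shown above; `shared_targets` is a hypothetical helper name, not something BFG provides.

```shell
# shared_targets MAP_FILE   (hypothetical helper)
# Prints "<count> <new-id>" for every new id that more than one old id maps to.
shared_targets() {
  awk '
    { n[$2]++ }                                   # count old ids per new id
    END { for (id in n) if (n[id] > 1) print n[id], id }
  ' "$1" | sort -k2
}
```

Run against the excerpt above, it would report f99bcd68fb481806f9b1e72f0049f6a35eaa004b with a count of 14.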

@Vampire

Vampire commented Jul 2, 2018

Also, some commits now appear twice in the result, one rewritten, one not.

@Vampire

Vampire commented Jul 2, 2018

Also some refs did not get rewritten.
If I look at object-id-map.old-new.txt, there is a line

faf3b470d9938394529c14495bfdd3a3946bdf0e b1a422540fea3f8cf6f6251d24cb20dc915d0208

but the ref is still pointing at faf3b470d9938394529c14495bfdd3a3946bdf0e instead of at b1a422540fea3f8cf6f6251d24cb20dc915d0208.

This is probably also why there are still non-rewritten commits in the result: the refs were not updated properly.

@Vampire

Vampire commented Jul 2, 2018

The output says Found 3077 commit-pointing refs, Updating 3075 Refs and Updating references: 100% (3075/3075), so it might actually "just" be two refs that were not updated. Who knows though :-/

@Vampire

Vampire commented Jul 2, 2018

Ah, this seems to be caused by different refs with the same name but different capitalisation.

@Vampire

Vampire commented Jul 3, 2018

It still says 3075 instead of 3077 rewritten refs, but I guess one of them is HEAD, no idea what the other is.

But the wrong entries in the object-id-map.old-new.txt really are a problem.
The first one copied to the target wins; subsequent attempts fail because there is already a note.
And as the entries are sorted alphanumerically by old SHA, it is random whether the correct or a wrong note gets moved over.

Any chance you can fix this and provide a new build with that fix?
Maybe just list the old SHA on its own, or use 40 zeroes as the target SHA, which is what Git usually shows for deleted stuff on the right or added stuff on the left.

@javabrett
Contributor Author

@Vampire

  • How did you clone the original remote repo?
  • Assuming that your repo is not shareable and/or too large anyway, could you write a script (as a gist) which creates and populates a sample repo containing some empty, or to-be-empty commits to be pruned, the output/result of running this BFG PR against that, including the object-id-map.old-new.txt, and the desired-vs-observed commentary.

@Vampire

Vampire commented Jul 3, 2018

Here you have a fully self-contained example in one line, just adapt the path to the JAR:

mkdir foo && cd foo && git init && touch a b c d && git add a && git commit -m a && git add b && git commit -m b && git commit -m empty1 --allow-empty && git commit -m empty2 --allow-empty && git add c && git commit -m c && git add d && git commit -m d && git log --oneline && java -jar d:/Downloads/bfg-1.13.1-SNAPSHOT-prune-empty-commits-850d967.jar --prune-empty-commits --private . && git log --oneline && cat ..bfg-report/*/*/object-id-map.old-new.txt

When I executed it, the relevant output was:

aabfb19 (HEAD -> master) d
7be784c c
7bc2d92 empty2
ad64bd5 empty1
cf88014 b
31403a4 a
...
f6a3c67 (HEAD -> master) d
1c2ae2f c
cf88014 b
31403a4 a
7bc2d92a05e657e007fc62e0b2f6f9912e744d23 cf880142a982fa81ac1dae12f592d061cd17203a
7be784cd27626502757ffcf7a1105d5b3849a489 1c2ae2f1e87da6156694f4250d514c66d196bdf9
aabfb19b2338c4a0197735f609f996edda3240b9 f6a3c67f71ea14eaf97e9bdaca9707ffabd538fa
ad64bd54f3eb076ab084fc475e629a0b7fd92552 cf880142a982fa81ac1dae12f592d061cd17203a

Which means empty1 and empty2 were correctly removed, but in the object-id-map.old-new.txt you see that the old and new value of c and d are mentioned correctly and that empty1 and empty2 are stated to be mapped to b while they were actually removed.

So I'd either expect

7be784cd27626502757ffcf7a1105d5b3849a489 1c2ae2f1e87da6156694f4250d514c66d196bdf9
aabfb19b2338c4a0197735f609f996edda3240b9 f6a3c67f71ea14eaf97e9bdaca9707ffabd538fa

or

7bc2d92a05e657e007fc62e0b2f6f9912e744d23
7be784cd27626502757ffcf7a1105d5b3849a489 1c2ae2f1e87da6156694f4250d514c66d196bdf9
aabfb19b2338c4a0197735f609f996edda3240b9 f6a3c67f71ea14eaf97e9bdaca9707ffabd538fa
ad64bd54f3eb076ab084fc475e629a0b7fd92552

or

7bc2d92a05e657e007fc62e0b2f6f9912e744d23 0000000000000000000000000000000000000000
7be784cd27626502757ffcf7a1105d5b3849a489 1c2ae2f1e87da6156694f4250d514c66d196bdf9
aabfb19b2338c4a0197735f609f996edda3240b9 f6a3c67f71ea14eaf97e9bdaca9707ffabd538fa
ad64bd54f3eb076ab084fc475e629a0b7fd92552 0000000000000000000000000000000000000000

The latter is inspired by output of commands like git-diff or git-show, which write 40 zeroes instead of a SHA for a thing that doesn't exist, e. g. when a file is added or removed in a diff.

@javabrett
Contributor Author

I'm having trouble seeing what is wrong with what is currently-logged - it seems to be working exactly as-designed. I'm also having trouble understanding how your proposed change improves things, but hopefully you can explain.

The old->new mapping file is designed as a record of what BFG removed and what it was replaced by. For pruned empty commits I'll admit this is a little more subtle, because the commit is removed, but I claim this is equivalent to rolling/amending it into its nearest non-empty ancestor commit, which therefore replaces it. I can't think of any other "new" commit to log which better describes what has happened when the commit was pruned. As far as Git is concerned this is exactly what has happened: the commit c (now 1c2ae2f) has a new parent b (now cf88014). empty1 and empty2 (which used to be c's parent) have been "replaced" by b.

Your proposal suggests that it is more useful to record empty1 and empty2 as dangling, dropped, not replaced. Then there would be no way of linking or reporting their nearest new-point in the history, i.e. their closest non-empty ancestor. Sorry but I just don't see how this is useful.

What is your script doing (reattaching notes)? Maybe it needs to change how it handles/parses this file.

Can you explain in detail why it is a) terrible to mention the removed empty commit's replacement as its nearest non-empty ancestor and b) why better to provide no link at all.

@Vampire

Vampire commented Jul 3, 2018

Well, because that is not the purpose of the file.
The purpose of the file is a mapping of "old commit" to "new commit".
For empty1 and empty2 there are no new commits, they were removed.
If you think the nearest parent information is useful, maybe you can add it as a third field in the line?

What this file is technically useful for, is that you have a 1:1 mapping old-commit to new-commit, e. g. if you need a lookup table.

In my case that's exactly what I used it for and what is recommended in one of the commits in #188.
What I run with the file is

git notes copy --stdin < ..bfg-report/*/*/object-id-map.old-new.txt
cat ..bfg-report/*/*/object-id-map.old-new.txt | cut -d ' ' -f 1 | git notes remove --stdin
git notes prune

this is a migration of a big old SVN repo with the KDE svn2git with the option to mention SVN revisions as git notes.
After BFG has done its work, the notes need to be moved to the new commits as BFG is not capable of this yet.

But if there is no correct 1:1 mapping, or rather if you cannot see in the file that a line is for the removal of a commit, you have no chance to do this correctly.
The lines in the file are sorted alpha-numerically, so it is random whether you get the most-parent commit first or another one and which you get last.
You can either use -f to force overwriting of notes, which would always make the last one win, or you could omit -f like in my example, in which case the first one always wins.
But you cannot get this correctly done without the information which line is a removal.

So if the output would e. g. have been

7bc2d92a05e657e007fc62e0b2f6f9912e744d23 0000000000000000000000000000000000000000 cf880142a982fa81ac1dae12f592d061cd17203a
7be784cd27626502757ffcf7a1105d5b3849a489 1c2ae2f1e87da6156694f4250d514c66d196bdf9
aabfb19b2338c4a0197735f609f996edda3240b9 f6a3c67f71ea14eaf97e9bdaca9707ffabd538fa
ad64bd54f3eb076ab084fc475e629a0b7fd92552 0000000000000000000000000000000000000000 cf880142a982fa81ac1dae12f592d061cd17203a

the file could be used properly and still has the information about the nearest non-empty parent.

And you would also have a chance to see actual errors while moving notes, as currently I get a whole bunch of errors because of this.

Another example, with a being original commit and a' being rewritten commit, if you have

c
empty3
empty2
b
empty1
a

then the result will be

c'
b'
a

with mapping file

     b b'
     c c'
empty1 a
empty2 b'
empty3 b'

All three that are mapped to b' can occur in any order as it is sorted alphanumerically by source SHA.

How would I parse that file to do the proper work?
Better would be

     b b'
     c c'
empty1 0000000000000000000000000000000000000000 a
empty2 0000000000000000000000000000000000000000 b'
empty3 0000000000000000000000000000000000000000 b'

then it can be handled clearly and easily, and no information is lost.
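With the three-column layout proposed here, separating rewrites from prunes becomes a simple filter. This is a rough sketch assuming that proposed format only (released BFG versions are not known to write it); `split_map` and the output file names are hypothetical.

```shell
# split_map MAP_FILE REWRITTEN_OUT PRUNED_OUT   (hypothetical helper)
# Assumes the proposed layout: rewritten commits as "<old> <new>", pruned
# commits as "<old> <40 zeroes> <nearest surviving parent>".
split_map() {
  : > "$2"   # rewritten pairs: "<old> <new>"
  : > "$3"   # pruned commits:  "<old> <nearest surviving parent>"
  awk -v rewritten="$2" -v pruned="$3" '
    # An all-zero second column marks a pruned commit.
    NF >= 3 && $2 ~ /^0+$/ { print $1, $3 >> pruned; next }
    NF >= 2               { print $1, $2 >> rewritten }
  ' "$1"
}

# Possible follow-up, mirroring the notes commands quoted above:
#   git notes copy --stdin < rewritten.txt
```

The pruned list keeps the nearest surviving parent, so whoever runs the migration can decide whether notes on pruned commits should be dropped or moved to that parent.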

@javabrett
Contributor Author

Is there a reason you aren't using https://github.com/jwiegley/git-scripts/blob/master/git-remove-empty-commits or advice from https://stackoverflow.com/questions/26683792/how-can-i-find-empty-git-commits ? Execution time?

Maybe you could use those to pre-filter the mapping-file to remove pruned commits.

@Vampire

Vampire commented Jul 3, 2018

Is https://github.com/jwiegley/git-scripts/blob/master/git-remove-empty-commits any better or faster than just doing git filter-branch --prune-empty --tag-name-filter cat -- --all?

If not, then yes, due to speed. Using filter-branch is awfully slow when BFG can do it in 15-30 minutes for the 190_000 revisions repo.

Why don't you think the mapping file should be enriched with that information?
Even a "-" as last character or whatever would be enough, just something to identify that the line is a prune.
Using 0000000000000000000000000000000000000000 is just due to Git doing it that way in other places as described. But anything that lets me identify which line is a prune and which not would be fine.

Why the need to have some additional slow filtering on the file based on the not-processed repository when bfg already has the information present and could provide it easily?

@Vampire

Vampire commented Jul 3, 2018

Or if you don't want to change that file, how about a second file that lists the pruned commits one per line, that would also be sufficient.

@Vampire

Vampire commented Jul 3, 2018

It needs more than an hour just to do the command from your second link and then I'd never know if this is really 100% the same BFG will do which already has the information.
So I'd really appreciate if BFG could provide that information in the existing file or in another file or however.
I'd implement it myself, but I never used Scala, so that's a bit hard for me.

@Vampire

Vampire commented Jul 3, 2018

Ok, I learned enough Scala and trusted my beloved IntelliJ to even come up with a patch now. :-)
make_pruned_commits_differentiatable_in_report_file.patch.gz

diff --git a/bfg-library/src/main/scala/com/madgag/git/bfg/cleaner/ObjectIdCleaner.scala b/bfg-library/src/main/scala/com/madgag/git/bfg/cleaner/ObjectIdCleaner.scala
index 9e61007..d6f544e 100644
--- a/bfg-library/src/main/scala/com/madgag/git/bfg/cleaner/ObjectIdCleaner.scala
+++ b/bfg-library/src/main/scala/com/madgag/git/bfg/cleaner/ObjectIdCleaner.scala
@@ -20,7 +20,7 @@

 package com.madgag.git.bfg.cleaner

-import com.madgag.collection.concurrent.ConcurrentMultiMap
+import com.madgag.collection.concurrent.{ConcurrentMultiMap, ConcurrentSet}
 import com.madgag.git._
 import com.madgag.git.bfg.GitUtil._
 import com.madgag.git.bfg.cleaner.protection.{ProtectedObjectCensus, ProtectedObjectDirtReport}
@@ -64,6 +64,7 @@ class ObjectIdCleaner(config: ObjectIdCleaner.Config, objectDB: ObjectDatabase,

   val changesByFilename = new ConcurrentMultiMap[FileName, (ObjectId, ObjectId)]
   val deletionsByFilename = new ConcurrentMultiMap[FileName, ObjectId]
+  val prunedCommits = new ConcurrentSet[ObjectId]

   // want to enforce that once any value is returned, it is 'good' and therefore an identity-mapped key as well
   val memo: Memo[ObjectId, ObjectId] = MemoUtil.concurrentCleanerMemo(protectedObjectCensus.fixedObjectIds)
@@ -102,7 +103,10 @@ class ObjectIdCleaner(config: ObjectIdCleaner.Config, objectDB: ObjectDatabase,

     val cleanedArcs = originalCommit.arcs cleanWith this

-    if (config.pruneEmptyCommits && cleanedArcs.isEmptyCommit) cleanedArcs.parents.headOption.getOrElse(ObjectId.zeroId()) else {
+    if (config.pruneEmptyCommits && cleanedArcs.isEmptyCommit) {
+      prunedCommits += commitId
+      cleanedArcs.parents.headOption.getOrElse(ObjectId.zeroId())
+    } else {
       val kit = new CommitNodeCleaner.Kit(threadLocalResources, originalRevCommit, originalCommit, cleanedArcs, apply)
       val updatedCommitNode = commitNodeCleaner.fixer(kit)(originalCommit.node)
       val updatedCommit = Commit(updatedCommitNode, cleanedArcs)
diff --git a/bfg-library/src/main/scala/com/madgag/git/bfg/cleaner/Reporter.scala b/bfg-library/src/main/scala/com/madgag/git/bfg/cleaner/Reporter.scala
index 691b44d..d894f52 100644
--- a/bfg-library/src/main/scala/com/madgag/git/bfg/cleaner/Reporter.scala
+++ b/bfg-library/src/main/scala/com/madgag/git/bfg/cleaner/Reporter.scala
@@ -242,9 +242,16 @@ class CLIReporter(repo: Repository) extends Reporter {
       case (filename, oldIds) => (filename, Text.abbreviate(oldIds.map(oldId => oldId.shortName + oldId.sizeOpt.map(size => s" (${ByteSize.format(size)})").mkString), "...").mkString(", "))
     } { oldId => Seq(oldId.name, oldId.sizeOpt.mkString) }

-    println(s"\n\nIn total, ${changedIds.size} object ids were changed. Full details are logged here:\n\n\t${reportsDir.path}")
+    val prunedCommits = objectIdCleaner.prunedCommits
+    println(s"\n\n")
+    if (prunedCommits.nonEmpty) {
+      println(s"In total, ${prunedCommits.size} empty commits were pruned.")
+    }
+    println(s"In total, ${changedIds.size} object ids were changed. Full details are logged here:\n\n\t${reportsDir.path}")

-    mapFile.writeStrings(SortedMap[AnyObjectId, ObjectId](changedIds.toSeq: _*).view.map { case (o,n) => s"${o.name} ${n.name}"}, "\n")
+    mapFile.writeStrings(SortedMap[AnyObjectId, ObjectId](changedIds.toSeq: _*).view.map {
+      case (o,n) => if (prunedCommits.contains(o)) s"${o.name} 0000000000000000000000000000000000000000 ${n.name}" else s"${o.name} ${n.name}"
+    }, "\n")

     cacheStatsFile.writeStrings(objectIdCleaner.stats().seq.map(_.toString()), "\n")

@javabrett
Contributor Author

> Ok, I learned enough Scala and trusted my beloved IntelliJ to even come up with a patch now. :-)

Nice work! If I could be so bold as to suggest that in-addition to the gzipped-patch, you might like to fork either the master BFG repo or my fork, and put your changes on a branch. That will make it much easier for that version to be selected either for merge or by others that need your change.

@Vampire

Vampire commented Jul 4, 2018

Sure @javabrett, I made a PR to your PR branch :-)

@Vampire

Vampire commented Jul 12, 2018

@javabrett do you consider applying my PR to your PR branch, or rather not? :-)

@javabrett
Contributor Author

@Vampire I appreciate you putting this on a branch, so that if/when this PR is considered, all options and feedback are readily available. However note that a) I'm not a member on rtyley/bfg-repo-cleaner and b) I don't maintain or plan to maintain a built fork. So currently I wait to see if this PR is considered for merging, along with your changes, which can be easily incorporated without merging them into my PR branch.

@Vampire

Vampire commented Jul 16, 2018

I'm clear about a and b and didn't assume either.
I just thought you could incorporate my changes into your PR, so that both (hopefully) get merged together. :-)

@mloskot

mloskot commented Dec 6, 2018

FYI, I've built @javabrett 's branch, rebased against the latest master, and used against a largish repo. It worked beautifully, pruning 1K empties of 25K commits. Thanks a lot for the very useful feature!

UPDATE: See #147 (comment) for JAR file with the patched build of BFG 1.13.1

p.s. Pity it's been two years w/o conclusive merge. Must be quite demotivating for @javabrett

@nfalco79

A very much appreciated feature. @rtyley any chance to get this merged?

@philippn

philippn commented Feb 13, 2020

@javabrett I have used the patched JAR you posted on a very large Git repo (after having pruned JAR files out of it) and it worked like a charm. Thanks a lot!

@SeekingMeaning

Hello all, I have created an experimental version that supports empty commits that have multiple parents (e.g. merge commits): https://github.com/SeekingMeaning/bfg-repo-cleaner/raw/prune-empty-commits-built/bfg/target/bfg-1.13.3-SNAPSHOT-prune-empty-commits-built-13a7243-dirty.jar

Whoaa512 added a commit to Whoaa512/bfg-repo-cleaner that referenced this pull request Apr 13, 2020
rtyley#147

Squashed commit of the following:

commit 850d967
Author: Brett Randall <javabrett@gmail.com>
Date:   Tue Feb 6 20:39:47 2018 +1100

    Updated --prune-empty-commits test: specs2 -> scalatest.

commit c008b83
Author: Brett Randall <javabrett@gmail.com>
Date:   Mon May 16 09:17:33 2016 +1000

    Consider --prune-empty-commits option as work on-its-own, allow BFG to run with prune-empty-commits as its only cleaning-task.

commit ea4c8a2
Author: Brett Randall <javabrett@gmail.com>
Date:   Fri May 13 23:00:31 2016 +1000

    API updates to bring this up to master 8abe03c 1.12.13-SNAPSHOT.

commit 56c4cfe
Author: Martin Dengler <martin@martindengler.com>
Date:   Tue Dec 22 14:08:39 2015 -0600

    Prune empty commits test typo fix

commit 8b6366d
Author: Roberto Tyley <roberto.tyley@gmail.com>
Date:   Fri May 9 09:11:54 2014 +0100

    Add nasty nasty code to address pruning the initial commit...

    ...do we want to go this far!?

commit 1caf6f1
Author: Roberto Tyley <roberto.tyley@gmail.com>
Date:   Sat May 10 13:01:54 2014 +0100

    Prune empty commits test

commit 2f866b5
Author: Roberto Tyley <roberto.tyley@gmail.com>
Date:   Sun Apr 6 23:11:14 2014 +0100

    Add the option to prune empty commits (issue rtyley#27)

    This feature removes commits that- after the cleaning process -contain *no*
    file-tree change when compared to their parent commit. This would be
    because the cleaning process has cleaned away whatever content it was that
    was _changing_ in the original commit.

    The option is off by default, it's activated by using the
    `--prune-empty-commits` flag, eg:

    $ bfg --delete-files foo --prune-empty-commits

    rtyley#27
@takanuva15

takanuva15 commented Sep 8, 2021

@javabrett hey do you think you can rebase against master and fix the conflicting file in this PR? Also, if you have already built a jar file that includes your PR's changes, can you point me to the link to download that jar?

@mloskot

mloskot commented Sep 8, 2021

@takanuva15 Here is mine that I mentioned in #147 (comment)

It is the BFG 1.13.1 patched for --prune-empty-commits support using this PR by @javabrett
It worked perfectly for me and removed all empty commits dangling after directories extraction into new submodule repositories.

bfg-1.13.1-SNAPSHOT-prune-empty-commits-850d967.jar.zip (12 MB)

@fireundubh

fireundubh commented Sep 11, 2021

> @takanuva15 Here is mine that I mentioned in #147 (comment)
>
> It is the BFG 1.13.1 patched for --prune-empty-commits support using this PR by @javabrett
> It worked perfectly for me and removed all empty commits dangling after directories extraction into new submodule repositories.
>
> bfg-1.13.1-SNAPSHOT-prune-empty-commits-850d967.jar.zip (12 MB)

Quite the joker. And here I thought you did this. @rtyley actually vandalized his own project in d2713b4. Definitely a stable developer.

BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive


--
You can rewrite history in Git - don't let Trump do it for real!
Trump's administration has lied consistently, to make people give up on ever
being told the truth. Don't give up: https://www.rescue.org/topic/refugees-america
--

@takanuva15

@fireundubh The commit was reverted in 0d80de6, which is present in v1.14.0

@mloskot Thanks for giving a zip-file link with your built jar. Do you know if it's possible to rebase this PR with the latest changes in master easily? (I haven't worked with Scala before)

@mloskot

mloskot commented Sep 14, 2021

@takanuva15

> Do you know if it's possible to rebase this PR with the latest changes in master easily?

No idea. I've never developed anything in Scala/Java really. In 2018, I had cloned, rebased, built the thing and it just worked.

javabrett closed this on Oct 31, 2022