Prune empty commits #147

Closed
javabrett wants to merge 7 commits into rtyley:master from javabrett:prune-empty-commits

@javabrett

Contributor

These are Roberto's initial changes (for #27), the fixes from Martin (#121), and a commit I have added to get it compiling and passing tests against the latest master, 8abe03c.

I tested this with and without the new option - it looks like it does a sterling job of removing existing empty-commits, and those made empty by BFG-cleaning.

@donnib

donnib commented Jun 8, 2016

@javabrett until @rtyley merges this in, can you provide a jar file I can try so I don't have to build it myself?

@javabrett

javabrett commented Jun 9, 2016

Contributor Author

https://github.com/javabrett/bfg-repo-cleaner/raw/prune-empty-commits-built/bfg/target/bfg-1.13.1-SNAPSHOT-prune-empty-commits-850d967.jar

Edit: Updated link with rebased rebuild.

@clembou

clembou commented Jul 29, 2016

@javabrett Thanks for the jar, I just used it to prune commits from our massive repo after some extensive BFG surgery and it worked a treat 👍

I still had to run git-filter-branch because BFG wasn't quite able to do what I needed, but since I pruned a bunch of commits first using your build, it ran much faster: it took 5 hours instead of 4 days 😄

@wyaeld

wyaeld commented Jan 17, 2017

@rtyley is this PR still being considered?

@javabrett javabrett force-pushed the prune-empty-commits branch from 52a2ae7 to 3cf762b on January 17, 2017 23:38
@javabrett

javabrett commented Jan 17, 2017

Contributor Author

Rebased to current master resolving a small import conflict. Retested with latest Scala-SBT.

@jwnewman12

jwnewman12 commented Aug 8, 2017

I just tested this and it worked quite well. I deleted an entire subdirectory out of this project, and was left with a few hundred 'empty' commits from work within that subdirectory. This fork removed all of those empty commits.

Except, merge commits that were only merging empty (now removed) commits in are still there. e.g.,

o   main work 5
o   Merge subdir branch into master 4
|\
| o more work on subdir branch 3
| o work on subdir branch 2
|/
o   main work base 1

Using this fork I get commit 2 & 3 removed (yes) but commit 4 is still left there between 1 and 5. Ideally those would be detected and removed as well. Merge commits where the one ancestor was all empty. 😆

Update ...

I was able to get around the above just using git-filter-branch in a second pass after bfg. It works fine, but is of course much slower than BFG. It takes about 40 minutes to run on this repo, vs bfg doing much more work in less than 2 minutes.

$ git filter-branch --commit-filter "echo -n ${GIT_COMMIT}, >> ${map_file} ; git_commit_non_empty_tree "$@" | tee -a ${map_file}"

removes those now pointless merge commits and provides yet another commit mapping file, which is easily joined with that from bfg to provide the final mapping.

I think just running bfg again with --prune-empty (and no other dirt specified) would potentially do the trick here, but alas it says 'nothing to do, exiting'. So perhaps the PR here could be updated to have --prune-empty be considered as ... not nothing. Or as I originally asked, the isEmptyCommit function could be enhanced to detect these empty merges. Or, do nothing and have people asking for this fall down to a second pass with filter-branch.

But thumbs up on this PR, it otherwise worked nicely.

@javabrett

Contributor Author

@rtyley This is one of the more popular PRs - pruning empty commits: those that BFG creates, and pre-existing ones.

I could rebase this again and resolve the conflicts, but before doing so I wanted to check whether this is likely to be mergeable (ever). Do you think it is a suitable enhancement for BFG mainline, or if not, what changes might get it merged?

rtyley and others added 7 commits February 6, 2018 20:40
This feature removes commits that- after the cleaning process -contain *no*
file-tree change when compared to their parent commit. This would be
because the cleaning process has cleaned away whatever content it was that
was _changing_ in the original commit.

The option is off by default, it's activated by using the
`--prune-empty-commits` flag, eg:

$ bfg --delete-files foo --prune-empty-commits

rtyley#27
…o run with prune-empty-commits as its only cleaning-task.
@javabrett javabrett force-pushed the prune-empty-commits branch from 3cf762b to 850d967 on February 6, 2018 10:01
@reggi

reggi commented Apr 26, 2018

Would love to know what the status is here. Any update in getting this PR merged?

@Nessworthy

Seems like this repo is dying? No commits in 5 months. Even the jar posted above is a 404.

@javabrett

Contributor Author

@Nessworthy I rebuilt that jar from the rebased PR and updated the link above.

@Vampire

Vampire commented Jun 30, 2018

Is it BFG or your PR that makes it real slow the longer it runs?
I tested with a small 2_146 commits repo first with just --prune-empty-commits --private and it was blazingly fast, finishing in 6 seconds.
Then I started on our main repo that has 362_572 commits according to progress display 17 hours ago.
It currently is at commit 157_624 and needs several minutes per commit.

@javabrett

javabrett commented Jul 1, 2018

Contributor Author

Is it BFG or your PR that makes it real slow the longer it runs?

Did you try the same runs on the GA version of BFG, without --prune-empty-commits? Or is that the only work you need BFG to do?

@Vampire

Vampire commented Jul 1, 2018

It's the only work I need BFG for, so no.

@javabrett

Contributor Author

My initial suggestion is to see if you can get some performance stats from JVM instrumentation, see if you can identify any hotspots. Or even take a few thread dumps.

@Vampire

Vampire commented Jul 2, 2018

Forget it, seems to have been a memory problem.
The system where it slowed down after the first 40 percent or so (which were done in a couple of minutes) was a VM server with only 2 GiB of total RAM.
When I executed it locally on my machine, it went through in about 15 minutes.

@Vampire

Vampire commented Jul 2, 2018

One caveat though.
After the cleaning, I copy over the notes with the help of the generated object-id-map.old-new.txt file.
But it seems the left-out commits are also mentioned there, I get many "failed to copy notes from ... to ...".

@Vampire

Vampire commented Jul 2, 2018

Urgh, the problem is more problematic.

object-id-map.old-new.txt contains e. g.

0a7c8f290b0bf07bb9598afb5539eae047029d58 f99bcd68fb481806f9b1e72f0049f6a35eaa004b
2247be2137aeacb0a168d8802f52313e37610b7d f99bcd68fb481806f9b1e72f0049f6a35eaa004b
53908fd7087fc25a2014bde193d251d9220b7bff f99bcd68fb481806f9b1e72f0049f6a35eaa004b
55b291b867e2c55ebd90faba9efde5cd98885894 f99bcd68fb481806f9b1e72f0049f6a35eaa004b
6b2269659ee97bc3f6af1b14a63fe7342439d730 f99bcd68fb481806f9b1e72f0049f6a35eaa004b
6dbb2dd8d9ca46841709e03a2dd7b781421dec04 f99bcd68fb481806f9b1e72f0049f6a35eaa004b
8246b83e3eacb424139588a5c3208e4e7407fc8b f99bcd68fb481806f9b1e72f0049f6a35eaa004b
87e37f755b81166607d93d92a1435aee0a261128 f99bcd68fb481806f9b1e72f0049f6a35eaa004b
a3e68fb32b58c2405433c3cc16f6bccb6be2f8b9 f99bcd68fb481806f9b1e72f0049f6a35eaa004b
d351aab97b8a2b1d26f47a4b0bb39931599d8133 f99bcd68fb481806f9b1e72f0049f6a35eaa004b
e22b1fe7520838da2a17fa5b29bede137fdc8de5 f99bcd68fb481806f9b1e72f0049f6a35eaa004b
e3f4dc794f345173e17189bb8eb4674f49f6da37 f99bcd68fb481806f9b1e72f0049f6a35eaa004b
f504f15f389d67a044a06b5b23c0a9a210185198 f99bcd68fb481806f9b1e72f0049f6a35eaa004b
fda5495ab6d80524c0c960aa7f62684d19bb8aa9 f99bcd68fb481806f9b1e72f0049f6a35eaa004b

which means it states that it mapped all those commits to that one new commit, which is wrong.
f99bcd68fb481806f9b1e72f0049f6a35eaa004b is the parent of e22b1fe7520838da2a17fa5b29bede137fdc8de5 and the others are children, grand-children and so on of that one. All of those are empty commits, so should have been removed.

Yet they are listed in the object-id-map.old-new.txt file as being mapped to the same commit.

AND also very important, they are not removed. I still see e22b1fe7520838da2a17fa5b29bede137fdc8de5 and the others in the result, which is not expected after I asked BFG to remove the empty commits.
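
As a rough illustration of the observation above, grouping the mapping file by its second column shows which new IDs are the target of more than one old ID. This is only a sketch, not part of BFG: the function name `many_to_one_targets` is made up for this example, and it assumes the two-column `old new` layout shown in the excerpt.

```shell
# Hypothetical helper (not part of BFG): list every new ID that more than
# one old ID maps to, prefixed by the number of old IDs pointing at it.
# Assumes each line of the file is "<old-id> <new-id>".
many_to_one_targets() {
  awk '{ count[$2]++ }
       END { for (t in count) if (count[t] > 1) print count[t], t }' "$1" |
    sort -k2
}
```

Calling `many_to_one_targets path/to/object-id-map.old-new.txt` prints one line per shared target. For the excerpt above it would report a count of 14 for f99bcd68fb481806f9b1e72f0049f6a35eaa004b.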

@Vampire

Vampire commented Jul 2, 2018

Also some commits are twice now in the result, one rewritten, one not.

@Vampire

Vampire commented Jul 2, 2018

Also some refs did not get rewritten.
If I look at object-id-map.old-new.txt, there is a line

faf3b470d9938394529c14495bfdd3a3946bdf0e b1a422540fea3f8cf6f6251d24cb20dc915d0208

but the ref is still pointing at faf3b470d9938394529c14495bfdd3a3946bdf0e instead of at b1a422540fea3f8cf6f6251d24cb20dc915d0208.

This is probably also why there are still non-rewritten commits in the result, as the refs were not updated properly.

@Vampire

Vampire commented Jul 2, 2018

The output says Found 3077 commit-pointing refs, Updating 3075 Refs and Updating references: 100% (3075/3075), so it might actually "just" be two refs that were not updated. Who knows though :-/

@Vampire

Vampire commented Jul 2, 2018

Ah, this seems to be caused by different refs with the same name but different capitalisation.
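
One way to check a repository for this situation is to group ref names case-insensitively. The sketch below is illustrative only: `case_colliding_refs` is a made-up name, and it simply reads ref names from standard input without consulting Git itself.

```shell
# Hypothetical helper: read ref names on stdin and print each group of names
# that differ only by letter case, one group per line.
case_colliding_refs() {
  awk '{ key = tolower($0)
         if (n[key]++) group[key] = group[key] " " $0
         else group[key] = $0 }
       END { for (k in n) if (n[k] > 1) print group[k] }' |
    sort
}
```

It could be fed with `git for-each-ref --format='%(refname)' | case_colliding_refs`. An empty result means no two refs differ only by capitalisation.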

@Vampire

Vampire commented Jul 3, 2018

It still says 3075 instead of 3077 rewritten refs, but I guess one of them is HEAD, no idea what the other is.

But the wrong entries in the object-id-map.old-new.txt really are a problem.
The first that is copied to the target wins; subsequent attempts fail as there is already a note.
And as the entries are sorted alphanumerically by old SHA, it is random whether the correct or a wrong note gets moved over.

Any chance you can fix this and provide a new build with that fix?
Maybe just list the old SHA alone, or use 40 zeros as the target SHA, which is what is usually used for deleted stuff on the right or added stuff on the left.
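
To make the "first one wins" effect concrete, the following sketch keeps only the first line per target ID, in file order. It is not part of BFG; the name `first_source_per_target` is invented here, and it assumes every old commit listed actually carries a note, so that the first copy to a target succeeds and later ones are rejected.

```shell
# Hypothetical helper: for each new ID, print only the first "<old> <new>"
# line that targets it. Under the assumption stated above, that is the line
# whose note would end up on the target when copying in file order without -f.
first_source_per_target() {
  awk '!seen[$2]++' "$1"
}
```

Because the file is sorted by old SHA, the line that survives for a shared target depends only on how the SHAs happen to sort, which is the randomness described above.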

@javabrett

Contributor Author

@Vampire

  • How did you clone the original remote repo?
  • Assuming that your repo is not shareable and/or too large anyway, could you write a script (as a gist) which creates and populates a sample repo containing some empty, or to-be-empty commits to be pruned, the output/result of running this BFG PR against that, including the object-id-map.old-new.txt, and the desired-vs-observed commentary.

@Vampire

Vampire commented Jul 3, 2018

Here you have a fully self-contained example in one line, just adapt the path to the JAR:

mkdir foo && cd foo && git init && touch a b c d && git add a && git commit -m a && git add b && git commit -m b && git commit -m empty1 --allow-empty && git commit -m empty2 --allow-empty && git add c && git commit -m c && git add d && git commit -m d && git log --oneline && java -jar d:/Downloads/bfg-1.13.1-SNAPSHOT-prune-empty-commits-850d967.jar --prune-empty-commits --private . && git log --oneline && cat ..bfg-report/*/*/object-id-map.old-new.txt

When I executed it, the relevant output was:

aabfb19 (HEAD -> master) d
7be784c c
7bc2d92 empty2
ad64bd5 empty1
cf88014 b
31403a4 a
...
f6a3c67 (HEAD -> master) d
1c2ae2f c
cf88014 b
31403a4 a
7bc2d92a05e657e007fc62e0b2f6f9912e744d23 cf880142a982fa81ac1dae12f592d061cd17203a
7be784cd27626502757ffcf7a1105d5b3849a489 1c2ae2f1e87da6156694f4250d514c66d196bdf9
aabfb19b2338c4a0197735f609f996edda3240b9 f6a3c67f71ea14eaf97e9bdaca9707ffabd538fa
ad64bd54f3eb076ab084fc475e629a0b7fd92552 cf880142a982fa81ac1dae12f592d061cd17203a

Which means empty1 and empty2 were correctly removed, but in the object-id-map.old-new.txt you see that the old and new value of c and d are mentioned correctly and that empty1 and empty2 are stated to be mapped to b while they were actually removed.

So I'd either expect

7be784cd27626502757ffcf7a1105d5b3849a489 1c2ae2f1e87da6156694f4250d514c66d196bdf9
aabfb19b2338c4a0197735f609f996edda3240b9 f6a3c67f71ea14eaf97e9bdaca9707ffabd538fa

or

7bc2d92a05e657e007fc62e0b2f6f9912e744d23
7be784cd27626502757ffcf7a1105d5b3849a489 1c2ae2f1e87da6156694f4250d514c66d196bdf9
aabfb19b2338c4a0197735f609f996edda3240b9 f6a3c67f71ea14eaf97e9bdaca9707ffabd538fa
ad64bd54f3eb076ab084fc475e629a0b7fd92552

or

7bc2d92a05e657e007fc62e0b2f6f9912e744d23 0000000000000000000000000000000000000000
7be784cd27626502757ffcf7a1105d5b3849a489 1c2ae2f1e87da6156694f4250d514c66d196bdf9
aabfb19b2338c4a0197735f609f996edda3240b9 f6a3c67f71ea14eaf97e9bdaca9707ffabd538fa
ad64bd54f3eb076ab084fc475e629a0b7fd92552 0000000000000000000000000000000000000000

The latter is inspired by output of commands like git-diff or git-show, which write 40 zeroes instead of a SHA for a thing that doesn't exist, e. g. when a file is added or removed in a diff.

@javabrett

Contributor Author

I'm having trouble seeing what is wrong with what is currently-logged - it seems to be working exactly as-designed. I'm also having trouble understanding how your proposed change improves things, but hopefully you can explain.

The old->new mapping file is designed as a record of what BFG removed and what it was replaced by. For pruned empty commits I'll admit this is a little more subtle, because the commit is removed, but I claim this is equivalent to rolling/amending it into its nearest non-empty ancestor commit, which therefore replaces it. I can't think of any other "new" commit to log which better describes what has happened when the commit was pruned. As far as Git is concerned this is exactly what has happened - the commit c (now 1c2ae2f) has a new parent b (now cf88014). empty1 and empty2 (which used to be c's parent) have been "replaced" by b.

Your proposal suggests that it is more useful to record empty1 and empty2 as dangling, dropped, not replaced. Then there would be no way of linking or reporting their nearest new-point in the history, i.e. their closest non-empty ancestor. Sorry but I just don't see how this is useful.

What is your script doing (reattaching notes)? Maybe it needs to change how it handles/parses this file.

Can you explain in detail a) why it is terrible to mention the removed empty commit's replacement as its nearest non-empty ancestor and b) why it is better to provide no link at all?

@Vampire

Vampire commented Jul 3, 2018

Well, because that is not the purpose of the file.
The purpose of the file is a mapping of "old commit" to "new commit".
For empty1 and empty2 there are no new commits, they were removed.
If you think the nearest parent information is useful, maybe you can add it as a third field in the line?

What this file is technically useful for, is that you have a 1:1 mapping old-commit to new-commit, e. g. if you need a lookup table.

In my case that's exactly what I used it for and what is recommended in one of the commits in #188.
What I call with the file is

git notes copy --stdin < ..bfg-report/*/*/object-id-map.old-new.txt
cat ..bfg-report/*/*/object-id-map.old-new.txt | cut -d ' ' -f 1 | git notes remove --stdin
git notes prune

this is a migration of a big old SVN repo with the KDE svn2git with the option to mention SVN revisions as git notes.
After BFG has done its work, the notes need to be moved to the new commits as BFG is not capable of this yet.

But if there is no correct 1:1 mapping, or rather if you cannot see in the file that a line is for the removal of a commit, you have no chance to do this correctly.
The lines in the file are sorted alpha-numerically, so it is random whether you get the most-parent commit first or another one and which you get last.
You can either use -f to force overwriting of notes, which would always make the last one win, or you could omit -f like in my example, in which case the first one always wins.
But you cannot get this correctly done without the information which line is a removal.

So if the output would e. g. have been

7bc2d92a05e657e007fc62e0b2f6f9912e744d23 0000000000000000000000000000000000000000 cf880142a982fa81ac1dae12f592d061cd17203a
7be784cd27626502757ffcf7a1105d5b3849a489 1c2ae2f1e87da6156694f4250d514c66d196bdf9
aabfb19b2338c4a0197735f609f996edda3240b9 f6a3c67f71ea14eaf97e9bdaca9707ffabd538fa
ad64bd54f3eb076ab084fc475e629a0b7fd92552 0000000000000000000000000000000000000000 cf880142a982fa81ac1dae12f592d061cd17203a

the file could be used properly and still has the information about the nearest non-empty parent.

And you would also have a chance to see actual errors while moving notes, as currently I get a whole bunch of errors because of this.

Another example, with a being original commit and a' being rewritten commit, if you have

c
empty3
empty2
b
empty1
a

then the result will be

c'
b'
a

with mapping file

     b b'
     c c'
empty1 a
empty2 b'
empty3 b'

All three that are mapped to b' can occur in any order as it is sorted alphanumerically by source SHA.

How would I parse that file to do the proper work?
Better would be

     b b'
     c c'
empty1 0000000000000000000000000000000000000000 a
empty2 0000000000000000000000000000000000000000 b'
empty3 0000000000000000000000000000000000000000 b'

Then it can be handled clearly and easily, and no information is lost.
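
If the mapping file used the three-column form proposed above for pruned commits, separating the two kinds of line would be straightforward. The sketch below handles only that proposed layout, which released BFG does not write; the names `ZERO_ID`, `rewritten_pairs` and `pruned_commits` are made up for illustration.

```shell
# Sketch for the *proposed* format only (an assumption, not BFG's output):
#   "<old> <new>"                  rewritten commit
#   "<old> <40 zeros> <nearest>"   pruned commit
ZERO_ID=0000000000000000000000000000000000000000

# Print "<old> <new>" pairs for commits that still exist after cleaning.
# Appending "" forces awk to compare the IDs as strings, not as numbers.
rewritten_pairs() {
  awk -v zero="$ZERO_ID" '($2 "") != (zero "") { print $1, $2 }' "$1"
}

# Print the old IDs of commits that were pruned.
pruned_commits() {
  awk -v zero="$ZERO_ID" '($2 "") == (zero "") { print $1 }' "$1"
}
```

The output of `rewritten_pairs` has the `<from> <to>` shape that `git notes copy --stdin` reads, and `pruned_commits` yields a plain list of IDs that could be handled separately, for example reviewed or passed to `git notes remove --stdin`.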

@javabrett

Contributor Author

Is there a reason you aren't using https://github.com/jwiegley/git-scripts/blob/master/git-remove-empty-commits or advice from https://stackoverflow.com/questions/26683792/how-can-i-find-empty-git-commits ? Execution time?

Maybe you could use those to pre-filter the mapping-file to remove pruned commits.

@Vampire

Vampire commented Jul 3, 2018

Is https://github.com/jwiegley/git-scripts/blob/master/git-remove-empty-commits any better or faster than just doing git filter-branch --prune-empty --tag-name-filter cat -- --all?

If not, then yes, due to speed. Using filter-branch is awfully slow when BFG can do it in 15-30 minutes for the 190_000 revisions repo.

Why don't you think the mapping file should be enriched with that information?
Even a "-" as last character or whatever would be enough, just something to identify that the line is a prune.
Using 0000000000000000000000000000000000000000 is just due to Git doing it that way in other places as described. But anything that lets me identify which line is a prune and which not would be fine.

Why the need to have some additional slow filtering on the file based on the not-processed repository when bfg already has the information present and could provide it easily?

@Vampire

Vampire commented Jul 3, 2018

Or if you don't want to change that file, how about a second file that lists the pruned commits one per line, that would also be sufficient.

@Vampire

Vampire commented Jul 3, 2018

It needs more than an hour just to do the command from your second link and then I'd never know if this is really 100% the same BFG will do which already has the information.
So I'd really appreciate if BFG could provide that information in the existing file or in another file or however.
I'd implement it myself, but I never used Scala, so that's a bit hard for me.

@Vampire

Vampire commented Jul 3, 2018

Ok, I learned enough Scala and trusted my beloved IntelliJ to even come up with a patch now. :-)
make_pruned_commits_differentiatable_in_report_file.patch.gz

diff --git a/bfg-library/src/main/scala/com/madgag/git/bfg/cleaner/ObjectIdCleaner.scala b/bfg-library/src/main/scala/com/madgag/git/bfg/cleaner/ObjectIdCleaner.scala
index 9e61007..d6f544e 100644
--- a/bfg-library/src/main/scala/com/madgag/git/bfg/cleaner/ObjectIdCleaner.scala
+++ b/bfg-library/src/main/scala/com/madgag/git/bfg/cleaner/ObjectIdCleaner.scala
@@ -20,7 +20,7 @@

 package com.madgag.git.bfg.cleaner

-import com.madgag.collection.concurrent.ConcurrentMultiMap
+import com.madgag.collection.concurrent.{ConcurrentMultiMap, ConcurrentSet}
 import com.madgag.git._
 import com.madgag.git.bfg.GitUtil._
 import com.madgag.git.bfg.cleaner.protection.{ProtectedObjectCensus, ProtectedObjectDirtReport}
@@ -64,6 +64,7 @@ class ObjectIdCleaner(config: ObjectIdCleaner.Config, objectDB: ObjectDatabase,

   val changesByFilename = new ConcurrentMultiMap[FileName, (ObjectId, ObjectId)]
   val deletionsByFilename = new ConcurrentMultiMap[FileName, ObjectId]
+  val prunedCommits = new ConcurrentSet[ObjectId]

   // want to enforce that once any value is returned, it is 'good' and therefore an identity-mapped key as well
   val memo: Memo[ObjectId, ObjectId] = MemoUtil.concurrentCleanerMemo(protectedObjectCensus.fixedObjectIds)
@@ -102,7 +103,10 @@ class ObjectIdCleaner(config: ObjectIdCleaner.Config, objectDB: ObjectDatabase,

     val cleanedArcs = originalCommit.arcs cleanWith this

-    if (config.pruneEmptyCommits && cleanedArcs.isEmptyCommit) cleanedArcs.parents.headOption.getOrElse(ObjectId.zeroId()) else {
+    if (config.pruneEmptyCommits && cleanedArcs.isEmptyCommit) {
+      prunedCommits += commitId
+      cleanedArcs.parents.headOption.getOrElse(ObjectId.zeroId())
+    } else {
       val kit = new CommitNodeCleaner.Kit(threadLocalResources, originalRevCommit, originalCommit, cleanedArcs, apply)
       val updatedCommitNode = commitNodeCleaner.fixer(kit)(originalCommit.node)
       val updatedCommit = Commit(updatedCommitNode, cleanedArcs)
diff --git a/bfg-library/src/main/scala/com/madgag/git/bfg/cleaner/Reporter.scala b/bfg-library/src/main/scala/com/madgag/git/bfg/cleaner/Reporter.scala
index 691b44d..d894f52 100644
--- a/bfg-library/src/main/scala/com/madgag/git/bfg/cleaner/Reporter.scala
+++ b/bfg-library/src/main/scala/com/madgag/git/bfg/cleaner/Reporter.scala
@@ -242,9 +242,16 @@ class CLIReporter(repo: Repository) extends Reporter {
       case (filename, oldIds) => (filename, Text.abbreviate(oldIds.map(oldId => oldId.shortName + oldId.sizeOpt.map(size => s" (${ByteSize.format(size)})").mkString), "...").mkString(", "))
     } { oldId => Seq(oldId.name, oldId.sizeOpt.mkString) }

-    println(s"\n\nIn total, ${changedIds.size} object ids were changed. Full details are logged here:\n\n\t${reportsDir.path}")
+    val prunedCommits = objectIdCleaner.prunedCommits
+    println(s"\n\n")
+    if (prunedCommits.nonEmpty) {
+      println(s"In total, ${prunedCommits.size} empty commits were pruned.")
+    }
+    println(s"In total, ${changedIds.size} object ids were changed. Full details are logged here:\n\n\t${reportsDir.path}")

-    mapFile.writeStrings(SortedMap[AnyObjectId, ObjectId](changedIds.toSeq: _*).view.map { case (o,n) => s"${o.name} ${n.name}"}, "\n")
+    mapFile.writeStrings(SortedMap[AnyObjectId, ObjectId](changedIds.toSeq: _*).view.map {
+      case (o,n) => if (prunedCommits.contains(o)) s"${o.name} 0000000000000000000000000000000000000000 ${n.name}" else s"${o.name} ${n.name}"
+    }, "\n")

     cacheStatsFile.writeStrings(objectIdCleaner.stats().seq.map(_.toString()), "\n")

@javabrett

Contributor Author

Ok, I learned enough Scala and trusted my beloved IntelliJ to even come up with a patch now. :-)

Nice work! If I could be so bold as to suggest that in-addition to the gzipped-patch, you might like to fork either the master BFG repo or my fork, and put your changes on a branch. That will make it much easier for that version to be selected either for merge or by others that need your change.

@Vampire

Vampire commented Jul 4, 2018

Sure @javabrett, I made a PR to your PR branch :-)

@Vampire

Vampire commented Jul 12, 2018

@javabrett do you consider applying my PR to your PR branch, or rather not? :-)

@javabrett

Contributor Author

@Vampire I appreciate you putting this on a branch, so that if/when this PR is considered, all options and feedback are readily available. However note that a) I'm not a member on rtyley/bfg-repo-cleaner and b) I don't maintain or plan to maintain a built fork. So currently I wait to see if this PR is considered for merging, along with your changes, which can be easily incorporated without merging them into my PR branch.

@Vampire

Vampire commented Jul 16, 2018

I'm clear about a and b and didn't assume either.
I just thought you could incorporate my changes into your PR, so that both (hopefully) get merged together. :-)

@mloskot

mloskot commented Dec 6, 2018

FYI, I've built @javabrett 's branch, rebased against the latest master, and used against a largish repo. It worked beautifully, pruning 1K empties of 25K commits. Thanks a lot for the very useful feature!

UPDATE: See #147 (comment) for JAR file with the patched build of BFG 1.13.1

p.s. Pity it's been two years w/o conclusive merge. Must be quite demotivating for @javabrett

@nfalco79

Very much appreciated feature. @rtyley any chance to get this merged?

@philippn

philippn commented Feb 13, 2020

@javabrett I have used the patched JAR you posted on a very largish Git repo (after having pruned JAR files out of it) and it worked like a charm. Thanks a lot!

@SeekingMeaning

Hello all, I have created an experimental version that supports empty commits that have multiple parents (e.g. merge commits): https://github.com/SeekingMeaning/bfg-repo-cleaner/raw/prune-empty-commits-built/bfg/target/bfg-1.13.3-SNAPSHOT-prune-empty-commits-built-13a7243-dirty.jar

Whoaa512 added a commit to Whoaa512/bfg-repo-cleaner that referenced this pull request Apr 13, 2020
rtyley#147

Squashed commit of the following:

commit 850d967
Author: Brett Randall <javabrett@gmail.com>
Date:   Tue Feb 6 20:39:47 2018 +1100

    Updated --prune-empty-commits test: specs2 -> scalatest.

commit c008b83
Author: Brett Randall <javabrett@gmail.com>
Date:   Mon May 16 09:17:33 2016 +1000

    Consider --prune-empty-commits option as work on-its-own, allow BFG to run with prune-empty-commits as its only cleaning-task.

commit ea4c8a2
Author: Brett Randall <javabrett@gmail.com>
Date:   Fri May 13 23:00:31 2016 +1000

    API updates to bring this up to master 8abe03c 1.12.13-SNAPSHOT.

commit 56c4cfe
Author: Martin Dengler <martin@martindengler.com>
Date:   Tue Dec 22 14:08:39 2015 -0600

    Prune empty commits test typo fix

commit 8b6366d
Author: Roberto Tyley <roberto.tyley@gmail.com>
Date:   Fri May 9 09:11:54 2014 +0100

    Add nasty nasty code to address pruning the initial commit...

    ...do we want to go this far!?

commit 1caf6f1
Author: Roberto Tyley <roberto.tyley@gmail.com>
Date:   Sat May 10 13:01:54 2014 +0100

    Prune empty commits test

commit 2f866b5
Author: Roberto Tyley <roberto.tyley@gmail.com>
Date:   Sun Apr 6 23:11:14 2014 +0100

    Add the option to prune empty commits (issue rtyley#27)

    This feature removes commits that- after the cleaning process -contain *no*
    file-tree change when compared to their parent commit. This would be
    because the cleaning process has cleaned away whatever content it was that
    was _changing_ in the original commit.

    The option is off by default, it's activated by using the
    `--prune-empty-commits` flag, eg:

    $ bfg --delete-files foo --prune-empty-commits

    rtyley#27
@takanuva15

takanuva15 commented Sep 8, 2021

@javabrett hey do you think you can rebase against master and fix the conflicting file in this PR? Also, if you have already built a jar file that includes your PR's changes, can you point me to the link to download that jar?

@mloskot

mloskot commented Sep 8, 2021

@takanuva15 Here is mine that I mentioned in #147 (comment)

It is the BFG 1.13.1 patched for --prune-empty-commits support using this PR by @javabrett
It worked perfectly for me and removed all empty commits dangling after directories extraction into new submodule repositories.

bfg-1.13.1-SNAPSHOT-prune-empty-commits-850d967.jar.zip (12 MB)

@fireundubh

fireundubh commented Sep 11, 2021

@takanuva15 Here is mine that I mentioned in #147 (comment)

It is the BFG 1.13.1 patched for --prune-empty-commits support using this PR by @javabrett
It worked perfectly for me and removed all empty commits dangling after directories extraction into new submodule repositories.

bfg-1.13.1-SNAPSHOT-prune-empty-commits-850d967.jar.zip (12 MB)

Quite the joker. And here I thought you did this. @rtyley actually vandalized his own project in d2713b4. Definitely a stable developer.

BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive


--
You can rewrite history in Git - don't let Trump do it for real!
Trump's administration has lied consistently, to make people give up on ever
being told the truth. Don't give up: https://www.rescue.org/topic/refugees-america
--

@takanuva15

@fireundubh The commit was reverted in 0d80de6, which is present in v1.14.0

@mloskot Thanks for giving a zip-file link with your built jar. Do you know if it's possible to rebase this PR with the latest changes in master easily? (I haven't worked with Scala before)

@mloskot

mloskot commented Sep 14, 2021

@takanuva15

Do you know if it's possible to rebase this PR with the latest changes in master easily?

No idea. I've never developed anything in Scala/Java really. In 2018, I had cloned, rebased, built the thing and it just worked.

@javabrett javabrett closed this Oct 31, 2022