Support prune empty commit #27

Open
is opened this Issue Nov 8, 2013 · 11 comments


is commented Nov 8, 2013

Like git filter-branch --prune-empty

mr-c commented Apr 4, 2014

This issue is a big deal.

BFG appears to only remove empty commits from the descendants of HEAD.

Because I didn't notice the discrepancy, this caused a big headache for me. I ended up redoing the cleanup with git filter-branch and --prune-empty; fortunately it only took 2 minutes to run.

Here's a complete rundown of what I ran, including stripping out the synthetic GitHub pull request refs.

git clone --mirror git@github.com:ged-lab/khmer.git
cd khmer.git
du -hs
# 113M
git config --unset-all remote.origin.fetch
git config --add remote.origin.fetch '+refs/heads/*:refs/heads/*'
git config --add remote.origin.fetch '+refs/tags/*:refs/tags/*'
rm -Rf refs/pull
sed -i '/.*pull.*/d' packed-refs
# use a modified git-largest-object.sh to sort based on packed size & to work on bare repo
# examine output to craft the next command
PATHS="space separated list of paths to remove"
git filter-branch --force --index-filter "git rm --cached --ignore-unmatch $PATHS" --prune-empty --tag-name-filter cat -- --all
# takes ~1 minute; curly-bracket globs don't work here
rm -Rf refs/original
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now
du -hs
# 25M
#git push
# when you're ready
Owner

rtyley commented Apr 4, 2014

BFG appears to only remove empty commits from the descendants of HEAD.

Ah, you might be getting a bit confused here - The BFG currently doesn't remove empty commits from anywhere - descendants of HEAD or no.

If you're not seeing empty commits in the history of your HEAD commit, but are seeing empty commits in other branches, this is probably because The BFG protects the contents of the HEAD commit by default, and generally won't remove files from history if they're already present in a protected commit:

http://rtyley.github.io/bfg-repo-cleaner/#protected-commits

...so if the history of your HEAD doesn't have empty commits, that's just because the contents were protected by your HEAD commit, and so the corresponding contents will not have been removed. In other branches, unprotected content will have been removed, and this may well have led to commits on those branches becoming empty diffs.
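
One way to check which commits ended up as empty diffs is to compare each commit's tree with its parent's tree. The following is only a minimal sketch: the function name `list_empty_commits` is made up for illustration, it is not part of The BFG, and it deliberately skips merge commits and root commits.

```shell
# list_empty_commits [REV...]  (hypothetical helper, not part of The BFG)
# Prints every non-merge commit whose tree is identical to its parent's tree,
# i.e. a commit that introduces no file-tree change.
# With no arguments, all refs are scanned. Root commits are skipped because
# they have no parent to compare against.
list_empty_commits() {
    [ "$#" -eq 0 ] && set -- --all
    git rev-list --no-merges "$@" | while read -r commit; do
        # A root commit has no first parent; skip it.
        parent=$(git rev-parse --verify -q "${commit}^") || continue
        if [ "$(git rev-parse "${commit}^{tree}")" = "$(git rev-parse "${parent}^{tree}")" ]; then
            echo "$commit"
        fi
    done
}
```

Run inside the cleaned repository, `list_empty_commits refs/heads/feature/citations` would list the empty commits reachable from that branch, while `list_empty_commits` with no arguments scans every ref.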

mr-c commented Apr 5, 2014

Then why did these branches diverge so much?

# original
mcrusoe@athyra:~/khmer/reposhrink/khmer.backup$ git show --raw `git merge-base master origin/feature/citations`
commit 58af106053356dfcb4a43bbc0a6f1614f7d5ac44
Author: C. Titus Brown <titus@idyll.org>
Date:   Mon Mar 31 23:21:45 2014 -0400

    added screed __version__ to info()

:100644 100644 3a9f0f0... a8f3ea9... M  khmer/khmer_args.py
# post bfg
mcrusoe@athyra:~/khmer/reposhrink/mirror-khmer.backup/khmer-bfg2.git$ git show --raw `git merge-base feature/citations master`
commit 0aaef480d02bcc955ebba0655dc1323fbe51ccc3
Author: C. Titus Brown <titus@idyll.org>
Date:   Sat Sep 18 18:42:30 2010 -0400

    fixed consume_fasta_and_tag for density approach

:100644 100644 0c79820... 73f2702... M  lib/hashbits.cc

Here is the output of that bfg run in between:

mcrusoe@athyra:~/khmer/reposhrink/mirror-khmer.backup/khmer-bfg2.git$ java -jar ~/khmer/gl-master/bfg-1.11.2.jar -D '{test-overlap1.ht,stamps-reads.fa.gz.bin,stamps-reads.fa.gz.bin.index,1m-filtered.fa,MSB2-surrender.fa,25k.fq.gz.bin,part-test.fa}' .

Using repo : /home/mcrusoe/khmer/reposhrink/mirror-khmer.backup/khmer-bfg2.git/.

Found 677 objects to protect
Found 6 tag-pointing refs : refs/tags/2012-assembly-artifacts, refs/tags/2012-paper-kmer-percolation, refs/tags/2013-khmer-counting, ...
Found 39 commit-pointing refs : HEAD, refs/heads/bleeding-edge, refs/heads/calc-median-updates, ...

Protected commits
-----------------

These are your protected commits, and so their contents will NOT be altered:

 * commit 2988a630 (protected by 'HEAD')

Cleaning
--------

Found 4161 commits
Cleaning commits:       100% (4161/4161)
Cleaning commits completed in 3,991 ms.

Updating 44 Refs
----------------

        Ref                                          Before     After   
        ----------------------------------------------------------------
        refs/heads/bleeding-edge                   | 97f15c59 | 8060ee61
        refs/heads/calc-median-updates             | c41afcd7 | 74faee10
        refs/heads/calc_best_assembly              | f1f9f5fa | bc4cc5d5
        refs/heads/docs_comparison_info            | 69bda540 | f8d62545
        refs/heads/feature/citations               | ebeb8501 | 76b4891d
        refs/heads/feature/hll-counter             | 0d449acb | 4483e1e0
        refs/heads/feature/missing_file_exceptions | 8615c17e | 8fdd72b9
        refs/heads/fishjord_graphalign             | d145b3bc | c4dc6518
        refs/heads/fix/count_overlap               | 850fd8ce | 95ca7078
        refs/heads/fix/hash_sizes                  | 5ee9cebb | 41538717
        refs/heads/galaxy-integration              | dd2648b2 | 47daa076
        refs/heads/graphalign-fj                   | e15ebb6c | 6066f4fb
        refs/heads/kmer_error_profile              | 5fd0e671 | 49da8218
        refs/heads/label_align                     | c1f25e8c | c7403ae6
        refs/heads/label_traverse                  | 2674402c | adfaabd3
        refs/heads/legacy                          | a2766d47 | 4faa86dc
        refs/heads/location_kmer                   | db03b0f1 | e52e2827
        refs/heads/master                          | 2988a630 | 17f8cc9a
        refs/heads/mwright/opt_nbm                 | 2184c2f1 | 6760460f
        refs/heads/parallel                        | 2c4ef321 | 34825b8e
        refs/heads/partition_fq_fix_legacy         | 52bbb5ff | 80d36d58
        refs/heads/protocols-v0.8.5                | 1cd14221 | 214dfd0a
        refs/heads/refactor/cython_bindings        | 7b757e47 | 39ad33d4
        refs/heads/reservoir_sampling2             | 253185b4 | eeef5570
        refs/heads/sparse_median                   | a58d8915 | 9723f244
        refs/heads/split_interleave                | 7a5c8331 | 4321513b
        refs/heads/update_trimmomatic_legacy       | 509068ea | 36cbcc81
        refs/tags/2012-assembly-artifacts          | 91af914b | 5cfd1f55
        refs/tags/2012-paper-diginorm              | c8e942ae | 49d59ef7
        refs/tags/2012-paper-kmer-percolation      | 33afbdf3 | 4129cff4
        refs/tags/2013-caltech-cemi                | 8bd74039 | bd1fd107
        refs/tags/2013-khmer-counting              | 20f56b27 | 21f51689
        refs/tags/iPlantDiscoveryEnvironment       | 1834774b | 1683c416
        refs/tags/protocols-v0.8.3                 | 997b7de6 | e5321213
        refs/tags/protocols-v0.8.5                 | 997b7de6 | e5321213
        refs/tags/v0.5                             | ca0c9919 | 6a782dba
        refs/tags/v0.6.1                           | 849a9362 | 558126d5
        refs/tags/v0.7                             | 656f7570 | 4b0af905
        refs/tags/v0.7.1                           | 5a3f3597 | ff867b10
        refs/tags/v0.8                             | f923ecf3 | 5e0280b1
        refs/tags/v0.8-rc1                         | dcb7ce6b | 58b31051
        refs/tags/v0.8-rc2                         | 43983a0b | 8542925a
        refs/tags/v0.8-rc3                         | 4f026c07 | fde26162
        refs/tags/v1.0                             | 2988a630 | 17f8cc9a

Updating references:    100% (44/44)
...Ref update completed in 76 ms.

Commit Tree-Dirt History
------------------------

        Earliest                                              Latest
        |                                                          |
        ...DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD

        D = dirty commits (file tree fixed)
        m = modified commits (commit message or parents changed)
        . = clean commits (no changes to file tree)

                                Before     After   
        -------------------------------------------
        First modified commit | 8e6b90d0 | d2a8dfe0
        Last dirty commit     | 0d449acb | 4483e1e0


In total, 6132 object ids were changed - a record of these will be written to:

        /home/mcrusoe/khmer/reposhrink/mirror-khmer.backup/khmer-bfg2.git/..bfg-report/2014-04-04/20-14-31/object-id-map.old-new.txt

BFG run is complete!

mr-c commented Apr 5, 2014

In the bfg'd repo, the following empty commit comes after 0aaef480d02bcc955ebba0655dc1323fbe51ccc3 and is the source of the divergence:

mcrusoe@athyra:~/khmer/reposhrink/mirror-khmer.backup/khmer-bfg2.git$ git show --raw 5ddd819fd896e3dfa1d60dd0c39cd3a68b711c7c
commit 5ddd819fd896e3dfa1d60dd0c39cd3a68b711c7c
Author: C. Titus Brown <titus@idyll.org>
Date:   Sat Sep 18 22:06:46 2010 -0400

    added annoying surrender set
mcrusoe@athyra:~/khmer/reposhrink/mirror-khmer.backup/khmer-bfg2.git$ grep 5ddd819fd896e3dfa1d60dd0c39cd3a68b711c7c ..bfg-report/2014-04-04/20-14-31/object-id-map.old-new.txt | awk '{ print $2 }'
60c191b93a34b9cf81953a17e10ae2d7bfdff848

The original version of that commit:

mcrusoe@athyra:~/khmer/reposhrink/mirror-khmer.backup/khmer.git$ git show --raw 60c191b93a34b9cf81953a17e10ae2d7bfdff848
commit 60c191b93a34b9cf81953a17e10ae2d7bfdff848
Author: C. Titus Brown <titus@idyll.org>
Date:   Sat Sep 18 22:06:46 2010 -0400

    added annoying surrender set

:000000 100644 0000000... 9ebab4b... A  data/MSB2-surrender.fa
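
The lookup above can be wrapped in a small function. This is a sketch under one assumption: each line of the object-id map holds two whitespace-separated ids. The name `lookup_object_id` is made up for illustration. It matches the id in either column and prints its partner, so it does not depend on which column holds the old id.

```shell
# lookup_object_id MAP_FILE ID  (hypothetical helper, not part of The BFG)
# Assumes each line of MAP_FILE holds two whitespace-separated object ids.
# Prints the id paired with ID, whichever column ID appears in.
lookup_object_id() {
    awk -v id="$2" '
        $1 == id { print $2 }
        $2 == id { print $1 }
    ' "$1"
}
```

For example: `lookup_object_id ..bfg-report/2014-04-04/20-14-31/object-id-map.old-new.txt 5ddd819fd896e3dfa1d60dd0c39cd3a68b711c7c`.
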
Owner

rtyley commented Apr 5, 2014

Could you share the original repo (before cleaning) with me?

Thanks for that diagnostic information you've already sent - unfortunately it doesn't quite give me enough information to get a clear picture of the evidence for your assertion that The BFG sometimes prunes empty commits. By 'pruning empty commits', I mean entire commits being removed from commit history when they no longer contain any file changes in their cleaned form, and as I said, I don't think The BFG currently does that at all.

rtyley added a commit that referenced this issue May 4, 2014

Add the option to prune empty commits (issue #27)
This feature removes commits that, after the cleaning process, contain *no*
file-tree change when compared to their parent commit. This would be
because the cleaning process has cleaned away whatever content it was that
was _changing_ in the original commit.

The option is off by default; it's activated by using the
`--prune-empty-commits` flag, e.g.:

$ bfg --delete-files foo --prune-empty-commits

#27

rtyley added a commit that referenced this issue May 14, 2014

Add the option to prune empty commits (issue #27)

#27

@rtyley, are you planning to merge these changes to master? I think this is a very useful feature.

bogdanm commented Aug 21, 2015

+1. Can indeed be very useful.

okravets commented Sep 2, 2015

+1

Owner

rtyley commented Sep 2, 2015

Working on the many open source projects I give to the community takes up a
large proportion of my spare time. If you'd like to support development of
this feature for the BFG, please donate to help me at
https://www.bountysource.com/teams/bfg-repo-cleaner

mdengler added a commit to mdengler/bfg-repo-cleaner that referenced this issue Dec 22, 2015

Add the option to prune empty commits (issue #27)

rtyley#27

javabrett added a commit to javabrett/bfg-repo-cleaner that referenced this issue May 13, 2016

Add the option to prune empty commits (issue #27)

rtyley#27

javabrett added a commit to javabrett/bfg-repo-cleaner that referenced this issue Jan 17, 2017

Add the option to prune empty commits (issue #27)

rtyley#27

wuganhao commented Dec 6, 2017

+1
