Unbound (allow multiple) file matcher options #137

Closed


javabrett
Contributor

This change allows multiple values for the four options (--delete-files, --delete-folders, --filter-content-including and --filter-content-excluding) which use the file-matcher option. Currently these options only accept a single value, which can be cumbersome and discourages or prevents batch-cleaning in a single pass of bfg.

I added some spec tests for delete files and folders. I couldn't see existing spec tests for the other two, nor a quick way of adding test-replacement assertions, although I'm sure that can be done. Also, the filter glob options already had backing Seqs; they just weren't able to accept multiple values, so I expect they will work.

I originally fell into a bad trap of trying to make the options themselves Seqs and use scopt's comma-separated single-value parsing, but that falls foul of any value containing a comma, e.g. a regex or glob, so it breaks tests immediately. On reflection, it is better not to allow this and just use single-value, unbounded-count options and collect them in a Seq.
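To illustrate the trap described above, here is a minimal sketch in Python's argparse. It is only an analogy: it is not scopt and not the BFG's Scala code, and the helper names parse_comma_separated and parse_repeated are made up for this example. It contrasts splitting one value on commas with repeating a single-value option and collecting the values in a list.

```python
import argparse


def parse_comma_separated(argv):
    """One option whose single value is split on commas (the approach that broke)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--delete-files", type=lambda value: value.split(","))
    return parser.parse_args(argv).delete_files or []


def parse_repeated(argv):
    """A single-value option that may be repeated; values are collected in order."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--delete-files", action="append")
    return parser.parse_args(argv).delete_files or []


# A brace glob contains a comma, so comma-splitting tears it into two broken pieces.
broken = parse_comma_separated(["--delete-files", "*.{txt,log}"])

# Repeating the option keeps each glob intact.
intact = parse_repeated(
    ["--delete-files", "*.{txt,log}", "--delete-files", "*.class"]
)
```

With the repeated form, each value reaches the matcher exactly as the user typed it, which is why globs such as '*.{txt,log}' keep working.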

@Clement-TS

This would be really helpful!

Whoaa512 added a commit to Whoaa512/bfg-repo-cleaner that referenced this pull request Apr 13, 2020
rtyley#137

Squashed commit of the following:

commit 141d2b9
Author: Brett Randall <javabrett@gmail.com>
Date:   Fri Apr 8 14:34:09 2016 +1000

    Made -D/--delete-files and --delete-folders options accept multiple values.

commit cbc8258
Author: Brett Randall <javabrett@gmail.com>
Date:   Fri Apr 8 14:33:47 2016 +1000

    Removed blank lines from end of source file.
@WalterObrain

Hi @javabrett, I came here from a Stack Overflow question. I am looking to remove multiple files (100+) from the Git history using the BFG.
If I have only a few files, I can do it the following way:
java -jar bfg-1.12.15.jar --delete-files "{continue.txt,event.txt,list,hosts,active.txt,percentage.txt,inactive.txt,invalid.txt,.zip}" my-repo.git/
Is there any way we could put the whole list of names into a file, something like files-list.txt, and then run:
java -jar bfg-1.12.15.jar --delete files-list.txt myrepo.git?

@javabrett
Contributor Author

@WalterObrain I think the option you are after is:

-bi, --strip-blobs-with-ids <blob-ids-file>
                           strip blobs with the specified Git object ids

@WalterObrain

Thanks, @javabrett, but for a scenario with a lot of files I would like to use the files option instead of --strip-blobs-with-ids.
Is there any customized feature available for that case?

@WalterObrain

Let's say that in my scenario I have file1.txt, file3.txt, file23.sql, file24.csv and so on, 100+ files in all,
so I am specifically looking for a way to handle multiple files in a single shot.

-b, --strip-blobs-bigger-than <size>
                           strip blobs bigger than X (eg '128K', '1M', etc)
-B, --strip-biggest-blobs NUM
                           strip the top NUM biggest blobs
-bi, --strip-blobs-with-ids <blob-ids-file>
                           strip blobs with the specified Git object ids
-D, --delete-files <glob>
                           delete files with the specified names (eg '*.class', '*.{txt,log}' - matches on file name, not path within repo)
--delete-folders <glob>
                           delete folders with the specified names (eg '.svn', '*-tmp' - matches on folder name, not path within repo)
--convert-to-git-lfs <value>
                           extract files with the specified names (eg '*.zip' or '*.mp4') into Git LFS
-rt, --replace-text <expressions-file>
                           filter content of files, replacing matched text. Match expressions should be listed in the file, one expression per line - by default, each expression is treated as a literal, but 'regex:' & 'glob:' prefixes are supported, with '==>' to specify a replacement string other than the default of '***REMOVED***'.
-fi, --filter-content-including <glob>
                           do file-content filtering on files that match the specified expression (eg '*.{txt,properties}')
-fe, --filter-content-excluding <glob>
                           don't do file-content filtering on files that match the specified expression (eg '*.{xml,pdf}')
-fs, --filter-content-size-threshold <size>
                           only do file-content filtering on files smaller than <size> (default is 1048576 bytes)
-p, --protect-blobs-from <refs>
                           protect blobs that appear in the most recent versions of the specified refs (default is 'HEAD')
--no-blob-protection
                           allow the BFG to modify even your *latest* commit. Not recommended: you should have already ensured your latest commit is clean.
--private
                           treat this repo-rewrite as removing private data (for example: omit old commit ids from commit messages)
--massive-non-file-objects-sized-up-to <size>
                           increase memory usage to handle over-size Commits, Tags, and Trees that are up to X in size (eg '10M')
<repo>
                           file path for Git repository to clean

@javabrett
Contributor Author

> Thanks, @javabrett, but for a scenario with a lot of files I would like to use the files option instead of --strip-blobs-with-ids.
> Is there any customized feature available for that case?

That is what this PR proposes.

For complex file selection cases it has proved easiest to use external scripting to identify the blobs to delete then delete using that list.
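As a rough sketch of that external-scripting approach, the Python below builds a blob-ids file from a list of file-name patterns. It assumes git is on the PATH, and the names select_blob_ids, list_typed_objects and write_blob_ids, as well as the file names in the usage note, are hypothetical. It matches on file name only (not path) using Python's fnmatch, which understands '*' and '?' but not brace globs such as '*.{txt,log}'.

```python
import fnmatch
import posixpath
import subprocess


def select_blob_ids(typed_lines, patterns):
    """Pick blob ids whose file name matches any pattern.

    Each input line is expected to look like '<type> <object-id> <path>'.
    """
    ids = set()
    for line in typed_lines:
        parts = line.rstrip("\n").split(" ", 2)
        if len(parts) < 3 or parts[0] != "blob":
            continue  # skip commits, trees and tags
        name = posixpath.basename(parts[2])
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns):
            ids.add(parts[1])
    return sorted(ids)


def list_typed_objects(repo):
    """List every reachable object as '<type> <object-id> <path>' using git."""
    rev_list = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    )
    typed = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(rest)"],
        input=rev_list.stdout, check=True, capture_output=True, text=True,
    )
    return typed.stdout.splitlines()


def write_blob_ids(repo, names_file, out_file):
    """Read one pattern per line from names_file and write matching blob ids."""
    with open(names_file, encoding="utf-8") as handle:
        patterns = [line.strip() for line in handle if line.strip()]
    blob_ids = select_blob_ids(list_typed_objects(repo), patterns)
    with open(out_file, "w", encoding="utf-8") as handle:
        handle.write("\n".join(blob_ids) + ("\n" if blob_ids else ""))
    return blob_ids
```

Calling write_blob_ids("my-repo.git", "files-list.txt", "blob-ids.txt") would produce a file that can then be passed to the existing option, for example java -jar bfg.jar --strip-blobs-with-ids blob-ids.txt my-repo.git. Note that this strips every matching blob, so it is worth reviewing the generated list first.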

@javabrett javabrett closed this Oct 31, 2022
@hiteshshahk

hiteshshahk commented Aug 1, 2023

With the latest version of the bfg jar, the option to exclude multiple files does not seem to work. I tried:
bfg --filter-content-excluding '.xml,.jar,*.java' --strip-blobs-bigger-than 1M repository
or
bfg --strip-blobs-bigger-than 2M --filter-content-excluding '*.{java,xml,jsp}' repository
In the above commands, only one option (--strip-blobs-bigger-than 1M) seems to take effect.
Has the code change from https://github.com/javabrett/bfg-repo-cleaner/tree/unbound-file-matcher-options been merged into the latest bfg version, which is available here: https://repo1.maven.org/maven2/com/madgag/bfg/1.14.0/bfg-1.14.0.jar?

Another option I tried: I checked out the code from https://github.com/javabrett/bfg-repo-cleaner/tree/unbound-file-matcher-options and built it locally using sbt compile, run, package, but I get an error (attached as an image) when I actually try to use the generated jar bfg-1.13.1-SNAPSHOT.
