Feature request/idea: dry-run mode #17
Comments
Hi there @tfnico - hmm, my reply to this got surprisingly long, which is weird given how simple the feature sounds (I guess this is probably an indication of how obsessive I am about this stuff at the moment). Ok, to define the feature story:
It's possible to do a small imperfect chunk of this without any problem - if we're talking specifically about the Once we're talking about evaluating the results of other operations, ie the Perhaps surprisingly, given the identical runtime, this does mean the user ends up in the position that in some ways they have less diagnostic information than if they'd just run the BFG for real, because now they can't actually examine the cleaned commits... which means the diagnostic output from the BFG needs to be beefed up - although I'm quite proud of the diagnostic output that The BFG does supply, it could still do with improvement, ie some variant of the stuff in #14 (display diffs of changed content) and #15 (log detailed diagnostics to file) to make
@rtyley Sounds great. It would certainly be cool to output diffs for replaced texts, and lists of files that have been deleted. I think rewriting without changing the refs sounds like a fair compromise. Performance is smooth anyhow.
I second that feature. I'm just afraid to use it on a big company repo without the dry-run option.
Hi @alistra - just so I can better understand the use-case, is there any reason you can't just do a
It's harder to see the changes: I would have to manually browse around 6 branches (that are around 2 years old). It would be nicer just to go through the changes list with branch/file pairs and check that we wouldn't delete something important accidentally. Not all of the code in the repo is used all the time, so the mistake wouldn't be obvious right away.
Would you want to check every single commit on those branches (which potentially could be a lot of very repetitive information) or would you be interested in just the tips of those branches, ie the latest commit on each branch?
Ideal solution would deduplicate the same files and tell me:
Then, in order of usefulness, I would like checking of the tips of branches, then the whole big dump of data.
I don't necessarily need a I am also afraid of accidentally deleting a big file that could be useful in the future (although not present in my most recent commit).
Dry run would be very useful in my opinion, too.
👍 for dry runs. I'm an intermediate git user, and what I'd like is to:
That way I can compare the bfg'ed repo with one of known quality and ensure files are fine.
This should make seeing what the BFG has got up to a lot easier, and will make a dry-run mode (still not implemented) much more useful. #26 #17 (comment)

The new output looks like this (and it only appears if files *have* been changed, or deleted, as appropriate).

```
Changed files
-------------

File                           Before     After
--------------------------------------------------
bushhidthefacts-ORIGINAL.txt | 93fd267a | 4f6f1558

Deleted files
-------------

File        git-id     Size (bytes)
-----------------------------------
video.mp4 | 294f4016 | 126384
```
This should make seeing what the BFG has got up to a lot easier, and will make a dry-run mode (still not implemented) much more useful. #26 #17 (comment)

The new output looks like this (and it only appears if files *have* been changed, or deleted, as appropriate).

```
Changed files
-------------

Filename    Before & After
--------------------------------------------------------------------------------------------------
CODE.conf | e3aa4a56 ⇒ b9241055, 6fd90c18 ⇒ 1f390cd7, ...
PROD.conf | 5a89032a ⇒ 38193000, 2611394d ⇒ 9c742f65, ...

Deleted files
-------------

Filename                Git id
------------------------------------------------
bg.jpg                | d0ea4091 (2.0 MB)
guardian_space002.png | 24215b1e (1.2 MB)
```
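For anyone who wants the "changes list" asked for earlier in the thread, the report above is regular enough to be scraped. The following is only a rough sketch in Python: it assumes the "Deleted files" table keeps exactly the layout shown in this commit message (which may differ between BFG versions), and the function name `parse_deleted_files` is made up for illustration.

```python
import re

# Sample copied from the commit message above; real output may differ.
SAMPLE = """\
Deleted files
-------------

Filename                Git id
------------------------------------------------
bg.jpg                | d0ea4091 (2.0 MB)
guardian_space002.png | 24215b1e (1.2 MB)
"""

# Assumed row shape: "<filename> | <abbreviated git id> (<size>)"
ROW = re.compile(r"^(.+?)\s*\|\s*([0-9a-f]+)\s*\((.+)\)$")


def parse_deleted_files(report: str) -> list[tuple[str, str, str]]:
    """Return (filename, git_id, size) for each row of the 'Deleted files' table."""
    rows = []
    in_section = False
    for line in report.splitlines():
        stripped = line.strip()
        if stripped == "Deleted files":
            in_section = True
            continue
        if stripped == "Changed files":
            in_section = False
            continue
        if in_section:
            match = ROW.match(stripped)
            if match:  # header and rule lines have no "|" and are skipped
                rows.append(match.groups())
    return rows


if __name__ == "__main__":
    for name, git_id, size in parse_deleted_files(SAMPLE):
        print(f"{name}\t{git_id}\t{size}")
```

The resulting tuples can be sorted or compared against a list of files you know must survive.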
👍 I literally downloaded bfg and looked for a
+1
As alluded to in comments above, I think this request decomposes into:
Also as commented above, I'm trying to understand the advantage of a Since (on Linux anyway) Say I pick a decent-size repo, the Linux Kernel, and run an academic clean-1M+ on it and time it. First I'll create a hard-linked local clone:
... then run bfg:
5 seconds for the clone, 5m13s for the bfg-run, total 5m18s. Check the original repo and it is untouched. Check the object-store in Compare this to a plain run on the repo:
About the same, 5m and change. So for the low cost (5 seconds) of a local clone, you can do a real test-run rather than a dry-run/report. Of course Windows users, not having hardlinks available, would have to wear the extra time and space cost of the initial clone. Also, of course, you will have to pay for the disk-space usage as bfg writes to the test-repo. So it feels like a native
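The exact commands from this comment did not survive in the thread, so the following is only a sketch of the clone-then-run approach it describes, written in Python for illustration. The choice of `git clone --mirror`, the repo paths, the `bfg.jar` location and the helper name `scratch_run_commands` are all assumptions; the one thing relied on is that Git hard-links object files by default when cloning from a local path on a filesystem that supports it.

```python
import shlex


def scratch_run_commands(source_repo, scratch_repo, bfg_jar="bfg.jar",
                         bfg_args=("--strip-blobs-bigger-than", "1M")):
    """Build the two commands for a throwaway test run.

    Nothing is executed here; the caller decides whether to run them.
    All paths and the jar name are placeholders.
    """
    # Cloning from a local path lets Git hard-link objects instead of copying.
    clone = ["git", "clone", "--mirror", source_repo, scratch_repo]
    # Run the BFG for real, but only against the scratch clone.
    bfg = ["java", "-jar", bfg_jar, *bfg_args, scratch_repo]
    return [clone, bfg]


if __name__ == "__main__":
    for argv in scratch_run_commands("linux", "linux-bfg-test.git"):
        print(shlex.join(argv))
```

Printing the commands rather than running them keeps the sketch safe to try; pass each list to `subprocess.run(argv, check=True)` once the paths are right.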
I was also looking for a dry-run. When I didn't find one, I hoped the real run would print out some info, but I only saw a list of updated refs. It would be very helpful if it also printed exactly which files/folders were deleted. (I was using --delete-folders.)
My repo is huge and it would be nice to not have to make a copy of it. Very scared to run without a dry run!
See also my comment above, but consider the following:
That is, test local clones are incredibly cheap, and it is necessary to run BFG to really see what it will achieve; ergo it is better to actually run it, and there is dubious value in a dry-run mechanism.
I like this feature idea because sometimes ^^^ is not as true as we'd like -- I'm trying to remove junk data from a repo that is 3.9 GB when checked out. (Full disclosure: I didn't do it! I'm trying to fix it : )
I admit that even though this feature would be nice, it definitely falls under the "nice to have" category.
I'd like to reinforce the need for a dry-run mode. I'm cleaning up a massive repo and it's painful to have to clone it twice to use bfg.
@lovesegfault so we can put numbers to this ... what are the timings if you a) first clone remote to local, then b) clone local to local, perhaps allowing hard links?
@javabrett First clone takes a good 30 minutes
Maybe try
Well, when working on repos of tens of gigabytes (like Unity/Unreal), you'd be happy to have a --dry-run that saves lots of time.
In order to preview which files would be removed with -b, I first used a Perl script to see which files would be deleted. However, it would be practical if BFG could be run in dry-run mode to show what the output would be, without actually making any changes in the repo.
Of course, it's also easy to just make another clone to do the test-run first. But if it's easy to implement dry-run, why not?
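The Perl script mentioned here is not included in the issue. As a stand-in, here is a hedged Python sketch of one way to get a similar preview with plain Git: list every reachable object with `git rev-list --objects --all`, ask `git cat-file --batch-check` for type and size, and keep blobs above a threshold. The function names and the sample data are invented for illustration, and the result is only an approximation of what the BFG would delete (for instance, the BFG by default protects files in the latest commit, which this sketch does not account for).

```python
import subprocess


def blobs_bigger_than(batch_check_output: str, limit_bytes: int):
    """Parse '<id> <type> <size> <path>' lines and return (size, id, path)
    for every blob larger than limit_bytes, largest first."""
    found = []
    for line in batch_check_output.splitlines():
        parts = line.split(" ", 3)  # path may itself contain spaces
        if len(parts) < 3 or parts[1] != "blob":
            continue
        size = int(parts[2])
        if size > limit_bytes:
            path = parts[3] if len(parts) == 4 else ""
            found.append((size, parts[0], path))
    return sorted(found, reverse=True)


def list_objects(repo: str) -> str:
    """Ask Git for every object reachable from any ref, with type, size and path."""
    rev_list = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    return subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objectname) %(objecttype) %(objectsize) %(rest)"],
        input=rev_list, check=True, capture_output=True, text=True,
    ).stdout


if __name__ == "__main__":
    # Preview blobs over 1 MiB in the repository in the current directory.
    for size, git_id, path in blobs_bigger_than(list_objects("."), 1024 * 1024):
        print(f"{size:>12}  {git_id[:8]}  {path}")
```

Because the parsing is separate from the Git calls, the filter can be tried on saved output before pointing it at a large repository.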