Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Protect dirt blobs in all trees, not just trees of protected refs #149

Closed
wants to merge 2 commits into from

Conversation

javabrett
Copy link
Contributor

This PR possibly sparks some debate, as there are philosophical aspects of what BFG should do with protected trees and the dirt they contain, and the fact that Git tracks trees/commits and not file-history, but the fact remains that most runners of BFG expect that blobs in protected trees don't appear to move around in the history (pulled-up to HEAD) as they do now as a result of the way Git represents history, logs and diffs.

To that end, I'm interested in discussions on:

  • Correctness - it seems more correct with this change.
  • Performance - how can this be performance-tested, so that the impact of the additional objectId lookup check can be measured?
  • Space/storage - there should be no difference - the blob is protected and retained anyway, it's only a matter of how many trees it is included in, and so where it shows-up as a add-diff.

Things I haven't considered ... if the file was updated, I believe it will appear to git log as an add at the point that the protected version of the file was first committed.

This PR was hacked out of a divergent branch, but hopefully it is OK. The test-change was required because of reused empty files in the test repo.

javabrett added a commit to javabrett/bfg-repo-cleaner that referenced this pull request Jun 1, 2016
javabrett added a commit to javabrett/bfg-repo-cleaner that referenced this pull request Feb 6, 2018
…ot just in protected refs' trees. Fixed rtyley#49, rtyley#53, rtyley#138.

Added objectId exclusion filters during tree blob-cleaning, such that blobs that exist in the trees of protected refs as stored in the census (AKA dirt) are protected not only in those trees, but in any other trees in which they occur in in the walked-history.  This prevents the perception that those files are being deleted in-history and then re-added in the final re-written commit, which is the protected HEAD (with dirt) and its untouched tree.  This is more a perception because Git does not track the history of individual files, but it does show diffs and logs that indicate such changes, and the behaviour prior to this change is to remove protected blobs from non-protected history trees, retaining them only in the final HEAD ref, which then shows as an add in that commit when logs/diffs are taken.

This change is convenient if your clean-up selectors (by name, or size) do select some dirt (files still in HEADs that you want to keep) but you don't want those files to appear as if they were recently added to HEAD.
The protected submodule test has a side-effect brought about by the re-use of the same empty file for foo.txt, moo.txt and the submodule bar/comp.  Because the same file blob is used, foo.txt when removed from history was no longer removed because it was protected by bar/comp in master HEAD.

This re-writes the commits in that repository so that moo.txt and bar/comp are different blobs to foo.txt, so that foo.txt can be removed from the history and the test pass.
@iamareebjamal
Copy link

Any updates on when this will be merged?

@javabrett javabrett closed this Oct 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants