-
Notifications
You must be signed in to change notification settings - Fork 673
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
filter-repo --path doesn't preserve history of content in path that earlier had been out of path (multi-path rename or copy detection) #25
Comments
Yeah, I can see how it'd be nice to have this different functionality, but it's not what was intended by the wording or what I even tried to implement. Note that the wording you quote was not for any specific command line flag, it was just generic what-can-the-tool-do wording. If you look at the actual documentation from the manual for the --path option, it states that it is for "Exact paths (files or directories) to include in filtered history.". The builtin documentation uses the same wording. I would have at a minimum left off the word "Exact" in both places if rename or copy detection was implied; the "Exact" is there just to reinforce that. Also, this is consistent with how the rev-list, log, and fast-export git subcommands work. E.g. But even if --follow implemented following of renames for multiple files or a directory or more, that still wouldn't necessarily be sufficient because perhaps the user needs copy detection (i.e. it wasn't a file renamed from somewhere else, rather it was copied). But with copy detection it's not as clear if you want the full history of the original; I can imagine that in some cases you would but not others. And if we start doing either rename or copy detection, then we're moving from well-defined correct behavior to heuristics. For diffs or logs or even merges that's fine, because the results will be interpreted by a user (even in merge, if the detection is wrong, the user can fixup conflicts and make other edits). Here, we'd record the results of the heuristics in stone. That's a bit worrying to me...and it also means we'd have to open up a pile of knobs (at the very least a similarity percentage, and whether copies are wanted in addition to renames) for configuration. All that said, I wanted something like that when I was using it too. The best compromise I came up with was to have people run 'git filter-repo --analyze' beforehand, look at the renames sub-report, and pick out additional paths by hand based on that to feed to their filter-repo run. The --analyze option still had a few caveats with the rename detection, but that was mostly fundamental to the problem. Providing it and letting the user decide what to include (though I didn't even bother with copy detection), seemed like the best option I had available. I know that's probably not what you wanted to hear, but I hope the --analyze hint and the renames information it generates is at least helpful. |
STEPS TO REPRODUCE:
(Here
93946749a1
comes fromHEAD is now at 93946749a1 [ledger] Migrate ptr.h to ledger/lib
in the output ofgit filter-repo
.)EXPECTED RESULTS:
The first number in the output of the two
git log
commands will be the same, indicating that all of the history ofsrc/ledger/bin/app/app.cc
that was available before filtering the repo remains available.OBSERVED RESULTS:
The numbers are 144 and 25, indicating that a great deal of the history of
src/ledger/bin/app/app.cc
was not preserved. Additionally, browsing the new log for the file indicates that history for the file only goes back as far as when the file was moved intosrc/ledger/
.Am I holding it wrong? I would have thought from "extracting wanted paths and their history (stripping everything else)" that the full history of the content at the given path at HEAD would be retained, even if the content at the given path at HEAD had not always been located at the given path.
The text was updated successfully, but these errors were encountered: