Filter branch #429

Merged
merged 1 commit into vNext from filter-branch Jun 15, 2013

4 participants
Member

ben commented May 11, 2013

This adds an idiomatic c-sharpy api for doing filter-branch-like operations. This works without ever touching the index or the working directory, so it should be wicked fast.

  • Header rewriting (author, message, etc)
  • Tree rewriting using TreeDefinition
  • Tag rewriting (like --tag-filter)
  • Parent-commit munging (like --parent-filter)
  • Back up original refs to refs/original
  • Name review
  • XML documentation
  • Squash

The API looks like this:

repo.Refs.RewriteHistory(
    // A collection of commits that should be rewritten
    repo.Head.Commits,

    // A function that returns a `TreeDefinition` for the new commit
    commitTreeRewriter: c =>
        {
            var td = TreeDefinition.From(c);
            td.Remove("README");
            return td;
        },

    // A function that returns header information to be used for the new commit
    commitHeaderRewriter: c =>
        {
            var ch = CommitRewriteInfo.From(c);
            ch.Message += "\n\nCleaned-by: The Cleaner";
            return ch;
        });

@dahlbyk dahlbyk commented on an outdated diff May 11, 2013

LibGit2Sharp.Tests/ReferenceFixture.cs
+ public void CanDoThing()
+ {
+ using (var repo = new Repository(StandardTestRepoWorkingDirPath))
+ {
+ var result = repo.Refs.SubsetOfTheseReferencesThatCanReachAnyOfTheseCommits(repo.Refs, new[] { repo.Lookup<Commit>("f8d44d7"), repo.Lookup<Commit>("6dcf9bf") });
+ // Should get "i-do-numbers" and "diff-test-cases"
+ var expected = new []
+ {
+ "refs/heads/diff-test-cases",
+ "refs/heads/i-do-numbers",
+ "refs/remotes/origin/test",
+ "refs/tags/e90810b",
+ "refs/tags/lw",
+ "refs/tags/test",
+ };
+ Assert.Equal(expected, result.Select(x => x.CanonicalName).OrderBy(x => x));
@dahlbyk

dahlbyk May 11, 2013

Member

Generally helpful to tack on a ToArray() after OrderBy() to get a better message on failure.

@dahlbyk dahlbyk commented on an outdated diff May 11, 2013

LibGit2Sharp/CommitHeader.cs
@@ -0,0 +1,21 @@
+namespace LibGit2Sharp
+{
+ public class CommitHeader
+ {
+ public Signature Author { get; set; }
+ public Signature Committer { get; set; }
+ public string Message { get; set; }
+ public string Encoding { get; set; }
+
+ public static CommitHeader From(Commit c)
@dahlbyk

dahlbyk May 11, 2013

Member

Common use cases could be simplified by providing optional parameters for each property:

c => CommitHeader.From(c, author: new Signature("Ben Straub", "me@example.com", c.Author.When))

instead of

c => {
    var h = CommitHeader.From(c);
    h.Author = new Signature("Ben Straub", "me@example.com", h.Author.When);
    return h;
}
@dahlbyk

dahlbyk May 11, 2013

Member

(It would be worth keeping an overload without optional parameters too: commitHeaderRewriter = commitHeaderRewriter ?? CommitHeader.From; is beautiful.)

@dahlbyk dahlbyk and 2 others commented on an outdated diff May 11, 2013

LibGit2Sharp/ReferenceCollection.cs
@@ -333,5 +333,23 @@ public virtual ReflogCollection Log(Reference reference)
return new ReflogCollection(repo, reference.CanonicalName);
}
+
+ public virtual IEnumerable<Reference> SubsetOfTheseReferencesThatCanReachAnyOfTheseCommits(IEnumerable<Reference> refs, IEnumerable<Commit> targets)
@dahlbyk

dahlbyk May 11, 2013

Member

As this is independent of a particular ReferenceCollection, would it make more sense as an extension method on IEnumerable<Reference>, e.g. repo.Refs.ReachableFrom(commits)?

@nulltoken

nulltoken May 12, 2013

Member

I agree that an extension method would be handy. However, I fear that it might not be that discoverable. How about having both? BTW, ReachableFrom is an awesome name for the extension method!

@dahlbyk @carlosmn Can you think of a better name than SubsetOfTheseReferencesThatCanReachAnyOfTheseCommits() for the ReferenceCollection method?

@carlosmn

carlosmn May 12, 2013

Owner

I'd go for something more like CanReach (or CanReachAny) as the extension method. Commits (or objects) are reachable, but you're returning a list of references, so you need to talk about being able to reach or containing.

For the function, the "OfTheseCommits" is redundant, as it's in the signature that you're passing commits. Maybe something like ReferencesThatCanReachAny, though it's not all that good.

@dahlbyk dahlbyk and 1 other commented on an outdated diff May 11, 2013

LibGit2Sharp/BranchCollection.cs
@@ -215,5 +215,69 @@ private string DebuggerDisplay
"Count = {0}", this.Count());
}
}
+
+ public virtual void RewriteHistory(
@dahlbyk

dahlbyk May 11, 2013

Member

This feels weird on BranchCollection. Maybe ReferenceCollection is a better fit?

@nulltoken nulltoken commented on an outdated diff May 12, 2013

LibGit2Sharp/CommitHeader.cs
@@ -0,0 +1,21 @@
+namespace LibGit2Sharp
+{
+ public class CommitHeader
@nulltoken

nulltoken May 12, 2013

Member

I think @arrbee's right. Let's rename this to CommitMetaData

@nulltoken nulltoken commented on an outdated diff May 12, 2013

LibGit2Sharp/CommitHeader.cs
@@ -0,0 +1,21 @@
+namespace LibGit2Sharp
+{
+ public class CommitHeader
+ {
+ public Signature Author { get; set; }
+ public Signature Committer { get; set; }
+ public string Message { get; set; }
+ public string Encoding { get; set; }
@nulltoken

nulltoken May 12, 2013

Member

As we're not currently dealing with the encoding while recreating the Commit, maybe it would be safer to make the setter private?

@nulltoken nulltoken and 1 other commented on an outdated diff May 12, 2013

LibGit2Sharp/TreeDefinition.cs
@@ -35,6 +35,18 @@ public static TreeDefinition From(Tree tree)
return td;
}
+ /// <summary>
+ /// Builds a <see cref = "TreeDefinition" /> from a <see cref="Commit"/>'s <see cref = "Tree" />.
+ /// </summary>
+ /// <param name="commit">The <see cref="Commit"/> whose tree is to be processed</param>
+ /// <returns>A new <see cref = "TreeDefinition" /> holding the meta data of the <paramref name = "commit" />'s <see cref="Tree"/>.</returns>
+ public static TreeDefinition From(Commit commit)
+ {
@nulltoken

nulltoken May 12, 2013

Member

Could you please isolate this in a separate commit?

@ben

ben May 13, 2013

Member

The addition of the method? That's pretty much all that's in 7a0cf18, but I can pull out the other hunk from that commit if you like.

@nulltoken

nulltoken May 13, 2013

Member

Scratch that. My mistake. 😊

Member

ben commented May 13, 2013

I just pushed up what I did on my flight. Let me try to synthesize the feedback here and match it up with where things stand now:

  1. CommitHeader.From with optional parameters turns out to have some difficulties, since a conversion to a Func<> doesn't work any more. I ended up doing From with optional parameters (which is great), and the simple duplication call is called SameAs.
  2. The reachability call is kind of awkward no matter where it lives. I moved it over to IEnumerable<Reference> and called it ReachableFrom, but it still seems clunky. The problem is that it needs a set of refs and two sets of commits. If it belonged to Repository, it would get the full commit graph for free, but I kind of don't like polluting that class with tons of calls.
  3. RewriteHistory moved to Repository. It affects commits, tags, and refs, so I can't imagine a better place for it than the class that ties all those together.
  4. The DTO class for rewriting metadata ended up being called CommitRewriteInfo. Does that work for everyone?
  5. I removed the CommitRewriteInfo.Encoding field, but now I'm wondering if that was a mistake. Do you think it's useful during a rewrite to know what the encoding was originally? We're not allowing writing of custom encodings.

Also, I checked off a few of the boxes, so this is a bit more powerful now. Let me know what you think.

@ben ben and 1 other commented on an outdated diff May 13, 2013

LibGit2Sharp.Tests/FilterBranchFixture.cs
+ // * 5b5b025 another commit
+ // * 8496071 testing
+ [Fact]
+ public void HandlesExistingBackedUpRefs()
+ {
+ string path = CloneBareTestRepo();
+ using (var repo = new Repository(path))
+ {
+ var commits = repo.Commits.QueryBy(new Filter { Since = repo.Refs }).ToArray();
+ repo.RewriteHistory(commits, commitHeaderRewriter: c => CommitRewriteInfo.From(c, message: "abc"));
+
+ throw new AssertException("What should we do here?");
+ repo.RewriteHistory(commits, commitHeaderRewriter: c => CommitRewriteInfo.From(c, message: "abc"));
+ Assert.Empty(repo.Refs.Where(x => x.CanonicalName.StartsWith("refs/original/original/")));
+ }
+ }
@ben

ben May 13, 2013

Member

The travis build is going to fail here. The question is, what should happen if we try to back up the refs to refs/original (or wherever), and it already exists? We could:

  • Pretend we're doing filter-branch -f, and overwrite them.
  • Throw an exception of some sort, though I'd want to do a checking pass first so we don't actually do a rewrite if it's going to fail.
  • All of the above, with some way of turning it on.
  • A callback for collisions? I only mention it because I've seen it done inside libgit2.
@nulltoken

nulltoken May 14, 2013

Member

Throw an exception of some sort, though I'd want to do a checking pass first so we don't actually do a rewrite if it's going to fail.

👍

Member

dahlbyk commented May 13, 2013

CommitHeader.From with optional parameters turns out to have some difficulties, since a conversion to a Func<> doesn't work any more. I ended up doing From with optional parameters (which is great), and the simple duplication call is called SameAs.

R# will complain, but it's perfectly legal to have overlapping methods with and without optional parameters so you can still use CommitHeader.From as a Func<Commit, CommitHeader>.
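
A minimal sketch of the overlapping-overload point, using hypothetical stand-in types (`FakeCommit`, `FakeRewriteInfo`) rather than the real LibGit2Sharp classes. A method with optional parameters cannot be converted to a `Func<>` through a method group on its own, but adding a plain one-parameter overload next to it makes the conversion compile, and the plain overload can simply defer to the other one.

```csharp
using System;

// Hypothetical stand-ins; these are NOT the LibGit2Sharp types.
public sealed class FakeCommit
{
    public string Message { get; set; }
}

public sealed class FakeRewriteInfo
{
    public string Message { get; set; }

    // Plain overload: the one a method-group conversion binds to.
    // The named argument forces the call below onto the other overload.
    public static FakeRewriteInfo From(FakeCommit commit)
    {
        return From(commit, message: null);
    }

    // Overload with an optional parameter for call sites that override a value.
    public static FakeRewriteInfo From(FakeCommit commit, string message = null)
    {
        return new FakeRewriteInfo { Message = message ?? commit.Message };
    }
}

public static class OverloadDemo
{
    public static string ViaMethodGroup(FakeCommit commit)
    {
        // Compiles only because the one-parameter overload exists.
        Func<FakeCommit, FakeRewriteInfo> rewriter = FakeRewriteInfo.From;
        return rewriter(commit).Message;
    }
}
```

With both overloads in place, a call such as `From(c)` resolves to the plain overload, while `From(c, message: "abc")` picks the one with optional parameters.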

The reachability call is kind of awkward no matter where it lives. I moved it over to IEnumerable<Reference> and called it ReachableFrom, but it still seems clunky. The problem is that it needs a set of refs and two sets of commits. If it belonged to Repository, it would get the full commit graph for free, but I kind of don't like polluting that class with tons of calls.

I had overlooked the use of repo.Commits - you're right, it does feel clunky. It's cheating a bit, but exposing GitObject.repo internally would allow you to grab repo.Commits from one of the targets instead of requiring allCommits as an argument.

Unrelated microoptimization: you could capture targetsList in a HashSet<> for more efficient lookup, since we're checking for each commit. At least I expect it would perform better...
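
To illustrate the `HashSet<>` suggestion, here is a rough sketch over a hypothetical in-memory graph (plain string ids, not LibGit2Sharp objects). The method name `RefsThatCanReachAny` and the dictionary-based model are assumptions made for the example; the only point is that the targets are captured in a set once, so each per-commit membership check is cheap.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Reachability
{
    // refs: ref name -> tip commit id; parents: commit id -> parent ids.
    // Both are hypothetical stand-ins for a real repository.
    public static string[] RefsThatCanReachAny(
        IDictionary<string, string> refs,
        IDictionary<string, string[]> parents,
        IEnumerable<string> targets)
    {
        // Built once, so every lookup below is O(1) on average.
        var targetSet = new HashSet<string>(targets);

        return refs.Where(r => CanReach(r.Value, parents, targetSet))
                   .Select(r => r.Key)
                   .OrderBy(name => name, StringComparer.Ordinal)
                   .ToArray();
    }

    private static bool CanReach(
        string tip,
        IDictionary<string, string[]> parents,
        HashSet<string> targetSet)
    {
        var seen = new HashSet<string>();
        var pending = new Stack<string>();
        pending.Push(tip);

        while (pending.Count > 0)
        {
            var id = pending.Pop();
            if (!seen.Add(id)) continue;          // already visited
            if (targetSet.Contains(id)) return true;

            string[] ps;
            if (parents.TryGetValue(id, out ps))
            {
                foreach (var p in ps) pending.Push(p);
            }
        }
        return false;
    }
}
```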

I removed the CommitRewriteInfo.Encoding field, but now I'm wondering if that was a mistake. Do you think it's useful during a rewrite to know what the encoding was originally? We're not allowing writing of custom encodings.

Encoding is available through the Commit passed to the callback - properties on CommitRewriteInfo are only useful as input back into the tree.

@nulltoken nulltoken and 1 other commented on an outdated diff May 13, 2013

LibGit2Sharp.Tests/FilterBranchFixture.cs
+ [Fact]
+ public void DoesNotRewriteRefsThatDontChange()
+ {
+ string path = CloneBareTestRepo();
+ using (var repo = new Repository(path))
+ {
+ repo.RewriteHistory(new[] { repo.Lookup<Commit>("c47800c") },
+ c => CommitRewriteInfo.From(c, message: "abc"));
+ Assert.Null(repo.Refs["refs/original/heads/packed-test"]);
+ Assert.NotNull(repo.Refs["refs/original/heads/br2"]);
+ }
+ }
+
+
+ // This test should rewrite br2, but not packed-test:
+ // * a4a7dce (br2) Merge branch 'master' into br2
@nulltoken

nulltoken May 13, 2013

Member

Shouldn't this comment rather decorate the test above?

@ben

ben May 13, 2013

Member

Oh yeah. It should. 😊

Member

ben commented May 13, 2013

R# will complain, but it's perfectly legal to have overlapping methods [...]

Hey, you're right! I was trusting the squigglies too much.

It's cheating a bit, but exposing GitObject.repo internally [...]

Yup. I like cheating. That makes it much nicer to use, too. Now I'm wondering if it makes sense to make GitObject.repo public; it could simplify some client code too.

@dahlbyk dahlbyk commented on the diff May 14, 2013

LibGit2Sharp/CommitRewriteInfo.cs
+ /// </summary>
+ public Signature Committer { get; set; }
+
+ /// <summary>
+ /// The message to be used for the new commit
+ /// </summary>
+ public string Message { get; set; }
+
+ /// <summary>
+ /// Match the <see cref="Commit"/> passed in
+ /// </summary>
+ /// <param name="commit">The <see cref="Commit"/> whose information is to be copied</param>
+ /// <returns>A new <see cref="CommitRewriteInfo"/> object that matches the info for the <paramref name="commit"/>.</returns>
+ public static CommitRewriteInfo From(Commit commit)
+ {
+ return new CommitRewriteInfo
@dahlbyk

dahlbyk May 14, 2013

Member

Just personal preference, but I'm inclined to capture the values from commit in the other overload, and just defer to that one here.

@dahlbyk dahlbyk commented on an outdated diff May 14, 2013

LibGit2Sharp/Repository.cs
+ /// <param name="commitsToRewrite">The <see cref="Commit"/>objects to rewrite</param>
+ /// <param name="commitHeaderRewriter">Visitor for rewriting commit metadata</param>
+ /// <param name="commitTreeRewriter">Visitor for rewriting commit trees</param>
+ /// <param name="referenceNameRewriter">Visitor for renaming backed-up refs. If this returns null, that ref will not be backed up.</param>
+ /// <param name="parentRewriter">Visitor for mangling parent links</param>
+ public virtual void RewriteHistory(
+ IEnumerable<Commit> commitsToRewrite,
+ Func<Commit, CommitRewriteInfo> commitHeaderRewriter = null,
+ Func<Commit, TreeDefinition> commitTreeRewriter = null,
+ Func<string, string> referenceNameRewriter = null,
+ Func<IEnumerable<Commit>, IEnumerable<Commit>> parentRewriter = null )
+ {
+ IList<Reference> originalRefs = Refs.ToList();
+ if (originalRefs.Count == 0)
+ {
+ // No ref to rewrite. What should we do here? Silently return? Throw InvalidOperationException?
@dahlbyk

dahlbyk May 14, 2013

Member

I'm fine with a no-op - you have to go well out of your way to end up with a Repository that has no refs.

@dahlbyk dahlbyk and 1 other commented on an outdated diff May 14, 2013

LibGit2Sharp/Repository.cs
+ shaMap[commit] = newCommit;
+ }
+
+ // Rewrite the refs
+ foreach (var reference in refsToRewrite)
+ {
+ // Symbolic ref? Leave it alone
+ if (!(reference is DirectReference))
+ continue;
+
+ // TODO: deal with tags; chaining can get hairy
+ if (reference.IsTag())
+ continue;
+
+ // Direct ref? Overwrite it, point to the new commit
+ var directRef = reference as DirectReference;
@dahlbyk

dahlbyk May 14, 2013

Member

I'd use a direct cast here instead of as, since we already know reference is a DirectReference. Or maybe move this up top and continue if directRef == null?

@nulltoken

nulltoken May 14, 2013

Member

Cast -> 👍
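
For readers following along, a small sketch of the difference being discussed, with hypothetical `Ref`, `DirectRef` and `SymbolicRef` classes standing in for the real reference types. A direct cast fails immediately with `InvalidCastException` when the assumption is wrong, while `as` yields `null` and is only appropriate when the null case is actually handled.

```csharp
using System;

// Hypothetical stand-ins for Reference / DirectReference.
public abstract class Ref { }
public sealed class DirectRef : Ref { public string Target = "abc123"; }
public sealed class SymbolicRef : Ref { }

public static class CastDemo
{
    // Use when the type is already known: a wrong assumption fails loudly here.
    public static string TargetOf(Ref reference)
    {
        var direct = (DirectRef)reference;
        return direct.Target;
    }

    // Use when a mismatch is expected: the caller must handle the null result.
    public static string TargetOrNull(Ref reference)
    {
        var direct = reference as DirectRef;
        return direct == null ? null : direct.Target;
    }
}
```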

Member

nulltoken commented May 14, 2013

RewriteHistory moved to Repository. It affects commits, tags, and refs, so I can't imagine a better place for it than the class that ties all those together.

@ben Could you please rather move it into ReferenceCollection?

@nulltoken nulltoken commented on an outdated diff May 14, 2013

LibGit2Sharp/Repository.cs
+ continue;
+
+ // TODO: deal with tags; chaining can get hairy
+ if (reference.IsTag())
+ continue;
+
+ // Direct ref? Overwrite it, point to the new commit
+ var directRef = reference as DirectReference;
+ var oldCommit = directRef.Target as Commit;
+ if (oldCommit == null) continue;
+ if (shaMap.ContainsKey(oldCommit))
+ {
+ var newName = referenceNameRewriter(reference.CanonicalName.Substring(5));
+ if (newName != null)
+ {
+ Refs.Add(newName, reference.TargetIdentifier, true, "rewrite history");
@nulltoken

nulltoken May 14, 2013

Member

Hmmm. Maybe the reflog would also deserve some love 😜

More insight here

@ben ben commented on an outdated diff May 14, 2013

LibGit2Sharp.Tests/FilterBranchFixture.cs
+ }
+
+ [Fact]
+ public void HandlesExistingBackedUpRefs()
+ {
+ string path = CloneBareTestRepo();
+ using (var repo = new Repository(path))
+ {
+ Func<Commit, CommitRewriteInfo> headerRewriter = c => CommitRewriteInfo.From(c, message: "abc");
+
+ repo.RewriteHistory(repo.Head.Commits, commitHeaderRewriter: headerRewriter);
+ Assert.Throws<InvalidOperationException>(() =>
+ repo.RewriteHistory(repo.Head.Commits, commitHeaderRewriter: headerRewriter));
+ Assert.Empty(repo.Refs.Where(x => x.CanonicalName.StartsWith("refs/original/original/")));
+ }
+ }
@ben

ben May 14, 2013

Member

Okay, I've fixed it so it throws InvalidOperationException. Does that sound right?
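
The "checking pass first" idea can be sketched like this. It is only an illustration over a hypothetical in-memory ref store (a dictionary of names to target ids), not the code in this PR, and it assumes every ref name starts with `refs/`. All backup names are computed and validated before anything is modified, so a collision leaves the store untouched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BackupCheck
{
    // refs: canonical ref name -> target id (hypothetical in-memory store).
    // newTargets: ref name -> rewritten target id.
    public static void BackUpAndRewrite(
        IDictionary<string, string> refs,
        IDictionary<string, string> newTargets,
        string backupNamespace = "refs/original/")
    {
        // Pass 1: plan every backup and fail before touching anything.
        var plan = newTargets.Keys
            .Where(name => refs.ContainsKey(name))
            .Select(name => new
            {
                Name = name,
                Backup = backupNamespace + name.Substring("refs/".Length)
            })
            .ToList();

        var clash = plan.FirstOrDefault(p => refs.ContainsKey(p.Backup));
        if (clash != null)
        {
            throw new InvalidOperationException(
                "Backup ref already exists: " + clash.Backup);
        }

        // Pass 2: only now mutate the store.
        foreach (var p in plan)
        {
            refs[p.Backup] = refs[p.Name];
            refs[p.Name] = newTargets[p.Name];
        }
    }
}
```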

Member

ben commented May 14, 2013

@nulltoken mentioned the reflog. Looking at git's code, it seems like it only generates a few distinct reflog messages.

  1. filter-branch: rewrite on a successful rewrite.
  2. filter-branch: backup when backing up.
  3. filter-branch: delete if the ref was deleted, maybe because of a --prune-empty or somesuch.
  4. filter-branch: rewrite to first if a ref was rewritten into multiple commits.

I've corrected 1 and 2; RewriteHistory now emits the same messages as git would have.

Cases 3 and 4 aren't directly supported, but you could simulate them with the parent-rewrite hook (by either omitting or inserting parent links). The problem is that, since RewriteHistory isn't aware that this is happening, it can't detect what happened and emit the right message. Any opinions on the right way to support this?

@ben ben and 1 other commented on an outdated diff May 16, 2013

LibGit2Sharp/ReferenceCollection.cs
+ tagMap[tag] = newTag;
+ return newTag;
+ }
+
+ /// <summary>
+ /// Rewrite some or all of the repository's commits and references
+ /// </summary>
+ /// <param name="commitsToRewrite">The <see cref="Commit"/>objects to rewrite</param>
+ /// <param name="commitHeaderRewriter">Visitor for rewriting commit metadata</param>
+ /// <param name="commitTreeRewriter">Visitor for rewriting commit trees</param>
+ /// <param name="referenceNameRewriter">Visitor for renaming backed-up refs. If this returns null, that ref will not be backed up.</param>
+ /// <param name="tagNameRewriter">Visitor for renaming tags.
+ /// If this returns null, the tag will be deleted.
+ /// If it returns the empty string, the tag will not be changed.
+ /// If it returns the input, the tag will be moved.
+ /// Any other value results in a new tag.</param>
@ben

ben May 16, 2013

Member

This is an emulation of what git provides, but now that I've written it I'm not totally convinced it's the right thing. There's a lot of behavior here that's keying off the value of a returned string. Would it make more sense to have this be an Action<Tag, GitObject> that receives the old tag and the new target? We could even call it in the right order for chained tags.

@nulltoken

nulltoken May 17, 2013

Member

If this returns null, the tag will be deleted.

Can this handle the following use case?

initial:

refs/tags/my-tag -> TagA -> TagB -> TagC -> GitObject (either a Blob, Tree or a Commit)

expected output:

refs/original/tags/my-tag -> TagC -> GitObject (either a Blob, Tree or a Commit)
or
refs/original/tags/my-tag -> TagA -> GitObject (either a Blob, Tree or a Commit)
or
refs/original/tags/my-tag -> GitObject (either a Blob, Tree or a Commit)
@nulltoken

nulltoken May 17, 2013

Member

Any other value results in a new tag

Can this handle the following use case?

rewriter = x => { return x == "v0.0.1" ? "v0.0.1rc" : x; }

initial:

refs/tags/my-tag -> TagA (name: test) -> TagB (name: v0.0.1) -> TagC (name: another) -> GitObject (either a Blob, Tree or a Commit)

expected output:

refs/original/tags/my-tag -> TagA (name: test) -> TagB (name: v0.0.1rc) -> TagC (name: another) -> GitObject (either a Blob, Tree or a Commit)
@ben

ben May 17, 2013

Member

initial:

refs/tags/my-tag -> TagA -> TagB -> TagC -> GitObject (either a Blob, Tree or a Commit)

expected output:

refs/original/tags/my-tag -> TagC -> GitObject (either a Blob, Tree or a Commit)
or
refs/original/tags/my-tag -> TagA -> GitObject (either a Blob, Tree or a Commit)
or
refs/original/tags/my-tag -> GitObject (either a Blob, Tree or a Commit)

My gut feeling is that refs/original/tags/my-tag should now point to refs/original/tags/TagA – the tag-chain rewriting should act a lot more like the commit rewriting than it currently does. I'll add this case to the tests, but it may already work.

initial:

refs/tags/my-tag -> TagA (name: test) -> TagB (name: v0.0.1) -> TagC (name: another) -> GitObject (either a Blob, Tree or a Commit)

expected output:

refs/original/tags/my-tag -> TagA (name: test) -> TagB (name: v0.0.1rc) -> TagC (name: another) -> GitObject (either a Blob, Tree or a Commit)

This one probably already works. Whatever the result of the name callback, the code records the before and after states of every rewritten tag in tagMap. So when rewriting TagA, the code notices that it's pointing to a TagAnnotation. It tries to rewrite the target tag TagB first (using a cached result if that's already done), and creates a new TagA that points to TagB's annotation.
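
A rough sketch of the memoized chain rewriting described above. The `Node` and `TagChainRewriter` types are hypothetical, and the final target is left untouched here (in the PR, target rewrites are tracked separately by shaMap). Inner annotations are rewritten first, and `tagMap` ensures each annotation is rewritten only once.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model: a node with a Target is a tag annotation;
// a node without one is the final commit, tree, or blob.
public sealed class Node
{
    public readonly string Name;
    public readonly Node Target;

    public Node(string name, Node target = null)
    {
        Name = name;
        Target = target;
    }
}

public sealed class TagChainRewriter
{
    // Before/after record of every rewritten annotation (reference equality).
    private readonly Dictionary<Node, Node> tagMap = new Dictionary<Node, Node>();
    private readonly Func<string, string> rename;

    public int AnnotationsCreated { get; private set; }

    public TagChainRewriter(Func<string, string> rename)
    {
        this.rename = rename;
    }

    public Node Rewrite(Node node)
    {
        if (node.Target == null) return node;      // final target: unchanged in this sketch

        Node cached;
        if (tagMap.TryGetValue(node, out cached)) return cached;

        var newTarget = Rewrite(node.Target);       // rewrite the pointed-to annotation first
        var rewritten = new Node(rename(node.Name), newTarget);
        AnnotationsCreated++;
        tagMap[node] = rewritten;
        return rewritten;
    }
}
```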

@ben

ben May 24, 2013

Member

I'm coming back to this, and more and more I'm thinking that tag handling should be left to the caller. The default behavior is to do nothing to tags (to preserve signing, etc) – not even backing them up. If the caller wants to rewrite them, they're completely able to.

@nulltoken nulltoken and 1 other commented on an outdated diff May 17, 2013

LibGit2Sharp.Tests/FilterBranchFixture.cs
+ Assert.NotEmpty(repo.Refs.Where(x => x.CanonicalName.StartsWith("refs/original")));
+
+ Assert.Empty(repo.Refs.Where(x => x.CanonicalName.StartsWith("refs/rewritten")));
+ repo.Refs.RewriteHistory(repo.Head.Commits,
+ commitHeaderRewriter: c => CommitRewriteInfo.From(c, message: "abc"),
+ referenceNameRewriter: x => "refs/rewritten/" + x);
+ Assert.NotEmpty(repo.Refs.Where(x => x.CanonicalName.StartsWith("refs/rewritten")));
+ }
+
+ [Fact]
+ public void CanPreventRefsFromBeingBackedUp()
+ {
+ var numberOfRefs = repo.Refs.Count();
+
+ repo.Refs.RewriteHistory(repo.Head.Commits, c => CommitRewriteInfo.From(c, message: "abc"),
+ referenceNameRewriter: _ => null);
@nulltoken

nulltoken May 17, 2013

Member

I'm not sure I understand this use case. What's the point of accepting a null rewriter? Shouldn't we rather throw?

@ben

ben May 17, 2013

Member

It's pretty arbitrary, huh? Maybe this is another case where the right API is to take a Func<Reference, ObjectId> (old reference, new target) and a Func<Reference, Reference> (old reference, new symbolic target) instead of inferring meaning from strings. Would that be better? We'd have to do the ordering right to avoid complexity inside the callback.

@ben

ben May 17, 2013

Member

No, wait. The use case here is what to call the refs that are backed up. The default behavior is to copy the rewritten refs to refs/original (i.e. the old refs/heads/master is copied to refs/original/heads/master before the new refs/heads/master is written). Returning null means that you don't want to keep a backup.
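
As a sketch of the semantics described here, assuming a hypothetical dictionary-backed ref store: the namer receives the ref name without its `refs/` prefix (as in the `"refs/rewritten/" + x` test above) and returns the full backup name, or null to skip the backup. The helper names are invented for the example.

```csharp
using System;
using System.Collections.Generic;

public static class BackupNaming
{
    // Default: refs/heads/master is backed up as refs/original/heads/master.
    public static string DefaultBackupName(string nameWithoutRefsPrefix)
    {
        return "refs/original/" + nameWithoutRefsPrefix;
    }

    // refs is a hypothetical in-memory store: canonical name -> target id.
    public static void Rewrite(
        IDictionary<string, string> refs,
        string canonicalName,
        string newTarget,
        Func<string, string> backupNamer)
    {
        var backup = backupNamer(canonicalName.Substring("refs/".Length));

        // A null (or empty) name means the caller does not want a backup.
        if (!string.IsNullOrEmpty(backup))
        {
            refs[backup] = refs[canonicalName];
        }

        refs[canonicalName] = newTarget;
    }
}
```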

I'm going to rewrite this as I mentioned above, I think it provides more power to the caller. Stay tuned.

@nulltoken nulltoken and 1 other commented on an outdated diff May 17, 2013

LibGit2Sharp/ReferenceCollection.cs
+ var directRef = reference as DirectReference;
+ var oldCommit = directRef.Target as Commit;
+ if (oldCommit == null) continue;
+ if (shaMap.ContainsKey(oldCommit.Id))
+ {
+ var newName = referenceNameRewriter(reference.CanonicalName.Substring("refs/".Length));
+ if (!string.IsNullOrEmpty(newName))
+ {
+ this.Add(newName, reference.TargetIdentifier, true, "filter-branch: backup");
+ UpdateTarget(directRef, shaMap[oldCommit.Id], "filter-branch: rewrite");
+ }
+ }
+ }
+
+ // Rewrite the tags
+ var tagMap = new Dictionary<Tag, Tag>();
@nulltoken

nulltoken May 17, 2013

Member

Can this handle situations where a chain of tags lead to a Blob or a Tree?

@ben

ben May 17, 2013

Member

I think so. tagMap is just for tracking which tags have been rewritten so that we only rewrite them once. The final target rewritings are tracked by shaMap.

Member

nulltoken commented May 17, 2013

Cases 3 and 4 aren't directly supported, but you could simulate them with the parent-rewrite hook (by either omitting or inserting parent links). The problem is that, since RewriteHistory isn't aware that this is happening, it can't detect what happened and emit the right message. Any opinions on the right way to support this?

I think it'd be ok to keep those outside the scope of this PR. Could you please open an issue to keep track of those?

Regarding this, is git filter-branch able to handle such a rewrite?

initial:

A -> B -> C

expected output:

A -> B1 -> B2 -> B3 -> C
or
A -> B1 -> B2a -> B3 -> C
         \ B2b /
Member

ben commented May 17, 2013

What do you think is the correct thing to do in this situation?

refA -> commit -> tree -> blob <- tag
(rewrite)
refA' -> commit' -> tree' -> blob'    |    blob <- tag (?)

The commit's tree has been rewritten to include a new blob in the place the old blob was. Should the tag now point to the new blob in the new tree? My gut feel is no; if you went out of your way to tag a blob, you probably want that to always point to that blob.

Member

ben commented May 17, 2013

expected output:

A -> B1 -> B2 -> B3 -> C
or
A -> B1 -> B2a -> B3 -> C
         \ B2b /

It kind of looks that way. There's a reflog message for "rewritten to multiple", meaning the commit that refA pointed to before has been transformed into multiple commits. It seems to pick the first one, but it notes that this happened.

I'm comfortable passing this off to the caller. If she wants to split a commit into an entire topology (using some combination of the treeRewriter and parentRewriter), she can decide where the transformed ref points to also. By default, the call should choose the commit that is constructed from commitHeaderRewriter, commitTreeRewriter, and parentRewriter.

@ben ben commented on an outdated diff May 17, 2013

LibGit2Sharp.Tests/FilterBranchFixture.cs
+ Assert.Empty(repo.Head.Commits.Where(c => c["README"] != null));
+ }
+
+ [Fact]
+ public void CanCustomizeRefRewriting()
+ {
+ repo.Refs.RewriteHistory(repo.Head.Commits, c => CommitRewriteInfo.From(c, message: ""));
+ Assert.NotEmpty(repo.Refs.Where(x => x.CanonicalName.StartsWith("refs/original")));
+
+ Assert.Empty(repo.Refs.Where(x => x.CanonicalName.StartsWith("refs/rewritten")));
+ repo.Refs.RewriteHistory(repo.Head.Commits,
+ commitHeaderRewriter: c => CommitRewriteInfo.From(c, message: "abc"),
+ directReferenceRewriter:
+ (r, t) => repo.Refs.DefaultDirectReferenceRewriter(r, t, "refs/rewritten"));
+ Assert.NotEmpty(repo.Refs.Where(x => x.CanonicalName.StartsWith("refs/rewritten")));
+ }
@ben

ben May 17, 2013

Member

I haven't done this for tags yet, but this is what it looks like if you don't restrict the rewriting hooks to just naming the backups. I provide a default implementation that does the right things, but it's easy to override for just naming, or you can do all your own logic in there.

@nulltoken nulltoken referenced this pull request in libgit2/libgit2 May 18, 2013

Merged

tag: Introduce git_tag_annotation_create() #1594

Member

nulltoken commented May 26, 2013

@ben repo.ObjectDatabase.CreateTag() has been merged. This may be helpful to rewrite annotated tags 😉

Member

ben commented Jun 11, 2013

So now this properly rewrites chains of tag annotations without creating spurious tags in the process. Is there anything left, or should we ship it?

Member

nulltoken commented Jun 12, 2013

@ben I found a couple of issues and eventually ended up on a coding spree. Sorry 😁

I've tried to simplify the code a bit (mainly to help me get a better understanding of it).

Main changes are:

  • Fix a minor issue where the rewriters were applied on commits that weren't part of the initial user's list
  • Tags are now also backed up when rewritten
  • TagAnnotation can be renamed

@nulltoken nulltoken and 1 other commented on an outdated diff Jun 12, 2013

LibGit2Sharp/Core/HistoryRewriter.cs
+ Func<IEnumerable<Commit>, IEnumerable<Commit>> parentsRewriter,
+ Func<String, bool, GitObject, string> tagNameRewriter,
+ string backupRefsNamespace)
+ {
+ this.repo = repo;
+ targetedCommits = new HashSet<Commit>(commitsToRewrite);
+
+ this.headerRewriter = headerRewriter ?? CommitRewriteInfo.From;
+ this.treeRewriter = treeRewriter;
+ this.tagNameRewriter = tagNameRewriter;
+ this.parentsRewriter = parentsRewriter ?? (ps => ps);
+
+ this.backupRefsNamespace = backupRefsNamespace;
+ }
+
+ public void AmazeMe()
@nulltoken

nulltoken Jun 12, 2013

Member

Yes. I'm really that bad at naming...

Ideas?

@dahlbyk

dahlbyk Jun 12, 2013

Member

Rewrite? Execute?

@nulltoken

nulltoken Jun 13, 2013

Member

Execute!

@dahlbyk Thanks. Fixed.

Member

nulltoken commented Jun 13, 2013

I'm not sure I can go much farther for now.

@ben @dahlbyk @yorah Would you please sprinkle a bit of your review magic on those? I'm especially not in love with the latest commit.

Member

ben commented Jun 14, 2013

🔥 @yorah 🔥 ❗️ Making a pull request to a pull request? 🙀

@ben ben commented on an outdated diff Jun 14, 2013

LibGit2Sharp.Tests/FilterBranchFixture.cs
+
+ var annotatedTag = repo.Tags["so-lonely-but-annotated"];
+ Assert.Equal("Bam!\n", ((Commit)annotatedTag.Target).Message);
+ }
+
+ [Fact]
+ public void CanRewriteTrees()
+ {
+ repo.Refs.RewriteHistory(repo.Head.Commits, commitTreeRewriter: c =>
+ {
+ var td = TreeDefinition.From(c);
+ td.Remove("README");
+ return td;
+ });
+
+ Assert.Empty(repo.Head.Commits.Where(c => c["README"] != null));
@ben

ben Jun 14, 2013

Member

Super nitpicky: would this read better as Assert.True(repo.Head.Commits.All(c => c["README"] == null));?

Member

ben commented Jun 14, 2013

I think it's ready! 💯 💴

@nulltoken nulltoken merged commit 39716ec into vNext Jun 15, 2013

1 check passed

default The Travis CI build passed
Member

nulltoken commented Jun 15, 2013

Awesome job @ben! I tweaked the skipped test a bit before merging it.

@yorah I squashed in your commit as well. The rollbackActions Queue is an amazing idea! Thanks a lot for your help ❤️
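
The thread does not show how the rollbackActions queue is implemented, so the following is only a guess at the general pattern, over a hypothetical in-memory ref store: each successful step enqueues an undo action, and if a later step fails, the queued actions are run before the exception propagates. FIFO order is safe here only under the assumption that each ref is touched at most once.

```csharp
using System;
using System.Collections.Generic;

public static class RollbackDemo
{
    // refs: canonical name -> target id (hypothetical store).
    // A null target in updates simulates a step that fails midway.
    public static void ApplyAll(
        IDictionary<string, string> refs,
        IEnumerable<KeyValuePair<string, string>> updates)
    {
        var rollbackActions = new Queue<Action>();
        try
        {
            foreach (var update in updates)
            {
                var name = update.Key;
                if (update.Value == null)
                {
                    throw new InvalidOperationException("No target for " + name);
                }

                string previous;
                bool existed = refs.TryGetValue(name, out previous);
                refs[name] = update.Value;

                // Record how to undo this step.
                rollbackActions.Enqueue(() =>
                {
                    if (existed) refs[name] = previous;
                    else refs.Remove(name);
                });
            }
        }
        catch
        {
            // Undo everything done so far, then let the failure propagate.
            foreach (var undo in rollbackActions) undo();
            throw;
        }
    }
}
```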

Member

ben commented Jun 15, 2013

❗️

@ben ben deleted the filter-branch branch Jun 15, 2013
