Skip to content

Treat git subtrees as remotes. A less painful alternative to git submodule.

Notifications You must be signed in to change notification settings

sethfowler/git-remote-subtree

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 

Repository files navigation

git-remote-subtree

An alternative to git submodules and subtrees. Subrepos appear as normal remotes which differ from your main repository only in the contents of one subdirectory.

The idea is that hide the functionality of git-subtree behind a protocol which works transparently, doesn't pollute the commit history of the repo, and doesn't interfere with normal git functionality. Just as a failing escalator becomes stairs, any issue with git-remote-subtree should yield a monorepo that continues to work perfectly, with no loss of data or commits.

Project status

There is no working implementation yet. git-remote-subtree currently just mirrors a remote repository in a hidden bare repo and performs pushes and pulls indirectly through it.

The next step is to add support for rewriting paths and basing the rewritten commits on top of a specific parent commit provided by the user. This will be sufficient to allow pulling as long as no new commits are added to the super repo, and because it's deterministic, it won't require scanning the super repo for matching commits.

Proposed approach

Say the user sets up a remote like so:

git remote add subRepo subtree::subRepoDir::on::branch::from::http://example.com/normalRepo

This creates a git remote called subRepo which wraps a normal git repo at http://example.com/normalRepo. The ref subRepo/branch contains the same content as normalRepo/branch, but it is as if all of the files in normalRepo are moved into a single top-level directory subRepoDir/, and subRepoDir is grafted into the same tree as all the other content at superRepo/branch. If subRepoDir already exists in superRepo/branch, the effect is as if its existing contents were replaced by the contents of normalRepo/subBranch. If subRepoDir already exists in superRepo/branch and the contents are exactly the same, then subRepo/subBranch will have the same SHA as superRepo/branch, and pulling one into the other will be a no-op.

Similarly, pushing from superRepo/branch to subRepo/subBranch behaves as if the contents of subRepoDir were at the top level, and everything else was thrown away. If the contents of subRepoDir are the same as normalRepo/subBranch, then pushing is a no-op.

This setup means that the functionality of git-subtree can be implemented totally by pushing to and pulling from a subtree:: remote. Because all the magic is inside the remote helper, the main repo remains clean, and all other git functionality works just as you would expect.

This should be fairly simple to implement while preserving history. A hand-wavy algorithm for fetching is as follows:

  • Do a fetch on normalRepo and superRepo into our hidden repo.

  • See if the tree object of the oldest commit in normalRepo/subBranch is present in the local repo. If not, we know that we've never merged this branch in before, and we can skip some of the following work.

  • Walk backwards in the commit graph from normalRepo/subBranch until we find a tree object that's in the local repo. See if one of the associated commits in the local repo is the same in every other respect except for the parent commits and the fact that the tree is rewritten. If so, this is the last common commit. If not, keep walking backwards; if we run out of commits, we've never merged this branch in before, and in the steps below we can just start at the current commit in the local repo.

  • Create a temporary branch in our hidden repo pointing at the version of the common commit in our local repo.

  • Cherry-pick commits from our local repo until we either hit a commit that modifies the tree object for subRepoDir (which we won't cherry-pick), or we run out of commits.

  • Cherry-pick all of the commits from normalRepo/subBranch onto our temporary branch, with the tree rewritten appropriately. We know they'll apply cleanly because the state of subRepoDir in our temporary repo is clean with respect to normalRepo.

  • The temporary repo contains the data we'll return from the fetch. Repeat as necessary for the other branches.

This is obviously quite expensive, so in practice we'll want to cache some information to speed this up.

Pushing is a bit simpler; once we find the common commit we just need to transform each commit in our local repo that touches subRepoDir into a corresponding commit on normalRepo/subBranch.

It might sound like there's a lot to implement here, but actually git-subhistory in particular is fairly close to what's needed here, and translating it into e.g. Python would get us 80% of the way there.

Resources and related work

How to Write a New Git Protocol

Mastering Git Subtrees

git-subtree docs

git-subhistory

git-subrepo

Which commit has this blob?

About

Treat git subtrees as remotes. A less painful alternative to git submodule.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages