Guide to git-remote-hg

Alexander Georgievskiy edited this page Oct 31, 2016 · 10 revisions

git-remote-hg is an extension (or a series of patches) for git which allows you to "git clone" a Mercurial repository and interact with it as if it were a git repository. Hence it is the mirror counterpart to the Hg-Git Mercurial plugin. It is still under development and not quite ready for everyday use, but it does work, and you are encouraged to try it and, even better, to help improve and finalize it. More on that in a later section.

Note that https://github.com/rfk/git-remote-hg is a distinct tool with the same name and purpose. The main difference is that, while that tool is built upon hg-git, our code directly accesses the Mercurial Python API and hence is substantially faster for most operations, especially on big repositories.

Two other tools which use hg-git to allow git cloning of Mercurial repositories are git-hg and git-hg-again.

Using git-remote-hg

Using git-remote-hg currently requires using a patched version of git itself, as various changes to git (for fixing bugs and adding features) are required for git-remote-hg to work. The following steps should help get you going:

  1. Clone a repository containing the git-remote-hg code. For example, the msysgit master repository, which currently represents the official repository for git-remote-hg work:
    git clone https://github.com/msysgit/git.git msysgit
    cd msysgit

The drawback of this is that you will also get all the other msysgit changes, which are unrelated to git-remote-hg. Hopefully, there will soon be a way to get only the remote-hg related changes. As a temporary solution, you could use this alternative repository (which only contains remote-hg related commits atop a stock mainline git).

    git clone https://github.com/fingolfin/git.git git-remote-hg
    cd git-remote-hg
    git checkout remote-hg
  2. Compile it and install it somewhere in your PATH. This works exactly as for a stock git, and of course requires that you install all necessary dependencies; see also the section on installing git in the git manual. In addition, you need to install Mercurial, as git-remote-hg uses it. As an example, to install the custom git into /usr/local, do:
    make prefix=/usr/local all
    sudo make prefix=/usr/local install

This should have put "git-remote-hg" along with all the usual git binaries into /usr/local/.

  3. Once this is done, you can clone a Mercurial repository like this:

    git clone hg::ssh://example.com/path/to/hg-repos

The result is a git repository, with which you can work as usual. But when you "git push", your changes will be sent to the upstream Mercurial repository. Conversely, a "git fetch" will retrieve new commits from the remote Mercurial repository.

Plan and TODO

Here are several points that should or could be improved in git-remote-hg, as well as suggestions on how development could proceed in order to get it ready for general consumption. The first goal should be to get git-remote-hg into a state where upstream is willing to merge the changes (or at least central parts of them, such as the modifications to the git C code).

Secondly, all features of git and of Mercurial must be properly mapped back and forth. A notable gap here is octopus merges (details on that below).

Third, overall stability (although it is already quite good) and error handling should be improved.

Important stuff

  • A major blocker for inclusion into mainline git is the fact that remote-hg requires certain changes to "git fast-export". Details can be found in this thread on the git mailing list. Unfortunately, the result of this discussion was that the change included in git-remote-hg in its current form was not deemed fit for inclusion in mainline git. There was some discussion of alternative ways to address the issue that would have a good chance of being merged. We should look into implementing such an alternative, or altering the existing code to make it acceptable for upstream.

  • Code should be added which checks the version of the Mercurial libraries and handles the result suitably. Over time, the Mercurial APIs change in sometimes incompatible ways. So we may want to refuse to run on known bad (old) versions. In addition or alternatively, we could use different code depending on the version of the Mercurial libraries. Either way, some checks are needed.

  • Add support for git octopus merges. In Mercurial, commits can have at most two parents. Hence an octopus merge of n parents must be faked by (n-1) regular (two-way) merges. Specifically, this could be done as follows: Given a commit A with parents P_1 to P_n, begin by creating a single tree object corresponding to the tree at A (i.e. the final result of the octopus merge). Then create a series of fake merge commits; the fake part is that each of them uses the tree (of A) created in the first step. So, we begin with a commit A_1 that has P_1 and P_2 as parents, but has the tree of A. Then comes A_2 with A_1 and P_3 as parents and the same tree, etc., until we finally obtain A as a commit with parents A_{n-2} and P_n. Besides being simple, this also has the advantage that it is easy to detect this situation and convert a series of such fake merge commits back into a single octopus merge, with all parents in the right order.
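
The octopus scheme described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the Commit record and the functions split_octopus() and join_octopus() are hypothetical names that do not exist in git-remote-hg, and commits are identified by plain strings instead of real git or Mercurial objects.

```python
from collections import namedtuple

# Hypothetical, simplified commit record; real code would operate on
# fast-export streams and Mercurial changesets instead.
Commit = namedtuple("Commit", ["name", "tree", "parents"])


def split_octopus(commit):
    """Replace a commit with n > 2 parents by n-1 chained two-parent merges.

    Every commit in the result carries the tree of the original commit;
    the last one keeps the original commit's name.
    """
    parents = list(commit.parents)
    if len(parents) <= 2:
        return [commit]
    chain = []
    left = parents[0]
    for i, right in enumerate(parents[1:], start=1):
        is_last = (i == len(parents) - 1)
        name = commit.name if is_last else "%s_%d" % (commit.name, i)
        chain.append(Commit(name, commit.tree, (left, right)))
        left = name
    return chain


def join_octopus(chain):
    """Inverse of split_octopus(): rebuild the single octopus commit."""
    final = chain[-1]
    if chain[0].tree != final.tree:
        raise ValueError("not a fake merge chain")
    parents = list(chain[0].parents)
    for prev, fake in zip(chain, chain[1:]):
        # Each fake merge must share the final tree and have the previous
        # fake merge as its first parent.
        if fake.tree != final.tree or fake.parents[0] != prev.name:
            raise ValueError("not a fake merge chain")
        parents.append(fake.parents[1])
    return Commit(final.name, final.tree, tuple(parents))
```

For a commit A with four parents, split_octopus() yields A_1 (P1, P2), A_2 (A_1, P3) and finally A (A_2, P4), and join_octopus() restores the original parent order. A real implementation would additionally need some marker to tell deliberately created chains from ordinary merges that happen to share a tree.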

Interfacing with Mercurial

  • There should be exception handlers dealing with errors inside the Mercurial code. At the very least, we could by default suppress any stack traces, as they make it hard to spot the actual error message.

  • Cloning a test repository via https may result in tons of warnings about certificate verification problems, depending on whether web.cacerts has been set up or not. Here is an example with a repository from bitbucket:

    $ git clone hg::https://bitbucket.org/fingolfin/foobar
    Cloning into 'foobar'...
    warning: bitbucket.org certificate with fingerprint 24:9c:45:8b:9c:aa:ba:55:4e:01:6d:58:ff:e4:28:7d:2a:14:ae:3b not verified (check hostfingerprints or web.cacerts config setting)
    warning: bitbucket.org certificate with fingerprint 24:9c:45:8b:9c:aa:ba:55:4e:01:6d:58:ff:e4:28:7d:2a:14:ae:3b not verified (check hostfingerprints or web.cacerts config setting)
    ...

First observation: This warning should only be shown once and, ideally, be handled like similar warnings elsewhere (e.g. by prompting the user whether to reject, temporarily accept or permanently accept this cert).

Secondly, those warnings should not be there in the first place; in particular, when directly cloning the repository via the "hg" command, they do not show up. What happens here is that when running "hg", the "web.cacerts" config value is set, but when running git-remote-hg, it is not set. The reason for this still needs to be analyzed. (Extra info: When installing Mercurial via Fink on Mac OS X, web.cacerts is set in "/sw/etc/mercurial/hgrc", but for some reason, this setting is ignored when running git-remote-hg).

  • Pushing to an https repository (e.g. to https://bitbucket.org/fingolfin/foobar) that requires authentication causes trouble; in particular, an uncaught exception with 40+ lines of backtrace, and this error:
    mercurial.error.Abort: http authorization required

This is another example where user interaction may be necessary. Moreover, cloning the same repository via "hg clone" works, as does a subsequent "hg push", which may not even ask for authentication when the mercurial-keyring extension is installed. So another potential task would be to look into what it would take to benefit from mercurial-keyring when using git-remote-hg; or alternatively, whether (and how) one might use the git-credential system to the same effect.
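
As a rough illustration of the exception handling suggested in this section, the following sketch wraps an operation so that an abort is reported as a single line instead of a long backtrace. It is an assumption-laden example, not existing git-remote-hg code: the Abort class is a stand-in for mercurial.error.Abort so that the snippet runs without Mercurial installed, and run_quietly() and its debug parameter are hypothetical names.

```python
import sys


class Abort(Exception):
    """Stand-in for mercurial.error.Abort (assumption, for illustration)."""


def run_quietly(action, stream=None, debug=False):
    """Run action(); on Abort, print one line instead of a backtrace.

    Returns an exit code: 0 on success, 1 if the action aborted.
    With debug=True the exception is re-raised so the full trace is shown.
    """
    if stream is None:
        stream = sys.stderr
    try:
        action()
    except Abort as exc:
        if debug:
            raise
        stream.write("fatal: %s\n" % exc)
        return 1
    return 0
```

A helper could call its push or fetch routine through such a wrapper and pass the returned value to sys.exit(); the full trace would then only appear when explicitly requested.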

Random thoughts, observations, etc.

  • git-remote-hg, local_repo(): The "local.prefix" member is never set. It is potentially used by the exporter object. If the exporter is never used, then we shouldn't create it, no? If it is used, then the prefix probably needs to be set...

  • git-remote-hg, get_repo(): What is the following substitution about? It seems to allow overriding the repo.is_local value, but that value is never used... Overall, this code seems to be intended to allow accessing local hg repositories while pretending they are remote. So perhaps it is a debugging left-over?

    if url.startswith("remote://"):
        remote = True
        url = "file://%s" % url[9:]  # 9 == len("remote://")
    ....
    repo.is_local = not remote and repo.local # FIXME: unused?

  • get_base_path() is the only thing in repo that is used by a lot of code in git_remote_helpers/hg that otherwise only needs repo.hgrepo. So perhaps we can do something more clever than passing the whole repo object to all of those objects...

  • Right now there is some kind of "fake method overloading" going on, with local_repo calling setup_local_repo and get_repo calling setup_repo. Perhaps this could be modelled in a more OOP-like way, for instance by adding a repository base class which all remote helpers can use (at least those implemented using git_remote_helpers/helper.py)...

  • In git-remote-testgit, the sanitize() function is duplicated: it appears in both git-remote-testgit.py and git_remote_helpers/git/repo.py.

  • Some relevant discussions on the git mailing list: