Improve hg-git's author/committer parsing to match what git expects #230

Closed
wants to merge 4 commits into from

2 participants

Ehsan Akhgari Augie Fackler
Ehsan Akhgari

I found out about this problem when converting Mozilla's hg repository to git.

Git parses an author/committer line by looking for the first less-than character, which it expects to start the email address. It then looks for the next greater-than character and expects everything after it to parse as a date.
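For illustration, the parsing rule described above can be modelled in a few lines of Python. This is a simplified sketch of the rule as stated here, not git's actual implementation; the function name `split_ident` and the sample identities are made up for the example.

```python
def split_ident(line):
    """Split an ident string such as 'Name <email> 123456000 +0000'.

    Returns (name, email, date), or None when no '<' ... '>' pair is found.
    """
    lt = line.find('<')            # first less-than starts the email
    if lt == -1:
        return None
    gt = line.find('>', lt + 1)    # next greater-than ends the email
    if gt == -1:
        return None
    name = line[:lt].strip()
    email = line[lt + 1:gt]
    date = line[gt + 1:].strip()   # everything after '>' is taken as the date
    return name, email, date


# A well-formed ident splits cleanly.
print(split_ident("Jane Doe <jane@example.com> 123456000 +0000"))

# No brackets at all: nothing to anchor the email on.
print(split_ident("Jane 123456000 +0000"))

# A name that itself contains brackets pushes the real email into the "date".
print(split_ident("<a@b.c> <none@none> 123456000 +0000"))
```

In the last case the model returns an empty name and a "date" of `<none@none> 123456000 +0000`, which is the kind of result a date parser would reject.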

hg-git's output doesn't always match that expectation. For example, for this revision hg.mozilla.org/mozilla-central/rev/a537a070dbf40081e1d32321924b6589b271574e, the author is "Ms2ger@gmail.com", which makes hg-git generate an author line like this:

author Ms2ger@gmail.com none@none 123456000 +0000

Git fails to parse that line. Another example is this revision http://hg.mozilla.org/mozilla-central/rev/e88d2327e25d600ce326615f682db1d79d2bb10e, where there is no space between the username and the email, which creates an author line like this:

author Ms2ger 123456000 +0000

And you can see how that would confuse git!

With the fixes in this pull request, hg-git generates commit objects that git can actually parse. I managed to convert the entire hg history of mozilla-central to git with these patches.

Ehsan Akhgari

ehsan@04a37b4 also fixes another instance of this problem, for commits like https://hg.mozilla.org/mozilla-central/rev/e751acb410d0

Ehsan Akhgari ehsan closed this
Ehsan Akhgari ehsan reopened this
Augie Fackler
Collaborator

Can you add some tests for this? I'm hesitant to pull this without corresponding tests. I'm also skeptical of the correctness of get_valid_git_username_email() - couldn't that fail if the username was something like " foo@example.com " or some other broken thing?
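A quick check along the lines of this concern might look like the sketch below. It copies the one-line expression from the patch into a standalone function so it runs on its own. The bracketed sample inputs and the `strip_brackets_strict` helper are assumptions made for illustration, not part of the pull request.

```python
def get_valid_git_username_email(name):
    # Same expression as the method added in this pull request,
    # written as a plain function so it can run without hg-git.
    return name.lstrip('<').rstrip('>')


def strip_brackets_strict(name):
    # Hypothetical alternative (not from the patch): drop every
    # angle bracket and trim surrounding whitespace.
    return name.replace('<', '').replace('>', '').strip()


samples = ['<foo@example.com>', ' <foo@example.com> ', 'foo <bar> baz']
for s in samples:
    print(repr(s),
          '->', repr(get_valid_git_username_email(s)),
          '| strict:', repr(strip_brackets_strict(s)))
```

Because `lstrip('<')` and `rstrip('>')` only remove brackets at the very ends of the string, surrounding whitespace or interior brackets leave the input unchanged, so brackets can still reach the generated author line.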

When you're ready, I'd greatly prefer patches mailed to the hg-git Google Group - it's easier for me to review and apply them there than on a pull request here.

Thanks!

Augie Fackler durin42 closed this

Showing 4 unique commits by 1 author.

Aug 11, 2011
Ehsan Akhgari Sanitize the author username and email address to make sure that git is not confused by brackets around those names
04a37b4
Ehsan Akhgari Make get_valid_git_username_email a proper method on the GitHandler object
4c86b43
Aug 15, 2011
Ehsan Akhgari Make the space between the username and email address in hg username parsing code optional, to handle cases like 'User<user@somewhere.org>'
bb86917
Aug 23, 2011
Ehsan Akhgari Treat the trailing bracket after the hg author name as optional.
This causes stuff like
https://hg.mozilla.org/mozilla-central/rev/e751acb410d0 to be parsed
correctly.
6b72297

Showing 1 changed file with 7 additions and 4 deletions.

--- a/hggit/git_handler.py
+++ b/hggit/git_handler.py
@@ -337,12 +337,15 @@ def export_hg_commit(self, rev):
         self.swap_out_encoding(oldenc)
         return commit.id
 
+    def get_valid_git_username_email(self, name):
+        return name.lstrip('<').rstrip('>')
+
     def get_git_author(self, ctx):
         # hg authors might not have emails
         author = ctx.user()
 
         # check for git author pattern compliance
-        regex = re.compile('^(.*?) \<(.*?)\>(.*)$')
+        regex = re.compile('^(.*?) ?\<(.*?)\>?(.*)$')
         a = regex.match(author)
 
         if a:
@@ -350,11 +353,11 @@ def get_git_author(self, ctx):
             email = a.group(2)
             if len(a.group(3)) > 0:
                 name += ' ext:(' + urllib.quote(a.group(3)) + ')'
-            author = name + ' <' + email + '>'
+            author = self.get_valid_git_username_email(name) + ' <' + self.get_valid_git_username_email(email) + '>'
         elif '@' in author:
-            author = author + ' <' + author + '>'
+            author = self.get_valid_git_username_email(author) + ' <' + self.get_valid_git_username_email(author) + '>'
         else:
-            author = author + ' <none@none>'
+            author = self.get_valid_git_username_email(author) + ' <none@none>'
 
         if 'author' in ctx.extra():
             author = "".join(apply_delta(author, ctx.extra()['author']))
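The regex change in this diff can be exercised in isolation. The sketch below runs the removed and added patterns under Python 3's `re` module (the patch itself targets Python 2, but these patterns behave the same way there as far as I know). The results suggest that making the closing bracket optional interacts with the lazy `(.*?)` group: the email group can come back empty, with the address ending up in the third group. `ALT` is a hypothetical alternative using a negated character class; it is not part of the pull request, and the sample author strings are made up.

```python
import re

OLD = re.compile(r'^(.*?) \<(.*?)\>(.*)$')     # pattern removed by the diff
NEW = re.compile(r'^(.*?) ?\<(.*?)\>?(.*)$')   # pattern added by the diff
# Hypothetical alternative (an assumption, not from the patch): the email
# group consumes everything up to the first '>' instead of matching lazily.
ALT = re.compile(r'^(.*?) ?<([^>]*)>?(.*)$')


def groups(pattern, author):
    """Return the (name, email, rest) groups, or None if there is no match."""
    m = pattern.match(author)
    return m.groups() if m else None


authors = [
    'User <user@somewhere.org>',   # conventional form
    'User<user@somewhere.org>',    # no space before '<'
    'User <user@somewhere.org',    # missing trailing '>'
]
for author in authors:
    for label, pattern in (('OLD', OLD), ('NEW', NEW), ('ALT', ALT)):
        print(label, repr(author), '->', groups(pattern, author))
```

For the conventional form, `OLD` yields `('User', 'user@somewhere.org', '')`, while `NEW` yields `('User', '', 'user@somewhere.org>')`. A test suite of the kind requested in the review would be expected to surface this.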