Fatal error thrown by felipec-git on fast-import of Hg assets #56

Closed
slessman opened this issue Jan 15, 2014 · 13 comments

Comments

@slessman

I am using this rev of felipec-git:

93d24ea
branch 'fc/master'

Our Hg -> Git sync process is now failing with the error:

fatal: mark :547140 not declared
fast-import: dumping crash report to /var/git/the_repo.git/fast_import_crash_31937

From the E-mailed error reports I get, it looks to me like a series of events occurred during sequential sync operations that corrupted an import, which then corrupted my repository.

The first notification with error report and crash log: [https://gist.github.com/shaunlessman/f75bc2d9e80ad80e0f6e]

The second notification, no error, no crash: [https://gist.github.com/shaunlessman/002222814d8691685737]

The third notification, no error, no crash: [https://gist.github.com/shaunlessman/c3a6469cd141b73aa112]

The fourth notification with error report and crash log: [https://gist.github.com/shaunlessman/34113470d5f3d8c47a90]

The fifth notification with error report and crash log: [https://gist.github.com/shaunlessman/04580a867c55a8ccd08c]

The sixth notification with error report and crash log: [https://gist.github.com/shaunlessman/97f8f10568a2032953a5]

Most notifications and crash logs after the 6th look like a minor variation of this: [https://gist.github.com/shaunlessman/ea58c856ea1e73e839b8]

The only thing that seems to change in the later notifications is the number reported in 'fatal: mark :547256 not declared', which goes up by around 60.

I'm guessing the problem occurs on an import of the refs/hg/origin/branches/PROJECT-3118-Update branch; that seems to be a common theme in the crash logs as well.

EDIT: The sync script I use is [https://gist.github.com/shaunlessman/7a2841bcac7e318d2f91]; it runs every 15 minutes.

@slessman
Author

I recovered from this problem by:

  • Moving my my_repo.git repo out of the way
  • Closing the problem PROJECT-3118-Update branch in my Hg repository
  • Re-cloning the my_repo.git repo from my Hg repo
  • Replacing my config file in the newly-cloned repo with the old one that was moved out of the way
  • Running git fetch --all to get all remote refs and re-pulling my active remote branches

So far, no issues. It doesn't fix the problem but it lets me work normally until it occurs again.

@rpearl

rpearl commented Feb 14, 2014

I am getting the same issue. Please let me know if there is any additional information that I can provide about the state of the repository that would help with a fix.

@slessman
Author

@rpearl I haven't had this problem occur since I closed the problem branch in Mercurial and re-cloned. I have no confidence that it won't happen again but at least recovery from the problem seems to be fairly straightforward. Well, as long as the problem branch is not one of your main development lines anyway. :)

@fingolfin

This seems to be the same issue as dusty-phillips/gitifyhg#88 -- i.e. a git garbage collection (gc) will remove objects that are referenced by the marks file, because for some of them there simply is no git ref referencing them.

It's kind of nasty :-/

@fingolfin

Actually, I take that back (sorry for the confusion): this issue is different, though possibly related. In the bug I linked, the mark is present but the SHA-1 it refers to is not known to git due to gc. In the issue here, the mark table somehow got truncated. I just experienced a similar situation. First, I did a pull from a hg remote, and got this error: fatal: object not found: fe5fd3da2cbd210213816c90fe6afba6bd093ea9

At this point, I might have fixed it by carefully hand-editing the marks files. Unfortunately, I ran another git pull right away and then got this error: fatal: mark :20574 not declared

When I looked at the marks files, I noticed two things: marks-hg listed mark :20574, referencing a hg commit SHA-1. But the last mark in marks-git was :19089, so it looks as if it was truncated. My last pull would have imported maybe 10 commits, so that alone can't account for the discrepancy.

A manual check discovered a git commit matching the hg commit corresponding to mark :20574 (finding it was easy because that commit happened to be the head of a hg branch).
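The comparison described above (marks present in marks-hg but never declared in marks-git) could be sketched roughly as follows. This is only an illustration, not the logic of any tool from this thread: it assumes marks-git uses the fast-import line format `:<mark> <sha1>`, and that marks-hg is a JSON file whose "marks" key maps hg changeset hashes to mark numbers. That JSON layout is an assumption about the remote-hg version in use, and the function names are hypothetical.

```python
import json
import re

# One fast-import marks line: ":<number> <40 hex digits>"
MARK_LINE = re.compile(r"^:(\d+) [0-9a-f]{40}$")


def git_marks(text):
    """Return the set of mark numbers declared in marks-git content."""
    marks = set()
    for line in text.splitlines():
        match = MARK_LINE.match(line)
        if match:
            marks.add(int(match.group(1)))
    return marks


def hg_marks(text):
    """Return the set of mark numbers in marks-hg content.

    Assumes a JSON object with a "marks" key (hg hash -> mark number).
    """
    data = json.loads(text)
    return set(data.get("marks", {}).values())


def undeclared(git_text, hg_text):
    """Marks that marks-hg knows about but marks-git never declared."""
    return sorted(hg_marks(hg_text) - git_marks(git_text))
```

Feeding it the contents of hg/origin/marks-git and hg/origin/marks-hg would list the marks that fast-import is likely to reject as "not declared".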

@felipec
Owner

felipec commented Apr 12, 2014

EDIT: Disregard this comment, it was before I had a better idea of your issue.

I'm trying to understand what is happening here, but it's not clear from those reports. What I would do if I were you is modify the script to do some checks before and after you do the git push.

First, I would create back ups of the marks-git and marks-hg files before running the git push command.

If a crash occurs, save both the back up and the current marks in a directory and post them here to do some forensic analysis.

Finally, restore the back up marks, and everything should keep working.

If all else fails, remove the marks files, and run git fetch again.

They are located in "$repo/hg/origin/marks-*".

My guess is that some git push commands are crashing at certain point, and then the marks are messed up, and then everything else fails.

@felipec
Owner

felipec commented Apr 12, 2014

All right. I've managed to analyze the output better, and the issue I see is that your script doesn't check the exit status of the commands, so it continues with the push even if the fetch failed. That's why there's so much output in the reports, which was confusing.
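As a rough illustration of checking exit statuses, a sync driver could stop at the first failing command instead of carrying on with the push. This is a minimal sketch, not the reporter's script; the helper name run_steps is hypothetical.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each command in order and stop at the first one that fails.

    Returns the exit status of the failing command, or 0 if all succeeded.
    """
    for cmd in steps:
        status = subprocess.run(cmd).returncode
        if status != 0:
            print("stopping: %r exited with %d" % (cmd, status), file=sys.stderr)
            return status
    return 0


# Example call (commands taken from the sync script discussed in this thread):
# run_steps([
#     ["git", "fetch", "--quiet", "origin"],
#     ["git", "push", "--quiet", "origin", "branches/the_main_branch"],
# ])
```

With this shape, a failed fetch means the push is never attempted, so each report contains only the first error.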

Now, in the first crash the failure is that the mark '544029' cannot be found by git fast-import, which presumably means it's missing from the 'marks-git' file. Why that is happening is very important, but I will come back to that.

In the 4th one fast-import can't find mark '544213', which comes after '544029', and it's trying to update the branch PROJECT-3118-Update, specifically the notes. The 5th one is trying to do exactly the same. The 6th one is trying to update many branches (including PROJECT-3118-Update), but fails after trying to find '544029', the same one as before. The 7th is trying '544235', again after '544029', and again trying to update PROJECT-3118-Update.

The conclusion so far is that the commit marks of PROJECT-3118-Update are somehow screwed up, so if there's any change to that branch, the fetch fails.

What is interesting is the manner in which they are screwed up: it seems the commits are marked correctly in marks-hg, but not in marks-git, which means that at some point there was a crash in git fetch and git fast-import didn't update marks-git, but git remote-hg did update marks-hg. I cannot see how this could happen in recent versions of remote-hg, which leads me to believe that you were running an old version when the crash happened, and everything has been screwed up since.

Another interesting bit is this:

fatal: corrupt mark line: :330340 912c63cfcbc91

Which means that something went really, really wrong, and the marks-git file was truncated. However, if that was truly the case, nothing would ever work again. And from the report it seems the next push did actually succeed, so maybe somebody was in the middle of writing it, and there's something weird with that machine's file system. But the previous git fetch failed, so the file shouldn't have been updated.

Maybe somebody was using remote-hg on that repository at the same time the script was running.
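A truncated line like the one in that error can be spotted without running fast-import at all. The sketch below assumes the marks-git file uses the fast-import line format `:<mark> <sha1>` with 40-digit SHA-1 names; the function name corrupt_lines is hypothetical.

```python
import re

# A well-formed marks line: ":<number> <40 hex digits>"
MARK_LINE = re.compile(r":\d+ [0-9a-f]{40}")


def corrupt_lines(text):
    """Return (line_number, line) pairs that are not ':<mark> <sha1>'."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not MARK_LINE.fullmatch(line):
            bad.append((number, line))
    return bad
```

Running it on the contents of hg/origin/marks-git before and after each sync would show whether, and when, the file gets cut short.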

Either way, your marks files are screwed up, and it doesn't really make sense to try to analyze the errors you get until after you fix them. The easiest way is to simply remove the "$repo/hg/origin/marks-*" files, which would cause the next git fetch to take a long time, since it would have to basically import everything again. However, I can probably write a script that would synchronize the files manually if you really want to avoid that (some, maybe a lot, of commits would need to be re-imported regardless).

If after the mark files are fixed you get them desynchronized again, then there's definitely a problem worth looking into, but until that happens my money is on the fact that you were running an old version of the script.

@felipec
Owner

felipec commented Apr 12, 2014

I've written a script that should synchronize marks-hg with marks-git, and report if there are issues or not (back up "marks-hg" just in case).

http://pastie.org/9074335

Also, I've checked the code more thoroughly, and now I see that a crash in git fetch could trigger the issue that you see, even in the newer versions of the script.

Maybe this patch would solve the issue for you:

--- a/git-remote-hg.py
+++ b/git-remote-hg.py
@@ -1255,12 +1255,10 @@ def main(args):
             die('unhandled command: %s' % line)
         sys.stdout.flush()

+    marks.store()
+
 def bye():
-    if not marks:
-        return
-    if not is_tmp:
-        marks.store()
-    else:
+    if is_tmp:
         shutil.rmtree(dirname)

 atexit.register(bye)

Unfortunately this might cause issues if git push crashes, which could be even worse.

I'll spend some time trying to reproduce crashes and their subsequent issues to test the code thoroughly, but my guess is that Git core would need to be updated, it's not enough to fix the remote helper script.

@felipec
Owner

felipec commented Apr 12, 2014

Ok, I've done a bunch of tests and I've narrowed down the issues.

First of all, you need to fix the marks. The easiest is to simply remove $repo/hg/origin/marks-*, and run git fetch, that will have to re-import the repository. Alternatively, you can use the tool I wrote: git-marks-check. If you run it like git marks-check origin -f -v it will fix the marks files, and then you need to run git fetch again, but only a smaller subset of changes would need to be re-imported.

https://gist.github.com/felipec/10551806

That should get you up and running. However, the issue might appear again; to avoid the synchronization problem you need to do two things:

  1. You need the change I suggested in a previous comment, I've already pushed the change, so if you use the absolute latest you would be fine. Or you can apply it manually:

bd830d4

  2. You need a fixed git push. I've pushed the changes to my master branch, but I suppose you are using upstream Git, and if so, you need to manually restore the marks-git file if git push fails.

Something like this perhaps:

my_push () {
    cp hg/origin/marks-git{,.bak} &&
    git --bare push "$@" ||
    mv hg/origin/marks-git{.bak,}
}

my_push --quiet origin branches/the_dev_branch_2
my_push --quiet origin branches/the_main_branch
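One thing to keep in mind with the shell helper above is that when the push fails and the restore succeeds, the function returns the status of mv, so the caller cannot tell that the push failed. A variant that restores the file and still reports the push status might look like the following sketch. It is only an illustration of the same backup-and-restore idea, not code from the thread; the function name push_with_marks_backup is hypothetical.

```python
import os
import shutil
import subprocess


def push_with_marks_backup(push_cmd, marks_path):
    """Run push_cmd; if it fails, put the saved copy of marks_path back.

    Returns the exit status of push_cmd so the caller can react to a failure.
    """
    backup = marks_path + ".bak"
    shutil.copy2(marks_path, backup)
    status = subprocess.run(push_cmd).returncode
    if status != 0:
        # The push failed: discard whatever was written and restore the copy.
        os.replace(backup, marks_path)
    else:
        # The push succeeded: the backup is no longer needed.
        os.remove(backup)
    return status
```

A call such as push_with_marks_backup(["git", "push", "--quiet", "origin", "branches/the_main_branch"], "hg/origin/marks-git") would mirror the my_push usage shown above.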

Finally, you don't need to specify --bare on every operation, you can tell Git it's a bare clone with git config core.bare true, or even better at clone time with git clone --bare.

So my suggested script would be:

#!/bin/bash

export PATH=$PATH:/usr/local/bin

cd /var/git/the_repo.git

my_push () {
    cp hg/origin/marks-git{,.bak} &&
    git push "$@" ||
    mv hg/origin/marks-git{.bak,}
}

# Get the entire repository from the remote
git fetch --quiet origin

# Push branches we're working on back to the remote repo
my_push --quiet origin branches/the_dev_branch_2
my_push --quiet origin branches/the_main_branch

Additionally, you might want to run git marks-check origin at the end of the script to make sure everything is OK.

@fingolfin

I have not followed the recent changes discussed here, so the following might be obsolete. But: With the version of git-remote-hg in git.git's next branch, the following is an easy way to trigger a fast-import crash which then garbles the marks file:


# Create a multi-head repository
hg init hgrepo
cd hgrepo
echo a > a && hg add a && hg commit -m a
echo b > b && hg add b && hg commit -m b
hg update 0
echo c > c && hg add c && hg commit -m c
cd ..

# Clone it via remote-hg
git clone "hg::hgrepo" gitrepo

cd gitrepo
git gc --prune=now
git pull

The problem here is that some commits are not reachable from any user-visible ref, hence can be pruned, which then leads to "gaps" in the marks tables. There are more ways to trigger this, but the above (a hg branch with multiple heads) seemed to me to be the simplest way to trigger the issue.
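To check whether a repository already has such gaps, one could ask git whether every object named in marks-git still exists. The sketch below assumes the `:<mark> <sha1>` line format for marks-git and relies on `git cat-file --batch-check`, which prints `<name> missing` for objects it cannot find. The function names are hypothetical.

```python
import subprocess


def parse_marks(text):
    """Map sha1 -> mark number for each well-formed ':<mark> <sha1>' line."""
    shas = {}
    for line in text.splitlines():
        mark, _, sha = line.partition(" ")
        if mark.startswith(":") and mark[1:].isdigit() and sha:
            shas[sha] = int(mark[1:])
    return shas


def missing_from_batch_check(output):
    """Extract object names that `git cat-file --batch-check` reported missing."""
    return [line.split()[0] for line in output.splitlines()
            if line.endswith(" missing")]


def pruned_marks(marks_text, git_dir):
    """Return marks whose objects are no longer in the repository at git_dir."""
    shas = parse_marks(marks_text)
    if not shas:
        return []
    result = subprocess.run(
        ["git", "--git-dir=" + git_dir, "cat-file", "--batch-check"],
        input="\n".join(shas) + "\n",
        capture_output=True, text=True, check=True)
    return sorted(shas[sha] for sha in missing_from_batch_check(result.stdout))
```

Any mark returned by pruned_marks points at an object that has been pruned, which is the situation the reproduction above creates.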

@slessman
Author

I've been using 93d24ea branch fc/master and not upstream git during and since this issue was reported. I haven't had the issue again since we closed the problem PROJECT-3118-Update hg branch (bookmark, whatever), so I don't think I have much to contribute as far as ensuring the problem is fixed. I'll definitely update and build/install your latest master. Is there anything else I can do to assist with reproducing or ensuring the latest patch fixes this issue?

@felipec
Owner

felipec commented Apr 20, 2014

@fingolfin Yeah, I see the same problem with git-fc, but I think that's a separate issue.

@felipec
Owner

felipec commented Apr 20, 2014

@shaunlessman The best you can do is just keep running the sync and, if a problem happens, report it back here. However, if you don't ensure the marks are correct (either with my script or by removing the files), any report won't be useful, so please do that.

I'll mark this as closed as I think this particular issue is fixed, reopen if you have problems.

felipec closed this as completed Apr 20, 2014

4 participants