ipfs daemon hangs when MFS root is not available locally #7183
If you don't start your daemon, and run

I had checked beforehand that the daemon wasn't running, using `ps` and `grep`.
@Stebalien is there a way to replace the MFS root? I have no knowledge of it, but I could hack something together. It's not the first time we've seen this.
We should refuse to start, but we shouldn't hang. This error indicates that the repo has been corrupted somehow (or maybe wasn't initialized properly?). We should:
It complains about the missing block. Is it possible that it somehow remembered the last hash of the MFS root even though it wasn't able to properly store the corresponding block? (Like I said, I was running out of disk space for some time just before I started having trouble with ipfs.)
Do you enable garbage collection?
It complained that the disk was full during an `ipfs add`. When I ran the gc, it removed some blocks.

Most of my data (at least 60 GB out of 70+ GB) was in MFS. The size of the repo folder is still more than 70 GB, so my data is still there, but since I lost the MFS root block I'm unable to access it.

It's not the first time I've ended up with a problem with MFS. During the last month I hit several MFS corruptions (https://discuss.ipfs.io/t/update-root-cid). I believe it was because I was using the

The root CID is a quite sensitive piece of information. Maybe keeping track of some older root CIDs would allow a rollback later on (ideally they could be stored outside the datastore, in a human-readable rotating file, and the gc would respect them).

I consider myself new to IPFS, but the way I see it, MFS is a layer on top of IPFS. Maybe a missing MFS root should not prevent the ipfs daemon from starting (in a kind of reduced mode) and performing basic add/pin operations. It's also maybe a feature that is not useful in some contexts, like on an ipfs-cluster server.
I wrote a piece of code that looks for the biggest tree in the repo. With it I finally found a DAG of 75 GB. The DAG contains the links (folders and files) I had at the MFS root. Now that I have a CID that looks valid, how can I change the MFS root CID that the ipfs daemon looks for?
@nlko, can you try https://github.com/hsanjuan/mfs-replace-root? Hopefully this allows you to move on with your stuff. Figuring out whether this was due to a GC-related race or simply a bad error path when writing to a full disk may take a while, and I understand resetting your repo is not an option.
It is important to do this, but from the user's point of view, it would be good to include a way to forcefully set the MFS root whenever something like this arises (as part of the standard tooling).
I had a similar problem. There should be a command-line action to set or reset the MFS root.
@hsanjuan thanks for the tool; I compiled it and it works. I was able to set the MFS root CID and retrieve the files, but it required the repo to be v9 and thus required the rc2 of ipfs 0.5. It will be very useful when ipfs 0.5 is released. I finally found another way to recover my repository (still a v7 repo), this way:
I still don't know why the reference to the MFS root was broken when the disk ran out of free space.
I hit this recently. Saved also by mfs-replace-root |
I packaged my IPFS folder in a state which hit this bug (probably) for debugging purposes: |
It might be useful to have a much smaller sample broken repo, for those who want to explore this but can't repro it. E.g. my node is only single-digit gigabytes, but I hit this so often that I've added an mfs-replace-root call to the systemd script to prevent it recurring.

Edit: fwiw, that means mine won't work as a sample, as I no longer see this issue :p
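For reference, that systemd workaround could be wired up with a drop-in override along these lines. This is a sketch: the script path is a hypothetical wrapper (the exact mfs-replace-root invocation depends on that tool's CLI), and the unit name assumes the daemon runs as `ipfs.service`:

```
# Hypothetical drop-in at /etc/systemd/system/ipfs.service.d/override.conf
[Service]
# reset-mfs-root.sh is a placeholder wrapper around mfs-replace-root;
# it runs before the daemon starts so a broken root never blocks startup.
ExecStartPre=/usr/local/bin/reset-mfs-root.sh
```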
@bqv I agree. It's easy to create a repo with a broken state (just set the MFS root to a bogus CID); the question is whether there are any forensics we can do that might help us with the problem. Does your script only replace on error? If so, maybe you can modify it to snapshot before replacing.
It doesn't; I now just treat MFS as volatile storage and let it be cleared on every reboot/restart.
Also having the same issue. Any updates? Thanks! |
From #6935: Currently, if we can't find the MFS root block, we hang on start trying to find it (possibly before the network is ready?). We shouldn't block startup on this. |
Well, MFS is kind of crucial for many applications, so I'm not sure what the alternative to hanging is. I mean, we shouldn't delete it in a GC run; but if we've lost the root and cannot find it on the network, there's nothing to continue with.

Sure, the GC issue should be fixed. But if something else seems to have deleted the root, I think the safest thing would be to stop accepting local commands but proceed to search for the root on the network (even though the chances of finding it there are pretty slim).
We're adding the "replace MFS root" as an IPFS command in #8648. Any feedback welcome. |
Adding to this, when running
Version information:
Description:
Hello,
When I start the daemon it now hangs after displaying this (I tried this several times and also restarted the computer):
The execution hangs there forever. I tried some ipfs commands, which failed, returning:
Error: cannot acquire lock: Lock FcntlFlock of ~/.ipfs/repo.lock failed: resource temporarily unavailable
After an hour, if I press Ctrl+C, it stops and displays:
Here is how I managed to reach this point: I ran some `ipfs add` commands. They failed, stating that the disk was full according to a .LOG file (which it wasn't anymore; more than 180 GB were available).

Any idea how I can at least recover the content of the repo?