ipfs daemon hangs when MFS root is not available locally #7183
If you don't start your daemon, and run

I had checked beforehand that the daemon wasn't running, using `ps` and `grep`.
@Stebalien is there a way to replace the MFS root? I have no knowledge of it, but I could hack something together. It's not the first time we've seen this.
We should refuse to start, but we shouldn't hang. This error indicates that the repo has been corrupted somehow (or maybe wasn't initialized properly?). We should:
It complains about the missing block. Is it possible that it somehow remembered the last hash of the MFS root even though it wasn't able to properly store the corresponding block? (Like I said, I was running out of disk space for some time just before I started having trouble with ipfs.)
Do you enable garbage collection?
It complained that the disk was full during an `ipfs add`. When I ran the gc, it removed some blocks.

Most of my data (at least 60 GB out of 70+ GB) was in MFS. The size of the repo folder is still more than 70 GB, so my data is still there, but since I lost the MFS root block I'm unable to access it.

It's not the first time I've ended up with a problem with MFS. During the last month I hit several MFS corruptions (https://discuss.ipfs.io/t/update-root-cid). I believe it was because I was using the

The root CID is a quite sensitive piece of information. Maybe keeping track of some older root CIDs would allow a rollback later on (ideally they could be stored outside the datastore, in a human-readable rotating file, and the gc would respect them).

I consider myself new to IPFS, but the way I see it, MFS is a layer on top of IPFS. Maybe a missing MFS root should not prevent the ipfs daemon from starting (in a kind of reduced mode) and performing basic add/pin operations. It's also maybe a feature that is not useful in some contexts, like on an ipfs-cluster server.
I wrote a piece of code that looks for the biggest tree in the repo. With it I finally found a DAG of 75 GB. The DAG contains the links (folders and files) I had at the MFS root. Now that I have a CID that looks valid, how can I change the MFS root CID that the ipfs daemon looks for?
@nlko, can you try https://github.com/hsanjuan/mfs-replace-root? Hopefully this allows you to move on with your stuff. Figuring out whether this was due to a GC-related race or simply a bad error path when writing to a full disk may take a while, and I understand resetting your repo is not an option.
It is important to do this, but from the user's point of view, it would be good to include a way to forcefully set the MFS root whenever something like this arises (as part of the standard tooling).
I had a similar problem. There should be a command-line action to set or reset the MFS root.
@hsanjuan thanks for the tool; I compiled it and it works. I was able to set the MFS root CID and retrieve the files, but it required the repo to be v9 and thus required the rc2 of ipfs 0.5. It will be very useful when ipfs 0.5 is released. I finally found another way to recover my repository (still a v7 repo), this way:
I still don't know why the reference to the MFS root was broken when the disk ran out of free space.
I hit this recently. Saved also by mfs-replace-root |
I packaged my IPFS folder in a state which hit this bug (probably) for debugging purposes: |
It might be useful to have a much smaller sample broken repo, for those who want to explore this but can't repro it. E.g. my node is only single-digit gigabytes, but I hit this so often that I've added an mfs-replace-root call to the systemd script to prevent it recurring.

Edit: fwiw, that means mine won't work as a sample, as I no longer see this issue :p
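For reference, that systemd workaround could be wired up with a drop-in override along these lines. This is a sketch: the script path is a hypothetical wrapper (the exact mfs-replace-root invocation depends on that tool's CLI), and the unit name assumes the daemon runs as `ipfs.service`:

```
# Hypothetical drop-in at /etc/systemd/system/ipfs.service.d/override.conf
[Service]
# reset-mfs-root.sh is a placeholder wrapper around mfs-replace-root;
# it runs before the daemon starts so a broken root never blocks startup.
ExecStartPre=/usr/local/bin/reset-mfs-root.sh
```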
@bqv I agree. It's easy to create a repo with a broken state (just set the MFS root to a bogus CID); the question is whether there are any forensics we can do that might help us with the problem. Does your script only replace on error? If so, maybe you can modify it to snapshot before replacing.
It doesn't; I now just treat MFS as volatile storage and let it be cleared on every reboot/restart.
Also having the same issue. Any updates? Thanks! |
From #6935: Currently, if we can't find the MFS root block, we hang on start trying to find it (possibly before the network is ready?). We shouldn't block startup on this. |
Well, MFS is kind of crucial for many applications, so I'm not sure what the alternative to hanging is. I mean, we shouldn't delete it in a GC run; but if we've lost the root and cannot find it on the network, there's nothing to continue with.

Sure, the GC issue should be fixed. But if something else seems to have deleted the root, I think the safest thing would be to stop accepting local commands but proceed to search for the root on the network (even though the chances of finding it there are pretty slim).
We're adding the "replace MFS root" as an IPFS command in #8648. Any feedback welcome. |
Adding to this, when running
Version information:
Description:
Hello,
When I start the daemon it now hangs after displaying this (I tried this several times and also restarted the computer):
The execution hangs there forever. I tried some ipfs commands, which failed, returning:
Error: cannot acquire lock: Lock FcntlFlock of ~/.ipfs/repo.lock failed: resource temporarily unavailable
After an hour, if I press Ctrl+C, it stops and displays:
Here is how I managed to reach this point: I ran some `ipfs add` commands. They failed, stating that the disk was full according to a .LOG file (which it wasn't anymore; more than 180 GB were available).

Any idea how I can at least recover the content of the repo?