Garbage collector deletes data stored in the MFS (which was pinned) #7008

RubenKelevra · 2020-03-17T23:13:40Z

Version information:

go-ipfs version: 0.4.23-6ce9a355f
Repo version: 7
System version: amd64/linux
Golang version: go1.14

Description:

I'm using IPFS in a script that updates the local MFS as needed. New files are added with ipfs files cp /ipfs/<cid> /path/to/file after ipfs-cluster-ctl has added them to the cluster.

So the files are pinned locally (by the cluster service) and also stored in the MFS.

Files that should be deleted are removed from the MFS, and I use ipfs-cluster-ctl to add an expiry timeout of 14 days to their pins.

Since I had started adding a lot of files to the repo, I decided to let the garbage collector deal with old data and clean up the repo.

After the garbage collector finished, I can no longer get the hashes or the content of some files stored in the MFS. This is unexpected and should not happen (as far as I understand).

After a fresh daemon start, ipfs files ls /path/to/file/ | grep "filename" shows that the directory still contains the file. After running files stat --hash on the file, the directory can no longer be listed until the daemon is restarted.

$ ipfs files stat --hash --timeout 120s /path/to/a/file.img
Error: Post "http://127.0.0.1:5001/api/v0/files/stat?...&hash=true&stream-channels=true&timeout=120s": context deadline exceeded

ipfs-cluster-ctl shows me the CID and that it's allocated on the local node (and pinned).

ipfs dht findprovs <CID> (the CID taken from ipfs-cluster-ctl) returns no results, which explains why I cannot access the file anymore.

ipfs pin ls --timeout=120s /ipfs/<CID> results in a timeout.

ipfs repo verify reports a successful integrity check of the repo.

IPFS/IPFS-Cluster stores the blocks and the databases on a ZFS filesystem, which reports no integrity errors.


RubenKelevra · 2020-03-18T09:04:51Z

After a fresh start of the ipfs-daemon I cannot remove the one file I identified so far from the MFS.

$ ipfs files rm /path/to/file.bin does not return

I'm trying to recover by simply adding all files to the IPFS repo again (with pin=0). Hopefully only blocks are missing and the metadata is not corrupt.

RubenKelevra · 2020-03-18T13:12:34Z

So the issue is 'just' missing blocks, which also lead to requests that can never be fulfilled, such as files stat --hash on a file with missing blocks, or a files rm that never completes.

After adding all files again without pinning, I could remove the affected file, and I found 3 other files whose blocks were also missing. I re-added those from a backup and could continue.

So the GC does not seem to be safe to use while anything is happening in the MFS. What worried me most is that the file was both in the MFS and pinned. Since all the files were pinned, I don't see how this could have happened in the first place. Maybe ipfs-cluster-service unpins and immediately re-pins a file when I add a timeout to its pin with ipfs-cluster-ctl pin add --expire-in, and the file got removed during the short window while it was unpinned.

This still doesn't explain why a file that is in the MFS can lose its blocks while the GC is running.
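The unpin/re-pin hypothesis above can be illustrated with a toy model in Go. This is a sketch under stated assumptions, not go-ipfs or ipfs-cluster code: the `node` type, `gcLock`, `updateExpiry`, and the placeholder CID are all hypothetical, and whether ipfs-cluster actually unpins before re-pinning is exactly what is unconfirmed here. The model covers only the pin half of the question and deliberately leaves out the MFS root, so it cannot explain why the MFS reference did not protect the blocks. The GC pass is forced into the unpin window through a callback rather than real goroutines, so the outcome is deterministic.

```go
package main

import (
	"fmt"
	"sync"
)

// node is a hypothetical, heavily simplified IPFS node: a blockstore plus a
// pin set. The MFS root is intentionally not modeled.
type node struct {
	gcLock sync.Mutex      // hypothetical lock shared by GC and pin updates
	blocks map[string]bool // CID -> block present locally
	pins   map[string]bool // CIDs currently pinned
}

func newNode(cid string) *node {
	return &node{
		blocks: map[string]bool{cid: true},
		pins:   map[string]bool{cid: true},
	}
}

// tryGC runs one mark-and-sweep pass unless a pin update holds the lock.
// sync.Mutex.TryLock requires Go 1.18 or newer.
func (n *node) tryGC() bool {
	if !n.gcLock.TryLock() {
		return false // an update is in progress, so this GC pass is skipped
	}
	defer n.gcLock.Unlock()
	for cid := range n.blocks {
		if !n.pins[cid] { // mark: only pinned CIDs count as live
			delete(n.blocks, cid) // sweep: everything else is removed
		}
	}
	return true
}

// updateExpiry re-pins cid the way the hypothesis assumes: unpin first, then
// pin again. gcDuringWindow stands in for a GC pass starting inside that window.
func (n *node) updateExpiry(cid string, holdLock bool, gcDuringWindow func()) {
	if holdLock {
		n.gcLock.Lock()
		defer n.gcLock.Unlock()
	}
	delete(n.pins, cid)
	gcDuringWindow()
	n.pins[cid] = true
}

// scenario reports whether the block survives an expiry update that races
// with GC, followed by one more regular GC pass.
func scenario(holdLock bool) bool {
	const cid = "bafy-example" // placeholder string, not a real CID
	n := newNode(cid)
	n.updateExpiry(cid, holdLock, func() { n.tryGC() })
	n.tryGC()
	return n.blocks[cid]
}

func main() {
	fmt.Println("block survives without lock:", scenario(false))
	fmt.Println("block survives with lock:", scenario(true))
}
```

In the unlocked run, the CID ends up pinned again while its block is gone, which would match the symptom reported at the top of the thread (the cluster lists the CID as pinned, but nobody provides it). Holding one lock across the whole update, and across MFS changes, is the kind of fix the "missing lock" remark below points at. That is an assumption about the cause, not a confirmed diagnosis.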

ribasushi · 2020-03-18T13:21:52Z

This sounds like a missing lock somewhere. The team is in overdrive right now trying to get #6776 out the door, so a response might be delayed by a week or two.
Sorry about that!

RubenKelevra · 2020-03-18T18:42:52Z

@ribasushi I don't expect this to be prioritized, since it's probably just a race condition, maybe one that only happens in my setup and similar ones.

But I think it should be reviewed once the first RC is out, just to make sure it's not a widespread issue. :)

I commented several times to document my recovery efforts and capture as much information about this event as possible, not to bump the issue.

Some thoughts on this topic:

There was no error, warning, or info message when this happened, nor afterwards while access was impossible.

I'm wondering how files stat --hash can be affected by missing data, since a plain files ls can still list the content of the folder. I think stat with --hash is trying to read too much data: it should only need to access the directory listing and return the hash.

I'm also not sure why files rm has to fail if the element is missing. I think this could be optimized as well, so that removing an entry doesn't require access to the data behind its CID when the user asks to remove it. GC would remove the CID and any remaining blocks anyway, since they are no longer referenced. Or am I missing something? 🤔
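The point about files stat --hash and files rm can be illustrated with a toy model of a directory node. It assumes, as in UnixFS, that a directory's entries carry each child's name and CID. The Go types and helpers below (`Link`, `Dir`, `blockstore`, `hashFromParent`, `loadChild`, `removeLink`) are hypothetical and do not describe how go-ipfs's MFS actually implements stat or rm. Answering "what is the hash of this entry?" or dropping an entry only touches the parent's link list. Anything that needs the child's own data has to fetch its block, and that is what hangs when the block is gone and no provider exists.

```go
package main

import "fmt"

// Link is a simplified stand-in for a UnixFS directory entry.
type Link struct {
	Name  string
	Cid   string
	Tsize uint64 // cumulative size as recorded in the parent
}

// Dir is a simplified directory node: just its list of links.
type Dir struct{ Links []Link }

// blockstore is a hypothetical local block store; a missing key stands in
// for a block that GC removed.
type blockstore map[string][]byte

// hashFromParent answers "what is the CID of name?" from the parent's links
// alone, without touching the child's blocks.
func hashFromParent(d Dir, name string) (string, bool) {
	for _, l := range d.Links {
		if l.Name == name {
			return l.Cid, true
		}
	}
	return "", false
}

// loadChild needs the child block itself. A real node would wait for the
// network until a timeout; this toy just reports the miss.
func loadChild(bs blockstore, d Dir, name string) ([]byte, error) {
	c, ok := hashFromParent(d, name)
	if !ok {
		return nil, fmt.Errorf("%s: no such entry", name)
	}
	data, ok := bs[c]
	if !ok {
		return nil, fmt.Errorf("block %s not available locally", c)
	}
	return data, nil
}

// removeLink drops an entry using only the parent's links. A real MFS would
// also re-hash the parent and its ancestors; that step is omitted here.
func removeLink(d Dir, name string) (Dir, bool) {
	out := Dir{}
	found := false
	for _, l := range d.Links {
		if l.Name == name {
			found = true
			continue
		}
		out.Links = append(out.Links, l)
	}
	return out, found
}

func main() {
	bs := blockstore{} // the child's block was garbage-collected
	parent := Dir{Links: []Link{{Name: "file.img", Cid: "bafy-child", Tsize: 1024}}}

	c, _ := hashFromParent(parent, "file.img")
	fmt.Println("hash from parent:", c)

	_, err := loadChild(bs, parent, "file.img")
	fmt.Println("full stat:", err)

	updated, ok := removeLink(parent, "file.img")
	fmt.Println("removed:", ok, "remaining entries:", len(updated.Links))
}
```

The CID strings are placeholders, and the toy computes no hashes. It only shows which operations depend on the child block being present locally.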

RubenKelevra · 2020-04-05T15:53:08Z

I can confirm this bug for this version as well:

go-ipfs version: 0.5.0-dev-6c45f9ed9
Repo version: 9
System version: amd64/linux
Golang version: go1.13.8

After each GC run, I basically have to stop my scripts and add the data back to the repo with pin=0 to make sure everything is still available to IPFS :/

schomatis · 2021-12-23T16:27:43Z

Probably related to #6113.

RubenKelevra added the kind/bug A bug in existing code (including security flaws) label Mar 17, 2020

RubenKelevra mentioned this issue May 4, 2020

extremly slow ingress of data via ipfs-cluster-ctl without --local ipfs-cluster/ipfs-cluster#1111

Open

RubenKelevra mentioned this issue Jun 16, 2021

ipfs freezes (memleak?) #8195

Closed

schomatis self-assigned this Dec 23, 2021

schomatis mentioned this issue Jan 21, 2022

GC and MFS are not safe #6113

Open

BigLep added this to Backlog in Maintenance Priorities - Go Mar 10, 2022
