
Possible inconsistency of new data, with the "--local" flag #1067

Open
RubenKelevra opened this issue Apr 14, 2020 · 5 comments
Labels
kind/bug A bug in existing code (including security flaws) need/review Needs a review

Comments

@RubenKelevra
Collaborator

Additional information:

  • OS: Linux
  • IPFS Cluster version: 0.12.1
  • Installation method: dist.ipfs.io

Describe the bug:

When using ipfs add with the --local flag, data is pushed only to the local ipfs daemon, for faster completion of the operation, instead of being pushed to remote nodes.

But this data doesn't seem to be protected from a garbage-collection run. So if the garbage collector runs while data is being added, there's a high chance that the data will be removed before it gets completely replicated in the cluster.

The lower the bandwidth and the higher the amount of data, the higher the chance that this will occur.

We need to keep the locally added data pinned until at least $minimum-redundancy peers report a successful pin operation on it. In the case of $minimum-redundancy=-1 it's probably wise to wait for 50% of the allocated peers to have it pinned.

We can also clean up the state if the peer adding the data is itself in the allocation list and has successfully completed the pin operation (without unpinning in that case).
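The rule suggested above could be sketched as a small helper. This is only an illustration of the proposed policy, not part of ipfs-cluster's actual API; the function name and parameters are hypothetical:

```go
package main

import "fmt"

// safeToUnpinLocal reports whether the locally added copy may be released
// to garbage collection: enough allocated peers must have confirmed their
// pins. minRedundancy == -1 means "pin everywhere"; in that case we wait
// for at least half of the allocations, as suggested above.
// (Hypothetical helper, not part of ipfs-cluster.)
func safeToUnpinLocal(confirmed, allocations, minRedundancy int) bool {
	required := minRedundancy
	if minRedundancy == -1 {
		// wait for at least 50% of the allocated peers
		required = (allocations + 1) / 2
	}
	return confirmed >= required
}

func main() {
	fmt.Println(safeToUnpinLocal(1, 5, 2))  // false: 1 of 2 required confirmations
	fmt.Println(safeToUnpinLocal(2, 5, 2))  // true
	fmt.Println(safeToUnpinLocal(3, 6, -1)) // true: 3 is 50% of 6 allocations
}
```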

@RubenKelevra RubenKelevra added kind/bug A bug in existing code (including security flaws) need/review Needs a review labels Apr 14, 2020
@hsanjuan
Collaborator

IPFS does not offer a way to do a transaction or to disable GC. We cannot pin until we have finished adding, so GC could run before that as well.

The solution here is simple, though it depends on the user: do not let IPFS run GC automatically while you are adding things.
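Concretely, the workaround looks something like this: run the daemon without automatic GC and trigger collection manually only when no adds are in flight (`ipfs repo gc` is go-ipfs's manual GC command):

```shell
# Do NOT pass --enable-gc, so the daemon never collects on its own:
ipfs daemon &

# ... add content through the cluster ...

# Later, once all pins have been replicated, reclaim space manually:
ipfs repo gc
```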

@RubenKelevra
Collaborator Author

Well, when GC is enabled to run automatically (--enable-gc), there's no way to block it, right?

And since GC runs when the storage is full, that is practically always the case when you are adding new content, so the chances are high that data gets lost that way.

IPFS does not offer a way to do a transaction or disable GC. We cannot pin until we are finished adding, and thus GC could run before that as well.

How exactly does ipfs add handle this case? This should be a transaction, right?

Can't we just use ipfs add and then unpin once we get a status update that enough other peers have pinned it?
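That workflow could be approximated from the outside today; a rough sketch, assuming that `ipfs-cluster-ctl status <cid>` prints one `PINNED` entry per peer that holds the pin (output format may vary across versions, and the CID and threshold here are placeholders):

```shell
cid=<your-cid>   # placeholder: the CID that was added with --local
want=2           # placeholder: desired replication before unpinning locally

# Poll cluster status until enough peers report PINNED:
until [ "$(ipfs-cluster-ctl status "$cid" | grep -c PINNED)" -ge "$want" ]; do
  sleep 10
done

# Now it should be safe to drop the local IPFS pin:
ipfs pin rm "$cid"
```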

@hsanjuan
Collaborator

Can't we just use ipfs add and then unpin

Cluster does the chunking on its own and then does block-puts on ipfs (the reason is that we wanted to be able to shard the chunks). So yeah, it's an option, but not an overly cheap one in terms of dev time. And the workaround of disabling GC and running it manually exists.

@RubenKelevra
Collaborator Author

And the workaround of disabling gc and running gc manually exists.

Yeah, I agree, that's a good temporary solution. The issue is more that it's not documented anywhere that this problem exists in the first place.

Maybe we should refuse to start if the --enable-gc flag is set on the IPFS daemon, and explain the current limitation?

I fixed my setup like 4-5 times because of that 😕

@hsanjuan
Collaborator

It's not documented anywhere that this issue exists in the first place.

I have added a mention in the documentation (setup docs).
