Skip to content
This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

localstore: fix index leak when NN chunks enter localstore #1940

Merged
merged 8 commits into from
Nov 14, 2019

Conversation

acud
Copy link
Member

@acud acud commented Nov 11, 2019

This PR fixes an index leak that was manifesting when chunks that would enter the node's local storage would, under certain connectivity conditions be not added to the GC index.

This is due to the fact that the uploader node would be the closest node to the chunk, thus again, under certain connectivity condition this would result in the chunk never syncing, and the retrievalDataIndex eventually growing out of bound, since no matter how many chunks in lower bins would be garbage collected, the higher bins would always make the store size eventually overflow.

This fix introduces a pluggable function to localstore that is called when a chunk enters the local storage.

During implementation several weaknesses were unraveled:

  1. a node with very little storage could GC the content uploaded before it is pull synced
  2. as a result, anonymous content syncing on light nodes could be severely damaged
  3. a flag indicating that the node is actively waiting for an upload to sync should probably be added to Swarm in order to short-circuit this mechanism for ephemeral nodes

fixes #1920

@acud acud added the bug label Nov 11, 2019
@acud acud requested review from janos, zelig and nolash November 11, 2019 08:20
@acud acud self-assigned this Nov 11, 2019
@acud acud added this to Backlog in Swarm Core - Sprint planning via automation Nov 11, 2019
@acud acud moved this from Backlog to In progress in Swarm Core - Sprint planning Nov 11, 2019
@acud acud force-pushed the fix-index-leak-maybe-hopefully branch 3 times, most recently from def1d03 to ee99c08 Compare November 11, 2019 08:44
@janos
Copy link
Member

janos commented Nov 11, 2019

The code cannot be compiled:

storage/localstore/mode_put_test.go:325:19: cannot use func literal (type func(chunk.Address) bool) as type func([]byte) bool in field value

a chunk needs be et after being Put into the database
@acud acud force-pushed the fix-index-leak-maybe-hopefully branch from ee99c08 to e10b1e4 Compare November 11, 2019 10:42
@acud acud moved this from In progress to In review (includes Documentation) in Swarm Core - Sprint planning Nov 11, 2019
storage/localstore/localstore.go Outdated Show resolved Hide resolved
storage/localstore/mode_put.go Outdated Show resolved Hide resolved
sync_test.go Outdated Show resolved Hide resolved
storage/localstore/mode_put_test.go Outdated Show resolved Hide resolved
network/kademlia.go Outdated Show resolved Hide resolved
storage/localstore/localstore.go Outdated Show resolved Hide resolved
Copy link
Member

@janos janos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. 👍

Copy link
Contributor

@nolash nolash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see @zelig reviewed it while I had breakfast halfway. Anyway, curious about the negative bin id

Also curious about the description;

as a result, anonymous content syncing on light nodes could be severely damaged

what is anonmous content syncing on light nodes?!

under certain connectivity condition

I would like descriptions to be more precise, helps us puny humans that don't have context to understand better and do better review work.

storage/localstore/mode_put.go Outdated Show resolved Hide resolved
storage/localstore/mode_put.go Outdated Show resolved Hide resolved
@acud
Copy link
Member Author

acud commented Nov 13, 2019

@nolash the issue is as follows:
In the old syncer an upstream peer would tell the downstream peer on which bins to ask for chunks on (subscriptions). this would guarantee, for example, that the upstream peer would establish subscriptions on bins within depth, although that upstream peer might have the downstream peer outside of it's saturation depth. this would also always guarantee that all chunks will get synced from the downstream peer at some point or another (since subscriptions were established according to the downstream peer's perspective).

in the new stream this is not the case. upstream peers establish streams according to their own kademlia and regardless of the downstream peer's connectivity. while this is more resilient implementation (and less attack prone), it did not in its naive implementation cater for the fact that the downstream peer might be the closest node to certain chunks. the connectivity pattern that was manifesting was that the downstream peer appeared outside of saturation depth of other nodes, thus syncing of certain chunks was not bound to happen (since the downstream peer was the closest one and the upstream peers all had not been subscribed to that node's higher bins since it was outside of their depth). this therefore manifested in a localstore leak since some chunks were never Set therefore never added to gcIndex therefore never deleted.

@nolash
Copy link
Contributor

nolash commented Nov 13, 2019

@janos I think I understand more or less. Thanks.

@nolash
Copy link
Contributor

nolash commented Nov 13, 2019

Still not sure about the "what is anonmous content syncing on light nodes" though :)

@acud acud merged commit 70d4fc0 into master Nov 14, 2019
Swarm Core - Sprint planning automation moved this from In review (includes Documentation) to Done Nov 14, 2019
@acud acud deleted the fix-index-leak-maybe-hopefully branch November 14, 2019 12:37
@acud acud mentioned this pull request Nov 14, 2019
acud added a commit that referenced this pull request Nov 20, 2019
@acud acud added this to the 0.5.3 milestone Nov 25, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

problem with GC
4 participants