This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

Spam protection of Swarm through modifying GC rules #757

Open
nagydani opened this issue Jun 28, 2018 · 11 comments

Comments

@nagydani

Nodes get paid through Swap only when they serve a chunk upon a retrieval request. Thus, Swap incentivizes them to store profitable chunks, i.e. ones that can be expected to be requested soon.
Therefore it is in the node's best interest, and, as argued below, in that of the network, not to treat fresh uploads and syncs identically to retrieval requests when deciding which chunk to garbage-collect once storage is saturated.

One common method of keeping chunks ordered by expected profitability is to keep them in a queue, and move a chunk that has been requested to the head of that queue, whereas garbage collection is performed on the tail. With a suitable data structure, such as a doubly linked list, all aforementioned operations have O(1) complexity. Newly synced or uploaded chunks (the two are indistinguishable) can be inserted at the k=int(αn)th position where n is the size of the queue and α is a constant real parameter between 0 and 1. Note that while finding the kth element in a queue is an O(k) complexity operation, tracking the kth element requires O(1) operations at each update of the queue.

Even an arbitrarily powerful DDoS attack flooding the network with bogus chunks cannot force a guaranteed fraction (namely α) of the most popularly requested chunks out of Swarm. Yet, new uploads are not immediately garbage collected, thus a simple flooding attack won't make uploads impossible either, especially if the uploader is willing to pay the Swap price of moving their content to the front of the queue.

Of course, the latter option is available to the DDoS attacker as well, but it is not free. The larger the Swarm and the more storage space the nodes have, the more expensive it becomes to effectively DDoS it, and a large part of the costs gets directly transferred from the attacker to honest nodes.

@cobordism

this makes sense to me.

The only other enhancement I'd suggest is that (if it can be done) retrieval requests that originate on the local node do not move chunks to the top of the queue either.
My thinking is that if I download a 20GB movie through swarm on my local node, I don't want to have my entire chunk store (of proximate chunks) flushed out because a 20GB retrieval request passed through my node. There is no reason to think that these chunks will be profitable in future, because the retrieval requests were not sent to me due to my location in the network, nor due to the chunks' popularity.

On the other hand, if I use a dapp often, I would want the dapp data cached locally...

In the end I think we may have to explore keeping a separate chunk queue for locally requested chunks. (i.e. any chunk requested as a result of an HTTP request enters the local chunk queue and any chunk received over bzz enters the network chunk queue).
Does that make sense?

@cobordism

or as we just discussed in the standup, we can treat chunks that were requested locally through HTTP requests similarly to synced chunks. That is, we don't insert them at the top of the chunk queue but at some α (either the same α as for syncing or a different one).
This prevents "local binge watching of swarmflix" from completely flushing out the local swarm cache of popular/proximate chunks.
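This variant can be captured by making the insertion position depend on the chunk's origin. A hedged sketch: the origin names and α values below are illustrative, since the issue leaves the exact constants (and whether local requests share the syncing α) open:

```go
package main

import "fmt"

// ChunkOrigin distinguishes how a chunk arrived at the node.
// These names are illustrative, not taken from the Swarm codebase.
type ChunkOrigin int

const (
	RemoteRequest    ChunkOrigin = iota // retrieval request from a peer
	LocalRequest                        // fetched via the local HTTP interface
	SyncedOrUploaded                    // received through syncing or upload
)

// insertPos returns the queue index at which a chunk of the given origin
// is inserted, for current queue length n. Remote retrieval requests go
// to the head; local requests and synced/uploaded chunks enter at their
// respective α marks (made-up values here).
func insertPos(origin ChunkOrigin, n int) int {
	const (
		alphaSync  = 0.25 // synced/uploaded chunks
		alphaLocal = 0.5  // locally requested chunks (could equal alphaSync)
	)
	switch origin {
	case RemoteRequest:
		return 0 // head of the queue
	case LocalRequest:
		return int(alphaLocal * float64(n))
	default:
		return int(alphaSync * float64(n))
	}
}

func main() {
	fmt.Println(insertPos(RemoteRequest, 1000))    // head
	fmt.Println(insertPos(LocalRequest, 1000))     // at the local mark
	fmt.Println(insertPos(SyncedOrUploaded, 1000)) // at the sync mark
}
```

With this, a 20GB local download enters the queue well behind the most popular chunks, so it cannot flush them out, while still being cached for repeated local use.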

@holisticode
Contributor

holisticode commented Jul 1, 2018

It is not clear to me why we first say

and move a chunk that has been requested to the head of that queue,

but then later:

Newly synced or uploaded chunks (the two are indistinguishable) can be inserted at the k=int(αn)th position where n is the size of the queue and α is a constant real parameter between 0 and 1.

In this context it's not clear why *α* is needed at all, where it comes from and what determines it (why between 0 and 1? What's the difference between a value of 0.2 and 0.8?).

Does it mean that a constant number of chunks will never be evicted, i.e. the actual "configurable" capacity is smaller than the actual capacity? More clarity please, be aware that there are math morons like me :)

So the queue would look like:

     |   R1  |  R2  |  R3 | ... |  Rα  |  SU1  |  SU2 |  SU3 | ... | SUc |

where the Rn are requested chunks and the SUn, up to the queue capacity c, are synced/uploaded chunks?

@holisticode
Contributor

It sounds like the following:

if the uploader is willing to pay the Swap price of moving their content to the front of the queue.

isn't the usual insurance (or is it?), but rather a new concept: a premium paid for GC priority. This is interesting, but if so, it should be explicitly emphasized as a new feature.

@holisticode
Contributor

@homotopycolimit I think your

separate chunk queue for locally requested chunks.

proposal is very good. A local cache. It would also solve the "binge" swarmflix issue. But it probably requires more work.

@janos
Member

janos commented Jul 2, 2018

A single queue with one constant α for synced chunks and another constant α' for uploaded chunks, prioritized for GC relative to requested chunks, should be good enough. Keeping two or three queues would likely add more work to each GC run. The proposal is very good and it does make sense to have it implemented.

@nagydani
Author

nagydani commented Jul 2, 2018

Yes, I agree with the latter option. Should I amend the text of the ticket?

@cobordism

In this context it's not clear why α is needed at all, where it comes from and what determines it (why between 0 and 1? What's the difference between a value of 0.2 and 0.8?).

0 means the head of the queue and 1 means the end of the queue. Everything else means: somewhere in the middle.

Does it mean that a constant number of chunks will never be evicted?

No, because everything that is at the head of the queue can eventually move down as other chunks move to the front of the queue through retrieval requests. But it does mean that at any one time, there are chunks (the entire head of the queue up to alpha * n) that cannot be evicted through new syncing / uploads.
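As a concrete worked example of this (the numbers are illustrative, not proposed values):

```go
package main

import "fmt"

func main() {
	const alpha = 0.2
	n := 1000 // current queue length
	k := int(alpha * float64(n))
	// Synced/uploaded chunks enter at index k, so the k chunks ahead of
	// that mark can only be displaced by other retrieval requests moving
	// to the front, never by a flood of fresh uploads.
	fmt.Printf("queue length %d, sync insertion index %d: "+
		"the %d most recently requested chunks cannot be flushed by uploads\n",
		n, k, k)
}
```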

@holisticode
Contributor

It does mean though that both caches (up to alpha for RetrieveRequests and from alpha to the end for SyncRequests) become smaller.

I wonder if it makes sense to make alpha configurable for the user (a user may be running a node solely for profitability)

@cobordism

There still is only one cache and it stays the same size. The question is just, when the cache is full, what gets garbage collected first?

@cobordism

Our main site at theswarm.eth got GC'd. We need to implement these "anti-spam" rules soon.
