
Provide fewer nodes #6155

Open
eingenito opened this issue Mar 28, 2019 · 4 comments
Labels
topic/meta Topic meta topic/provider Topic provider

Comments


eingenito commented Mar 28, 2019

Problem

Providing is an expensive operation and even with significant DHT speedups is likely to remain so. The growth of the IPFS network has made this problem more acute. It would be beneficial if IPFS instances could choose to provide only a subset of all nodes without dramatically impacting the availability and retrieval speed of the data.

Notes

The Go-IPFS team began discussing this problem at our October 2018 London meetup.

  • @Kubuxu devised an algorithm and associated visualization that demonstrated the selection of good nodes for providing.
  • A requirement for all partial providing strategies is associated changes in bitswap (or whatever) that allow it to walk 'up' the dag from a node, searching for providers of ancestor nodes (since the current node may never have been provided).
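The ancestor walk described above could look something like the following sketch. The names (`findProvidedAncestor`, the plain-string CIDs, the in-memory maps) are hypothetical stand-ins for what would really be go-cid values and DHT lookups:

```go
package main

import "fmt"

// findProvidedAncestor walks 'up' a DAG from a node that has no provider
// records, returning the nearest ancestor that was provided. All names
// here are illustrative; real code would query the DHT per step.
func findProvidedAncestor(node string, parent map[string]string, provided map[string]bool) (string, bool) {
	cur := node
	for {
		if provided[cur] {
			return cur, true
		}
		p, ok := parent[cur]
		if !ok {
			return "", false // reached a root without finding a provider record
		}
		cur = p
	}
}

func main() {
	parent := map[string]string{"leaf": "mid", "mid": "root"}
	provided := map[string]bool{"root": true}
	anc, ok := findProvidedAncestor("leaf", parent, provided)
	fmt.Println(anc, ok) // root true
}
```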
@eingenito
Copy link
Contributor Author

eingenito commented Mar 28, 2019

Some conversation paraphrased from IRC:

@steb: I think the higher priority work will be the reverse link graph. We're going to run into problems with provider strategies unless we have some way to traverse back up a dag and find providers for parents.
@eingenito: I wonder if the reverse link graph needs to be persisted or if it can be transient and owned by a bitswap session? Note: I think that there are other features enabled by a persistent database of per-node metadata and this might be a good place to start.

@eingenito: I think Kuba's algorithm was maximally efficient but required analysis of an entire dag to select optimal nodes for providing? Is that right?
@Kubuxu: no, it was nowhere near maximum efficiency, but it required minimal information about the DAG. It just walked a DAG and made decisions depending on the current node (and a single number about the past state of the tree as we walked it).

@eingenito: Also, given how expensive providing is always likely to be, I wonder if it isn't more useful to assign each node a priority with respect to providing and then just provide in priority order, knowing that many will simply never get provided.


Edit (Kubuxu): s/@kuba/@Kubuxu/ (someone else has the handle kuba on GH.)


Kubuxu commented Mar 28, 2019

The core idea behind probabilistic providing is based on the fact that when someone has a given node, there is a good chance they also have its children. When we are missing a node, we can walk up the tree of nodes and try to resolve providers as we backtrack.

The essential part is having information about backlinks from the current node. This can be implemented in a number of ways:

  • backlink/graph DB (hard/costly)
  • tracking backlinks as we recurse a tree
  • storing root and path to the current node as we recurse a tree
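The second and third options can be combined in a traversal that simply carries the path from the root as it recurses, which is exactly the transient backlink information a bitswap session would need. A minimal sketch, with a hypothetical in-memory `Node` type standing in for IPLD nodes:

```go
package main

import "fmt"

// Node is a minimal in-memory DAG node (hypothetical; real nodes are
// IPLD nodes addressed by CID).
type Node struct {
	CID   string
	Links []*Node
}

// walk recurses the DAG, passing each visit callback the path of
// ancestor CIDs from the root down to (but not including) the current
// node. The path is copied per level so callbacks may retain it.
func walk(n *Node, path []string, visit func(n *Node, path []string)) {
	visit(n, path)
	child := append(append([]string(nil), path...), n.CID)
	for _, l := range n.Links {
		walk(l, child, visit)
	}
}

func main() {
	leaf := &Node{CID: "leaf"}
	root := &Node{CID: "root", Links: []*Node{{CID: "mid", Links: []*Node{leaf}}}}
	walk(root, nil, func(n *Node, path []string) {
		fmt.Println(n.CID, path)
	})
}
```

Tracking the path this way costs nothing to persist, at the price of only knowing backlinks for content the session itself walked.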

The probabilistic providing strategy has to optimize which nodes to provide depending on a number of factors:

  • A: target ratio of nodes we want to provide
  • B: maximum and average number of backlinks walked up to find a single provider
  • C: average number of backlinks walked up to find M of N providers

The reason I propose a probabilistic instead of a deterministic providing strategy is that in the former C is much smaller than B; this isn't true in the latter (while maintaining the same A).

I've also discovered a way to describe basic probabilistic providing strategies as Markov chains; I used it in my simulations (since Markov chains reduce to matrix multiplications). I will explain it if there is interest.

@eingenito

@Kubuxu thanks for the synopsis, it's very helpful. I have an observation, already mentioned above, but I want to restate it. Providing is so expensive, and the load on IPFS nodes so variable that I think it might always be important to model providing (and reproviding) as a best effort service. A probabilistic scheme for choosing good blocks to provide might be best used as a source of weights or priorities for all blocks so the re/provider system can appropriately order actual writing to the DHT. That way we can optimize providing under changing conditions and backfill lower priority provider records when conditions allow. Does that make sense?


Kubuxu commented Mar 30, 2019

It makes a lot of sense. Another thing is, we still have a lot of headway to improve our DHT.
