ipfs-content-providing.md

File metadata and controls

166 lines (120 loc) · 11.9 KB

Improve IPFS Content Providing

Authors: @aschmahmann

Initial PR: #31

Purpose & impact

Background & intent

Describe the desired state of the world after this project. Why does that matter?

Currently, go-ipfs users are able to use the public IPFS DHT to find out who has advertised a given CID in under 1.5 seconds in 95+% of cases. However, the process of putting those advertisements into the DHT is slow (e.g. on the order of a minute) and is a bottleneck for users trying to make their content discoverable. Users with moderate amounts of content on their nodes complain that their content is hard to find in the DHT as a result of their nodes' inability to advertise quickly enough. Additionally, some of the measures users could take to reduce the number of provider records they emit, such as reproviding only the roots of graphs (see reprovider strategies), are not generally recommended due to outstanding issues such as the inability to resume downloads of a DAG.

While R&D work on larger scale improvements to content routing is ongoing we can still take the opportunity now to make our existing system more usable and alleviate much of our users' existing pain with content routing.

After completion of this project, go-ipfs users with lots of data should be able to set up nodes that can put at least 100M records into the DHT per day. Additionally, users should be empowered to avoid advertising data that is not likely to be accessed independently (e.g. blocks that are part of a compressed file).

Assumptions & hypotheses

What must be true for this project to matter?

  • The IPFS public DHT content provider subsystem is insufficient for important users
  • The work is useful even though a more comprehensive solution will eventually be put forward, meaning either:
    • Users are not willing to wait, or ecosystem growth is throttled, until we build a more comprehensive content routing solution
    • The changes made here are either useful independent of major content routing changes, or the changes are able to inform or build towards a more comprehensive routing solution

User workflow example

How would a developer or user use this new capability?

Users of go-ipfs would be able to tell what percentage of their provider records have made it out to the network in a given interval, and would notice more of their content being discoverable via the IPFS public DHT. Additionally, users would have a number of configurable options available to them, both to modify the throughput of their provider record advertisements and to advertise fewer provider records (e.g. only advertising pin roots).

Impact

How directly important is the outcome to web3 dev stack product-market fit?

🔥🔥🔥 = 0-3 emoji rating

Probably the most visible primitive in the web3 dev stack is content addressing which allows someone to retrieve data via its CID no matter who has it. However, while content addressing allows a user to retrieve data from anyone it is still critical that there are systems in place that allow a user to find someone who has the data (i.e. content routing).

Executing well here would make it easier for users to utilize the IPFS public DHT, the most widely visible content routing solution in the IPFS space. This would dramatically improve usability and onboarding for new users, as well as the experience of existing users, likely leading to ecosystem growth.

Leverage

How much would nailing this project improve our knowledge and ability to execute future projects?

🎯🎯🎯 = 0-3 emoji rating

Many of the components of this proposal increase development velocity by either exposing more precise tooling for debugging or working with users, or by directly enabling future work.

Confidence

How sure are we that this impact would be realized? Label from this scale.

2. We don't have direct market research demonstrating that improving the resiliency of content routing will definitely lead to more people choosing IPFS or working with the stack. However, this is a pain point for many of our users (as noted on the IPFS Matrix, Discuss and GitHub) and something we have encountered as an issue experienced by various major ecosystem members (Protocol Labs infra, Pinata, Infura, etc.).

Project definition

Brief plan of attack

  • Enable downloading sub-DAGs when a user already has the root node, but is only advertising the root node
    • e.g. have Bitswap sessions know about the graph structure and walk up the graph to find providers when low on peers
  • Add a new command to go-ipfs (e.g. ipfs provide) that at minimum allows users to see how many of their total provider records have been published (or failed) in the last 24 hours
  • Add an option to go-libp2p-kad-dht for very large routing tables that are stored on disk and are periodically updated by scanning the network
  • Make IPFS public DHT puts take <3 seconds (i.e. come close to get performance)
    • Some techniques available include:
      • Decreasing DHT message timeouts to more reasonable levels
      • Not requiring the "followup" phase for puts
      • Not requiring responses from all 20 peers before returning to the user
      • Not requiring responses from the 3 closest peers before aborting the query (e.g. perhaps 5 of the closest 10)
  • Add a function to the DHT for batch providing (and putting) and utilize it in go-ipfs
    • Tests with libp2p/go-libp2p-kad-dht#709 showed tremendous speedups even in a single threaded provide loop if the provider records were sorted in XOR space

What does done look like?

What specific deliverables should be completed to consider this project done?

The project is done when users can see how much of their provide queue is complete, are able to allocate resources to increase their provide throughput until they are satisfied, and allocating those resources is either not prohibitively expensive or reducing the allocation is deemed not worth the effort.

What does success look like?

Success means impact. How will we know we did the right thing?

Success means that far fewer users report issues finding content; instead, things either just work for them, or they file issues or ask questions about how to decrease their resource usage for providing. Things should just work for users who have 10-100k provider records and leave their nodes on continuously.

Counterpoints & pre-mortem

Why might this project be lower impact than expected? How could this project fail to complete, or fail to be successful?

  • People have other issues that the DHT put performance is just masking, which means we will not immediately be able to see the impact from this project alone
  • Users will not want to spend the raw bandwidth of emitting their records even if lookups are instant
  • Decreasing the query put time is much harder than anticipated
  • Technical work required is harder than anticipated

Alternatives

How might this project’s intent be realized in other ways (other than this project proposal)? What other potential solutions can address the same need?

These alternatives are not mutually exclusive with this proposal.

  1. Focus on decreasing the number of provider records
    • e.g. Add more options for data reproviding, such as advertising only files and directories for UnixFS data
    • might be tricky UX and plumbing, but is something we likely will need to tackle eventually
  2. Focus on decreasing the frequency of reproviding records
    • e.g. Build a second routing layer where nodes are encouraged or required to have high availability (e.g. a federated routing layer or opt-in second DHT that tracks peer availability more rigorously)
    • has possibility for high payoff, although has more risk associated with it

Dependencies/prerequisites

  • None

Future opportunities

  • Making it easier to implement alternative #1 above (enabled by ipfs provide and being able to download sub-DAGs when only the root node is provided)
  • Vastly improved lookup performance of the delegated routers that can be used in js-ipfs (enabled by allowing users to have large routing tables)

Required resources

Effort estimate

L. There is some uncertainty in how much work will be required to increase put performance. However, all of the changes are client side, which makes them relatively easy to test. This estimate could be an overestimate, since some of the changes have uncertainty that is currently being estimated at the higher end (i.e. the work in go-ipfs and go-bitswap).

Roles / skills needed

  • 3-4x go-engineers
    • 1-2x go-ipfs experience
    • 1-2x go-libp2p (ideally go-libp2p-kad-dht) experience
  • Some input and support may be required from research