Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provider Subsystem + Strategies #5774

Open
29 of 44 tasks
michaelavila opened this issue Nov 13, 2018 · 14 comments
Open
29 of 44 tasks

Provider Subsystem + Strategies #5774

michaelavila opened this issue Nov 13, 2018 · 14 comments
Labels
topic/meta Topic meta topic/perf Performance topic/provider Topic provider

Comments

@michaelavila
Copy link
Contributor

michaelavila commented Nov 13, 2018

Current work is being tracked here: #5870

I'm creating this issue to expose how I'm approaching the provider subsystem and new provider strategies tasks. I will add info as I make progress. If you feel strongly about anything written here, please let me know so we can discuss.

There's a substantial implementation discussion in #5840

There are a handful of related issues: #3813 #4147 #4311
And PRs: #4333 #4113

if it's crossed out, it's completed

Provider Subsystem

Problem

The current method for deciding which blocks to provide is naive and results in performance problems when adding many blocks.

Solution

For a given dag of blocks provide only some subset of those blocks.

Steps

These are some very small steps I'm taking to progress towards the end goal. The first few are so small they wouldn't result in PRs on their own, but I want to give some insight into how I'm approaching the problem. The last few are so vaguely defined they're not actionable, my hope is that they become clearer as we make progress.

1. Remove bitswap provide

The purpose of this step is to remove providing from bitswap and to rely solely on reprovide. This is not optimal.

2. Introduce provider and provide all strategy, use post add

The purpose of this step is to provide each block that has been added using ipfs add using roughly the mechanism the reprovider uses. This is not optimal.

3. Make the provider robust

The purpose of this step is to modify provider so that it provides blocks as efficiently as go-bitswap. This is a little closer to optimal.

  • use something like the logic go-bitswap currently uses to send provide messages

I've deferred bringing the provider and reprovider in line with one another until the very end of all this work. There's a bit to think about that I just don't want to tackle yet. Like, how strategies should be articulated in code and to the user given that provide/reprovide logic is different (e.g. reprovide strategies operate over the entire blockstore), and bigger things like, when reproviding, how to account for blocks that were not provided initially so that they aren't reprovided.

4. Provide all blocks in the appropriate place in go-ipfs

additions and modifications

  • ipfs add
  • ipfs block put
  • ipfs dag put
    • deal with dag put not using core api currently
  • ipfs object new
  • ipfs object patch
    • set-data
    • rmlink
    • addlink
    • append data
  • ipfs object put
  • ipfs tar add
    • deal with tar not using core api currently
  • gatewayHandler
    • postHandler
    • putHandler
      • deal with put not using core api currently
    • deleteHandler
  • ipfs pubsub
    • I believe the only block that needs to be provided is the one created during ipfs pubsub sub <topic>

retrievals

  • ipfs get
    • deal with get not using core api currently
  • ipfs ls
    • deal with ls not using core api currently
  • ipfs cat
  • ipfs refs
    • deal with refs not using core api currently
  • ipfs block get
  • ipfs block stat
  • ipfs object get
  • ipfs object data
  • ipfs object diff
  • ipfs object links
  • ipfs object stat
  • ipfs dag get
    • deal with dag get not using core api currently
  • ipfs dag resolve
    • deal with dag resolve not using core api currently
  • Only provide when fetching data if the data comes from the network (e.g. ipfs ls a cid that you do not have in your local repo)

Can possibly merge at this point.

5. Change find providers to account for some cids not being provided

The purpose of this step is to modify the existing find providers system to work in the situation where the block being looked for is not provided. In the process of starting this work, we will need a strategy for providing only a subset in order to exercise the code that looks for providers. We can use a simplistic strategy like pin roots.

  • introduce simplistic strategy for not providing (announcing) all cids
  • do not get stuck in a loop finding and asking a provider for a provided cid when the non-provided cid you're looking for is not available from that provider

6-8. Introduce new, more efficient strategy

The purpose of this step is to introduce a providing strategy that more robustly solves our providing needs. This is optimal. This is likely many steps and will become clearer as we complete work and @Kubuxu completes his research/experiments. We will elaborate on this step in time.

Can possibly merge at this point.

9. Make the provider and reprovider consistent

The purpose of this step is to ensure we don't have inconsistencies between the provider and reprovider, e.g. a programmer not providing blocks for some operation, but those blocks being provided eventually by the reprovider anyway.

  • if something is not provided initially, don't reprovide it either
  • configure provide and reprovide strategies separately (yes?)
  • allow provide/reprovide to be configured with the same set of strategies (maybe?)

Branches

cc @keks @Kubuxu

@hannahhoward
Copy link
Contributor

This seems like a good approach because if we can get to step 3, we get a speed boost to all commands that provide blocks, independent of finding a better providing strategy. But also, once we come up with such a strategy, it will be easier to introduce.

@eingenito
Copy link
Contributor

I just want to understand 1 and 2; I don't think we can do a release of this work short of 3. Don't we have to be able to at least maintain the current behavior> So are 1 and 2 internal milestones on a longer lived branch on the way to a unified mechanism for re/providing? Or would they be turned with an experiment flag? That seems challenging given how complex the code might be at that intermediate point, but maybe that would actually be a good way to get to 3?

@michaelavila
Copy link
Contributor Author

@eingenito that's correct. That's why I said this "The first few are so small they wouldn't result in PRs on their own, but I want to give some insight into how I'm approaching the problem."

@eingenito
Copy link
Contributor

Totally make sense.

@magik6k
Copy link
Member

magik6k commented Nov 21, 2018

1. Remove bitswap provide

The purpose of this step is to remove providing from bitswap and to rely solely on reprovide. This is not optimal.

I assume you plan on al least improving reprovider? Last time I looked at it it was barely providing 1-few nodes per minute, which isn't really useful, and relying on it for all bitswap (incoming data) providing is probably a bad idea. We also want to provide relevant data quickly after it has been fetched to be able to spread load over more peers in case some piece of content becomes popular.

@michaelavila
Copy link
Contributor Author

@magik6k that's correct. I'll update this issue as I have discovered a few things and slightly changed direction since I last edited it. It might be more accurate now to say that the goal is to introduce an efficient provider system and then, in the future, refactor the reprovider to use the provider in some way.

@magik6k
Copy link
Member

magik6k commented Dec 3, 2018

I see there are some ticks in 4. Provide all blocks in the appropriate place in go-ipfs, do you have some code to look at that you could PR (even if deeply WIP)?

@michaelavila
Copy link
Contributor Author

@magik6k of course! Here: https://github.com/ipfs/go-ipfs/tree/experiment/provide%2Breprovide

Any and all feedback is welcome. What's there right now is intended to be the simplest change I can make to get the behavior I need up to this point. I've had some discussions about introducing some abstraction so that we don't have to explicitly call node.Provider.Provide as much, but I haven't spent much time determining what (and if) that abstraction should be.

@magik6k
Copy link
Member

magik6k commented Dec 3, 2018

This looks mostly good, but as noted on the golang sync, I'd prefer the providing logic to be implemented in coreapi. Ideally commands shouldn't have to access ipfs node (cmdenv.GetNode()) at all.

@michaelavila
Copy link
Contributor Author

@magik6k thank you for taking a look. I will go through and move this stuff to coreapi. If I have any questions I'll reach back out here and I'll add you to the PR when it goes up.

@michaelavila
Copy link
Contributor Author

michaelavila commented Dec 3, 2018

@magik6k question: what do you propose doing about commands which do not delegate out directly to CoreAPI, e.g. ipfs add which uses Unixfs().Add? Disregard, I see now that Unixfs is just part of CoreAPI.

@eingenito
Copy link
Contributor

I just wanted not to lose track of a comment that @Stebalien made in a PR because it's interesting context to some of this discussion: #5870 (comment)

@michaelavila
Copy link
Contributor Author

Note that there are some reasonable requests for the ipfs dht provide command, which is tangentially related to this work. I'd like to look into it as well

@melroy89
Copy link

melroy89 commented Jan 5, 2022

ipfs dht provide command/query is taking still way too long.

Any updates? Any new strategies? I only see a lot of cross issues that are linked.

EDIT: Is it possible to improve the speed using additional parameters maybe?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic/meta Topic meta topic/perf Performance topic/provider Topic provider
Projects
None yet
Development

No branches or pull requests

7 participants