Prioritise FastRetrieval candidates from indexer #39

Closed
rvagg opened this issue Jan 17, 2023 · 1 comment


rvagg commented Jan 17, 2023

From #35 (comment)

This has been discussed in the context of autoretrieve for some time now. I think these flags were originally all set to true, but it now appears Boost has this wired up so that the keep-unsealed-copy config is what sets the FastRetrieval metadata.

How:

  1. The Metadata field from the indexer results is base64 (padded) encoded varint-prefixed dag-cbor which can be decoded via https://github.com/ipni/index-provider/blob/19931fd5c692e5efd38093c1764cf0a3ca464af4/metadata/graphsync_filecoinv1.go#L66
  2. FastRetrieval = true entries should get priority in the queryCompare function for the prioritywaitqueue when we have multiple queries returned and waiting for attempts.

Questions:

  1. Should prioritisation be even higher than this? If candidate A is FastRetrieval=false and B is FastRetrieval=true and A returns its query first, should we hold off on attempting a retrieval and give B a chance to come back first?
  2. Perhaps we should shorten the first-byte timeout for FastRetrieval=false candidates under the assumption that they may be attempting an unseal?
  3. If we have >X candidates and some high percentage of them are FastRetrieval=true, perhaps we should filter out the others and only attempt the ones we assume have unsealed copies? e.g. 10 candidates for a CID, 7 of them are FastRetrieval=true, don't even bother attempting the other 3?

hannahhoward commented

Let's talk through this in more detail. As we start thinking about how we get the data the fastest, here are some thoughts:

  • I think we need to find a way to skip the query phase. My proposal is as follows:
    • simply propose free deals to each returned peer, one by one -- they'll tell us in their response if they don't accept free deals
    • prioritize the order by:
      • first, by peers we already have an existing libp2p connection to (since opening a new connection is a ~0.5s penalty)
      • second, by peers who advertise "fast retrieval" and/or "verified data" (since these are most likely to return free retrievals)
      • eventually, by tracking information about peers and prioritizing the most efficient ones (probabilistically)
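
The proposed ordering could be expressed as a three-level comparison, sketched below. The `Peer` type, its fields and the `Score` value are hypothetical names introduced for illustration, and the sketch ranks by score deterministically rather than probabilistically to keep it short.

```go
package main

import (
	"fmt"
	"sort"
)

// Peer is a hypothetical record of what we know about a candidate peer.
type Peer struct {
	ID            string
	Connected     bool    // an existing libp2p connection is open
	FastRetrieval bool    // advertises "fast retrieval"
	VerifiedDeal  bool    // advertises "verified data"
	Score         float64 // hypothetical tracked efficiency, higher is better
}

// before reports whether a should be attempted before b.
func before(a, b Peer) bool {
	// First: peers we are already connected to.
	if a.Connected != b.Connected {
		return a.Connected
	}
	// Second: peers advertising fast retrieval and/or verified data.
	aAdv := a.FastRetrieval || a.VerifiedDeal
	bAdv := b.FastRetrieval || b.VerifiedDeal
	if aAdv != bAdv {
		return aAdv
	}
	// Eventually: tracked efficiency.
	return a.Score > b.Score
}

// order sorts peers in place into attempt order.
func order(ps []Peer) {
	sort.SliceStable(ps, func(i, j int) bool { return before(ps[i], ps[j]) })
}

func main() {
	ps := []Peer{
		{ID: "a"},
		{ID: "b", FastRetrieval: true},
		{ID: "c", Connected: true},
		{ID: "d", Connected: true, VerifiedDeal: true, Score: 0.2},
		{ID: "e", Connected: true, FastRetrieval: true, Score: 0.9},
	}
	order(ps)
	for _, p := range ps {
		fmt.Print(p.ID, " ")
	}
	fmt.Println()
}
```

A probabilistic variant might instead pick the next peer at random with weights derived from `Score`, so that less-proven peers still get occasional attempts.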
