
Rfm 17.1 - Sharing Provider Records with Multiaddress #22

Merged · 17 commits merged into probe-lab:master on Jan 26, 2023

Conversation

@cortze (Contributor) commented Nov 11, 2022

This is the first draft of the report that extends RFM17 to measure whether the Multiaddresses of a content provider are being shared during the retrieval process of a CID.

It includes the study's motivation, the methodology we followed, a discussion of the results, and a conclusion.

All kinds of feedback are appreciated, so please go ahead and point out improvements!

Also, should I be running a more extensive set of CIDs over longer periods?

cc: @yiannisbot @guillaumemichel @dennis-tra

@yiannisbot (Member) left a comment

Great work! Some typos and a couple of clarification comments.

RFMs.md
#### Measurement Plan

- Spin up a node that generates random CIDs and publishes provider records.
- Periodically attempt to fetch the PR from the DHT, tracking whether they are retrievable and whether they are shared among the multiaddresses.
Member

What do we mean "shared among the multiaddresses"? Whether the PR can be found in the multiaddress of the original node that stored the record?

Member

As a third "Measurement Plan", how about just getting a bunch of PeerIDs and their multiaddresses and pinging them over time to see whether they listen to that multiaddress? The same as we do for PRs, but now for peer records.

Contributor Author

What do we mean "shared among the multiaddresses"? Whether the PR can be found in the multiaddress of the original node that stored the record?

In the networking layer, when we ask for the PR of a CID, the reply is just an AddrInfo for each provider that the remote peer is aware of. So the PR, as we understand it, is simply how we store it in the DHT.
This AddrInfo contains two fields: PeerID and Multiaddresses, and the Multiaddresses are only filled in if their TTL is still valid.
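Roughly, the reply is assembled like this (just a simplified sketch of the idea, not the actual go-libp2p-kad-dht code):

```go
package sketch

import (
	"github.com/libp2p/go-libp2p/core/peer"
	"github.com/libp2p/go-libp2p/core/peerstore"
)

// providerReply sketches how a DHT server builds its reply to a provider
// request: one AddrInfo per provider it knows for the CID, carrying only the
// multiaddresses whose TTL has not yet expired in its peerstore. Once the TTL
// runs out, Addrs comes back empty and the requester only learns the PeerID.
func providerReply(ps peerstore.Peerstore, providers []peer.ID) []peer.AddrInfo {
	infos := make([]peer.AddrInfo, 0, len(providers))
	for _, p := range providers {
		infos = append(infos, peer.AddrInfo{
			ID:    p,
			Addrs: ps.Addrs(p), // empty once the address TTL has expired
		})
	}
	return infos
}
```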

how about just getting a bunch of PeerIDs and their multiaddresses and pinging them over time to see whether they listen to that multiaddress?

That could be a nice side experiment, yes, although I think we are already doing it indirectly. In the hoarder, I keep the AddrInfo of each PR Holder with the first Multiaddresses that I got during the publication, and I only use those addresses to establish the connections. So if they were rotating IPs, I wouldn't be able to connect to them.

Let me know anyway if you want me to set up a specific test for IP rotation.

Member

Re: "shared among the multiaddresses": ok, so IIUC, we either mean if the PR is available in the multiaddresses of all the content providers, or if all the multiaddresses of all content providers are included in the PR. :) Is it any of these two?

Re: IP Rotation: that's great! But for this experiment we're keeping the connection open for 30mins to check the TTL, right? Can we run an experiment where we keep those connections open for a time period equal to the Expiry Interval? It would be 24hrs according to the current setting and 48hrs according to our proposal. Ideally, we'd also need to do that for a large number of peers.

@cortze (Contributor Author), Nov 15, 2022

we either mean if the PR is available in the multiaddresses of all the content providers, or if all the multiaddresses of all content providers are included in the PR. :) Is it any of these two?

It's the second one: we only get an AddrInfo for those providers that the remote peer is aware of, and whether the Multiaddresses are included in the provider's AddrInfo depends on their TTL. Should I phrase it the other way around, i.e. "the multiaddress is shared among the PRs"?
Let me point you to the code; maybe it's easier to understand:

  • here is the inner method used during the dht.FindProviders() call.
  • here is the networking method that asks a remote peer for the PRs.

Can we run an experiment where we keep those connections open for a time period equal to the Expiry Interval?

Absolutely! I can make a new run with 10k-20k CIDs over 60h, if that is enough.

Member

Rephrasing the first point to make it clearer. Regarding the extra experiment: that would be fantastic, yes!


Results are similar when we analyze the replies of the peers that report back the PR from the DHT lookup process. We increased the number of content providers we were looking for to track the multiple remote peers. Figure [3] represents the number of remote peers reporting the PR for the CIDs we were looking for, where we can see a stable 20 peers by median over the entire study.

For those wondering why more than 20 peers (k replication value when publishing the CIDs) are reporting the PR, we must remind you that Hydra-Boosters share the PR database among different `PeerID` heads. Which means that if one hydra hears about a PR, all the leaders of that common database will also share it.
Member

If that were the case, then we'd see about 2k holders, i.e., approximately the same as the number of Hydra heads in the network. Could it instead be other peers that fetch and reprovide the content?

Member

Suggested change
For those wondering why more than 20 peers (k replication value when publishing the CIDs) are reporting the PR, we must remind you that Hydra-Boosters share the PR database among different `PeerID` heads. Which means that if one hydra hears about a PR, all the leaders of that common database will also share it.
For those wondering why more than 20 peers (k replication value when publishing the CIDs) are reporting the PR, we must remind you that Hydra-Boosters share the PR database among different `PeerID` heads. Which means that if one hydra hears about a PR, all the heads of that common database will also share it.

Contributor Author

In the hoarder, I always check whether the PRs that are shared back match the PeerID of the content publisher (which, in this case, is my local host 1, the publisher). So if someone tried to reprovide or ping the CID, it wouldn't affect these results.

About the hydras, I'm not aware of how many hydra "bellies" are out there. Is there a single big one or multiple small ones? Also, we have to keep in mind that the DHT lookup converges into a region of the SHA256 hash space, so it's quite unlikely that we will get connections and replies from hydras in the opposite part of the hash space.
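The check is basically this (a sketch; `publisherID` stands for my local publisher host):

```go
package sketch

import "github.com/libp2p/go-libp2p/core/peer"

// filterByPublisher sketches the check described above: out of all the
// AddrInfos a remote peer reports, keep only the ones whose PeerID matches
// the original publisher, so reproviders or peers pinging the CID are ignored.
func filterByPublisher(reported []peer.AddrInfo, publisherID peer.ID) []peer.AddrInfo {
	matched := make([]peer.AddrInfo, 0, len(reported))
	for _, info := range reported {
		if info.ID == publisherID {
			matched = append(matched, info)
		}
	}
	return matched
}
```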

Member

Is there a single big one or multiple small ones?

Yup, there is a single one shared among all of them.

@yiannisbot (Member)

@cortze I've just done a thorough review of this - great work! My main worry is that the claim of "if a multiaddress is returned together with the PeerID for the TTL period (10mins or 30mins), then we can extend the TTL to the PR expiry interval" doesn't really hold. Why would we arrive at this conclusion?

The main argument for increasing the multiaddress TTL to the PR expiry interval would be to show that the multiaddress of the PR holder doesn't usually change. It would be great to have some experiments along the lines of the comment I inserted above: #22 (comment)

I'd love to hear your thoughts on this. Basically, similar to the CID Hoarder, what we need here is a PeerID Hoarder :-D This tool would get a lot of PeerIDs, record the multiaddress by which we first saw the peer and then periodically ping the peer to figure out if it changed its Multiaddress within the PR Expiry Interval. I'm not sure if this functionality can easily be included in Nebula @dennis-tra ? This is what would give us a solid justification to argue for the extension of the TTL.
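Something along these lines (just a sketch of the idea; the host, the ping interval and the reporting hook are placeholders, not part of any existing tool):

```go
package sketch

import (
	"context"
	"time"

	"github.com/libp2p/go-libp2p/core/host"
	"github.com/libp2p/go-libp2p/core/peer"
)

// trackPeerAddr sketches the "PeerID Hoarder" idea: remember the AddrInfo at
// which we first saw a peer and periodically try to connect to exactly those
// addresses, recording whether the peer is still reachable there within the
// PR Expiry Interval.
func trackPeerAddr(ctx context.Context, h host.Host, first peer.AddrInfo,
	interval time.Duration, report func(id peer.ID, reachable bool)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// Dial using only the addresses recorded at first sight; if the
			// peer has rotated its IP, this attempt should fail.
			err := h.Connect(ctx, first)
			report(first.ID, err == nil)
		}
	}
}
```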

Other thoughts?

Typos and rephrasings

Co-authored-by: Yiannis Psaras <52073247+yiannisbot@users.noreply.github.com>
@cortze (Contributor Author) commented Nov 15, 2022

Thanks for the feedback, @yiannisbot, I really appreciate it!

My main worry is that the claim of "if a multiaddress is returned together with the PeerID for the TTL period (10mins or 30mins), then we can extend the TTL to the PR expiry interval" doesn't really hold. Why would we arrive at this conclusion?

I will try to make it a bit more explicit in the conclusion (my bad). It's not an "it won't hold" statement. It is an "It won't have as much impact as we are expecting" statement.

As long as the network has different TTL values for Multiaddresses (as the current network does), the smallest TTL will be the one that negatively limits the final result of the DHT lookup process (at least in go-libp2p-kad-dht). So unless the largest part of the network updates to the new TTL, we will still face the same problem, and there will still be sporadic issues originating from those remaining "old" clients. (The double-hashing implementation would be a nice incentive to force a network-wide update.)

Basically, similar to the CID Hoarder, what we need here is a PeerID Hoarder :-D This tool would get a lot of PeerIDs, record the multiaddress by which we first saw the peer and then periodically ping the peer to figure out if it changed its Multiaddress within the PR Expiry Interval.

I left you a comment as well in #22 (comment).
I think we have a few options here. The hoarder already does this indirectly (it contacts the PR Holders at the Multiaddress that we stored while storing the PRs). Also, I think that Nebula already tracks IP rotation. We could have a deeper chat about this :)

I'll iterate again over your comments and suggestions, and will ping you back whenever I make a commit!

@dennis-tra (Contributor)

I'm not sure if this functionality can easily be included in Nebula @dennis-tra ?

Sorry for the late reply! The information is already recorded by Nebula and would just need to be analyzed :)

I'll iterate again over your comments and suggestions, and will ping you back whenever I make a commit!

Just ping here or in Discord and I'll also have a proper read. I just skimmed it in the past 🙈

@cortze (Contributor Author) commented Nov 21, 2022

I already added some explanations and most of the changes that @yiannisbot suggested. I set up another Hoarder run with 20k CIDs for 60 hours, so the plots and some numbers might change.

If you can go through it and give me some thoughts, @dennis-tra, I would appreciate the feedback as well 😄


@yiannisbot (Member) left a comment

Minor edit to address one of my previous comments.

@yiannisbot (Member)

The hoarder already does this indirectly (it contacts the PR Holders at the Multiaddress that we stored while storing the PRs). Also, I think that Nebula already tracks IP rotation.

Great that the Hoarder contacts the original Multiaddress! That's what we need. So if we run the experiment for long enough and monitor that, then we have what we're looking for.

This ^, together with an analysis of the Nebula logs, will tell us the rate of PR Holders that switch IP addresses over the republish interval. I think with those two, this will be complete and ready for merging.


_Figure 2: Number of PR Holders replying with the `PeerID` + `Multiaddress` combo._

### 4.2-Reply of peers reporting the PR during the DHT lookup
Contributor

I am not sure I understood this correctly: is the only difference between 4.1 and 4.2 that Hydras appear in 4.2 but not in 4.1?

Contributor Author

Since hydras are present in the set of PR holders, they appear in both 4.1 and 4.2.
However, since the DHT lookup wasn't stopped after the first retrieval of the PRs, I assume that most of the peers that report the PRs beyond those initial PR Holders are Hydras (because of their shared PR database).

Contributor

So what exactly is the performed operation? Is it only a FindProviders? And it may get more than 20 peers responding with the PR because some peers on the path to the CID are Hydra nodes?
As the number of hops in a DHT lookup is usually 3-5, we would expect at MOST 23-25 peers responding with a PR, if all of the peers helping to route the request (NOT PR holders) are Hydra nodes. According to the plot in 4.2 there are regularly many more than this number. How do you explain this?
Or maybe I missed something here ^^

Contributor Author

So what exactly is the performed operation? Is it only a FindProviders?

Yes, it's a modification of the FindProviders() method that doesn't look in the local Provider DB of the host, and that directly performs the DHT lookup.

And it may get more than 20 peers responding with the PR, because some peers on the path to the CID would be Hydra nodes?

Exactly, that is the explanation that I gave for this phenomenon.

As the number of hops in a DHT lookup is usually 3-5, we would expect at MOST 23-25 peers responding with a PR

Can you give a bit more context on this statement? My understanding from RFM 17 is that we perform between 3 and 6 hops, however, that only determines the depth of the peer tree that is built during the lookup. We are not taking into account that the tree can also grow in width.

Contributor

Can you give a bit more context on this statement? My understanding from RFM 17 is that we perform between 3 and 6 hops, however, that only determines the depth of the peer tree that is built during the lookup. We are not taking into account that the tree can also grow in width.

In Figure 3, we see that up to 60 peers respond with the PR during the DHT lookup. There are only 20 PR holders, and 2-5 intermediary DHT server nodes to which we send the request (2-5 because the last hop is a PR holder). How can we get responses from 60 peers?

In the case where we would expect the most answers, we would have the 20 PR holders + 5 intermediary nodes that are all Hydras, which is far from 60. Even if we add the concurrency factor $\alpha=3$ and suppose that the requests to the intermediary DHT nodes are all performed at exactly the same time, that gives 15 Hydra nodes (5 hops * $\alpha$) + 20 PR holders, which only makes 35 answers in this very specific corner case.
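Spelling out the worst-case bound with those numbers:

$$
\text{max replies} \le k + \text{hops} \times \alpha = 20 + 5 \times 3 = 35 \ll 60
$$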

Member

This is an interesting point worth digging into, but I want to understand a detail:

However, since the DHT lookup wasn't stopped after the first retrieval of the PRs

@cortze how does the operation of the Hoarder differ from the vanilla version? When it gets a response with a PR, it doesn't stop and keeps looking, but up to which point? And when does it stop?

Contributor Author

@yiannisbot The FindProviders() that I use in the hoarder differs slightly from the vanilla operation:
it removes the "find in the ProvidersStore" operation, forcing it to look for the Providers using only the DHT lookup, and it adds some traces to track when we receive a new Provider.
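Conceptually, the modification looks like this (a simplified sketch, not the hoarder's real code; `lookup` stands in for the internal DHT walk):

```go
package sketch

import (
	"context"
	"log"
	"time"

	"github.com/ipfs/go-cid"
	"github.com/libp2p/go-libp2p/core/peer"
)

// findProvidersDHTOnly sketches the modified FindProviders(): the vanilla
// version would first consult the host's local provider store, while here we
// go straight to the DHT walk and trace every provider record as it arrives.
func findProvidersDHTOnly(ctx context.Context, c cid.Cid,
	lookup func(context.Context, cid.Cid) <-chan peer.AddrInfo) []peer.AddrInfo {
	var provs []peer.AddrInfo
	for info := range lookup(ctx, c) {
		// Trace when each provider is received and whether it came with addresses.
		log.Printf("%s | provider %s | addrs=%d",
			time.Now().Format(time.RFC3339), info.ID, len(info.Addrs))
		provs = append(provs, info)
	}
	return provs
}
```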

I've been relaunching the test with a two-minute timeout for the FindProviders operation, and the results seem to be in the range that @guillaumemichel suggests (keep in mind that the Hydras' shared DB has been unplugged).

The number of remote peers replying with the PR during the DHT lookup (with a 2-minute timeout) looks like this.
[image: number of remote peers replying with the PR during the DHT lookup, 2-minute timeout]

@yiannisbot (Member)

I set up another Hoarder run with 20k CIDs for 60 hours, so the plots and some numbers might change.

@cortze do we have any results from this experiment? I think with these results and addressing Guillaume's question, this should be ready to be merged, right?

@cortze (Contributor Author) commented Jan 18, 2023

@cortze do we have any results from this experiment? I think with these results and addressing Guillaume's question, this should be ready to be merged, right?

@yiannisbot The results of this run were not as good as I expected. To track such a large set of CIDs, I had to increase the concurrency parameters of the hoarder, and, as we spotted in our last meeting (link to the Issue describing the bottleneck), the code is not prepared to support such a high degree of concurrency.

However, I think that even with such a low number of CIDs and a shorter interval between pings (3 minutes), we can conclude that increasing the provider Multiaddresses' TTL would improve content-fetching times. The impact would be even higher if we combine it with go-libp2p-kad-dht#802.

RFM17 already showed that IP rotation of PR Holders barely happens:
[image: IP churn of PR Holders (DHT servers) measured in RFM17]

@cortze (Contributor Author) commented Jan 18, 2023

@yiannisbot I've updated the document with your suggestions and with two extra paragraphs describing:

  1. The observed IP churn of DHT servers in IPFS (from RFM17)
  2. A Contributions section, where I aggregated all the pull requests related to this RFM

I've also updated the figures. The new ones have the DHT lookup limited to 2 mins, which shows a reasonable number of peers returning the PRs, as pointed out by @guillaumemichel.

The new data still shows a lower number of online PR Holders due to a problem storing the records in one part of the network. However, I consider it more than good enough to conclude that increasing the TTL of the Provider's Multiaddress would avoid the second DHT lookup needed to map the PeerID of the Provider to its Multiaddress.
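For reference, the second lookup I mean is roughly this on the client side (just a sketch over the public go-libp2p-kad-dht API; the surrounding setup is assumed):

```go
package sketch

import (
	"context"

	dht "github.com/libp2p/go-libp2p-kad-dht"
	"github.com/libp2p/go-libp2p/core/peer"
)

// resolveProvider sketches why an expired Multiaddress TTL costs an extra DHT
// lookup: if a provider comes back as a bare PeerID, the client has to run
// FindPeer() to map that PeerID to a dialable Multiaddress before it can
// fetch the content.
func resolveProvider(ctx context.Context, kadDHT *dht.IpfsDHT, prov peer.AddrInfo) (peer.AddrInfo, error) {
	if len(prov.Addrs) > 0 {
		return prov, nil // Multiaddress still valid: no second lookup needed
	}
	// TTL expired on the PR Holders' side: one more DHT walk just for the addresses.
	return kadDHT.FindPeer(ctx, prov.ID)
}
```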

Let me know what you think about the update :) Cheers!

@yiannisbot (Member) left a comment

Great work! 👏🏼 I hope the suggested changes land in production soon. Thanks for making the final touches.

@yiannisbot yiannisbot merged commit 956a3bb into probe-lab:master Jan 26, 2023