Implement Accelerated DHT #45

dennis-tra · 2023-10-03T14:53:42Z

Context: #7

fullrt.go

dennis-tra · 2023-10-17T15:38:32Z

fullrt_test.go

@@ -0,0 +1,15 @@
+package zikade
+
+//func TestNewFullRT(t *testing.T) {


write at least one test for FullRT

internal/coord/brdcst/brdcst.go

internal/coord/query.go

internal/coord/query/query.go

internal/coord/routing.go

internal/coord/routing/crawl.go

iand · 2023-10-18T12:00:35Z

fullrt.go

+	"github.com/plprobelab/zikade/pb"
+)
+
+type FullRT struct {


It doesn't make much sense to me to call this FullRT. It seems to just be following the same naming as the go-libp2p-kad-dht implementation.

It seems to me that this is really a specialised routing table population strategy for the DHT. Can we make it an option on the normal DHT type?

Yeah, I also wasn't sure about that. Though it's more than just a specialized routing table population strategy. The routing.Routing implementation also behaves quite differently.

If we put an option on the DHT type we'd need to branch into either the default routing.Routing implementation or the fullRT routing.Routing implementation which I think is not super elegant. I don't have a better idea though :/

iand · 2023-10-18T12:35:31Z

internal/coord/routing/crawl.go

+			}
+
+			for j := 0; j < c.cfg.MaxCPL; j++ {
+				target, err := c.cplFn(node.Key(), j)


Rename this to targetNode since job.target is a key

iand · 2023-10-18T14:17:16Z

internal/coord/routing/crawl.go

+		return &StateCrawlIdle{}
+	}
+
+	if len(c.info.waiting) >= c.cfg.MaxCPL*c.cfg.Concurrency {


Suggested change

-	if len(c.info.waiting) >= c.cfg.MaxCPL*c.cfg.Concurrency {
+	if len(c.info.waiting) >= c.cfg.Concurrency {

Concurrency is the maximum number of concurrent requests, but the original code sends 16 times as many.

Sending 16 requests is intended because each request contains (should contain) a different target key for which we want to know the 20 closest nodes that the other peer knows. This is the strategy for effectively fetching the entire routing table of a remote peer.

Concurrency should be the maximum number of in-flight requests. The 16 is irrelevant here, since we are checking how many are currently in flight. As it stands, if the user specifies a concurrency of 200, they will actually end up with 3200 concurrent requests.
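To illustrate the point in this thread: queueing 16 requests per node (one per CPL) and capping in-flight requests at Concurrency are compatible, provided the gate compares the waiting count with Concurrency alone. The following is a minimal sketch, not the PR's code; `crawlLimiter` and its fields and methods are hypothetical names chosen for illustration.

```go
package main

// crawlLimiter is a hypothetical stand-in for the crawl state machine's
// bookkeeping. Only the in-flight gate is modelled here.
type crawlLimiter struct {
	maxCPL      int // requests queued per discovered node (one per CPL)
	concurrency int // upper bound on requests in flight at once
	todo        int // queued requests that have not been sent yet
	waiting     int // requests currently in flight
}

// addNode queues one request per CPL for a newly discovered node.
func (l *crawlLimiter) addNode() {
	l.todo += l.maxCPL
}

// dispatch sends queued requests until the in-flight cap is reached and
// returns how many were sent. The cap is concurrency, not maxCPL*concurrency.
func (l *crawlLimiter) dispatch() int {
	sent := 0
	for l.todo > 0 && l.waiting < l.concurrency {
		l.todo--
		l.waiting++
		sent++
	}
	return sent
}

// complete records that n in-flight requests have finished.
func (l *crawlLimiter) complete(n int) {
	if n > l.waiting {
		n = l.waiting
	}
	l.waiting -= n
}
```

With `maxCPL` of 16 and `concurrency` of 200, twenty discovered nodes queue 320 requests, but only 200 are ever in flight; the remaining 120 are sent as earlier ones complete.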

iand · 2023-10-18T14:20:46Z

internal/coord/routing/crawl.go

+	}
+
+	span.SetAttributes(
+		attribute.Int(prefix+"_todo", len(c.info.todo)),


Report these as metric gauges too

iand · 2023-10-18T14:21:47Z

internal/coord/routing/crawl.go

+	}
+
+	// clear info to indicate that we're idle
+	c.info = nil


Add a metrics gauge that is 1 when the crawl is running and 0 otherwise
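A rough sketch of what such a gauge could look like, together with the queue-size gauges requested in the previous comment. It uses the standard library's `expvar.Int` purely as a stand-in for whatever metrics library the project uses; `crawlGauges` and its methods are hypothetical names, not part of the PR.

```go
package main

import "expvar"

// crawlGauges groups the gauges discussed in the review. expvar.Int is a
// stdlib stand-in here; a real implementation would likely use the
// project's metrics library instead.
type crawlGauges struct {
	running expvar.Int // 1 while a crawl is in progress, 0 otherwise
	todo    expvar.Int // number of queued crawl jobs
	waiting expvar.Int // number of in-flight crawl jobs
}

// setRunning flips the running gauge between 1 and 0.
func (g *crawlGauges) setRunning(running bool) {
	if running {
		g.running.Set(1)
		return
	}
	g.running.Set(0)
}

// observe records the current queue sizes. It is meant to be called on
// every state transition, next to the span attributes.
func (g *crawlGauges) observe(todo, waiting int) {
	g.todo.Set(int64(todo))
	g.waiting.Set(int64(waiting))
}
```

In this sketch, `setRunning(false)` would be called at the point where `c.info` is cleared, so the gauge drops to 0 exactly when the crawl becomes idle.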

iand · 2023-10-18T14:24:04Z

internal/coord/routing/crawl.go

+	c.info = nil
+
+	return &StateCrawlFinished[K, N]{
+		Nodes: nodes,


This could contain thousands of nodes. Do we need to return this? It could include stats instead: number found, number of errors, etc.

iand · 2023-10-18T14:26:42Z

internal/coord/routing/crawl.go

+	cpls    map[string]int
+	waiting map[string]N
+	success map[string]N
+	failed  map[string]N


We don't use the actual list of failures or errors. It would take less memory to keep counts instead. If we don't return the (potentially very large) list of successful nodes, we could keep a count for those too.
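One possible shape for this suggestion, combined with the earlier comment about returning stats from the finished state, is sketched below. All names (`crawlStats`, `stateCrawlFinished`, `errUnreachable`) are hypothetical and only illustrate keeping counters in place of the `success` and `failed` maps.

```go
package main

import "errors"

// errUnreachable is a hypothetical error used only to exercise the counters.
var errUnreachable = errors.New("peer unreachable")

// crawlStats keeps counts rather than the nodes themselves, so memory use
// stays constant regardless of how many nodes the crawl visits.
type crawlStats struct {
	Succeeded int // requests that returned a response
	Failed    int // requests that returned an error
}

// record updates the counters for one completed request.
func (s *crawlStats) record(err error) {
	if err != nil {
		s.Failed++
		return
	}
	s.Succeeded++
}

// stateCrawlFinished is a hypothetical, slimmed-down finished state that
// carries stats instead of the full node list.
type stateCrawlFinished struct {
	Stats crawlStats
}

// finish snapshots the counters into the finished state.
func (s *crawlStats) finish() stateCrawlFinished {
	return stateCrawlFinished{Stats: *s}
}
```

The `waiting` map would still need to hold entries in this sketch, since in-flight requests have to be matched against their responses.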

iand · 2023-10-18T14:28:37Z

internal/coord/routing/crawl.go

+				}
+
+				newJob := crawlJob[K, N]{
+					node:   node,


What prevents us from crawling a node multiple times for the same target? Nodes A and B could both return node C in their list of nodes closer to target T.

Ah, I see that's what cpls map is for. Can you add comments on the fields?

Actually, thinking about it, couldn't we end up asking the same node for different targets with the same CPL? The CPL function returns a random key with the given CPL, so a node could be asked to crawl the same CPL more than once with different keys.

I think you're right 🤔 Let me write a test for it and assert exactly that 👍
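The concern raised in this thread suggests deduplicating on the (node, CPL) pair and not on the randomly generated target key. A minimal sketch of that idea follows; `crawlKey`, `seenSet`, and `markIfNew` are hypothetical names, and this is not a description of how the PR's `cpls` map actually works.

```go
package main

// crawlKey identifies one unit of crawl work: a node asked about one CPL.
// The random target key is deliberately left out, so two different keys
// with the same CPL map to the same entry.
type crawlKey struct {
	nodeID string
	cpl    int
}

// seenSet remembers which (node, CPL) pairs have already been scheduled.
type seenSet map[crawlKey]struct{}

// markIfNew reports whether the pair was unseen, and records it if so.
func (s seenSet) markIfNew(nodeID string, cpl int) bool {
	k := crawlKey{nodeID: nodeID, cpl: cpl}
	if _, ok := s[k]; ok {
		return false
	}
	s[k] = struct{}{}
	return true
}
```

A crawl job would only be enqueued when `markIfNew` returns true, so node C is asked about each CPL at most once even if nodes A and B both return it.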

dennis-tra requested review from guillaumemichel and iand as code owners October 3, 2023 14:53

dennis-tra mentioned this pull request Oct 3, 2023

WIP: Accelerated DHT (crawling functionality) libp2p/go-libp2p-kad-dht#951

Closed

dennis-tra force-pushed the v2-issue-7-accelerated-dht branch from b55f2db to de85493 Compare October 5, 2023 15:17

iand assigned dennis-tra Oct 6, 2023

dennis-tra force-pushed the v2-issue-7-accelerated-dht branch 3 times, most recently from b9d84a9 to 57d2949 Compare October 12, 2023 08:32

dennis-tra added 10 commits October 13, 2023 09:37

WIP

0b618bc

WIP

3dc3d73

improve testing

9adc0b0

improve crawler configuration

c7f2cf3

WIP

91b9d08

WIP

2a64316

WIP

0bd2b48

WIP

7a432c7

WIP

4acfc83

WIP

d99b889

dennis-tra force-pushed the v2-issue-7-accelerated-dht branch from 6a24dd1 to d99b889 Compare October 13, 2023 07:38

dennis-tra added 9 commits October 13, 2023 14:38

WIP

490e947

wip

69ce4f4

WIP

16a4aa6

WIP

7d2f3f4

WIP

7b270b5

Merge branch 'main' into v2-issue-7-accelerated-dht

c978c93

WIP

50a19a6

WIP

ea58ea1

fix test

6583292

dennis-tra commented Oct 17, 2023


WIP

1877f06

WIP

8173c35

dennis-tra mentioned this pull request Oct 18, 2023

Implement ProvideManyRouter interface #35

Open

WIP

e6118a3

dennis-tra force-pushed the v2-issue-7-accelerated-dht branch from 6d4043c to e6118a3 Compare October 18, 2023 09:02

WIP

d8a4546

iand reviewed Oct 18, 2023


Add logging to crawl and include state machines

7edf5d1
