This repository has been archived by the owner on Feb 1, 2023. It is now read-only.

[WIP] Work towards improving the duplicate blocks issue #8

Closed · wants to merge 12 commits

Conversation

whyrusleeping
Member

Bitswap currently has a bad problem of getting duplicate blocks. That is, if we ask for X and more than one of our peers has X, we will probably get X from each of those peers, wasting bandwidth.

In this branch, we will attempt to fix the problem (or at least, improve it).

The first thing I've added here is a test that shows the problem. From there, we can try to get the duplication factor as low as possible.

@ghost ghost assigned whyrusleeping Aug 30, 2018
@ghost ghost added the status/in-progress In progress label Aug 30, 2018
@whyrusleeping whyrusleeping force-pushed the feat/reduce-dupes branch 2 times, most recently from 287e22f to ff672d6 Compare August 30, 2018 07:14
@whyrusleeping
Member Author

Alright, got a few tests (more definitely needed) that show various fetching scenarios pretty well.

I've also got a 'first candidate solution' that improves things pretty nicely in some cases, but in other cases it increases latency significantly. Before we try to micro-optimize to fix that latency bump, I want to get more test scenarios that also have different latencies.

@whyrusleeping
Member Author

Fun things to try:

  • changing the activeWantsLimit
  • changing the number of workers
  • changing the baseTickDelay
  • changing the provSearchDelay

I'm also curious to try out the 'multi-block-bitswap' branch from @taylormike

@whyrusleeping
Member Author

The first (naive) fix I've written here just splits the wants up amongst the peers in the 'active' set, which is the set of peers we've received data from in this session. This does a decent job of load balancing, but it would be better to track which peers are faster and weight requests toward them more heavily.

The code also currently sends out wants to all peers until we get some peers into the 'active' set. This is problematic when we request a whole bunch of highly available data all at once, as all the wants will go out before we are able to filter anything. We can probably put a limit on the number of live wants a session can have before it gets any active-peer information.
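For illustration, a minimal, self-contained sketch of that splitting idea (the types and names here are illustrative, not the actual go-bitswap API): wants are distributed round-robin across the session's active peers, and broadcast while the active set is still empty.

```go
package main

import "fmt"

// Hypothetical sketch: a session tracks the peers it has received blocks
// from; wants are split round-robin across them, or broadcast if the
// active set is still empty.
type session struct {
	activePeers []string // peer IDs we've received blocks from
}

func (s *session) distributeWants(wants []string) map[string][]string {
	out := make(map[string][]string)
	if len(s.activePeers) == 0 {
		out["broadcast"] = wants // no info yet: ask everyone
		return out
	}
	for i, w := range wants {
		p := s.activePeers[i%len(s.activePeers)] // round-robin assignment
		out[p] = append(out[p], w)
	}
	return out
}

func main() {
	s := &session{activePeers: []string{"peerA", "peerB", "peerC"}}
	fmt.Println(s.distributeWants([]string{"cid1", "cid2", "cid3", "cid4"}))
}
```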

@whyrusleeping
Member Author

@Stebalien wanna take a look at the set of test scenarios here? I could use some help brainstorming more

@whyrusleeping
Member Author

I'm now looking into what happens when the blocks we are fetching aren't clustered onto the same peers (i.e. if one peer has the first item, they don't necessarily have the next). This is currently a pretty slow use case: fetching 100 blocks randomly interspersed across a set of ten peers takes ~50 seconds.

My current idea is to keep track of the number of blocks we receive and the number of 'broadcasts' we send out, and if that ratio is too low, bump up the number of peers we send each want to until the ratio gets better. A really simple approach brought the time for this scenario down from 50 seconds to 30 seconds, which is pretty nice. Next I'm going to look into how this affects more optimal use cases (it shouldn't actually affect them at all, because the ratio should never drop that low).
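For illustration, a rough sketch of the ratio idea (the names and thresholds are illustrative, not the PR's actual values): track blocks received versus broadcasts sent, and widen the number of peers each want goes to whenever the ratio drops too low.

```go
package main

import "fmt"

// Hypothetical sketch: widen the per-want peer fan-out when we are
// broadcasting a lot relative to the blocks actually coming back, and
// shrink it again once the ratio recovers.
type ratioTracker struct {
	blocksReceived int
	broadcasts     int
	peersPerWant   int
}

func (r *ratioTracker) recordBroadcast()     { r.broadcasts++ }
func (r *ratioTracker) recordBlockReceived() { r.blocksReceived++ }

func (r *ratioTracker) adjust() {
	if r.broadcasts == 0 {
		return
	}
	ratio := float64(r.blocksReceived) / float64(r.broadcasts)
	switch {
	case ratio < 0.5 && r.peersPerWant < 8:
		r.peersPerWant++ // too many broadcasts per block: ask more peers per want
	case ratio > 2.0 && r.peersPerWant > 1:
		r.peersPerWant-- // blocks flowing freely: narrow the fan-out again
	}
}

func main() {
	r := &ratioTracker{peersPerWant: 1}
	for i := 0; i < 10; i++ {
		r.recordBroadcast()
	}
	r.recordBlockReceived() // only 1 block back for 10 broadcasts
	r.adjust()
	fmt.Println("peers per want:", r.peersPerWant) // widened to 2
}
```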

@whyrusleeping
Member Author

Discovering something mildly annoying... it seems the requests are falling back to broadcast even in cases where they really shouldn't (like when everyone has every block). Gonna look into this...

@whyrusleeping
Member Author

Meh, it was a bug in my new code. Still seeing an unfortunate number of broadcasts, but it's more reasonable now.

Member

@Stebalien Stebalien left a comment


Couple of nits while familiarizing myself with the code.

@@ -47,6 +48,8 @@ type impl struct {

	// inbound messages from the network are forwarded to the receiver
	receiver Receiver

	stats NetworkStats
Member


I believe this needs to go at the top of the struct as it needs to be 64-bit aligned:

The first word in a variable or in an allocated struct, array, or slice can be relied upon to be 64-bit aligned.
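For illustration, a minimal self-contained sketch of the fix being suggested (the counter fields inside NetworkStats are assumed here for the example): putting the atomically-updated stats field first in the struct keeps its uint64 counters 64-bit aligned, which sync/atomic requires on 32-bit platforms.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Assumed shape of NetworkStats for this example: uint64 counters updated
// with sync/atomic.
type NetworkStats struct {
	MessagesRecvd uint64
	MessagesSent  uint64
}

type impl struct {
	stats NetworkStats // first field: only the first word of a struct is guaranteed 64-bit aligned

	// other fields can follow in any order
	peerID string
}

func main() {
	n := &impl{}
	atomic.AddUint64(&n.stats.MessagesSent, 1)
	fmt.Println(atomic.LoadUint64(&n.stats.MessagesSent))
}
```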

Member Author


The compiler should really catch that for us...

Member


It should, but it doesn't...

@@ -153,6 +174,10 @@ func (s *Session) interestedIn(c *cid.Cid) bool {
const provSearchDelay = time.Second * 10

func (s *Session) addActivePeer(p peer.ID) {
	if s.liveWantsLimit == broadcastLiveWantsLimit {
		s.liveWantsLimit = targetedLiveWantsLimit
	}
Member


Not a fan of using "constant" variables like this... how about a wantBudget() function (or something like that) that returns len(s.liveWants) - ((len(s.activePeers) > 0) ? targetedLiveWantsLimit : broadcastLiveWantsLimit).
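A minimal sketch of the suggested wantBudget() helper, read here as "how many more wants this session may keep live"; the field names follow the snippet above, while the concrete limit values and map types are assumptions for illustration.

```go
package main

import "fmt"

// Illustrative limits; not the PR's actual constants.
const (
	broadcastLiveWantsLimit = 4
	targetedLiveWantsLimit  = 32
)

type Session struct {
	liveWants   map[string]struct{} // outstanding wants, keyed by CID
	activePeers map[string]struct{} // peers we've received blocks from
}

// wantBudget derives the live-want limit from whether we have any active
// peers yet, instead of mutating a "constant" field on the session.
func (s *Session) wantBudget() int {
	limit := broadcastLiveWantsLimit
	if len(s.activePeers) > 0 {
		limit = targetedLiveWantsLimit
	}
	if budget := limit - len(s.liveWants); budget > 0 {
		return budget
	}
	return 0
}

func main() {
	s := &Session{
		liveWants:   map[string]struct{}{"cid1": {}, "cid2": {}},
		activePeers: map[string]struct{}{},
	}
	fmt.Println(s.wantBudget()) // 2: broadcast limit (4) minus 2 live wants
}
```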

Member Author


yeah, totally.

@Stebalien
Member

Summarizing our offline discussion, I'm not sure if the dupl variable is really the right approach. I'm expecting nodes to have merkle-chains from some common root to (reasonably) complete subdags (i.e., they'll traverse a path and then read out an entire subdag (maybe up to some depth)).

[diagram: bitswap]

With the dupl approach:

  1. If peers have random blocks, dupl should be stable.
  2. If peers have all the blocks, dupl should stay ~1.
  3. If peers have blocks according to my diagram above, dupl should grow quite large (and fluctuate) as we randomly query peers from a set where some peers simply won't have anything we're looking for.

Without remembering anything about the DAG structure, about the only solution I can think of is to somehow split peers into "active" and "inactive" groups. That is, if a peer frequently doesn't have blocks we're looking for, we drop them to the inactive group; if we start running low on active peers, we start pulling in inactive peers.

One way to do this is to simply sort peers by some frecency (frequency/recency) metric that tracks how reliably they deliver blocks. We can then ask for blocks from the top N (where N can float based on how many duplicate blocks we appear to be receiving).

This should work fine for, e.g., sequential access. However, it won't work well with, e.g., ipfs get as that does random access.
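For illustration, a minimal sketch of such a frecency metric (the scoring formula, half-life, and N are assumptions, not anything specified in this thread): every delivered block bumps a peer's score, scores decay over time, and wants go to the top N peers.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

type peerScore struct {
	id      string
	score   float64
	lastHit time.Time
}

const halfLife = 30.0 // seconds; illustrative decay half-life

// current returns the score decayed by how long ago the peer last delivered.
func (p *peerScore) current(now time.Time) float64 {
	elapsed := now.Sub(p.lastHit).Seconds()
	return p.score * math.Pow(0.5, elapsed/halfLife)
}

// recordBlock is called when a peer delivers a (non-duplicate) block.
func recordBlock(p *peerScore, now time.Time) {
	p.score = p.current(now) + 1
	p.lastHit = now
}

// topN returns the N highest-scoring peers; N can float with the observed
// duplicate-block rate.
func topN(peers []*peerScore, n int, now time.Time) []*peerScore {
	sort.Slice(peers, func(i, j int) bool {
		return peers[i].current(now) > peers[j].current(now)
	})
	if n > len(peers) {
		n = len(peers)
	}
	return peers[:n]
}

func main() {
	now := time.Now()
	a := &peerScore{id: "peerA", score: 5, lastHit: now.Add(-60 * time.Second)}
	b := &peerScore{id: "peerB", score: 2, lastHit: now}
	recordBlock(b, now)
	for _, p := range topN([]*peerScore{a, b}, 1, now) {
		fmt.Println("ask first:", p.id) // peerB: fewer blocks overall, but recent
	}
}
```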


Ideally, we'd use the DAG structure. Looking at this structure, I'd predict that a peer is more likely to have a node if we know they have either sibling or uncle nodes and less likely if we know they don't have sibling nodes.

  1. If a peer has many siblings, they probably performed some form of list, cat, etc. operation.
  2. If we know a peer is missing siblings, they probably performed a traversal through the DAG. Generally, I'd treat this as stronger evidence than (1). It's possible to have a few siblings by traversing a few paths through some node but it's unlikely to be missing siblings after reading/listing an entire file/directory.
  3. If we know a peer has uncles to the node in question, but have little information on the siblings, we can infer that they're likely to have the entire tree. IMO, uncles give the least information but it's better than nothing.

In some cases, we could probably also learn from cousins but I'm not sure if there's anything we can learn from cousins that we can't learn from uncles without assuming a balanced tree.

So, without changing the interfaces/protocol, we can probably keep a small, in-memory, shared table (cache?) mapping cids we may want (children of nodes we've fetched) to peers that likely have them (not sure how we'll keep track/update how likely they are to have them).

Given this information, I'd:

  1. Sort all of our peers (not just those in the session) by how likely they are to have the content we're looking for.
  2. Take the top non-overloaded (we should really be tracking load per-peer) until we hit some threshold probability of receiving the block on the first try.
  3. If we run out of peers to ask, we can then fall back on broadcasting to our session.

Basically, we'd:

  1. Use the prediction table when we have enough information to predict which peers likely have which blocks based on the DAG structure.
  2. Fall back on the session to try to find peers when the DAG structure doesn't give us enough information.
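For illustration, a minimal sketch of the kind of in-memory prediction table described above (the structure and weights are assumptions, not a design from this PR): it maps CIDs we may want to peers with a likelihood weight, updated as we learn which siblings or uncles a peer has or is missing.

```go
package main

import (
	"fmt"
	"sort"
)

// predictionTable maps cid -> peer -> likelihood weight.
type predictionTable struct {
	likely map[string]map[string]float64
}

func newPredictionTable() *predictionTable {
	return &predictionTable{likely: make(map[string]map[string]float64)}
}

// observe records evidence that peer p is (weight > 0) or is not (weight < 0)
// likely to have cid c, e.g. +1 for "has a sibling", +0.5 for "has an uncle",
// -2 for "is missing siblings".
func (t *predictionTable) observe(c, p string, weight float64) {
	if t.likely[c] == nil {
		t.likely[c] = make(map[string]float64)
	}
	t.likely[c][p] += weight
}

// bestPeers returns the peers most likely to have c, best first; an empty
// result means we have no prediction and should fall back to the session.
func (t *predictionTable) bestPeers(c string) []string {
	type kv struct {
		peer   string
		weight float64
	}
	var ranked []kv
	for p, w := range t.likely[c] {
		if w > 0 {
			ranked = append(ranked, kv{p, w})
		}
	}
	sort.Slice(ranked, func(i, j int) bool { return ranked[i].weight > ranked[j].weight })
	out := make([]string, len(ranked))
	for i, r := range ranked {
		out[i] = r.peer
	}
	return out
}

func main() {
	t := newPredictionTable()
	t.observe("cidChild", "peerA", 1)    // peerA has a sibling of cidChild
	t.observe("cidChild", "peerB", 0.5)  // peerB has an uncle
	t.observe("cidChild", "peerC", -2)   // peerC is missing siblings
	fmt.Println(t.bestPeers("cidChild")) // [peerA peerB]
}
```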

})
t.Run("10Nodes-OnePeerPerBlock-UnixfsFetch", func(t *testing.T) {
	subtestDistributeAndFetch(t, 10, 100, onePeerPerBlock, unixfsFileFetch)
})
Member


We should probably have a test that looks more like reading entire files out of a large directory structure. I wonder if we should try building an actual DAG.

@whyrusleeping
Member Author

whyrusleeping commented Sep 7, 2018

Here are the results so far:

Test	Run	Dups	MsgRecv	MsgSent	Time (ns)
10Nodes-AllToAll-AllConcurrent	1	788	888	1004	588738086
10Nodes-AllToAll-AllConcurrent	2	212	312	508	583817218
10Nodes-AllToAll-BatchFetchBy10	1	697	798	650	318375172
10Nodes-AllToAll-BatchFetchBy10	2	32	132	203	308959055
10Nodes-AllToAll-BigBatch	1	741	841	746	593989359
10Nodes-AllToAll-BigBatch	2	32	132	211	576777696
10Nodes-AllToAll-OneAtATime	1	791	891	1545	2317195336
10Nodes-AllToAll-OneAtATime	2	8	108	216	2253939404
10Nodes-AllToAll-UnixfsFetch	1	729	830	654	156870013
10Nodes-AllToAll-UnixfsFetch	2	8	108	167	142264026
10Nodes-OnePeerPerBlock-BigBatch	1	0	100	691	2566468029
10Nodes-OnePeerPerBlock-BigBatch	2	0	100	818	13188862723
10Nodes-OnePeerPerBlock-OneAtATime	1	0	100	1697	6751956059
10Nodes-OnePeerPerBlock-OneAtATime	2	0	100	1604	8987871527
10Nodes-OnePeerPerBlock-UnixfsFetch	1	0	100	483	1334532195
10Nodes-OnePeerPerBlock-UnixfsFetch	2	0	100	646	2486414479
AllToAll-BigBatch	1	96	196	167	583172799
AllToAll-BigBatch	2	4	104	125	576459570
AllToAll-OneAtATime	1	100	200	380	2297902493
AllToAll-OneAtATime	2	1	101	202	2277471653
Overlap1-OneAtATime	1	0	100	245	2835544709
Overlap1-OneAtATime	2	0	100	229	10793166952
Overlap2-BatchBy10	1	33	133	161	883447696
Overlap2-BatchBy10	2	2	102	124	3226089490
Overlap3-AllConcurrent	1	33	133	183	1343921547
Overlap3-AllConcurrent	2	32	132	209	1683851304
Overlap3-BatchBy10	1	33	133	142	867415276
Overlap3-BatchBy10	2	2	102	121	3815798655
Overlap3-BigBatch	1	34	134	178	1339814193
Overlap3-BigBatch	2	2	102	160	6131580369
Overlap3-OneAtATime	1	34	134	381	2274897167
Overlap3-OneAtATime	2	33	133	396	4503270140
Overlap3-UnixfsFetch	1	31	131	170	724474001
Overlap3-UnixfsFetch	2	1	101	120	1318204376

@whyrusleeping
Member Author

Several of the tests are slower now, which is mildly disconcerting. But they have lower duplicate blocks and message counts across the board, which is a win in my book.

@Stebalien
Member

@whyrusleeping we should test this against a gx install (leaving this here so we don't forget, I'll try to do this as soon as I get a non-mobile internet connection).

@bonekill

Not sure if this would work or not, ignore it if it is silly.

Would it work to sort peers into latency groups and then switch the behavior for low-latency peers (roughly <20ms) from a 'send it if you have it!' strategy to a 'do you have it?' query?
The performance impact of an extra round trip with the low-latency peers should be manageable, while offering a much larger degree of control over incoming data.
If you get positive hits back on the query, you can just request the block from one of the respondents. If you don't get a response back, or the low-latency group is too small to justify the extra round trip, continue with the other strategies.

While this does not solve the problem it might help mitigate it.
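For illustration, a minimal sketch of choosing between the two strategies by latency group (the 20 ms threshold and minimum group size here are illustrative assumptions):

```go
package main

import (
	"fmt"
	"time"
)

type strategy int

const (
	askFirst strategy = iota // query "do you have it?", then request from one responder
	sendWant                 // send the want and accept the block directly
)

// chooseStrategy picks the query-first strategy only when the peer is close
// enough that an extra round trip is cheap and the low-latency group is big
// enough to make it worthwhile.
func chooseStrategy(latency time.Duration, lowLatencyGroupSize int) strategy {
	const threshold = 20 * time.Millisecond
	const minGroup = 3
	if latency < threshold && lowLatencyGroupSize >= minGroup {
		return askFirst
	}
	return sendWant
}

func main() {
	fmt.Println(chooseStrategy(5*time.Millisecond, 5))   // askFirst (0)
	fmt.Println(chooseStrategy(150*time.Millisecond, 5)) // sendWant (1)
}
```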

@whyrusleeping
Member Author

@bonekill Yeah, the idea of having a 'do you have' protocol extension is interesting. There is also the question of 'will you send this to me?', which I think is the more important one. Note that this is only really important in the case where peers don't have the data. In the optimal case, with many peers who all have the data, there is no reason to add that extra overhead.

@bonekill

Also, back to the original idea you are working on: you could use incoming requests as well to build up the probability of nodes having data.
For example, if peer A recently requested a block (or a related block) that you did not have at the time but now want, it is likely that peer A now possesses that data.
This lets you build the 'active' set, or a probability model, passively without having to make requests of your own, provided the node has had a chance to listen for a while.

@whyrusleeping whyrusleeping force-pushed the feat/reduce-dupes branch 2 times, most recently from 08464d3 to 5aa8341 Compare September 17, 2018 22:36
@whyrusleeping
Member Author

@bonekill Yeah, @Stebalien and I were chatting about having a generalized 'who might have X' subsystem that could be informed by many different inputs. I think that will help significantly moving forward.

In any case, I'd like to get this PR merged roughly as is. @Stebalien do you think you could give me a hand?

@Stebalien
Member

If this is the correct way to use a session, I've tried this patch when listing Wikipedia on IPFS and, unfortunately, it still appeared to get 1:1 duplicate blocks and a 2:1 upload-to-download speed ratio.

We should do some more thorough tests to figure out what's actually going on. @hannahhoward?

@Stebalien
Member

Still not seeing any improvement.

@schomatis
Contributor

@schomatis have you for sure seen the PR get used linked to the main repo?

Yes. Besides updating go-ipfs-blocksutil in the main go-ipfs repo, you'll also need to update it in the go-ipfs-exchange-offline repo you're using (linked or not). To be sure, go to go-ipfs, run gx deps --tree, and check whether anybody else is still using QmUf9x (the old 0.1.0 version).

@Stebalien
Member

Blocked on passing the new tests. The answer is probably to just skip them or relax them but it still needs to pass them.

@Stebalien
Member

So, I finally got this to work by just bubbling up the changes (hopefully correctly; I didn't get any dupes). Unfortunately, the results are not encouraging. While pinning Wikipedia:

  1. Initially, I noticed that the upload bandwidth was significantly reduced.
  2. After a few seconds, download bandwidth drops to match upload bandwidth.
  3. I've seen no improvements in the number of duplicate blocks received (actually, it looked slightly worse but was still about 1:1 in all tests)

As for bandwidth, I tested 4 variants:

  1. Current: ipfs master
  2. Current + Large Wantlist: increase wantlist to 64, parallel fetch concurrency to 32
  3. Patch: this patch as-is.
  4. Patch + Large Wantlist: increase broadcast size to 16, targeted want limit to 64, parallel fetch concurrency to 32.
  • Current: ~217KiB/s
  • Current + Large Wantlist: ~272KiB/s (starts much higher and drops off, probably due to upload)
  • Patch: ~47KiB/s
  • Patch + Large Wantlist: ~93KiB/s

(all of these start off slightly (except 2 which starts off significantly) higher and then drop down)

So, maybe I've just been testing this wrong but, at the very minimum, we need decent benchmarks showing some improvement before merging this.

@hannahhoward
Contributor

@Stebalien

I tend to agree. I think the first priority is real world benchmarks, so we have a more systematic way to test whether we're making a real world difference.

On the question of merging -- I am also skeptical of a large change that adds complexity with a seemingly negative effect on transfer speed. I think once we have real world benchmarks, it's worth revisiting this PR to see if there's something we're missing, because maybe there is just a small change needed to unlock the improvements.

I'm going to start looking at real world benchmark tests today. I'd recommend we either close this PR (it's still archived and can be re-opened later) or add a WIP label to indicate it needs further work.

Also the tests in this PR are valuable on their own (I think) and I wonder if they should be split off.

@Stebalien
Member

I'm going to start looking at real world benchmark tests today. I'd recommend we either close this PR (it's still archived and can be re-opened later) or add a WIP label to indicate it needs further work.

I'll add a WIP.

Also the tests in this PR are valuable on their own (I think) and I wonder if they should be split off.

I agree. Are you volunteering 😄?

@Stebalien Stebalien changed the title Work towards improving the duplicate blocks issue [WIP] Work towards improving the duplicate blocks issue Oct 23, 2018
@schomatis
Contributor

3. I've seen no improvements in the number of duplicate blocks received (actually, it looked slightly worse but was still about 1:1 in all tests)

Yes, what I meant with my previous (not very clear) comment was that the broadcast system (through the liveWants slice) still seems to largely dominate the requests made in the bitswap session. This patch just avoids sending the newly wanted blocks directly to the broadcast want list (bcwl), but those blocks will end up there anyway every few seconds:

go-bitswap/session.go

Lines 209 to 218 in a2ce7a2

case <-s.tick.C:
	live := make([]cid.Cid, 0, len(s.liveWants))
	now := time.Now()
	for c := range s.liveWants {
		live = append(live, c)
		s.liveWants[c] = now
	}
	// Broadcast these keys to everyone we're connected to
	s.bs.wm.WantBlocks(ctx, live, nil, s.id)

How long that interval is depends on how fast new (requested) blocks arrive, and that in turn depends heavily on the broadcast requests: the targeted requests to specific peers through the duplicate-block system stay at small absolute values of 3-10, while the broadcast scales with the number of peers, an order of magnitude (or more) larger than the duplicate-block requests.

This is not reflected in the local tests because, with the fake 10 ms delay in the VirtualNetwork, the broadcast system doesn't have time to kick in. As a simple example, reducing the broadcast timer to a comparable (also fake and impractical) 10 ms delay greatly increases the duplicate blocks:

go-bitswap/session.go

Lines 164 to 171 in a2ce7a2

func (s *Session) resetTick() {
	if s.latTotal == 0 {
		s.tick.Reset(provSearchDelay)
	} else {
		avLat := s.latTotal / time.Duration(s.fetchcnt)
		s.tick.Reset(s.baseTickDelay + (3 * avLat))
	}
}
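As a worked example of that tick interval (the 500 ms base delay and the latency figures here are assumptions for illustration): with an average fetch latency of 1 s, the broadcast tick fires roughly every 3.5 s.

```go
package main

import (
	"fmt"
	"time"
)

// nextTick mirrors the resetTick logic above for illustration: before any
// blocks arrive it waits the provider-search delay; afterwards it waits the
// base tick delay plus three times the average observed latency.
func nextTick(baseTickDelay, latTotal time.Duration, fetchcnt int) time.Duration {
	if fetchcnt == 0 {
		return 10 * time.Second // provSearchDelay path
	}
	avLat := latTotal / time.Duration(fetchcnt)
	return baseTickDelay + 3*avLat
}

func main() {
	base := 500 * time.Millisecond                  // assumed baseTickDelay
	fmt.Println(nextTick(base, 10*time.Second, 10)) // 3.5s with 1s average latency
	fmt.Println(nextTick(base, 0, 0))               // 10s before any blocks arrive
}
```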

I think more insightful statistics (beyond the useful duplicate block count) will be needed to understand the interaction between the two request systems, beyond black-box testing of dl/ul bandwidth.

@whyrusleeping
Member Author

@schomatis Yeah, we should try ramping up the baseTickDelay and provSearchDelay values. We should also set up a more 'real world' test case, and try to get some real numbers and reproducibility out of them.

hannahhoward added a commit that referenced this pull request Nov 14, 2018
- add a delay generator that simulates real-world latencies one might encounter on the internet
- modify virtual network to accept different latencies for different peers based on using NextWaitTime on the passed delay
- modify dup_blocks_test subtestDistributeAndFetch to accept a custom delay
- Add a real-world benchmark that simulates the kinds of problems one might encounter bitswapping with a long-lived session and a large swarm of peers with a real-world latency distribution (which causes #8 not to function well in practice)
@eingenito

eingenito commented Dec 11, 2018

I'm going to close this as it has been incorporated into @hannahhoward's work in #26, #28, #29, and #30, and is being tracked in #27.

Of course feel free to reopen if I got that wrong.

@eingenito eingenito closed this Dec 11, 2018
@ghost ghost removed the status/in-progress In progress label Dec 11, 2018