This repository was archived by the owner on Feb 1, 2023. It is now read-only.

calculate message latency #386

Merged: 6 commits into master on May 2, 2020

Conversation

@dirkmc (Contributor) commented Apr 29, 2020

Fixes #385 (Accurate DONT_HAVE timeouts)

@Stebalien (Member) left a comment:

I'd double-check for possible deadlocks, but otherwise LGTM (except a few nits).

Review threads on:
  • bitswap.go
  • internal/messagequeue/donthavetimeoutmgr.go
  • internal/messagequeue/messagequeue.go
@dirkmc (Contributor, author) commented Apr 30, 2020

The request-response cases are:

  • broadcast want-have (send-dont-have = false)
    • HAVE
    • block (if the block is very small)
  • regular want-have (send-dont-have = true)
    • HAVE
    • DONT_HAVE
    • block (if the block is very small)
  • want-block (send-dont-have = true)
    • block
    • DONT_HAVE
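
The decision logic implied by these cases can be written as a small function. This is a sketch only: the names here (wantType, response, maxSmallBlockSize) are hypothetical, not the actual go-bitswap engine API.

package sketch

// wantType distinguishes the two kinds of want entries.
type wantType int

const (
	wantHave wantType = iota
	wantBlock
)

// response is what the peer sends back (if anything).
type response int

const (
	sendNothing response = iota
	sendHave
	sendDontHave
	sendBlock
)

// maxSmallBlockSize is a hypothetical threshold for a block "very small"
// enough to send directly in response to a want-have.
const maxSmallBlockSize = 1024

// respond maps a received want to the response described in the cases above.
func respond(wt wantType, sendDontHaveFlag, haveBlock bool, blockSize int) response {
	switch {
	case haveBlock && wt == wantBlock:
		return sendBlock
	case haveBlock && blockSize <= maxSmallBlockSize:
		return sendBlock // want-have, but the block is small enough to send directly
	case haveBlock:
		return sendHave
	case sendDontHaveFlag:
		return sendDontHave // regular want-have or want-block without the block
	default:
		return sendNothing // broadcast want-have and the peer lacks the block
	}
}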

There are a few scenarios to address:

  1. No response followed by late response

Peers running new Bitswap should respond immediately in all cases, except for broadcast want-have (send-dont-have = false) when the peer doesn't have the block. Peers running old Bitswap won't respond unless they have the block.

In either of these two cases it's possible that

  • the peer does not have the block, so it doesn't respond
  • the peer receives the block later
  • the peer sends HAVE / block

We can mitigate this scenario by ignoring outlier latencies (see the sketch at the end of this comment).

  2. DONT_HAVE then HAVE / block

It's possible that a peer sends DONT_HAVE and then subsequently sends HAVE / block. To address this I think we should clear out the sent-at time when we receive a response, so that subsequent responses will be ignored for the purpose of latency calculation.

  3. Simulated DONT_HAVE

If a peer doesn't respond to want-block, either because it's running old Bitswap or because it's overloaded, the timeout will fire an event that simulates receiving DONT_HAVE. We want to ignore this simulated DONT_HAVE for the purposes of latency tracking, so we should clear out the sent-at time before the event propagates upwards. Edit: Actually the simulated DONT_HAVE bypasses the message received event, so it will be ignored for the purposes of latency tracking.
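
As a concrete example of the mitigation for scenario 1, a latency tracker could simply discard samples far above its running average. This is a minimal sketch under assumed parameters (alpha, outlierMultiplier); it is not the code in this PR.

package sketch

import "time"

// latencyTracker keeps an exponential moving average of response latency
// and ignores implausibly large samples, e.g. a late HAVE sent long after
// the peer fetched the block elsewhere.
type latencyTracker struct {
	avg time.Duration
}

const (
	alpha             = 0.5 // weight given to the newest sample
	outlierMultiplier = 4   // samples above 4x the average are discarded
)

func (lt *latencyTracker) record(sample time.Duration) {
	if lt.avg == 0 {
		lt.avg = sample // first sample seeds the average
		return
	}
	if sample > lt.avg*outlierMultiplier {
		return // outlier: likely a late response, not representative latency
	}
	lt.avg = time.Duration(alpha*float64(sample) + (1-alpha)*float64(lt.avg))
}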

@dirkmc (Contributor, author) commented Apr 30, 2020

With regards to checking for deadlocking, the possible places for slowness / deadlock that I can see are:

  • Routing the received message notification through the PeerManager
    We need to take the peer queue lock, which is already quite highly contended.
    I considered keeping a separate "event listener" on the Network abstraction that the MessageQueue can subscribe to, but it seemed overly complex so I went with this simpler solution.
  • Informing the MessageQueue of incoming responses
    Because the peer queue lock is highly contended, I wanted to make sure this would not block, so the incoming response keys are sent on a channel; if the channel is full they are simply dropped (the keys are only used to approximate latency, so we don't need to measure every single response; see the sketch after this list).
  • Informing DontHaveTimeoutManager of latency calculation updates
    We need to acquire a lock, but the timeout-checking thread in DontHaveTimeoutManager only wakes up at the approximate moment that the oldest want is expected to time out, so I don't expect much contention.
  • DontHaveTimeoutManager firing timeouts
    This happens in a goroutine, so I think we should be OK.
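
The non-blocking hand-off in the second bullet is Go's standard select-with-default pattern. A sketch of it follows; the field name and buffer size here are assumptions, not necessarily the PR's exact shape.

package sketch

import cid "github.com/ipfs/go-cid"

// messageQueueSketch hands incoming response keys to the MessageQueue
// without ever blocking the caller.
type messageQueueSketch struct {
	responses chan []cid.Cid // e.g. make(chan []cid.Cid, 8)
}

func (mq *messageQueueSketch) onResponseReceived(ks []cid.Cid) {
	select {
	case mq.responses <- ks:
	default:
		// Channel full: drop the keys. Latency measurement is only an
		// approximation, so losing some samples is acceptable.
	}
}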

@Stebalien any other places we should be thinking about?

@dirkmc requested a review from @Stebalien on April 30, 2020 at 18:25.

for _, e := range peerEntries[:sentPeerEntries] {
	if e.Cid.Defined() { // Check if want was cancelled in the interim
		mq.peerWants.SentAt(e.Cid, now)
	}
}
@Stebalien (Member) commented:

What if this happens after we receive the block? Is that an issue?

@dirkmc (Contributor, author) replied:

It should be OK: when the Session receives a block it sends a cancel, and the cancel removes the want from the sent wantlist. SentAt() checks that the sent wantlist still contains the want before recording the time a response was received:

func (r *recallWantlist) SentAt(c cid.Cid, at time.Time) {
	// The want may have been cancelled in the interim
	if _, ok := r.sent.Contains(c); ok {
		if _, ok := r.sentAt[c]; !ok {
			r.sentAt[c] = at
		}
	}
}
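
For scenario 2 in the earlier comment (DONT_HAVE followed by HAVE / block), the counterpart that clears the sent-at time once a response has been counted might look like the sketch below; the exact method merged in this PR may differ.

// Clear the sent-at time for a CID once a response has been used for a
// latency sample, so any later response for the same CID is ignored.
func (r *recallWantlist) ClearSentAt(c cid.Cid) {
	delete(r.sentAt, c)
}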

@Stebalien (Member) replied:
Sounds right.

@Stebalien (Member) commented:

> It's possible that a peer sends DONT_HAVE and then subsequently sends HAVE / block. To address this I think we should clear out the sent-at time when we receive a response, so that subsequent responses will be ignored for the purpose of latency calculation.

👍

> With regards to checking for deadlocking, the possible places for slowness / deadlock that I can see are:

I think we're fine; I just haven't thought it through fully.

@Stebalien merged commit 165b154 into master on May 2, 2020.
@Stebalien mentioned this pull request on May 26, 2020.
Jorropo pushed a commit to Jorropo/go-libipfs that referenced this pull request on Jan 26, 2023.