Expose task queue diagnostics #302

hannahhoward · 2021-12-08T04:01:46Z

Goals

Provide information about the Graphsync task queues in terms of what tasks are active and pending for a given peer

Implementation

Expose newly implemented PeerTopics on the TaskQueue itself
Add converted information from this method to the PeerStats method on the main implementation to expose the current active/pending requests
More importantly, provide a diagnostic method that examines request states and the active and pending queues, and reports which requests are in unexpected states
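A minimal sketch of the kind of cross-check described above, not the code in this PR. The names `RequestID`, `State`, `QueueState`, and `Diagnose` are hypothetical, and the rule that a paused request holds no queue slot is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// RequestID and State are hypothetical stand-ins for the library's own types.
type RequestID int

type State string

const (
	Queued  State = "queued"
	Running State = "running"
	Paused  State = "paused"
)

// QueueState lists the request IDs a task queue reports for one peer.
type QueueState struct {
	Active  []RequestID
	Pending []RequestID
}

// Diagnose cross-checks request states against the task queue and returns
// one message per inconsistency. An empty result means nothing was detected.
func Diagnose(states map[RequestID]State, q QueueState) []string {
	var problems []string
	seen := make(map[RequestID]bool)

	// Every active task should belong to a request in the Running state.
	for _, id := range q.Active {
		seen[id] = true
		st, ok := states[id]
		switch {
		case !ok:
			problems = append(problems, fmt.Sprintf("request %d is active in the queue but has no request state", id))
		case st != Running:
			problems = append(problems, fmt.Sprintf("request %d is active in the queue but its state is %q", id, st))
		}
	}

	// Every pending task should belong to a request in the Queued state.
	for _, id := range q.Pending {
		seen[id] = true
		st, ok := states[id]
		switch {
		case !ok:
			problems = append(problems, fmt.Sprintf("request %d is pending in the queue but has no request state", id))
		case st != Queued:
			problems = append(problems, fmt.Sprintf("request %d is pending in the queue but its state is %q", id, st))
		}
	}

	// Assumption: only paused requests may legitimately be absent from the queue.
	var missing []RequestID
	for id, st := range states {
		if !seen[id] && st != Paused {
			missing = append(missing, id)
		}
	}
	sort.Slice(missing, func(i, j int) bool { return missing[i] < missing[j] })
	for _, id := range missing {
		problems = append(problems, fmt.Sprintf("request %d has state %q but is not in the task queue", id, states[id]))
	}
	return problems
}
```

Returning a slice of messages rather than a boolean keeps the result usable both as a pass/fail signal and as log output.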

rvagg · 2021-12-08T04:11:09Z

OK, so all things being equal and stable, with this I should be able to get a PeerStats() and then call QueueDiagnostics() on either the outgoing or incoming request+queue state pair and get a len(x) == 0 result to indicate that there are no bugs, and therefore no dangling requests or worker queue jobs? So it's a len(x)>0 === 🚨 type diagnostic?
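The "len(x) > 0 means alarm" reading could be wrapped in a small check like the one below. This is a sketch under assumptions: `diagnoser`, `CheckPeer`, and `fixedDiagnostics` are hypothetical names, and the interface only stands in for whatever the outgoing and incoming state pairs actually expose.

```go
package main

import "fmt"

// diagnoser is a hypothetical interface for one request+queue state pair.
type diagnoser interface {
	Diagnostics() []string
}

// CheckPeer runs the diagnostics for both directions and returns an error
// if either direction reports at least one problem.
func CheckPeer(outgoing, incoming diagnoser) error {
	out := outgoing.Diagnostics()
	in := incoming.Diagnostics()
	if len(out) == 0 && len(in) == 0 {
		return nil
	}
	return fmt.Errorf("task queue diagnostics: %d outgoing problem(s) %v, %d incoming problem(s) %v",
		len(out), out, len(in), in)
}

// fixedDiagnostics is a stub that returns a canned result, useful in tests.
type fixedDiagnostics []string

func (f fixedDiagnostics) Diagnostics() []string { return f }
```

An empty result only says that no inconsistency was detected at the moment the state was collected; it is not proof that no bug exists.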

rvagg

seems fine to me, 3 comments though:

The reach-around for toRequestId() is unfortunate but I suppose impossible to deal with without exposing a larger surface area of those internal APIs or a major refactor. Maybe a warning that there's work to be done in the clarity of that API.
We don't have any duplicate ID checking at either collection or within QueueDiagnostics() (e.g. checking whether the ID exists in the matched* maps). I know there's a bit scattered around elsewhere during actual execution, is that enough to put aside concerns about possible duplicates?
PeerStats() doesn't do any synchronization so isn't it theoretically possible to catch mismatching states between the requests and the worker queue due to timing differences in execution of the collection at those points? How much does that undermine the utility of this QueueDiagnostics()?
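The last two concerns can be illustrated with a sketch that copies both views under a single lock and then looks for IDs that occur more than once. Everything here is hypothetical (`tracker`, `Snapshot`, `Duplicates`). It also assumes the request states and the queue sit behind one mutex, which may not match how the real components are split.

```go
package main

import "sync"

// RequestID is a hypothetical stand-in for the library's request ID type.
type RequestID int

// tracker is a hypothetical owner of both the request states and the queue.
type tracker struct {
	mu      sync.Mutex
	states  map[RequestID]string
	active  []RequestID
	pending []RequestID
}

// Snapshot is a point-in-time copy of everything the diagnostics need.
type Snapshot struct {
	States  map[RequestID]string
	Active  []RequestID
	Pending []RequestID
}

// Snapshot copies states and queue contents while holding the lock, so the
// two views cannot drift apart between the reads.
func (t *tracker) Snapshot() Snapshot {
	t.mu.Lock()
	defer t.mu.Unlock()

	states := make(map[RequestID]string, len(t.states))
	for id, st := range t.states {
		states[id] = st
	}
	return Snapshot{
		States:  states,
		Active:  append([]RequestID(nil), t.active...),
		Pending: append([]RequestID(nil), t.pending...),
	}
}

// Duplicates returns each ID that appears more than once across the active
// and pending lists, in the order the repeats are encountered.
func (s Snapshot) Duplicates() []RequestID {
	seen := make(map[RequestID]int)
	var dups []RequestID
	for _, list := range [][]RequestID{s.Active, s.Pending} {
		for _, id := range list {
			seen[id]++
			if seen[id] == 2 {
				dups = append(dups, id)
			}
		}
	}
	return dups
}
```

If the two sources cannot share a lock, a mismatch seen in one collection may be a timing artifact, so repeating the check before raising an alarm is one possible mitigation.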

hannahhoward · 2021-12-08T23:07:48Z

@rvagg I think you make some good points and I'm gonna finagle this a bit.

hannahhoward · 2021-12-09T00:07:27Z

@rvagg I think I've found a way to solve most of our problems. Unless you object, I'm going to merge.

rvagg

👌 will be interesting to see if this finds anything in production

feat: add WorkerTaskQueue#WaitForNoActiveTasks() for tests (#284)
* feat: add WorkerTaskQueue#WaitForNoActiveTasks() for tests
* fixup! feat: add WorkerTaskQueue#WaitForNoActiveTasks() for tests

fix(responsemanager): fix flaky tests

fix(responsemanager): make fix more global

feat: add basic OT tracing for incoming requests
Closes: #271

docs(tests): document tracing test helper utilities

fix(test): increase 1s timeouts to 2s for slow CI (#289)
* fix(test): increase 1s timeouts to 2s for slow CI
* fixup! fix(test): increase 1s timeouts to 2s for slow CI

testutil/chaintypes: simplify maintenance of codegen (#294)
"go generate" now updates the generated code for us. The separate directory for a main package was unnecessary; a build-tag-ignored file is enough. Using gofmt on the resulting source is now unnecessary too, as upstream has been using go/format on its output for some time. Finally, re-generate the output source code, as the last time that was done we were on an older ipld-prime.

ipldutil: use chooser APIs from dagpb and basicnode (#292)
Saves us a bit of extra code, since they were added in summer. Also avoid making defaultVisitor a variable, which makes it clearer that it's never a nil func. While here, replace node/basic with node/basicnode, as the former has been deprecated in favor of the latter.
Co-authored-by: Hannah Howard <hannah@hannahhoward.net>

fix: use sync.Cond to handle no-task blocking wait (#299)
Ref: #284

Peer Stats function (#298)
* feat(graphsync): add impl method for peer stats
  add method that gets current request states by request ID for a given peer
* fix(requestmanager): fix tested method

Add a bit of logging (#301)
* chore(responsemanager): add a bit of logging
* fix(responsemanager): remove code change

chore: short-circuit unnecessary message processing

Expose task queue diagnostics (#302)
* feat(impl): expose task queue diagnostics
* refactor(peerstate): put peerstate in its own module
* refactor(peerstate): make diagnostics return array

feat(impl): expose task queue diagnostics

e97485a

hannahhoward requested a review from rvagg December 8, 2021 04:01

rvagg approved these changes Dec 8, 2021

hannahhoward added 3 commits December 8, 2021 15:58

refactor(peerstate): put peerstate in its own module

49b47dd

refactor(peerstate): make diagnostics return array

3882181

Merge branch 'main' into feat/expose-peer-task-queue

645f95e

rvagg approved these changes Dec 9, 2021

hannahhoward merged commit f08c2ed into main Dec 9, 2021

mvdan deleted the feat/expose-peer-task-queue branch December 15, 2021 14:17
