Barbara wants insight into her running service #75

nikomatsakis · 2021-03-20T08:34:09Z

Brief summary

Hello, I'm Barbara, I just want to know how many tokio tasks are idling at any given moment and also how much memory they use, also I want to know which tasks haven't been polled in the last 10 minutes ok I guess I want a bunch of things

Optional details

(Optional) Which character(s) would be the best fit and why?
- Alan: the experienced "GC'd language" developer, new to Rust
- Grace: the systems programming expert, new to Rust
- Niklaus: new programmer from an unconventional background
- [Barbara]: the experienced Rust developer
(Optional) Which project(s) would be the best fit and why?
- List some projects here.
(Optional) What are the key points or morals to emphasize?
- Write some morals here.
Sources:
- @fasterthanlime's tweet

fasterthanlime · 2021-03-21T12:32:41Z

Hi, I'm this Barbara! I'm going through a tough time personally so I'm not writing as much as I usually do (although it's been explicitly requested by wg-async-foundations).

I think most of my wishes are executor-specific. In my case, only tokio knows how many tasks are running, only mio knows which I/O resources are "idle", etc.
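To make the "how many tasks, and when were they last polled" wish concrete, here is a minimal sketch of what can be approximated from user code today, without executor support. Everything in it (`Registry`, `Tracked`, `PollStats`, `poll_once`) is a hypothetical name invented for illustration and is not part of tokio or mio. It only sees futures that were explicitly wrapped, which is exactly the limitation described above.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::time::{Duration, Instant};

/// Per-task bookkeeping (hypothetical; no executor provides this type).
#[derive(Debug, Clone)]
pub struct PollStats {
    pub name: String,
    pub polls: u64,
    pub last_polled: Option<Instant>,
}

#[derive(Default)]
struct RegistryInner {
    next_id: u64,
    tasks: HashMap<u64, PollStats>,
}

/// Shared table of wrapped futures that have not been dropped yet.
#[derive(Default, Clone)]
pub struct Registry {
    inner: Arc<Mutex<RegistryInner>>,
}

impl Registry {
    /// Wrap a future so that every poll is recorded in this registry.
    pub fn track<F>(&self, name: &str, fut: F) -> Tracked<F::Output>
    where
        F: Future + Send + 'static,
    {
        let mut inner = self.inner.lock().unwrap();
        let id = inner.next_id;
        inner.next_id += 1;
        let stats = PollStats { name: name.to_string(), polls: 0, last_polled: None };
        inner.tasks.insert(id, stats);
        Tracked { id, registry: self.clone(), fut: Box::pin(fut) }
    }

    /// Number of wrapped futures currently alive.
    pub fn live_tasks(&self) -> usize {
        self.inner.lock().unwrap().tasks.len()
    }

    /// Sorted names of tasks never polled, or not polled within `max_age`.
    pub fn stale(&self, max_age: Duration) -> Vec<String> {
        let now = Instant::now();
        let inner = self.inner.lock().unwrap();
        let mut names: Vec<String> = inner
            .tasks
            .values()
            .filter(|s| s.last_polled.map_or(true, |t| now.duration_since(t) > max_age))
            .map(|s| s.name.clone())
            .collect();
        names.sort();
        names
    }
}

/// A future that reports each poll to its registry and unregisters on drop.
pub struct Tracked<T> {
    id: u64,
    registry: Registry,
    fut: Pin<Box<dyn Future<Output = T> + Send>>,
}

impl<T> Future for Tracked<T> {
    type Output = T;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        {
            let mut inner = self.registry.inner.lock().unwrap();
            if let Some(stats) = inner.tasks.get_mut(&self.id) {
                stats.polls += 1;
                stats.last_polled = Some(Instant::now());
            }
        }
        self.fut.as_mut().poll(cx)
    }
}

impl<T> Drop for Tracked<T> {
    fn drop(&mut self) {
        self.registry.inner.lock().unwrap().tasks.remove(&self.id);
    }
}

/// Waker that does nothing, used only to poll by hand in this sketch.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Poll a future exactly once without any executor.
pub fn poll_once<F: Future + Unpin>(fut: &mut F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    Pin::new(fut).poll(&mut cx)
}
```

In a real service one would pass the `Tracked` future to the executor's spawn function and call `stale(Duration::from_secs(600))` from a diagnostics endpoint. The wrapper boxes every future and takes a lock on every poll, so it is a debugging aid rather than something to leave on in production.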

I've been wanting better memory profiling as well, something that may not be directly async-related, but tends to be a topic of interest for the same group of people: we heavy async users often write servers, servers are long-running processes, small memory leaks become large memory leaks over time.

I've had good experience with something like koute/memory-profiler (from a Nokia dev), but it's still not enough. My next steps are trying out Valgrind's DHAT (but expecting a 4x-200x performance hit, woo) and giving jemalloc another try (since it's no longer the default for Rust), as it also has profiling capabilities.
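As a much cruder alternative to the profilers named above, the standard library alone allows counting bytes through a wrapping global allocator. The sketch below is not how koute/memory-profiler, DHAT, or jemalloc work. `CountingAlloc`, `LIVE_BYTES`, and `TOTAL_BYTES` are names made up for this example, and it records no stack traces at all, so it can show that memory is growing but not where.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Bytes currently allocated and not yet freed.
pub static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);
/// Total bytes ever allocated; only grows.
pub static TOTAL_BYTES: AtomicUsize = AtomicUsize::new(0);

/// Forwards every request to the system allocator and keeps two counters.
pub struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            LIVE_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
            TOTAL_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE_BYTES.fetch_sub(layout.size(), Ordering::Relaxed);
    }
    // `realloc` and `alloc_zeroed` use the trait's default implementations,
    // which go through `alloc`/`dealloc` above, so the counters stay balanced.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Snapshot of bytes currently in use.
pub fn live_bytes() -> usize {
    LIVE_BYTES.load(Ordering::Relaxed)
}

/// Snapshot of all bytes allocated since process start.
pub fn total_bytes() -> usize {
    TOTAL_BYTES.load(Ordering::Relaxed)
}
```

Logging `live_bytes()` periodically in a long-running server gives a cheap leak signal. Attributing the growth to a particular task is the part that still needs better tooling.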

The async-related point all of these have in common is the quality of stack traces. There are two main problems here afaict: async stack traces are "noisy", and they're lacking information like "what task spawned this?"

See this (truncated) stack trace:

Which task is doing http2? What client caused the error? How do I associate any sort of context to that? Error handling is another Lively Rust Topic, but we have options there to attach context (anyhow, jane-eyre, etc.).

For monitoring, the tracing family of crates goes a long way. There's even an opt-in tokio feature to attach a span to all tasks (although, again, with very little info, and it's extremely noisy), but it's still really hard to tell from just a bunch of stack traces what's going on when you're looking at a deadlock, a livelock, or a bunch of potential memory leaks.

Again, a lot of this really is executor-specific, and I don't even have the beginning of a good answer to "how do we make it better". Just like V8 now tracks promise chains, it might be useful to track which task spawned which other task, and to be able to add metadata to tasks (and then wait a couple years for the whole ecosystem to adopt that). This also feels executor-specific, but maybe there could be a standardized interface for that? Buuut that would probably require allocations, which is also a typical point of contention when discussing any sort of additions/changes.
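The "which task spawned which other task" idea can be sketched in plain `std` to show roughly what such an interface might involve. This is an illustration under assumptions, not a proposal for a standardized API: `TaskTree`, `Scoped`, `spawn_scope`, and the labels `http2-connection` and `stream-7` are all invented here. It also makes the allocation concern visible, since every scope costs a `Box`, a `String`, and a map entry.

```rust
use std::cell::Cell;
use std::collections::HashMap;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

pub type TaskId = u64;

thread_local! {
    /// Id of the scope being polled on this thread, if any.
    static CURRENT: Cell<Option<TaskId>> = Cell::new(None);
}

#[derive(Default)]
struct TreeInner {
    next_id: TaskId,
    /// id -> (parent id, free-form metadata label)
    nodes: HashMap<TaskId, (Option<TaskId>, String)>,
}

/// Records parent/child links between scopes (hypothetical helper).
#[derive(Default, Clone)]
pub struct TaskTree(Arc<Mutex<TreeInner>>);

impl TaskTree {
    /// Wrap `fut`; whichever scope is being polled right now becomes its parent.
    pub fn spawn_scope<F>(&self, label: &str, fut: F) -> Scoped<F::Output>
    where
        F: Future + 'static,
    {
        let parent = CURRENT.with(|c| c.get());
        let mut inner = self.0.lock().unwrap();
        let id = inner.next_id;
        inner.next_id += 1;
        inner.nodes.insert(id, (parent, label.to_string()));
        Scoped { id, fut: Box::pin(fut) }
    }

    /// Labels from the root ancestor down to `id`.
    pub fn lineage(&self, id: TaskId) -> Vec<String> {
        let inner = self.0.lock().unwrap();
        let mut out = Vec::new();
        let mut cursor = Some(id);
        while let Some(cur) = cursor {
            match inner.nodes.get(&cur) {
                Some((parent, label)) => {
                    out.push(label.clone());
                    cursor = *parent;
                }
                None => break,
            }
        }
        out.reverse();
        out
    }
}

/// A future that marks itself as "current" for the duration of each poll.
pub struct Scoped<T> {
    pub id: TaskId,
    fut: Pin<Box<dyn Future<Output = T>>>,
}

impl<T> Future for Scoped<T> {
    type Output = T;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let previous = CURRENT.with(|c| c.replace(Some(self.id)));
        let result = self.fut.as_mut().poll(cx);
        // Not panic-safe: a panic in the inner poll leaves CURRENT set.
        CURRENT.with(|c| c.set(previous));
        result
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Poll a parent scope once by hand and return the lineage of the child it created.
pub fn demo() -> Vec<String> {
    let tree = TaskTree::default();
    let inner_tree = tree.clone();
    let mut parent = tree.spawn_scope("http2-connection", async move {
        // Created while the parent is being polled, so it inherits the parent id.
        let child = inner_tree.spawn_scope("stream-7", async {});
        child.id
    });
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    match Pin::new(&mut parent).poll(&mut cx) {
        Poll::Ready(child_id) => tree.lineage(child_id),
        Poll::Pending => Vec::new(),
    }
}
```

With something like this, an error raised inside the child could be reported together with `lineage(child_id)`, which answers "which task is doing http2" at least for scopes that opted in. Nodes are never removed in this sketch, so a real version would also need cleanup on drop.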

Anyway. I want better visibility into my async stuff, especially when it's sprawling and it goes wrong with memory/cpu usage.

nikomatsakis · 2021-03-22T14:44:42Z

@fasterthanlime thanks!

carols10cents · 2021-03-24T20:32:02Z

I was thinking about writing something along these lines, so I'm going to add my thoughts here:

Let's say I'm Barbara making my first foray into async, but slightly differently-- I start by writing a fully synchronous service, and then I want to make it async. I sprinkle async and await everywhere, pick an executor, and... did it work? How can I tell I did it right? Say it's slower on some benchmark for some operation than before I added async, how do I go about figuring out why?
How can I tell if I'm holding onto, say, locks longer than I need to be and thus overconstraining my system?
If I have a non-data race condition, and I manage to find some way to reproduce it, how do I see more details into how tasks are running (in what order, with what concurrency) to be able to see what the underlying problem is?

I'm not even sure if I'm using the right words here, or if what I want already exists in some form, but it'd be nice to have some way to get visibility into what my async Rust code is actually doing in various situations to know that it's doing what I expect it to do, or not.
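For the "am I holding locks longer than I need to" question specifically, one low-tech option is a wrapper that times how long each guard lives. The sketch below uses `std::sync::Mutex` so that it stays self-contained, and `TimedMutex` and `TimedGuard` are hypothetical names. The assumption is that the same pattern could be applied around an async mutex, where a guard held across an `.await` would show up as an unusually long hold.

```rust
use std::ops::{Deref, DerefMut};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Mutex, MutexGuard};
use std::time::{Duration, Instant};

/// A mutex that remembers the longest time any guard was held.
pub struct TimedMutex<T> {
    inner: Mutex<T>,
    longest_hold_nanos: AtomicU64,
    acquisitions: AtomicU64,
}

impl<T> TimedMutex<T> {
    pub fn new(value: T) -> Self {
        TimedMutex {
            inner: Mutex::new(value),
            longest_hold_nanos: AtomicU64::new(0),
            acquisitions: AtomicU64::new(0),
        }
    }

    /// Acquire the lock and start the hold timer.
    pub fn lock(&self) -> TimedGuard<'_, T> {
        let guard = self.inner.lock().unwrap();
        self.acquisitions.fetch_add(1, Ordering::Relaxed);
        TimedGuard { guard, owner: self, acquired: Instant::now() }
    }

    /// Longest observed hold so far.
    pub fn longest_hold(&self) -> Duration {
        Duration::from_nanos(self.longest_hold_nanos.load(Ordering::Relaxed))
    }

    /// Number of times the lock has been taken.
    pub fn acquisitions(&self) -> u64 {
        self.acquisitions.load(Ordering::Relaxed)
    }
}

/// Guard that reports its lifetime back to the owning mutex when dropped.
pub struct TimedGuard<'a, T> {
    guard: MutexGuard<'a, T>,
    owner: &'a TimedMutex<T>,
    acquired: Instant,
}

impl<T> Deref for TimedGuard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.guard
    }
}

impl<T> DerefMut for TimedGuard<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        &mut self.guard
    }
}

impl<T> Drop for TimedGuard<'_, T> {
    fn drop(&mut self) {
        // Runs just before the inner guard releases the lock.
        let held = self.acquired.elapsed().as_nanos() as u64;
        self.owner.longest_hold_nanos.fetch_max(held, Ordering::Relaxed);
    }
}
```

Comparing `longest_hold()` against request latency after a benchmark run gives a first hint about whether a lock is overconstraining the system. It does not explain task ordering or concurrency, which is the harder part of the question.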

rylev · 2021-04-09T08:56:50Z

@nikomatsakis does #114 sufficiently address this? Can this be closed?

nikomatsakis · 2021-04-09T16:23:18Z

i think so. Let's close.

nikomatsakis added the good first issue, help wanted, and status-quo-story-ideas labels Mar 20, 2021

eminence mentioned this issue Mar 26, 2021

[meta] Covering the full range of status-quo stories #98

Open

wesleywiser mentioned this issue Mar 29, 2021

Add status quo story about Alan trying to debug an app hang #106

Merged

emmanuelantony2000 mentioned this issue Apr 1, 2021

Barbara wants async insights #114

Merged

nikomatsakis closed this as completed Apr 9, 2021
