Idea: greenlet switch detection callback #5

Closed
snaury opened this Issue Mar 6, 2012 · 23 comments

Comments

Projects
None yet
5 participants
Owner

snaury commented Mar 6, 2012

First, a fairy story: I have a very big system based on greenlet at work, with dozens of plugins and thousands of greenlets, all in one process. Occasionaly, I notice that my system stalls with severe lags, where some instances of plugins might wait minutes in line for their turn to run. It's quite obvious that one of plugins contains a bug, likely doing something blocking, or maybe computing something for a bit too long. If only it was possible to know which plugin/function consumes CPU cycles the most!

As a side note: usual profiling tools don't work correctly. Others use hacks to detect greenlet switches, with workarounds for specific event libraries (e.g. gevent), and are not optimal. The most problematic case is when switch goes into C code. Python's settrace only traces on Python code, any switch to C code is invisible until trace resurfaces back in Python code, in some other greenlet, unknown time later.

I propose a new global function set_switch_callback that accepts a callback object as a parameter and returns previous callback object (None initially). The calback value of None would mean no callback is called. The callback would execute immediately before an actual C-level switch or throw with two parameters: current greenlet and target greenlet objects. The callback is only called when those differ, obviously. The callback is only set for the current thread, making it thread-safe and similar to settrace. Any exceptions raised by the callback will be raised by switch/throw instead of the intended action.

This way tracing and profiling code would be able to have a definite information about greenlet switches (and thus would be able to time them), even when switches are made between greenlets in C.

What do you think?

Owner

snaury commented Mar 7, 2012

Actually, I think the only needed callback argument is the target greenlet, since current greenlet can be requested with getcurrent().

Owner

schmir commented Mar 7, 2012

I think having both greenlets doesn't hurt and probably makes things easier. It also allows us to convert it to a "post-switch callback". I currently can't think of a reason why one would prefer one over the other however.

what should happen if the callback tries to switch greenlets?

other than that it looks good to me.

This would be amazing. Having both the current and target greenlet makes sense to me -- for the profiling use case, you'd always need both anyway. I also can't think of any reasons why post-switch would be better than pre-switch, but passing both greenlets leaves the option open, as you mentioned.

I can't think of any profiling-related things I'd want in this beyond the callback itself and the greenlet info.

I'm kind of torn about greenlet switching inside the callback. My first reaction was to prevent it and throw some horrible exception. After thinking about it for a bit I'm less sure about that. I can't think of any reason that greenlet switching in the callback (and thus reentrantly calling the callback again) would be a good thing, but maybe someone else can think of some awesome thing this enables. Maybe just let the user deal with it. If they do something that switches greenlets inside the callback, it's their problem. Probably not worth writing a lot of code to handle gracefully.

Anyway, this would be great and will make profiling much easier and more accurate.

Owner

snaury commented Mar 7, 2012

I just realized that for tracing/profiling case there's a possible major issue if callback is implemented trivially: trace would trigger inside the callback. It seems there are some protective fields in thread state though: use_tracing and tracing, which would prevent tracing from running.

As for switching inside callbacks, it might indeed cause a lot of problems, either recursive switching, or if callbacks are disabled while inside callbacks it might introduce logical error: from the point of view of switching greenlet the value was already sent, but switching may lead to delayed events, that may lead to unexpected throws back into the greenlet (e.g. timeouts). The easiest solution is a flag that current greenlet is inside callback right now, and switch/throw raising GreenletError if the flag is set. This would enforce that callbacks should be simple (no blocking queues or database queries), but keep strict switching ordering intact, impossible to break even accidentally.

Owner

snaury commented Mar 7, 2012

As for the implementation I think it should be quite easy. New slot in the thread dict for the callback itself, reentrancy check in g_switch, actual callback check/call in g_switchstack, would need to save/restore globals though, since other threads might stomp on them while callback is doing its sleep(777). And as soon as I mapped how this would be actually implemented switching inside callbacks is definitely out, as it would have to be restartable, which is nearly impossible, e.eg not all code paths are restartable: callback might successfully switch into new greenlets, breaking bootstrapping in g_initialstub completely.

Owner

snaury commented Mar 7, 2012

...or maybe not. There are some edge cases (like greenlet deallocations) that would prevent me from putting it into g_switchstack. And since deallocations may lead to switches (which would be disabled), those would need to be deferred just like across threads. Looks like more moving pieces than I thought initially...

Owner

snaury commented Mar 10, 2012

Ok, after a lot of thought, it turned out that disabling switching inside callbacks would be way too complicated (primarily because of switches on greenlet deallocations). This also meant that the only safe place to call callbacks is after doing a switch, when all passaround information is safely processed, but before user code in target greenlet is executed. A work in progress patch is at https://github.com/snaury/greenlet, no docs no tests yet, though:

  • Adds greenlet.settrace and greenlet.gettrace
  • Trace function is called with two arguments: event/reason ("initial", "switch", "throw") and a tuple of (origin, target)
  • Trace function is free to switch/throw, but it should except to be called recursively, unless it calls settrace(None) first (this is primarily so that switches on greenlet deallocations are not "lost" while in tracing)
  • If trace function raises an exception, then this exception will be raised in target greenlet and tracing will be disabled until another settrace call in that thread (I think this is what sys.settrace does)
  • Normal Python tracing/profiling is disabled while inside greenlet trace callback, while sorta ugly, this should help by not tracing/profiling tracing code itself... disadvantage would be that greenlet trace functions cannot be debugged (I'm not sure Python trace functions can be debugged though, so maybe it's no big deal)
Owner

schmir commented Mar 10, 2012

I've only took a short look at the code. One thing I noticed is that the 'reason' argument always get recreated on every switch. I think I would prefer some module level constants (or at least some interned strings).

Owner

snaury commented Mar 10, 2012

There was an even worse bug with switch origin (my logic failed somehow). I just pushed a version that fixes that (by reintroducing ts_origin, for tracing only this time though), plus interns callback strings. Also, "initial" is gone now, I think "switch" and "throw" are more appropriate. It's better if we later introduce proper "start" (after switch/throw trace) and "stop" (before the final switch out) with a single argument, instead of (origin, target) tuple.

One feature I really need in greenlets, is to be able to enumerate all active greenlets, and dump their stack.

Something like:

from greenlets import enumerate
import traceback

for g in enumerate():
    traceback.print_stack(g.gr_frame)    

Because right now it's a nightmare to debug deadlocks, or even to understand which piece of code is timing out etc.

saghul commented May 24, 2012

@tarekziade You could do that yourself as follows:

  • Every time you create a greenlet add it to a WeakSet
  • Iterate that weakset when you need all greenlets. Since only weak references are stored they'll vanish once the last reference to a greenlet disappears

Now, if greenlets are spawned by some library out of your control then you might be a bit out of luck, though.

@saghul Yes it's the latter : I don't control greenlets creation as they are spawned by various third party code that use gevent. I am trying to debug a weird lock by dumping the stack of each active greenlet when it happens

Owner

snaury commented May 24, 2012

One feature I really need in greenlets, is to be able to enumerate all
active greenlets, and dump their stack.

You can already do it with gc.get_objects. Just filter greenlet instances.
Greenlet module does not, and will not, track its instances. It's already
done by gc.

@snaury oh! I did not think about this. Trying it now, thx :)

Owner

schmir commented Jun 19, 2012

@snaury I'd prefer to have a slightly different function signature for the tracing function, i.e.

def trace(event, origin, target)

instead of

def trace(event, (origin, target))

The latter is a bit surprising to me. Or is there a good reason not to use the former signature?

Owner

snaury commented Jun 20, 2012

@schmir well, the primary reason is extensibility. The actual signature is:

def trace(event, arg)

Similar to Python's:

def trace(frame, event, arg)

Only when event is "switch" or "throw" then arg is a (origin, target) tuple. I can't say I like the current signature very much, but I didn't want to trap ourselves in a particular fixed number of events, i.e. what if we add other events later, that have no concept of origin or target?

Owner

snaury commented Jun 20, 2012

@schmir, also I'm thinking maybe in case of "throw" it should actually be (origin, target, (exc_type, exc_value, exc_traceback)), I haven't really decided on it yet.

Owner

schmir commented Jun 20, 2012

ok. We could also use a custom event class with target, origin, exc_info attributes:

def trace(event):
print event.type, event.origin, event.target

or use keyword arguments:

def trace(type, origin=None, target=None, exc_info=None, **kw):
....

What do you think of those? I like both of them better than passing tuples, but it may be overkill depending on what we end up passing to the trace function.

Maybe I should just release 0.4.0 as planned in its current shape..

Owner

schmir commented Jun 20, 2012

what about using the following signature

def trace(event, origin, target, extra)

with extra providing any additional information that you like to pass?

Owner

snaury commented Jun 20, 2012

@schmir if you say so, but what about (possible, future) trace events that don't have origin/target? If you're ready to commit to trace calls always having origin/target, then I'll change the code. Custom class will blow up the code significantly (struct layout, attributes, gc traverse/clear, alloc/dealloc, etc.), as for keyword arguments that's too restrictive. Anyway, please tell me what to do and I'll do it. :)

@srlindsay what do you think on any of these?

Owner

schmir commented Jun 20, 2012

Alexey Borzenkov writes:

@schmir if you say so, but what about (possible, future) trace events
that don't have origin/target? If you're ready to commit to trace
calls always having origin/target, then I'll change the code.

well, we pass in None as origin/target then.

Custom class will blow up the code significantly (struct layout,
attributes, gc traverse/clear, alloc/dealloc, etc.),

right, that's one of the reasons I proposed the 'extra' argument. It's
trivial to implement.

as for keyword arguments that's too restrictive.

why?

Anyway, please tell me what to do and I'll do it. :)

I hope I'm able to introduce the 'extra' argument on my own (at least
when passing only None) :)

@srlindsay what do you think on any of these?

yes, that would be nice to know.

Cheers
Ralf

I'd prefer the simplest option that still allows for future extensibility,
which seems like

def trace(event, arg)

with the assumption that once we establish what arg is passed for a
particular event, we don't change it later. (switch -> (origin,target),
etc). I can imagine some events where origin/target isn't applicable, so
having those in the signature doesn't feel right to me.

I don't see that using classes or keyword args allows anything that we
can't do with 'arg', so the added complexity feels unnecessary.

The remaining option,

def trace(type, origin, target, extra)

only makes sense for two use cases:

  1. that we want to add something to a pre-existing event (e.g. we later
    realize that 'switch' events need more than origin and target, so we just
    throw it in 'extra'. This feels ugly. In the (event,arg) case, we'd need
    to make a breaking change to arg going from (origin,target) to
    (origin,target,new_thing). I think I'm okay with that.
  2. we want to allow library users to attach some context that can be used
    in their callbacks. I can't think of how I would use this in the profiling
    case. This seems complicated and probably worth ignoring for the moment.

Anyway, I feel that a generic arg allows us to do everything that I'd want,
without requiring us to put in 'None' for arguments that aren't appropriate
for all event types.

On Wed, Jun 20, 2012 at 12:49 PM, Ralf Schmitt <
reply@reply.github.com

wrote:

Alexey Borzenkov writes:

@schmir if you say so, but what about (possible, future) trace events
that don't have origin/target? If you're ready to commit to trace
calls always having origin/target, then I'll change the code.

well, we pass in None as origin/target then.

Custom class will blow up the code significantly (struct layout,
attributes, gc traverse/clear, alloc/dealloc, etc.),

right, that's one of the reasons I proposed the 'extra' argument. It's
trivial to implement.

as for keyword arguments that's too restrictive.

why?

Anyway, please tell me what to do and I'll do it. :)

I hope I'm able to introduce the 'extra' argument on my own (at least
when passing only None) :)

@srlindsay what do you think on any of these?

yes, that would be nice to know.

Cheers
Ralf


Reply to this email directly or view it on GitHub:
#5 (comment)

Owner

schmir commented Jun 21, 2012

Shaun Lindsay writes:

I'd prefer the simplest option that still allows for future extensibility,
which seems like

def trace(event, arg)

with the assumption that once we establish what arg is passed for a
particular event, we don't change it later. (switch -> (origin,target),
etc). I can imagine some events where origin/target isn't applicable, so
having those in the signature doesn't feel right to me.

ok, then let's do that and keep the code as it is.

@schmir schmir closed this Jun 21, 2012

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment