
Make yappi grok greenlets #57

Merged: 15 commits into sumerc:gevent-profiling, Sep 17, 2020

Conversation

@Suhail-MOHD (Contributor) commented Aug 4, 2020

Proposal to make yappi grok greenlets

This PR is a proposal to make yappi understand greenlets. The change allows yappi to map each greenlet to a unique execution context, and it also adjusts the measured CPU stats to account for interleaved greenlet execution on one thread.

Changes required

The following changes are required by the fix:

  • ensure _profile_thread is called only from the target thread to be profiled.

    _profile_thread invokes the _current_context_id operation which retrieves
    the execution context of a thread.

    Using _current_context_id to retrieve the context information from another thread is not guaranteed to work and is hard to get right when yappi is configured with set_context_id_callback, because the callback is not told which thread it should fetch information for.

    It is also hard to achieve this when using greenlets because there is no obvious way to retrieve the greenlet currently executing on another thread, i.e. there is no simple way to retrieve the greenlet currently executing on threadA from threadB.

    To get around this, this change guarantees that _profile_thread and _ensure_thread_profiled are only called from the target thread: their invocations are removed from the 'start' sequence when 'multithreading' is enabled and deferred. This is done by attaching a separate profiler func callback (profile_event and profile_event_c) to the threads. These callbacks first run _ensure_thread_profiled in the target thread's context and then swap the profiler func callback to _yapp_callback. This reduces the work done by the start sequence and defers _ensure_thread_profiled to the first time a callback is invoked on the target thread.

  • add set_ctx_backend and allow specifying threading backend as greenlets as suggested by @sumerc

  • modify _current_context_id to identify the greenlet currently being executed when backend is greenlets. This is done the same way as it is for threads, except here we use the dictionary of the greenlet rather than that of the thread.

  • account for interleaving greenlet executions on the same native thread

    greenlet is a co-operative multitasking framework that introduces user-controlled threads called greenlets. Each native Python thread can run a number of greenlets. Greenlets on the same thread co-operatively switch between one another using greenlet.switch. It is important to note that a greenlet can only execute on the thread it was spawned on. As a result, greenlets spawned on one thread can only switch between each other.

    Greenlets do not play well with yappi. Yappi uses thread-local stats to track the CPU usage of each frame. It tracks the usage at the entry and exit of each function and subtracts the values to get the total CPU time spent in that frame. This does not work well with greenlets. Take two greenlets A and B and consider the following sequence of events:

    • greenletA is currently running
    • greenletA enters function foo
    • greenletA co-operatively switches to greenletB
    • greenletB consumes 1 second of CPU
    • greenletB switches back to greenletA
    • greenletA exits function foo

    The CPU usage of function foo will be 1 second more than it should be because greenletB's execution is interleaved.

    To account for this, we have to track context switches and adjust the stats. The solution implemented in this change is a derivative of https://github.com/ajdavis/GreenletProfiler, with some modifications to account for multiple native threads.

    To track context switches:

    • declare a new thread local variable tl_prev_ctx. This needs to be thread local since multiple threads can exist. Greenlet context switches should be tracked at a thread level rather than at a global level.
    • tl_prev_ctx is initialised in _ensure_thread_profiled. It is initialised to the greenlet currently running on that thread
    • use tl_prev_ctx in _yapp_callback to determine if the context changed since the last time it was set.

    To adjust stats, if a switch is noticed from prev->current, then (a simplified model of this bookkeeping appears after the note below):

    • Pause prev: store the time it was paused in the paused_at field of the context associated with prev.
    • Resume the current greenlet: use the paused_at field and the current value of tickcount to determine how long the greenlet was paused. Since a greenlet can only execute on one thread, subtracting these values is an accurate measure.

NOTE: Another way to track context switches is by using greenlet.settrace (see https://greenlet.readthedocs.io/en/latest/). However, installing and uninstalling this trace function for each native thread did not seem easy to implement.
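
A minimal Python model of the pause/resume bookkeeping described above (illustrative only; the real logic lives in the C extension, tickcount() here is a stand-in for yappi's clock, and the actual adjustment of the timing data is elided):

import time
import threading

_tl = threading.local()            # models the thread-local 'tl_prev_ctx'

def tickcount():
    # stand-in for yappi's configured clock (CPU or wall)
    return time.perf_counter()

class GreenletCtx(object):
    def __init__(self):
        self.paused_at = None      # models the 'paused_at' field of the context

def on_profile_event(current_ctx):
    # called on every profiler event; detects a greenlet switch on this thread
    prev_ctx = getattr(_tl, "prev_ctx", None)
    if prev_ctx is not None and prev_ctx is not current_ctx:
        now = tickcount()
        prev_ctx.paused_at = now               # pause 'prev'
        if current_ctx.paused_at is not None:  # resume 'current'
            paused_for = now - current_ctx.paused_at
            # ...shift the timing data of 'current' by 'paused_for' here...
            current_ctx.paused_at = None
    _tl.prev_ctx = current_ctx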

What is remaining?

This is not a complete fix; here is what remains:

  • make this work for wall clock time or document limitations of wall clock time measurement
  • make the declaration of tl_prev_ctx platform-independent; __thread is not portable to Windows AFAIK
  • fix build system to install greenlet before compilation of yappi C extension. greenlet.h is required by yappi.
  • enumerate and add more test cases
  • there are a few TODOs added within the code. They are to be completed before closing. Some of them require discussion with @sumerc for a better understanding.

Tests performed

import gevent
import yappi

def burn_cpu(sec):
    t0 = yappi.get_clock_time()
    elapsed = 0
    while (elapsed < sec):
        for _ in range(1000):
            pass
        elapsed = yappi.get_clock_time() - t0


def foo():
    burn_cpu(0.1)
    gevent.sleep(1.0)


def bar():
    burn_cpu(0.1)
    gevent.sleep(1.0)


#yappi.set_clock_type("wall")
yappi.set_ctx_backend("greenlet")
yappi.start()
g1 = gevent.spawn(foo)
g2 = gevent.spawn(bar)
gevent.wait([g1, g2])

yappi.stop()
yappi.get_func_stats(
    filter_callback=lambda x: yappi.func_matches(x, [foo, bar])
).print_all()
Clock type: CPU
Ordered by: totaltime, desc

name                                  ncall  tsub      ttot      tavg
rand_test.py:19 bar                   1      0.000032  0.100202  0.100202
rand_test.py:14 foo                   1      0.000040  0.100171  0.100171
  • automated tests were executed with both Python 2.7 and Python 3.8

@Suhail-MOHD (Contributor Author):

@sumerc I have written down the proposal along with a change that reflects it. I am sending it along at this stage to get early feedback. There are still a few details, like build issues, that I am yet to fix; I have listed them at the end of the message above. I hope what is done is enough to convey the idea and get feedback.

@sumerc (Owner) commented Aug 5, 2020

Thanks for the awesome work!

I will hopefully look at this today.

yappi/_yappi.c Outdated
{
NATIVE_THREAD = 0x00,
GREENLET = 0x01,
} _subsystem_type_t;
@sumerc (Owner):

_ctx_type_t is better IMHO.

@Suhail-MOHD (Contributor Author):

Yes I will change this. I had initially used the term 'subsystem' in place of 'backend'. Missed this while changing it to 'backend'. Thanks for catching!

yappi/_yappi.c Outdated
// TODO(suhail): Understand how it is possible to reach here when
// 'flags.multithreaded' is false. Also if multithreaded is false, then
// this is definitely not the initial thread that called 'start' and so
// we should not continue to invoke '_yapp_callback' as we do now.
if (flags.multithreaded) {
@sumerc (Owner), Aug 5, 2020:

But this only checks for multithreaded==true. False is not possible, as you indicate. That if check is there just for extra readability/verbosity? Not sure :) You can remove it if you think it is confusing. This is for the case where we have started yappi, some other thread started running, and a profile event is called for the new thread. _ensure_thread_profiled() is always valid for multithreaded profiling.

@Suhail-MOHD (Contributor Author):

Oh alright. I assumed it was added there to handle an edge case when multithreaded is false but I could neither understand how it would be possible nor how this would fix anything if hit. I'll remove it in case somebody else stumbles upon the same problem.

@sumerc (Owner), Aug 14, 2020:

Re-thinking about the problem again I think the correct behaviour should be:

if (!flags.multithreaded) {
    Py_RETURN_NONE;
}

Like you indicated, right? I was mistaken saying that there is no way this code might be called with multiple threads. It can be called, since profile_event will be called for every thread generated via the threading.setprofile API. However, there is still no problem, because inside _current_context_id there is another check for this which fixes the problem:

if (!flags.multithreaded) {
     return 0;
}

This makes sure only a single thread is profiled. But like you suggested, it would be better if we checked for this condition early in the profile_event function.

Please disregard above. profile_event is only called when profile_threads is True:

def start(builtins=False, profile_threads=True):
    ...
    if profile_threads:
        threading.setprofile(_profile_thread_callback)

@sumerc (Owner) commented Aug 5, 2020

First: Thanks for your work on this!

I have inspected the code a bit. There are still some unclear points to me, but the overall design seems fine. I would like to ask a couple of questions, though:

1/ I have seen a greenlet dependency has been added. Is it possible to remove it? Previously I was thinking like this: instead of getting the current greenlet id from the C extension, I was thinking to call the Python side to get the id and use it. On the Python side, we can do it by late imports. The reason is: for most users of Yappi, greenlet support will not be used, and adding a hard dependency might not be a good idea. If greenlet is installed by the user, however, Yappi will just work fine. Of course, we can show some warning if the version is not supported by us? WDYT?

2/ This one is important for me: why don't we utilize get_tag_cbk? Is there a reason we only use set_ctx_id_cbk for detecting greenlet switches? What we can do instead is set the tag callback to greenlet.current_id() and detect the context switch at that level. This also eliminates the need for thread-local data, as the _ctx structure will already be thread-local. I have implemented the same approach for asyncio. Each coroutine executing will set a unique tag and you can dump stats per-tag at the end.

3/ I also could not quite understand why we need the _bootstrap_profile function. But my initial feeling tells me that it might be related to 2. That is why I am postponing looking at it in detail until I get an answer for 2.

I have also answered questions in the code and written a few comments.

yappi/_yappi.c Outdated
Py_RETURN_NONE;
}

// TODO(suhail.muhammed): Is this the correct condition to check?
@sumerc (Owner):

Yes. But I think we also need to add a check that there are no stats available (all stats cleared). See: yapphavestats

@Suhail-MOHD (Contributor Author):

I'll add this as well.

@Suhail-MOHD (Contributor Author):

Hi @sumerc, thanks for the quick review!

1/ I have seen a greenlet dependency has been added. Is it possible to remove it? Previously I was thinking like this: instead of getting the current greenlet id from the C extension, I was thinking to call the Python side to get the id and use it. On the Python side, we can do it by late imports. The reason is: for most users of Yappi, greenlet support will not be used, and adding a hard dependency might not be a good idea. If greenlet is installed by the user, however, Yappi will just work fine. Of course, we can show some warning if the version is not supported by us? WDYT?

Yes, I agree with you that a hard greenlet dependency is not a good idea. Users that don't profile greenlets must not be forced to install it. When using the delayed-import model, could you clarify how it would work when a user who doesn't have greenlet installed specifies the backend as greenlet? Will it work like the following example?

import yappi

yappi.set_ctx_backend("greenlet") --> this will raise YappiGreenletsNotInstalled exception (or a better name :) 

If this is how you envision it working, then I believe the same can be achieved while using the C API. I will try to demonstrate this in my next comment.
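
For what it is worth, a rough Python-side sketch of that delayed-import behaviour could look like this (the function body and the hand-off to the C extension are assumptions, not the final API):

class YappiError(Exception):
    # placeholder for yappi's existing error type
    pass

def set_ctx_backend(backend):
    # select "native_thread" (default) or "greenlet" as the context backend
    if backend == "greenlet":
        try:
            import greenlet  # late import: only greenlet users need the package
        except ImportError:
            raise YappiError(
                "greenlet backend requested but the greenlet package is not installed"
            )
    # ...hand the chosen backend off to the C extension here...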

Also, we cannot use id(greenlet_object) because ids in Python can be reused once that object is garbage collected. I did use this initially but hit errors where a greenlet object was garbage collected and another greenlet on a different native thread reused the same id. This resulted in _log_err(15). I then moved it to the C extension and reused the same mechanism you used for threads (_yappi_tid and ycurthreadindex).

2/ This one is important for me: why don't we utilize get_tag_cbk? Is there a reason we only use set_ctx_id_cbk for detecting greenlet switches? What we can do instead is set the tag callback to greenlet.current_id() and detect the context switch at that level. This also eliminates the need for thread-local data, as the _ctx structure will already be thread-local. I have implemented the same approach for asyncio. Each coroutine executing will set a unique tag and you can dump stats per-tag at the end.

Using the tag concept to achieve this didn't cross my mind. To me, the context concept mapped naturally to greenlets because each greenlet has its own separate stack, much like native threads do. I will spend some more time understanding the tag concept and how you have used it to solve for coroutines before answering your question.

3/ I also could not quite understand why we need the _bootstrap_profile function. But my initial feeling tells me that it might be related to 2. That is why I am postponing looking at it in detail until I get an answer for 2.

Yes, this is related to 2. It was done to remove the invocation of _profile_thread from _start and delay it until we are in the target thread. This way _current_context_id is only called from the thread whose context id we want.
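
For reference, a pure-Python analogue of that bootstrap idea looks roughly like this (the actual change lives in the C extension's profile_event/profile_event_c; the helpers below are placeholders):

import sys
import threading

def _per_thread_setup():
    # placeholder for what _ensure_thread_profiled does; safe here because we
    # are guaranteed to be running on the thread that is being profiled
    pass

def _real_callback(frame, event, arg):
    # placeholder for _yapp_callback
    pass

def _bootstrap_callback(frame, event, arg):
    # first profiler event on this thread: do the per-thread setup, then swap
    # ourselves out for the real callback so later events skip the setup
    _per_thread_setup()
    sys.setprofile(_real_callback)
    return _real_callback(frame, event, arg)

threading.setprofile(_bootstrap_callback)  # new threads start with the bootstrap hook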

@Suhail-MOHD (Contributor Author):

Update regarding the greenlet dependency: I am trying to use a Python function instead of the C API to retrieve the context id, as suggested by you, @sumerc. It is easier that way and removes the hard dependency on greenlet. I could not find a way to achieve this with the C API.

Since we can't use id(greenlet) as mentioned in an earlier comment, I thought we could use something like this:

import itertools
from greenlet import getcurrent

counter = itertools.count()

def get_greenlet_id():
    curr_greenlet = getcurrent()
    try:
        return curr_greenlet._yappi_tid
    except AttributeError:
        # first time this greenlet is seen: assign it a stable, unique id
        # (next(counter) works on both Python 2 and 3)
        id = next(counter)
        curr_greenlet._yappi_tid = id
        return id

@sumerc (Owner) commented Aug 10, 2020

Since we can't use id(greenlet) as mentioned in an earlier comment, I thought we could use something like this:

Yes. Exactly. I am not sure if this function needs to be thread-safe but yes. The idea is correct.

Please do not forget to use _call_funcobjargs when calling from C to Python; otherwise you might end up with hard-to-catch errors, since a context switch can happen on the Python side, update some Yappi globals, and return back.

@sumerc (Owner) commented Aug 10, 2020

I was thinking about using tag_cbk instead of ctx_cbk and found 2 more advantages supporting this design choice (if possible, of course):

  • the get_thread_stats() API retrieves per-thread information. If we mess with ctx_id retrieval, this API will return per-greenlet information, which might be confusing for users. What I suggest is to add a new API, get_greenlet_stats, using tag ids, if it makes any sense to look at per-greenlet statistics. Not sure if you have a use case for that?

  • the start(multithread) flag is again called multithread, but if we use ctx_id it should be called multigreenlet. If there is a use case for profiling only a single greenlet (which I suspect), we can add a new parameter for that as well instead of this param.

The core of the problem is: yappi has used certain threading-related conventions in its APIs for years, and I think those APIs should keep doing what they are already doing. Now that we are adding new functionality, IMHO we should add new params/APIs instead of changing the meaning of existing ones. WDYT?

I am still investigating any potential issues in parallel, too.

@sumerc (Owner) commented Aug 10, 2020

Let me give some more information about the internals of Yappi:

Every function profiled has an entry in the pits table, which is a list of _pit * entries.

A _pit will be generated for every PyCodeObject or PyCFunctionObject, and also for every new context (thread). With the latest asyncio support, every pit also has one more unique field: the tag. Together, these form the uniqueness of a pit:

`PyCodeObject`/`PyCFunctionObject` pointer + thread (context) + tag

So, if we set a new tag_id for every greenlet, that means we can generate a new pit per thread/per greenlet/per function. One important issue with this uniqueness is recursion. For the total time calculation, every recursive function should accumulate the timing info when the last invocation in the stack exits. This means we need to hold rec_level per function somehow. Now, in Yappi, the rec_level variable is held in the _ctx structure per pit. This means that for a specific thread, every pit has a single recursion-level variable. Now, if we make sure that every pit is unique per-thread and per-greenlet by setting a new tag id per-greenlet, I think the rec_level variable will be fine. As a new pit will be generated per-greenlet, the rec_level variable will be unique to that pit, which means Yappi will identify the recursion level without problem.

So, I think using tag_cbk should work with greenlets in theory. If there are any unclear points with the above, please let me know.
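
Put differently, a conceptual Python model of that uniqueness rule (illustrative only; the real bookkeeping lives in the C extension's pit hash tables):

# a pit is keyed by (code object, context/thread, tag); giving every greenlet
# its own tag therefore gives it its own pit, and with it its own rec_level
pits = {}

def get_pit(code_obj, context_id, tag_id):
    key = (id(code_obj), context_id, tag_id)
    if key not in pits:
        pits[key] = {"ttot": 0.0, "ncall": 0, "rec_level": 0}
    return pits[key]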

@sumerc (Owner) commented Aug 12, 2020

One last note about above:

#57 (comment) Even though it seems using tag_cbk might have some advantages, it is still perfectly fine to use the ctx_id callback if it works just fine for all the edge cases (we could disable the get_threads API if the backend is gevent). So, I am counting on you to make this decision @Suhail-MOHD; I just wanted to point out that there is another way as well...

@Suhail-MOHD (Contributor Author):

@sumerc I understand what you are suggesting, but after my investigation I am not sure tag_cbk is the right way to go. There are some major differences between asyncio tasks and greenlets which make using tag_cbk not so straightforward with greenlets. Please see my explanation below. It explains why there is no easy way to fit greenlets into tag_cbk the way you have done for coroutines.

The difference between greenlets and asyncio is that each greenlet has its own independent Python callstack, whereas all tasks in a thread in asyncio work on the same callstack. The next sections elaborate further on this.

In asyncio, multiple tasks running on one thread context switch between each other but share the same python callstack - PyThreadState->frame. When one task yields the execution context, the PyFrameObjects in the stack pertaining to that task get popped off and the frames corresponding to the coroutines of the next task are pushed onto the stack again.

For asyncio, it makes sense to have all coroutines under one context because they all work on the same Python callstack, and in yappi one context has one callstack. Callstack tracing is not impacted by coroutines. There are other changes you made, though, for other reasons:

  • to differentiate between a yielding coroutine and an exiting function/coroutine, and to hold coroutine state (timing information) to be used once the coroutine is resumed. I believe you have solved this by leveraging the fact that 'PyFrameObject->f_stacktop' is non-NULL to detect that a coroutine frame is yielding.
  • allow differentiating between different asyncio tasks on the same thread. I believe you are trying to solve this using contextvars and tag_cbk, based on the implementation of this test - tests/test_asyncio_context_vars.py. This will allow segregating _pits under the same thread by tag.

Greenlets are quite different from Python coroutines in that each greenlet has its own separate Python stack. All greenlets running on one thread use the same PyThreadState object, but the stack held under PyThreadState->frame is different for each. When the greenlet library switches from greenlet g1 to g2, it first stores the current PyThreadState->frame object under g1. It then retrieves the stack stored under g2 (saved when g2 was last paused) and sets PyThreadState->frame to this stack. Here is the source for this - greenlet switch. This essentially tricks the Python interpreter into executing multiple callstacks on the same thread. There are other details to be taken care of, like switching the native stack and other stack-related parameters, but that is not relevant to this discussion.
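
A tiny experiment (assuming greenlet is installed) that shows each greenlet carrying its own Python call stack on the same native thread:

import traceback
import greenlet

def dump(label):
    # traceback walks PyThreadState->frame, so it only sees the frames that
    # belong to the greenlet that is currently running
    print(label, [f.name for f in traceback.extract_stack()])

def child():
    dump("child:")    # prints only child's frames, not parent's
    main.switch()

main = greenlet.getcurrent()
g = greenlet.greenlet(child)

def parent():
    dump("parent:")   # prints <module> -> parent -> dump
    g.switch()        # child runs on its own, separate stack

parent()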

Yappi today associates one callstack with one context. Each context tracks the callstack under _ctx->cs and pushes and pops frames onto this stack when _yapp_callback is invoked in response to function entry or exit in the target program being profiled. If we associate multiple greenlets with the same callstack (i.e. one context), then _push_frame and _pop_frame will misbehave and the callstack will be useless. When greenletA exits a frame, yappi could end up popping an entirely different frame from the stack, which may have been pushed while running under some other greenlet, greenletB. I hope this aspect makes sense; please ask me to elaborate if I haven't explained it well enough.

To ensure that we trace callstacks properly, it is important that one greenlet maps back to one callstack in yappi. Since one callstack maps to one context today, I thought it would be easier to map a greenlet to one context. So I propose that we go forward with ctx_id and fix the API problems that you mentioned. One direction for fixing the API problems is to raise errors when people use thread-related APIs while the backend is greenlet. We can introduce new APIs for greenlets while retaining the internal structure of storing the data. If that sounds alright to you, I can work out the finer details.

If using ctx_id is still not acceptable and we have to use tag_cbk, the other option is to allow yappi to track one callstack for each unique (context, tag) pair, but I believe that pollutes the definition of context and tag. Based on your current implementation, a context seems to be anything that has a unique callstack, and callstack tracing is performed per context. Tags are a way of further segregating _pits under one callstack to account for different actors working on the same callstack. Mapping multiple greenlets to one context id would break this definition because one context would have to handle multiple callstacks.

Please let me know how to proceed.

@sumerc (Owner) commented Aug 12, 2020

So I propose that we go forward with ctx_id and fix the API problems that you mentioned. One direction for fixing the API problems is to raise errors when people use thread-related APIs while the backend is greenlet. We can introduce new APIs for greenlets while retaining the internal structure of storing the data. If that sounds alright to you, I can work out the finer details.

Ok. Fair enough.

Thanks for your investigation on this. I just wanted to see the potential of tag_cbk, since it maps better to the current API interface. But if using ctx_cbk makes the implementation conceptually simpler, I would definitely prefer that.

The next job is to take a quick peek at the API and see if there is anything that does not make sense with greenlets (e.g. start(multithreaded) and get_thread_stats()). Like you indicated, we should define some warnings or exceptions where these do not work. I would leave the decision of having a separate API, something like get_greenlet_stats(), to you.

And do extensive testing as much as possible, of course :)

@Suhail-MOHD (Contributor Author):

Thanks @sumerc, I will resume development for this and work on the following now:

  • addressing review comments
  • using python callback function to retrieve greenlet id
  • fixing the API (thread vs greenlet)
  • testing

@sumerc (Owner) commented Aug 14, 2020

I have reviewed your code again with more attention and written some more comments, now that we have made sure using ctx_id_cbk is the way forward.

- move greenlet-specific data stored under _ctx to a separate field
- replace '_subsystem_type_t' with '_ctx_type_t'
- use yapphavestats instead of yapprunning as the condition for changing the backend type
@Suhail-MOHD (Contributor Author):

Hi @sumerc, sorry for the long silence; I was slowed down last week by some other tasks. I have uploaded what I have completed so far.

Here's what is remaining:

  • introduce new APIs for greenlets
  • specify the test dependency on gevent so that it is available while running tests. Is there any particular way you would like me to do this?
  • add more tests. I have added some now, will spend some more time thinking about it before closing on this. I would appreciate any suggestions you have as well!

@Suhail-MOHD (Contributor Author) commented Aug 22, 2020

I have another question: is invoking yappi.get_func_stats during the execution of the profiler supported? I see that we do pause and resume the profiler here, but I'm wondering whether the callstacks stored under '_ctx' will be reusable once we resume, since we have missed certain events while paused.

@Suhail-MOHD (Contributor Author):

Thanks @sumerc! I am working on adding the tests right now, will update soon.

@diogobaeder:

Hi folks,

Just wanted to leave some words of encouragement here: thanks a lot for the great work! I'm really excited to see that you're getting close to finishing proper gevent/greenlet support for yappi - I was already trying to make yappi work for a flask+gevent based application that I maintain; with this work it will become a piece of cake!

Awesome job!

Cheers,
Diogo

@leyuwei commented Sep 5, 2020

Hi,
I am pretty excited by what is going on in this PR, because one of my recent studies (which uses gevent to simulate the behavior of multiple network nodes) needs to profile memory and CPU usage.
However, after I tried merging the latest yappi with the current commit in this PR, get_thread_stats() does not seem to work properly, whereas get_func_stats() works fine. Since I am new to yappi and not quite familiar with the progress here, I am wondering whether get_thread_stats() can work together with greenlets, and how get_mem_usage() could be used to profile how memory usage changes over time.

A simplified version of my codes is as follows.

import gevent
import yappi

# (simplified: yappi was configured with the greenlet backend and started before this point)

def calc(a, b):
    gevent.sleep(2)
    return a + b

g = gevent.spawn(calc, 1, 2)
g.join()
yappi.stop()
yappi.get_thread_stats().print_all()
yappi.get_func_stats().print_all()
yappi.get_mem_usage()

and an exception is raised at get_thread_stats():

  File "C:\Program Files\Python37\lib\site-packages\yappi-1.2.4-py3.7-win-amd64.egg\yappi.py", line 1233, in get_thread_stats
    return YThreadStats().get()
  File "C:\Program Files\Python37\lib\site-packages\yappi-1.2.4-py3.7-win-amd64.egg\yappi.py", line 1080, in get
    (self._BACKEND.lower(), backend.lower())
yappi.YappiError: Cannot retrieve stats for 'native_thread' when backend is set as 'greenlet'

@Suhail-MOHD (Contributor Author):

@diogobaeder Thanks for the words of encouragement :) !

@leyuwei thanks for trying out the patch. A new API, get_greenlet_stats, was introduced in this change. It is to be used in place of get_thread_stats when profiling greenlets; please try it out and let me know!

Also, in case you haven't already done so, before you start yappi, you must invoke yappi.set_context_backend("greenlet") to instruct yappi to profile greenlets. Please use the code snippet at the end of this comment as a reference - #57 (comment) .
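
Roughly, the expected flow looks like this (a sketch based on the APIs discussed in this PR, not the final documented usage):

import gevent
import yappi

yappi.set_context_backend("greenlet")   # must be called before start()
yappi.start()

def task():
    gevent.sleep(0.1)

gevent.wait([gevent.spawn(task), gevent.spawn(task)])
yappi.stop()

yappi.get_greenlet_stats().print_all()  # use this instead of get_thread_stats()
yappi.get_func_stats().print_all()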

@leyuwei commented Sep 7, 2020

@Suhail-MOHD Thanks for the reply and the useful recommendations therein! I have already set the context backend to greenlet and tested the code in the link you provided, and it all works well. After making some modifications to my project, I can now get the full greenlet_stats and func_stats.
Since the last question remains unanswered, I still want to ask whether yappi has the ability to record the memory usage of each greenlet over time?
Thank you. ^v^

@sumerc (Owner) commented Sep 7, 2020

Since the last question remains unanswered, I still want to ask whether yappi has the ability to record the memory usage of each greenlet over time?

No. Unfortunately, Yappi does not measure memory as a metric for now.

@sumerc (Owner) commented Sep 9, 2020

Hi @Suhail-MOHD, how do you feel about merging this to master? Are there any more tests coming? TBH, the current tests seem OK to me, but I might be lacking info about edge cases for gevent.

@Suhail-MOHD (Contributor Author):

@sumerc Sorry for the delay, I didn't get a continuous block of time to work on this in the past week. I wanted to add a few test cases for gevent in monkey-patch mode and also a few test cases to check that greenlet stats (context stats) are tracked correctly. I will try to get it complete by the end of this Friday, 11th September. Does that date sound alright to you?

@sumerc (Owner) commented Sep 10, 2020

@Suhail-MOHD, no prob at all. Please take as much time as you need until you feel confident! I just wanted to make sure you are not waiting on something from me :) That is all.

@Suhail-MOHD (Contributor Author):

I caught a bug in greenlet stats. The following script demonstrates this. It spawns three greenlets, each taking (1, 2, 3) seconds of CPU respectively.

test-script:

import gevent
import yappi

def burn_cpu(sec):
    t0 = yappi.get_clock_time()
    elapsed = 0
    while (elapsed < sec):
        for _ in range(1000):
            pass
        elapsed = yappi.get_clock_time() - t0

def test():

    gs = []

    # Spawn three greenlets
    # first greenlet takes 1 second of CPU
    # second greenlet takes 2 seconds of CPU
    # third greenlet takes 3 seconds of CPU
    for i in (1, 2, 3):
        g = gevent.Greenlet(burn_cpu, i)
        g.name = "burn_cpu-%d" % (i)
        g.start()
        gs.append(g)

    for g in gs:
        g.get()

def name_cbk():
    try:
        return gevent.getcurrent().name
    except AttributeError:
        return "AnonymousGreenlet"

yappi.set_clock_type("cpu")
yappi.set_context_backend("greenlet")
yappi.set_context_name_callback(name_cbk)

yappi.start()

test()

yappi.stop()

yappi.get_greenlet_stats().print_all()

Output:

(3) suhail@C02WJ1J2HV2R ~/C/yappi (gevent-profiling)> python3 tests/test_greenlet_stats.py

name           id     ttot      scnt
burn_cpu-1     3      6.000385  1
burn_cpu-2     4      5.000312  1
burn_cpu-3     5      3.000251  1
..ousGreenlet  1      0.002979  2

Expected Output:

(3) suhail@C02WJ1J2HV2R ~/C/yappi (gevent-profiling)> python3 tests/test_greenlet_stats.py

name           id     ttot      scnt
burn_cpu-1     3      1.000000  1
burn_cpu-2     4      2.000000  1
burn_cpu-3     5      3.000000  1
..ousGreenlet  1      0.002979  2

This may be a problem with https://github.com/sumerc/yappi/blob/master/yappi/_yappi.c#L1418 . We use the tickcount of the thread that called get_greenlet_stats. We should be using the last time the context was seen instead, i.e. the tickcount of the last event (_call_enter or _call_leave) seen on that particular context.

I believe this may be a problem with native thread profiling as well; I am writing a test to confirm.
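
To make the suspected fix concrete, an illustrative model (not the actual _yappi.c code) of recording the tickcount at every event on a context and using that instead of the caller's current tickcount:

class Ctx(object):
    # toy model of yappi's per-context bookkeeping
    def __init__(self, now):
        self.first_seen = now
        self.last_seen = now       # updated on every _call_enter/_call_leave

def on_enter_or_leave(ctx, now):
    ctx.last_seen = now            # the context was active right now

def context_ttot(ctx):
    # buggy version: uses the current tickcount of whichever thread called
    # get_*_stats(); fixed version only counts time up to the last event
    # actually seen on this context
    return ctx.last_seen - ctx.first_seen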

@Suhail-MOHD (Contributor Author):

Native thread version of above test:

import threading
import yappi

class TestThread(threading.Thread):
    def __init__(self, name, *args, **kwargs):
        super(TestThread, self).__init__(*args, **kwargs)
        self.my_name = name

    @classmethod
    def local_thread_name(cls):
        try:
            return threading.current_thread().my_name
        except AttributeError:
            return "Anonymous"

def burn_cpu(sec):
    t0 = yappi.get_clock_time()
    elapsed = 0
    while (elapsed < sec):
        for _ in range(1000):
            pass
        elapsed = yappi.get_clock_time() - t0

def test():

    ts = []
    for i in (1, 2, 3):
        name = "burn_cpu-%d" % (i)
        t = TestThread(name, target=burn_cpu, args=(i, ))
        t.start()
        ts.append(t)
    for t in ts:
        t.join()

yappi.set_clock_type("cpu")
yappi.set_context_name_callback(TestThread.local_thread_name)

yappi.start()

test()

yappi.stop()

yappi.get_thread_stats().print_all()

Output:

(3) suhail@C02WJ1J2HV2R ~/C/yappi (gevent-profiling)> python3 tests/test_native_threads.py

name           id     tid              ttot      scnt
burn_cpu-3     3      123145417895936  0.573743  332
burn_cpu-2     2      123145412640768  0.573718  338
burn_cpu-1     1      123145407385600  0.573575  168
Anonymous      0      4321620864       0.001654  7

Expected Output:

(3) suhail@C02WJ1J2HV2R ~/C/yappi (gevent-profiling)> python3 tests/test_native_threads.py

name           id     tid              ttot      scnt
burn_cpu-3     3      123145417895936  3.000000  332
burn_cpu-2     2      123145412640768  2.000000  338
burn_cpu-1     1      123145407385600  1.000000  168
Anonymous      0      4321620864       0.001654  7

Store the last seen tickcount for every context. Use this while
calculating 'ttot' for the context
@Suhail-MOHD (Contributor Author):

I have fixed the last bug mentioned. Please ignore the force-push, silly mistake on my part.

@Suhail-MOHD (Contributor Author):

Yappi in greenlet mode cannot catch thread-pool workers spawned by gevent's native threadpool API.

Test:

from gevent import monkey
monkey.patch_all()

import gevent
import yappi

def burn_cpu(sec):
    t0 = yappi.get_clock_time()
    elapsed = 0
    while (elapsed < sec):
        for _ in range(1000):
            pass
        elapsed = yappi.get_clock_time() - t0

def b():
    burn_cpu(0.1)

def driver():
    g1 = gevent.get_hub().threadpool.apply_async(b)
    g2 = gevent.get_hub().threadpool.apply_async(b)
    g1.get()
    g2.get()

yappi.set_clock_type("cpu")
yappi.start(builtins=True)
driver()
yappi.stop()
stats = yappi.get_func_stats()
stats.print_all()

The function b does not show up in the stats at all. The problem is that gevent uses the lower-level thread API, start_new_thread, which does not respect the profile function set by yappi here. As a result, the profiler is never attached to those threads. Note that this is only the case for threads spawned after yappi has been started; threads spawned earlier will be profiled properly.
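
A small standard-library script that reproduces the underlying behaviour: threading.setprofile only reaches threads started through the threading module, so a thread created with the low-level _thread API never gets the profile hook:

import _thread
import threading
import time

seen = set()

def prof(frame, event, arg):
    seen.add(threading.current_thread().name)

threading.setprofile(prof)   # installed only for future threading.Thread threads

def work():
    sum(range(10000))

threading.Thread(target=work, name="via-threading").start()
_thread.start_new_thread(work, ())   # low-level API: no profile hook is installed

time.sleep(0.5)
print(seen)   # only {'via-threading'}; the raw _thread worker was never profiled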

@Suhail-MOHD (Contributor Author) commented Sep 15, 2020

@sumerc, I could not find a way to hook into gevent's native threadpool to attach the profiler. I don't think there is a way to solve this within yappi.

One way I was exploring was to configure a separate threadpool class via the threadpool class configuration. But I don't think there is a way to subclass the original threadpool class and override methods so that the profiler is attached before the thread runs. At least, there is no way to do it without overriding non-public methods.

So,

  • Should we continue and allow this change in with a known limitation and document it? I can look into raising a request with the gevent team to get input on how to solve this after this change is in.
    OR
  • Get input from the gevent team and fix it before closing this issue.

I would go for option 1 considering that profiling greenlets in the native threadpool may not be a very common use-case.

Apart from that, I am done with testing and everything else seems alright. Let me know if you need me to update anything else. Should I update the usage guide for this feature as well?

@sumerc (Owner) commented Sep 15, 2020

Hi @Suhail-MOHD, sorry for the late response; I could only just find some time. You seem to have fixed the issue you mentioned. That is great! :)

I completely agree with going with Option 1, like you suggested. I assume gevent's native threads do not conform to the threading.setprofile() API; that is why we cannot capture those.

Anyway, I do not see this issue as a blocker at all. It would be nice if you could update the usage guide, including the limitation you mentioned above. Please do not spend too much time on the doc update, just roughly write down notes and I will go over them in detail. You have already worked all the way through this :)

My plan:

  • merge all these changes into master.
  • update the missing parts of documentation on master.
  • release a new version.

If this sounds OK, shall I merge?

@diogobaeder:

Hi guys,

So this means that we won't be able to profile specific greenlets? Isn't there a way around this, so that yappi can be friendly in a gevent usage situation?

Thanks,
Diogo

@Suhail-MOHD (Contributor Author):

@sumerc Great. I'll add the limitation to the docs right now.

@diogobaeder Yes, but we will only miss greenlets spawned on the native threadpool - http://www.gevent.org/api/gevent.threadpool.html . This threadpool is used to run functions that don't context switch at all and hence could end up blocking all other greenlets. For example, if you have a C extension that performs some blocking system calls and gevent has no way of monkey-patching it to fit into its event loop, then you can run that function in the native threadpool so that it does not block other greenlets from running. Only a limited set of functions should be run on the threadpool. I am still searching for an answer on how to fix this limitation - gevent/gevent#1670

But this is not a very common use-case, and so we don't want to block the entire greenlet support on it. The plan is to get this change in and, after that, work on a way around this limitation with the help of the gevent team.

@diogobaeder:

Ah, got it! Sorry for my confusion :-)

Thanks again, guys!

@Suhail-MOHD (Contributor Author):

@sumerc, I have added one last change to retrieve the context name from the greenlet class by default, plus some tests for it. While writing the docs, I realised it made sense to do this by default instead of expecting users to provide the context name callback.

I have updated docs as well: added a usage guide under docs/ and updated README with a snippet for greenlet profiling.

I think we can merge this now 😄 , Unless you have comments about the last few commits of course.

@sumerc (Owner) commented Sep 17, 2020

@Suhail-MOHD Thanks for your great work on this! :)

I will merge this to master, tweak the docs, play a bit more, and release a new version.

@sumerc merged commit 30292a9 into sumerc:gevent-profiling on Sep 17, 2020
@Suhail-MOHD (Contributor Author):

Great @sumerc! Thanks for answering all my questions during the course of this change! Please feel free to loop me in again in case there are any bugs / questions you want me to answer.

@Suhail-MOHD deleted the gevent-profiling branch on September 17, 2020, 16:15