Make yappi grok greenlets #57
Conversation
…ame thread

This change adds support for greenlets in yappi by doing the following:

- ensure '_profile_thread' is called only from the target thread to be profiled. '_profile_thread' invokes the '_current_context_id' operation which retrieves the execution context of a thread. Using '_current_context_id' to retrieve the context information from another thread is not guaranteed to work and is hard to get right when yappi is configured with 'set_context_id_callback', because the callback is not passed information about which thread it has to fetch information for. It is also hard to achieve the above when using greenlets because there is no obvious way to retrieve the greenlet currently executing on another thread, i.e. there is no simple way to retrieve the greenlet currently executing on threadA from threadB. To get around this, this change guarantees that '_profile_thread' and '_ensure_thread_profiled' are only called from the target thread by removing their invocations from the 'start' sequence when 'multithreading' is enabled and delaying them. This is done by attaching a separate profiler func callback ('profile_event' and 'profile_event_c') to the threads. These functions first run '_ensure_thread_profiled' in the target thread context and then change the profiler func callback to '_yapp_callback'. This way the work done by the 'start' sequence is reduced and '_ensure_thread_profiled' is delayed to the first time a callback is invoked on the target thread.
- add 'set_ctx_backend' and allow specifying the threading backend as 'greenlets'
- modify '_current_context_id' to identify the greenlet currently being executed
- account for interleaving greenlet executions on the same native thread

'greenlet' is a co-operative multitasking framework that introduces user-controlled threads called 'greenlets'. Each native Python thread can run a number of greenlets. Greenlets on the same thread co-operatively switch between one another using 'greenlet.switch'. It is important to note that a greenlet can only execute on the thread it was spawned on. As a result, greenlets spawned on one thread can only switch between each other.

Greenlets do not play well with yappi. Yappi uses thread-local stats to track the CPU usage of each frame. It tracks the usage at the entry of each function and at the exit of each function and subtracts the values to get the total clock cycles spent in that frame. This does not work well with greenlets. Let's take an example of two greenlets A and B and view the following sequence of events:

- greenletA is currently running
- greenletA enters function 'foo'
- greenletA co-operatively switches to greenletB
- greenletB consumes 1 second of CPU
- greenletB switches back to greenletA
- greenletA exits function 'foo'

The CPU usage of function 'foo' will be 1 second more than it should be. To account for this we have to track the context switches and adjust stats. The solution implemented in this change is a derivative of https://github.com/ajdavis/GreenletProfiler with some modifications to account for multiple native threads.

To track context switches:

- declare a new thread-local variable 'tl_prev_ctx'. This needs to be thread-local since multiple threads can exist; greenlet context switches should be tracked at a thread level rather than at a global level.
- 'tl_prev_ctx' is initialised in '_ensure_thread_profiled' to the greenlet currently running on that thread
- use 'tl_prev_ctx' in '_yapp_callback' to determine if the context changed since the last time it was set

To adjust stats, if a switch is noticed from 'prev' -> 'current', then:

- pause 'prev'. Store the time it was paused under the '_ctx' struct associated with that greenlet
- resume the current greenlet. Use the 'paused_at' field to determine how long the greenlet was paused for, and add this value to all timestamps in the current callstack.
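The switch-tracking and pause/resume adjustment described above can be sketched with a toy model. This is purely illustrative: the tick counter is simulated, and `Ctx`, `pause`, and `resume` are stand-in names, not yappi's actual internals.

```python
# Toy model of the pause/resume adjustment: when a greenlet is switched
# away from, its context records 'paused_at'; when it resumes, every entry
# timestamp in its callstack is shifted forward by the pause duration so
# the interleaved CPU is not charged to it.
clock = 0   # simulated CPU tick counter for one native thread

class Ctx:
    def __init__(self):
        self.paused_at = None
        self.callstack = []   # tickcounts recorded at each frame entry

def pause(ctx):
    ctx.paused_at = clock

def resume(ctx):
    paused_for = clock - ctx.paused_at
    # shift all entry timestamps so (exit - entry) excludes the pause
    ctx.callstack = [t + paused_for for t in ctx.callstack]
    ctx.paused_at = None

a = Ctx()
a.callstack.append(clock)   # greenletA enters 'foo' at tick 0
pause(a)                    # greenletA switches to greenletB
clock += 7                  # greenletB burns 7 ticks of CPU
resume(a)                   # greenletA resumes; its entry stamp shifts to 7
clock += 3                  # greenletA burns 3 ticks of its own
print(clock - a.callstack[0])   # 3: only A's own CPU is charged to 'foo'
```

Without the shift in `resume`, the naive measurement would charge 'foo' with all 10 ticks.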
@sumerc I have written down the proposal along with a change that reflects it. I am sending it along at this stage to get early feedback. There are still a few details like build issues that I am yet to fix, I have listed them down at the end of the last message. I hope what is done is enough to convey the idea and get feedback. |
Thanks for the awesome work! I will hopefully look at this today. |
yappi/_yappi.c
Outdated
{
    NATIVE_THREAD = 0x00,
    GREENLET = 0x01,
} _subsystem_type_t;
_ctx_type_t is better IMHO.
Yes I will change this. I had initially used the term 'subsystem' in place of 'backend'. Missed this while changing it to 'backend'. Thanks for catching!
yappi/_yappi.c
Outdated
// TODO(suhail): Understand how it is possible to reach here when
// 'flags.multithreaded' is false. Also if multithreaded is false, then
// this is definitely not the initial thread that called 'start' and so
// we should not continue to invoke '_yapp_callback' as we do now.
if (flags.multithreaded) {
But this only checks for multithreaded==true. False is not possible as you indicate. That if check is there just for extra readability/verbosity? Not sure :) You can remove it if you think it is confusing. This is for the case where we have started yappi, some other thread started running, and a profile event is called for the new thread. _ensure_thread_profiled() is always valid for multithreaded profiling.
Oh alright. I assumed it was added there to handle an edge case when multithreaded is false but I could neither understand how it would be possible nor how this would fix anything if hit. I'll remove it in case somebody else stumbles upon the same problem.
Re-thinking about the problem again I think the correct behaviour should be:
if (!flags.multithreaded) {
    Py_RETURN_NONE;
}
Like you indicated, right? I was mistaken saying that there is no way this code might be called with multiple threads. It can be called, since profile_event will be called for every thread generated via the threading.setprofile API. However, there is still no problem, because inside _current_context_id there is another check which fixes the problem:
if (!flags.multithreaded) {
    return 0;
}
This makes sure only a single thread is profiled. But like you suggested, it would be better if we check for this condition early in the profile_event function.
Please disregard above. profile_event is only called when profile_threads is True:
def start(builtins=False, profile_threads=True):
    ...
    if profile_threads:
        threading.setprofile(_profile_thread_callback)
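The delayed per-thread setup this callback enables can be sketched in plain Python. This is not yappi's actual code, just an illustration of the mechanism: a bootstrap profile callback installed via `threading.setprofile` runs once inside each new thread's own context, then swaps in the real callback.

```python
import sys
import threading

prepared_threads = set()   # thread ids whose one-time setup has run
_lock = threading.Lock()

def real_callback(frame, event, arg):
    # stand-in for '_yapp_callback'
    pass

def bootstrap_callback(frame, event, arg):
    # stand-in for 'profile_event': do '_ensure_thread_profiled'-style
    # setup inside the target thread, then replace this callback with
    # the real one for this thread only.
    with _lock:
        prepared_threads.add(threading.get_ident())
    sys.setprofile(real_callback)
    return real_callback(frame, event, arg)

threading.setprofile(bootstrap_callback)   # applies to threads started later

def work():
    sum(range(10))   # any function call triggers a profile event

t = threading.Thread(target=work)
t.start()
t.join()
threading.setprofile(None)
print(t.ident in prepared_threads)   # setup ran inside the worker thread
```

The key point is that the setup runs in the target thread's context, so "current thread/greenlet" lookups are trivially correct.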
First: Thanks for your work on this! I have inspected the code a bit. There are still some unclear points to me but the overall design seems fine. I would like to ask a couple of questions, though:

1/ I have seen a greenlet dependency has been added. Is it possible to remove it? Previously I was thinking like this: instead of getting the current greenlet id from the C extension, I was thinking to call the Python side to get the

2/ This one is important for me: why don't we utilize

3/ I also could not quite understand why we need

I have also answered questions in the code and written a few comments.
yappi/_yappi.c
Outdated
    Py_RETURN_NONE;
}

// TODO(suhail.muhammed): Is this the correct condition to check?
Yes. But I think we also need to add a check that there are no stats available (all stats cleared). See: yapphavestats
I'll add this as well.
Hi @sumerc, thanks for the quick review!
Yes, I agree with you that a hard greenlet dependency is not a good idea. Users that don't profile greenlets must not be forced to install it. When using the delayed import model, could you clarify how it would work when a user who doesn't have greenlet installed specifies the backend as greenlet? Will it work like the following example?
If this is how you envision it to work, then I believe the same can be achieved while using the C API. I will try to demonstrate this in my next comment. Also, we cannot use
Using the
Yes this is related to 2. It was done to remove the invocation of |
Update regarding the greenlet dependency. I am trying to use a Python function instead of the C API to retrieve the context id, as suggested by you, @sumerc. It is easier that way and removes the hard dependency on greenlets. I could not find a way to achieve this with the C API. Since we can't use
|
Yes. Exactly. I am not sure if this function needs to be thread-safe but yes. The idea is correct. Please do not forget to use |
I was thinking about using
The core of the problem is: yappi has used certain conventions in its threading-related APIs for years, and I think those APIs should keep doing what they are already doing. Now, adding new functionality, IMHO we should add new params/APIs instead of changing the meaning of existing ones. WDYT? I am still investigating any potential issues in parallel, too. |
Let me give some more information about the internals of Yappi: Every function profiled has an entry in

PyCodeObject

So, if we set a new

So, I think using |
One last note about above: #57 (comment) Even though it seems using |
@sumerc I understand what you are suggesting but I am not sure

The difference between greenlets and asyncio is that each greenlet has its own independent Python callstack, whereas all tasks in a thread in asyncio work on the same callstack. The next section elaborates further on this.

In asyncio, multiple tasks running on one thread context switch between each other but share the same Python callstack. For asyncio, it makes sense to have all coroutines under one context because they all work on the same Python callstack, and in yappi one context has one callstack; callstack tracing is not impacted by coroutines. There are other changes you made though for other reasons: |
Greenlets are quite different from Python coroutines in that each greenlet has its own separate Python stack. All greenlets running on one thread use the same

Yappi today associates one callstack with one context. Each context tracks the callstack under

To ensure that we trace callstacks properly, it is important that one greenlet maps back to one callstack in yappi. Since one callstack maps to one context today, I thought it would be easier to map a greenlet to one context. So I propose that we go forward with

If using

Please let me know how to proceed. |
Ok. Fair enough. Thanks for your investigation on this. I just wanted to see the potential for

The next job is to take a quick peek at the API and see if there is anything that does not make sense with greenlets. (e.g:

And do extensive testing as much as possible, of course :) |
Thanks @sumerc, I will resume development for this and work on the following now:
|
I have reviewed your code with more attention and written some more comments as we made sure using |
- move greenlet-specific data stored under _ctx to a separate field
- replace '_subsystem_type_t' with '_ctx_type_t'
- use yapphavestats instead of yapprunning as the condition for changing backend type
Hi @sumerc, sorry for the long silence, I was slowed down last week by some other tasks. I have uploaded what I have completed so far. Here's what is remaining:
|
I have another question, is invoking |
Thanks @sumerc! I am working on adding the tests right now, will update soon. |
Hi folks, Just wanted to leave here some words of encouragement: thanks a lot for the great work! I'm really excited to see that you're getting close to giving yappi proper gevent/greenlet support - I was already trying to make yappi work for a flask+gevent based application that I maintain; with this work it will become a piece of cake! Awesome job! Cheers, |
Hi, A simplified version of my code is as follows.
and an Exception raised at the
|
@diogobaeder Thanks for the words of encouragement :) ! @leyuwei thanks for trying out the patch. A new API Also, in case you haven't already done so, before you start yappi, you must invoke |
@Suhail-MOHD Thanks for the reply and useful recommendations therein! I have already set context backend to |
No. Unfortunately, Yappi does not measure memory as a metric for now. |
Hi, @Suhail-MOHD how do you feel about merging this to |
@sumerc Sorry for the delay, I didn't get a continuous block of time to work on this in the past week. I wanted to add a few test-cases for gevent in monkey-patch mode and also a few test-cases to see if greenlet stats (context-stats) are tracked correctly. I will try to get it complete by the end of this Friday, 11th September. Does that date sound alright to you? |
@Suhail-MOHD , No prob. at all. Please take as much time as you need until you feel confident! I just wanted to make sure you are not waiting on something from me :) That is all. |
I caught a bug in greenlet stats. The following script demonstrates this. It spawns three greenlets, each taking (1, 2, 3) seconds of CPU respectively. test-script:
Output:
Expected Output:
This may be a problem with - https://github.com/sumerc/yappi/blob/master/yappi/_yappi.c#L1418 . We use the I believe this may be a problem with native thread profiling as well, writing a test to confirm. |
Native thread version of above test:
Output:
Expected Output:
|
force-pushed from a12e8f7 to 422d3c7
Store the last seen tickcount for every context. Use this while calculating 'ttot' for the context
force-pushed from 422d3c7 to 66e1ad8
I have fixed the last bug mentioned. Please ignore the force-push, silly mistake on my part. |
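The fix described in the commit above (store the last seen tickcount for every context and use it when calculating 'ttot') can be illustrated with a toy model. The clock and class below are simulated stand-ins, not yappi's actual code.

```python
# Toy model of the ttot bug: a context's total time must be measured up to
# the last tick it was actually scheduled ('last seen tickcount'), not up
# to the current tickcount, otherwise a context that finished early keeps
# accruing time while other contexts run.
clock = 0   # simulated global tick counter

class Ctx:
    def __init__(self):
        self.started_at = clock
        self.last_seen = clock   # updated whenever this context runs

    def run_for(self, ticks):
        global clock
        clock += ticks
        self.last_seen = clock

    def ttot_naive(self):
        return clock - self.started_at       # buggy: counts idle time too

    def ttot_fixed(self):
        return self.last_seen - self.started_at

a = Ctx()
a.run_for(1)   # context A runs for 1 tick, then finishes
b = Ctx()
b.run_for(5)   # context B runs for 5 more ticks

print(a.ttot_naive())   # 6: A is wrongly charged for B's runtime
print(a.ttot_fixed())   # 1: only the time up to A's last tick
```

This matches the symptom in the test script above, where each greenlet's ttot included time consumed by greenlets that ran after it.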
Yappi in greenlets mode cannot catch thread-pool workers spawned by the gevent native threadpool API. Test:

from gevent import monkey
monkey.patch_all()

import gevent
import yappi

def burn_cpu(sec):
    t0 = yappi.get_clock_time()
    elapsed = 0
    while (elapsed < sec):
        for _ in range(1000):
            pass
        elapsed = yappi.get_clock_time() - t0

def b():
    burn_cpu(0.1)

def driver():
    g1 = gevent.get_hub().threadpool.apply_async(b)
    g2 = gevent.get_hub().threadpool.apply_async(b)
    g1.get()
    g2.get()

yappi.set_clock_type("cpu")
yappi.start(builtins=True)
driver()
yappi.stop()
stats = yappi.get_func_stats()
stats.print_all()

The function |
@sumerc, I could not find a way to hook into gevent's native threadpool to attach the profiler. I don't think there is a way to solve this within yappi. One way I was exploring was to configure a separate threadpool class via - So,
I would go for option 1 considering that profiling greenlets in the native threadpool may not be a very common use-case. Apart from that, I am done with testing and everything else seems alright. Let me know if you need me to update anything else. Should I update the usage guide for this feature as well? |
Hi @Suhail-MOHD , sorry for the late response. I could only find some time now. You seem to have fixed the issue you mentioned. That is great! :) I completely agree with going with Option 1 like you suggested. I assume gevent native threads do not conform to

Anyway, I do not see this issue as a blocker at all. It would be nice if you could update the usage guide including the limitation you mentioned above. Please do not spend too much time on the doc update, just roughly write down notes; I will go over them in detail. You have already worked all the way for this :) My plan:
If this sounds ok, I am merging this? |
Hi guys, So this means that we won't be able to profile specific greenlets? Isn't there a way around this, so that yappi can be friendly on a gevent usage situation? Thanks, |
@sumerc Great. I'll add the limitation to the docs right now. @diogobaeder Yes, but we will only miss greenlets spawned on the native threadpool - http://www.gevent.org/api/gevent.threadpool.html . This threadpool is used to run functions that don't context switch at all and hence could end up blocking all other greenlets. Example: if you have a C extension that performs some blocking system calls and gevent has no way of monkey-patching this to fit into its event loop, then you can spawn this function in the native threadpool so that it does not block other greenlets from running. Only a limited set of functions should be run on the threadpool. I am still searching for an answer about how to fix this limitation - gevent/gevent#1670 But, this is not a very common use-case and so we don't want to block the entire support for greenlets for this use-case. The plan is to get this change in and after that work on getting a way around this limitation with the help of the gevents team. |
Ah, got it! Sorry for my confusion :-) Thanks again, guys! |
@sumerc , I have added one final change to retrieve the context name from the greenlet class by default, and some tests for it. While writing the docs, I realised it made sense to do this by default instead of expecting users to provide the context name callback. I have updated the docs as well: added a usage guide under

I think we can merge this now 😄 , unless you have comments about the last few commits of course. |
@Suhail-MOHD Thanks for your great work on this! :) I will merge this to master. Tweak docs/play a bit more and release a new version. |
Great @sumerc! Thanks for answering all my questions during the course of this change! Please feel free to loop me in again in case there are any bugs / questions you want me to answer. |
Proposal to make yappi grok greenlets
This PR is a proposal to make yappi understand greenlets. The change allows yappi to map each greenlet to a unique execution context. Further, it also adjusts CPU stats measured to account for interleaving greenlet execution on one thread.
Changes required
The following changes are required by the fix:
- ensure _profile_thread is called only from the target thread to be profiled. _profile_thread invokes the _current_context_id operation which retrieves the execution context of a thread. Using _current_context_id to retrieve the context information from another thread is not guaranteed to work and is hard to get right when yappi is configured with set_context_id_callback, because the callback is not passed information about which thread it has to fetch information for. It is also hard to achieve the above when using greenlets because there is no obvious way to retrieve the greenlet currently executing on another thread, i.e. there is no simple way to retrieve the greenlet currently executing on threadA from threadB. To get around this, this change guarantees that _profile_thread and _ensure_thread_profiled are only called from the target thread by removing their invocations from the start sequence when multithreading is enabled and delaying them. This is done by attaching a separate profiler func callback (profile_event and profile_event_c) to the threads. These functions first run _ensure_thread_profiled in the target thread context and then change the profiler func callback to _yapp_callback. This way the work done by the start sequence is reduced and _ensure_thread_profiled is delayed to the first time a callback is invoked on the target thread.
- add set_ctx_backend and allow specifying the threading backend as greenlets, as suggested by @sumerc
- modify _current_context_id to identify the greenlet currently being executed when the backend is greenlets. This is done the same way as it is for threads, except here we use the dictionary of the greenlet rather than that of the thread.
- account for interleaving greenlet executions on the same native thread
greenlet is a co-operative multitasking framework that introduces user-controlled threads called greenlets. Each native Python thread can run a number of greenlets. Greenlets on the same thread co-operatively switch between one another using greenlet.switch. It is important to note that a greenlet can only execute on the thread it was spawned on. As a result, greenlets spawned on one thread can only switch between each other.

Greenlets do not play well with yappi. Yappi uses thread-local stats to track the CPU usage of each frame. It tracks the usage at the entry of each function and at the exit of each function and subtracts the values to get the total clock cycles spent in that frame. This does not work well with greenlets. Let's take an example of two greenlets A and B and view the following sequence of events:

- greenletA is currently running
- greenletA enters function foo
- greenletA co-operatively switches to greenletB
- greenletB consumes 1 second of CPU
- greenletB switches back to greenletA
- greenletA exits function foo

The CPU usage of function foo will be 1 second more than it should be because greenletB's execution is interleaved.
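The mis-attribution above can be reproduced without the greenlet package by using generators as a stand-in for greenlet.switch. This toy example shows that timing a function naively by (clock at exit) - (clock at entry) charges it for CPU consumed by whatever ran during a cooperative switch.

```python
import time

def burn_cpu(sec):
    # busy-loop until this process has consumed 'sec' seconds of CPU
    t0 = time.process_time()
    while time.process_time() - t0 < sec:
        pass

def task_a():
    # 'foo' begins: record the naive entry timestamp
    start = time.process_time()
    yield   # cooperative switch away (like greenletA -> greenletB)
    # 'foo' ends: the naive measurement includes everything run meanwhile
    return time.process_time() - start

a = task_a()
next(a)          # run A until it switches away
burn_cpu(0.05)   # "greenletB" consumes CPU while A is paused
try:
    next(a)      # switch back to A; it exits 'foo'
except StopIteration as stop:
    measured = stop.value

print(measured >= 0.05)   # True: A was charged for B's CPU time
```

The pause/resume adjustment described below removes exactly this over-charge.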
To account for this we have to track the context switches and adjust stats. The solution implemented in this change is a derivative of https://github.com/ajdavis/GreenletProfiler with some modifications to account for multiple native threads.

To track context switches:

- declare a new thread-local variable tl_prev_ctx. This needs to be thread-local since multiple threads can exist; greenlet context switches should be tracked at a thread level rather than at a global level.
- tl_prev_ctx is initialised in _ensure_thread_profiled to the greenlet currently running on that thread
- use tl_prev_ctx in _yapp_callback to determine if the context changed since the last time it was set

To adjust stats, if a switch is noticed from prev -> current, then:

- pause prev. Store the time it was paused under the paused_at field of the context associated with prev
- resume the current greenlet. Use the paused_at field and the current value of tickcount to determine how long the greenlet was paused for. Since a greenlet can only execute on one thread, subtracting these values is an accurate measure.

NOTE: Another way to track context switches is by using greenlet.settrace (see https://greenlet.readthedocs.io/en/latest/). However, installing this trace function for each native thread and uninstalling it did not seem easy to implement.

What is remaining?
This is not a complete fix; here's what is remaining:

- make tl_prev_ctx platform independent. __thread is not portable to Windows AFAIK
- greenlet.h is required by yappi

Tests performed