
Add profiling support for Gevent #53

Merged
merged 18 commits into from
Sep 17, 2020

Conversation

sumerc
Owner

@sumerc sumerc commented Apr 14, 2020

This PR aims to add support for profiling gevent greenlets, including multithreaded use.

@mguijarr

@sumerc so, the added tests are similar to those for asyncio: they cover both single-threaded and multithreaded cases.

@advance512

Is there any help that can be offered here?
This would be extremely useful for us.

@sumerc
Owner Author

sumerc commented Jun 17, 2020

Is there any help that can be offered here?

Help would be really appreciated! I could not find time to work on this, and it will take some time to test and tweak. We can implement something working in a few days, but manual testing, tweaking, and automating the tests will take time; we should also throw whatever we can at it. Basically, I can implement a PoC, and your experience with gevent would be really helpful for trying to break it. Then we can tweak it further.

Let me think on this for a few more days, and I will hopefully post something to play with. @mguijarr has already done great work helping with the tests and explaining the internals of gevent in the relevant issue: #47

@sumerc
Owner Author

sumerc commented Jun 17, 2020

Ok guys.

Below is the first problem we need to solve:

import gevent
import yappi


def burn_cpu(sec):
    t0 = yappi.get_clock_time()
    elapsed = 0
    while (elapsed < sec):
        for _ in range(1000):
            pass
        elapsed = yappi.get_clock_time() - t0


def foo():
    burn_cpu(0.1)
    gevent.sleep(1.0)


def bar():
    burn_cpu(0.1)
    gevent.sleep(1.0)


#yappi.set_clock_type("wall")
yappi.start()
g1 = gevent.spawn(foo)
g2 = gevent.spawn(bar)
gevent.wait([g1, g2])
yappi.get_func_stats(
    filter_callback=lambda x: yappi.func_matches(x, [foo, bar])
).print_all()

Now, if the clock type is CPU (the default), the output is as follows:

Clock type: CPU
Ordered by: totaltime, desc

name                                  ncall  tsub      ttot      tavg      
gevent_yappi.py:14 foo                1      0.000038  0.200483  0.200483
gevent_yappi.py:19 bar                1      0.000034  0.100193  0.100193

The CPU time of foo becomes 0.2 because gevent switches to another greenlet (the bar function) and bar's CPU time gets accumulated into foo.
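
This mis-attribution can be reproduced without gevent at all. Here is a minimal, gevent-free stand-in (burn_cpu and foo_measured are illustrative names, not part of yappi): a profiler that simply subtracts a function's entry timestamp from its exit timestamp charges that function for any CPU burned while it was switched out.

```python
import time

def burn_cpu(sec):
    # Busy-loop until 'sec' seconds of process CPU time have elapsed.
    t0 = time.process_time()
    while time.process_time() - t0 < sec:
        pass

def foo_measured():
    enter = time.process_time()   # profiler records the entry timestamp
    burn_cpu(0.05)                # foo's own work
    burn_cpu(0.05)                # stands in for bar running after a greenlet switch
    return time.process_time() - enter  # naive ttot: bar's work is charged to foo

print(round(foo_measured(), 2))  # ~0.1, double foo's real 0.05s of CPU
```

The fix therefore has to notice the switch and exclude the switched-out interval, rather than relying on plain enter/exit subtraction.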

I will be working on a patch to fix this, then we will move onto wall time.

@mguijarr, @advance512: how can I also make foo and bar run in different threads in this simple example? With monkey_patch, and with monkey_patch without threads? I would like to try every possible gevent running mode.

@advance512

Why do you want to run foo and bar on different threads? That isn't a common use-case for gevent.

What you can do is use gevent's native threadpool:
http://www.gevent.org/api/gevent.threadpool.html

Or, create two gevent Hub() objects, each on a different thread, and have each spawn either foo or bar.
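
The threadpool suggestion can be sketched as follows (a minimal, hedged example: task_a and task_b are placeholder CPU-bound functions, not the foo/bar from the profiling snippet above):

```python
from gevent.threadpool import ThreadPool

# Each spawn runs its callable on a native OS worker thread owned by the
# pool, rather than on the hub's own thread.

def task_a():
    return sum(range(100_000))

def task_b():
    return sum(range(50_000))

pool = ThreadPool(2)        # up to two native worker threads
g1 = pool.spawn(task_a)     # returns an async result object
g2 = pool.spawn(task_b)
print(g1.get(), g2.get())   # .get() blocks until the worker finishes
```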

@sumerc
Owner Author

sumerc commented Jun 18, 2020

Why do you want to run foo and bar on different threads? That isn't a common use-case for gevent.

I would like to write tests where the same function is called concurrently from different greenlets (on the same thread and on different threads), plus some other multithreading-related tests. That is why I asked.

What you can do is use gevent's native threadpool:
http://www.gevent.org/api/gevent.threadpool.html
...

Thanks for the information. Will look into those!

@sumerc
Owner Author

sumerc commented Jul 23, 2020

I have been trying to allocate time to this issue. The problem is that it is not easy and requires continuous, focused work, which I currently cannot give it, and that is really pissing me off :)

Here are the steps needed to support gevent in Yappi:

1/ Choose a minimum gevent version that we would like to support.
2/ Add a new API, set_ctx_backend, that supports both gevent and asyncio; the default will be asyncio.
3/ Implement a new set_ctx_id callback that returns the current greenlet id on every function call enter/exit.
4/ On the C side, correct the calculations whenever a greenlet switch is detected. Since multiple greenlets can run in a single thread, all of their CPU time values get accumulated on top of each other; I have also seen issues with wall time too. This is where we need to debug, by putting hooks into the call_enter/call_leave events, to understand what happens when a greenlet switch occurs (to the same thread or to a different thread) and which values need to be corrected, at minimum, to give correct results. This is the hardest part to get right.
5/ Write lots of tests (think about edge cases).
6/ Update docs.
7/ Release.

If anybody would like to work on this, I can help/assist with the implementation at any of the above steps or unclear points. Otherwise I will be setting this to Blocked and waiting a few more months, unfortunately. :(

@Suhail-MOHD
Contributor

@sumerc Our team at Nutanix would like to help!

Our service is built upon gevent and its monkey-patch feature. Lately we have been searching for a profiling tool, and only statistical profilers have worked well with gevent. We were excited to find GreenletProfiler, but after further investigation we realised it cannot report CPU usage reliably, for the same reason you mentioned above.

We would like to contribute to this issue and help fix it.

As a next step, I would like to provide a fix proposal with a POC alongside it. May I proceed and pick this up?

@sumerc
Owner Author

sumerc commented Jul 25, 2020

Hi,

I would be more than happy to accept a fix proposal and a POC!

There are already some tests written by @mguijarr for both single- and multithreaded cases. I hope they help you with edge cases.

Please proceed :)

@advance512

Exciting! Can't wait to see what @Suhail-MOHD cooks up.

…ame thread

This change adds support for greenlets in yappi by doing the following:

- ensure '_profile_thread' is called only from the target thread to be profiled.

  '_profile_thread' invokes the '_current_context_id' operation which retrieves
  the execution context of a thread.

  Using '_current_context_id' to retrieve the context information of another
  thread is not guaranteed to work, and is hard to get right when yappi is
  configured with 'set_context_id_callback', because the callback is not told
  which thread it has to fetch information for.

  It is also too hard to achieve the above when using greenlets, because
  there is no obvious way to retrieve the greenlet currently executing
  on another thread, i.e. there is no simple way to retrieve the greenlet
  currently executing on threadA from threadB.

  To get around this, this change guarantees that '_profile_thread' and
  '_ensure_thread_profiled' are called only from the target thread, by removing
  their invocations from the 'start' sequence when 'multithreading' is enabled
  and delaying them. This is done by attaching a separate profiler func callback
  (profile_event and profile_event_c) to the threads. These functions first
  run '_ensure_thread_profiled' in the target thread context and change the profiler
  func callback to '_yapp_callback'. This way the work done by the 'start' sequence
  is reduced and '_ensure_thread_profiled' is delayed to the first time a callback
  is invoked on the target thread.

- add 'set_ctx_backend' and allow specifying threading backend as 'greenlets'

- modify '_current_context_id' to identify the greenlet currently being executed

- account for interleaving greenlet executions on the same native thread

  'greenlet' is a co-operative multitasking framework that introduces
  user-controlled threads called 'greenlets'. Each native Python thread can run
  a number of greenlets. Greenlets on the same thread co-operatively switch
  between one another using 'greenlet.switch'. It is important to note that a
  greenlet can only execute on the thread it was spawned on; as a result,
  greenlets spawned on one thread can only switch between each other.

  Greenlets do not play well with yappi. Yappi uses thread-local stats to track
  the CPU usage of each frame: it records the usage at the entry and exit of
  each function and subtracts the values to get the total clock cycles spent
  in that frame. This does not work well with greenlets. Let's take two
  greenlets A and B and view the following sequence of events.

  - greenletA is currently running
  - greenletA enters function 'foo'
  - greenletA co-operatively switches to greenletB
  - greenletB consumes 1 second of CPU
  - greenletB switches back to greenletA
  - greenletA exits function 'foo'

  The CPU usage of function foo will be 1 second more than it should be.

  To account for this, we have to track the context switches and adjust the
  stats. The solution implemented in this change is a derivative of
  https://github.com/ajdavis/GreenletProfiler, with some modifications to
  account for multiple native threads.

  To track context switches:
    - declare a new thread local variable 'tl_prev_ctx'. This needs to be
      thread local since multiple threads can exist. Greenlet context switches
      should be tracked at a thread level rather than at a global level.
    - 'tl_prev_ctx' is initialised in '_ensure_thread_profiled', to the
      greenlet currently running on that thread.
    - use 'tl_prev_ctx' in '_yapp_callback' to determine if the context
      has changed since the last time it was set.

  To adjust stats, if a switch is noticed from 'prev'->'current', then:
    - pause 'prev'. Store the time it was paused under the '_ctx' struct
      associated with that greenlet
    - resume the current greenlet. use the 'paused_at' field to determine
      how long the greenlet was paused for. Add this value to all timestamps
      in the current callstack.
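
The pause/resume adjustment above can be modelled in a few lines of Python (purely illustrative: the Ctx/on_switch/paused_at names are hypothetical, and the real fix lives in yappi's C layer; integer tick counts stand in for clock readings):

```python
class Ctx:
    def __init__(self):
        self.paused_at = None   # tick at which this context last lost the CPU
        self.enter_times = []   # entry ticks of frames on this context's stack

def on_switch(prev, current, now):
    # Pause the context being switched away from.
    prev.paused_at = now
    # Credit the resumed context for the time it spent paused by shifting
    # every entry timestamp in its call stack forward.
    if current.paused_at is not None:
        paused_for = now - current.paused_at
        current.enter_times = [t + paused_for for t in current.enter_times]
        current.paused_at = None

# Replay the greenletA/greenletB sequence above:
a, b = Ctx(), Ctx()
a.enter_times = [0]    # greenletA enters 'foo' at tick 0
on_switch(a, b, 1)     # A -> B at tick 1
on_switch(b, a, 11)    # B -> A at tick 11, after B burned 10 ticks
print(12 - a.enter_times[0])  # foo exits at tick 12: charged 2 ticks, not 12
```

After the resume, foo's entry timestamp has been shifted from 0 to 10, so the enter/exit subtraction no longer charges foo for the 10 ticks greenletB was running.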
@achauvin-wish

We were excited to find GreenletProfiler, but after further investigation we realised it cannot report CPU usage reliably for the same reason you've mentioned above.

I was going to run some tests and saw your reply here, so I wanted to ask about the results you obtained. Does GreenletProfiler not perform accurately on Python 2.7?

@Suhail-MOHD
Contributor

@achauvin-wish I would like to amend my previous statement after spending some time understanding both the GreenletProfiler and yappi sources. GreenletProfiler did not return accurate results for the service my team works on. Our service uses multiple native threads: one thread running greenlets, and other threads running operations that would not work well in the greenlet thread. I can write a test that demonstrates how GreenletProfiler fails in such a system.

For Python 2.7, I believe GreenletProfiler may work in a system with one native thread running multiple greenlets, but I have not tested it.

@sumerc
Owner Author

sumerc commented Aug 13, 2020

For reference: the issue is ongoing with another PR: #57

Suhail-MOHD and others added 9 commits August 24, 2020 22:41
This change:
 - Introduces a new flag, profile_greenlets, under 'start'
 - Introduces a new API, get_greenlet_stats, to retrieve greenlet stats
 - get_thread_stats will fail when profiling greenlets, and profile_threads
   is ignored. Conversely, get_greenlet_stats will fail when profiling threads,
   and profile_greenlets is ignored.
Store the last seen tickcount for every context. Use this while
calculating 'ttot' for the context
@sumerc
Owner Author

sumerc commented Sep 17, 2020

#57 has been merged. Thanks to @Suhail-MOHD, Yappi has initial support for gevent profiling!

A new release will hopefully be coming in a few days!

@sumerc sumerc merged commit ba0a264 into master Sep 17, 2020