Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

timings wrong #112

Closed
nschloe opened this issue Jun 27, 2018 · 16 comments
Closed

timings wrong #112

nschloe opened this issue Jun 27, 2018 · 16 comments

Comments

@nschloe
Copy link
Contributor

nschloe commented Jun 27, 2018

SnakeViz tries to display the full timed call tree, but the profile file doesn't have the data for it. This leads to bogus information in SnakeViz. Consider

import time


def a(t0, t1):
    c(t0)
    d(t1)
    return


def b():
    return a(1, 4)


def c(t):
    time.sleep(t)
    return


def d(t):
    time.sleep(t)
    return


if __name__ == "__main__":
    a(4, 1)
    b()

SnakeViz gives

sv

which is wrong: c() really takes 4 seconds when called via root -> a() -> c(), and d() takes 1 second when called via root -> a() -> d(); SnakeViz shows both as 5 seconds. More timings are wrong in the above screenshot.

To work around this sort of bugs, I created tuna which gives only the information that can be deduced from the profile:

t

@jiffyclub
Copy link
Owner

This is an important callout and such a frustrating limitation of the cProfile data. In 302fbc7 I added a note the docs describing this limitation and linking to this issue as an explainer, thanks for posting it!

@nschloe
Copy link
Contributor Author

nschloe commented Jun 27, 2019

I don't know if this should be closed; it's certainly not fixed.

The documentation update speaks of missing information, but really SnakeViz shows incorrect information without warning. (See the example above.) I think that's pretty bad since people are just taking it for granted.

In my opinion, a documentation update doesn't cut it here; very few people will actually read it. One solution would be to remove the wrong entries from the visualization.

@jiffyclub
Copy link
Owner

But then you wouldn't be showing the entire call tree, right?

@nschloe
Copy link
Contributor Author

nschloe commented Jun 27, 2019

Indeed. There is no way to know the entire call tree either, that's just one property of the Python profile files.

@jiffyclub
Copy link
Owner

Yeah, true.

sigh

@rwarren
Copy link

rwarren commented Jun 19, 2020

FWIW, this false reporting just confused the heck out of me and caused me to spend some time trying to figure out why calls were being made that actually weren't. Oops! 🙄

Years ago I used to use runsnakerun for profiling, and I don't ever recall being bit by this, and I used it quite a lot. Was this just luck, or did runsnakerun do something trickier to infer nested timing?

FWIW, slides 8-12 in the deck for this ancient talk show an example of how runsnakerun handled nesting:
https://speakerdeck.com/rwarren/a-brief-intro-to-profiling-in-python?slide=8

I'll give tuna a try, and may also spark up a python2 venv to see how runsnakerun would have handled the example above.

@rwarren
Copy link

rwarren commented Jun 19, 2020

FWIW, here is what runsnakerun (unfortunately python2 only) shows for the sample script in the first post of this issue:

image

There is clearly a MUCH higher priority on where time is spent vs trying to display a call tree with imperfect information available (although the call tree is still available by hovering). It still has the the same fundamental limitations due to the available data in the profile output, but it seems less obvious. Perhaps this is why I never ran into issues with runsnakerun, since the primary thing I care about is reducing time spent when doing time optimizations, not the call tree.

A squaremap visual would be an awesome addition to snakeviz!!

@pmirallesr
Copy link

pmirallesr commented Jul 1, 2020

I just ran into this issue. It's pretty major in my opinion, I'm being lied to about much of the information that I actually wanted to learn about in the first place. Tuna is great at solving the false timings, but it's just much less user friendly than snakeviz (no offense). I personally don't see why this is a closed issue, am I missing something?

@nschloe
Copy link
Contributor Author

nschloe commented Jul 2, 2020

@pmirallesr

Tuna is great at solving the false timings, but it's just much less user friendly than snakeviz (no offense)

If you could detail what you mean at https://github.com/nschloe/tuna/issues, it'd be happy to take a look.

@pmirallesr
Copy link

@nschloe Sure! I just did, I hope you find it useful

@matkoniecz
Copy link

matkoniecz commented Jul 31, 2020

I personally don't see why this is a closed issue, am I missing something?

The profiler still lies about timings, this issue was closed in commit that just documented that core functionality is broken - see 302fbc7 ("jiffyclub closed this in 302fbc7 on Jun 22, 2019 ") instead of not displaying unknown timings

@Andrej730
Copy link

Just to be fair, snakeviz does highlight all entries of the same block indicating that the time comes from sum of all those calls.

image

@EmilianoG-byte
Copy link

What is the current best solution to find out the time distribution of a function with multiple calls? Using snakeviz I cannot figure out where are the other 6s coming from for cost_function_jax_jit

Screenshot 2025-01-15 at 14 43 57

I tried tuna as suggested by @nschloe, but this gives me even less information:

Screenshot 2025-01-15 at 14 45 34

I tried looking into other branches of the call tree "hunting" for those 6s, but had no luck. Thank you for any help in advance!

@nschloe
Copy link
Contributor Author

nschloe commented Jan 15, 2025

@EmilianoG-byte Python's profile just doesn't contain that information. snakeviz's info is incorrect. tuna shows you exactly what you get from the profile.

@jiffyclub
Copy link
Owner

jiffyclub commented Jan 15, 2025

@EmilianoG-byte You might be able to find what you're looking for by working with the pstats module directly in Python. With that you can directly inspect the source data used by snakeviz, including getting a complete list of what-called-what. Just note that the ncalls, tottime, and cumtime metrics capture the caller/callee relationship globally and if that relationship shows up in multiple places you cannot separately see metrics for each place. Here are a couple examples:

In [1]: import pstats

In [3]: p = pstats.Stats('s3.cprof')

In [12]: p.print_callees()
   Random listing order was used

Function                                          called...
                                                      ncalls  tottime  cumtime
slee.py:22(sleepn5)                               ->       1    0.000    0.511  slee.py:13(sleepn)
slee.py:4(sleep1)                                 ->      10    1.028    1.028  {built-in method time.sleep}
slee.py:8(sleep3)                                 ->       3    0.000    0.313  slee.py:4(sleep1)
slee.py:18(sleepn1)                               ->       1    0.000    0.104  slee.py:13(sleepn)
slee.py:13(sleepn)                                ->       7    0.000    0.715  slee.py:4(sleep1)
{built-in method builtins.exec}                   ->       1    0.000    1.028  slee.py:1(<module>)
{built-in method time.sleep}                      ->
slee.py:1(<module>)                               ->       1    0.000    1.028  slee.py:26(root)
slee.py:26(root)                                  ->       1    0.000    0.313  slee.py:8(sleep3)
                                                           1    0.000    0.100  slee.py:13(sleepn)
                                                           1    0.000    0.104  slee.py:18(sleepn1)
                                                           1    0.000    0.511  slee.py:22(sleepn5)
{method 'disable' of '_lsprof.Profiler' objects}  ->


Out[12]: <pstats.Stats at 0x10e86bcb0>

In [13]: p.print_callers()
   Random listing order was used

Function                                          was called by...
                                                      ncalls  tottime  cumtime
slee.py:22(sleepn5)                               <-       1    0.000    0.511  slee.py:26(root)
slee.py:4(sleep1)                                 <-       3    0.000    0.313  slee.py:8(sleep3)
                                                           7    0.000    0.715  slee.py:13(sleepn)
slee.py:8(sleep3)                                 <-       1    0.000    0.313  slee.py:26(root)
slee.py:18(sleepn1)                               <-       1    0.000    0.104  slee.py:26(root)
slee.py:13(sleepn)                                <-       1    0.000    0.104  slee.py:18(sleepn1)
                                                           1    0.000    0.511  slee.py:22(sleepn5)
                                                           1    0.000    0.100  slee.py:26(root)
{built-in method builtins.exec}                   <-
{built-in method time.sleep}                      <-      10    1.028    1.028  slee.py:4(sleep1)
slee.py:1(<module>)                               <-       1    0.000    1.028  {built-in method builtins.exec}
slee.py:26(root)                                  <-       1    0.000    1.028  slee.py:1(<module>)
{method 'disable' of '_lsprof.Profiler' objects}  <-


Out[13]: <pstats.Stats at 0x10e86bcb0>

In [14]: p.all_callees
Out[13]:
{('slee.py',
  22,
  'sleepn5'): {('slee.py', 13, 'sleepn'): (1,
   1,
   0.00011399999999999999,
   0.511143)},
 ('slee.py', 26, 'root'): {('slee.py', 22, 'sleepn5'): (1, 1, 8e-06, 0.511151),
  ('slee.py', 8, 'sleep3'): (1, 1, 2.2e-05, 0.312701),
  ('slee.py', 18, 'sleepn1'): (1, 1, 9.999999999999999e-06, 0.10384),
  ('slee.py', 13, 'sleepn'): (1, 1, 2.1e-05, 0.100128)},
 ('slee.py',
  4,
  'sleep1'): {('~', 0, '<built-in method time.sleep>'): (10,
   10,
   1.027525,
   1.027525)},
 ('slee.py',
  13,
  'sleepn'): {('slee.py', 4, 'sleep1'): (7, 7, 6.8e-05, 0.714942)},
 ('slee.py',
  8,
  'sleep3'): {('slee.py', 4, 'sleep1'): (3, 3, 2.8e-05, 0.312679)},
 ('slee.py',
  18,
  'sleepn1'): {('slee.py', 13, 'sleepn'): (1,
   1,
   2.4e-05,
   0.10382999999999999)},
 ('~',
  0,
  '<built-in method builtins.exec>'): {('slee.py', 1, '<module>'): (1,
   1,
   6e-06,
   1.0278669999999999)},
 ('~', 0, '<built-in method time.sleep>'): {},
 ('slee.py',
  1,
  '<module>'): {('slee.py', 26, 'root'): (1, 1, 4.1e-05, 1.027861)},
 ('~', 0, "<method 'disable' of '_lsprof.Profiler' objects>"): {}}

The extra calls might be getting hidden in the snakeviz display by the depth cutoff or the size cutoff.

@EmilianoG-byte
Copy link

Thank you for the prompt answer @jiffyclub and @nschloe! It seems like pstats might solve my problem, I will check it out :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants