Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation: Clarify filter function behaviors #43

Closed
vEpiphyte opened this issue Jan 9, 2020 · 4 comments
Closed

Documentation: Clarify filter function behaviors #43

vEpiphyte opened this issue Jan 9, 2020 · 4 comments

Comments

@vEpiphyte
Copy link

Filters appear to behave as negative filters, removing data from the output. Is this correct/intended?

This is a test harness I was using to determine filter behavior (called genr.py):

import time
import random
import asyncio

import yappi

basename = 'genr'
debug = ''
if __debug__:
    debug = '_debug'
statsname = f'./{basename}{debug}.pstats'

N = 4
SLEEP = 0.5

times = [random.random() for i in range(N)]

async def genr():
    for t in times:
        await asyncio.sleep(t)
        yield t

async def sleeploop(i):
    time.sleep(i)
    # await asyncio.sleep(i)

async def run():
    values = []

    async for t in genr():
        values.append(t)

    await sleeploop(SLEEP)

    return values

async def main():
    print('start')

    yappi.set_clock_type('wall')
    tick = time.time()
    yappi.clear_stats()
    yappi.start(builtins=False)

    values = await run()

    yappi.stop()
    tock = time.time()
    filter = {}
    ##########################
    # DEFINE FILTER HERE 
    # filter['module'] = 'genr.py'
    ##########################
    stats = yappi.get_func_stats(filter=filter)  # type: yappi.YFuncStats
    for stat in stats:
        print("%s %s %0.6f" % (stat.name, stat.module, stat.ttot))
    # stats.print_all()
    # stas.print_all()
    # obj = yappi.convert2pstats(stats)  # type: profile.Stats
    # obj.dump_stats(statsname)

    print(f'took: {tock - tick}')

    assert values == times
    print(times)
    print(f'Total random time: {sum(times)}')
    print(f'Total run time: {sum(times) + SLEEP}')
    print('fin')

if __name__ == '__main__':
    asyncio.run(main())

If this is intended behavior, would you accept a PR updating the filter docs to mention this ?

@vEpiphyte
Copy link
Author

Actually, I'm not even certain if that example is setting the module correctly at all.

@sumerc
Copy link
Owner

sumerc commented Jan 9, 2020

Ok. Let me clarify a bit.

I think I can see where the confusion is coming from: the filter param is actually optional.

As an example: you can always do following:

def foo():
    pass

def bar():
    pass

import yappi
yappi.start()
foo()
bar()
stats = yappi.get_func_stats()
for t in stats:
    print(t)
'''
{0: 'bar', 1: 'x.py', 2: 5, 3: 1, 4: 1, 5: False, 6: 3e-06, 7: 3e-06, 8: 1, 9: <yappi.YChildFuncStats object at 0x10d67d1d0>, 10: 0, 11: '_MainThread', 12: 0, 13: 3e-06, 14: 'x.py:5 bar'}
{0: 'foo', 1: 'x.py', 2: 1, 3: 1, 4: 1, 5: False, 6: 3e-06, 7: 3e-06, 8: 0, 9: <yappi.YChildFuncStats object at 0x10d67d210>, 10: 0, 11: '_MainThread', 12: 0, 13: 3e-06, 14: 'x.py:1 foo'}
'''

Here you can see the element keyed with 1 is the module's name which is: x.py. get_func_stats return YFuncStats object which contains above dict in a more user friendly way. See here.

Let's see:

for t in stats:
    print(t.__dict__)
'''
{'name': 'foo', 'module': 'x.py', 'lineno': 1, 'ncall': 1, 'nactualcall': 1, 'builtin': False, 'ttot': 3e-06, 'tsub': 3e-06, 'index': 0, 'children': <yappi.YChildFuncStats object at 0x105a181d0>, 'ctx_id': 0, 'ctx_name': '_MainThread', 'tag': 0, 'tavg': 3e-06, 'full_name': 'x.py:1 foo'}
{'name': 'bar', 'module': 'x.py', 'lineno': 5, 'ncall': 1, 'nactualcall': 1, 'builtin': False, 'ttot': 2e-06, 'tsub': 2e-06, 'index': 1, 'children': <yappi.YChildFuncStats object at 0x105a18190>, 'ctx_id': 0, 'ctx_name': '_MainThread', 'tag': 0, 'tavg': 2e-06, 'full_name': 'x.py:5 bar'}
'''

Now, it is better right?

With this kind of introspection, I think any kind of filtering is possible easily? WDYT?

The reason we have filter param in get_func_stats is that: it gives very fast filtering on C side for some params(especially for tag and ctx_id) and I completely agree it causes confusion. (I do not want to play with it at least right now for backward compat.)

While discussing it, I feel that this also needs to be documented. I said docs are poor, right? :)

Coming to your original question again:

Filters appear to behave as negative filters, removing data from the output.

I am not sure if I understand you correctly but say you filter on module=django, that means that all the output that does not contain django as a module name will be skipped.

@vEpiphyte
Copy link
Author

@sumerc This is definitely helpful. I'll play around with this more.

filter on module=django, that means that all the output that does not contain django as a module name will be skipped.

That is correct by what I meant as a "negative" filter.

@sumerc
Copy link
Owner

sumerc commented Apr 8, 2020

We have changed this whole filtering with the branch improve-filtering. All docs updated. Please see and comment if you see any more improvement on that.

@sumerc sumerc closed this as completed Apr 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants