New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds get_new_ids and at_addrs functions #36

Merged
merged 10 commits into from Feb 13, 2018

Conversation

Projects
None yet
2 participants
@spacether
Copy link
Contributor

spacether commented Jan 8, 2018

In conducting a memory leak investigation, I found it helpful to look at my new objects created between calls to show_growth. So I made two new functions to help users out here.

get_new_ids is like show_growth, except that it stores dictionaries which hold sets of object address ids.
So now one can grab a set of all the new 'list' ids.

at_addrs: Once one has the new 'list' ids, one can call at_addrs to go from a set of ids to the objects at those addresses. One can then make graphs of the back references on those new objects.

@mgedmin
Copy link
Owner

mgedmin left a comment

Could you please revert the changes to generated images (and the one .dot file) in docs/? These clutter the diff and make it difficult to review the PR.

I also did not notice any tests for the new functions? Currently objgraph has 100% test coverage and I do not want to lose that.

@mgedmin
Copy link
Owner

mgedmin left a comment

Now that I finally understand what this is for, I quite like the idea. Thanks for sharing it!

The PR will need some massaging to be mergeable. I'll try to help.

objgraph.py Outdated
__version__ = '3.3.1.dev0'
__date__ = '2017-12-28'
__version__ = '3.3.2.dev0'
__date__ = '2018-01-08'

This comment has been minimized.

@mgedmin

mgedmin Jan 9, 2018

Owner

New function are new features, so it'll be 3.4.0.dev0, according to SemVer.

@@ -19,6 +19,7 @@ Statistics

.. autofunction:: show_growth([limit=10, peak_stats={}, shortnames=True, file=sys.stdout])

.. autofunction:: get_ids([skip_update=False, limit=10, sortby='deltas'])

This comment has been minimized.

@mgedmin

mgedmin Jan 9, 2018

Owner

Oh my, I don't remember how my own documentation works! I totally forgot to include growth() and perhaps a few other new functions in here!

objgraph.py Outdated
weakref 3807 3864 +57 +57
dict 6892 6947 +73 +55
frame 34 70 +53 +36
...

This comment has been minimized.

@mgedmin

mgedmin Jan 9, 2018

Owner

Ah, you have some tests! Sorry I missed them on the first glance.

This comment has been minimized.

@mgedmin

mgedmin Jan 9, 2018

Owner

API-wise I'm not very happy about functions that both print and return values.

Despite having read the description, I don't quite get how this one differs from growth()/show_growth().

Also, the name (get_ids) is too generic, and doesn't describe what the function does.

(I'm afraid my criticism is not very constructive at this point. I'll have to think about this more.)

This comment has been minimized.

@mgedmin

mgedmin Jan 9, 2018

Owner

Ah, so this function also observes (or tries to observe) the object churn! Interesting.

Given how object IDs can be reused by Python, are the numbers accurate?

This comment has been minimized.

@mgedmin

mgedmin Jan 9, 2018

Owner

Suggestion for the function name: get_new_ids(). It could also return just the NEW_IDs, for simplicity's sake.

I'm thinking the three parameters for tracking state are a bit unwieldy (e.g. you cannot do OLD_IDS = CURRENT_IDS; CURRENT_IDS = defaultdict(set)), so how about one _state={} parameter that you initialize like this:

def get_new_ids(..., _state={}):
    ...
    _state['old'] = old = _state.get('current', defaultdict(set))
    _state['current'] = current = defaultdict(set)
    _state['new'] = new = defaultdict(set)
    ...
    return new

This comment has been minimized.

@mgedmin

mgedmin Jan 9, 2018

Owner

Given how object IDs can be reused by Python, are the numbers accurate?

I'm convinced now that given the purpose of this function that doesn't matter. Objects that were freed and reallocated at the same memory address (while maintaining the same type name) aren't contributing to memory leaks.

This comment has been minimized.

@spacether

spacether Jan 23, 2018

Contributor

If we change the code to:
_state['old'] = old = _state.get('current', defaultdict(set))
Then the id of _state['old'] will change to the id of _state['current'].
This would also require a new dict be created every time for _state['current'] with new sets under every class_name key.

I want to minimize the creation of new ids to store this info, so I prefer the old_dict[class_name].clear(), old_dict[class_name].update(current_dict[class_name])
current_dict[class_name].clear(), current_dict[class_name].add(new_id)
pattern that way we re-use dictionaries and sets that are already reserved in memory.

This comment has been minimized.

@mgedmin

mgedmin Jan 24, 2018

Owner

That is a very good point!

objgraph.py Outdated
if ``skip_update`` is True, the sets of [OLD_IDS, CURRENT_IDS, NEW_IDS]
will be returned from when the function was last run without examining the
objects currently iin memory.

This comment has been minimized.

@mgedmin

mgedmin Jan 9, 2018

Owner

Typo: iin -> in.

objgraph.py Outdated
(width, 'Type', 'Old_ids', 'Current_ids', 'New_ids', 'Count_Deltas'))
print('='*(width+13*4))
for row in rows:
row_class, old, current, new, delta = row

This comment has been minimized.

@mgedmin

mgedmin Jan 9, 2018

Owner

You can do for row_class, old, current, new, delta in rows: directly.

objgraph.py Outdated
for k, v in CURRENT_IDS.items():
OLD_IDS[k].update(v)
for k in CURRENT_IDS.keys():
CURRENT_IDS[k].clear()

This comment has been minimized.

@mgedmin

mgedmin Jan 9, 2018

Owner
OLD_IDS.clear()
OLD_IDS.update(CURRENT_IDS)
CURRENT_IDS.clear()

might be simpler than the three loops.

objgraph.py Outdated
for o in objects:
CURRENT_IDS[type(o).__name__].add(id(o))
for k in NEW_IDS.keys():
NEW_IDS[k].clear()

This comment has been minimized.

@mgedmin

mgedmin Jan 9, 2018

Owner

Or just NEW_IDS.clear().

objgraph.py Outdated
for k in NEW_IDS.keys():
NEW_IDS[k].clear()
rows = []
for class_name in CURRENT_IDS.keys():

This comment has been minimized.

@mgedmin

mgedmin Jan 9, 2018

Owner

for class_name in CURRENT_IDS: would skip generating an unnecessary intermediate list object on Python 2.

objgraph.py Outdated
rows = []
for class_name in CURRENT_IDS.keys():
new_ids_set = CURRENT_IDS[class_name] - OLD_IDS[class_name]
NEW_IDS[class_name].update(new_ids_set)

This comment has been minimized.

@mgedmin

mgedmin Jan 9, 2018

Owner

You're clearing NEW_IDs before this loop, and class names will not repeat, so you can do NEW_IDS[...] = new_ids_set and skip the update.

objgraph.py Outdated
>>> [old, current, new_ids] = get_ids()
new_lists = at_addrs(new_ids['list'])
for new_list in new_lists:
call show_chain on each new_list object

This comment has been minimized.

@mgedmin

mgedmin Jan 9, 2018

Owner

Doctests need a >>> in front of every statement, and a ... in front of every continuation line of a compound statement such as a for loop.

Doctests are executed by the test runner, and so cannot contain pseudocode.

@mgedmin mgedmin added the enhancement label Jan 9, 2018

@spacether spacether force-pushed the spacether:master branch from 663bef1 to 2670b64 Jan 23, 2018

@spacether spacether changed the title Adds get_ids and at_addrs functions Adds get_new_ids and at_addrs functions Jan 23, 2018

@spacether

This comment has been minimized.

Copy link
Contributor

spacether commented Jan 23, 2018

@mgedmin

  • I reverted my changes so there are now no new generated images in the docs folder
  • Version number is fixed
  • Function name is changed from get_ids to get_new_ids
  • Object ids are now stored in a _state argument in get_new_ids
  • Working tests added to the docstrings of get_new_ids and at_addrs

Any other updates?

@spacether

This comment has been minimized.

Copy link
Contributor

spacether commented Jan 29, 2018

@mgedmin Any other updates?

@mgedmin
Copy link
Owner

mgedmin left a comment

Sorry! I'm trying to find the time to look at the code again.

objgraph.py Outdated
# remove the key from our dicts if we don't have any old or
# curent class_name objects
del old_ids[class_name]
del current_ids[class_name]

This comment has been minimized.

@mgedmin

mgedmin Jan 30, 2018

Owner

You're modifying a dictionary while iterating over it. This can cause problems (RuntimeError: dictionary changed size during iteration).

objgraph.py Outdated
rows.sort(key=lambda row: row[index_by_sortby[sortby]], reverse=True)
if limit:
rows = rows[:limit]
width = max(len(row[0]) for row in rows)

This comment has been minimized.

@mgedmin

mgedmin Jan 30, 2018

Owner

This could raise ValueError: max() arg is an empty sequence if rows is empty (which is probably unlikely, but could happen if e.g. a user accidentally passes limit=0).

spacether added some commits Jan 30, 2018

@spacether

This comment has been minimized.

Copy link
Contributor

spacether commented Jan 30, 2018

@mgedmin

  • Key deletion has been moved outside of the iteration loop
  • limit = None or >= 0 may now be passed in to get_new_ids
@mgedmin

mgedmin approved these changes Feb 5, 2018

Copy link
Owner

mgedmin left a comment

Looks good to me. I've a couple of small suggestions, and I'll wait a couple of days before merging so you could implement them if you wish to do so.

Thank you for your patience!

objgraph.py Outdated
for class_name in current_ids:
current_ids[class_name].clear()
for o in objects:
class_name = type(o).__name__

This comment has been minimized.

@mgedmin

mgedmin Feb 5, 2018

Owner

I still think this should support long names, but that can be added later.

This comment has been minimized.

@mgedmin

mgedmin Feb 5, 2018

Owner

(TBH it doesn't handle old-style classes on Python 2 well either. _short_typename is there for a reason.)

objgraph.py Outdated
new_ids[class_name].update(new_ids_set)
num_new = len(new_ids_set)
num_delta = num_current - num_old
row = [class_name, num_old, num_current, num_new, num_delta]

This comment has been minimized.

@mgedmin

mgedmin Feb 5, 2018

Owner

I would make this a tuple.

objgraph.py Outdated
del current_ids[key]
del new_ids[key]
index_by_sortby = {'old': 1, 'current': 2, 'new': 3, 'deltas': 4}
rows.sort(key=lambda row: row[index_by_sortby[sortby]], reverse=True)

This comment has been minimized.

@mgedmin

mgedmin Feb 5, 2018

Owner

You could simplify this to rows.sort(key=operator.itemgetter(index_by_sortby[sortby]), reverse=True).

objgraph.py Outdated
def at_addrs(address_set):
"""Returns a list of objects for a given set of memory addresses.
The reverse of [id(obj1), id(obj2), ...]

This comment has been minimized.

@mgedmin

mgedmin Feb 5, 2018

Owner

Please mention that the objects are returned in an arbitrary order.

spacether added some commits Feb 13, 2018

@spacether

This comment has been minimized.

Copy link
Contributor

spacether commented Feb 13, 2018

@mgedmin I made the updates that you suggested. Can you merge them in and post the new version on pypi?

@mgedmin mgedmin merged commit c8a8015 into mgedmin:master Feb 13, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@mgedmin

This comment has been minimized.

Copy link
Owner

mgedmin commented Feb 13, 2018

Thank you!

@mgedmin

This comment has been minimized.

Copy link
Owner

mgedmin commented Feb 13, 2018

objgraph 3.4.0 is out on PyPI.

@spacether

This comment has been minimized.

Copy link
Contributor

spacether commented Feb 14, 2018

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment