
Resource id refactor #2091

Merged
merged 37 commits into master from resource_id_refactor on Jul 5, 2018

Conversation

d-chambers
Member

This PR represents a few days of work trying to track down a particularly hard-to-find bug with the "event scoped" object referrals of the ResourceIdentifier class. The issues were related to premature deletions from the __resource_id_weak_dict due to bugs in the resource_id's manual reference counting. While debugging the code I ended up performing some major refactoring and adding some additional features, namely:

  1. Moved the ResourceIdentifier class from obspy.core.event.base to its own module obspy.core.event.resourceid as the class definition was getting quite large.

  2. Added a context manager for safely (and temporarily) replacing the ResourceIdentifier class state,
    called _debug_class_state (a private method for debugging only). This allowed the removal of
    all manual clearing of the __resource_id_weak_dict (which the docs explicitly warn against).

  3. Removed the manual id reference counting by ResourceIdentifier and replaced it with a semi-singleton class (only one instance ever exists per id at a time) called _ResourceSingleton. The resource singleton references are kept in a weak value dict and are used as keys to the __resource_id_weak_dict. This results in the same behavior as before, but pushes the burden of reference counting back to the standard library/Python implementation (see the sketch after this list).

  4. Added several more tests for the event-scoped behavior of the ResourceIdentifier.

  5. Added the MegaCatalog class to the testing module of utils for creating a Catalog object with most, if not all, of the classes contained in obspy.core.event.

  6. Added a helper function to the testing module that allows context managers to be used in the `setUp` method of `unittest.TestCase` instances.
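
A minimal sketch of the semi-singleton idea from point 3 (a simplified illustration, not the exact ObsPy implementation): the cache is a WeakValueDictionary, so each instance disappears as soon as nothing else holds a strong reference to it, which is exactly the reference counting now delegated to the standard library.

import weakref

class _ResourceSingletonSketch(object):
    # id string -> the single live instance for that id
    _cache = weakref.WeakValueDictionary()

    def __new__(cls, id_str):
        try:
            return cls._cache[id_str]
        except KeyError:
            inst = super(_ResourceSingletonSketch, cls).__new__(cls)
            cls._cache[id_str] = inst
            return inst

a = _ResourceSingletonSketch('smi:local/abc')
b = _ResourceSingletonSketch('smi:local/abc')
assert a is b  # one instance per id while any strong reference exists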

PR Checklist

  • Correct base branch selected? master for new features, maintenance_... for bug fixes
  • This PR is not directly related to an existing issue (which has no PR yet).
  • If the PR is making changes to documentation, docs pages can be built automatically.
    Just remove the space in the following string after the + sign: "+DOCS"
  • If any network modules should be tested for the PR, add them as a comma separated list
    (e.g. clients.fdsn,clients.arclink) after the colon in the following magic string: "+TESTS:"
    (you can also add "ALL" to just simply run all tests across all modules)
  • All tests still pass.
  • Any new features or fixed regressions are covered via new tests.
  • Any new or changed features are fully documented.
  • Significant changes have been added to CHANGELOG.txt.
  • First-time contributors have added their name to CONTRIBUTORS.txt.

@d-chambers d-chambers added the .core.event and cleanup labels Mar 20, 2018
@krischer krischer left a comment

Thanks a lot for spending the time to tackle all these nasty issues!

I only have some minor comments - otherwise this looks pretty good. Does pickling one of the child objects work correctly with the chosen approach?

@rlock
def __setstate__(self, state_dict):
    super(Event, self).__setstate__(state_dict)
    ResourceIdentifier.bind_resource_ids()
Member

Just writing down my thoughts: There are now two separate thread locks for the Event objects, but I think this is fine as __setstate__() and __deepcopy__() cannot really be called at the same time, so there are no race conditions.

Member

Should bind_resource_ids() also be called for all the other event objects like Origin, Magnitude, ...?

Member Author

No, the idea is to call it only on the event level to bind the intra-event resource ids (e.g. picks and arrivals). Since origins are on the same nesting level as picks, an origin could get constructed before the picks that its arrivals refer to, in which case calling bind_resource_ids would cause the arrivals to be bound to the wrong picks.
This whole thing can get confusing, I know...

Member Author

Tbh I am not really sure if we need an rlock on __setstate__ or not. When I am feeling really ambitious I should set up a nasty multi-threaded test case to make sure it all still works with many threads.
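
For reference, a rough sketch of the kind of multi-threaded stress test meant here (hypothetical; not part of this PR):

import threading

import obspy.core.event as ev

def hammer():
    # repeatedly create events and resolve their resource ids
    for _ in range(200):
        event = ev.Event()
        rid = ev.ResourceIdentifier(referred_object=event)
        assert rid.get_referred_object() is event

threads = [threading.Thread(target=hammer) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()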

self.r_dict = state['rdict']  # the weak resource dict
self.unbound = state['unbound']  # the unbound resource ids

def print_state(self):
Member

Can you add a nocover pragma here, e.g.:

def print_state(self):  # pragma: nocover
    ...

"""
# setup temporary id dict for tests in this case and bind to self
context = ResourceIdentifier._debug_class_state()
state = setup_context_testcase(self, context)
Member

Just for my understanding: __exit__() of the context manager would be called once the instance is garbage collected?

Member Author

I am using Ned Batchelder's recipe for using context managers with unittest, which calls the setup and cleanup functions before and after every test. So the context manager enters and exits for every test thanks to this helper function.

Pytest has an elegant solution for this kind of thing with yield fixtures, but this was the best I could find for unittest.
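
For reference, the recipe boils down to entering the context manager during setup and registering its exit with addCleanup; a minimal sketch of such a helper (simplified; not necessarily ObsPy's exact implementation):

def setup_context_testcase(test_case, context_manager):
    """Enter a context manager and exit it when the test finishes."""
    value = context_manager.__enter__()
    test_case.addCleanup(context_manager.__exit__, None, None, None)
    return value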

def test_id_without_reference_not_in_global_list(self):
    """
    This tests some internal workings of the ResourceIdentifier class.
    NEVER modify the __resource_id_weak_dict!
Member

This sentence is obsolete now.

"specified in the QuakeML manual section 3.1 and in particular "
"exclude colons for the final part.")

def test_dubug_class_state(self):
Member

typo

@d-chambers
Member Author

This is still a work in progress; I would like to think about it a bit more before it gets merged.

@krischer
Member

krischer commented Apr 11, 2018

Just a quick ping on this. No need to hurry but it would be nice if this does not get lost in a stale PR.

@d-chambers
Member Author

Hey @krischer,
I am still working on this. I have a solution that is robust and thread-safe, but I need a bit longer to clean it up and to look at ways of speeding it up (reading/copying catalogs currently takes a ~15% performance hit).

@d-chambers
Member Author

I am struggling a bit with this one. I ended up abandoning the implementation from #1644 that depended on event creation order because it was causing all sorts of hard-to-debug problems, and multi-threading didn't work despite rlocking key functions. IMO it was not a good design.

To replace it, I implemented a method of scoping resource ids that uses a parent id (e.g. the id of the event object) as a namespace of sorts for resource_ids and the objects they refer to. This works by calling a recursive function to find all resource ids in an object's attribute tree and assign them to the parent namespace (see the sketch below). This approach is thread-safe and seems to be much more reliable.
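
A rough sketch of such a recursive walk (hypothetical names; the real function also tracks parents and attribute names):

from obspy.core.event import ResourceIdentifier

def _walk_resource_ids(obj, found=None, seen=None):
    # collect every ResourceIdentifier in obj's attribute tree
    found = [] if found is None else found
    seen = set() if seen is None else seen
    if id(obj) in seen:  # guard against circular references
        return found
    seen.add(id(obj))
    if isinstance(obj, ResourceIdentifier):
        found.append(obj)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            _walk_resource_ids(item, found, seen)
    elif hasattr(obj, '__dict__'):
        for value in obj.__dict__.values():
            _walk_resource_ids(value, found, seen)
    return found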

However, recursion is not particularly efficient in Python. This new method slows catalog creation operations (read, copy, unpickle) down by around 20%. This will hamper other efforts to improve efficiency (e.g. #1910).

Perhaps it might be better to leave the resource_ids un-scoped (i.e. they will behave how they originally did before #1644) but provide a public method on the catalog and event objects for scoping the resource ids? I imagine many catalog/event interactions don't even make use of the resource_ids, so it would just be a waste of computation in those cases, but it would give the user the option to ensure correct resource_id behavior when it is needed. It could be rolled up into a general function called something like "validate" that would also try to detect/fix other issues that might occur in event objects.

I will push the work I have so far, but it still will require a bit of cleanup. If anyone has any suggestions they would be much appreciated.

@d-chambers
Member Author

d-chambers commented Apr 15, 2018

I spent some time revisiting this and optimizing, and I am pleased with the results. Doing the event scoping of the resource_ids still has a cost, but it is now not all that much. From my profiling on Python 3.6 and Ubuntu 16.04 I get the following (no event scoping / event scoping):

read quakeml: 13.3 / 13.6 ms
copy catalog: 4.8 / 5.07 ms
unpickle: 1.71 / 2.21 ms

Here is the ipython notebook I used to do the profiling. In order to run it you need to be on this branch with your current obspy installation.
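
A rough equivalent of those three measurements using timeit (an assumption; the original notebook is not reproduced here):

import copy
import pickle
import timeit

import obspy

cat = obspy.read_events()  # default example catalog
n = 100
t_read = timeit.timeit(lambda: obspy.read_events(), number=n) / n
t_copy = timeit.timeit(lambda: copy.deepcopy(cat), number=n) / n
data = pickle.dumps(cat)
t_unpickle = timeit.timeit(lambda: pickle.loads(data), number=n) / n
print(t_read, t_copy, t_unpickle)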

It is interesting that unpickling an event object takes less than half the time that copying does, but that is an issue for a different day.

@d-chambers d-chambers added this to the 1.2.0 milestone Apr 19, 2018
@d-chambers d-chambers added the ready for review label Apr 19, 2018
@krischer krischer left a comment

First of all sorry for being so late to review this but I just could not find the time.

I'd feel pretty comfortable merging this as the tests are very comprehensive. The only downside I see is that this is really hard to understand - I've been staring at it for a while now and still don't fully get all aspects. But I do see that this is definitely necessary to get the references to finally work correctly.

I rebased on top of the latest master, force-pushed, and made some small additional changes. Aside from this I have some clarification questions/some small suggestions.

I'm gonna look at this a bit more tomorrow but would tend to merge it fairly soon if there are no objections.

Thanks a lot for this @d-chambers!

@@ -639,6 +638,111 @@ def buffered_load_entry_point(dist, group, name):
return _ENTRY_POINT_CACHE[hash_str]


def yield_obj_parent_attr(obj, cls=None, is_attr=None, has_attr=None):
Member

This is no longer needed with the specialized form below. Are you fine with removing it? If we ever need it again it's still in the history.

Member Author

If there aren't any issues I would like to keep it; I find this function very useful for traversing Catalog and Inventory instances and am planning to use it in #2154. I can always re-implement it later if that would be cleaner. It is used in several places in the new tests anyhow.

Member

That argument is good enough for me. I'll prefix it with an underscore to mark it as private, just so we don't commit to maintaining it and are free to change its API.

if obj is not None and (hasattr(obj, '__dict__') or
                        isinstance(obj, (list, tuple))):
    id_tuple = (id(obj), id(parent))
    if id_tuple not in ids:
Member

Stylistically it's IMHO a bit nicer to do the inverse test and continue early - then the following code can be dedented one level.

Member Author

Good idea, but this is a bit tricky; this line is not in the scope of a loop, so the continue keyword isn't valid. Since this is a generator we can just return (which is the new way of raising StopIteration according to PEP 479). I think this will work in 2.7; let's see what the CI says.
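
For illustration, the pattern in question (per PEP 479, a bare return in a generator simply ends the iteration):

def gen(items):
    if items is None:
        return  # ends the generator instead of raising StopIteration
    for item in items:
        yield item

assert list(gen(None)) == []
assert list(gen([1, 2])) == [1, 2]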

Member

Ah yea, sorry, you are right. CI seems to like it on 2.7 (which I think works only when one does not return a value).


ids = set() # id cache to avoid circular references

def func(obj, attr=None, parent=None):
Member

It's slightly confusing that this function takes obj, attr, parent but the iterator yields obj, parent, attr. But yea - super minor internal detail, so feel free to ignore it.

Member Author

This function is only used in the _yield_resource_id_parent_attr scope, but I agree it would be more consistent to switch the order.

def __set__(self, instance, value):
    # if an object was passed, use the id of the object for hash
    if value is not None:
        if not isinstance(value, (int, str, native_str)):
Member

Is this important? I feel like this even introduces (a very very very tiny) chance of collision if an int happens to be identical to an object's id.

Member Author

Yes, some of the objects used as parents are (rightfully) not hashable. For example:

import obspy.core.event as ev
hash(ev.Event())  # raises TypeError

This little hack gets around it. In this case, we are actually just interested in ensuring the key objects are unique (in memory), so the mutability considerations that normally should guide whether an object is hashable are not really applicable.

You bring up a good point about possible collisions, although I think it would have to be deliberate, or just a very, very bad day. I guess if we are concerned about it I could also check if value is an int and cast it to a str or float, so that the only way an int gets into the hash table is through this if clause? It may incur a slight performance hit though.

Member Author

Also, looking at the flow of the program, I would be surprised if an int ever makes it here, but it wouldn't be impossible.

Member

This little hack gets around it. In this case, we are actually just interested in ensuring the key objects are unique (in memory), so the mutability considerations that normally should guide whether an object is hashable are not really applicable.

I was thinking more of just always hashing id(value). I agree that we don't really have to worry about this minuscule chance, and it could still happen with any hash.

>>> # Deleting it, or letting the garbage collector handle the object will
>>> # invalidate the reference.
>>> del event
>>> print(res_id.get_referred_object()) # doctest: +SKIP
Member

Why do we skip this?

Member Author

unskipped

>>> del obj_b # obj_b gets garbage collected
>>> assert ref_b.get_referred_object() is obj_a # doctest: +SKIP
>>> del obj_a # now no object with res_id exists
>>> assert ref_b.get_referred_object() is None # doctest: +SKIP
Member

Why is it being skipped?

Member Author

unskipped

del r1
self.assertEqual(r2.get_referred_object(), t1)
self.assertIn(r2._resource_key, r_dict)
# Deleting the second one should (r_dict should not be empty)
Member

s/not/now

Member Author

Sorry, I am not sure what you mean.

Member

Sorry, I should have been clearer. s/SEARCH/REPLACE is sed-style syntax for searching and replacing strings. It should read: r_dict should now be empty.

if self._object_id is not None:
    msg = ("The object with identity of: %d no longer exists, "
           "returning the most recently created object with a"
           " resource id of: %s") % (self._object_id, self.id)
Member

This warning message is not fully correct as it would also be triggered if the object is found in the parent scope.

Member Author

fixed

self._object_id = id(referred_object)
# Use parent-scoping feature if parent is not None.
if parent is not None:
    self._parent_key = parent
Member

What would happen if this is called multiple times with different parent objects?

Member Author

I am trying to remember (I had this really well thought out at one point), but I think this does happen when an event is copied. I don't think this will cause issues, since the resource_id instance will then point to the correct object in the new scope.

        warn=False)
else:
    resource_id._parent_key = self
    resource_id._object_id = None
Member

How are things like the derived origin ids scoped after this function has been called? _parent_key would be set, but _parent_id_tree and things like that might not be set. But I might just be misunderstanding some things.

Member Author

Any object that is referenced by a resource_id should have the attribute resource_id. Lines 332-334 take care of this case, which involves logging the identities of all the objects with resource_id attributes into the proper place in the _parent_id_tree. Then, resource ids that refer to other objects in the same event (but not their parent), such as the pick_id of the Arrival object, will find the correct object when get_referred_object is called, because it was previously put into _parent_id_tree. Because _object_id has been set to None it will not raise a warning. I added another test to specifically check for warnings being raised in this case.

Member

👍 Thanks for the explanation. After spending some more time I think I now fully understand your changes and how everything works. Nice job!

@d-chambers
Member Author

No problem, thanks for taking a look at it. I really hope this finally fixes all the issues (it seems to have worked so far).

The only downside I see is that this is really hard to understand.

I agree. It is more complicated than I would like, but I couldn't come up with something simpler that still solved all the edge cases. The basic gist, which I am sure you know, is to have each id scoped to some parent, in this case the event object. The only way I found to reliably do this is to wait until an event and all related attributes are created, then recurse and set each resource_id in the event to the correct parent.

@krischer
Member

krischer commented Jul 4, 2018

I cleaned up a bit more and fixed two things that got lost during my rebase. Double-checking could not hurt, but I think I found everything that was missing.

Also, all event plugins now scope the resource identifiers, just to make sure everything works.

I still have trouble understanding the whole _ResourceKey + descriptor business. I see that the _ResourceKey object is a necessary wrapper so it can be used in the weakref dictionaries. But why does it have to be a singleton? It's only used for dictionary keys, so why does a deterministic hash like hash(id(value)) not work? Also, where do the objects live? They don't seem to get garbage collected as I would expect, see e.g.

In [1]: import obspy

In [2]: cat = obspy.read_events()

In [3]: cat
Out[3]:
3 Event(s) in Catalog:
2012-04-04T14:21:42.300000Z | +41.818,  +79.689 | 4.4 mb | manual
2012-04-04T14:18:37.000000Z | +39.342,  +41.044 | 4.3 ML | manual
2012-04-04T14:08:46.000000Z | +38.017,  +37.736 | 3.0 ML | manual

In [4]: dict(cat[0].resource_id._parent_key._singleton_cache)
Out[4]:
{112059039136: <obspy.core.event.resourceid._ResourceKey at 0x1a173d6908>,
 112059050304: <obspy.core.event.resourceid._ResourceKey at 0x1a173f52e8>,
 112059175544: <obspy.core.event.resourceid._ResourceKey at 0x1a173f5748>,
 'quakeml:eu.emsc/NetworkMagnitude/rts/261017/782476/796628': <obspy.core.event.resourceid._ResourceKey at 0x1a173d6ba8>,
 'quakeml:eu.emsc/NetworkMagnitude/rts/261018/782478/796631': <obspy.core.event.resourceid._ResourceKey at 0x1a173f5438>,
 'quakeml:eu.emsc/NetworkMagnitude/rts/261020/782484/796646': <obspy.core.event.resourceid._ResourceKey at 0x1a173c7b00>,
 'quakeml:eu.emsc/event/20120404_0000038': <obspy.core.event.resourceid._ResourceKey at 0x1a173f5320>,
 'quakeml:eu.emsc/event/20120404_0000039': <obspy.core.event.resourceid._ResourceKey at 0x1a173f5b00>,
 'quakeml:eu.emsc/event/20120404_0000041': <obspy.core.event.resourceid._ResourceKey at 0x1a173d6240>,
 'quakeml:eu.emsc/origin/rts/261017/782476': <obspy.core.event.resourceid._ResourceKey at 0x1a173d66a0>,
 'quakeml:eu.emsc/origin/rts/261018/782478': <obspy.core.event.resourceid._ResourceKey at 0x1a173f53c8>,
 'quakeml:eu.emsc/origin/rts/261020/782484': <obspy.core.event.resourceid._ResourceKey at 0x1a173d3b00>,
 'smi://eu.emsc/unid': <obspy.core.event.resourceid._ResourceKey at 0x1a173f5b38>,
 'smi:eu.emsc-csem/origin_method/NA': <obspy.core.event.resourceid._ResourceKey at 0x1a173c7da0>,
 'smi:smi-registry/organization/DDA': <obspy.core.event.resourceid._ResourceKey at 0x1a173d6eb8>,
 'smi:smi-registry/organization/EMSC': <obspy.core.event.resourceid._ResourceKey at 0x1a173c7978>,
 'smi:smi-registry/organization/NNC': <obspy.core.event.resourceid._ResourceKey at 0x1a173d6a58>}

In [5]: del cat[0]; del cat[1]

In [6]: cat
Out[6]:
1 Event(s) in Catalog:
2012-04-04T14:18:37.000000Z | +39.342,  +41.044 | 4.3 ML | manual

In [7]: import gc; gc.collect()
Out[7]: 249

In [8]: dict(cat[0].resource_id._parent_key._singleton_cache)
Out[8]:
{112059039136: <obspy.core.event.resourceid._ResourceKey at 0x1a173d6908>,
 112059050304: <obspy.core.event.resourceid._ResourceKey at 0x1a173f52e8>,
 112059175544: <obspy.core.event.resourceid._ResourceKey at 0x1a173f5748>,
 'quakeml:eu.emsc/NetworkMagnitude/rts/261017/782476/796628': <obspy.core.event.resourceid._ResourceKey at 0x1a173d6ba8>,
 'quakeml:eu.emsc/NetworkMagnitude/rts/261018/782478/796631': <obspy.core.event.resourceid._ResourceKey at 0x1a173f5438>,
 'quakeml:eu.emsc/NetworkMagnitude/rts/261020/782484/796646': <obspy.core.event.resourceid._ResourceKey at 0x1a173c7b00>,
 'quakeml:eu.emsc/event/20120404_0000038': <obspy.core.event.resourceid._ResourceKey at 0x1a173f5320>,
 'quakeml:eu.emsc/event/20120404_0000039': <obspy.core.event.resourceid._ResourceKey at 0x1a173f5b00>,
 'quakeml:eu.emsc/event/20120404_0000041': <obspy.core.event.resourceid._ResourceKey at 0x1a173d6240>,
 'quakeml:eu.emsc/origin/rts/261017/782476': <obspy.core.event.resourceid._ResourceKey at 0x1a173d66a0>,
 'quakeml:eu.emsc/origin/rts/261018/782478': <obspy.core.event.resourceid._ResourceKey at 0x1a173f53c8>,
 'quakeml:eu.emsc/origin/rts/261020/782484': <obspy.core.event.resourceid._ResourceKey at 0x1a173d3b00>,
 'smi://eu.emsc/unid': <obspy.core.event.resourceid._ResourceKey at 0x1a173f5b38>,
 'smi:eu.emsc-csem/origin_method/NA': <obspy.core.event.resourceid._ResourceKey at 0x1a173c7da0>,
 'smi:smi-registry/organization/DDA': <obspy.core.event.resourceid._ResourceKey at 0x1a173d6eb8>,
 'smi:smi-registry/organization/EMSC': <obspy.core.event.resourceid._ResourceKey at 0x1a173c7978>,
 'smi:smi-registry/organization/NNC': <obspy.core.event.resourceid._ResourceKey at 0x1a173d6a58>}

@d-chambers
Member Author

Why does the _ResourceKey have to be a singleton? Why not use hash(id(value))?

  1. The _ResourceKey is a trick to enable the Python gc to handle some of the work the resource_id was doing (unreliably) on its own before in the __del__ method, so we need to use the _ResourceKey instances in the weakref dicts. The _ResourceKey.get_resource_key method handles two distinct cases:

a. An object that is not an int/str is passed. The _ResourceKey instance needs to be unique to this object only (using its identity), and a different _ResourceKey instance is returned if a similar (even equal) object is passed to _ResourceKey.get_resource_key. This enables the parent scoping of equal yet distinct Event objects, for example.

b. An object that is an int/str is passed (note that ResourceIdentifier.id will always be a str, or a TypeError is raised by the setter). The same _ResourceKey instance should be used for equal ints/strs that have different identities. This enables different ResourceIdentifier instances with the same id to refer to the same object. The strings in the following example should return the same instance of _ResourceKey if passed to _ResourceKey.get_resource_key. If we used the identity in the hash this would not work.

a = "'lion roars at stale pull requests'"
b = "'lion roars at stale pull requests'"
assert a == b
assert id(a) != id(b)
  2. I suppose _ResourceKey could be a non-singleton as long as it implemented a hash based on the correct input (the identity of an object, or the str/int), but this could greatly increase the number of _ResourceKey instances that Python has to create/manage in the weak value dict, as well as require an additional setattr. All of this could result in a performance/memory usage hit. In this case I figured it is better to do more with less.

I should note, too, that I don't think singleton is exactly the right term (but I don't know a better one), since a true singleton should only have one instance, not just one instance per unique input.
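
To make the two cases concrete, a hypothetical simplification of the get_resource_key dispatch (not the exact ObsPy code):

import weakref

class _ResourceKeySketch(object):
    _singleton_cache = weakref.WeakValueDictionary()

    @classmethod
    def get_resource_key(cls, value):
        # case b: equal str/int ids share one key instance;
        # case a: anything else is keyed on its identity
        key = value if isinstance(value, (int, str)) else id(value)
        try:
            return cls._singleton_cache[key]
        except KeyError:
            inst = cls()
            cls._singleton_cache[key] = inst
            return inst

k1 = _ResourceKeySketch.get_resource_key('smi:local/a')
k2 = _ResourceKeySketch.get_resource_key('smi:local/a')
assert k1 is k2  # same id string -> same key instance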

Where do the _ResourceKey objects live?

The only strong references to these should be attached to the ResourceIdentifier instances as the _parent_key or _resource_key attributes. In your example I think there is some caching going on in the read_events call, as the following simpler case does work as expected:

import gc

import obspy.core.event as ev

event = ev.Event()
print(dict(ev.ResourceIdentifier._parent_id_tree))  # dict with one entry

del event
gc.collect()

print(dict(ev.ResourceIdentifier._parent_id_tree))  # empty dict

In fact, if we count the references, there seem to be many in the default catalog that are being held onto by something (which is why the del isn't working as you expected):

import gc
import sys

import obspy
import obspy.core.event as ev

def print_ref_counts(obj):
    rc = sys.getrefcount(obj)
    # subtract 3 due to temporary reference created in this function/getrefcount
    print(f'obj has {rc - 3} refcounts')

event = ev.Event()
print_ref_counts(event.resource_id._parent_key)  # prints "obj has 1 refcounts"

cat = obspy.read_events()
print_ref_counts(cat[0].resource_id._parent_key)  # prints "obj has 13 refcounts"

What is the deal with the descriptor?

Since there is some logic required to figure out the key for caching the _ResourceKey instances (selecting the object identity or the str/int), the descriptor just lets me put that logic in one place rather than everywhere a value is set on _parent_key or _resource_key. Of course it could be implemented as a getter/setter (which I think is the same thing in the end) but since there are two of them I figured it would be better to have an external descriptor to avoid writing the same code twice.
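
A minimal sketch of that descriptor idea (illustrative names, not the exact ObsPy code); both attributes funnel their assignments through one conversion function:

def _make_key(value):
    # stand-in for the real _ResourceKey lookup
    return value if isinstance(value, (int, str)) else id(value)

class _ResourceKeyDescriptor(object):
    def __init__(self, storage_name):
        self._attr = storage_name  # where the converted key is stored

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__.get(self._attr)

    def __set__(self, instance, value):
        if value is not None:
            value = _make_key(value)
        instance.__dict__[self._attr] = value

class ResourceIdentifierSketch(object):
    _parent_key = _ResourceKeyDescriptor('_parent_key_value')
    _resource_key = _ResourceKeyDescriptor('_resource_key_value')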

@d-chambers
Member Author

Another thing to perhaps do in this PR, or maybe in a new one, is to rework convert_id_to_quakeml_uri to return a copy of the resource_id with the new id rather than modifying it in place. Since the resource_ids are hashed on the id attr we should treat them as immutable.

@krischer
Member

krischer commented Jul 4, 2018

Thanks a bunch for the explanations! I especially did not get the distinction between 1 a) and 1 b) but now everything is clear.

All of this could result in a performance/memory usage hit. In this case I figured it is better to do more with less.

I actually tested this and it is indeed a small but measurable bit (~3%) slower when running test_events.py.

Of course it could be implemented as a getter/setter (which I think is the same thing in the end) but since there are two of them I figured it would be better to have an external descriptor to avoid writing the same code twice.

Yea that's the same thing. Python internally implements properties as descriptors. I do like the descriptor approach as it (as you say) forces a single implementation.

Another thing to perhaps do in this PR, or maybe in a new one, is to rework convert_id_to_quakeml_uri to return a copy of the resource_id with the new id rather than modifying it in place. Since the resource_ids are hashed on the id attr we should treat them as immutable.

👍 for dealing with this in a separate PR. We have to be a bit careful here and maybe raise a few new warnings. Some people might use ResourceIdentifiers as dictionary keys and then write out a QuakeML file, which (in your proposal) would create new objects, and the dictionary keys might thus become meaningless. But let's discuss it elsewhere.
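
To illustrate the hazard (assumed behavior under the proposal, given that resource_ids are hashed on the id attribute):

from obspy.core.event import ResourceIdentifier

rid = ResourceIdentifier('smi:local/a')
lookup = {rid: 'some value'}
# a hypothetical converted copy with a new id no longer matches
# the original dictionary key
converted = ResourceIdentifier('quakeml:local/a')
assert converted not in lookup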

Regarding the references: I investigated a bit more (gc.get_referrers() is actually pretty helpful) and it turns out that all the spurious references are coming from IPython, especially all its magic like the jedi code completion. Just running it in the normal interpreter completely fixes all my issues, and the high refcount in your example comes from internal references (mostly _parent_keys). So all is well.

import gc
import obspy
from obspy.core.event.resourceid import _ResourceKey

cat = obspy.read_events()
print(len(_ResourceKey._singleton_cache))

del cat[0]
gc.collect()
print(len(_ResourceKey._singleton_cache))

del cat
gc.collect()
print(len(_ResourceKey._singleton_cache))

Output:

17
12
0

@d-chambers If you have no objections I would merge this now.

@d-chambers
Member Author

I have no objections, go ahead and merge 🎉

@d-chambers
Member Author

Actually hold off merging for now. I think I found another (small) issue.

@d-chambers
Member Author

Ok, I made a slight change to the logic that governs when a warning is issued and added a test for it. If you approve of the change go ahead and merge.

Thanks again @krischer for all the time you spent reviewing this!

@krischer krischer merged commit a2689ca into master Jul 5, 2018
@krischer krischer deleted the resource_id_refactor branch July 5, 2018 08:21
@krischer
Member

krischer commented Jul 5, 2018

🎉

Thanks @d-chambers!
