Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stronger references in CachedRepresentation #24954

Closed
jdemeyer opened this issue Mar 12, 2018 · 24 comments
Closed

Stronger references in CachedRepresentation #24954

jdemeyer opened this issue Mar 12, 2018 · 24 comments

Comments

@jdemeyer
Copy link

There is some very bad behaviour related to CachedRepresentation caching that I'm observing on #24742 (but this is otherwise unrelated to that ticket):

sage: timeit('MatrixSpace(ZZ,3,3)')
625 loops, best of 3: 117 µs per loop

Now, we try again but we first create a strong reference:

sage: S = MatrixSpace(ZZ,3,3)
sage: timeit('MatrixSpace(ZZ,3,3)')
625 loops, best of 3: 4.13 µs per loop

This is much faster the second time! In the first example, the caching
of CachedRepresentation.__classcall__ is pointless since there is no
strong reference to the entry in the cache, so it gets deleted
immediately whenever the MatrixSpace(ZZ,3,3) is deleted.

This is just the usual Py_DECREF of Python objects, it has nothing to do with the cyclic garbage collector: the behaviour remains the same even
with gc.disable().

This makes me think that we might need to change CachedRepresentation
to use semi-strong references: these would only be deleted by the
cyclic garbage collector but not by a simple Py_DECREF().

Component: misc

Author: Jeroen Demeyer

Branch/Commit: a802f12

Reviewer: Marc Mezzarobba

Issue created by migration from https://trac.sagemath.org/ticket/24954

@jdemeyer jdemeyer added this to the sage-8.2 milestone Mar 12, 2018
@simon-king-jena
Copy link
Member

comment:1

Idea: SemiWeakValueDictionary will initially create a strong reference to each value (say, by means of a class attribute holding a list containing the items), and gc.collect() will be monkey-patched so that it first removes all strong references kept by SemiWeakValueDictionary before doing the cyclic collection.

In that way, the values of SemiWeakValueDictionary would be permanent till the next cyclic garbage collection occurs.

Would that work?

@simon-king-jena
Copy link
Member

comment:2

Proof of concept:

sage: from weakref import WeakValueDictionary
sage: from gc import collect
sage: import gc
sage: class MyThing(object):
....:     def __del__(self):
....:         print "<{}> gone".format(id(self))
....:     def __init__(self, n):
....:         print "new thing for {}".format(n)
....:         
sage: class MyWVD(object):
....:     refs = []
....:     def __init__(self):
....:         self.D = WeakValueDictionary()
....:     def __getitem__(self, k):
....:         return self.D[k]
....:     def __setitem__(self, k, v):
....:         self.D[k] = v
....:         MyWVD.refs.append(v)
....:         
sage: D = MyWVD()
sage: def my_collect(generation=None):
....:     MyWVD.refs = []
....:     if generation is not None:
....:         collect(generation)
....:     else:
....:         collect()
....:         
sage: gc.collect = my_collect
sage: a = D[1] = MyThing(1); del a
new thing for 1
sage: a = D[2] = MyThing(2); del a
new thing for 2
sage: gc.collect()
<140253944715856> gone
<140253944736848> gone

@simon-king-jena
Copy link
Member

comment:3

Timing:

sage: class MyThing(object):
....:     pass
....:         
sage: D1 = MyWVD()
sage: D2 = WeakValueDictionary()
sage: def test1(n):
....:     try:
....:         return D1[n]
....:     except KeyError:
....:         a = D1[n] = MyThing()
....:         return a
....:     
sage: def test2(n):
....:     try:
....:         return D2[n]
....:     except KeyError:
....:         a = D2[n] = MyThing()
....:         return a
....:     
sage: %timeit test1(5)
The slowest run took 54.57 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.14 µs per loop
sage: %timeit test2(5)
The slowest run took 15.23 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.87 µs per loop

vs.

sage: %timeit test1(5); gc.collect()
10 loops, best of 3: 60 ms per loop
sage: %timeit test2(5); gc.collect()
10 loops, best of 3: 60 ms per loop

@embray
Copy link
Contributor

embray commented Mar 12, 2018

comment:4

You can't "just" monkey-patch the gc module, since what you can reach from Python are just Python-level wrappers for lower-level C functions used internally by the Python interpreter. This only changes what happens if one manually calls gc.collect() (in which case there's not much point in monkey-patching).

@simon-king-jena
Copy link
Member

comment:5

Replying to @embray:

You can't "just" monkey-patch the gc module, since what you can reach from Python are just Python-level wrappers for lower-level C functions used internally by the Python interpreter. This only changes what happens if one manually calls gc.collect() (in which case there's not much point in monkey-patching).

You're right. Hm. Then one could perhaps construct a guardian that will not be collected unless a cyclic garbage collection happens, with a __del__ method that triggers weakening of the SemiWeakValueDictionary. I'll try to construct it...

@jdemeyer
Copy link
Author

comment:6

I see two possible implementations, both quite simple:

(A) Keep strong references to the last 100 (or whatever number) values inserted in a WeakValueDictionary. Basically a Python list lastvalues with a wrapping-aroud counter i such that, whenever an item is inserted, it is also inserted as lastvalues[i]. This requires changing only WeakValueDictionary.

(B) Keep a strong reference from the value to itself (self.__self = self) forcing the object to be part of a reference cycle. This way, it can be deleted only by the garbage collector. This requires changing the values, i.e. CachedRepresentation.

I'll try to implement (A).

@simon-king-jena
Copy link
Member

comment:7

Replying to @jdemeyer:

I see two possible implementations, both quite simple:

(A) Keep strong references to the last 100 (or whatever number) values inserted in a WeakValueDictionary. Basically a Python list lastvalues with a wrapping-aroud counter i such that, whenever an item is inserted, it is also inserted as lastvalues[i]. This requires changing only WeakValueDictionary.

(B) Keep a strong reference from the value to itself (self.__self = self) forcing the object to be part of a reference cycle. This way, it can be deleted only by the garbage collector. This requires changing the values, i.e. CachedRepresentation.

I'll try to implement (A).

I will try to implement (C), which is:

(C) create a "guardian" with a reference to itself (so that it will only be deleted during cyclic garbage collection) and create a weak reference with callback from the class SemiWeakValueDictionary to the guardian. The callback function will both restore the weak reference to the guardian and clears the strong references to the values -- and apparently the callback function is only called when a cyclic collection happens.

@jdemeyer
Copy link
Author

comment:8

So, in practice, the main difference between (A) and (C) is when are the strong references cleared? With (A), it would happen after 100 objects have been inserted in a WeakValueDictionary and with (C) it would happen upon garbage collection. I have no idea what works best.

@simon-king-jena
Copy link
Member

comment:9

I thought (C) would work as follows:

sage: from weakref import WeakValueDictionary, ref
sage: class Guardian(object):
....:     def __init__(self):
....:         self._self = self
....:         
sage: class SWVD(object):
....:     @staticmethod
....:     def collect(*foo):
....:         print "collect"
....:         SWVD.guard = ref(Guardian(), cls.collect)
....:         SWVD.refs = []
....:     refs = []
....:     guard = ref(Guardian(), collect)
....:     def __init__(self):
....:         self.D = WeakValueDictionary()
....:     def __getitem__(self, k):
....:         return self.D[k]
....:     def __setitem__(self, k, v):
....:         self.D[k] = v
....:         SWVD.refs.append(v)
....:         
sage: class MyThing(object):
....:     def __del__(self):
....:         print "<{}> gone".format(id(self))
....:     def __init__(self, n):
....:         print "new thing for {}".format(n)
....:         
sage: D = SWVD()
sage: a = D[2] = MyThing(2); del a
new thing for 2
sage: a = D[2] = MyThing(2); del a
new thing for 2
sage: a = D[2] = MyThing(2); del a
new thing for 2

That's fine. But something went wrong:

sage: import gc
sage: gc.collect()
Exception TypeError: "'staticmethod' object is not callable" in <staticmethod object at 0x7f8cda18b2f0> ignored
58

So, why is the static method not callable? What requirements are made for the callback of a weak reference?

@simon-king-jena
Copy link
Member

comment:10

Ouch, I found the problem: An unreferenced variable "cls" in the callback function that shouldn't belong there...

Nonetheless, so far, I don't get it to work.

@simon-king-jena
Copy link
Member

comment:11

Got it.

Problem: During creation of the class SWVD, I wanted to refer to an attribute of that class. Apparently it didn't work. So, now, I am setting the class attribute during initialisation of the first instance of that class.

sage: from weakref import WeakValueDictionary, ref
sage: import gc
sage: class Guardian(object):
....:     def __init__(self):
....:         self._self = self
....:         
sage: class SWVD(object):
....:     @classmethod
....:     def collect(cls, *foo):
....:         print "collect"
....:         cls.guard = ref(Guardian(), cls.collect)
....:         cls.refs = []
....:     refs = []
....:     guard = None
....:     def __init__(self):
....:         if self.__class__.guard is None:
....:             self.__class__guard = ref(Guardian(), self.__class__.collect)
....:         self.D = WeakValueDictionary()
....:     def __getitem__(self, k):
....:         return self.D[k]
....:     def __setitem__(self, k, v):
....:         self.D[k] = v
....:         SWVD.refs.append(v)
....:         
sage: class MyThing(object):
....:     def __del__(self):
....:         print "<{}> gone".format(id(self))
....:     def __init__(self, n):
....:         print "new thing for {}".format(n)
....:         
sage: D = SWVD()

Elements of a SemiWeakValueDictionary aren't immediately deleted:

sage: a = D[2] = MyThing(2); del a
new thing for 2
sage: a = D[2] = MyThing(2); del a
new thing for 2
sage: a = D[2] = MyThing(2); del a
new thing for 2

But they are deleted when a garbage collection happens:

sage: gc.collect()
collect
<139776980263696> gone
<139776980313808> gone
<139776980313104> gone
71

I guess the above would basically work, although I would prefer to define SWVD.guard during creation of the class and not during creation of its first instance.

Edit: I also tested a = D[2] = MyThing(2); del a in a loop, and indeed a "spontaneous" garbage collection does result in the deletion of the dictionary values, just as it should. So, this time I am not just touching the Python layer of the gc module...

@jdemeyer
Copy link
Author

@jdemeyer
Copy link
Author

comment:13

This is my proof-of-concept implementation. It simply adds strong references to the last 64 objects added to the CachedWeakValueDictionary.


New commits:

99717ebCachedWeakValueDictionary with strong references to last values

@jdemeyer
Copy link
Author

Commit: 99717eb

@vbraun
Copy link
Member

vbraun commented Mar 12, 2018

comment:14

Thats the obvious solution, keep a finite-length lru cache around.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 12, 2018

Changed commit from 99717eb to bd9e368

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 12, 2018

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

bd9e368CachedWeakValueDictionary with strong references to last values

@jdemeyer
Copy link
Author

comment:17

Replying to @vbraun:

Thats the obvious solution, keep a finite-length lru cache around.

I'm not using a LRU cache, but a much simpler FIFO.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 15, 2018

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

a802f12CachedWeakValueDictionary with strong references to last values

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 15, 2018

Changed commit from bd9e368 to a802f12

@jdemeyer
Copy link
Author

comment:20

Fixed a few test failures.

@mezzarobba
Copy link
Member

comment:21

This solves the issue (random slowdowns in the test suite of ore_algebra) that initially led me to the examples mentioned on #24742. And a way to cache the results of the last few calls to a function is something I was missing for other reasons as well.

@mezzarobba
Copy link
Member

Reviewer: Marc Mezzarobba

@vbraun
Copy link
Member

vbraun commented May 12, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants