Memory problems with the cache #6321

Closed
sympy-issue-migrator opened this issue Apr 13, 2012 · 19 comments

@sympy-issue-migrator

When creating a lot of symbols, the memory assigned to them is not freed once there are no longer any references to them. The following program uses a lot of memory (more than 1.7 GB), although it only holds a reference to one symbol at a time.

```python
import sympy
n = 400000
for i in xrange(n):
    u = sympy.symbols('Re%d' % i, each_char=False)
    # The products below are computed and immediately discarded.
    4.3*u
    7.8*u
    6.3*u
    9.12*u
    14.3*u
    17.48*u
```

The results of the last six lines are not stored anywhere; nonetheless, if you remove those lines, the program uses less memory.


I am using Debian squeeze with Python 2.6.6 and SymPy 0.6.7-1.1, and I also tried it on a Mac with Python 2.7.2 and SymPy 0.7.1.

Best, Pablo.

Original issue for #6321: http://code.google.com/p/sympy/issues/detail?id=3222
Original author: https://code.google.com/u/114923721616334512865/

@asmeurer
Member

This is probably the cache.  Try setting the environment variable SYMPY_USE_CACHE=no and see if it uses less memory.  If so, you can fix it by manually clearing the cache when you know you won't need it (see https://github.com/sympy/sympy/wiki/faq , "How do I clear the cache?").

**Labels:** Caching  

Original comment: http://code.google.com/p/sympy/issues/detail?id=3222#c1
Original author: https://code.google.com/u/asmeurer@gmail.com/

@sympy-issue-migrator
Author

Exactly, it was the cache. I tried SYMPY_USE_CACHE=no and there is no longer any problem. Thanks.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3222#c2
Original author: https://code.google.com/u/114923721616334512865/

@asmeurer
Member

Great.  Like I said, you'll probably want to use judicious calls to clear_cache() rather than just disabling it completely, as some parts of SymPy can be quite slow without the cache.

I'm going to leave this issue open.  There are two issues with the cache here.  One is that it keeps references to objects alive, even if there are no other references.  We could solve this by using weakrefs.  
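A minimal sketch of the weakref idea, using the standard library's `weakref.WeakValueDictionary` (the names `weak_cached`, `Expr`, and `mul` are hypothetical; this is not SymPy code). An entry vanishes as soon as nothing outside the cache refers to the result. The values must support weak references, which instances of ordinary classes do but plain `int` or `tuple` objects do not.

```python
import weakref


class Expr:
    """Stand-in for a symbolic expression object (weak-referenceable)."""
    def __init__(self, label):
        self.label = label


def weak_cached(func):
    """Memoize func, holding results only through weak references."""
    cache = weakref.WeakValueDictionary()

    def wrapper(*args):
        result = cache.get(args)
        if result is None:          # miss, or the old result was collected
            result = func(*args)
            cache[args] = result
        return result

    wrapper.cache = cache           # exposed for inspection
    return wrapper


@weak_cached
def mul(coeff, name):
    return Expr(f"{coeff}*{name}")


a = mul(4.3, "Re0")
assert mul(4.3, "Re0") is a     # hit while `a` is alive
print(len(mul.cache))           # 1
del a                           # last strong reference gone
print(len(mul.cache))           # 0 on CPython (freed immediately)
```

The trade-off is that a weak cache only helps while a result is still referenced somewhere; recomputing a result after it has been dropped is a cache miss again.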

The second is that the size of the cache grows without bound.  We should consider putting a cap on the cache size, and performing some kind of cleanup if it gets too large.
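One simple form of "a cap plus cleanup" is to count entries and wipe the cache whenever the count reaches a limit, using the entry count as a cheap proxy for memory use. The `BoundedCache` class and its `max_entries` default of 1000 are hypothetical choices for illustration.

```python
class BoundedCache:
    """Memo table that is cleared entirely once it reaches max_entries."""

    def __init__(self, max_entries=1000):
        self.max_entries = max_entries
        self.data = {}
        self.clears = 0

    def get_or_compute(self, key, compute):
        try:
            return self.data[key]
        except KeyError:
            pass
        value = compute()
        if len(self.data) >= self.max_entries:
            # Cleanup: drop everything rather than track usage per entry.
            self.data.clear()
            self.clears += 1
        self.data[key] = value
        return value


cache = BoundedCache(max_entries=3)
for i in range(10):
    # compute() runs inside the call, so the lambda sees the current i.
    cache.get_or_compute(i, lambda: i * i)
print(len(cache.data), cache.clears)   # 1 3
```

Wiping everything is crude, since frequently used entries are lost along with stale ones; an LRU policy, as proposed later in the thread, evicts selectively.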

**Summary:** Memory problems with the cache  
**Status:** Valid  

Original comment: http://code.google.com/p/sympy/issues/detail?id=3222#c3
Original author: https://code.google.com/u/asmeurer@gmail.com/

@sympy-issue-migrator
Author

The other problem with the cache is that people who don't know about it (or didn't know, like me) may spend a lot of time running profilers and blaming Python (or their own code) for memory leaks.

I discovered SymPy's "cache" concept a few minutes ago, and it solved problems I had been having for the last two months (my computations would eat all the memory, and I could not locate the leak with profilers).

Setting "SYMPY_USE_CACHE=no" made the memory issues disappear but slowed my computations down 50x; calling clear_cache() periodically fixed the memory issues at no performance cost.

Ideally, the cache would stay "on" by default (as it is now) but use the available memory wisely. Who knows, maybe it could auto-clear on each garbage collection by default, with users able to change this behavior via something like "SYMPY_USE_CACHE=persistent". Or it could clear itself when available RAM is nearly used up, as long as that can be detected reliably.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3222#c4
Original author: https://code.google.com/u/106540128096796395835/

@asmeurer
Member

I agree that this should be documented better.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3222#c5
Original author: https://code.google.com/u/asmeurer@gmail.com/

@sympy-issue-migrator
Author

Documentation is one thing, but there is something more important: the cache mechanism can actually hurt SymPy, because some people end up concluding that "SymPy crashes" or "SymPy is not able to solve my problem", all because of the cache.

This issue is not limited to big, complicated problems; it also affects simple problems in which SymPy functions are called many times. Some kind of adaptive mechanism, like the ones suggested in previous posts, is highly recommended so that a hidden part of SymPy never hurts SymPy itself.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3222#c6
Original author: https://code.google.com/u/117410852413913309498/

@asmeurer
Member

I completely agree. There was some discussion on the mailing list about this issue a month or two ago. The idea is to make the cache more local, so that it does its job for individual computations, but is cleared in between. Unfortunately, Tom, the person who was discussing it, was not accepted for GSoC (he was accepted by another organization), so this may not be fixed anytime soon.  You can read his GSoC proposal https://github.com/sympy/sympy/wiki/GSOC-2013-Application-Tom-Bachmann:-Removing-the-old-assumptions-module to see what the ideas were. 

Of course, a much more tractable solution for right now would be to automatically clear the cache when it reaches a certain size. Do you know how to efficiently and accurately check how much memory is used by a Python object?

Original comment: http://code.google.com/p/sympy/issues/detail?id=3222#c7
Original author: https://code.google.com/u/asmeurer@gmail.com/

@sympy-issue-migrator
Author

No, I don't have enough experience in Python (yet :-) ). Some approaches are discussed at http://stackoverflow.com/questions/33978/find-out-how-much-memory-is-being-used-by-an-object-in-python

Checking the number of objects in the cache, though, could be quick and may be a reasonable approximation.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3222#c8
Original author: https://code.google.com/u/117410852413913309498/

@asmeurer
Member

Oh wow, that's too complicated. Though actually, we only care about the memory used by objects with a refcount of 1, i.e. objects that nothing but the cache refers to.

I guess just counting the number of objects should work.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3222#c9
Original author: https://code.google.com/u/asmeurer@gmail.com/

@sympy-issue-migrator
Author

For Python 2.6+, there is `sys.getsizeof`:

```python
>>> import sys
>>> sys.getsizeof(2)
12
>>> sys.getsizeof(2.0)
16
>>> sys.getsizeof([1,2,3])
48
>>> sys.getsizeof([1,2,3.0])
48
>>> sys.getsizeof([1,2,3,"a"])
52
>>> sys.getsizeof([1,2,3,"abcd"])
52
>>> sys.getsizeof("a")
22
>>> sys.getsizeof("abcd")
25
```

Original comment: http://code.google.com/p/sympy/issues/detail?id=3222#c10
Original author: https://code.google.com/u/117410852413913309498/

@asmeurer
Member

But does it work on objects? That's what's stored in the cache: SymPy objects.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3222#c11
Original author: https://code.google.com/u/asmeurer@gmail.com/

@sympy-issue-migrator
Author

It looks like a "shallow" sizeof. But even if it worked with objects, IMO this is not the right way to go. How much memory would you expect the cache to be allowed to use? 100 KB? 10 MB? 200 MB? It all depends on the system, how many SymPy instances are running at the same time, etc. Unless you *know* that the cache does not help much once it grows beyond X megabytes.
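The shallowness is easy to check: `sys.getsizeof` on a container counts the container's own storage (for a list, the header and the array of pointers) but not the objects it points to. A rough deep measure has to walk references itself; the `deep_sizeof` helper below is a hypothetical sketch that follows only a few built-in container types and instance `__dict__`s, counting each object once.

```python
import sys


def deep_sizeof(obj, seen=None):
    """Approximate total size of obj and everything reachable through
    lists, tuples, sets, dicts, and instance __dict__s (sketch only)."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # count shared objects only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for k, v in obj.items():
            size += deep_sizeof(k, seen) + deep_sizeof(v, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_sizeof(item, seen)
    if hasattr(obj, "__dict__"):
        size += deep_sizeof(vars(obj), seen)
    return size


data = ["x" * 1000, "y" * 1000]
print(sys.getsizeof(data))   # list only: excludes the two strings
print(deep_sizeof(data))     # list plus both 1000-character strings
```

Even this is only an estimate, and walking a large cache on every insertion would be slow, which is one reason counting entries is the cheaper proxy.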

So the change to the cache should be either conceptual (like the "make it local" idea, "clear on every garbage collection", or "keep only the 100 most recently used items") or adaptive ("clear the cache when 10% of available memory is used" or "clear when there is no more than 100 MB left"), or a hybrid of the two.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3222#c12
Original author: https://code.google.com/u/117410852413913309498/

@sympy-issue-migrator
Author

A simple yet efficient idea for the cache is to use a least-recently-used (LRU) list. Every cache "hit" moves an item to the top of the list. When a new item is added to the top of the list (due to a cache "miss") and the cache limit (e.g. 1000 items) is exceeded, the bottommost item is removed.
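The LRU scheme maps naturally onto `collections.OrderedDict`: `move_to_end` marks a hit as most recently used, and `popitem(last=False)` evicts the oldest entry. The `LRUCache` class name and the 1000-item default are illustrative assumptions. For memoizing a single function, the standard library's `functools.lru_cache(maxsize=...)` already implements this policy.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache that evicts the least recently used entry."""

    def __init__(self, limit=1000):
        self.limit = limit
        self._data = OrderedDict()   # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._data:
            return default           # cache miss
        self._data.move_to_end(key)  # hit: move to the "top" of the list
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.limit:
            self._data.popitem(last=False)   # drop the bottommost item

    def __len__(self):
        return len(self._data)


cache = LRUCache(limit=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")          # "a" becomes most recently used
cache.put("c", 3)       # exceeds the limit, so "b" is evicted
print(list(cache._data))   # ['a', 'c']
```

Both hits and evictions are O(1), so the bookkeeping adds little overhead per lookup.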

Original comment: http://code.google.com/p/sympy/issues/detail?id=3222#c13
Original author: https://code.google.com/u/117410852413913309498/

@asmeurer
Member

asmeurer commented Jul 2, 2013

Yes, as I understand it, LRU is quite effective. We should just use a length limit, because that is the easiest to implement and is fast (unless you want to try something more complex).

Original comment: http://code.google.com/p/sympy/issues/detail?id=3222#c14
Original author: https://code.google.com/u/asmeurer@gmail.com/

@asmeurer
Member

asmeurer commented Jul 7, 2013

By the way, the cache actually improves memory usage in some cases, because it prevents multiple instances of the same object from being created. We should probably move to more structured ways of doing this, though, like function-local caches and SingletonRegistry.
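The deduplication effect works like hash-consing: constructing an object equal to an existing one returns the existing instance. A minimal sketch of a per-class intern table follows; `Sym` and `_instances` are hypothetical names, not SymPy's `SingletonRegistry` or cache. Using a `WeakValueDictionary` for the table means shared instances are still freed once nothing else uses them, combining this benefit with the weakref idea raised earlier in the thread.

```python
import weakref


class Sym:
    """Interned symbol: equal names share one instance (sketch)."""
    _instances = weakref.WeakValueDictionary()

    def __new__(cls, name):
        obj = cls._instances.get(name)
        if obj is None:                      # first request for this name
            obj = super().__new__(cls)
            obj.name = name
            cls._instances[name] = obj
        return obj


x1 = Sym("x")
x2 = Sym("x")
print(x1 is x2, len(Sym._instances))   # True 1
del x1, x2
print(len(Sym._instances))             # 0 on CPython: no duplicates, no leak
```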

Original comment: http://code.google.com/p/sympy/issues/detail?id=3222#c15
Original author: https://code.google.com/u/asmeurer@gmail.com/

@asmeurer
Member

asmeurer commented Mar 5, 2014

We have moved issues to GitHub https://github.com/sympy/sympy/issues .

**Labels:** Restrict-AddIssueComment-Commit  

Original comment: http://code.google.com/p/sympy/issues/detail?id=3222#c16
Original author: https://code.google.com/u/asmeurer@gmail.com/

@pbrady
Member

pbrady commented Aug 20, 2014

Fixed with #7464. Closing in 24 hrs if no objections.

@asmeurer
Member

Yes, please open a new issue if you still have memory problems with the new cache.
