Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

more efficient core #5007

Open
vks opened this issue Apr 20, 2010 · 10 comments
Open

more efficient core #5007

vks opened this issue Apr 20, 2010 · 10 comments

Comments

@vks
Copy link
Contributor

vks commented Apr 20, 2010

This should result in a considerable speed improvement. I think it could be
done using a dict (or set) for Add() and commutative Mul() or a list for
noncummutative Mul() (maybe another type should be created for this). 

The main motivation is that we can reuse already created objects. This
would annihilate the current overhead of recreating the whole symbolic tree
after any operation. Example:

Add(<1000 args>) + x

currently recreates an Add() with 1001 args for immutability's sake. This
could be done in-place, just adding x.

Quoting Pearu from issue 3461 :

"""
1) __hash__ needs to use include class name to hash computation, otherwise
you'll get the same hash for expressions like 'x*y', 'x+y'. Also, I would
sort Add.items() before computing hash, this ensures that one always
gets the same hash value. Saving hash value is a nice trick. I wonder
how this compares to the code in sandbox/core?
2) using dict as Add base makes Add mutable, there should be a big warning
about not changing Add objects in place.
3) in the current core Add.flatten, Mul.flatten actually use dict-s to
canonize expressions.
4) the order of dict keys may be variable for the same expression, not sure
if it will be a problem in other places than __hash__...
"""

So Add() already uses dicts internally, just to make tuples after any
operation instead of reusing them. What a waste.

Fabian told me that ginac also uses dicts internally. And he also thinks
that this is essential to become comparative to other CAS. Opposed to him,
I think that this should not be to hard to implement. :) But a painful
amount of test-fixing is probably inevitably.

We need to think how to prevent people from doing nasty things with the
internal mutable data structure without regenerating the hash. A thin
wrapper could solve this. Or we just force args() to return a tuple (which
would offer great backward compatibility), but I think we should expose
more flexibility in the long run. Maybe only as a private attribute for
those who know what they are doing?

Original issue for #5007: http://code.google.com/p/sympy/issues/detail?id=1908
Original author: https://code.google.com/u/Vinzent.Steinberg@gmail.com/
Referenced issues: #3461

@vks
Copy link
Contributor Author

vks commented Apr 19, 2010

See also http://code.google.com/p/sympycore/ , which probably implemented this.

Original comment: http://code.google.com/p/sympy/issues/detail?id=1908#c1
Original author: https://code.google.com/u/Vinzent.Steinberg@gmail.com/

@sympy-issue-migrator
Copy link

You are very brave by opening this issue :-)

BTW, ginac uses lists in trees (which is similar to dicts, but not the same): http://www.ginac.de/tutorial/Internal-representation-of-products-and-sums.html

Original comment: http://code.google.com/p/sympy/issues/detail?id=1908#c2
Original author: https://code.google.com/u/103948515893177094863/

@rlamy
Copy link
Member

rlamy commented Apr 20, 2010

You're mixing two separate issues: efficient implementation of Add/Mul and mutability. 

The first issue has already been mostly solved in the (easy) case of boolean
operations, look at the code for LatticeOp in core/operations.py. Your example
doesn't require mutability since the result of Add(...) + x is necessarily a copy
("+" doesn't mutate its arguments).

Mutability would be required for making things like Add(...) += x efficient.

Original comment: http://code.google.com/p/sympy/issues/detail?id=1908#c3
Original author: https://code.google.com/u/101272611947379421629/

@vks
Copy link
Contributor Author

vks commented Apr 20, 2010

I don't think it would be only useful for '+='-style operations. If we have

Add(x, y, z) + Add(x, a)

and would use dictionaries instead of tuples for args, we could merge the the two
dicts more efficiently (reusing the first or second add) than by just recreating a
tuple, if I understand it correctly.

But please feel free to clarify how an efficient implementation can be achieved
without mutability.

Original comment: http://code.google.com/p/sympy/issues/detail?id=1908#c4
Original author: https://code.google.com/u/Vinzent.Steinberg@gmail.com/

@rlamy
Copy link
Member

rlamy commented Apr 20, 2010

Of course, an efficient implementation has to use mutable structures internally. But
this is quite unrelated to the mutability of the object itself. To get an "immutable"
object, you only need a __hash__() and to ensure that the equality relation is
constant wrt. time (i.e. if a == b at one time, then it is true at all times).

IIRC, the sympycore solution is to consider objects initially mutable until their
hash is computed, and immutable afterwards.

Original comment: http://code.google.com/p/sympy/issues/detail?id=1908#c5
Original author: https://code.google.com/u/101272611947379421629/

@asmeurer
Copy link
Member

I don't understand how you plan on achieving the efficient implementation along with mutability.  The 
dictionaries that you are talking about have SymPy objects as their keys.  For example, x*y + 2*z is represented 
as {Mul(x, y):1, z:2} in the Add. They have to be immutable.  Are you suggesting that it would be somehow 
possible to rehash those objects once they are in a dictionary somewhere?  

By the way, when I met Ondrej last summer and hacked on the core a bit, we essentially came to the conclusion 
that Add and Mul should just use the dictionary straight.  The work is at http://github.com/certik/sympyx/ , and 
the bits from last summer are on the handler branch (we also were looking at having each class know how to 
combine itself wrt to Add, Mul, etc.).  I don't know to what extent it uses dictionaries more than the current Add.  
I do know that Ondrej was telling me that he tried using glib as a replacement to Python dicts in Cython, only to 
find that the Python dicts were faster.  So this is definitely an efficient implementation if we can get it to work.  

Ondrej, do you remember any of this?

**Cc:** ondrej.certik  

Original comment: http://code.google.com/p/sympy/issues/detail?id=1908#c6
Original author: https://code.google.com/u/asmeurer@gmail.com/

@vks
Copy link
Contributor Author

vks commented Apr 21, 2010

sympyx seems to be what we want. Do we need to get rid of the old assumptions before 
merging it?

**Summary:** more efficient core  

Original comment: http://code.google.com/p/sympy/issues/detail?id=1908#c7
Original author: https://code.google.com/u/Vinzent.Steinberg@gmail.com/

@asmeurer
Copy link
Member

Yeah, that was the other conclusion we came to.

Original comment: http://code.google.com/p/sympy/issues/detail?id=1908#c8
Original author: https://code.google.com/u/asmeurer@gmail.com/

@vks
Copy link
Contributor Author

vks commented Apr 21, 2010

**Blockedon:** 4983  

Referenced issues: #4983
Original comment: http://code.google.com/p/sympy/issues/detail?id=1908#c9
Original author: https://code.google.com/u/Vinzent.Steinberg@gmail.com/

@asmeurer
Copy link
Member

**Status:** Valid  

Original comment: http://code.google.com/p/sympy/issues/detail?id=1908#c10
Original author: https://code.google.com/u/asmeurer@gmail.com/

@asmeurer
Copy link
Member

**Blocking:** 5040  

Referenced issues: #5040
Original comment: http://code.google.com/p/sympy/issues/detail?id=1908#c11
Original author: https://code.google.com/u/asmeurer@gmail.com/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants