Fix #5072: hash(Integer) should return the int #2973

Merged
merged 2 commits into from

3 participants

@skirpichev

fix #5072

@smichr, please take a look at the changes in add.py. _addsort and _mulsort were introduced by 8942676. Why wasn't cmp_to_key(Basic.compare) used as the key in both places?

@asmeurer
Owner

So the issue noted on the issue page no longer appears?

@skirpichev

Yes. The test in test_nseries.py works.

@smichr
Collaborator

I wasn't involved with the decision to use hashes; I just tried to make the change in one place so that wherever it was being used we would be consistent. The comment says that using hashes was fast, so I imagine that things will have slowed down now that you use Basic.compare.

@smichr
Collaborator

I suspect the better change would be to use hash(self.p) but to leave the sorting alone.

@skirpichev

The comment says that using hashes was fast, so I imagine that things will have slowed down now that you use Basic.compare.

The problem is that sorting by hash does not guarantee a well-defined canonical order, because hash collisions can occur in general. That's why we have this problem (https://code.google.com/p/sympy/issues/detail?id=1973#c3) with hash, but not with Basic.compare.

I suspect the better change would be to use hash(self.p) but to leave the sorting alone.

That's impossible. The only question is how to change the sorting function.
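
To make the collision point concrete, here is a minimal sketch in plain Python (no SymPy). It assumes CPython, where hash(-1) and hash(-2) are both -2, and uses a hypothetical three-way compare function as a stand-in for Basic.compare. Sorting by hash leaves colliding items in whatever order they arrived, while a comparison-based key gives the same order for every input permutation.

```python
from functools import cmp_to_key

def compare(a, b):
    # Hypothetical stand-in for Basic.compare: returns -1, 0 or 1.
    return (a > b) - (a < b)

def sort_by_hash(items):
    # Items with equal hashes keep their input order (list sorting is stable).
    return sorted(items, key=hash)

def sort_by_compare(items):
    # A total order on the items themselves, independent of input order.
    return sorted(items, key=cmp_to_key(compare))

# On CPython, -1 and -2 collide: hash(-1) == hash(-2) == -2.
print(hash(-1) == hash(-2))
print(sort_by_hash([-1, -2]), sort_by_hash([-2, -1]))
print(sort_by_compare([-1, -2]), sort_by_compare([-2, -1]))
```

The hash-based sort returns two different "canonical" orders for the same pair of values; the comparison-based sort does not.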

@smichr
Collaborator

Maybe use

def _addsort(args):
    # in-place sorting of args

    # Currently we sort things using hashes, as it is quite fast. A better
    # solution is not to sort things at all - but this needs some more
    # fixing.
    from sympy.core.compatibility import cmp_to_key, ordered
    # Named sort_key rather than hash so the builtin hash() is not shadowed.
    sort_key = cmp_to_key(lambda x, y: 0 if x == y else (1 if tuple(ordered([x, y], warn=True))[0] == y else -1))
    args.sort(key=sort_key)

then you'll never worry about hash collisions again since it will tell you if there was a collision.

I made this change and ran the core suite and no tests failed (though wester was still running).

@skirpichev

then you'll never worry about hash collisions again since it will tell you if there was a collision.

I'm not sure I understand you here. Does this code work in the presence of hash collisions?

If so, is it faster than Basic.compare? Should we change both places (i.e., mul.py too)?

BTW, can we use lambda x: x.sort_key() as a key? (This seems to be working on our test suite as well.)

@asmeurer
Owner

BTW, can we use lambda x: x.sort_key() as a key? (This seems to be working on our test suite as well.)

That's what default_sort_key was. Somehow it got more complicated, though.

@smichr
Collaborator

You can check the speed of ordered -- when things are more complicated than a singleton it will likely be faster since it sorts on the basis of the number of nodes, first. Then only for ties it applies the additional key. This is all discussed in its docstring, I believe. I think it would be fine to use for Mul, too. In either place, however, there may be changes to tests as a result of new orderings.
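
The idea of sorting on a cheap primary key first and using a second key only to break ties can be sketched as follows. This is an illustration only, not the implementation of ordered: nested tuples stand in for expression trees, count_nodes and two_stage_key are hypothetical helpers, and unlike a lazy tie-breaker this sketch computes both parts of the key for every item.

```python
def count_nodes(expr):
    # Nested tuples stand in for expression trees; every element,
    # including the operator name, counts as one node.
    if isinstance(expr, tuple):
        return 1 + sum(count_nodes(arg) for arg in expr)
    return 1

def two_stage_key(expr):
    # Primary key: node count. Secondary key: repr(), which only decides
    # the order between items whose node counts are equal.
    return (count_nodes(expr), repr(expr))

exprs = [("add", "x", ("mul", "y", "z")), "y", ("mul", "x", "y"), "x"]
ordered_exprs = sorted(exprs, key=two_stage_key)
print(ordered_exprs)
```

Here the two single-node items tie on the node count and are separated by the secondary key; the larger trees are placed by size alone.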

@skirpichev

@smichr, what do you think about using default_sort_key? (last commit)

Btw, I can change mul.py as well, but there are subtle failures in the sympy/solvers/ode.py docstrings with hint='linear_coefficients'. Could you take a look?

@skirpichev skirpichev added the Core label
@smichr
Collaborator

It fails because it doesn't collect on the derivative as master does:

>>> var('x');f=Function('f')
x
>>>
>>>
>>> q=x*Derivative(f(x), x) - 6*x + f(x)*Derivative(f(x), x) + f(x)
>>> collect(q,f(x).diff(x))
x*Derivative(f(x), x) - 6*x + (Derivative(f(x), x) + 1)*f(x)

instead of (in master)

>>> collect(q,f(x).diff(x))
-6*x + (x + f(x))*Derivative(f(x), x) + f(x)

The fix is to use exact=True, I believe. It is used in other places where f or df are being used as the collecting terms. (That change should probably be made anyway.)

@smichr
Collaborator

I didn't set up a separate PR for this, but there are 3 commits in my ode branch that you might want to use. Modifying Dummy's sort_key is important (especially if ordered is going to be used). And an anecdotal observation about using default_sort_key vs. ordered: the latter runs the core tests in 306 seconds, while the former took 338.

@skirpichev

@smichr, thank you. Now, some problems (e.g., for ode docstrings) are solved in my 1973-integer-hash-DRAFT branch. Here is test run.

@smichr
Collaborator

The first two expressions involving a and k are the same as shown by unrad; the last one involving theta simply has arguments (as expected) in a different order but is otherwise identical.

@skirpichev

Now, there should be only failures from meijerint module:
https://travis-ci.org/skirpichev/sympy/builds/20123489

@smichr
Collaborator

Just a different order of args, I suppose, for the meijer failures?

Are you going to add the sort_key commit for Dummy?

@smichr
Collaborator

Never mind about the Dummy sort key; I opened another pull request.

@skirpichev

Just a different order of args, I suppose, for the meijer failures?

Apparently, yes. But I'm not sure where exactly it was broken.

@smichr
Collaborator

I assume it's just in the new sorting of Mul args. So just put the new expression in the test so it passes. (Or am I misunderstanding something?)

@smichr
Collaborator

If I replace default_sort_key with the ordered key I gave you, the first two failures pass but the line 460 failure does not.

If I just change the Mul sorting back to the way it is in master then it works. If I change the Mul sorting to use hash then it passes but the test at line 481 in test_transforms fails.

Maybe somewhere an assumption is made that a certain argument always appears last but it no longer does. We already know that the non-commutative terms are always placed at the end and the Rational is always placed first. My guess is that somewhere an assumption about the last non-commutative term is being made which is no longer true.

@smichr
Collaborator

The meijerint algorithm(s) don't try to obtain the optimal solution so it is expected that different results will be obtained. So I think it's ok to simply change the tests. (You changed a trig test to tan(x)**2/2 in your WIP-1973 branch and that's not correct.) Please see my 1973a branch for a build where I change how the mul args are handled; see 1973 for an almost working version that uses ordered (but there are some interesting test failures there, too, https://travis-ci.org/smichr/sympy/builds/20335796 ).

@skirpichev

You changed a trig test to tan(x)**2/2 in your WIP-1973 branch and that's not correct.

It looks like mathematically correct output (same value, up to a constant term).
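
As a worked illustration of "up to a constant term": the thread does not show the previously expected value, so assume, purely hypothetically, that it was 1/(2*cos(x)**2). Then the two forms differ only by the constant 1/2 and have the same derivative:

```latex
\frac{1}{2\cos^{2}x} = \frac{\sec^{2}x}{2} = \frac{1 + \tan^{2}x}{2}
                     = \frac{\tan^{2}x}{2} + \frac{1}{2},
\qquad
\frac{d}{dx}\,\frac{\tan^{2}x}{2} = \tan x\,\sec^{2}x
                                  = \frac{d}{dx}\,\frac{\sec^{2}x}{2}.
```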

too, https://travis-ci.org/smichr/sympy/builds/20335796 ).

Try to rebase your branch.

@smichr
Collaborator

correct output

Hmm, so it is!

rebase your branch

I'm giving up on this. My 1973a branch has some docstring and minor code edits that you can feel free to take. (This is a rabbit chase that's taking me too far afield. As you said about the meijerint failures, "it's not so simple".)

@skirpichev skirpichev changed the title from Fix issue 1973: hash(Integer) should return the int to Fix #5072: hash(Integer) should return the int
@skirpichev

Ok. Any objections to merging these two commits? I have not observed any performance penalty yet.

Everything else is in my 1973-integer-hash-DRAFT branch.

@smichr
Collaborator

Tests pass. This looks like a good step forward. It's in.

@smichr smichr merged commit 1cb4fab into sympy:master
@skirpichev skirpichev deleted the unknown repository branch
12 sympy/core/add.py
@@ -2,8 +2,8 @@
 from collections import defaultdict
-from sympy.core.core import C
-from sympy.core.compatibility import reduce, is_sequence
+from sympy.core.basic import C, Basic
+from sympy.core.compatibility import cmp_to_key, reduce, is_sequence
 from sympy.core.singleton import S
 from sympy.core.operations import AssocOp
 from sympy.core.cache import cacheit
@@ -11,13 +11,11 @@
 from sympy.core.expr import Expr
+# Key for sorting commutative args in canonical order
+_args_sortkey = cmp_to_key(Basic.compare)
 def _addsort(args):
     # in-place sorting of args
-
-    # Currently we sort things using hashes, as it is quite fast. A better
-    # solution is not to sort things at all - but this needs some more
-    # fixing.
-    args.sort(key=hash)
+    args.sort(key=_args_sortkey)
 def _unevaluated_Add(*args):
2  sympy/core/numbers.py
@@ -1710,7 +1710,7 @@ def __le__(self, other):
         return Rational.__le__(self, other)
     def __hash__(self):
-        return super(Integer, self).__hash__()
+        return hash(self.p)
     def __index__(self):
         return self.p
2  sympy/core/tests/test_expr.py
@@ -1173,7 +1173,7 @@ def test_as_powers_dict():
 def test_as_coefficients_dict():
     check = [S(1), x, y, x*y, 1]
     assert [Add(3*x, 2*x, y, 3).as_coefficients_dict()[i] for i in check] == \
-        [3, 5, 1, 0, 0]
+        [3, 5, 1, 0, 3]
     assert [(3*x*y).as_coefficients_dict()[i] for i in check] == \
         [0, 0, 0, 3, 0]
     assert (3.0*x*y).as_coefficients_dict()[3.0*x*y] == 1
9 sympy/core/tests/test_numbers.py
@@ -21,19 +21,16 @@ def test_integers_cache():
     assert python_int in _intcache
     assert hash(python_int) not in _intcache
-    assert sympy_int not in _intcache
     sympy_int_int = Integer(sympy_int)
     assert python_int in _intcache
     assert hash(python_int) not in _intcache
-    assert sympy_int_int not in _intcache
     sympy_hash_int = Integer(hash(python_int))
     assert python_int in _intcache
     assert hash(python_int) in _intcache
-    assert sympy_hash_int not in _intcache
 def test_seterr():
@@ -1328,12 +1325,10 @@ def test_as_content_primitive():
     assert S(3.1).as_content_primitive() == (1, 3.1)
-@XFAIL
 def test_hashing_sympy_integers():
     # Test for issue 5072
-    # https://github.com/sympy/sympy/issues/5072
-    assert hash(S(4)) == 4
-    assert hash(S(4)) == hash(int(4))
+    assert set([Integer(3)]) == set([int(3)])
+    assert hash(Integer(4)) == hash(int(4))
 def test_issue_4172():
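
The new test relies on the general Python rule that objects which compare equal must also hash equal, otherwise sets and dicts treat them as different keys. A minimal sketch of that rule, using a hypothetical FakeInteger class (a stand-in for illustration, not SymPy's Integer):

```python
class FakeInteger:
    """Hypothetical integer wrapper used only to illustrate the hash/eq rule."""

    def __init__(self, p):
        self.p = int(p)

    def __eq__(self, other):
        if isinstance(other, FakeInteger):
            return self.p == other.p
        if isinstance(other, int):
            return self.p == other
        return NotImplemented

    def __hash__(self):
        # Returning the hash of the wrapped int keeps __hash__ consistent
        # with __eq__, which already treats FakeInteger(n) and n as equal.
        return hash(self.p)

print(hash(FakeInteger(4)) == hash(4))
print({FakeInteger(3)} == {3})
print(FakeInteger(4) in {4: "four"})
```

With a hash unrelated to that of the wrapped int, the last two checks could fail even though FakeInteger(3) == 3 holds.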