Fix #5072: hash(Integer) should return the int #2973

Merged
merged 2 commits into from Apr 4, 2014

Conversation

Projects
None yet
3 participants
@skirpichev
Contributor

skirpichev commented Feb 27, 2014

fix #5072

@smichr, please take look on changes in add.py. _addsort and _mulsort were introduced by 8942676. Why cmp_to_key(Basic.compare) wasn't used as a key in both places?

@asmeurer

This comment has been minimized.

Show comment Hide comment
@asmeurer

asmeurer Feb 27, 2014

Member

So the issue noted on the issue page no longer appears?

Member

asmeurer commented Feb 27, 2014

So the issue noted on the issue page no longer appears?

@skirpichev

This comment has been minimized.

Show comment Hide comment
@skirpichev

skirpichev Feb 28, 2014

Contributor

On Thu, Feb 27, 2014 at 03:35:29PM -0800, Aaron Meurer wrote:

So the issue noted on the issue page no longer appears?

Are you about
set([Integer(3)]) == set([int(3)])
?

This test was added to test_numbers.py

Contributor

skirpichev commented Feb 28, 2014

On Thu, Feb 27, 2014 at 03:35:29PM -0800, Aaron Meurer wrote:

So the issue noted on the issue page no longer appears?

Are you about
set([Integer(3)]) == set([int(3)])
?

This test was added to test_numbers.py

@asmeurer

This comment has been minimized.

Show comment Hide comment
@skirpichev

This comment has been minimized.

Show comment Hide comment
@skirpichev

skirpichev Feb 28, 2014

Contributor

Yes. Test in test_nseries.py - works.

Contributor

skirpichev commented Feb 28, 2014

Yes. Test in test_nseries.py - works.

@smichr

This comment has been minimized.

Show comment Hide comment
@smichr

smichr Feb 28, 2014

Member

I wasn't involved with the decision to use hashes, I just tried to make the change in one place so that whereever it was being used we would be consistent. The comment says that uses hashes was fast, so I imagine that things will have slowed down now that you use Basic.compare.

Member

smichr commented Feb 28, 2014

I wasn't involved with the decision to use hashes, I just tried to make the change in one place so that whereever it was being used we would be consistent. The comment says that uses hashes was fast, so I imagine that things will have slowed down now that you use Basic.compare.

@smichr

This comment has been minimized.

Show comment Hide comment
@smichr

smichr Feb 28, 2014

Member

I suspect the better change would be to use hash(self.p) but to leave the sorting alone.

Member

smichr commented Feb 28, 2014

I suspect the better change would be to use hash(self.p) but to leave the sorting alone.

@skirpichev

This comment has been minimized.

Show comment Hide comment
@skirpichev

skirpichev Feb 28, 2014

Contributor

The comment says that uses hashes was fast, so I imagine that things will have slowed down now that you use Basic.compare.

The problem was that using hash doesn't guarantee to produce some well-defined canonical order. In general there are hash collisions. That's why we have (this problem)[https://code.google.com/p/sympy/issues/detail?id=1973#c3] with hash, but not with Basic.compare.

I suspect the better change would be to use hash(self.p) but to leave the sorting alone.

That's impossible. The only question is: how to change sorting function.

Contributor

skirpichev commented Feb 28, 2014

The comment says that uses hashes was fast, so I imagine that things will have slowed down now that you use Basic.compare.

The problem was that using hash doesn't guarantee to produce some well-defined canonical order. In general there are hash collisions. That's why we have (this problem)[https://code.google.com/p/sympy/issues/detail?id=1973#c3] with hash, but not with Basic.compare.

I suspect the better change would be to use hash(self.p) but to leave the sorting alone.

That's impossible. The only question is: how to change sorting function.

@smichr

This comment has been minimized.

Show comment Hide comment
@smichr

smichr Feb 28, 2014

Member

Maybe use

def _addsort(args):
    # in-place sorting of args

    # Currently we sort things using hashes, as it is quite fast. A better
    # solution is not to sort things at all - but this needs some more
    # fixing.
    from sympy.core.compatibility import cmp_to_key, ordered
    hash=cmp_to_key(lambda x, y: 0 if x == y else (1 if tuple(ordered([x, y], warn=True))[0] == y else -1))
    args.sort(key=hash)

then you'll never worry about hash collisions again since it will tell you if there was a collision.

I made this change and ran the core suite and no tests failed (though wester was still running).

Member

smichr commented Feb 28, 2014

Maybe use

def _addsort(args):
    # in-place sorting of args

    # Currently we sort things using hashes, as it is quite fast. A better
    # solution is not to sort things at all - but this needs some more
    # fixing.
    from sympy.core.compatibility import cmp_to_key, ordered
    hash=cmp_to_key(lambda x, y: 0 if x == y else (1 if tuple(ordered([x, y], warn=True))[0] == y else -1))
    args.sort(key=hash)

then you'll never worry about hash collisions again since it will tell you if there was a collision.

I made this change and ran the core suite and no tests failed (though wester was still running).

@skirpichev

This comment has been minimized.

Show comment Hide comment
@skirpichev

skirpichev Mar 1, 2014

Contributor

then you'll never worry about hash collisions again since it will tell you if there was a collision.

I don't sure I understand you here. Is this code working with hash collisions?

If so, it's faster then Basic.compare? Should we change both places (i.e., mul.py - too)?

BTW, can we use lambda x: x.sort_key() as a key? (This seems to be working on our test suite as well.)

Contributor

skirpichev commented Mar 1, 2014

then you'll never worry about hash collisions again since it will tell you if there was a collision.

I don't sure I understand you here. Is this code working with hash collisions?

If so, it's faster then Basic.compare? Should we change both places (i.e., mul.py - too)?

BTW, can we use lambda x: x.sort_key() as a key? (This seems to be working on our test suite as well.)

@asmeurer

This comment has been minimized.

Show comment Hide comment
@asmeurer

asmeurer Mar 1, 2014

Member

BTW, can we use lambda x: x.sort_key() as a key? (This seems to be working on our test suite as well.)

That's what default_sort_key was. Somehow it got more complicated, though.

Member

asmeurer commented Mar 1, 2014

BTW, can we use lambda x: x.sort_key() as a key? (This seems to be working on our test suite as well.)

That's what default_sort_key was. Somehow it got more complicated, though.

@smichr

This comment has been minimized.

Show comment Hide comment
@smichr

smichr Mar 2, 2014

Member

You can check the speed of ordered -- when things are more complicated than a singleton it will likely be faster since it sorts on the basis of the number of nodes, first. Then only for ties it applies the additional key. This is all discussed in its docstring, I believe. I think it would be fine to use for Mul, too. In either place, however, there may be changes to tests as a result of new orderings.

Member

smichr commented Mar 2, 2014

You can check the speed of ordered -- when things are more complicated than a singleton it will likely be faster since it sorts on the basis of the number of nodes, first. Then only for ties it applies the additional key. This is all discussed in its docstring, I believe. I think it would be fine to use for Mul, too. In either place, however, there may be changes to tests as a result of new orderings.

@skirpichev

This comment has been minimized.

Show comment Hide comment
@skirpichev

skirpichev Mar 2, 2014

Contributor

@smichr, what do you think about using default_sort_key? (last commit)

btw, I can change mul.py as well, but there is subtle failures in sympy/solvers/ode.py docstrings with hint='linear_coefficients'. Could you take a look?

Contributor

skirpichev commented Mar 2, 2014

@smichr, what do you think about using default_sort_key? (last commit)

btw, I can change mul.py as well, but there is subtle failures in sympy/solvers/ode.py docstrings with hint='linear_coefficients'. Could you take a look?

@skirpichev skirpichev added the Core label Mar 3, 2014

@smichr

This comment has been minimized.

Show comment Hide comment
@smichr

smichr Mar 3, 2014

Member

It fails because it doesn't collect on the derivative as master does:

>>> var('x');f=Function('f')
x
>>>
>>>
>>> q=x*Derivative(f(x), x) - 6*x + f(x)*Derivative(f(x), x) + f(x)
>>> collect(q,f(x).diff(x))
x*Derivative(f(x), x) - 6*x + (Derivative(f(x), x) + 1)*f(x)

instead of (in master)

>>> collect(q,f(x).diff(x))
-6*x + (x + f(x))*Derivative(f(x), x) + f(x)

The fix is to use exact=True, I believe. It is used in other places where f or df are being used as the collecting terms. (That change should probably be made anyway.)

Member

smichr commented Mar 3, 2014

It fails because it doesn't collect on the derivative as master does:

>>> var('x');f=Function('f')
x
>>>
>>>
>>> q=x*Derivative(f(x), x) - 6*x + f(x)*Derivative(f(x), x) + f(x)
>>> collect(q,f(x).diff(x))
x*Derivative(f(x), x) - 6*x + (Derivative(f(x), x) + 1)*f(x)

instead of (in master)

>>> collect(q,f(x).diff(x))
-6*x + (x + f(x))*Derivative(f(x), x) + f(x)

The fix is to use exact=True, I believe. It is used in other places where f or df are being used as the collecting terms. (That change should probably be made anyway.)

@smichr

This comment has been minimized.

Show comment Hide comment
@smichr

smichr Mar 4, 2014

Member

I didn't set up a separate PR to this, but there are 3 commits in my ode branch that you might want to use. Modifying Dummy's sort_key is important (especially if ordered is going to be used). And an anecdotal observation about using default_sort_key vs ordered, the latter runs core tests in 306 seconds while the former took 338.

Member

smichr commented Mar 4, 2014

I didn't set up a separate PR to this, but there are 3 commits in my ode branch that you might want to use. Modifying Dummy's sort_key is important (especially if ordered is going to be used). And an anecdotal observation about using default_sort_key vs ordered, the latter runs core tests in 306 seconds while the former took 338.

@skirpichev

This comment has been minimized.

Show comment Hide comment
@skirpichev

skirpichev Mar 4, 2014

Contributor

@smichr, thank you. Now, some problems (e.g., for ode docstrings) are solved in my 1973-integer-hash-DRAFT branch. Here is test run.

Contributor

skirpichev commented Mar 4, 2014

@smichr, thank you. Now, some problems (e.g., for ode docstrings) are solved in my 1973-integer-hash-DRAFT branch. Here is test run.

@smichr

This comment has been minimized.

Show comment Hide comment
@smichr

smichr Mar 4, 2014

Member

The first two expressions involving a and k are the same as shown by unrad; the last one involving theta simply has arguments (as expected) in a different order but is otherwise identical.

Member

smichr commented Mar 4, 2014

The first two expressions involving a and k are the same as shown by unrad; the last one involving theta simply has arguments (as expected) in a different order but is otherwise identical.

@skirpichev

This comment has been minimized.

Show comment Hide comment
@skirpichev

skirpichev Mar 5, 2014

Contributor

Now, there should be only failures from meijerint module:
https://travis-ci.org/skirpichev/sympy/builds/20123489

Contributor

skirpichev commented Mar 5, 2014

Now, there should be only failures from meijerint module:
https://travis-ci.org/skirpichev/sympy/builds/20123489

@smichr

This comment has been minimized.

Show comment Hide comment
@smichr

smichr Mar 5, 2014

Member

Just a different order of args, I suppose, for the meijer failures?

Are you going to add the sort_key commit for Dummy?

Member

smichr commented Mar 5, 2014

Just a different order of args, I suppose, for the meijer failures?

Are you going to add the sort_key commit for Dummy?

@smichr

This comment has been minimized.

Show comment Hide comment
@smichr

smichr Mar 5, 2014

Member

nevermind about the Dummy sort key -- I opened another pull request

Member

smichr commented Mar 5, 2014

nevermind about the Dummy sort key -- I opened another pull request

@skirpichev

This comment has been minimized.

Show comment Hide comment
@skirpichev

skirpichev Mar 5, 2014

Contributor

Just a different order of args, I suppose, for the meijer failures?

Apparently, yes. But I'm not sure where exactly it was broken.

Contributor

skirpichev commented Mar 5, 2014

Just a different order of args, I suppose, for the meijer failures?

Apparently, yes. But I'm not sure where exactly it was broken.

@smichr

This comment has been minimized.

Show comment Hide comment
@smichr

smichr Mar 6, 2014

Member

I assume it's just in the new sorting of Mul args. So just put the new expression in the test so it passes. (Or am I misunderstanding something?)

Member

smichr commented Mar 6, 2014

I assume it's just in the new sorting of Mul args. So just put the new expression in the test so it passes. (Or am I misunderstanding something?)

@skirpichev

This comment has been minimized.

Show comment Hide comment
@skirpichev

skirpichev Mar 6, 2014

Contributor

On Wed, Mar 05, 2014 at 06:43:08PM -0800, Christopher Smith wrote:

I assume it's just in the new sorting of Mul args. So just put the new
expression in the test so it passes.

No. It's not so simple.

Contributor

skirpichev commented Mar 6, 2014

On Wed, Mar 05, 2014 at 06:43:08PM -0800, Christopher Smith wrote:

I assume it's just in the new sorting of Mul args. So just put the new
expression in the test so it passes.

No. It's not so simple.

@smichr

This comment has been minimized.

Show comment Hide comment
@smichr

smichr Mar 6, 2014

Member

If I replace default_sort_key with the ordered key I gave you the first two failures pass but the line 460 failure does not.

If I just change the Mul sorting back to the way it is in master then it works. If I change the Mul sorting to use hash then it passes but the test at line 481 in test_transforms fails.

Maybe somewhere an assumption is made that a certain argument always appears last but it no longer does. We already know that the non-commutative terms are always placed at the end and the Rational is always placed first. My guess is that somewhere an assumption about the last non-commutative term is being made which is no longer true.

Member

smichr commented Mar 6, 2014

If I replace default_sort_key with the ordered key I gave you the first two failures pass but the line 460 failure does not.

If I just change the Mul sorting back to the way it is in master then it works. If I change the Mul sorting to use hash then it passes but the test at line 481 in test_transforms fails.

Maybe somewhere an assumption is made that a certain argument always appears last but it no longer does. We already know that the non-commutative terms are always placed at the end and the Rational is always placed first. My guess is that somewhere an assumption about the last non-commutative term is being made which is no longer true.

@smichr

This comment has been minimized.

Show comment Hide comment
@smichr

smichr Mar 8, 2014

Member

The meijerint algorithm(s) don't try to obtain the optimal solution so it is expected that different results will be obtained. So I think it's ok to simply change the tests. (You changed a trig test to tan(x)**2/2 in your WIP-1973 branch and that's not correct.) Please see my 1973a branch for a build where I change how the mul args are handled; see 1973 for an almost working version that uses ordered (but there are some interesting test failures there, too, https://travis-ci.org/smichr/sympy/builds/20335796 ).

Member

smichr commented Mar 8, 2014

The meijerint algorithm(s) don't try to obtain the optimal solution so it is expected that different results will be obtained. So I think it's ok to simply change the tests. (You changed a trig test to tan(x)**2/2 in your WIP-1973 branch and that's not correct.) Please see my 1973a branch for a build where I change how the mul args are handled; see 1973 for an almost working version that uses ordered (but there are some interesting test failures there, too, https://travis-ci.org/smichr/sympy/builds/20335796 ).

@skirpichev

This comment has been minimized.

Show comment Hide comment
@skirpichev

skirpichev Mar 8, 2014

Contributor

You changed a trig test to tan(x)**2/2 in your WIP-1973 branch and that's not correct.

It looks as a mathematically correct output (same value, up to constant term).

too, https://travis-ci.org/smichr/sympy/builds/20335796 ).

Try to rebase your branch.

Contributor

skirpichev commented Mar 8, 2014

You changed a trig test to tan(x)**2/2 in your WIP-1973 branch and that's not correct.

It looks as a mathematically correct output (same value, up to constant term).

too, https://travis-ci.org/smichr/sympy/builds/20335796 ).

Try to rebase your branch.

@smichr

This comment has been minimized.

Show comment Hide comment
@smichr

smichr Mar 9, 2014

Member

correct output

Hmm, so it is!

rebase your branch

I'm giving up on this. My 1973a branch has some docstring and minor code edits that you can feel free to take. (This is a rabbit chase that's taking me too far afield. As you said about the meijerint failures, "it's not so simple".)

Member

smichr commented Mar 9, 2014

correct output

Hmm, so it is!

rebase your branch

I'm giving up on this. My 1973a branch has some docstring and minor code edits that you can feel free to take. (This is a rabbit chase that's taking me too far afield. As you said about the meijerint failures, "it's not so simple".)

@skirpichev skirpichev changed the title from Fix issue 1973: hash(Integer) should return the int to Fix #5072: hash(Integer) should return the int Apr 1, 2014

@skirpichev

This comment has been minimized.

Show comment Hide comment
@skirpichev

skirpichev Apr 1, 2014

Contributor

Ok. Any objections against merge of these two commits? I have not observe any performance penalty yet.

Everything else is in my 1973-integer-hash-DRAFT branch.

Contributor

skirpichev commented Apr 1, 2014

Ok. Any objections against merge of these two commits? I have not observe any performance penalty yet.

Everything else is in my 1973-integer-hash-DRAFT branch.

@smichr

This comment has been minimized.

Show comment Hide comment
@smichr

smichr Apr 4, 2014

Member

Test pass. This looks like a good step forward. It's in.

Member

smichr commented Apr 4, 2014

Test pass. This looks like a good step forward. It's in.

smichr added a commit that referenced this pull request Apr 4, 2014

Merge pull request #2973 from skirpichev/1973-integer-hash
Fix #5072: hash(Integer) should return the int

@smichr smichr merged commit 1cb4fab into sympy:master Apr 4, 2014

1 check failed

default The Travis CI build could not complete due to an error
Details

@skirpichev skirpichev deleted the skirpichev:1973-integer-hash branch Apr 4, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment