Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better hash for Element #19016

Closed
nathanncohen mannequin opened this issue Aug 12, 2015 · 157 comments
Closed

Better hash for Element #19016

nathanncohen mannequin opened this issue Aug 12, 2015 · 157 comments

Comments

@nathanncohen
Copy link
Mannequin

nathanncohen mannequin commented Aug 12, 2015

As reported on sage-devel [1], Element implements its hash based on its string representation. This causes a lot of troubles

  • it breaks the equality => same hash assumption for finitely presented groups
    sage: G = groups.presentation.Cyclic(4)
    sage: G.cayley_graph().vertices()
    [1, a, a^2, a^-2, a^3, a^-1]

and symbolic expressions

sage: f=sin(x)^2
sage: g=1-cos(x)^2
sage: bool(f == g)
True
sage: hash(f) == hash(g)
False

and possibly many others

sage: from sage.structure.element import Element
sage: class E(Element):
....:     def __init__(self):
....:         Element.__init__(self, Parent())
sage: e = E()
sage: hash(e)
-4965357552728411610
sage: e.rename('hey')
sage: hash(e)
-6429308858210906323

and similarly, hashing should not be available on any mutable object.

There are several possibilities that are currently being discussed:

[1] https://groups.google.com/d/topic/sage-devel/6rXKkF87Gtc/discussion

See also: #19302, #19321, #19331

Component: misc

Author: Nils Bruin, Vincent Delecroix

Branch: c7b4f0e

Reviewer: Volker Braun

Issue created by migration from https://trac.sagemath.org/ticket/19016

@nathanncohen nathanncohen mannequin added this to the sage-6.9 milestone Aug 12, 2015
@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Aug 12, 2015

Branch: u/ncohen/19016

@nathanncohen

This comment has been minimized.

@nathanncohen nathanncohen mannequin added the s: needs review label Aug 12, 2015
@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 12, 2015

Commit: 754dc57

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 12, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

754dc57trac 19016: A more naive sage.structure.element.__hash__

@videlec
Copy link
Contributor

videlec commented Aug 12, 2015

comment:3

Hi Nathann,

What about #18246? Just returning 0 will create many collisions that will be hard to track. I think that raising an error is more appropriate because you know what is to be fixed.

Vincent

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Aug 12, 2015

comment:4

If you see a way to turn #18246 into a needs_review ticket soon, then I do not mind. But I don't want to fight against 800 screaming guys again, and I want this to be solved.

What about a middle ground: raise a warning, which would appear only once, whenever this 0 hash function is used?

This is a very bad bug and it needs to be fixed. Creating tickets that we forget is nto much more responsible than doing nothing.

Nathann

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Aug 12, 2015

comment:5

By the way, it is not even the same hash function. Here it is Element.__hash__ and it is SageObject.__hash__ in #18246. Both need to be solved of course.

@videlec
Copy link
Contributor

videlec commented Aug 12, 2015

comment:6

Replying to @nathanncohen:

If you see a way to turn #18246 into a needs_review ticket soon, then I do not mind. But I don't want to fight against 800 screaming guys again, and I want this to be solved.

Putting in needs_review just need some efforts. I can dig into it if you are up for a review.

What about a middle ground: raise a warning, which would appear only once, whenever this 0 hash function is used?

It is already a bit better.

This is a very bad bug and it needs to be fixed.

It is.

Creating tickets that we forget is nto much more responsible than doing nothing.

I did not forget as you see ;-)

With your implementation, I am pretty sure that the running time of the tests will be much slower. Sage very often put elements in dictionaries or sets. Your solution does not help users to see that __hash__ is important, because you will not run into bugs, just into very slow code. I agree that you fix the error but you open a much more annoying feature.

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Aug 12, 2015

comment:7

Putting in needs_review just need some efforts. I can dig into it if you are up for a review.

Remember that #18246 is not even about the hash function that I change here.

With your implementation, I am pretty sure that the running time of the tests will be much slower. Sage very often put elements in dictionaries or sets. Your solution does not help users to see that __hash__ is important, because you will not run into bugs, just into very slow code. I agree that you fix the error but you open a much more annoying feature.

If you think that we can fix the bug quickly by removing the __hash__ function from Element, then let us do that. If the result is that my ticket will be put "on hold" waiting for something that never happens (like it happened 4 months ago on #18246), then this ticket is better than an imaginary one.

Concerns about efficiency do not weight much compares to wrong results.

Nathann

@nbruin
Copy link
Contributor

nbruin commented Aug 12, 2015

comment:8

-1 to a constant hash. That leads to such bad performance that it amounts to "wrong results".

+1 to removing silly default hashes. Hashing is simply not something that can be done on a very general level.

For your example, unless you come up with a canonical form for group elements that you can take the hash of, you really shouldn't have a hash function, by the way. These elements should be unhashable, not hashable with a constant hash value.

@videlec
Copy link
Contributor

videlec commented Aug 12, 2015

comment:9

#18246 needs review (it removes the __hash__ from SageObject).

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Aug 12, 2015

comment:10

-1 to a constant hash. That leads to such bad performance that it amounts to "wrong results".

+1 to removing silly default hashes. Hashing is simply not something that can be done on a very general level.

What is your position for "constant hash" Vs "we change nothing and the bug stays"? How many classes do you think would be broken (and would have to be fixed) if this hash is removed? Are you willing to help?

Nathann

@nbruin
Copy link
Contributor

nbruin commented Aug 12, 2015

comment:11

Replying to @nathanncohen:

What is your position for "constant hash" Vs "we change nothing and the bug stays"?

"Change nothing". I think it's unfortunate that hashing by repr will usually work quite well in cases where a hash is possible: a hash usually requires a canonical form to compute the hash from and that canonical form will usually be used for printing, because that's nice. So in practice, the default hash (although much too slow to be a reasonable hash) will probably work when it should.

In cases where the current default hash doesn't work, I think that having it as it is now will lead to better detection than when we change it to a default hash of 0. When people find this in particular cases, the solution is clear: actively disable the hash.

Having to disable something is of course the wrong way around: the default hash shouldn't be there in the first place. But putting a constant 0 hash there instead doesn't make transitioning to the desired situation any easier.

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Aug 12, 2015

comment:12

What is your position for "constant hash" Vs "we change nothing and the bug stays"?

"Change nothing".

I love those guys. "You fix a bug, but because it is not done the way I like I will block your tickets, and the bug will stay unsolved. And no, I will not help do it the way I want to see it done. You have to do it by yourself, while I watch you sweat".

Free software.

See, I wouldn't feel well with myself if after something like that I got to work and started implementing what you ask me to. So the bug will stay, thanks to you. Today, you contributed to the common effort for a better mathematical software.

Nathann

@nathanncohen nathanncohen mannequin removed this from the sage-6.9 milestone Aug 12, 2015
@videlec
Copy link
Contributor

videlec commented Aug 12, 2015

comment:13

As I said in [comment:9] I did remove the stupid __hash__ from SageObject (this is #18246). Not setting it to 0 but making it raise a TypeError. It was not a tremendous effort even if I had to dig in things that I do not understand.

I guess that the same strategy would also be good for Element and I agree with Nils on this. The warning option proposed in [comment:4] by Nathann looks also good to me because each time you program something you will notice that you are doing something wrong. Could we try to found a solution? This is annoying and I am willing to help to fix it.

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Aug 12, 2015

comment:14

Could we try to found a solution? This is annoying and I am willing to help to fix it.

I do not intend to touch this ticket again. I'm starting a new trend: when people tell me that my idea is bad, I stop fighting and leave (see #18904, though it's not the first). So that if I ever find something more interesting to do during my holidays than contribute to Sage, I won't hesitate to delete my account and move on to other things.

Nathann

@vbraun
Copy link
Member

vbraun commented Aug 12, 2015

comment:15

The default __hash__ could be moved from Element to a CanonicalRepresentation (or so) category. After injecting that into a few basic categories there are probably not that many parents where it has to be added by hand... thoughts?

@videlec
Copy link
Contributor

videlec commented Aug 12, 2015

comment:16

Replying to @vbraun:

The default __hash__ could be moved from Element to a CanonicalRepresentation (or so) category. After injecting that into a few basic categories there are probably not that many parents where it has to be added by hand... thoughts?

That might be an easy move since we would just have to change the category to something like PreviousCategory().WithCanonicalRepresentation(). For me this has to be done on a case by case basis for each Parent. The advantage would be the simplicity. But I found that this would mostly move the problem. But it would be better, because explicit in the parent implementation.

But in the long run, do we really want to use only once the string representation for the hash? Is that how we want the user to implement their own Parent/Element? I like Nathann's idea of having a warning if the default hash is called (and the default could be either hash(repr(P)) or 0 I do not really care).

@vbraun
Copy link
Member

vbraun commented Aug 12, 2015

comment:18

Often you can have normal forms on general grounds, so IMHO would be reasonable to just make CanonicalRepresentation a part of vector spaces and such. Thats why I said we shouldn't have to do it for every parent by hand. For most parents the hash(repr(element)) is a good enough hash function. Sure you could make it faster by an overall constant factor, but its still gives you O(1) associative containers.

@videlec
Copy link
Contributor

videlec commented Oct 27, 2015

comment:113

Hi Volker,

Thanks for having a look. On which Sage version are you testing!? If this is not merged early in a beta we have few chances to let it pass! The doctest you mentioned are introduced by a ticket on asymptotic expansions not previously merged in sage.6.10.beta1. I am willing to merge the associated branch in this ticket but you must ensure me that I will not have to do it thousands time.

Vincent

@vbraun
Copy link
Member

vbraun commented Oct 27, 2015

comment:114

How about you try to reproduce it with said ticket, if you can get a new branch by tomorrow I can merge it.

Can you also fix the random failure at #19488

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Oct 27, 2015

Changed commit from c1cf361 to 81012bc

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Oct 27, 2015

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

2f434d7write doc and many doctests for substitute
875542dwrite AsymptoticExpansion.symbolic_expression
5ac0feaextend SR._element_constructor_ to accept asymptotic expansions
fa814b0Trac #19431: merge 6.10.beta0
6a6efc4introduce parameter R in .symbolic_expression
f39b942simplify SR._element_constructor
5c3cba3fix checks whether parent is SR to check if instance of SymbolicRing
ee52932fixup and doctest of parameter R in .symbolic_expression
04c79adTrac 19016: merge #19436
81012bcTrac 19016: add hash value for asymptotic ring elt

@nbruin
Copy link
Contributor

nbruin commented Oct 27, 2015

comment:116

Replying to @vbraun:

How about you try to reproduce it with said ticket, if you can get a new branch by tomorrow I can merge it.

Can you also fix the random failure at #19488

The doctest gets introduced on this ticket and it's a bad one: it's just comparing the parents, so the result is fundamentally ill-defined. Just delete the test (or check that it's not an error if you care about that behaviour. I'd say cmp(b,1) should yield an error).

@videlec
Copy link
Contributor

videlec commented Oct 27, 2015

comment:117

Replying to @nbruin:

Replying to @vbraun:

How about you try to reproduce it with said ticket, if you can get a new branch by tomorrow I can merge it.

Can you also fix the random failure at #19488

The doctest gets introduced on this ticket and it's a bad one: it's just comparing the parents, so the result is fundamentally ill-defined. Just delete the test (or check that it's not an error if you care about that behaviour. I'd say cmp(b,1) should yield an error).

I agree with Nils that an error would be more appropriate. Though, for the sake of that ticket I would be inclined to follow the default comparison code from Element (in sage.structure.element) that is similar to Python one and compare by id. It is also random though.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Oct 27, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

c7b4f0eTrac 19016: more robust doctest (see #19488)

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Oct 27, 2015

Changed commit from 81012bc to c7b4f0e

@videlec
Copy link
Contributor

videlec commented Oct 28, 2015

comment:119

Actually, for the sake of this ticket I think that we should not change anything to the previous behavior of cmp... simple fix in my last commit.


New commits:

c7b4f0eTrac 19016: more robust doctest (see #19488)

@vbraun
Copy link
Member

vbraun commented Oct 28, 2015

Changed branch from public/19016-bis to c7b4f0e

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Oct 28, 2015

Changed commit from c7b4f0e to none

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Oct 28, 2015

comment:121

(applause for what was done here)

@videlec
Copy link
Contributor

videlec commented Oct 28, 2015

comment:122

Thanks Volker!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants