Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix hash of (dense) matrices #21316

Closed
dkrenn opened this issue Aug 23, 2016 · 8 comments
Closed

fix hash of (dense) matrices #21316

dkrenn opened this issue Aug 23, 2016 · 8 comments

Comments

@dkrenn
Copy link
Contributor

dkrenn commented Aug 23, 2016

It took me only 4 guesses to find two matrices with the same hash:

sage: N = Matrix([[1,1],[2,2]]); N.set_immutable(); hash(N)
3
sage: N = Matrix([[0,0],[0,1]]); N.set_immutable(); hash(N)
3

This should not be that easy.

In particular (looking at the code in sage.matrix.matric_dense), the hash value does not depend on entry 1,1 of the matrix (multiplication by zero in the hash code):

sage: N = Matrix([[3,2],[0,0]]); N.set_immutable(); hash(N)
2
sage: N = Matrix([[443,2],[0,0]]); N.set_immutable(); hash(N)
2
sage: N = Matrix([[2,2],[0,0]]); N.set_immutable(); hash(N)
2

Component: misc

Work Issues: adapt doctests

Branch/Commit: u/dkrenn/hash-matrix @ 1e1263f

Issue created by migration from https://trac.sagemath.org/ticket/21316

@dkrenn dkrenn added this to the sage-7.4 milestone Aug 23, 2016
@dkrenn
Copy link
Contributor Author

dkrenn commented Aug 23, 2016

comment:1

From sage.matrix.matrix_dense:

        v = self._list()
        cdef Py_ssize_t i
        cdef long h = 0

        for i from 0 <= i < len(v):
            h = h ^ (i * hash(v[i]))

What is a good caching strategy? Simply h = hash(tuple(self._list()))?

@sagetrac-ibookstein
Copy link
Mannequin

sagetrac-ibookstein mannequin commented Jul 29, 2017

comment:2

Non-cryptographic hashes don't have to be resilient to collision attacks, but should indeed give good hash value spread for the input space - which this implementation doesn't.

I can confirm that this implementation causes massive performance degradation when matrices are used as dictionary keys, for instance.

See tuple hash algorithm and explanations why XORing hash values doesn't perform very well.

While the current implementation does try to 'save' itself from the disadvantages of XORing hash values by multiplying by the index, it still performs very poorly.

I tried the solution dkrenn suggested and it does indeed work much better.
If we don't care about the performance cost of creating a tuple (and a list), seeing as the hash value is cached, it's a nice solution.

@sagetrac-ibookstein sagetrac-ibookstein mannequin modified the milestones: sage-7.4, sage-8.1 Jul 29, 2017
@dkrenn
Copy link
Contributor Author

dkrenn commented Jul 30, 2017

Branch: u/dkrenn/hash-matrix

@dkrenn
Copy link
Contributor Author

dkrenn commented Jul 30, 2017

Commit: 1e1263f

@dkrenn
Copy link
Contributor Author

dkrenn commented Jul 30, 2017

Work Issues: adapt doctests

@dkrenn
Copy link
Contributor Author

dkrenn commented Jul 30, 2017

New commits:

1e1263fnew hash function for matrix_dense

@sagetrac-ibookstein
Copy link
Mannequin

sagetrac-ibookstein mannequin commented Nov 22, 2017

comment:6

Looks like major work was put into this issue in #19050, rendering this ticket a duplicate.

@jdemeyer
Copy link

comment:7

Duplicate of #10950.

@jdemeyer jdemeyer removed this from the sage-8.1 milestone Nov 22, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants