BUG: Fix void_scalar hash function to use the elements of the record in ... #482

Merged
merged 4 commits into from Oct 22, 2012

Conversation

Projects
None yet
2 participants
Owner

teoliphant commented Oct 10, 2012

This PR improves the hash function calculation of VOID_SCALAR array scalars which are the elements returned from fully structured-array indexing. Prior to this PR, the hash of void scalars just used the pointer to the data which was not very helpful as this email demonstrates:

http://mail.scipy.org/pipermail/numpy-discussion/2011-August/058258.html

With this pull request, the hash function is computed using the same algorithm as Python's tuple's with the elements obtained by extracting objects from the array scalar. The field names are not used in determining the hash (but they are used in determining scalar equality).

There should be fewer surprises when using numpy void objects in Python dictionaries with this patch.

Owner

njsmith commented Oct 11, 2012

Looks good to me. Test failure is spurious.

Owner

njsmith commented Oct 11, 2012

Wait a second, I changed my mind. I forgot that void objects are mutable. But, they define a value-wise ==. According to Python semantics, the only legal option for such objects is to raise an error if you attempt to hash them. This is how lots of builtin Python objects work, like dicts and lists:

In [21]: d = {}
In [22]: d[{}] = 1
TypeError: unhashable type: 'dict'

We should be just leaving the __hash__/tp_hash field empty, which will trigger the 'unhashable' error.

Owner

teoliphant commented Oct 11, 2012

Yeah, but they are already hashable in a broken way and so this patch is to make them less broken.

Let's just document that you should not use numpy void scalars as keys if you are going to change the underlying data (which is more rare than the read-only case).

Those who are willing to manage it can. Those who prefer to prevent all possible dangers can copy the data into a tuple first.

We could also add an immutable flag to the numpy void and only hash read-only records.

Travis Oliphant
(on a mobile)
512-826-7480

On Oct 11, 2012, at 11:57 AM, njsmith notifications@github.com wrote:

Wait a second, I changed my mind. I forgot that void objects are mutable. But, they define a value-wise ==. According to Python semantics, the only legal option for such objects is to raise an error if you attempt to hash them. This is how lots of builtin Python objects work, like dicts and lists:

In [21]: d = {}
In [22]: d[{}] = 1
TypeError: unhashable type: 'dict'
We should be just leaving the hash/tp_hash field empty, which will trigger the 'unhashable' error.


Reply to this email directly or view it on GitHub.

Owner

njsmith commented Oct 12, 2012

It does not really make it less broken. There are two guarantees that dicts make: (1) if you do d[key1] = value, and then d[key2] retrieves that same value, then key1 == key2. (2) if key1 == key2, then you can do d[key1] = value and then d[key2] will retrieve that value. Right now we guarantee (1), but not (2). Your patch makes it so that we guarantee (2), but not (1). So for instance, this code works right now:

In [36]: vscalar = np.array([(1, 2)], dtype="i2,i2")[0]

In [37]: d = {}

In [38]: d[vscalar] = 10

In [39]: vscalar[0] = 100

In [40]: d[vscalar]
Out[40]: 10

But with the proposed change change, the modification in line 39 will make it so that the dict entry is completely irretrievable. (In fact d.keys()[0] in d will return False!)

Also, we already have a read-only flag on void scalars....

In [22]: a = np.array([(1, 2)], dtype="i2,i2")

In [23]: a.flags.writeable = False

In [24]: a[0].flags.writeable
Out[24]: False

In [25]: a[0][0] = 10
RuntimeError: Can't write to memory

So I really think the end goal should be that (1) hash on read-write scalars raises an error, (2) hash on read-only scalars either (a) raises an error or (b) does something sensible. And since we have a patch here implementing "something sensible", and at least one user who seems to care, we might as well go with door (b).

If we agree on that then the next question is how to get there -- whether we can make the change immediately or should go through a deprecation period, and what that deprecation period should look like.

Owner

teoliphant commented Oct 13, 2012

I've modified the patch to raise an error on read-write scalars and perform the hash on read-only scalars.

Owner

njsmith commented Oct 14, 2012

The test failures are spurious.

So to be clear, you think probably not enough people are putting void scalars in dicts for us to worry about breaking compatibility here?

Owner

teoliphant commented Oct 17, 2012

I will raise the point on the NumPy discussion list. I'm not entirely clear on what will break here. If someone is currently using a mutable void scalar as a key in a dict, then that will no longer work. Is that the only breakage you see?

My perspective is that there might be 1 or 2 people doing this and they would actually welcome the change.

teoliphant added a commit that referenced this pull request Oct 22, 2012

Merge pull request #482 from ContinuumIO/void_scalar_hash
BUG: Fix void_scalar hash function to use the elements of the record in ...

@teoliphant teoliphant merged commit c5ccca9 into numpy:master Oct 22, 2012

1 check failed

default The Travis build failed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment