Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH/CLN: Fix DataType hashing #1172

Closed
wants to merge 1 commit into from

Conversation

cpcloud
Copy link
Member

@cpcloud cpcloud commented Oct 16, 2017

Before this PR, every datatype's hash function was hash(type(self)), which
means collisions galore (triggering eq comparison when using complex types
as dict keys). This PR provides a hash function based on __slots__ giving
every type a unique hash based on type(self), its slots (if any) + the
nullable field from the DataType parent class.

@cpcloud cpcloud force-pushed the cleanup-hashing branch 6 times, most recently from d92802d to 89bcb86 Compare October 22, 2017 00:00
@cpcloud cpcloud self-assigned this Oct 22, 2017
@cpcloud cpcloud added this to the 0.12 milestone Oct 22, 2017
@cpcloud cpcloud added the feature Features or general enhancements label Oct 22, 2017
@cpcloud cpcloud force-pushed the cleanup-hashing branch 4 times, most recently from c9a67d6 to 45f7ec7 Compare October 23, 2017 17:42
@cpcloud
Copy link
Member Author

cpcloud commented Oct 24, 2017

@wesm can you review this when you get a chance?

@cpcloud cpcloud requested a review from wesm October 24, 2017 15:29
Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me. Thanks for doing this!

self._name = name
if not isinstance(schema, Schema):
raise TypeError(
'schema argument to HasSchem class must be a Schema instance'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor typo

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@@ -390,78 +415,79 @@ def valid_literal(self, value):

class Int8(Integer):

__slots__ = ()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do __slots__ inherit in any reasonable way? (I assume not)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. Stated here: https://docs.python.org/3.6/reference/datamodel.html#notes-on-using-slots

The action of a __slots__ declaration is limited to the class where it is defined. As a result, subclasses will have a __dict__ unless they also define __slots__ (which must only contain names of any additional slots).

@cpcloud
Copy link
Member Author

cpcloud commented Oct 24, 2017

Merging on green.

@cpcloud cpcloud force-pushed the cleanup-hashing branch 3 times, most recently from 92ca85b to daea4ca Compare October 25, 2017 01:01
@cpcloud cpcloud closed this in bb9ef9e Oct 28, 2017
@cpcloud cpcloud deleted the cleanup-hashing branch October 28, 2017 22:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants