-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document best-practices for writing hash methods #5805
Conversation
Discussion is as per https://bugs.ruby-lang.org/issues/18611. Co-authored-by: Sam Bostock <sam.bostock@shopify.com>
81a0f55
to
c15b512
Compare
* For example: | ||
* | ||
* def hash | ||
* [self.class, a, b, c].hash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned on this issue, I don't think we should include the class here, and doing so is measurably slower.
In the vast majority of cases (where #hash
is defined on a leaf class) there is no need, and the rest of cases (#hash
defined in class which has subclasses) one should think what makes sense based on what they use those subclasses for.
I'd be fine to mention something like (and removing other references to the class):
Consider including
self.class
in the values if the class may have subclasses and subclasses instances with the same values should not be considered equal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get why it's about subclasses?
As a use-case, imagine a compiler IR where you have Add
and Sub
nodes, both of which refer to the same inputs, but have different operations. If they both have x
and y
as the hash values they'll collide in for example a GVN hash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get why it's about subclasses?
Because I'm thinking both eql?
and hash
, either:
- care about the exact class (e.g.,
self.class == other.class
ineql?
) - don't care about the exact class (e.g.,
other.is_a?(SomeClass)
ineql?
)
(otherwise one of the two methods is broken)
You are right in your example that including the class makes sense and that avoids extra hash collisions.
I forgot about the case of 2 unrelated classes, with different hash methods but hash values accidentally collide, which semantically doesn't matter if eql?
checks it, but performance-wise it can matter, that's a good point.
Another example is e.g. Complex and Rational, both might compute the hash as [value1, value2].hash
but could cause unfortunate collisions in a Hash which can have both as keys.
As a counter-example, imagine a slight variation, where there is a Add
node and a AddOne
node (AddOne < Add
), and I want Add(x, 1).eql?(AddOne(x)) => true
, then it's the second case above.
Based on that I agree it seems best in general to include the class in the hash and recommend it here. For the Add/AddOne case above it should be clear enough that [Add, x, y].hash
+ is_a?(Add)
should be used for that special case (and that's actually better than [x, y].hash
+ is_a?(Add)
if there can be non-Add keys in the same Hash).
hash.c
Outdated
@@ -303,6 +303,18 @@ objid_hash(VALUE obj) | |||
* | |||
* Certain core classes such as Integer use built-in hash calculations and | |||
* do not call the #hash method when used as a hash key. | |||
* | |||
* When implementing your own #hash based on multiple values, the best | |||
* practice is to combine values using the hash code of an array: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the text also mention the class (to be consistent with the code)? (now I agree it's good to include it for most cases)
Might also want to have a note somewhere explaining why having the class there is useful, I think it's non-trivial for most people.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, I was wondering if it might make sense to mention including the same attributes in hashing and equality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to merge this PR if my comment just above is addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I don't know what's going on with CI though - is Compilations / ${{ matrix.entry.name }} (pull_request)
supposed to be a job name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's just because it's not expanded due to being skipped as this is a doc-only change.
Not sure how it finds out this is a doc-only change though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think
ruby/.github/workflows/compilers.yml
Lines 5 to 13 in 3a8d60f
paths-ignore: | |
- 'doc/**' | |
- '**.md' | |
- '**.rdoc' | |
pull_request: | |
paths-ignore: | |
- 'doc/**' | |
- '**.md' | |
- '**.rdoc' |
* Discussion is as per https://bugs.ruby-lang.org/issues/18611. Co-authored-by: Sam Bostock <sam.bostock@shopify.com>
* Discussion is as per https://bugs.ruby-lang.org/issues/18611. Co-authored-by: Sam Bostock <sam.bostock@shopify.com>
Discussion is as per https://bugs.ruby-lang.org/issues/18611.
@eregon @byroot (the two people who commented on the issue) please review.
CC @sambostock.