Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document best-practices for writing hash methods #5805

Merged
merged 3 commits into from
Apr 30, 2022

Conversation

chrisseaton
Copy link
Contributor

Discussion is as per https://bugs.ruby-lang.org/issues/18611.

@eregon @byroot (the two people who commented on the issue) please review.

CC @sambostock.

Discussion is as per https://bugs.ruby-lang.org/issues/18611.

Co-authored-by: Sam Bostock <sam.bostock@shopify.com>
@nobu nobu added the Documentation Improvements to documentation. label Apr 15, 2022
hash.c Outdated Show resolved Hide resolved
* For example:
*
* def hash
* [self.class, a, b, c].hash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned on this issue, I don't think we should include the class here, and doing so is measurably slower.
In the vast majority of cases (where #hash is defined on a leaf class) there is no need, and the rest of cases (#hash defined in class which has subclasses) one should think what makes sense based on what they use those subclasses for.

I'd be fine to mention something like (and removing other references to the class):

Consider including self.class in the values if the class may have subclasses and subclasses instances with the same values should not be considered equal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get why it's about subclasses?

As a use-case, imagine a compiler IR where you have Add and Sub nodes, both of which refer to the same inputs, but have different operations. If they both have x and y as the hash values they'll collide in for example a GVN hash.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get why it's about subclasses?

Because I'm thinking both eql? and hash, either:

  • care about the exact class (e.g., self.class == other.class in eql?)
  • don't care about the exact class (e.g., other.is_a?(SomeClass) in eql?)
    (otherwise one of the two methods is broken)

You are right in your example that including the class makes sense and that avoids extra hash collisions.
I forgot about the case of 2 unrelated classes, with different hash methods but hash values accidentally collide, which semantically doesn't matter if eql? checks it, but performance-wise it can matter, that's a good point.
Another example is e.g. Complex and Rational, both might compute the hash as [value1, value2].hash but could cause unfortunate collisions in a Hash which can have both as keys.

As a counter-example, imagine a slight variation, where there is a Add node and a AddOne node (AddOne < Add), and I want Add(x, 1).eql?(AddOne(x)) => true, then it's the second case above.

Based on that I agree it seems best in general to include the class in the hash and recommend it here. For the Add/AddOne case above it should be clear enough that [Add, x, y].hash + is_a?(Add) should be used for that special case (and that's actually better than [x, y].hash + is_a?(Add) if there can be non-Add keys in the same Hash).

hash.c Outdated
@@ -303,6 +303,18 @@ objid_hash(VALUE obj)
*
* Certain core classes such as Integer use built-in hash calculations and
* do not call the #hash method when used as a hash key.
*
* When implementing your own #hash based on multiple values, the best
* practice is to combine values using the hash code of an array:
Copy link
Member

@eregon eregon Apr 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the text also mention the class (to be consistent with the code)? (now I agree it's good to include it for most cases)
Might also want to have a note somewhere explaining why having the class there is useful, I think it's non-trivial for most people.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, I was wondering if it might make sense to mention including the same attributes in hashing and equality.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to merge this PR if my comment just above is addressed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I don't know what's going on with CI though - is Compilations / ${{ matrix.entry.name }} (pull_request) supposed to be a job name?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's just because it's not expanded due to being skipped as this is a doc-only change.
Not sure how it finds out this is a doc-only change though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think

paths-ignore:
- 'doc/**'
- '**.md'
- '**.rdoc'
pull_request:
paths-ignore:
- 'doc/**'
- '**.md'
- '**.rdoc'
controls skipping for that particular job

@eregon eregon merged commit 3a8d60f into ruby:master Apr 30, 2022
schneems pushed a commit to schneems/ruby that referenced this pull request May 23, 2022
* Discussion is as per https://bugs.ruby-lang.org/issues/18611.

Co-authored-by: Sam Bostock <sam.bostock@shopify.com>
@chrisseaton chrisseaton deleted the hash-best-practice-docs branch July 11, 2022 14:27
schneems pushed a commit to schneems/ruby that referenced this pull request Jul 26, 2022
* Discussion is as per https://bugs.ruby-lang.org/issues/18611.

Co-authored-by: Sam Bostock <sam.bostock@shopify.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation Improvements to documentation.
5 participants