Permalink
Browse files

`ActiveRecord::Base#hash` should differ between classes

Prior to this change, we would get collisions if Active Record objects
of different classes with the same ID were used as keys of the same
hash. It bothers me slightly that we have to allocate inside of this
method, but Ruby doesn't provide any way to hash multiple values without
allocation
  • Loading branch information...
sgrif committed May 31, 2016
1 parent 81251c6 commit c8be4574a2a35c896560ff58b26111ad6dd9d60f
Showing with 5 additions and 1 deletion.
  1. +1 −1 activerecord/lib/active_record/core.rb
  2. +4 −0 activerecord/test/cases/base_test.rb
@@ -432,7 +432,7 @@ def ==(comparison_object)
# [ Person.find(1), Person.find(2), Person.find(3) ] & [ Person.find(1), Person.find(4) ] # => [ Person.find(1) ]
def hash
if id
id.hash
[self.class, id].hash
else
super
end
@@ -1504,6 +1504,10 @@ def test_default_values_are_deeply_dupped
assert_not_equal Post.new.hash, Post.new.hash
end
test "records of different classes have different hashes" do
assert_not_equal Post.new(id: 1).hash, Developer.new(id: 1).hash
end
test "resetting column information doesn't remove attribute methods" do
topic = topics(:first)

5 comments on commit c8be457

@eiked

This comment has been minimized.

eiked replied Jun 9, 2016

I'd suggest to use (self.hash ^ id.hash) to avoid the allocation.

@pan

This comment has been minimized.

Contributor

pan replied Jun 14, 2016

XOR introduces a risk of collisions here, e.g. 0b1010 ^ 0b0101 == 0b1110 ^ 0b0001

@jonathanhefner

This comment has been minimized.

jonathanhefner replied Jun 15, 2016

Collisions will happen anyway as there are a finite number of buckets in Hash. (i.e. somewhere inside Hash there is something like bucked_id = key_hash % bucket_count)


EDIT (to avoid further replies, as requested):

Thinking further about the math, I don't think XORing hashes will increase chance of collision. Given four booleans (b1, b2, b3, b4), the probability of (b1 ^ b2) == (b3 ^ b4) is 1/2 (easy to brute-force check this). Given four bit-strings (s1, s2, s3, s4) of length n, the expression (s1 ^ s2) == (s3 ^ s4) is equivalent to (s1[0] ^ s2[0]) == (s3[0] ^ s4[0]) && ... && (s1[n] ^ s2[n]) == (s3[n] ^ s4[n]). The probability of which is (1/2 ** n). This is the same probability as s1 == s3, i.e. a no-XOR n-bit hash collision. Perhaps someone can verify this logic.

@sgrif

This comment has been minimized.

Member

sgrif replied Jun 15, 2016

@eiked

This comment has been minimized.

eiked replied Jun 21, 2016

Hi sgrif and all.

I believe that xor on two (good) hash values does provide another good hash value.
To improve on that, we should xor a (constant but random chosen) prime number.

As we all know, having a good hash value for every object is very important.
(This is why sgrif started this change in the first place)

and as jonathan pointed out: xor should just be fine.
(aka random^random = random!)

So actually my suggestion is to xor a unique prime as a third factor.
That's like a salt in the hash
It's actually like some unique type information embedded in that hash

Please sign in to comment.