Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recursive hashing corrupts shared hash buffer #7866

Closed
notEthan opened this issue Jul 21, 2023 · 7 comments · Fixed by #7901
Closed

Recursive hashing corrupts shared hash buffer #7866

notEthan opened this issue Jul 21, 2023 · 7 comments · Fixed by #7901
Milestone

Comments

@notEthan
Copy link

I started memoizing the #hash method of a class (Ptr, below), which seems to have broken comparison of a Set containing instances of a class (Node) that has an ivar containing a Ptr, created in Node's initializer.

require 'set'

class Ptr
  def ==(other)
    other.is_a?(Ptr)
  end

  alias_method :eql?, :==

  def hash
    h = @hash ||= {x: 0}.hash
    puts "Ptr#hash:  #{h}"
    h
  end
end

class Node
  def initialize
    @ptr = Ptr.new
    @node_set = Set[self]
  end

  attr_reader :node_set

  def ==(other)
    other.is_a?(Node)
  end

  alias_method :eql?, :==

  def hash
    h = {ptr: @ptr}.hash
    puts "Node#hash: #{h}"
    h
  end
end

node = Node.new
post_node_set = Set[node]

puts
puts "post_node_set == node.node_set:"
puts  post_node_set == node.node_set
puts
puts "post_node_set.map(&:hash) == node.node_set.map(&:hash):"
puts  post_node_set.map(&:hash) == node.node_set.map(&:hash)
puts
puts "node.node_set == post_node_set:"
puts  node.node_set == post_node_set

Expected: all == comparisons result in true
Actual: first comparison is false (and the rest are true)

Some confluence of factors is at play here, I've simplified this down a lot, but simplifying further seems to no longer trigger the issue. Doing less in Node#initialize, or if Ptr#hash or Node#hash are hashing something simpler than Hash instances, result in expected behavior.

Output of the above, with notes:

Ptr#hash:  6413925280117835575
Node#hash: 8466601984141238220 ⬅️ from inside Node#initialize
Ptr#hash:  6413925280117835575
Node#hash: 3645463223149969145 ⬅️ after Node#initialize, should be same as from inside Node#initialize

post_node_set == node.node_set:
Ptr#hash:  6413925280117835575
Node#hash: 3645463223149969145
false                          ⬅️ expected: true

post_node_set.map(&:hash) == node.node_set.map(&:hash):
Ptr#hash:  6413925280117835575
Node#hash: 3645463223149969145
Ptr#hash:  6413925280117835575
Node#hash: 3645463223149969145
true

node.node_set == post_node_set:
Ptr#hash:  6413925280117835575
Node#hash: 3645463223149969145
true

Environment Information

@headius
Copy link
Member

headius commented Aug 16, 2023

post_node_set == node.node_set

So this is the only one that fails, yes? I have not found a cause yet but it seems like the problem must revolve around Node not hashing the same as you expect. I believe Set is working properly but some interaction with your overridden methods might be confusing the underlying Hash.

@headius
Copy link
Member

headius commented Aug 16, 2023

Removing the memoization or calling @ptr.hash in Node#initialize both make it work as expected. Problem seems to be something to do with the memoization.

@headius
Copy link
Member

headius commented Aug 16, 2023

Here's the smallest repro I have that still shows the issue:

Edit: I pasted the wrong source at first.

require 'set'

class Ptr
  def hash
    @hash ||= {x:1}.hash
  end
end

class Node
  def initialize
    @ptr = Ptr.new
  end

  def hash
    {ptr: @ptr}.hash
  end
end

node = Node.new
set1 = Set[node]
set2 = Set[node]
puts set1 == set2
puts set2 == set1

There's some interaction between Set's logic for standing up the internal hash and the memoized Ptr#hash value.

@headius
Copy link
Member

headius commented Aug 16, 2023

I have tested whether something is wrong with our Set by forcing it to use the gem, which is the same pure-Ruby set.rb that CRuby uses. It did not change the result, so it seems the issue is in JRuby's Hash or in how we are compiling this code.

@headius
Copy link
Member

headius commented Aug 16, 2023

New smaller reproduction that eliminates Set and Node. It's clearly not a problem with Set.

class Ptr
  def hash
    @hash ||= { x: 1 }.hash
  end
end

ptr = Ptr.new
set1 = { { ptr: ptr } => true}
set2 = { { ptr: ptr } => true}
puts set1 == set2
puts set2 == set1

@headius
Copy link
Member

headius commented Aug 16, 2023

Even smaller!

class Ptr
  def hash
    @hash ||= { x: 1 }.hash
  end
end

ptr = Ptr.new
puts({ ptr: ptr }.hash, { ptr: ptr }.hash)

@headius headius changed the title Set#== returning false incorrectly Recursive hashing corrupts shared hash buffer Aug 22, 2023
headius added a commit to headius/jruby that referenced this issue Aug 22, 2023
The original optimization here moved the allocation of the buffer
byte[16] into a thread-local to reuse it, but it kept the original
inline modification of that buffer via ByteBuffer.putLong. This
caused recursive Hash#hash calls to overwrite the buffer mid-hash,
leading to issues like jruby#7866 where a lazily-calculated recursive
hash produces a bogus value the first time through.

The change here avoids accessing and modifying the shared buffer
until after any recursive hash calls have completed.
@headius
Copy link
Member

headius commented Aug 22, 2023

OMG I found it.

In JRuby 9.4.0.0, 60ae8ba introduced a cache for the byte[16] used to calculate an aggregate hash based on the key and value, but none of us noticed that the buffer was in-use during recursive hash calls. That led to those recursive calls wiping out the partial results already present.

I've fixed it in #7901.

headius added a commit to headius/jruby that referenced this issue Aug 22, 2023
The original optimization here moved the allocation of the buffer
byte[16] into a thread-local to reuse it, but it kept the original
inline modification of that buffer via ByteBuffer.putLong. This
caused recursive Hash#hash calls to overwrite the buffer mid-hash,
leading to issues like jruby#7866 where a lazily-calculated recursive
hash produces a bogus value the first time through.

The change here avoids accessing and modifying the shared buffer
until after any recursive hash calls have completed.
@headius headius linked a pull request Aug 22, 2023 that will close this issue
@headius headius closed this as completed Aug 22, 2023
@enebo enebo added this to the JRuby 9.4.4.0 milestone Oct 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants