Recursive hashing corrupts shared hash buffer #7866

notEthan · 2023-07-21T22:54:01Z

I started memoizing the #hash method of a class (Ptr, below), which seems to have broken comparison of a Set containing instances of a class (Node) that has an ivar containing a Ptr, created in Node's initializer.

require 'set'

class Ptr
  def ==(other)
    other.is_a?(Ptr)
  end

  alias_method :eql?, :==

  def hash
    h = @hash ||= {x: 0}.hash
    puts "Ptr#hash:  #{h}"
    h
  end
end

class Node
  def initialize
    @ptr = Ptr.new
    @node_set = Set[self]
  end

  attr_reader :node_set

  def ==(other)
    other.is_a?(Node)
  end

  alias_method :eql?, :==

  def hash
    h = {ptr: @ptr}.hash
    puts "Node#hash: #{h}"
    h
  end
end

node = Node.new
post_node_set = Set[node]

puts
puts "post_node_set == node.node_set:"
puts  post_node_set == node.node_set
puts
puts "post_node_set.map(&:hash) == node.node_set.map(&:hash):"
puts  post_node_set.map(&:hash) == node.node_set.map(&:hash)
puts
puts "node.node_set == post_node_set:"
puts  node.node_set == post_node_set

Expected: all == comparisons result in true
Actual: first comparison is false (and the rest are true)

Some confluence of factors is at play here, I've simplified this down a lot, but simplifying further seems to no longer trigger the issue. Doing less in Node#initialize, or if Ptr#hash or Node#hash are hashing something simpler than Hash instances, result in expected behavior.

Output of the above, with notes:

Ptr#hash:  6413925280117835575
Node#hash: 8466601984141238220 ⬅️ from inside Node#initialize
Ptr#hash:  6413925280117835575
Node#hash: 3645463223149969145 ⬅️ after Node#initialize, should be same as from inside Node#initialize

post_node_set == node.node_set:
Ptr#hash:  6413925280117835575
Node#hash: 3645463223149969145
false                          ⬅️ expected: true

post_node_set.map(&:hash) == node.node_set.map(&:hash):
Ptr#hash:  6413925280117835575
Node#hash: 3645463223149969145
Ptr#hash:  6413925280117835575
Node#hash: 3645463223149969145
true

node.node_set == post_node_set:
Ptr#hash:  6413925280117835575
Node#hash: 3645463223149969145
true

Environment Information

JRuby version 9.4.3.0
Operating system: reproduces on all of ubuntu-latest, macos-latest, windows-latest on github actions
Test runs at https://github.com/notEthan/ghtest/actions/runs/5627207157

The text was updated successfully, but these errors were encountered:

headius · 2023-08-16T04:43:46Z

post_node_set == node.node_set

So this is the only one that fails, yes? I have not found a cause yet but it seems like the problem must revolve around Node not hashing the same as you expect. I believe Set is working properly but some interaction with your overridden methods might be confusing the underlying Hash.

headius · 2023-08-16T04:50:58Z

Removing the memoization or calling @ptr.hash in Node#initialize both make it work as expected. Problem seems to be something to do with the memoization.

headius · 2023-08-16T05:56:08Z

Here's the smallest repro I have that still shows the issue:

Edit: I pasted the wrong source at first.

require 'set'

class Ptr
  def hash
    @hash ||= {x:1}.hash
  end
end

class Node
  def initialize
    @ptr = Ptr.new
  end

  def hash
    {ptr: @ptr}.hash
  end
end

node = Node.new
set1 = Set[node]
set2 = Set[node]
puts set1 == set2
puts set2 == set1

There's some interaction between Set's logic for standing up the internal hash and the memoized Ptr#hash value.

headius · 2023-08-16T06:11:44Z

I have tested whether something is wrong with our Set by forcing it to use the gem, which is the same pure-Ruby set.rb that CRuby uses. It did not change the result, so it seems the issue is in JRuby's Hash or in how we are compiling this code.

headius · 2023-08-16T21:36:08Z

New smaller reproduction that eliminates Set and Node. It's clearly not a problem with Set.

class Ptr
  def hash
    @hash ||= { x: 1 }.hash
  end
end

ptr = Ptr.new
set1 = { { ptr: ptr } => true}
set2 = { { ptr: ptr } => true}
puts set1 == set2
puts set2 == set1

headius · 2023-08-16T21:45:29Z

Even smaller!

class Ptr
  def hash
    @hash ||= { x: 1 }.hash
  end
end

ptr = Ptr.new
puts({ ptr: ptr }.hash, { ptr: ptr }.hash)

The original optimization here moved the allocation of the buffer byte[16] into a thread-local to reuse it, but it kept the original inline modification of that buffer via ByteBuffer.putLong. This caused recursive Hash#hash calls to overwrite the buffer mid-hash, leading to issues like jruby#7866 where a lazily-calculated recursive hash produces a bogus value the first time through. The change here avoids accessing and modifying the shared buffer until after any recursive hash calls have completed.

headius · 2023-08-22T22:16:39Z

OMG I found it.

In JRuby 9.4.0.0, 60ae8ba introduced a cache for the byte[16] used to calculate an aggregate hash based on the key and value, but none of us noticed that the buffer was in-use during recursive hash calls. That led to those recursive calls wiping out the partial results already present.

I've fixed it in #7901.

The original optimization here moved the allocation of the buffer byte[16] into a thread-local to reuse it, but it kept the original inline modification of that buffer via ByteBuffer.putLong. This caused recursive Hash#hash calls to overwrite the buffer mid-hash, leading to issues like jruby#7866 where a lazily-calculated recursive hash produces a bogus value the first time through. The change here avoids accessing and modifying the shared buffer until after any recursive hash calls have completed.

headius changed the title ~~Set#== returning false incorrectly~~ Recursive hashing corrupts shared hash buffer Aug 22, 2023

headius mentioned this issue Aug 22, 2023

Protect cached threadlocal in recursive hash #7901

Merged

headius linked a pull request Aug 22, 2023 that will close this issue

Protect cached threadlocal in recursive hash #7901

Merged

headius closed this as completed Aug 22, 2023

enebo added this to the JRuby 9.4.4.0 milestone Oct 18, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Recursive hashing corrupts shared hash buffer #7866

Recursive hashing corrupts shared hash buffer #7866

notEthan commented Jul 21, 2023

headius commented Aug 16, 2023

headius commented Aug 16, 2023

headius commented Aug 16, 2023 •

edited

headius commented Aug 16, 2023

headius commented Aug 16, 2023 •

edited

headius commented Aug 16, 2023

headius commented Aug 22, 2023

Recursive hashing corrupts shared hash buffer #7866

Recursive hashing corrupts shared hash buffer #7866

Comments

notEthan commented Jul 21, 2023

headius commented Aug 16, 2023

headius commented Aug 16, 2023

headius commented Aug 16, 2023 • edited

headius commented Aug 16, 2023

headius commented Aug 16, 2023 • edited

headius commented Aug 16, 2023

headius commented Aug 22, 2023

headius commented Aug 16, 2023 •

edited

headius commented Aug 16, 2023 •

edited