Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Temporarily revert open-addressing Hash for concurrency reasons. #5412

Merged
merged 1 commit into from Nov 5, 2018

Conversation

Projects
None yet
2 participants
@headius
Copy link
Member

commented Nov 5, 2018

While verifying JRuby 9.2.1, @enebo discovered that certain
concurrent uses of a simple Rails app were triggering intermittent
errors, and forcing the JIT to run very quickly produced NPEs
under concurrent load. In 6b2fa45 @enebo was able to fix this
by adding a null check, but we could not explain how a null could
appear at that point in the code unless the hash had been
corrupted, perhaps across threads.

Additional testing with a number of contrived scripts seems to
show that the new open-addressing hash (plus its linear mode) is
more fragile under concurrency than the old linked buckets
implementation. This is in a way, by design: the new
implementation reduces indirection and allocation by packing
keys and values into a contiguous area of memory, opening up
potential for more races against the same data.

In order to have a stable 9.2.1 release, we are temporarily
rolling back this change until we can get a better grasp of the
changes needed to make it as least as concurrency-friendly as the
old hash implementation, if not properly concurrency-safe (which
may or may not be easier in this new implementation).

@headius headius added this to the JRuby 9.2.1.0 milestone Nov 5, 2018

@headius

This comment has been minimized.

Copy link
Member Author

commented Nov 5, 2018

@ChrisBr We are going to delay the release of your open-addressing Hash until 9.2.2, so we can investigate how to make it a bit more robust under concurrent mutation. This is not an indictment of your work...the fact that the 9.2 Hash impl seems to survive more unsafe scenarios may be accidental, or simply due to the fact that it starts using a heavy bucket system immediately.

See discussions under #5406 about @enebo's discovery, but ignore my original assertions because they were based on a flawed understanding of the new Hash impl.

Here are a few examples of scripts that appear to work in 9.2 and fail in 9.2.1:

Leads to NPE in shift due to pulling a null key out of entries.

h = {}
Thread.abort_on_exception = true
t1 = Thread.new { i = 0; loop { h[i+=1] = 1 } }
t2 = Thread.new { loop { h.shift } }
sleep

Leads to ArrayIndexOutOfBounds quickly:

h = {}
Thread.abort_on_exception = true
loop {
  x = false
  t1 = Thread.new {|t| Thread.pass until x; i = 0; 10_000_000.times { h[i+=1] = Thread.current } }
  t2 = Thread.new { Thread.pass until x; i = 0.1; 10_000_000.times { h[i+=1] = Thread.current } }
  x = true
  t1.join
  t2.join
}

I have not included a third script that used delete instead of shift since it appeared to run without error on both Hash impls. There are likely other cases but these are key mutations (shrinking and growing) that appear to be more fragile with open addressing.

Temporarily revert open-addressing Hash for concurrency reasons.
While verifying JRuby 9.2.1, @enebo discovered that certain
concurrent uses of a simple Rails app were triggering intermittent
errors, and forcing the JIT to run very quickly produced NPEs
under concurrent load. In 6b2fa45 @enebo was able to fix this
by adding a null check, but we could not explain how a null could
appear at that point in the code unless the hash had been
corrupted, perhaps across threads.

Additional testing with a number of contrived scripts seems to
show that the new open-addressing hash (plus its linear mode) is
more fragile under concurrency than the old linked buckets
implementation. This is in a way, by design: the new
implementation reduces indirection and allocation by packing
keys and values into a contiguous area of memory, opening up
potential for more races against the same data.

In order to have a stable 9.2.1 release, we are temporarily
rolling back this change until we can get a better grasp of the
changes needed to make it as least as concurrency-friendly as the
old hash implementation, if not properly concurrency-safe (which
may or may not be easier in this new implementation).

Original open addressing work by @ChrisBr.

@headius headius force-pushed the headius:delay_open_hash branch from de0283e to 5ad17f7 Nov 5, 2018

@headius headius merged commit 44c1ddc into jruby:master Nov 5, 2018

1 check was pending

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details

@headius headius deleted the headius:delay_open_hash branch Nov 5, 2018

@ChrisBr

This comment has been minimized.

Copy link
Contributor

commented Nov 6, 2018

@headius hm ok! Any idea how we can make it (more) thread safe?

headius added a commit that referenced this pull request Nov 6, 2018

Reinstate open-addressing hash to work on concurrency improvements.
See #5412.

This reverts commit 44c1ddc, reversing
changes made to b48889f.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.