Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Temporarily revert open-addressing Hash for concurrency reasons. #5412

Merged
merged 1 commit into from Nov 5, 2018

Conversation

headius
Copy link
Member

@headius headius commented Nov 5, 2018

While verifying JRuby 9.2.1, @enebo discovered that certain
concurrent uses of a simple Rails app were triggering intermittent
errors, and forcing the JIT to run very quickly produced NPEs
under concurrent load. In 6b2fa45 @enebo was able to fix this
by adding a null check, but we could not explain how a null could
appear at that point in the code unless the hash had been
corrupted, perhaps across threads.

Additional testing with a number of contrived scripts seems to
show that the new open-addressing hash (plus its linear mode) is
more fragile under concurrency than the old linked buckets
implementation. This is in a way, by design: the new
implementation reduces indirection and allocation by packing
keys and values into a contiguous area of memory, opening up
potential for more races against the same data.

In order to have a stable 9.2.1 release, we are temporarily
rolling back this change until we can get a better grasp of the
changes needed to make it as least as concurrency-friendly as the
old hash implementation, if not properly concurrency-safe (which
may or may not be easier in this new implementation).

@headius headius added this to the JRuby 9.2.1.0 milestone Nov 5, 2018
@headius
Copy link
Member Author

headius commented Nov 5, 2018

@ChrisBr We are going to delay the release of your open-addressing Hash until 9.2.2, so we can investigate how to make it a bit more robust under concurrent mutation. This is not an indictment of your work...the fact that the 9.2 Hash impl seems to survive more unsafe scenarios may be accidental, or simply due to the fact that it starts using a heavy bucket system immediately.

See discussions under #5406 about @enebo's discovery, but ignore my original assertions because they were based on a flawed understanding of the new Hash impl.

Here are a few examples of scripts that appear to work in 9.2 and fail in 9.2.1:

Leads to NPE in shift due to pulling a null key out of entries.

h = {}
Thread.abort_on_exception = true
t1 = Thread.new { i = 0; loop { h[i+=1] = 1 } }
t2 = Thread.new { loop { h.shift } }
sleep

Leads to ArrayIndexOutOfBounds quickly:

h = {}
Thread.abort_on_exception = true
loop {
  x = false
  t1 = Thread.new {|t| Thread.pass until x; i = 0; 10_000_000.times { h[i+=1] = Thread.current } }
  t2 = Thread.new { Thread.pass until x; i = 0.1; 10_000_000.times { h[i+=1] = Thread.current } }
  x = true
  t1.join
  t2.join
}

I have not included a third script that used delete instead of shift since it appeared to run without error on both Hash impls. There are likely other cases but these are key mutations (shrinking and growing) that appear to be more fragile with open addressing.

While verifying JRuby 9.2.1, @enebo discovered that certain
concurrent uses of a simple Rails app were triggering intermittent
errors, and forcing the JIT to run very quickly produced NPEs
under concurrent load. In 6b2fa45 @enebo was able to fix this
by adding a null check, but we could not explain how a null could
appear at that point in the code unless the hash had been
corrupted, perhaps across threads.

Additional testing with a number of contrived scripts seems to
show that the new open-addressing hash (plus its linear mode) is
more fragile under concurrency than the old linked buckets
implementation. This is in a way, by design: the new
implementation reduces indirection and allocation by packing
keys and values into a contiguous area of memory, opening up
potential for more races against the same data.

In order to have a stable 9.2.1 release, we are temporarily
rolling back this change until we can get a better grasp of the
changes needed to make it as least as concurrency-friendly as the
old hash implementation, if not properly concurrency-safe (which
may or may not be easier in this new implementation).

Original open addressing work by @ChrisBr.
@ChrisBr
Copy link
Contributor

ChrisBr commented Nov 6, 2018

@headius hm ok! Any idea how we can make it (more) thread safe?

headius added a commit that referenced this pull request Nov 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants