Skip to content

New ReentrantReadWriteLock class #381

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jul 30, 2015

Conversation

alexdowad
Copy link
Contributor

See PR #372 for the background on this one.

I am opening this PR to allow for code review. It is NOT ready to merge yet.

@alexdowad
Copy link
Contributor Author

Note the new #try_write_lock and #try_read_lock methods. Also, this class allows "upgrading" and "downgrading" of locks -- to do this, simply acquire a read lock before releasing your write lock (or vice versa). It doesn't care if you don't follow a strict ordering of write-acquire, read-acquire, read-release, write-release.

@alexdowad alexdowad force-pushed the reentrant_rw_lock branch from 9b1e5d0 to b977602 Compare July 26, 2015 20:16
@alexdowad
Copy link
Contributor Author

Specs are coming soon...

@alexdowad
Copy link
Contributor Author

I cherry-picked the 2 commits from @jdantonio's PR which still seem useful, so if/when this PR is ready to merge, those will go in at the same time.


if c == 0 # no readers OR writers running OR writers waiting
# if we successfully swap the RUNNING_WRITER bit on, then we can go ahead
if @Counter.compare_and_set(0, RUNNING_WRITER)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexdowad If I follow the code correctly, this is the bit that allows a thread with a read lock to "upgrade" to a write lock (as in the Rails lock class). Am I correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jdantonio, wow, you just made me realize that this code does not handle that "read lock upgrading to write lock" scenario properly! Thanks, I'll fix it!

@jdantonio
Copy link
Member

@alexdowad This looks fantastic! Thank you very much for working on this!

The tests are failing on all JRuby build because of the Java update to AtomicFixnum:

ext/com/concurrent_ruby/ext/JavaAtomicFixnumLibrary.java:80: error: cannot find symbol
        public IRubyObject update(ThreadContext context, Block block) {
                                                         ^
  symbol:   class Block
  location: class JavaAtomicFixnum

This looks like it should be easy to fix. I can look into it if need be.

The tests are also failing on Windows because of a Bignum/Fixnum issue. I will look into that for you.

With respect to the tests, it seems to me that we should be able to extract most of the tests for ReadWriteLock into a set of shared specs, behaves_as :read_write_lock. I can do that myself, so if you only write tests for the new re-entrant functionality that should be plenty.

@alexdowad
Copy link
Contributor Author

I'll fix the Java code. Sorry, this PR is still a bit rough. Regarding Bignum/Fixnum, would that be on a 32-bit machine by any chance? It's been a while since I looked at the relevant MRI code, but I suspect that they may be using 2 bits for tagging, leaving only 30 bits available for fixnums. Could you adjust WRITER_BITS to 14 rather than 15, and re-run the test on Windows?

@jdantonio
Copy link
Member

The Windows build will run automatically every time you update the PR. Right now we're testing both 32 and 64 bit versions of Ruby on Windows. There is a Concurrent.on_windows? helper function that might be helpful. Perhaps something like this?

WRITER_BITS = (Concurrent.on_windows? ? 14 : 15)

@alexdowad alexdowad force-pushed the reentrant_rw_lock branch 2 times, most recently from e6526c5 to 64c9a57 Compare July 27, 2015 20:03
@alexdowad
Copy link
Contributor Author

You can see that the specs are underway now (and have already flushed out a number of bugs). Still lots more tests to add.

A fun diversion has come up, though: I wrote a spec which causes MRI to segfault! To my trusty debugger...

@alexdowad alexdowad force-pushed the reentrant_rw_lock branch from 64c9a57 to b223545 Compare July 27, 2015 20:11
@alexdowad
Copy link
Contributor Author

Sorry, just re-ordered some commits.

@alexdowad alexdowad force-pushed the reentrant_rw_lock branch 2 times, most recently from 5a10fc6 to 554a0be Compare July 28, 2015 11:34
@alexdowad
Copy link
Contributor Author

We have a pretty good little suite of specs for ReentrantReadWriteLock spec now. More bugs flushed out.

I think a thorough code review by at least one either developer is needed before this goes into production.

@jdantonio
Copy link
Member

The tests hang with Concurrent::ReentrantReadWriteLock #try_read_lock on all versions of Windows (see here). I'll pull the code to my Windows test machine later tonight and see if I can figure out what's going on.

@alexdowad alexdowad force-pushed the reentrant_rw_lock branch from 554a0be to cac958a Compare July 29, 2015 07:41
@alexdowad
Copy link
Contributor Author

OK, I'm on it.

@alexdowad
Copy link
Contributor Author

If anyone is interested in the MRI segfault which was discovered while working on this PR, here is what it turned out to be: https://bugs.ruby-lang.org/issues/11404

@alexdowad alexdowad force-pushed the reentrant_rw_lock branch 2 times, most recently from c3f62ca to fcf02d1 Compare July 29, 2015 11:01
@alexdowad
Copy link
Contributor Author

The problem on Windows is sorted out. Unfortunately, it looks like Rubinius.unlock is buggy. Have a look at this:

        def ns_wait(timeout = nil)
          wchan = Rubinius::Channel.new

          begin
            @__Waiters__.push wchan
            puts "Waiting.... #{self.inspect}"
            puts "Locked = #{Rubinius.locked?(self)}"
            puts Rubinius.unlock(self).inspect
            puts "Locked = #{Rubinius.locked?(self)}"
            signaled = wchan.receive_timeout timeout
          ensure
            Rubinius.lock(self)

            if !signaled && !@__Waiters__.delete(wchan)
              # we timed out, but got signaled afterwards,
              # so pass that signal on to the next waiter
              @__Waiters__.shift << true unless @__Waiters__.empty?
            end
          end

          self
        end

That's a modified version of Concurrent::Synchronization::Lock#ns_wait, with some debug print statements. Output from RSpec test:

Waiting.... #<Concurrent::Synchronization::Lock:0x2260 @__Waiters__=[#<Rubinius::Channel
:0x2268>]>
Locked = true
nil
Locked = true

Looking at the RBX source, it looks like Rubinius.unlock is supposed to return nil if successful, or some special "primitive failure" value if unsuccessful. You can see that it is returning nil, but the object is still not unlocked!

@pitr-ch, your comments please!

@alexdowad alexdowad force-pushed the reentrant_rw_lock branch from fcf02d1 to dcba34d Compare July 29, 2015 12:21
@jdantonio
Copy link
Member

I'm glad you were able to get things to work on Windows! I didn't have a chance to take a look last night as I had hoped. The one test that failed on one Windows build doesn't concern me. Tests that check the status of a thread are incredibly difficult to make work consistently.

The Rbx bug is unfortunate, but I don't think this should stop is from merging this PR. There is nothing we can do about a buggy runtime. We can submit a bug to Rbx and add an Rbx-specific warning to the code.

@alexdowad
Copy link
Contributor Author

@jdantonio, of course you can do whatever you like, but I would suggest holding onto this one for just a few days to see if we can really make it right.

@jdantonio
Copy link
Member

@alexdowad I was under the impression that we couldn't do anything about this. If there is a way to fix it then by all means, we should.

@pitr-ch
Copy link
Member

pitr-ch commented Jul 29, 2015

@alexdowad thanks for letting me know, I'll look asap. 👍 for waiting until fixed, we cannot released it if it's broken on Rubinius anyway, we should at least give them a chance to fix it if it's indeed bug in the runtime.

@ReadQueue = Synchronization::Lock.new # used to queue waiting readers
@WriteQueue = Synchronization::Lock.new # used to queue waiting writers
@HeldCount = ThreadLocalVar.new(0) # indicates # of R & W locks held by this thread
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's more safe to have ensure_ivar_visibility! here as in ReadWriteLock.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is, if we put a memory barrier there on Rubinius, are we also going to put a memory barrier at the beginning of every method which uses the ivars? If not, then is the memory barrier in the constructor actually worth anything?

Regarding the current problem with Rubinius.unlock -- I have tried peppering the code with memory barriers and it didn't make any difference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reasoning for the barrier is as follows: the object is being constructed so no other thread has a reference to it (assuming proper construction, where reference does not escape during construction). The barrier is here so final field assignments of the constructed object cannot be reordered with an assignment of the constructed object to a publicly accessible variable where it can be read by other threads. If the barrier is omitted other threads can see the object instance before it's fully initialised because of the possible reorderings on thread performing the construction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, makes sense!

I'd like to make this improvement without inheriting from Synchronization::Lock. Lock adds extra instance variables which will not be useful for ReentrantReadWriteLock. This would increase the memory footprint of each instance.

What would you think of Rubinius.memory_barrier if Concurrent.on_rbx??

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now i think its better to inherit so it's used standardly. However I am aware of this issue and I'll fix it later for all classes using Synchronisation::Object. If here is some custom code here I can easily miss it.

@alexdowad alexdowad force-pushed the reentrant_rw_lock branch from 61bc71d to 4f47db6 Compare July 30, 2015 13:18
@@ -8,13 +8,26 @@ module Synchronization
class RbxObject < AbstractObject
def initialize
@__Waiters__ = []
@__Owner__ = nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for nitpick: this should be @__owner__ since it's mutated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I didn't know your convention was caps for "final". Fixed.

… held

Rubinius.lock counts how many times an object has been locked, and only releases the
lock when it has been unlocked an equal number of times. Therefore, if Lock#wait is
called on a Lock which was *already* locked, the call to Rubinius.unlock which occurs
before sleeping does not actually release the lock. This makes it impossible for other
threads to wake it up with #signal or #broadcast.

A solution is to make sure that #synchronize never calls Rubinius.lock on an object which
is already locked.
ReadWriteLock is a fast and very useful utility, but it is non-reentrant and this
makes it unsuitable for some (well, a lot of) applications. You can never take an
exclusive lock without being 100% sure that you are not holding it already. If you
mess up, then game over. Your thread will deadlock hard and grind to a halt. Any
other thread which tries to take the exclusive lock will deadlock too.

When there are only a few sections of code which access a shared, mutable resource,
avoiding this scenario is not so hard. But when you have a lot of places where
locking is needed, and they call each other in complex ways, forget it. You WILL
get it wrong.

If you are building a library for other programmers to use, and you call into
"consumer" code with locks held, that's even worse. (Such as public API methods
which yield to a block...) You don't know what that "consumer" code will do. It
may call back into your library again.

Reentrant locks are so much more friendly. You can audit each method
individually to check whether it acquires (and releases) all the locks that
it needs in the right places. You don't have to check every single call site to
see which locks may already be held when the method is entered. (This will save you
some lost hair, which you can later tear out when dealing with lock acquisition
order issues.)

Acquiring and releasing a ReentrantReadWriteLock is about 50% more expensive
(in CPU time) than a plain ReadWriteLock. But it's still fast. Don't worry
about performance unless you expect to acquire/release it 10,000s of times
per second.

ReentrantReadWriteLock has another nice property: if you are holding a write
lock, you can FIRST acquire a read lock and THEN drop the write lock. This allows
lock "downgrading". "Upgrading" can be done in the converse way.

If you think "ReentrantReadWriteLock" is too long to type (and it is), feel free
to "alias" it by defining a constant. This is Ruby, after all.
@alexdowad alexdowad force-pushed the reentrant_rw_lock branch from 4f47db6 to 5d4e385 Compare July 30, 2015 14:16
@alexdowad
Copy link
Contributor Author

@pitr-ch, they do seem similar, because the algorithm used for ReentrantReadWriteLock was derived from the one used for ReadWriteLock, but there are numerous differences in the details of the algorithm. It might be possible to find some little random bits of code which are the same, but not conceptually "cohesive" operations which make sense on their own.

DRY is generally a good thing, but can be detrimental when taken too far. Why? Because removing duplication often means adding another layer of indirection. Too much indirection makes code harder to read, understand, and maintain.

At the same time, higher LOC count also makes a project harder to understand and maintain. And unifying duplicated code has the advantage that bug fixes and performance optimizations on de-duplicated code can benefit all the places that use it. On the other hand, de-duplicating also means that you can break class A while trying to fix class B. You are creating dependencies where none existed before.

There is more that could be said about pros and cons on both sides, but in short, I believe that readability, maintainability, and performance of this code will all be enhanced if we just keep the reentrant RW-lock classes separate.

However, I would love to be proved wrong, if you (or anybody) can show a nice way to break out the pieces of the algorithms, which reduces duplication while enhancing readability and maintaining good performance.

@pitr-ch
Copy link
Member

pitr-ch commented Jul 30, 2015

Ok at first glance it looked similar a lot, but as you say it just appears that way. So lets keep it, no need to delay this PR. I may look at it later.

jdantonio added a commit that referenced this pull request Jul 30, 2015
@jdantonio jdantonio merged commit f6b35d9 into ruby-concurrency:master Jul 30, 2015
@alexdowad alexdowad deleted the reentrant_rw_lock branch July 30, 2015 16:40
@jdantonio
Copy link
Member

Unfortunately, while @alexdowad was doing his excellent work to create a highly-performant lock class that was compatible with the dependency loader lock in Rails, they added new features to their lock. At this point it looks like their lock is highly specialized and probably not worth our time to support. :-(

@alexdowad
Copy link
Contributor Author

It is quite specialized indeed. I wonder how much they gain from those "loose upgrades". It seems to me the purpose could probably be achieved in a much simpler way, though I'm not deep in the Rails code so I don't know.

@matthewd
Copy link
Contributor

@alexdowad they prevent us from deadlocking as soon as you have two concurrent requests, which both need an opportunity to do a load

@jdantonio
Copy link
Member

@alexdowad Did you get my email?

@jdantonio
Copy link
Member

On the bright side, this new lock class is an excellent addition! And we have an improved TLV, too!

@alexdowad
Copy link
Contributor Author

@matthewd, why do requests need to do a load? Autoload of Rails classes/other gem classes?

@matthewd
Copy link
Contributor

Rails application classes (e.g., the UserController or Post classes) -- the lock is for the development mode, (re)load-on-demand, behaviour.

@alexdowad
Copy link
Contributor Author

Why does that reloading have to be done in the middle of a request? Can't you complete all outstanding requests, reload, then start accepting requests again?

@matthewd
Copy link
Contributor

We unload between requests, and we load each class when it is referenced (via const_missing); we can't pre-load the entire application every time because performance.

@alexdowad
Copy link
Contributor Author

OK, fair enough.

@jdantonio
Copy link
Member

@matthewd Thank you for all your feedback!

@pitr-ch
Copy link
Member

pitr-ch commented Aug 2, 2015

We could generalise our ReadWrite lock implementation to be able to cover the Rails use-case. A factory class could be introduced to CR, which would generate a Lock classes based on given rules. The rules would be a small set of purposes and compatibility rules. Based on that it should be able to generate performant implementation. For ReadWriteLock the rules would be: {read: [:read], write: []} (where key is name of purpose and value is set of compatible purposes). For the Rails use-case it would be: {load: [:load], unload: [:load, :unload]}. The factory class would divide the bits in Fixnum to track the counts for given purposes, failing when there is not enough space or fall-backing to an immutable struct of Fixnums.

This is just an idea for improvement, since the lock is not used in Rails production, there will be little gain if Rails migrate to this more performant implementation.

@alexdowad
Copy link
Contributor Author

@pitr-ch It's an interesting idea, but do you think such an interface would be useful for anyone else aside from Rails?

@alexdowad
Copy link
Contributor Author

Another thing: we can cover the Rails use case much more simply:

class SharedLock < Concurrent::ReentrantReadWriteLock
  def with_write_lock_loose(&block)
    if (@HeldCount.value & READ_LOCK_MASK) > 0
      # we are already holding a read lock
      release_read_lock
      with_write_lock(&block)
      acquire_read_lock
    else
      with_write_lock(&block)
    end
  end
end

That doesn't cover the "purpose" thing, does it? But I think even that can be covered by adding a couple more custom methods which delegate to the existing methods on ReentrantReadWriteLock.

@pitr-ch
Copy link
Member

pitr-ch commented Aug 3, 2015

Yeah that sounds more practical. I cannot think of another example. I mainly wanted to put the idea somewhere if anybody is interested looking into it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Adding features, adding tests, improving documentation.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants