Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #481 fix hanging race condition #482

Merged
merged 26 commits into from
Sep 16, 2020

Conversation

ColinDKelley
Copy link
Collaborator

@ColinDKelley ColinDKelley commented Aug 12, 2020

Fixes issue #481 by removing code that attempted to synchronize threads with sleep() and Thread#wakeup. That was a race condition bug since if wakeup ran before sleep(), it would be lost and the sleeping thread would simply hang.

Detailed changes:

  • Use ::Mutex/::ConditionVariable to wait/wake up on state transitions. I factored this down into FSM since it's used in two state machine classes now.
  • Clean up and simplify FSM. Enforce symbol contracts, remove unnecessary methods, rename default_state to start_state since the latter is standard FSM terminology. Also initialize with initialize_fsm vs super().
  • Use Queue#pop to block front-end when waiting for events.
  • Combine Loop#setup and Loop#resume since they were always called consecutively. This allowed the Listener :front_end_ready state to disappear.
  • Remove Loop#resume and Loop#pause since they were no-ops.
  • Rename teardown to stop to be consistent. (At some point I'd like to do a similar thing with :processing_events. This could simply be called :started.)

lib/listen/listener.rb Show resolved Hide resolved
lib/listen/fsm.rb Show resolved Hide resolved
lib/listen/fsm.rb Outdated Show resolved Hide resolved
lib/listen/fsm.rb Outdated Show resolved Hide resolved
lib/listen/fsm.rb Show resolved Hide resolved
lib/listen/fsm.rb Show resolved Hide resolved
lib/listen/fsm.rb Outdated Show resolved Hide resolved
lib/listen/fsm.rb Show resolved Hide resolved
lib/listen/event/loop.rb Show resolved Hide resolved
lib/listen/event/loop.rb Outdated Show resolved Hide resolved
@coveralls
Copy link

coveralls commented Aug 12, 2020

Coverage Status

Coverage decreased (-0.7%) to 97.026% when pulling aabfbbf on Invoca:issue_481-fix-hanging-race-condition into ba5059c on guard:master.

spec/lib/listen/adapter/linux_spec.rb Show resolved Hide resolved
spec/lib/listen/adapter/linux_spec.rb Show resolved Hide resolved
spec/lib/listen/adapter/linux_spec.rb Show resolved Hide resolved
spec/lib/listen/adapter/linux_spec.rb Show resolved Hide resolved
spec/lib/listen/adapter/linux_spec.rb Show resolved Hide resolved
lib/listen/event/loop.rb Show resolved Hide resolved
lib/listen/event/loop.rb Show resolved Hide resolved
lib/listen/event/loop.rb Outdated Show resolved Hide resolved
lib/listen/event/loop.rb Outdated Show resolved Hide resolved
lib/listen/event/config.rb Show resolved Hide resolved
Copy link
Member

@ioquatix ioquatix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your effort. I have asked for some changes.

In the future, please make smaller PRs focusing on one particular issue.

Pay attention to changes to public interfaces since this can cause a lot of pain downstream.

I look forward to reviewing and merging this PR and others from you.

lib/listen/event/loop.rb Outdated Show resolved Hide resolved
class Error < RuntimeError
class NotStarted < Error
end
class ThreadFailedToStart < Error; end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since NotStarted is a public interface, please reuse it instead of introducing ThreadFailedToStart.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point--I hadn't thought about it being in the public interface. I will put that back to its old name.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I restored it to its old name, NotStarted. And if start is called when we're already started, I put the code back as it was to simply return as a no-op, since there could be callers who were counting on that public behavior.
0b236c1


def start
transition! :starting do
state == :pre_start or raise Error::AlreadyStarted
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say it's better to use explicit if statement here

unless state == :pre_start
  raise Error::AlreadyStarted...
end

class NotStarted < Error
end
class ThreadFailedToStart < Error; end
class AlreadyStarted < Error; end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should inherit from NotStarted to preserve existing semantics where possible.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might be a bit confusing though?

AlreadyStarted.is_a?(NotStarted)

I will look into what was raised before if start was called a second time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, understood. In this context... if someone calls start, and it's already started, that's a kind of NotStarted because it was already started. If you can think of something better, it's worth considering.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restored to its old behavior: 0b236c1

lib/listen/fsm.rb Outdated Show resolved Hide resolved
# Passing a state name sets the start state
def start_state(new_start_state = nil)
if new_start_state
new_start_state.is_a?(Symbol) or raise ArgumentError, "state name must be a Symbol (got #{new_start_state.inspect})"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of explicitly detecting it's a symbol, why not use to_sym?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a style we like for enforcing contracts, especially around symbol-vs-string! There aren't any callers of this method passing strings, so .to_sym would be a no-op now. Adding .to_sym now might serve to enable more confusing usage later...but there's no need for that, is there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of pseudo type checking is uncommon in Ruby, even thought I can see how it makes sense.

The reality is, if you add this, there are many other places where you might start trying to enforce type contracts.

Agreed, we don't really want users passing in strings.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True that it's not common in Ruby (yet). But we've found it to be invaluable for containing String vs. Symbol insanity! Otherwise, you wind up with defensive code sprinkled everywhere that tries to handle String and Symbol as equivalent...and you never know if that clutter is complete.

Speaking of which, I added the missing spec for the FSM module and made sure to test this case: bbd10ee

# Low-level, immediate state transition with no checks or callbacks.
def transition!(new_state_name)
new_state_name.is_a?(Symbol) or raise ArgumentError, "state name must be a Symbol (got #{new_state_name.inspect})"
@mutex or raise ArgumentError, "FSM not initialized. You must call super() from initialize!"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an overly specific issue to check for - I can imagine some code was not working correctly, but in that case @mutex.synchronize will blow up with nil error... do we need to manually check it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this because of the contract that the including code must call super() first. That's pretty unusual for mixins, but it's how the code was already structured, and it work well in this case. It's critical to get the mutex initialized up front because the lazy style that's often used in mixins (@mutex ||= Mutex.new) wouldn't work; ironically, it would need a synchronizing mutex around it!

Ah, this might make it more clear: how about an overall flag like @_fsm_initialized that's set = true when super() is called. This could code check that for the ArgumentError test. If that test passes, we can be confident that @mutex has been initialized.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood.

Well, I don't really want to make the code any more complicated than it already is.

If the code is not implemented correctly (missing call to super) it's a mechanical problem - i.e. the code needs to be fixed. This isn't a transient run-time issue. The current implementation is okay, but it's also very explicit in a non-canonical way.

I don't think we should be adding extra instance variables to enforce initialisation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized we had to move away from super(). There's a reason I've never seen mixins initialized that way before: it precludes any base class initialization, since the mixin would have to do that for you (by calling super itself), but would have no way to know the base class's initialize signature!

Instead I went with an explicitly-named method initialize_fsm. And stored a variable to remember if that didn't happen. And I addressed the longstanding comment in there to make the FSM methods private. And I added a spec for the FSM module and tested all of this. I think it's all much cleaner now! LMK what you think.

@name = name
@block = block
@transitions = nil
@transitions = Array(transitions).map(&:to_sym) if transitions
@transitions = if transitions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need if transitions.

Copy link
Collaborator Author

@ColinDKelley ColinDKelley Aug 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept it from the old code. You can see it hiding up there one line up... in "trailing if" form. There is code below that counts on it. It's just after this comment:

All transitions are allowed if none are expressly declared

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's simply not needed. Array(nil) => [].

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow. That if is there so that when transitions is nil, @transitions will be as well. The code below counts on that, because it uses nil in @transitions as an indicator that all transitions are allowed.

@ioquatix
Copy link
Member

You may need to rebase on master.

Copy link
Collaborator Author

@ColinDKelley ColinDKelley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review, @ioquatix.

I've responded to all the comments you left. 3 of them require changes. I'll try to push those up in the next day or two.

class Error < RuntimeError
class NotStarted < Error
end
class ThreadFailedToStart < Error; end
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point--I hadn't thought about it being in the public interface. I will put that back to its old name.

class NotStarted < Error
end
class ThreadFailedToStart < Error; end
class AlreadyStarted < Error; end
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might be a bit confusing though?

AlreadyStarted.is_a?(NotStarted)

I will look into what was raised before if start was called a second time.


def start
transition! :starting do
state == :pre_start or raise Error::AlreadyStarted
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're fans for the one-line and/or form for concise preconditions [this use case is exactly why and and or are in Ruby, from Perl], but I can certainly change it to the 3-line form.

lib/listen/fsm.rb Outdated Show resolved Hide resolved
# Passing a state name sets the start state
def start_state(new_start_state = nil)
if new_start_state
new_start_state.is_a?(Symbol) or raise ArgumentError, "state name must be a Symbol (got #{new_start_state.inspect})"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a style we like for enforcing contracts, especially around symbol-vs-string! There aren't any callers of this method passing strings, so .to_sym would be a no-op now. Adding .to_sym now might serve to enable more confusing usage later...but there's no need for that, is there?

# Low-level, immediate state transition with no checks or callbacks.
def transition!(new_state_name)
new_state_name.is_a?(Symbol) or raise ArgumentError, "state name must be a Symbol (got #{new_state_name.inspect})"
@mutex or raise ArgumentError, "FSM not initialized. You must call super() from initialize!"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this because of the contract that the including code must call super() first. That's pretty unusual for mixins, but it's how the code was already structured, and it work well in this case. It's critical to get the mutex initialized up front because the lazy style that's often used in mixins (@mutex ||= Mutex.new) wouldn't work; ironically, it would need a synchronizing mutex around it!

Ah, this might make it more clear: how about an overall flag like @_fsm_initialized that's set = true when super() is called. This could code check that for the ArgumentError test. If that test passes, we can be confident that @mutex has been initialized.

@name = name
@block = block
@transitions = nil
@transitions = Array(transitions).map(&:to_sym) if transitions
@transitions = if transitions
Copy link
Collaborator Author

@ColinDKelley ColinDKelley Aug 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept it from the old code. You can see it hiding up there one line up... in "trailing if" form. There is code below that counts on it. It's just after this comment:

All transitions are allowed if none are expressly declared

spec/lib/listen/fsm_spec.rb Outdated Show resolved Hide resolved
spec/lib/listen/fsm_spec.rb Outdated Show resolved Hide resolved
spec/lib/listen/fsm_spec.rb Outdated Show resolved Hide resolved
spec/lib/listen/fsm_spec.rb Outdated Show resolved Hide resolved
spec/lib/listen/fsm_spec.rb Outdated Show resolved Hide resolved
spec/lib/listen/fsm_spec.rb Outdated Show resolved Hide resolved
@@ -0,0 +1,103 @@
RSpec.describe Listen::FSM do

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style/FrozenStringLiteralComment: Missing magic comment # frozen_string_literal: true.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will make a separate PR for this later.

spec/lib/listen/fsm_spec.rb Show resolved Hide resolved
spec/lib/listen/fsm_spec.rb Show resolved Hide resolved
spec/lib/listen/fsm_spec.rb Show resolved Hide resolved
spec/lib/listen/adapter/polling_spec.rb Show resolved Hide resolved
spec/lib/listen/adapter/polling_spec.rb Show resolved Hide resolved
spec/lib/listen/adapter/polling_spec.rb Show resolved Hide resolved
spec/lib/listen/adapter/polling_spec.rb Show resolved Hide resolved
lib/listen/fsm.rb Show resolved Hide resolved
lib/listen/fsm.rb Show resolved Hide resolved
lib/listen/fsm.rb Show resolved Hide resolved
lib/listen/fsm.rb Show resolved Hide resolved
lib/listen/event/loop.rb Outdated Show resolved Hide resolved
lib/listen/event/loop.rb Show resolved Hide resolved
@ColinDKelley ColinDKelley force-pushed the issue_481-fix-hanging-race-condition branch 2 times, most recently from 4a0458a to d7efb56 Compare August 27, 2020 19:17
lib/listen/file.rb Outdated Show resolved Hide resolved
lib/listen/file.rb Outdated Show resolved Hide resolved
lib/listen/file.rb Outdated Show resolved Hide resolved
@ColinDKelley ColinDKelley force-pushed the issue_481-fix-hanging-race-condition branch from d7efb56 to 17c612d Compare August 27, 2020 19:22
@ColinDKelley
Copy link
Collaborator Author

@ioquatix I believe all comments are addressed here. I merged up latest master as well. There were a couple build failures that came across with master. I sent a separate PR #488 for those and merged it in here.

@ColinDKelley
Copy link
Collaborator Author

Hi @ioquatix
Anything else to do on this PR?

@ioquatix
Copy link
Member

ioquatix commented Sep 7, 2020

My apologies, I was busy for Ruby Kaigi conference. I'm reviewing it now.

@ioquatix
Copy link
Member

ioquatix commented Sep 7, 2020

I'll fix the Ruby 3 issues. However can you check why:

Failures:

  1) Listen with one listen dir force_polling option to false with default ignore options two dirs with files in listen dir listens to dir move
     Failure/Error:
       expect(wrapper.listen do
         mv 'dir1', 'dir2/'
       end).to eq expected

       expected: {:added=>["dir2/dir1/file1.rb"], :modified=>[], :removed=>["dir1/file1.rb"]}
            got: {:added=>["dir2/dir1/file1.rb"], :modified=>["dir2/file2.rb"], :removed=>["dir1/file1.rb"]}

       (compared using ==)

@ioquatix
Copy link
Member

ioquatix commented Sep 7, 2020

Okay, I see a bunch of specs are failing, I'll try to bring master back to green.

@ioquatix
Copy link
Member

ioquatix commented Sep 7, 2020

Okay, master is green again, can you rebase?

@ioquatix ioquatix mentioned this pull request Sep 8, 2020
@ColinDKelley ColinDKelley force-pushed the issue_481-fix-hanging-race-condition branch from 92a532c to b4c5afe Compare September 12, 2020 23:40
@ColinDKelley
Copy link
Collaborator Author

ColinDKelley commented Sep 12, 2020

@ioquatix Thanks for getting master green again. I've merged that in and now this branch is green too!

@ioquatix
Copy link
Member

Can you please rebase on master so I can merge - please don't merge master into your branch because it creates a huge mess.

image

… instance method that just delegate to class, require symbols
…tStarted), and silently no-op on start if already started
@ColinDKelley ColinDKelley force-pushed the issue_481-fix-hanging-race-condition branch from b4c5afe to 24dec52 Compare September 14, 2020 22:59
spec/support/acceptance_helper.rb Show resolved Hide resolved
spec/support/acceptance_helper.rb Show resolved Hide resolved
spec/lib/listen/fsm_spec.rb Show resolved Hide resolved
spec/lib/listen/fsm_spec.rb Show resolved Hide resolved
spec/lib/listen/fsm_spec.rb Show resolved Hide resolved
spec/lib/listen/event/processor_spec.rb Show resolved Hide resolved
spec/lib/listen/event/processor_spec.rb Show resolved Hide resolved
spec/lib/listen/event/processor_spec.rb Show resolved Hide resolved
lib/listen/fsm.rb Show resolved Hide resolved
lib/listen/fsm.rb Show resolved Hide resolved
@ColinDKelley
Copy link
Collaborator Author

@ioquatix : Rebased and pushed.

@ioquatix ioquatix merged commit 86e06ae into guard:master Sep 16, 2020
@ioquatix
Copy link
Member

Great work, thanks for your effort. Do you have time to jump on call some time this week to discuss a release plan/schedule?

@ColinDKelley
Copy link
Collaborator Author

Hi @ioquatix Thank you! Yes, this week is good for a call. I'm on US/Pacific time. How about if we pick an exact time and medium by email?

@ColinDKelley ColinDKelley deleted the issue_481-fix-hanging-race-condition branch October 30, 2020 21:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants