Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing Intermittently Failing Tests #271

Closed
wants to merge 5 commits into from
Closed

Fixing Intermittently Failing Tests #271

wants to merge 5 commits into from

Conversation

jdantonio
Copy link
Member

See #263

@jdantonio
Copy link
Member Author

@chrisseaton I've been trying to figure out why the tests for Exchanger fail intermittently and I believe I have found a race condition in the current MVar version of the #exchange method. My understanding of MVar may be incomplete, however. Please review and give me your thoughts.

The intermittent failure seems to only occur on JRuby. More often that not the failure is a deadlock (see here). Other times the function returns the wrong value (see here).

If the #take call on line 37 returns anything other than MVar::TIMEOUT then two competing threads may race for the elsif and else clauses on lines 40 and 48. In theory, two threads could both read EMPTY from @first.take(timeout) on line 37 then attempt the put on 41. Similarly, two threads could simultaneously attempt either put on lines 49 and 50.

Is this correct?

@chrisseaton
Copy link
Member

Yes I think that's right. If two threads raced to do the put, one of them would succeed and the other would block. That explains the deadlock.

I tried to fix this, but the really tricky bit is the timeouts. I can get it correct I think up until the point where one thread has published its object and is waiting for the second. If that times out, it has to unpublished its object, but atomically.

It might be best to move back to a simpler implementation using lower level primitives more directly - it might have been a mistake to try to use MVar for this with timeouts.

@chrisseaton
Copy link
Member

      # Both threads modify the first variable
      first_result = @first.modify(timeout) do |first|
        # Does it currently contain the special empty value?
        if first == EMPTY
          # If so, modify it to contain our value
          value
        else
          # Otherwise, modify it back to the empty state
          EMPTY
        end
      end

      # If that timed out, the whole operation timed out
      return nil if first_result == MVar::TIMEOUT

      # What was in @first before we modified it?
      if first_result == EMPTY
        # We stored our object - someone else will turn up with the second
        # object at some point in the future

        # Wait for the second object to appear
        second_result = @second.take(timeout)

        # If that timed out, the whole operation timed out
        return nil if second_result == MVar::TIMEOUT

        # BUT HOW DO WE CANCEL OUR RESULT BEING AVAILABLE IN @first?

        # Return that second object
        second_result
      else
        # We reset @first to be empty again - so the other value is in
        # first_result and we need to tell the other thread about our value

        # Tell the other thread about our object
        second_result = @second.put(value, timeout)

        # If that timed out, the whole operation timed out
        return nil if second_result == MVar::TIMEOUT

        # We already have its object
        first_result
      end
    end

@jdantonio
Copy link
Member Author

@chrisseaton Thank you for looking into this. I just pushed a commit (308f2c5) that includes two Exchanger implementations. One includes your update and the other is the original implementation from @mighe. Both pass the current tests. The slot-based implementation doesn't appear to handle timeouts, either (the result of wait_for_value is never checked). We only have one test for timeouts so I don't think our test suite is sufficient. I've been reviewing the Java implementation and, not surprisingly, it's much more complex. Do you know of a simpler implementation that the Java one that I can use as a model? I don't mind adding more tests and doing a new implementation, but I'd be more confident if I could follow a proven implementation in another language.

@jdantonio jdantonio mentioned this pull request Apr 7, 2015
@pitr-ch pitr-ch added this to the 0.9.0 Release milestone Apr 9, 2015
@jdantonio jdantonio mentioned this pull request Apr 12, 2015
@jdantonio
Copy link
Member Author

The only significant changes in this PR are for Exchanger, which we now know to be buggy. No need to merge these changes so I'm closing.

@jdantonio jdantonio closed this May 2, 2015
@jdantonio jdantonio deleted the fix/tests branch May 2, 2015 13:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants