Random failures in ContinuationQueue#poll #462
Comments
Thanks for spending your time on an investigation. Can you please submit a pull request? |
Yes, that |
Actually even deleting the
(EDIT: by tracing it I've seen this loop body execute a second time, perhaps a little surprising given that the only remaining At this point this is starting to look like the complexity of |
Yes, it is unfortunate that Ruby's memory model is not defined or even
documented.
I will do some testing of your version.
On Mon, 20 Feb 2017 at 03:00, Joseph Wong ***@***.***> wrote:
Actually, even deleting the @cond.signal still sometimes results in a
failure... which made me wonder how Ruby's memory model handles making
changes visible across threads. This worked out better for me:
def poll(timeout_in_ms = nil)
  timeout = timeout_in_ms ? timeout_in_ms / 1000.0 : nil
  @lock.synchronize do
    expiry = Time.now.utc + timeout
    while @q.empty?
      wait = expiry - Time.now.utc
      @cond.wait(@lock, wait)
      raise ::Timeout::Error if Time.now.utc >= expiry
    end
    item = @q.shift
    item
  end
end
At this point this is starting to look like the complexity of ::Queue's
implementation itself. (Too bad it doesn't do timeouts natively.)
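For reference, a minimal sketch of the same deadline-based wait loop that also tolerates a nil timeout (the version quoted above raises on `Time.now.utc + timeout` when no timeout is given). The class name `TimedQueueSketch`, the private `now` helper, and the use of a monotonic clock instead of `Time.now.utc` are assumptions made for illustration; this is not Bunny's actual implementation.

```ruby
require "timeout"

# Hypothetical stand-in for the queue discussed above; not Bunny's class.
class TimedQueueSketch
  def initialize
    @q = []
    @lock = Mutex.new
    @cond = ConditionVariable.new
  end

  # The only place that signals: an item was just added.
  def push(item)
    @lock.synchronize do
      @q.push(item)
      @cond.signal
    end
  end

  # Blocks until an item is available. With a timeout, raises
  # Timeout::Error once the deadline has passed; with nil, waits indefinitely.
  def poll(timeout_in_ms = nil)
    @lock.synchronize do
      deadline = timeout_in_ms && (now + timeout_in_ms / 1000.0)
      while @q.empty?
        if deadline
          # Recompute the remaining time on every pass so that an early
          # wakeup does not extend the overall deadline.
          remaining = deadline - now
          raise ::Timeout::Error if remaining <= 0
          @cond.wait(@lock, remaining)
        else
          @cond.wait(@lock)
        end
      end
      @q.shift
    end
  end

  private

  # Assumption: a monotonic clock, so wall-clock adjustments cannot
  # shorten or lengthen the wait.
  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end
```

Checking the deadline before each wait, rather than after, also avoids ever passing a negative interval to `ConditionVariable#wait`.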
--
Staff Software Engineer, Pivotal/RabbitMQ
|
Your version doesn't handle |
@jhtwong please give master a try and thank you for the original patch. If this works well or at least better for you for a few days I can backport it to |
Thanks @michaelklishin! I just tried master @ 1f1c109 and so far so good for me. It'd be great to have it backported to (p.s. I had snuck in the |
@jhtwong thank you for digging out an almost 2 year old concurrency primitive issue. Let's wait for a few days and then backport. |
Less susceptible to race conditions. Fixes #462 (or so we hope).
@michaelklishin will a version 2.6.4 be cut with this fix? Thanks for backporting to 2.6.x! |
I can do a release tomorrow if it's been working well for you.
… On 4 Mar 2017, at 02:10, Joseph Wong ***@***.***> wrote:
@michaelklishin will a version 2.6.4 be cut with this fix? Thanks for backporting to 2.6.x!
|
It's been working well for me, and thanks for releasing 2.6.4! 👍 |
@michaelklishin is it possible that you didn't push the |
@igorwwwwwwwwwwwwwwwwwwww oh, indeed. Thanks for spotting. |
@michaelklishin thanks! ❤️ |
@michaelklishin I am getting a timeout error even with bunny 2.7.2. The error is: /home/user/.rvm/gems/ruby-2.3.1/gems/bunny-2.7.2/lib/bunny/concurrent/continuation_queue.rb:39:in `block in poll': Timeout::Error (Timeout::Error) |
@SmrutiSuman this is not a support forum. Consider providing more information when you ask others for help. |
I have been debugging random failures on Queue#subscribe and Session#create_channel for days now, where they seem to die with the top of the stack like this:
Per documentation I only allow one thread to access a particular channel at any given time. But still I run into random race conditions between an attempt to subscribe and a concurrent attempt by bunny to deliver a message on another subscription (on the same channel).
After digging into things, I found that ContinuationQueue#poll has a bug in it: it shouldn't do a @cond.signal after picking off the top of the @q array, if the contract of the @cond ConditionVariable is that it is signaled if and only if something is added to @q.
See: b978942#diff-0066bdaf813d7be3eec58dfc0e476ec3R36
After deleting this one line the random failures seem to have gone away completely.
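The contract described above can be sketched roughly as follows. This is a simplified illustration, not the code from the linked commit: the class name `SignalOnPushQueue` is hypothetical, and timeouts are left out to keep the focus on where the signal happens.

```ruby
# Hypothetical minimal queue; the condition variable is signaled
# if and only if something is added to @q.
class SignalOnPushQueue
  def initialize
    @q = []
    @lock = Mutex.new
    @cond = ConditionVariable.new
  end

  def push(item)
    @lock.synchronize do
      @q.push(item)
      @cond.signal # the one and only signal: an item was added
    end
  end

  def poll
    @lock.synchronize do
      # Re-check in a loop: being woken up does not by itself
      # guarantee that an item is still there.
      @cond.wait(@lock) while @q.empty?
      @q.shift # note: no @cond.signal after taking an item
    end
  end
end

# Usage: two consumers, two pushes; each consumer receives one item.
queue = SignalOnPushQueue.new
consumers = 2.times.map { Thread.new { queue.poll } }
queue.push(:first)
queue.push(:second)
results = consumers.map(&:value)
```

Because every push issues exactly one signal, a waiting consumer is only ever woken when there is something new to take, and the `while` loop covers the case where another consumer got to it first.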