Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-linearizable behavior of concurrent queues #216

Closed
alefedor opened this issue Aug 13, 2020 · 10 comments
Closed

Non-linearizable behavior of concurrent queues #216

alefedor opened this issue Aug 13, 2020 · 10 comments

Comments

@alefedor
Copy link

There is a bug in ManyToManyConcurrentArrayQueue, ManyToOneConcurrentArrayQueue and ManyToOneConcurrentLinkedQueue that leads to poll returning null even if there was an offer operation before in the same thread and there are no concurrent poll operations.

I believe that this bug differs from the one mentioned in documentation, because it is not connected to isEmpty or size methods and queues are known to be not empty.

The bug can be reproduced with Lincheck.
The scenario that can produce non-linearizable results is the same for all mentioned queues:

Parallel part:
| offer(1): true | offer(5): true |
|                | poll():   null |

There are two parallel threads and the poll operation in the second thread returns null despite the fact that the queue definitely contains at least one element (the element that was added by the previous offer(5)).

Here is an example of incorrect execution for ManyToOneConcurrentLinkedQueue (yet to be released feature in Lincheck):

= Parallel part execution: =
|                      | offer(5)                                                                                                          |
|                      |   swapTail(@34bccdde): @60cd5f19 at ManyToOneConcurrentLinkedQueue.offer(ManyToOneConcurrentLinkedQueue.java:140) |
|                      |   switch                                                                                                          |
| offer(1): true       |                                                                                                                   |
| poll(): null         |                                                                                                                   |
|   thread is finished |                                                                                                                   |
|                      |   putOrderedObject(@60cd5f19,16,@34bccdde) at Node.nextOrdered(ManyToOneConcurrentLinkedQueue.java:46)            |
|                      |   result: true                                                                                                    |
|                      |   thread is finished                                                                                              |   
@mjpt777
Copy link
Contributor

mjpt777 commented Aug 13, 2020

Thanks @alefedor this is a interesting test. The behaviour is related to why isEmpty() and size() are different to the Java JDK queues. This may not be obvious at first glance. The interaction between offer and poll is concurrently safe but not sequentially consistent. We could "fix" this by busying spinning within the offer and poll methods when we know a concurrent offer or poll operation is occuring. However this can cause worse behaviour in a production system as threads get starved due to unhelpful spinning. It can be argued that it is better to retry the poll operation or do some other useful work in the meantime, driven by the application. Within Aeron we have lots of good examples of this for the agents.

For an overall system to make good progress it is better not to spin on a condition more than necessary which can starve out other threads which could be making progress. We will need to ponder this and consider how we improve the documentation to make it expectations more clear. I believe there is a failing in the Java Queue expectations of not separating concurrent and non-concurrent queues. Consider the implementation of iterators on a queue for how the design goes down a rathole. The really efficient queue implementations that provide for good overall system progress are often not sequentially consistent between operations, but they are concurrently safe. Maybe this relaxed behaviour could do with a good description and rationale.

All this being said thanks for raising the issue and the tool looks interesting.

@ndkoval
Copy link

ndkoval commented Aug 13, 2020

Hi @mjpt777! Is that true that in order to reduce contention poll() can return null when a concurrent offer(..) is in progress?

@alefedor
Copy link
Author

@mjpt777, if these data structures are relaxed, then this behavior is fine, but it probably should have been described either in the documentation or even in the name. Anyway, thank you for your detailed reply!

@mjpt777
Copy link
Contributor

mjpt777 commented Aug 13, 2020

@ndkoval In this implementation that is correct. If the thread doing the offer takes an interrupt and the polling thread did not return null it would have to spin and this could further delay the time until the offering thread could run again. The offered elements are not lost, this implementation allows the consumer to not get tied up and thus it can make other progress outside this method.

In a realistic system it is not natural for a producer to immediately try to consume what it just produced. While it is possible to implement a queue that supports this the cost of doing so is high for an unnatural requirement.

@mjpt777
Copy link
Contributor

mjpt777 commented Aug 13, 2020

@alefedor You are right to suggest the documentation could be improved and we will take that onboard.

@franz1981
Copy link
Contributor

Totally agree with @mjpt777 : on JCTools we have chosen to impl a relaxedPoll API to overcome the limitations of the Java Queue API for this same reason.
On JCTools we haven't still used the shiny new Thread:: onSpinWait call for such busy spinning scenarios but probably providing a IdleStrategy on poll to allow a caller to choose what to do while spin/waiting is a more declarative way to better use the time while awaiting element to appear while saving cache trashing on the producer sequence and the last element yet to be offered.

@itsmemrbump
Copy link

Appreciate that this has been closed for a while but it seemed the most apt place to comment. I am using ManyToOneConcurrentLinkedQueue with a single publisher and single consumer. When I do something like

if(!queue.isEmpty()) {
   long l = queue.peek().getValue();
}

it sometimes throws a NullPointerException due to null being returned on peek(). Having read the javadoc and the comments above I inferred that null could be returned on peek() or poll() if size() was greater than zero but that this would not happen with isEmpty(). Have I misinterpreted that or is that unexpected (in which case I will try and get a testcase)? Would guess it is the former but thought it worth checking.

@mikeb01
Copy link
Contributor

mikeb01 commented May 8, 2022

While I think it would be possible make this work for your specific case and the ManyToOne queue, I think it is the wrong approach to take. With concurrent structures, like a concurrent queue, it is not safe to assume that the data will be consistent across independent method calls. The two operations will not represent one atomic action. The far safer option is to just assume that queue.peek() can return null and deal with it. E.g.

final Message message = queue.peek();
if (null != message) {
    long l = message.getValue();
}

This code will always be safe and achieves the same thing and will likely be just as efficient (if not faster).

Were you to change this to a ManyToMany queue and had other threads consuming messages as well, the code would still continue to work correctly, where as relying on the isEmpty check would be broken in that scenario.

@itsmemrbump
Copy link

Thanks @mikeb01. Checking for null is fine. Just wanted to double check on the expectations of isEmpty().

@mjpt777
Copy link
Contributor

mjpt777 commented May 8, 2022

@itsmemrbump The JDK Queue interface is a very bad design for a concurrent queue. As a general rule, and @mikeb01 points this out, that concurrent access is only consistent within the context of a single method call. The world can have changed between calls in an concurrent context. It is best to design and implement with this in mind.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants