Potential race condition in lazy futures #83

alechenninger · 2017-04-05T19:35:44Z

I think this can happen?

Future A calls get, see it is not completed, calls completer, starts waiting
Future B, in same batch, calls get, see it is not completed, calls completer, this future is no longer in the batch, so it completes before future A's call to completer
Future B sees it is not completed but it called its completer. Kaboom!

Seems like an obvious miss, so perhaps this is not right. Will look later.

alechenninger · 2017-04-27T13:31:06Z

So this has to do with the timeout handling we recently added. Basically, currently we don't share a bulk requester between multiple threads, and wait for all futures individually. However, when one times out (a new feature), we move on to the next future, which will be in the same batch, but at that point the completer is running in another thread, so the next future's .get will be called and won't be completed, triggering the bug.

The fix is complicated. A workaround is to set the timeout really high for messages. I'd like to talk about fixing for real, but I will be out next week.

alechenninger · 2017-04-27T13:50:55Z

@snehagunta We should also look at why those messages are timing out

dcrissman · 2017-04-27T14:08:59Z

Seen exception:

2017-04-26 14:26:36,685 [Camel (camel-1) thread #0 - SjmsBatchConsumer] ERROR [org.esbtools.eventhandler.AsyncBatchMessageProcessorRoute] Failed to process message: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Future attempted to lazily trigger completion, but completer did not actually complete the future. Check the provided completer function for correctness.
	at org.esbtools.eventhandler.lightblue.client.BulkLightblueRequester$LazyTransformableFuture.get(BulkLightblueRequester.java:408)
	at org.esbtools.eventhandler.lightblue.client.BulkLightblueRequester$LazyTransformableFuture.get(BulkLightblueRequester.java:383)
	at org.esbtools.eventhandler.lightblue.client.BulkLightblueRequester$LazyRequestTransformableFuture.get(BulkLightblueRequester.java:548)
	at org.esbtools.eventhandler.NestedTransformableFutureIgnoringReturn.get(NestedTransformableFutureIgnoringReturn.java:64)
	at org.esbtools.eventhandler.NestedTransformableFutureIgnoringReturn.get(NestedTransformableFutureIgnoringReturn.java:25)
	at org.esbtools.eventhandler.AsyncBatchMessageProcessorRoute.lambda$configure$0(AsyncBatchMessageProcessorRoute.java:123)
	at org.apache.camel.processor.DelegateSyncProcessor.process(DelegateSyncProcessor.java:63)
	at org.apache.camel.management.InstrumentationProcessor.process(InstrumentationProcessor.java:77)
	at org.apache.camel.processor.RedeliveryErrorHandler.process(RedeliveryErrorHandler.java:468)
	at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:190)
	at org.apache.camel.processor.Pipeline.process(Pipeline.java:121)
	at org.apache.camel.processor.Pipeline.process(Pipeline.java:83)
	at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:190)
	at org.apache.camel.util.AsyncProcessorHelper.process(AsyncProcessorHelper.java:109)
	at org.apache.camel.processor.DelegateAsyncProcessor.process(DelegateAsyncProcessor.java:87)
	at org.apache.camel.component.sjms.batch.SjmsBatchConsumer$BatchConsumptionLoop.processBatch(SjmsBatchConsumer.java:427)
	at org.apache.camel.component.sjms.batch.SjmsBatchConsumer$BatchConsumptionLoop.access$1900(SjmsBatchConsumer.java:215)
	at org.apache.camel.component.sjms.batch.SjmsBatchConsumer$BatchConsumptionLoop$BatchConsumptionTask.completionBatch(SjmsBatchConsumer.java:367)
	at org.apache.camel.component.sjms.batch.SjmsBatchConsumer$BatchConsumptionLoop$BatchConsumptionTask.consumeBatchesOnLoop(SjmsBatchConsumer.java:303)
	at org.apache.camel.component.sjms.batch.SjmsBatchConsumer$BatchConsumptionLoop$BatchConsumptionTask.access$400(SjmsBatchConsumer.java:269)
	at org.apache.camel.component.sjms.batch.SjmsBatchConsumer$BatchConsumptionLoop.run(SjmsBatchConsumer.java:236)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Future attempted to lazily trigger completion, but completer did not actually complete the future. Check the provided completer function for correctness.
	... 24 more

alechenninger · 2017-04-27T14:21:40Z

I think to fix we need to:

Track what requests are being processed. This must be an atomic move from queued to in progress.
Completer must start its requests (this happens already), but it should also wait for its requests to be completed if they are already in progress by another thread (doesn't happen). One option is to pass requests to doQueuedRequestsAndCompleteFutures and have it wait. The "waiting" part is also nontrivial: you will probably need a CountDownLatch associated with a batch of in progress requests.

This means however, that a timed-out future may fail everything in that batch, since the thread making the requests is interrupted, therefore throws an exception, therefore fails all of the futures in that batch. However, we have retries and idempotent messages for this reason already.

Food for thought: hystrix already has request batching built in. We could consider instead making a HystrixBulkLightblueRequester that made each request using hystrix, and let hystrix handle the thread pooling and batching, as long as we made commands batchable. This would probably be more maintainable, safer, and easier to implement. I would strongly consider that route.

alechenninger added module: lightblue type: bug labels Apr 5, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential race condition in lazy futures #83

Potential race condition in lazy futures #83

alechenninger commented Apr 5, 2017

alechenninger commented Apr 27, 2017

alechenninger commented Apr 27, 2017

dcrissman commented Apr 27, 2017

alechenninger commented Apr 27, 2017

Potential race condition in lazy futures #83

Potential race condition in lazy futures #83

Comments

alechenninger commented Apr 5, 2017

alechenninger commented Apr 27, 2017

alechenninger commented Apr 27, 2017

dcrissman commented Apr 27, 2017

alechenninger commented Apr 27, 2017