Skip to content
This repository has been archived by the owner on Sep 16, 2021. It is now read-only.

Potential race condition in lazy futures #83

Open
alechenninger opened this issue Apr 5, 2017 · 4 comments
Open

Potential race condition in lazy futures #83

alechenninger opened this issue Apr 5, 2017 · 4 comments

Comments

@alechenninger
Copy link
Contributor

I think this can happen?

  1. Future A calls get, see it is not completed, calls completer, starts waiting
  2. Future B, in same batch, calls get, see it is not completed, calls completer, this future is no longer in the batch, so it completes before future A's call to completer
  3. Future B sees it is not completed but it called its completer. Kaboom!

Seems like an obvious miss, so perhaps this is not right. Will look later.

@alechenninger
Copy link
Contributor Author

@dcrissman @snehagunta

So this has to do with the timeout handling we recently added. Basically, currently we don't share a bulk requester between multiple threads, and wait for all futures individually. However, when one times out (a new feature), we move on to the next future, which will be in the same batch, but at that point the completer is running in another thread, so the next future's .get will be called and won't be completed, triggering the bug.

The fix is complicated. A workaround is to set the timeout really high for messages. I'd like to talk about fixing for real, but I will be out next week.

@alechenninger
Copy link
Contributor Author

@snehagunta We should also look at why those messages are timing out

@dcrissman
Copy link
Member

Seen exception:

2017-04-26 14:26:36,685 [Camel (camel-1) thread #0 - SjmsBatchConsumer] ERROR [org.esbtools.eventhandler.AsyncBatchMessageProcessorRoute] Failed to process message: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Future attempted to lazily trigger completion, but completer did not actually complete the future. Check the provided completer function for correctness.
	at org.esbtools.eventhandler.lightblue.client.BulkLightblueRequester$LazyTransformableFuture.get(BulkLightblueRequester.java:408)
	at org.esbtools.eventhandler.lightblue.client.BulkLightblueRequester$LazyTransformableFuture.get(BulkLightblueRequester.java:383)
	at org.esbtools.eventhandler.lightblue.client.BulkLightblueRequester$LazyRequestTransformableFuture.get(BulkLightblueRequester.java:548)
	at org.esbtools.eventhandler.NestedTransformableFutureIgnoringReturn.get(NestedTransformableFutureIgnoringReturn.java:64)
	at org.esbtools.eventhandler.NestedTransformableFutureIgnoringReturn.get(NestedTransformableFutureIgnoringReturn.java:25)
	at org.esbtools.eventhandler.AsyncBatchMessageProcessorRoute.lambda$configure$0(AsyncBatchMessageProcessorRoute.java:123)
	at org.apache.camel.processor.DelegateSyncProcessor.process(DelegateSyncProcessor.java:63)
	at org.apache.camel.management.InstrumentationProcessor.process(InstrumentationProcessor.java:77)
	at org.apache.camel.processor.RedeliveryErrorHandler.process(RedeliveryErrorHandler.java:468)
	at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:190)
	at org.apache.camel.processor.Pipeline.process(Pipeline.java:121)
	at org.apache.camel.processor.Pipeline.process(Pipeline.java:83)
	at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:190)
	at org.apache.camel.util.AsyncProcessorHelper.process(AsyncProcessorHelper.java:109)
	at org.apache.camel.processor.DelegateAsyncProcessor.process(DelegateAsyncProcessor.java:87)
	at org.apache.camel.component.sjms.batch.SjmsBatchConsumer$BatchConsumptionLoop.processBatch(SjmsBatchConsumer.java:427)
	at org.apache.camel.component.sjms.batch.SjmsBatchConsumer$BatchConsumptionLoop.access$1900(SjmsBatchConsumer.java:215)
	at org.apache.camel.component.sjms.batch.SjmsBatchConsumer$BatchConsumptionLoop$BatchConsumptionTask.completionBatch(SjmsBatchConsumer.java:367)
	at org.apache.camel.component.sjms.batch.SjmsBatchConsumer$BatchConsumptionLoop$BatchConsumptionTask.consumeBatchesOnLoop(SjmsBatchConsumer.java:303)
	at org.apache.camel.component.sjms.batch.SjmsBatchConsumer$BatchConsumptionLoop$BatchConsumptionTask.access$400(SjmsBatchConsumer.java:269)
	at org.apache.camel.component.sjms.batch.SjmsBatchConsumer$BatchConsumptionLoop.run(SjmsBatchConsumer.java:236)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Future attempted to lazily trigger completion, but completer did not actually complete the future. Check the provided completer function for correctness.
	... 24 more

@alechenninger
Copy link
Contributor Author

I think to fix we need to:

  1. Track what requests are being processed. This must be an atomic move from queued to in progress.
  2. Completer must start its requests (this happens already), but it should also wait for its requests to be completed if they are already in progress by another thread (doesn't happen). One option is to pass requests to doQueuedRequestsAndCompleteFutures and have it wait. The "waiting" part is also nontrivial: you will probably need a CountDownLatch associated with a batch of in progress requests.

This means however, that a timed-out future may fail everything in that batch, since the thread making the requests is interrupted, therefore throws an exception, therefore fails all of the futures in that batch. However, we have retries and idempotent messages for this reason already.

Food for thought: hystrix already has request batching built in. We could consider instead making a HystrixBulkLightblueRequester that made each request using hystrix, and let hystrix handle the thread pooling and batching, as long as we made commands batchable. This would probably be more maintainable, safer, and easier to implement. I would strongly consider that route.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants