Potential race condition in lazy futures #83
This has to do with the timeout handling we recently added. Currently we don't share a bulk requester between multiple threads, and we wait for each future individually. However, when one of those waits times out (the new feature), we move on to the next future, which is in the same batch. At that point the completer is still running in another thread, so the next future's .get is called before that future has been completed, which triggers the bug. The fix is complicated. A workaround is to set a very high timeout for messages. I'd like to talk about a real fix, but I will be out next week.
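A minimal, self-contained sketch of the race described above, under loudly stated assumptions: `Batch`, `LazyFuture`, `startOnce`, the `release` latch and the "future not completed" error are hypothetical stand-ins, not the project's actual classes or messages. The toy model assumes a lazy future only waits on the completer when its own `.get` started the batch, and otherwise assumes the batch has already finished. A `CountDownLatch` holds the simulated bulk request in flight, so the first `.get` times out while the completer is still running when the second future in the same batch is read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicBoolean;

public class LazyBatchRace {

    // Hypothetical bulk requester: requests are queued and sent together
    // the first time any of their futures is read.
    static final class Batch {
        final List<CompletableFuture<String>> results = new ArrayList<>();
        final AtomicBoolean started = new AtomicBoolean(false);
        final ExecutorService completer = Executors.newSingleThreadExecutor();
        final CountDownLatch release; // keeps the simulated request "in flight"

        Batch(CountDownLatch release) { this.release = release; }

        LazyFuture add() {
            CompletableFuture<String> f = new CompletableFuture<>();
            results.add(f);
            return new LazyFuture(this, f);
        }

        // Runs the batch on the completer thread; true only for the caller that started it.
        boolean startOnce() {
            if (!started.compareAndSet(false, true)) return false;
            completer.submit(() -> {
                try {
                    release.await(); // simulated slow bulk request
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                for (int i = 0; i < results.size(); i++) results.get(i).complete("response-" + i);
            });
            return true;
        }
    }

    static final class LazyFuture {
        final Batch batch;
        final CompletableFuture<String> result;

        LazyFuture(Batch batch, CompletableFuture<String> result) {
            this.batch = batch;
            this.result = result;
        }

        String get(long timeout, TimeUnit unit, boolean fixed)
                throws InterruptedException, ExecutionException, TimeoutException {
            boolean startedHere = batch.startOnce();
            if (startedHere || fixed) {
                return result.get(timeout, unit); // wait for the completer
            }
            // Buggy path: assumes a batch started by an earlier .get has finished,
            // which stops being true once that earlier .get can time out.
            if (!result.isDone()) throw new IllegalStateException("future not completed");
            return result.join();
        }
    }

    static String outcome(LazyFuture f, long timeoutMs, boolean fixed) throws InterruptedException {
        try {
            return f.get(timeoutMs, TimeUnit.MILLISECONDS, fixed);
        } catch (TimeoutException e) {
            return "timeout";
        } catch (IllegalStateException | ExecutionException e) {
            return "error: " + e.getMessage();
        }
    }

    static List<String> runDemo(boolean fixed) throws InterruptedException {
        Batch batch = new Batch(new CountDownLatch(1));
        LazyFuture first = batch.add();
        LazyFuture second = batch.add();
        List<String> seen = new ArrayList<>();
        seen.add(outcome(first, 50, fixed)); // batch still in flight: times out
        // Let the simulated bulk request finish about 200 ms from now, on another thread.
        new Thread(() -> {
            try { Thread.sleep(200); } catch (InterruptedException ignored) { }
            batch.release.countDown();
        }).start();
        seen.add(outcome(second, 5_000, fixed)); // same batch, completer still running
        batch.completer.shutdown();
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("buggy: " + runDemo(false));
        System.out.println("fixed: " + runDemo(true));
    }
}
```

In this toy model, `main` should print `buggy: [timeout, error: future not completed]` and `fixed: [timeout, response-1]`: always waiting on the underlying future removes the spurious error. This only illustrates the failure mode; it is not the real fix, which the thread describes as more involved.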
@snehagunta We should also look at why those messages are timing out.
Seen exception:
I think to fix this, we need to:
This means, however, that a timed-out future may fail everything in its batch: the thread making the requests is interrupted, so it throws an exception, which fails all of the futures in that batch. We already have retries and idempotent messages for exactly this reason, though. Food for thought: Hystrix already has request batching built in. We could instead write a HystrixBulkLightblueRequester that makes each request through Hystrix and lets Hystrix handle the thread pooling and batching, as long as we make the commands batchable. This would probably be more maintainable, safer, and easier to implement. I would strongly consider that route.
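A rough sketch of the interrupt-on-timeout behavior described above, again under stated assumptions: `InterruptingBatch`, `abort`, `failAll` and `runDemo` are hypothetical names, and a `Thread.sleep` stands in for the bulk request. When a caller's `.get` times out, it cancels the batch task with `Future.cancel(true)`, which interrupts the requester thread, and every future in the batch is completed exceptionally. The caller also fails the batch itself, in case the task was cancelled before it started running. The last step retries the failed request in a fresh batch, relying on the messages being idempotent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class InterruptingBatch {
    final List<CompletableFuture<String>> results = new ArrayList<>();
    final ExecutorService requester = Executors.newSingleThreadExecutor();
    final long requestMillis; // simulated duration of the bulk request
    private Future<?> inFlight;

    InterruptingBatch(long requestMillis) { this.requestMillis = requestMillis; }

    CompletableFuture<String> add() {
        CompletableFuture<String> f = new CompletableFuture<>();
        results.add(f);
        return f;
    }

    // Completes every future in the batch exceptionally (no-op for already completed ones).
    void failAll(Throwable cause) {
        for (CompletableFuture<String> f : results) f.completeExceptionally(cause);
    }

    // Sends all queued requests together on the requester thread, once.
    synchronized void startOnce() {
        if (inFlight != null) return;
        inFlight = requester.submit(() -> {
            try {
                Thread.sleep(requestMillis); // stand-in for the bulk call
                for (int i = 0; i < results.size(); i++) results.get(i).complete("response-" + i);
            } catch (InterruptedException e) {
                failAll(e); // interrupted by a timed-out caller: fail the whole batch
                Thread.currentThread().interrupt();
            }
        });
    }

    // Interrupts the requester thread and fails the batch, even if the task never ran.
    synchronized void abort(Throwable cause) {
        inFlight.cancel(true);
        failAll(cause);
    }

    String get(CompletableFuture<String> result, long timeoutMs)
            throws InterruptedException, ExecutionException, TimeoutException {
        startOnce();
        try {
            return result.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            abort(e);
            throw e;
        }
    }

    static String outcome(InterruptingBatch batch, CompletableFuture<String> f, long timeoutMs)
            throws InterruptedException {
        try {
            return batch.get(f, timeoutMs);
        } catch (TimeoutException e) {
            return "timeout";
        } catch (ExecutionException | CancellationException e) {
            return "failed";
        }
    }

    static List<String> runDemo() throws InterruptedException {
        List<String> seen = new ArrayList<>();
        InterruptingBatch slow = new InterruptingBatch(1_000);
        CompletableFuture<String> a = slow.add();
        CompletableFuture<String> b = slow.add();
        seen.add(outcome(slow, a, 50));    // times out and aborts the batch
        seen.add(outcome(slow, b, 1_000)); // fails as well: same batch
        slow.requester.shutdown();

        // Retry the failed (idempotent) request in a fresh batch.
        InterruptingBatch retry = new InterruptingBatch(10);
        CompletableFuture<String> bAgain = retry.add();
        seen.add(outcome(retry, bAgain, 1_000));
        retry.requester.shutdown();
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo());
    }
}
```

`runDemo()` should return `[timeout, failed, response-0]`. In the Hystrix-based alternative, Hystrix's request collapsing (`HystrixCollapser`) would take over this thread and batch handling; that is not modeled here.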
I think this can happen?
Seems like an obvious miss, so perhaps this is not right. Will look later.