
Bulk task queue grows infinitely after upgrading to 6.2.1 (from 5.6.4) #28714

Closed
bra-fsn opened this issue Feb 17, 2018 · 7 comments
Assignees
Labels
>bug :Distributed/Engine Anything around managing Lucene and the Translog in an open shard.

Comments

@bra-fsn

bra-fsn commented Feb 17, 2018

Elasticsearch version (bin/elasticsearch --version):
Version: 6.2.1, Build: 7299dc3/2018-02-07T19:34:26.990113Z, JVM: 1.8.0_131

Plugins installed: analysis-icu

JVM version (java -version):
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

OS version (uname -a if on a Unix-like system):
FreeBSD 11.1

Description of the problem including expected versus actual behavior:
After upgrading to 6.2.1, one of the data (and also client) nodes rejects all bulk operations. In the rejection responses, the queued-task count grows infinitely while the completed-task count doesn't change (see attached logs).

Steps to reproduce:
We have a lot of different data and indices spread over 40 nodes. So far I have observed this error on only one node. When I try to restart the node with kill, it doesn't stop; the stack trace is linked below.

Provide logs (if relevant):
This is about one minute of output (logged by an application that uses the Python elasticsearch client). Queued tasks grow, while completed tasks don't.
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@3684362a on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 14926, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@11961ec on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 14939, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@573952db on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 14951, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@573952db on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 14951, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@4127dfd2 on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 14960, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@4c6147ea on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 14961, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@4958c58e on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 14960, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@7acb9dba on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 14967, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@75af7186 on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 14967, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@7830996e on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 14972, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@513cdd87 on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 14979, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@1166d220 on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 14983, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@498b99a2 on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 14998, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@64e61a2f on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 15005, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@372ab4a0 on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 15006, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@4381386b on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 15006, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@2c661b6d on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 15025, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@5fcb36db on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 15025, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@21834e56 on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 15038, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@58444a25 on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 15038, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@25da450 on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 15047, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@6f3c0966 on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 15047, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@27485a8e on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 15050, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@170e6b30 on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 15056, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}

And this is the stack trace after I tried to kill the node and it didn't stop:
https://pastebin.com/rnzESu5B

@danielmitterdorfer
Member

Hi @bra-fsn, we reserve GitHub for bug reports and feature requests only. Please ask questions like these in the Elasticsearch forum instead. Thank you!

Some hints to get you started:

  • The stack traces indicate that the cluster was running a number of update and delete operations when you captured them.
  • You should check the documentation on upgrades.
  • You can use the hot threads API to see which operations take a significant amount of time (in your case it might make sense to set ignore_idle_threads to false)
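For reference, the hot threads API mentioned above can be queried like this (a minimal request against the whole cluster; narrow it to one node with a node filter if needed):

```
GET /_nodes/hot_threads?ignore_idle_threads=false
```

Setting ignore_idle_threads to false includes threads that are parked or waiting on locks, which is exactly what you want when hunting a stuck or deadlocked node.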

@ywelsch
Contributor

ywelsch commented Feb 19, 2018

There's a deadlock in InternalEngine (the stack dump even says so). /cc: @s1monw

The stack trace below should make it pretty clear what's going on:

Found one Java-level deadlock:
=============================

"elasticsearch[fmfe16][index][T#24]":
  waiting for ownable synchronizer 0x00000009df4b02c0, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "elasticsearch[fmfe16][index][T#9]"

"elasticsearch[fmfe16][index][T#9]":
  waiting to lock monitor 0x0000000afc30c4e8 (object 0x00000008b40309c8, a org.elasticsearch.index.engine.LiveVersionMap),
  which is held by "elasticsearch[fmfe16][get][T#14]"

"elasticsearch[fmfe16][get][T#14]":
  waiting for ownable synchronizer 0x00000009df4b02c0, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "elasticsearch[fmfe16][index][T#9]"

Java stack information for the threads listed above:

===================================================

"elasticsearch[fmfe16][index][T#24]":

	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x00000009df4b02c0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
	at org.elasticsearch.common.util.concurrent.KeyedLock.acquire(KeyedLock.java:76)
	at org.elasticsearch.index.engine.LiveVersionMap.acquireLock(LiveVersionMap.java:431)
	at org.elasticsearch.index.engine.InternalEngine.pruneDeletedTombstones(InternalEngine.java:1661)
	at org.elasticsearch.index.engine.InternalEngine.maybePruneDeletedTombstones(InternalEngine.java:1348)
	at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1427)
	at org.elasticsearch.index.engine.InternalEngine.get(InternalEngine.java:637)
	at org.elasticsearch.index.shard.IndexShard.get(IndexShard.java:850)
	at org.elasticsearch.index.get.ShardGetService.innerGet(ShardGetService.java:153)
	at org.elasticsearch.index.get.ShardGetService.get(ShardGetService.java:81)
	at org.elasticsearch.action.update.UpdateHelper.prepare(UpdateHelper.java:74)
	at org.elasticsearch.action.update.TransportUpdateAction.shardOperation(TransportUpdateAction.java:174)
	at org.elasticsearch.action.update.TransportUpdateAction.shardOperation(TransportUpdateAction.java:167)
	at org.elasticsearch.action.update.TransportUpdateAction.shardOperation(TransportUpdateAction.java:67)
	at org.elasticsearch.action.support.single.instance.TransportInstanceSingleOperationAction$ShardTransportHandler.messageReceived(TransportInstanceSingleOperationAction.java:243)
	at org.elasticsearch.action.support.single.instance.TransportInstanceSingleOperationAction$ShardTransportHandler.messageReceived(TransportInstanceSingleOperationAction.java:239)
	at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30)
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66)
	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1555)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:635)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

"elasticsearch[fmfe16][index][T#9]":

	at org.elasticsearch.index.engine.InternalEngine.getVersionFromMap(InternalEngine.java:732)
	- waiting to lock <0x00000008b40309c8> (a org.elasticsearch.index.engine.LiveVersionMap)
	at org.elasticsearch.index.engine.InternalEngine.get(InternalEngine.java:627)
	at org.elasticsearch.index.shard.IndexShard.get(IndexShard.java:850)
	at org.elasticsearch.index.get.ShardGetService.innerGet(ShardGetService.java:153)
	at org.elasticsearch.index.get.ShardGetService.get(ShardGetService.java:81)
	at org.elasticsearch.action.update.UpdateHelper.prepare(UpdateHelper.java:74)
	at org.elasticsearch.action.update.TransportUpdateAction.shardOperation(TransportUpdateAction.java:174)
	at org.elasticsearch.action.update.TransportUpdateAction.shardOperation(TransportUpdateAction.java:167)
	at org.elasticsearch.action.update.TransportUpdateAction.shardOperation(TransportUpdateAction.java:67)
	at org.elasticsearch.action.support.single.instance.TransportInstanceSingleOperationAction$ShardTransportHandler.messageReceived(TransportInstanceSingleOperationAction.java:243)
	at org.elasticsearch.action.support.single.instance.TransportInstanceSingleOperationAction$ShardTransportHandler.messageReceived(TransportInstanceSingleOperationAction.java:239)
	at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30)
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66)
	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1555)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:635)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

"elasticsearch[fmfe16][get][T#14]":

	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x00000009df4b02c0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
	at org.elasticsearch.common.util.concurrent.KeyedLock.acquire(KeyedLock.java:76)
	at org.elasticsearch.index.engine.LiveVersionMap.acquireLock(LiveVersionMap.java:431)
	at org.elasticsearch.index.engine.InternalEngine.pruneDeletedTombstones(InternalEngine.java:1661)
	at org.elasticsearch.index.engine.InternalEngine.maybePruneDeletedTombstones(InternalEngine.java:1348)
	at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1427)
	at org.elasticsearch.index.engine.InternalEngine.getVersionFromMap(InternalEngine.java:737)
	- locked <0x00000008b40309c8> (a org.elasticsearch.index.engine.LiveVersionMap)
	at org.elasticsearch.index.engine.InternalEngine.get(InternalEngine.java:627)
	at org.elasticsearch.index.shard.IndexShard.get(IndexShard.java:850)
	at org.elasticsearch.index.get.ShardGetService.innerGet(ShardGetService.java:153)
	at org.elasticsearch.index.get.ShardGetService.get(ShardGetService.java:81)
	at org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:88)
	at org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:44)
	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$ShardTransportHandler.messageReceived(TransportSingleShardAction.java:293)
	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$ShardTransportHandler.messageReceived(TransportSingleShardAction.java:286)
	at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30)
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66)
	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1555)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:635)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Found 1 deadlock.
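The cycle boils down to two threads taking the same two locks in opposite orders: T#9 holds a per-key ReentrantLock (via KeyedLock) and waits for the LiveVersionMap monitor, while T#14 holds that monitor inside getVersionFromMap and waits for the per-key lock in pruneDeletedTombstones. A minimal, self-contained sketch of that inversion (the names are illustrative, not the actual Elasticsearch code) which lets the JVM's own ThreadMXBean confirm the deadlock:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Stand-ins for the two locks in the report: the per-key ReentrantLock used
// by KeyedLock, and the LiveVersionMap object monitor.
public class DeadlockSketch {
    private final ReentrantLock keyedLock = new ReentrantLock();
    private final Object versionMapMonitor = new Object();

    /** Reproduces the lock-order inversion and returns how many of the two
     *  worker threads the JVM reports as deadlocked. */
    public int deadlockedWorkerCount() throws InterruptedException {
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // "index" thread: holds the per-key lock, then needs the version-map monitor
        Thread indexThread = new Thread(() -> {
            keyedLock.lock();
            bothHoldFirstLock.countDown();
            awaitQuietly(bothHoldFirstLock);
            synchronized (versionMapMonitor) { } // blocks forever
        });
        // "get" thread: holds the version-map monitor, then tries to take the
        // per-key lock (as pruneDeletedTombstones does via acquireLock)
        Thread getThread = new Thread(() -> {
            synchronized (versionMapMonitor) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                keyedLock.lock(); // blocks forever
            }
        });
        indexThread.setDaemon(true);
        getThread.setDaemon(true);
        indexThread.start();
        getThread.start();
        bothHoldFirstLock.await();

        Set<Long> workerIds = new HashSet<>();
        workerIds.add(indexThread.getId());
        workerIds.add(getThread.getId());
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        int found = 0;
        // Poll: detection only succeeds once both threads are parked on each other.
        for (int i = 0; i < 50 && found < 2; i++) {
            Thread.sleep(100);
            long[] ids = mx.findDeadlockedThreads();
            found = 0;
            if (ids != null) {
                for (long id : ids) if (workerIds.contains(id)) found++;
            }
        }
        return found;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + new DeadlockSketch().deadlockedWorkerCount());
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try { latch.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```

On a HotSpot JVM this reports the two worker threads as deadlocked, mirroring the T#9/T#14 cycle above; findDeadlockedThreads is used (rather than findMonitorDeadlockedThreads) because the cycle mixes an object monitor with an ownable synchronizer.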

@ywelsch ywelsch reopened this Feb 19, 2018
@ywelsch ywelsch added >bug :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. labels Feb 19, 2018
@bra-fsn
Author

bra-fsn commented Feb 19, 2018

@danielmitterdorfer: this is a bug report, not a question...

@danielmitterdorfer
Member

@ywelsch thanks for double-checking. I did not spot the deadlock in the thread dump originally.

@s1monw
Contributor

s1monw commented Feb 19, 2018

I am looking into it.

@s1monw s1monw self-assigned this Feb 19, 2018
s1monw added a commit to s1monw/elasticsearch that referenced this issue Feb 19, 2018
Pruning tombstones is best effort and should not block if a key is currently
locked. This can cause a deadlock in rare situations if we switch off the
append-only optimization while heavily updating the same key in the engine
while the LiveVersionMap is locked. This is very rare since this code
path is only executed every 15 seconds by default, since that is the interval
at which we try to prune the deletes in the version map.

Closes elastic#28714
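The idea in the commit message is to make pruning skip any key whose lock is currently busy instead of blocking on it. A rough, hypothetical sketch of that best-effort pattern (illustrative names, not the actual LiveVersionMap code), where tryLock never blocks and a busy key is simply left for the next pruning run:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of best-effort tombstone pruning: skip tombstones whose
// per-key lock is currently held instead of blocking on it.
public class TombstonePruner {
    private final Map<String, ReentrantLock> perKeyLocks = new ConcurrentHashMap<>();
    private final Map<String, Long> tombstones = new ConcurrentHashMap<>();

    public ReentrantLock lockFor(String key) {
        return perKeyLocks.computeIfAbsent(key, k -> new ReentrantLock());
    }

    public void markDeleted(String key, long timestampMillis) {
        tombstones.put(key, timestampMillis);
    }

    public int size() {
        return tombstones.size();
    }

    /** Prunes tombstones older than maxAgeMillis; returns how many were pruned. */
    public int pruneTombstones(long maxAgeMillis, long nowMillis) {
        int pruned = 0;
        for (Map.Entry<String, Long> e : tombstones.entrySet()) {
            if (nowMillis - e.getValue() < maxAgeMillis) continue; // still fresh
            ReentrantLock lock = lockFor(e.getKey());
            if (lock.tryLock()) {            // best effort: never block on a busy key
                try {
                    tombstones.remove(e.getKey());
                    pruned++;
                } finally {
                    lock.unlock();
                }
            }
            // if tryLock fails, the key is in use; the next pruning run
            // (every ~15s by default, per the commit message) will retry it
        }
        return pruned;
    }

    public static void main(String[] args) {
        TombstonePruner p = new TombstonePruner();
        p.markDeleted("doc-1", 0L);
        System.out.println("pruned: " + p.pruneTombstones(1000L, 60_000L)); // pruned: 1
    }
}
```

Because the pruner only ever uses tryLock, it can never participate in a lock-order cycle with threads that hold a per-key lock and then need another lock, which is exactly the cycle in the thread dump above.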
@s1monw
Contributor

s1monw commented Feb 19, 2018

@bra-fsn thanks for reporting this. @danielmitterdorfer @ywelsch I opened a PR for this.

@bra-fsn
Author

bra-fsn commented Feb 20, 2018

@s1monw : great, thanks for fixing it!
Also thanks @ywelsch for not letting this fall through the cracks. :)

s1monw added a commit that referenced this issue Feb 20, 2018
s1monw added a commit that referenced this issue Feb 20, 2018
s1monw added a commit that referenced this issue Feb 20, 2018
sebasjm pushed a commit to sebasjm/elasticsearch that referenced this issue Mar 10, 2018