_cat/pending_tasks returns "unknown" tasks #9354

haardvark · 2015-01-19T22:24:28Z

We experienced a buildup of pending tasks in the cluster this weekend after some nodes dropped out. Eventually, the cluster stopped processing tasks altogether. When I checked the tasks with _cat/pending_tasks, I saw some that appeared to be invalid:

7083 12m URGENT shard-started ([session-2014-12-27][24], node[RlRevMXMSOqc3qORtp3xew], [P], s[INITIALIZING]), reason [after recovery from gateway]
-12 12m unknown
7092 12m URGENT shard-started ([session-2014-12-31][8], node[vEgbZbjXTVixtbpmh6Dh5A], [P], s[INITIALIZING]), reason [after recovery from gateway]

When calling _cat/pending_tasks from a node that wasn't the current master, we occasionally got a 500 error saying "can't deserialize task - no priority exists for [-12]".
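The 500 error suggests that the node receiving the task list read the value -12 where it expected a priority id and found no matching priority. A minimal sketch of that kind of lookup, assuming a hypothetical `Priority` enum with one-byte wire ids (the constant names follow Elasticsearch's priorities, but the ids and the `PriorityWire` helper are assumptions, not Elasticsearch's actual serialization code), shows how such a value is rejected on deserialization:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical task priority; each constant has an assumed fixed wire id.
enum Priority {
    IMMEDIATE(0), URGENT(1), HIGH(2), NORMAL(3), LOW(4), LANGUID(5);

    final byte id;

    Priority(int id) {
        this.id = (byte) id;
    }

    // Reverse lookup used when a task arrives from another node.
    static Priority fromId(byte id) {
        for (Priority p : values()) {
            if (p.id == id) {
                return p;
            }
        }
        throw new IllegalArgumentException("no priority exists for [" + id + "]");
    }
}

// Hypothetical helper simulating the sending and receiving nodes.
final class PriorityWire {
    // Sender side: write the priority id as a single byte, without validation.
    static byte[] write(byte priorityId) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(priorityId);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Receiver side: read the byte back and map it to a Priority.
    static Priority read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return Priority.fromId(in.readByte());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(read(write(Priority.URGENT.id))); // URGENT
        try {
            read(write((byte) -12)); // an id no Priority constant uses
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // no priority exists for [-12]
        }
    }
}
```

In this sketch the writer never checks the byte, so the failure only surfaces on the node that reads it.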

It was suggested to us that this has been seen when different JVM versions run in the same cluster, but that isn't the case in our deployment. There is an ES support ticket open for this overall outage (6726) that may provide more context.

clintongormley · 2015-01-20T10:52:18Z

Apparently the reason for this is as follows: when a task times out, the timeout action is placed onto the pending tasks queue without an insertion order and with an unknown source. This works fine on the master, but serialization to other nodes fails (which is not actually a problem, but isn't pretty).

We should:

- add an insertion order
- handle serialization to other nodes properly
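One way to picture the first fix: give every queued task, including timeout actions, a monotonically increasing insertion order at submission time, and use it to break ties between tasks of equal priority. The sketch below is a minimal illustration under that assumption; `PendingTask` and `PendingQueue` are hypothetical names, not Elasticsearch classes, and priority is modelled as a plain `int` where a lower value is more urgent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical pending-task entry; a lower priority value means more urgent.
final class PendingTask implements Comparable<PendingTask> {
    final int priority;
    final long insertionOrder;
    final String source;

    PendingTask(int priority, long insertionOrder, String source) {
        this.priority = priority;
        this.insertionOrder = insertionOrder;
        this.source = source;
    }

    @Override
    public int compareTo(PendingTask other) {
        int byPriority = Integer.compare(priority, other.priority);
        // Tasks with equal priority run in submission (FIFO) order.
        return byPriority != 0 ? byPriority : Long.compare(insertionOrder, other.insertionOrder);
    }
}

// Hypothetical queue that assigns insertion orders itself.
final class PendingQueue {
    private final AtomicLong nextOrder = new AtomicLong();
    private final PriorityBlockingQueue<PendingTask> queue = new PriorityBlockingQueue<>();

    // Every submission, timeout actions included, receives a real insertion order.
    PendingTask submit(int priority, String source) {
        PendingTask task = new PendingTask(priority, nextOrder.getAndIncrement(), source);
        queue.add(task);
        return task;
    }

    // Removes all tasks and returns their sources in execution order.
    List<String> drainSources() {
        List<String> sources = new ArrayList<>();
        PendingTask task;
        while ((task = queue.poll()) != null) {
            sources.add(task.source);
        }
        return sources;
    }

    public static void main(String[] args) {
        PendingQueue pending = new PendingQueue();
        pending.submit(1, "shard-started [a]");
        pending.submit(3, "timeout action");
        pending.submit(1, "shard-started [b]");
        System.out.println(pending.drainSources());
        // [shard-started [a], shard-started [b], timeout action]
    }
}
```

Using an `AtomicLong` keeps insertion orders unique even when several threads submit tasks concurrently, so no entry needs a placeholder order.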

…ds that go into InternalClusterService.updateTasksExecutor. At the moment we sometimes submit generic runnables, which makes life slightly harder when generating the pending task list, which has to account for them. This commit adds an abstract TimedPrioritizedRunnable class which should always be used. This class also automatically measures time in queue, which is needed for the pending task reporting. Relates to elastic#8077. Closes elastic#9354.

…ds that go into InternalClusterService.updateTasksExecutor. At the moment we sometimes submit generic runnables, which makes life slightly harder when generating the pending task list, which has to account for them. This commit adds an abstract TimedPrioritizedRunnable class which should always be used. This class also automatically measures time in queue, which is needed for the pending task reporting. Relates to elastic#8077. Closes elastic#9354. Closes elastic#9671.
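The commit describes an abstract `TimedPrioritizedRunnable` that every command should extend and that measures its own time in queue. A minimal sketch of that idea follows, assuming the base class records a timestamp when the command is created; the fields, the `ageInMillis` method and the injectable clock are hypothetical and may differ from the class actually added in #9671.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical base class for commands submitted to the cluster-state update queue.
abstract class TimedPrioritizedRunnable implements Runnable {
    private final int priority;
    private final String source;
    private final LongSupplier nanoClock;
    private final long createdNanos;

    protected TimedPrioritizedRunnable(int priority, String source) {
        this(priority, source, System::nanoTime);
    }

    protected TimedPrioritizedRunnable(int priority, String source, LongSupplier nanoClock) {
        this.priority = priority;
        this.source = source;
        this.nanoClock = nanoClock;
        // Creation time stands in for the moment the command was queued.
        this.createdNanos = nanoClock.getAsLong();
    }

    int priority() {
        return priority;
    }

    String source() {
        return source;
    }

    // Time since creation, i.e. how long the command has been waiting.
    long ageInMillis() {
        return TimeUnit.NANOSECONDS.toMillis(nanoClock.getAsLong() - createdNanos);
    }
}

final class TimedDemo {
    public static void main(String[] args) {
        long[] now = {0L}; // fake clock, in nanoseconds
        TimedPrioritizedRunnable timeout = new TimedPrioritizedRunnable(1, "timeout action", () -> now[0]) {
            @Override
            public void run() {
                // the actual work would go here
            }
        };
        now[0] = TimeUnit.MINUTES.toNanos(12);
        System.out.println(timeout.source() + " waited " + timeout.ageInMillis() + " ms");
        // timeout action waited 720000 ms
    }
}
```

Because every command carries its own priority, source and creation time, a pending-task listing built from such objects has a value for each column instead of falling back to "unknown". The injectable clock exists only so the example can advance time deterministically.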

clintongormley added >enhancement help wanted adoptme :Data Management/Stats Statistics tracking and retrieval APIs good first issue low hanging fruit v1.5.0 labels Jan 20, 2015

bleskes mentioned this issue Feb 12, 2015

Introduce TimedPrioritizedRunnable base class to all commands that go into InternalClusterService.updateTasksExecutor #9671

Closed

bleskes closed this as completed in d6e9101 Feb 12, 2015
