_cat/pending_tasks returns "unknown" tasks #9354

Closed
haardvark opened this issue Jan 19, 2015 · 1 comment
Labels
:Data Management/Stats Statistics tracking and retrieval APIs >enhancement good first issue low hanging fruit help wanted adoptme v1.5.0

Comments

@haardvark

We experienced a buildup of pending tasks in the cluster this weekend after some nodes dropped. Eventually, the cluster stopped processing tasks altogether. When I checked the tasks with _cat/pending_tasks, I saw some that looked invalid:

7083 12m URGENT shard-started ([session-2014-12-27][24], node[RlRevMXMSOqc3qORtp3xew], [P], s[INITIALIZING]), reason [after recovery from gateway]
-12 12m unknown
7092 12m URGENT shard-started ([session-2014-12-31][8], node[vEgbZbjXTVixtbpmh6Dh5A], [P], s[INITIALIZING]), reason [after recovery from gateway]

When calling _cat/pending_tasks from a node that wasn't the current master, we would occasionally get a 500 saying "can't deserialize task - no priority exists for [-12]".

It was suggested to us that this has been seen when different JVM versions are running in the same cluster, but this isn't the case in our deployment. There is an ES support ticket open for this overall outage (6726) that may provide some more context.

@clintongormley

Apparently the reason for this is as follows: when a task times out, the timeout action is placed onto the pending tasks queue without an insertion order and with its source reported as unknown. This works fine on the master, but serialization to other nodes fails (which is not actually a problem, but isn't pretty).

We should:

  • add an insertion order
  • handle serialization to other nodes properly
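To illustrate the first bullet point above, here is a minimal sketch of stamping every queued entry, including timeout actions, with an insertion order and a source. All names here (`PendingTaskSketch`, `QueuedTask`, `TaskStamper`, `timeoutFor`) are hypothetical and are not taken from the Elasticsearch code base. The choice to give the timeout action the same priority as the task it belongs to is also an assumption.

```java
import java.util.concurrent.atomic.AtomicLong;

public class PendingTaskSketch {

    // Hypothetical priority levels; only URGENT appears in the report above.
    enum Priority { URGENT, HIGH, NORMAL, LOW }

    /** One queue entry. Every entry carries an insertion order and a source. */
    static final class QueuedTask {
        final long insertionOrder;
        final Priority priority;
        final String source;
        final Runnable action;

        QueuedTask(long insertionOrder, Priority priority, String source, Runnable action) {
            this.insertionOrder = insertionOrder;
            this.priority = priority;
            this.source = source;
            this.action = action;
        }

        /** Renders the entry roughly like one line of a pending tasks listing. */
        String toCatLine() {
            return insertionOrder + " " + priority + " " + source;
        }
    }

    /** Hands out monotonically increasing insertion orders for all entries. */
    static final class TaskStamper {
        private final AtomicLong nextOrder = new AtomicLong();

        QueuedTask submit(Priority priority, String source, Runnable action) {
            return new QueuedTask(nextOrder.incrementAndGet(), priority, source, action);
        }

        /** Assumption: a timeout action inherits the priority of its task. */
        QueuedTask timeoutFor(QueuedTask original, Runnable onTimeout) {
            return submit(original.priority, "timeout [" + original.source + "]", onTimeout);
        }
    }

    public static void main(String[] args) {
        TaskStamper stamper = new TaskStamper();
        QueuedTask started = stamper.submit(Priority.URGENT, "shard-started", () -> {});
        QueuedTask timeout = stamper.timeoutFor(started, () -> {});
        System.out.println(started.toCatLine());
        System.out.println(timeout.toCatLine());
    }
}
```

Because the timeout entry goes through the same `submit` path as any other task, it always has a valid priority, a positive insertion order, and a descriptive source, so a listing built from these entries would not need an "unknown" placeholder.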

@clintongormley clintongormley added >enhancement help wanted adoptme :Data Management/Stats Statistics tracking and retrieval APIs good first issue low hanging fruit v1.5.0 labels Jan 20, 2015
bleskes added a commit to bleskes/elasticsearch that referenced this issue Feb 12, 2015
…ds that go into InternalClusterService.updateTasksExecutor

 At the moment we sometimes submit generic runnables, which makes life slightly harder when generating the pending task list, which has to account for them. This commit adds an abstract TimedPrioritizedRunnable class which should always be used. This class also automatically measures time in queue, which is needed for the pending task reporting.

  Relates to elastic#8077

  Closes elastic#9354
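
The idea described in the commit message above can be sketched roughly as follows. This is not the actual TimedPrioritizedRunnable from the commit: the class name `TimedPrioritizedTask`, its constructor, the `Priority` enum, and the injectable clock are all assumptions made for illustration. The sketch only shows an abstract runnable that records when it was created, so that time in queue can be reported later.

```java
import java.util.function.LongSupplier;

public class TimedRunnableSketch {

    // Hypothetical priority levels for the sketch.
    enum Priority { URGENT, HIGH, NORMAL, LOW }

    /** Abstract base for queued work: carries priority, source, and creation time. */
    abstract static class TimedPrioritizedTask implements Runnable {
        private final Priority priority;
        private final String source;
        private final LongSupplier nanoClock;
        private final long creationNanos;

        TimedPrioritizedTask(Priority priority, String source, LongSupplier nanoClock) {
            this.priority = priority;
            this.source = source;
            this.nanoClock = nanoClock;
            // Captured at construction, which is assumed to be the moment of queuing.
            this.creationNanos = nanoClock.getAsLong();
        }

        Priority priority() {
            return priority;
        }

        String source() {
            return source;
        }

        /** Milliseconds elapsed since the task was created, never negative. */
        long timeInQueueMillis() {
            long elapsedNanos = nanoClock.getAsLong() - creationNanos;
            return Math.max(0L, elapsedNanos / 1_000_000L);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TimedPrioritizedTask task =
            new TimedPrioritizedTask(Priority.URGENT, "shard-started", System::nanoTime) {
                @Override
                public void run() {
                    // The real work would go here.
                }
            };
        Thread.sleep(20);
        System.out.println(task.source() + " waited " + task.timeInQueueMillis()
            + "ms at priority " + task.priority());
    }
}
```

Passing the clock in as a `LongSupplier` is a design choice for the sketch only; it makes the elapsed time easy to check with a fake clock, while `System::nanoTime` is used in normal operation.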
bleskes added a commit to bleskes/elasticsearch that referenced this issue Feb 12, 2015
…ds that go into InternalClusterService.updateTasksExecutor

 At the moment we sometimes submit generic runnables, which makes life slightly harder when generating the pending task list, which has to account for them. This commit adds an abstract TimedPrioritizedRunnable class which should always be used. This class also automatically measures time in queue, which is needed for the pending task reporting.

  Relates to elastic#8077

  Closes elastic#9354
  Closes elastic#9671
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
…ds that go into InternalClusterService.updateTasksExecutor

 At the moment we sometimes submit generic runnables, which makes life slightly harder when generating the pending task list, which has to account for them. This commit adds an abstract TimedPrioritizedRunnable class which should always be used. This class also automatically measures time in queue, which is needed for the pending task reporting.

  Relates to elastic#8077

  Closes elastic#9354
  Closes elastic#9671