New Feature: Partition Aware #373

Closed
hanahmily opened this issue Jul 5, 2017 · 0 comments

Problems

At present, elastic-job-cloud-scheduler treats TASK_LOST the same as TASK_FAILED/TASK_ERROR and schedules a new task. This behavior can cause problems:

The master may lose contact with a running task (e.g., due to a network partition) while the task is still running. Two instances of the same task would then run at the same time.

That behavior might be acceptable for a long-running service (e.g., a web service), but it is incorrect for a batch job task.

Solution

Since version 1.1.0, Mesos supports the PARTITION_AWARE feature. The old TASK_LOST state is split into 5 more specific states:

  • TASK_DROPPED means “task failed to launch”.
    The task is definitely not running.
  • TASK_UNREACHABLE means that the task was running on an agent that has failed health checks -- i.e., the master hasn’t heard from the agent running the task for a configurable period of time.
    The task may still be running.
  • TASK_GONE(_BY_OPERATOR) means “task was running on an agent that has been terminated.”
    The task is definitely not running.
  • TASK_UNKNOWN means that the master has no knowledge of the task.
    This might be because either (a) the task was never known to the master, or (b) the agent has been GC’d from the list of unreachable or confirmed-dead agents.
    The task may still be running.

The scheduler can use these states to determine when a task has truly terminated (TASK_DROPPED, TASK_GONE, TASK_GONE_BY_OPERATOR).
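
As a rough illustration of that check, the sketch below groups the states listed above into "definitely not running" and "may still be running". The enum `PartitionAwareState` and the class `TerminalStates` are hypothetical names made up for this example; a real scheduler would presumably inspect the state carried by the Mesos status update instead of a local enum.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical local mirror of the Mesos task states discussed in this issue.
enum PartitionAwareState {
    TASK_DROPPED,
    TASK_UNREACHABLE,
    TASK_GONE,
    TASK_GONE_BY_OPERATOR,
    TASK_UNKNOWN
}

// Hypothetical helper: answers whether a state guarantees the task has stopped.
final class TerminalStates {

    // States for which the task is definitely not running.
    private static final Set<PartitionAwareState> DEFINITELY_NOT_RUNNING = EnumSet.of(
            PartitionAwareState.TASK_DROPPED,
            PartitionAwareState.TASK_GONE,
            PartitionAwareState.TASK_GONE_BY_OPERATOR);

    private TerminalStates() {
    }

    // Returns true only when rescheduling cannot produce a duplicate task.
    static boolean isDefinitelyTerminated(PartitionAwareState state) {
        return DEFINITELY_NOT_RUNNING.contains(state);
    }
}
```

With this grouping, TASK_UNREACHABLE and TASK_UNKNOWN are the two states where blindly launching a replacement could lead to two copies of the same batch task.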

The scheduler should also support a configurable strategy for each state. The strategy might be to launch a new task or just send an alert.
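
One possible shape for such a configurable strategy is sketched below. Nothing here is existing Elastic-Job API: `LostTaskAction`, `LostTaskStrategy`, and `batchJobDefaults()` are hypothetical names, the state is passed as a plain string to keep the example self-contained, and the chosen defaults (alert only unless the task is known to have stopped) are an assumption rather than a decided design.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical actions the scheduler could take for a lost task.
enum LostTaskAction {
    RESCHEDULE,
    ALERT_ONLY
}

// Hypothetical per-state strategy table with a fallback for unconfigured states.
final class LostTaskStrategy {

    private final Map<String, LostTaskAction> actions = new HashMap<>();
    private final LostTaskAction fallback;

    LostTaskStrategy(LostTaskAction fallback) {
        this.fallback = fallback;
    }

    // Registers the action for one task state name, e.g. "TASK_UNREACHABLE".
    LostTaskStrategy on(String taskState, LostTaskAction action) {
        actions.put(taskState, action);
        return this;
    }

    // Looks up the configured action, or the fallback if none was registered.
    LostTaskAction actionFor(String taskState) {
        return actions.getOrDefault(taskState, fallback);
    }

    // Assumed defaults for batch jobs: reschedule only when the task is
    // definitely not running, otherwise just raise an alert.
    static LostTaskStrategy batchJobDefaults() {
        return new LostTaskStrategy(LostTaskAction.ALERT_ONLY)
                .on("TASK_DROPPED", LostTaskAction.RESCHEDULE)
                .on("TASK_GONE", LostTaskAction.RESCHEDULE)
                .on("TASK_GONE_BY_OPERATOR", LostTaskAction.RESCHEDULE);
    }
}
```

A long-running service could override the defaults, for example `LostTaskStrategy.batchJobDefaults().on("TASK_UNREACHABLE", LostTaskAction.RESCHEDULE)`, since a duplicate instance is tolerable there.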

Which version of Elastic-Job are you using?

2.1.4
