Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Wait for autodetect to be ready in the datafeed #37349

Merged
merged 1 commit into from Jan 11, 2019

Conversation

droberts195
Copy link
Contributor

This is a reinforcement of #37227. It turns out that
persistent tasks are not made stale if the node they
were running on is restarted and the master node does
not notice this. The main scenario where this happens
is when minimum master nodes is the same as the number
of nodes in the cluster, so the cluster cannot elect a
master node when any node is restarted.

When an ML node restarts we need the datafeeds for any
jobs that were running on that node to not just wait
until the jobs are allocated, but to wait for the
autodetect process of the job to start up. In the case
of reassignment of the job persistent task this was
dealt with by the stale status test. But in the case
where a node restarts but its persistent tasks are not
reassigned we need a deeper test.

Fixes #36810

This is a reinforcement of elastic#37227.  It turns out that
persistent tasks are not made stale if the node they
were running on is restarted and the master node does
not notice this.  The main scenario where this happens
is when minimum master nodes is the same as the number
of nodes in the cluster, so the cluster cannot elect a
master node when any node is restarted.

When an ML node restarts we need the datafeeds for any
jobs that were running on that node to not just wait
until the jobs are allocated, but to wait for the
autodetect process of the job to start up.  In the case
of reassignment of the job persistent task this was
dealt with by the stale status test.  But in the case
where a node restarts but its persistent tasks are not
reassigned we need a deeper test.

Fixes elastic#36810
Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left one observation that isn't necessary for this change

@@ -327,7 +337,8 @@ public void forecastJob(JobTask jobTask, ForecastParams params, Consumer<Excepti
public void writeUpdateProcessMessage(JobTask jobTask, UpdateParams updateParams, Consumer<Exception> handler) {
AutodetectCommunicator communicator = getOpenAutodetectCommunicator(jobTask);
if (communicator == null) {
String message = "Cannot process update model debug config because job [" + jobTask.getJobId() + "] is not open";
String message = "Cannot process update model debug config because job [" + jobTask.getJobId() +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This message probably made sense once but it doesn't anymore. I'd suggest
Cannot update the job config because job...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll create a new PR to change that.

@droberts195 droberts195 merged commit 1da59db into elastic:master Jan 11, 2019
@droberts195 droberts195 deleted the auto_com_check_in_datafeed branch January 11, 2019 13:22
droberts195 added a commit that referenced this pull request Jan 11, 2019
This is a reinforcement of #37227.  It turns out that
persistent tasks are not made stale if the node they
were running on is restarted and the master node does
not notice this.  The main scenario where this happens
is when minimum master nodes is the same as the number
of nodes in the cluster, so the cluster cannot elect a
master node when any node is restarted.

When an ML node restarts we need the datafeeds for any
jobs that were running on that node to not just wait
until the jobs are allocated, but to wait for the
autodetect process of the job to start up.  In the case
of reassignment of the job persistent task this was
dealt with by the stale status test.  But in the case
where a node restarts but its persistent tasks are not
reassigned we need a deeper test.

Fixes #36810
droberts195 added a commit that referenced this pull request Jan 11, 2019
This is a reinforcement of #37227.  It turns out that
persistent tasks are not made stale if the node they
were running on is restarted and the master node does
not notice this.  The main scenario where this happens
is when minimum master nodes is the same as the number
of nodes in the cluster, so the cluster cannot elect a
master node when any node is restarted.

When an ML node restarts we need the datafeeds for any
jobs that were running on that node to not just wait
until the jobs are allocated, but to wait for the
autodetect process of the job to start up.  In the case
of reassignment of the job persistent task this was
dealt with by the stale status test.  But in the case
where a node restarts but its persistent tasks are not
reassigned we need a deeper test.

Fixes #36810
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants