Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] fix data frame analytics when there are no ML nodes but lazy node allocation is allowed #67840

Merged
merged 6 commits into from
Jan 25, 2021

Conversation

benwtrent
Copy link
Member

We cannot calculate memory size estimates if there are no ML nodes.

But, if lazy nodes are enabled (or lazy starting in the analytics config), we should still be able to start the job.

This commit adds two predicates:

  • In _explain if there are no ML nodes, but there are lazy nodes (or the data frame analytics config allows lazy opening), we simply skip the memory estimate (returning null)
  • In _start we skip checking the memory estimate via _explain if there are no nodes but lazy starting is allowed via the lazy node count or the data frame analytics config.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

@benwtrent
Copy link
Member Author

@elastic/ml-ui how will the data frame analytics wizard behave if it gets a null memory estimate? As it stands right now, Elasticsearch will simply fail with no ML nodes. It seems to me that providing the usable fields but with a null estimate is much preferred.

@benwtrent
Copy link
Member Author

@elasticmachine update branch

explainRequest,
explainListener);

Optional<DiscoveryNode> node = findMlNode(clusterService.state());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some common package the findMlNode method could go to?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking of just changing the logic so that we always call _explain and _explain handles this.

.getResponse()
.results()
.get(0)
.getState(), equalTo(DataFrameAnalyticsState.STARTING));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could there be a problem that this busy loop won't "catch" the moment in which the state is STARTING because analytics will advance to the next state too quickly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no. Because there is no ML node to assign to.

Copy link
Contributor

@przemekwitek przemekwitek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -190,7 +225,7 @@ private void redirectToMlNode(PutDataFrameAnalyticsAction.Request request,
/**
* Finds the first available ML node in the cluster state.
*/
private static Optional<DiscoveryNode> findMlNode(ClusterState clusterState) {
static Optional<DiscoveryNode> findMlNode(ClusterState clusterState) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be made private again?

@benwtrent benwtrent merged commit 8a0aad2 into elastic:master Jan 25, 2021
@benwtrent benwtrent deleted the feature/ml-dfa-scaling-from-0 branch January 25, 2021 16:17
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Jan 25, 2021
…e allocation is allowed (elastic#67840)

We cannot calculate memory size estimates if there are no ML nodes.

But, if lazy nodes are enabled (or lazy starting in the analytics config), we should still be able to start the job.

In _explain if there are no ML nodes, but there are lazy nodes (or the data frame analytics config allows lazy opening), we simply skip the memory estimate (returning the default of 1gb)
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Jan 25, 2021
…e allocation is allowed (elastic#67840)

We cannot calculate memory size estimates if there are no ML nodes.

But, if lazy nodes are enabled (or lazy starting in the analytics config), we should still be able to start the job.

In _explain if there are no ML nodes, but there are lazy nodes (or the data frame analytics config allows lazy opening), we simply skip the memory estimate (returning the default of 1gb)
benwtrent added a commit that referenced this pull request Jan 25, 2021
…azy node allocation is allowed (#67840) (#67921)

* [ML] fix data frame analytics when there are no ML nodes but lazy node allocation is allowed (#67840)

We cannot calculate memory size estimates if there are no ML nodes.

But, if lazy nodes are enabled (or lazy starting in the analytics config), we should still be able to start the job.

In _explain if there are no ML nodes, but there are lazy nodes (or the data frame analytics config allows lazy opening), we simply skip the memory estimate (returning the default of 1gb)
benwtrent added a commit that referenced this pull request Jan 25, 2021
…zy node allocation is allowed (#67840) (#67920)

* [ML] fix data frame analytics when there are no ML nodes but lazy node allocation is allowed (#67840)

We cannot calculate memory size estimates if there are no ML nodes.

But, if lazy nodes are enabled (or lazy starting in the analytics config), we should still be able to start the job.

In _explain if there are no ML nodes, but there are lazy nodes (or the data frame analytics config allows lazy opening), we simply skip the memory estimate (returning the default of 1gb)
droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Jan 28, 2021
The data frame analytics memory estimation process is very small
and short-lived. Therefore, it isn't so bad for it to run on a
node that's not an ML node.

This PR allows the data frame analytics memory estimation process
to run on data or ingest nodes in addition to ML nodes. If the
node receiving the _explain request is any of these then it handles
the request, avoiding a transfer within the cluster. Otherwise the
request is transferred to an ML node if there is one. If there is
no ML node then the request is transferred to a data or ingest
node. (And it fails in the extremely unlikely event that the
cluster has no ML, data or ingest nodes.)

Replacement for elastic#67840
droberts195 added a commit that referenced this pull request Jan 29, 2021
…8146)

The data frame analytics memory estimation process is very small
and short-lived. Therefore, it isn't so bad for it to run on a
node that's not an ML node.

This PR allows the data frame analytics memory estimation process
to run on data or ingest nodes in addition to ML nodes. If the
node receiving the _explain request is any of these then it handles
the request, avoiding a transfer within the cluster. Otherwise the
request is transferred to an ML node if there is one. If there is
no ML node then the request is transferred to a data or ingest
node. (And it fails in the extremely unlikely event that the
cluster has no ML, data or ingest nodes.)

Replacement for #67840
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants