[ML] fix data frame analytics when there are no ML nodes but lazy node allocation is allowed #67840
…e allocation is allowed
Pinging @elastic/ml-core (:ml)
@elastic/ml-ui how will the data frame analytics wizard behave if it gets a
@elasticmachine update branch
explainRequest,
explainListener);

Optional<DiscoveryNode> node = findMlNode(clusterService.state());
Is there some common package the `findMlNode` method could go to?
I am thinking of just changing the logic so that we always call `_explain` and `_explain` handles this.
.getResponse()
.results()
.get(0)
.getState(), equalTo(DataFrameAnalyticsState.STARTING));
Could there be a problem that this busy loop won't "catch" the moment in which the state is `STARTING` because analytics will advance to the next state too quickly?
No, because there is no ML node to assign to.
…elasticsearch into feature/ml-dfa-scaling-from-0
LGTM
@@ -190,7 +225,7 @@ private void redirectToMlNode(PutDataFrameAnalyticsAction.Request request,
     /**
      * Finds the first available ML node in the cluster state.
      */
-    private static Optional<DiscoveryNode> findMlNode(ClusterState clusterState) {
+    static Optional<DiscoveryNode> findMlNode(ClusterState clusterState) {
Could this be made `private` again?
…e allocation is allowed (elastic#67840) We cannot calculate memory size estimates if there are no ML nodes. But, if lazy nodes are enabled (or lazy starting in the analytics config), we should still be able to start the job. In _explain if there are no ML nodes, but there are lazy nodes (or the data frame analytics config allows lazy opening), we simply skip the memory estimate (returning the default of 1gb)
…azy node allocation is allowed (#67840) (#67921) * [ML] fix data frame analytics when there are no ML nodes but lazy node allocation is allowed (#67840) We cannot calculate memory size estimates if there are no ML nodes. But, if lazy nodes are enabled (or lazy starting in the analytics config), we should still be able to start the job. In _explain if there are no ML nodes, but there are lazy nodes (or the data frame analytics config allows lazy opening), we simply skip the memory estimate (returning the default of 1gb)
…zy node allocation is allowed (#67840) (#67920) * [ML] fix data frame analytics when there are no ML nodes but lazy node allocation is allowed (#67840) We cannot calculate memory size estimates if there are no ML nodes. But, if lazy nodes are enabled (or lazy starting in the analytics config), we should still be able to start the job. In _explain if there are no ML nodes, but there are lazy nodes (or the data frame analytics config allows lazy opening), we simply skip the memory estimate (returning the default of 1gb)
The data frame analytics memory estimation process is very small and short-lived. Therefore, it isn't so bad for it to run on a node that's not an ML node. This PR allows the data frame analytics memory estimation process to run on data or ingest nodes in addition to ML nodes. If the node receiving the _explain request is any of these then it handles the request, avoiding a transfer within the cluster. Otherwise the request is transferred to an ML node if there is one. If there is no ML node then the request is transferred to a data or ingest node. (And it fails in the extremely unlikely event that the cluster has no ML, data or ingest nodes.) Replacement for elastic#67840
…8146) The data frame analytics memory estimation process is very small and short-lived. Therefore, it isn't so bad for it to run on a node that's not an ML node. This PR allows the data frame analytics memory estimation process to run on data or ingest nodes in addition to ML nodes. If the node receiving the _explain request is any of these then it handles the request, avoiding a transfer within the cluster. Otherwise the request is transferred to an ML node if there is one. If there is no ML node then the request is transferred to a data or ingest node. (And it fails in the extremely unlikely event that the cluster has no ML, data or ingest nodes.) Replacement for #67840
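The routing order described for the replacement PR can be sketched roughly as follows. This is a minimal, standalone illustration and not the actual Elasticsearch code: `ExplainRoutingSketch`, `Node`, `Role`, and `chooseNode` are hypothetical names, and the real change works on the cluster state and `DiscoveryNode` objects rather than these simplified types.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in types; the real implementation uses DiscoveryNode and ClusterState.
final class ExplainRoutingSketch {
    enum Role { MASTER, ML, DATA, INGEST }

    static final class Node {
        final String name;
        final Set<Role> roles;

        Node(String name, Role... roles) {
            this.name = name;
            EnumSet<Role> set = EnumSet.noneOf(Role.class);
            set.addAll(Arrays.asList(roles));
            this.roles = set;
        }

        boolean isMl() {
            return roles.contains(Role.ML);
        }

        // Memory estimation may run on ML, data, or ingest nodes.
        boolean canRunEstimation() {
            return isMl() || roles.contains(Role.DATA) || roles.contains(Role.INGEST);
        }
    }

    // Returns the node that should handle the _explain request, or empty if none qualifies.
    static Optional<Node> chooseNode(Node local, List<Node> others) {
        // 1. Handle locally if possible, avoiding a transfer within the cluster.
        if (local.canRunEstimation()) {
            return Optional.of(local);
        }
        // 2. Otherwise prefer an ML node if there is one.
        Optional<Node> ml = others.stream().filter(Node::isMl).findFirst();
        if (ml.isPresent()) {
            return ml;
        }
        // 3. Otherwise fall back to a data or ingest node; empty means the request fails.
        return others.stream().filter(Node::canRunEstimation).findFirst();
    }
}
```

An empty result corresponds to the "extremely unlikely" failure case in which the cluster has no ML, data, or ingest nodes.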
We cannot calculate memory size estimates if there are no ML nodes.
But, if lazy nodes are enabled (or lazy starting in the analytics config), we should still be able to start the job.
This commit adds two predicates:
- `_explain`: if there are no ML nodes, but there are lazy nodes (or the data frame analytics config allows lazy opening), we simply skip the memory estimate (returning null).
- `_start`: we skip checking the memory estimate via `_explain` if there are no nodes but lazy starting is allowed via the lazy node count or the data frame analytics config.
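As a rough illustration of the two predicates, here is a minimal standalone sketch. The class, method, and parameter names (`LazyStartSketch`, `mlNodeCount`, `maxLazyMlNodes`, `allowLazyStart`) are hypothetical stand-ins and are not taken from the PR; the actual change reads these values from the cluster state, the ML settings, and the analytics config.

```java
// Hypothetical sketch of the two checks described above; not the PR's actual code.
final class LazyStartSketch {
    private LazyStartSketch() {}

    // Shared condition: no ML node exists right now, but one may be assigned lazily.
    private static boolean noMlNodesButLazyAllowed(int mlNodeCount, int maxLazyMlNodes, boolean allowLazyStart) {
        return mlNodeCount == 0 && (maxLazyMlNodes > 0 || allowLazyStart);
    }

    // _explain: skip the memory estimate instead of failing.
    static boolean skipMemoryEstimate(int mlNodeCount, int maxLazyMlNodes, boolean allowLazyStart) {
        return noMlNodesButLazyAllowed(mlNodeCount, maxLazyMlNodes, allowLazyStart);
    }

    // _start: skip the memory check that would otherwise call _explain.
    static boolean skipExplainCheckOnStart(int mlNodeCount, int maxLazyMlNodes, boolean allowLazyStart) {
        return noMlNodesButLazyAllowed(mlNodeCount, maxLazyMlNodes, allowLazyStart);
    }
}
```

With no ML nodes and neither form of lazy starting enabled, both predicates are false, so the existing behaviour is left unchanged in that case.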