
[ML] Remove noisy 'Could not find trained model' message #100760

Merged
merged 3 commits into elastic:main on Oct 12, 2023

Conversation

davidkyle (Member)

ModelLoadingService watches for the creation of ingest pipelines that use the inference processor and pre-loads the model referenced in the new pipeline. If the model does not exist because the pipeline was created before the model, an error message is logged. This can happen quite often, producing spurious error messages. A common case is a solution creating an ingest pipeline that references the ELSER model before ELSER has been downloaded.

ModelLoadingService only handles DFA/boosted tree models, which are loaded on ingest nodes. It does not load PyTorch/NLP models, but it first has to read the model config to know what type the model is.
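The decision described above can be sketched as a small standalone Java program. All names here are hypothetical stand-ins, not the actual Elasticsearch API: the point is that a loader must first resolve the model config to learn its type, prefetch only ingest-node (DFA/boosted tree) models, and treat a missing model quietly rather than as a WARN-level error, since pipelines are routinely created before their model exists.

```java
import java.util.Map;
import java.util.Optional;

// Minimal sketch (hypothetical names) of type-aware model prefetching.
public class ModelPrefetchSketch {
    enum ModelType { BOOSTED_TREE, PYTORCH }

    // Stand-in for the trained-model config store.
    static final Map<String, ModelType> CONFIGS = Map.of("dfa-model", ModelType.BOOSTED_TREE);

    static String handlePipelineReference(String modelId) {
        Optional<ModelType> type = Optional.ofNullable(CONFIGS.get(modelId));
        if (type.isEmpty()) {
            // The pipeline may simply have been created before the model;
            // demote to a quiet message instead of a WARN with a stack trace.
            return "DEBUG: model [" + modelId + "] not found, skipping prefetch";
        }
        if (type.get() != ModelType.BOOSTED_TREE) {
            // PyTorch/NLP models are not loaded by this service.
            return "DEBUG: model [" + modelId + "] is not an ingest-node model, skipping";
        }
        return "INFO: prefetching model [" + modelId + "]";
    }

    public static void main(String[] args) {
        System.out.println(handlePipelineReference("not-a-model"));
        System.out.println(handlePipelineReference("dfa-model"));
    }
}
```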

It is very easy to trigger the error message: just create an ingest pipeline with an inference processor that references an unknown model id.

PUT _ingest/pipeline/elser
{
  "processors": [
    {
      "inference": {
        "model_id": "not-a-model",
        "field_map": {
          "body": "text_field"
        },
        "target_field": "ml"
      }
    }
  ]
}

And the following is logged.

[2023-10-12T09:45:51,528][WARN ][o.e.x.m.i.l.ModelLoadingService] [host] [not-a-model] failed to load model configuration
org.elasticsearch.ResourceNotFoundException: Could not find trained model [not-a-model]
	at org.elasticsearch.ml@8.11.0/org.elasticsearch.xpack.ml.inference.persistence.TrainedModelProvider.lambda$getTrainedModel$22(TrainedModelProvider.java:670)
	at org.elasticsearch.server@8.11.0/org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:177)
	at org.elasticsearch.server@8.11.0/org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:32)
	at org.elasticsearch.server@8.11.0/org.elasticsearch.client.internal.node.NodeClient$SafelyWrappedActionListener.onResponse(NodeClient.java:161)
	at org.elasticsearch.server@8.11.0/org.elasticsearch.tasks.TaskManager$1.onResponse(TaskManager.java:203)

@elasticsearchmachine elasticsearchmachine added the Team:ML Meta label for the ML team label Oct 12, 2023
@elasticsearchmachine (Collaborator)

Hi @davidkyle, I've created a changelog YAML for you.

@elasticsearchmachine (Collaborator)

Pinging @elastic/ml-core (Team:ML)

@@ -745,14 +750,13 @@ private void cacheEvictionListener(RemovalNotification<String, ModelAndConsumer>

@Override
public void clusterChanged(ClusterChangedEvent event) {
final boolean prefetchModels = event.state().nodes().getLocalNode().isIngestNode();
// If we are not prefetching models and there were no model alias changes, don't bother handling the changes
if ((prefetchModels == false)
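The diff hunk above adds an early-exit guard in `clusterChanged`: only ingest nodes prefetch models, so when the local node is not an ingest node and no model aliases changed, the update can be skipped entirely. A simplified, self-contained sketch of that guard (hypothetical method and parameters standing in for Elasticsearch's `ClusterChangedEvent` machinery):

```java
// Sketch of the early-exit guard shown in the diff, reduced to plain booleans.
public class ClusterChangedGuard {
    static boolean shouldHandle(boolean localNodeIsIngest, boolean modelAliasesChanged) {
        // Only ingest nodes prefetch models.
        boolean prefetchModels = localNodeIsIngest;
        // If we are not prefetching models and there were no model alias
        // changes, don't bother handling the cluster state update.
        if (prefetchModels == false && modelAliasesChanged == false) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(shouldHandle(false, false)); // skipped
        System.out.println(shouldHandle(true, false));  // handled: ingest node
        System.out.println(shouldHandle(false, true));  // handled: alias change
    }
}
```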
Contributor

Is this changing the behaviour beyond what the release note says?

Previously in the case where an ingest pipeline referenced a model that didn't exist, that model would be prefetched on ingest nodes as soon as it was created. But now that doesn't happen? Or am I misreading this change?

Maybe the old behaviour was unacceptably resource-intensive, because it was doing a lot of processing on ingest nodes on every cluster state update, and changing it is justified. But in that case I think the release note should say this rather than pretend the change is only about logging.

davidkyle (Member, Author)

I'll make those changes in another PR

@droberts195 (Contributor) left a comment


LGTM

@davidkyle added the auto-backport-and-merge label (Automatically create backport pull requests and merge when ready) on Oct 12, 2023
@davidkyle davidkyle merged commit 31ecf04 into elastic:main Oct 12, 2023
13 checks passed
@elasticsearchmachine (Collaborator)

💚 Backport successful

Branch: 8.11

davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Oct 12, 2023

A 'Could not find trained model' message was logged when a new ingest
pipeline using an inference processor was created but the referenced
model could not be found. This isn't unusual, as it is common to create
a pipeline before the model is loaded. This change stops logging the
error message in this situation.
elasticsearchmachine pushed a commit that referenced this pull request Oct 12, 2023
…100775)

A 'Could not find trained model' message was logged when a new ingest
pipeline using an inference processor was created but the referenced
model could not be found. This isn't unusual, as it is common to create
a pipeline before the model is loaded. This change stops logging the
error message in this situation.
Labels
auto-backport-and-merge Automatically create backport pull requests and merge when ready >bug :ml Machine learning Team:ML Meta label for the ML team v8.11.0 v8.12.0

3 participants