@jonathan-buttner jonathan-buttner commented Nov 3, 2025

This PR addresses an increase in the number of error logs emitted when persisting default inference endpoints.

The cause is the exception returned by the change in https://github.com/elastic/elasticsearch/pull/136569/files#diff-65fa96c525e72184c64209a9b6fd0f8130a7faf883464cbb21824996837c424cR674

Because that code no longer returns a ResourceAlreadyExistsException, the failure is logged as an error here: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/registry/ModelRegistry.java#L452
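To illustrate why the exception type matters, here is a minimal sketch of a failure handler that picks a log level based on the type of exception it receives. It is not the actual ModelRegistry code: the two exception classes are hypothetical stand-ins for ResourceAlreadyExistsException and ElasticsearchStatusException, and `StoreFailureLogging` and `levelFor` are names invented for this example.

```java
import java.util.logging.Level;

// Hypothetical stand-in for ResourceAlreadyExistsException (defined here so the sketch compiles on its own).
class AlreadyExistsException extends RuntimeException {
    AlreadyExistsException(String message) {
        super(message);
    }
}

// Hypothetical stand-in for a more generic status exception such as ElasticsearchStatusException.
class GenericStatusException extends RuntimeException {
    GenericStatusException(String message) {
        super(message);
    }
}

final class StoreFailureLogging {
    private StoreFailureLogging() {}

    // Assumed policy: an "already exists" failure is an expected race between
    // concurrent writers, so it is logged quietly; anything else is a real error.
    static Level levelFor(Throwable failure) {
        return failure instanceof AlreadyExistsException ? Level.FINE : Level.SEVERE;
    }
}
```

With a handler shaped like this, returning a generic exception that merely carries an "already exists" message falls through to the error branch, which matches the symptom described above.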

Testing

I can reproduce this by making multiple calls to GET _inference/_all at the same time on a newly created cluster. Before the fix, this produces an error log like:

[2025-11-03T10:01:47,736][ERROR][o.e.x.i.r.ModelRegistry  ] [runTask-0] Failed to store default inference id [.multilingual-e5-small-elasticsearch] org.elasticsearch.ElasticsearchStatusException: Inference endpoint [.multilingual-e5-small-elasticsearch] already exists

After the fix, that message should no longer appear at the error level (it is logged at debug level instead).

You should still see warnings, though, if you send a batch of the GET requests at the same time when a cluster is first started. I verified that these warnings also occur without my changes:

[2025-11-03T10:04:18,890][WARN ][o.e.x.i.r.ModelRegistry  ] [runTask-0] Failed to store document id: [model_.rerank-v1-elasticsearch] inference id: [.rerank-v1-elasticsearch] index: [.inference] bulk failure message [[.inference/ghwJ_yhfQ5OQ43-wgBRcZg][[.inference][0]] org.elasticsearch.index.engine.VersionConflictEngineException: [model_.rerank-v1-elasticsearch]: version conflict, document already exists (current version [1])]
[2025-11-03T10:04:18,893][WARN ][o.e.x.i.r.ModelRegistry  ] [runTask-0] Failed to store document id: [model_.rerank-v1-elasticsearch] inference id: [.rerank-v1-elasticsearch] index: [.secrets-inference] bulk failure message [[.secrets-inference/2FvlGp-SSyqIXVdm4TjKhQ][[.secrets-inference][0]] org.elasticsearch.index.engine.VersionConflictEngineException: [model_.rerank-v1-elasticsearch]: version conflict, document already exists (current version [1])]
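The reproduction steps above could be scripted roughly as follows. This is only a sketch: it assumes an unsecured local node at http://localhost:9200 (no TLS, no authentication) and an arbitrary request count of 10, and the class and method names are made up for the example.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class ConcurrentInferenceGet {
    private ConcurrentInferenceGet() {}

    // Builds `count` identical GET requests for the _inference/_all endpoint.
    static List<HttpRequest> buildRequests(String baseUrl, int count) {
        URI uri = URI.create(baseUrl + "/_inference/_all");
        List<HttpRequest> requests = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            requests.add(HttpRequest.newBuilder(uri).GET().build());
        }
        return requests;
    }

    // Starts every request before waiting on any of them, then collects the status codes.
    static List<Integer> sendAll(HttpClient client, List<HttpRequest> requests) {
        List<CompletableFuture<HttpResponse<String>>> inFlight = new ArrayList<>();
        for (HttpRequest request : requests) {
            inFlight.add(client.sendAsync(request, HttpResponse.BodyHandlers.ofString()));
        }
        List<Integer> statuses = new ArrayList<>();
        for (CompletableFuture<HttpResponse<String>> future : inFlight) {
            statuses.add(future.join().statusCode());
        }
        return statuses;
    }

    public static void main(String[] args) {
        // Assumption: the base URL defaults to an unsecured local node.
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:9200";
        List<Integer> statuses = sendAll(HttpClient.newHttpClient(), buildRequests(baseUrl, 10));
        System.out.println(statuses);
    }
}
```

Run it against a freshly started cluster and then check the node's log for the ModelRegistry entries shown above.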

@jonathan-buttner jonathan-buttner added >non-issue :ml Machine learning Team:ML Meta label for the ML team v9.3.0 labels Nov 3, 2025
@jonathan-buttner jonathan-buttner marked this pull request as ready for review November 3, 2025 15:38
@elasticsearchmachine

Pinging @elastic/ml-core (Team:ML)

@jan-elastic jan-elastic left a comment

LGTM

@jonathan-buttner jonathan-buttner enabled auto-merge (squash) November 3, 2025 15:40
@jonathan-buttner jonathan-buttner merged commit 26ffe2e into elastic:main Nov 3, 2025
34 of 35 checks passed
@jonathan-buttner jonathan-buttner deleted the ml-fix-resource-exists branch November 3, 2025 18:10
