@jonathan-buttner jonathan-buttner commented Nov 18, 2025

This PR switches the inference API to point to the EIS v2 authorization endpoint and handles the new response format.

The EIS v2 authorization endpoint provides all the necessary fields for the inference API to create an inference endpoint. The inference API will only create endpoints that have a task type that is supported (the ES integration defines a set of supported task types).

Note: The AuthorizationResponseEntity is not registered even though it is a named writeable. It is a bit of a hack that it needs to be a named writeable just to get it to work through the Sender framework.

EIS v2 auth endpoint response format
{
  "inference_endpoints": [
    {
      "id": ".rainbow-sprinkles-elastic",
      "model_name": "rainbow-sprinkles",
      "task_types": {
        "eis": "chat",
        "elasticsearch": "chat_completion"
      },
      "status": "ga",
      "properties": [
        "multilingual"
      ],
      "release_date": "2024-05-01",
      "end_of_life_date": "2025-12-31"
    },
    {
      "id": ".elastic-elser-v2",
      "model_name": "elser_model_2",
      "task_types": {
        "eis": "embed/text/sparse",
        "elasticsearch": "sparse_embedding"
      },
      "status": "preview",
      "properties": [
        "english"
      ],
      "release_date": "2024-05-01",
      "configuration": {
        "chunking_settings": {
          "strategy": "sentence",
          "max_chunk_size": 250,
          "sentence_overlap": 1
        }
      }
    },
    {
      "id": ".jina-embeddings-v3",
      "model_name": "jina-embeddings-v3",
      "task_types": {
        "eis": "embed/text/dense",
        "elasticsearch": "text_embedding"
      },
      "status": "beta",
      "properties": [
        "multilingual",
        "open-weights"
      ],
      "release_date": "2024-05-01",
      "configuration": {
        "similarity": "cosine",
        "dimensions": 1024,
        "element_type": "float",
        "chunking_settings": {
          "strategy": "sentence",
          "max_chunk_size": 250,
          "sentence_overlap": 1
        }
      }
    }
  ]
}
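
As a rough illustration of the "only create endpoints with a supported task type" rule, here is a minimal sketch. The record and class names (`TaskTypes`, `AuthorizedEndpoint`, `SupportedEndpointFilter`) and the contents of the supported set are assumptions made for this example; they are not the classes in this PR.

```java
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins that mirror a few fields of the sample response above.
record TaskTypes(String eis, String elasticsearch) {}

record AuthorizedEndpoint(String id, String modelName, TaskTypes taskTypes, String status) {}

final class SupportedEndpointFilter {
    // Assumed supported set, taken from the task types shown in the sample response.
    static final Set<String> SUPPORTED = Set.of("chat_completion", "sparse_embedding", "text_embedding");

    // Keep only endpoints whose Elasticsearch task type is in the supported set.
    static List<AuthorizedEndpoint> supported(List<AuthorizedEndpoint> endpoints) {
        return endpoints.stream().filter(endpoint -> {
            String type = endpoint.taskTypes() == null ? null : endpoint.taskTypes().elasticsearch();
            // Set.of(...) rejects null lookups, so guard before calling contains().
            return type != null && SUPPORTED.contains(type);
        }).toList();
    }
}
```

Endpoints with an unknown or missing task type are simply dropped rather than failing the whole authorization response, which is one plausible reading of the behavior described above.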

Testing

eis-gateway

make TLS_CLIENT_AUTH=NoClientCert run

Start Elasticsearch

./gradlew :run -Drun.license_type=trial -Dtests.es.xpack.inference.elastic.url=https://localhost:8443 -Dtests.es.xpack.inference.elastic.http.ssl.verification_mode=none -Dtests.es.xpack.inference.elastic.authorization_request_interval="5s" -Dtests.es.xpack.inference.elastic.max_authorization_request_jitter="1s" -Dtests.es.xpack.inference.elastic.ccm_supported_environment=false

You should see preconfigured EIS endpoints:

GET _inference/_all

Response

{
    "endpoints": [
        {
            "inference_id": ".elser-2-elastic",
            "task_type": "sparse_embedding",
            "service": "elastic",
            "service_settings": {
                "model_id": "elser_model_2"
            },
            "chunking_settings": {
                "strategy": "sentence",
                "max_chunk_size": 250,
                "sentence_overlap": 1
            }
        },
...

@jonathan-buttner jonathan-buttner added >non-issue :ml Machine learning Team:ML Meta label for the ML team v9.3.0 labels Nov 18, 2025

public static class Request extends AcknowledgedRequest<Request> {
private final List<Model> models;
private final List<? extends Model> models;
Contributor Author

These changes are so the authorization logic can return a list of a child class of Model.

Contributor

You can avoid needing to use <? extends Model> in this PR by making a small change to ElasticInferenceServiceAuthorizationModel.getEndpoints():

    public List<Model> getEndpoints(Set<String> endpointIds) {
        return endpointIds.stream().<Model>map(authorizedEndpoints::get).filter(Objects::nonNull).toList();
    }

By telling the stream that it should be a Stream<Model> after the map() call, instead of letting it infer Stream<ElasticInferenceServiceModel>, the return type can be List<Model>.
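
To make the type-witness point concrete, here is a self-contained sketch. `Model`, `EisModel`, and `EndpointLookup` are simplified stand-ins invented for this example, not the real classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-ins for Model and ElasticInferenceServiceModel.
class Model {
    final String id;

    Model(String id) {
        this.id = id;
    }
}

class EisModel extends Model {
    EisModel(String id) {
        super(id);
    }
}

final class EndpointLookup {
    private final Map<String, EisModel> authorizedEndpoints = new HashMap<>();

    EndpointLookup(List<EisModel> models) {
        models.forEach(model -> authorizedEndpoints.put(model.id, model));
    }

    // Without the explicit <Model> witness, map() infers Stream<EisModel>, so toList()
    // yields List<EisModel>, which is not assignable to List<Model>.
    List<Model> getEndpoints(Set<String> endpointIds) {
        return endpointIds.stream().<Model>map(authorizedEndpoints::get).filter(Objects::nonNull).toList();
    }
}
```

The witness is needed because the target type of the `return` statement does not flow back through a chain of method calls; each call in the chain is inferred on its own.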

""";

webServer.enqueue(new MockResponse().setResponseCode(200).setBody(responseJson));
var authResponse = getEisAuthorizationResponseWithMultipleEndpoints("ignored");
Contributor Author

The URL is typically passed in to this method. We don't have access to it yet because the web server may not have been started yet. These tests don't actually need the parts of the getEisAuthorizationResponseWithMultipleEndpoints response that use the passed-in URL anyway.

clusterPlugins project(':x-pack:plugin:inference:qa:test-service-plugin')

// Allow javaRestTest to see unit-test classes from x-pack:plugin:inference so we can import some variables
javaRestTestImplementation(testArtifact(project(xpackModule('inference'))))
Contributor Author

This is so the QA tests can import from the unit tests in the inference plugin.

}

private static Map<String, Object> getChunkingSettingsMap(AuthorizationResponseEntity.Configuration configuration) {
return Objects.requireNonNullElse(configuration.chunkingSettings(), new HashMap<>());
Contributor Author

Default to an empty map so the chunking settings use the "newer" default logic (default to the sentence strategy rather than word).

Contributor

Could we add a comment here that this happens? Or would it be possible to return a chunking strategy object instead of a generic map and fallback to the actual default chunking strategy?

Contributor Author

Yeah I'll add a comment. I think it's probably best to keep things consistent and have ChunkingSettingsBuilder.fromMap() handle what to do if the settings aren't provided.
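
For readers following the thread, here is a rough sketch of the null-to-empty-map step and how a builder might then pick a default. `resolveStrategy` is a hypothetical stand-in for `ChunkingSettingsBuilder.fromMap()`; the "sentence" default is taken from the comment above, and the real builder may behave differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class ChunkingDefaultsSketch {
    // Mirrors the requireNonNullElse step from the diff: a missing configuration becomes an empty map.
    static Map<String, Object> chunkingSettingsMap(Map<String, Object> fromResponse) {
        return Objects.requireNonNullElse(fromResponse, new HashMap<>());
    }

    // Hypothetical stand-in for the builder: an empty map falls through to the assumed default strategy.
    static String resolveStrategy(Map<String, Object> settings) {
        Object strategy = settings.get("strategy");
        return strategy == null ? "sentence" : strategy.toString();
    }
}
```

Keeping the defaulting decision in a single builder means the authorization code never needs to know which strategy is the current default.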

new ElasticInferenceServiceDenseTextEmbeddingsServiceSettings(
authorizedEndpoint.modelName(),
getSimilarityMeasure(config),
config.dimensions(),
Contributor Author

Right now the element type isn't used because we hard code to float. I have an issue to fix that after this PR is merged though.

@@ -1,194 +0,0 @@
/*
Contributor Author

This represented the v1 authorization endpoint response. We no longer interact with that so we don't need this.


// elser-2
public static final String DEFAULT_ELSER_2_MODEL_ID = "elser_model_2";
public static final String DEFAULT_ELSER_ENDPOINT_ID_V2 = ".elser-2-elastic";
Contributor Author

I left this because semantic text references it. Once we implement the heuristics logic we can remove this as well.

public void testHideFromConfigurationApi_ThrowsUnsupported_WithAvailableModels() throws Exception {
try (
var service = createServiceWithMockSender(
ElasticInferenceServiceAuthorizationModel.of(
Contributor Author

This was unused by createServiceWithMockSender so removing.

{
"inference_endpoints": [
{
"id": ".rainbow-sprinkles-elastic",
Contributor Author

I figured it'd be easier to read if we didn't do a Strings.format() here to specify the id, name, and other fields. I'm open to changing it though. I'm also open to other ideas to avoid the duplication.

public static final String RERANK_V1_MODEL_NAME = "elastic-rerank-v1";
public static final String EIS_RERANK_PATH = "rerank/text/text-similarity";

public record EisAuthorizationResponse(
Contributor Author

This encapsulates a specific EIS response for testing, along with the entities that are expected to be created from that JSON response.

@jonathan-buttner jonathan-buttner marked this pull request as ready for review November 21, 2025 15:39
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

elasticsearchmachine pushed a commit that referenced this pull request Nov 24, 2025
Safety measure to make sure we've the correct default rerank endpoint in
place in case #138249
doesn't make it.

Endpoint name: `.elastic-rerank-v1` -> `.jina-reranker-v2` Model name:
`elastic-rerank-v1` -> `jina-reranker-v2`
Contributor

@timgrein timgrein left a comment

Nice work! 👏 First round of reviews, will give it another go as it's pretty large


var config = ElasticInferenceService.createConfiguration(authorizationModel.getAuthorizedTaskTypes());
if (requestedTaskType != null && authorizationModel.getAuthorizedTaskTypes().contains(requestedTaskType) == false) {
var config = ElasticInferenceService.createConfiguration(authorizationModel.getTaskTypes());
Contributor

Could we add a comment explaining why we do an early return here? I didn't understand it at first glance. I assume we return here because the auth model of EIS doesn't support the requested task type, and therefore we simply return the ones we already have?

Contributor Author

@jonathan-buttner jonathan-buttner Nov 25, 2025

Yep exactly, if the user is looking for text_embedding, but they aren't authorized by EIS for any inference endpoints for text embedding, then we don't include EIS as a provider in that situation.

I'll add a comment.

e
);
delegate.onResponse(ElasticInferenceServiceAuthorizationModel.newDisabledService());
delegate.onResponse(AuthorizationModel.empty());
Contributor

Does empty() imply "forbid all"? Maybe we could rename the method then to reflect the result of an empty auth model?

import java.util.stream.Collectors;

/**
* Transforms the response from {@link ElasticInferenceServiceAuthorizationRequestHandler} into a format for consumption by the service.
Contributor

Suggested change
* Transforms the response from {@link ElasticInferenceServiceAuthorizationRequestHandler} into a format for consumption by the service.
* Transforms the response from {@link ElasticInferenceServiceAuthorizationRequestHandler} into a format for consumption by the {@link ElasticInferenceService}.

The service is in this case the ElasticInferenceService, right?

/**
* Transforms the response from {@link ElasticInferenceServiceAuthorizationRequestHandler} into a format for consumption by the service.
*/
public class AuthorizationModel {
Contributor

Suggested change
public class AuthorizationModel {
public class ElasticInferenceServiceAuthorizationModel {

nit: Wondering why we're using the name AuthorizationModel instead of ElasticInferenceServiceAuthorizationModel here as we usually prefix every EIS-related class with ElasticInferenceService. It's anyway 100% specific to EIS, right?

Contributor Author

I'll rename this 👍 after we discuss as a team we can do a broader rename as needed for the other files too.

return switch (taskType) {
case CHAT_COMPLETION -> createCompletionModel(authorizedEndpoint, TaskType.CHAT_COMPLETION, components);
case COMPLETION -> createCompletionModel(authorizedEndpoint, TaskType.COMPLETION, components);
case SPARSE_EMBEDDING -> createSparseEmbeddingsModel(authorizedEndpoint, components);
Contributor

Suggested change
case SPARSE_EMBEDDING -> createSparseEmbeddingsModel(authorizedEndpoint, components);
case SPARSE_EMBEDDING -> createSparseTextEmbeddingsModel(authorizedEndpoint, components);

nit: for consistency. On the other hand, I think sparse always implies a text embedding model, so feel free to ignore.


public record TaskTypeObject(String eisTaskType, String elasticsearchTaskType) implements Writeable, ToXContentObject {

private static final String EIS_TASK_TYPE_FIELD = "eis";
Contributor

Suggested change
private static final String EIS_TASK_TYPE_FIELD = "eis";
private static final String ELASTIC_INFERENCE_SERVICE_TASK_TYPE_FIELD = "eis";

nit: I think at some point in the past we've agreed that we shouldn't use "EIS", but always the written out version "Elastic Inference Service" in our code

Contributor Author

I'm going to hold off on this one, we can rename based on the team's discussion. The field that we're returning from the EIS auth v2 endpoint is the string eis 😄

Contributor

Copilot AI left a comment

Pull request overview

This PR migrates the Elastic Inference Service (EIS) integration from v1 to v2 authorization endpoint, introducing a new response format that provides comprehensive endpoint configuration including task types, model names, and optional settings like chunking and embedding dimensions. The v2 format enables dynamic endpoint creation directly from the authorization response, eliminating the need for hardcoded preconfigured endpoint mappings.

Key Changes:

  • Switched authorization endpoint from /api/v1/authorizations to /api/v2/authorizations with enhanced response parsing
  • Replaced static preconfigured endpoint mappings with dynamic model creation from authorization response
  • Refactored authorization model to store complete endpoint objects instead of just model names and task types

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 4 comments.

Show a summary per file
File | Description
--- | ---
ElasticInferenceServiceAuthorizationResponseEntity.java | Complete restructure to parse v2 response format with endpoint IDs, task types, and configuration objects
ElasticInferenceServiceAuthorizationRequest.java | Updated to use URIBuilder for constructing v2 endpoint URL
ElasticInferenceServiceAuthorizationModel.java | Major refactor to create full model objects from authorization response instead of just tracking IDs
PreconfiguredEndpointModelAdapter.java | Deleted - no longer needed with dynamic model creation
InternalPreconfiguredEndpoints.java | Gutted to only retain minimal constants, removing hardcoded endpoint mappings
AuthorizationPoller.java | Updated to work with new model objects instead of inference IDs
StoreInferenceEndpointsAction.java | Made generic to accept List<? extends Model> instead of concrete List<Model>
ElasticInferenceServiceCompletionModel.java | Removed @Nullable annotations from constructor parameters
Multiple test files | Updated to use new test helpers and response formats from v2
build.gradle | Added test artifact dependency to share test constants

this.authorizedEndpoints = authorizedEndpoints.stream()
.collect(
Collectors.toMap(ElasticInferenceServiceModel::getInferenceEntityId, Function.identity(), (firstModel, secondModel) -> {
logger.warn("Found inference id collision for id [{}], ignoring second model", firstModel.inferenceEntityId());
Copilot AI Nov 25, 2025

[nitpick] The duplicate ID handling on line 257 uses a lambda that logs a warning and returns the first model. However, the warning message references firstModel.inferenceEntityId() which may not be the most informative - consider including information about which model is being kept and which is being discarded (including their task types) to help with debugging.

Suggested change
logger.warn("Found inference id collision for id [{}], ignoring second model", firstModel.inferenceEntityId());
logger.warn(
"Found inference id collision for id [{}]. Keeping model with id [{}] (taskType={}), discarding model with id [{}] (taskType={})",
firstModel.getInferenceEntityId(),
firstModel.getInferenceEntityId(),
firstModel.getTaskType(),
secondModel.getInferenceEntityId(),
secondModel.getTaskType()
);
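
For context on how the merge function behaves, here is a small standalone sketch of `Collectors.toMap` with a keep-first merge. The `Endpoint` record is a hypothetical stand-in, and the message is printed to stderr instead of going through a logger.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class CollisionSketch {
    // Hypothetical stand-in for an authorized endpoint model.
    record Endpoint(String id, String taskType) {}

    static Map<String, Endpoint> byId(List<Endpoint> endpoints) {
        return endpoints.stream().collect(Collectors.toMap(Endpoint::id, Function.identity(), (first, second) -> {
            // The merge function only runs when two elements map to the same key.
            // Returning the first argument keeps the element that was seen earlier.
            System.err.printf(
                "id collision for [%s]: keeping taskType=%s, discarding taskType=%s%n",
                first.id(),
                first.taskType(),
                second.taskType()
            );
            return first;
        }));
    }
}
```

Without a merge function, `toMap` throws an `IllegalStateException` on the first duplicate key, so some explicit policy is required whenever the response might repeat an id.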

Comment on lines +42 to +44
private static URI createUri(String url) throws ElasticsearchStatusException {
try {
// TODO, consider transforming the base URL into a URI for better error handling.
return new URI(url + "/api/v1/authorizations");
return new URIBuilder(url).setPath(AUTHORIZATION_PATH).build();
Copilot AI Nov 25, 2025

Using URIBuilder is better than simple string concatenation, but there's a potential issue: if the url already contains a path, setPath() will replace it entirely instead of appending to it. Consider using appendPath() or combining the existing path with the new one to avoid unexpected behavior.

Contributor Author

url only has the base path so I think we're ok here.
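
The replace-versus-append distinction can be shown with only `java.net.URI`. This is an analogue of the `setPath()` behavior, not the `URIBuilder` code from the PR, and `AuthorizationUriSketch` is a name made up for this example. The path value is the v2 path named in the PR overview.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class AuthorizationUriSketch {
    static final String AUTHORIZATION_PATH = "/api/v2/authorizations";

    // Keeps the scheme and authority of the base URL and replaces whatever path it had.
    static URI createUri(String baseUrl) {
        URI base = URI.create(baseUrl);
        try {
            return new URI(base.getScheme(), base.getAuthority(), AUTHORIZATION_PATH, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("invalid base URL: " + baseUrl, e);
        }
    }
}
```

With a bare base URL such as `https://localhost:8443`, replacing and appending give the same result. They only differ if the configured URL ever carries its own path prefix, in which case this approach drops the prefix.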

}

public Configuration(StreamInput in) throws IOException {
this(in.readOptionalString(), in.readOptionalVInt(), in.readOptionalString(), in.readGenericMap());
Copilot AI Nov 25, 2025

The readGenericMap() and writeGenericMap() methods can handle null values correctly, but in the constructor you're calling in.readGenericMap() which will never return null (it returns an empty map for null). This means chunkingSettings will never actually be null after deserialization, which is inconsistent with the field being marked as @Nullable. Consider updating the logic to use in.readOptionalMap() or document that null becomes an empty map.

Suggested change
this(in.readOptionalString(), in.readOptionalVInt(), in.readOptionalString(), in.readGenericMap());
this(in.readOptionalString(), in.readOptionalVInt(), in.readOptionalString(), in.readOptionalMap());

out.writeOptionalString(similarity);
out.writeOptionalVInt(dimensions);
out.writeOptionalString(elementType);
out.writeGenericMap(chunkingSettings);
Copilot AI Nov 25, 2025

In line 264, writeGenericMap() is called with chunkingSettings, but according to the Javadoc, this method does not support null values and will throw a NullPointerException if the map is null. Since chunkingSettings is marked as @Nullable, this could cause serialization to fail. Use a null check before writing or use a method that handles null values.

Suggested change
out.writeGenericMap(chunkingSettings);
out.writeOptionalGenericMap(chunkingSettings);

Contributor Author

writeGenericMap does support null values:

    public void writeGenericMap(@Nullable Map<String, Object> map) throws IOException {
        writeGenericValue(map);
    }
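
The underlying concern in this thread is that the write and read sides must agree on how a null map is represented. The sketch below illustrates that idea with plain `java.io` streams and an explicit presence flag. It is not how `StreamOutput` and `StreamInput` encode values, and `NullableMapCodec` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

final class NullableMapCodec {
    // Writes a presence flag first so that null and an empty map stay distinguishable.
    static void write(DataOutputStream out, Map<String, String> map) throws IOException {
        out.writeBoolean(map != null);
        if (map == null) {
            return;
        }
        out.writeInt(map.size());
        for (Map.Entry<String, String> entry : map.entrySet()) {
            out.writeUTF(entry.getKey());
            out.writeUTF(entry.getValue());
        }
    }

    // Reads the fields in the same order in which write() emitted them.
    static Map<String, String> read(DataInputStream in) throws IOException {
        if (in.readBoolean() == false) {
            return null;
        }
        int size = in.readInt();
        Map<String, String> map = new HashMap<>();
        for (int i = 0; i < size; i++) {
            String key = in.readUTF();
            map.put(key, in.readUTF());
        }
        return map;
    }

    // Serializes to an in-memory buffer and reads the value back.
    static Map<String, String> roundTrip(Map<String, String> map) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(new DataOutputStream(bytes), map);
            return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A round-trip test of this shape, run once with null and once with a populated map, is a cheap way to confirm that a `@Nullable` field survives serialization unchanged.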


Comment on lines +313 to +318
public static ElasticInferenceServiceAuthorizationResponseEntity.TaskTypeObject createTaskTypeObject(
String eisTaskType,
String elasticsearchTaskType
) {
return new ElasticInferenceServiceAuthorizationResponseEntity.TaskTypeObject(eisTaskType, elasticsearchTaskType);
}
Contributor

Does this method add much value? It might be simpler to just call the constructor directly in places that are currently calling this.

Contributor Author

Yeah I think the reason I added it was because we changed the format of the task settings object a few times from a string to an object. The other minor benefit is that it removes the need for a long line. If you want me to remove it I can though.

Contributor

Nah, no need to change it

threadPool.executor(UTILITY_THREAD_POOL_NAME).execute(() -> getEisAuthorization(authModelListener, eisSender));
}).<List<InferenceServiceConfiguration>>andThen((configurationListener, authorizationModel) -> {
var serviceConfigs = getServiceConfigurationsForServices(availableServices);
serviceConfigs.sort(Comparator.comparing(InferenceServiceConfiguration::getService));
Contributor

Minor performance consideration; if we have both non-EIS and EIS services, we now sort the list twice. Maybe it would be better to combine the two if statements and sort within them before returning? Something like:

// If there was a requested task type and the authorization response from EIS doesn't support it, we'll exclude EIS as a valid
// service
if (authorizationModel.isAuthorized() == false
    || requestedTaskType != null && authorizationModel.getTaskTypes().contains(requestedTaskType) == false) {
    serviceConfigs.sort(Comparator.comparing(InferenceServiceConfiguration::getService));
    configurationListener.onResponse(serviceConfigs);
    return;
}
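
A standalone sketch of the combined condition follows, with one possible way to sort exactly once on every path. The `Authorization` record and the service names other than `elastic` are made up for this example, and plain strings stand in for `InferenceServiceConfiguration`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

final class ServiceConfigSketch {
    // Hypothetical stand-in for the EIS authorization model.
    record Authorization(boolean isAuthorized, Set<String> taskTypes) {}

    static boolean excludeEis(Authorization auth, String requestedTaskType) {
        // && binds tighter than ||, so this reads as:
        // not authorized, OR (a task type was requested AND EIS is not authorized for it).
        return auth.isAuthorized() == false
            || requestedTaskType != null && auth.taskTypes().contains(requestedTaskType) == false;
    }

    static List<String> services(List<String> otherServices, Authorization auth, String requestedTaskType) {
        var result = new ArrayList<>(otherServices);
        if (excludeEis(auth, requestedTaskType) == false) {
            result.add("elastic");
        }
        // Single sort shared by both the "EIS excluded" and "EIS included" paths.
        result.sort(Comparator.naturalOrder());
        return result;
    }
}
```

Deferring the sort until the final list is assembled avoids the double sort without duplicating the sort call in each branch, at the cost of restructuring the early return.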

@jonathan-buttner jonathan-buttner added the cloud-deploy Publish cloud docker image for Cloud-First-Testing label Dec 3, 2025
@jonathan-buttner jonathan-buttner enabled auto-merge (squash) December 4, 2025 21:23
@jonathan-buttner jonathan-buttner merged commit a51f7b7 into elastic:main Dec 4, 2025
35 of 36 checks passed
@jonathan-buttner jonathan-buttner deleted the ml-eis-auth-v2 branch December 4, 2025 22:22

Labels

cloud-deploy Publish cloud docker image for Cloud-First-Testing :ml Machine learning >non-issue Team:ML Meta label for the ML team v9.3.0


5 participants