-
Notifications
You must be signed in to change notification settings - Fork 25.7k
[ML] Use EIS v2 authorization endpoint for inference API #138249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Use EIS v2 authorization endpoint for inference API #138249
Conversation
…earch into ml-eis-auth-v2
…earch into ml-eis-auth-v2
|
|
||
| public static class Request extends AcknowledgedRequest<Request> { | ||
| private final List<Model> models; | ||
| private final List<? extends Model> models; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes are so the authorization logic can return a list of a child class of Model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can avoid needing to use <? extends Model> in this PR by making a small change to ElasticInferenceServiceAuthorizationModel.getEndpoints():
public List<Model> getEndpoints(Set<String> endpointIds) {
return endpointIds.stream().<Model>map(authorizedEndpoints::get).filter(Objects::nonNull).toList();
}
By letting the stream know that is should be a Stream<Model> after the map() call instead of it inferring the Stream<ElasticInferenceServiceModel> type, the return type can use List<Model>
| """; | ||
|
|
||
| webServer.enqueue(new MockResponse().setResponseCode(200).setBody(responseJson)); | ||
| var authResponse = getEisAuthorizationResponseWithMultipleEndpoints("ignored"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The URL is typically passed in to this method. We don't have access to it yet because the webserver may not have been started yet. These tests doesn't actually need the parts of the getEisAuthorizationResponseWithMultipleEndpoints response that leverage the passed in url here anyway.
| clusterPlugins project(':x-pack:plugin:inference:qa:test-service-plugin') | ||
|
|
||
| // Allow javaRestTest to see unit-test classes from x-pack:plugin:inference so we can import some variables | ||
| javaRestTestImplementation(testArtifact(project(xpackModule('inference')))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so the qa tests can imported from the unit tests in the inference plugin.
| } | ||
|
|
||
| private static Map<String, Object> getChunkingSettingsMap(AuthorizationResponseEntity.Configuration configuration) { | ||
| return Objects.requireNonNullElse(configuration.chunkingSettings(), new HashMap<>()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default to an empty map so the chunking settings use the "newer" default logic (default to the sentence strategy rather than word).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a comment here that this happens? Or would it be possible to return a chunking strategy object instead of a generic map and fallback to the actual default chunking strategy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'll add a comment. I think it's probably best to keep things consistent and have ChunkingSettingsBuilder.fromMap() handle what to do if the settings aren't provided.
| new ElasticInferenceServiceDenseTextEmbeddingsServiceSettings( | ||
| authorizedEndpoint.modelName(), | ||
| getSimilarityMeasure(config), | ||
| config.dimensions(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now the element type isn't used because we hard code to float. I have an issue to fix that after this PR is merged though.
| @@ -1,194 +0,0 @@ | |||
| /* | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This represented the v1 authorization endpoint response. We no longer interact with that so we don't need this.
|
|
||
| // elser-2 | ||
| public static final String DEFAULT_ELSER_2_MODEL_ID = "elser_model_2"; | ||
| public static final String DEFAULT_ELSER_ENDPOINT_ID_V2 = ".elser-2-elastic"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left this because semantic text references it. Once we implement the heuristics logic we can remove this as well.
| public void testHideFromConfigurationApi_ThrowsUnsupported_WithAvailableModels() throws Exception { | ||
| try ( | ||
| var service = createServiceWithMockSender( | ||
| ElasticInferenceServiceAuthorizationModel.of( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was unused by createServiceWithMockSender so removing.
| { | ||
| "inference_endpoints": [ | ||
| { | ||
| "id": ".rainbow-sprinkles-elastic", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured it'd be easier to read if we didn't do a Strings.format() here to specify the id, name, and other field. I'm open to changing it though. I'm also open to other ideas to avoid the duplication.
| public static final String RERANK_V1_MODEL_NAME = "elastic-rerank-v1"; | ||
| public static final String EIS_RERANK_PATH = "rerank/text/text-similarity"; | ||
|
|
||
| public record EisAuthorizationResponse( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to encapsulate a testing using a specific eis response and what the expected entities should be created from that json response.
|
Pinging @elastic/ml-core (Team:ML) |
Safety measure to make sure we've the correct default rerank endpoint in place in case #138249 doesn't make it. Endpoint name: `.elastic-rerank-v1` -> `.jina-reranker-v2` Model name: `elastic-rerank-v1` -> `jina-reranker-v2`
timgrein
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! 👏 First round of reviews, will give it another go as it's pretty large
|
|
||
| var config = ElasticInferenceService.createConfiguration(authorizationModel.getAuthorizedTaskTypes()); | ||
| if (requestedTaskType != null && authorizationModel.getAuthorizedTaskTypes().contains(requestedTaskType) == false) { | ||
| var config = ElasticInferenceService.createConfiguration(authorizationModel.getTaskTypes()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a comment why we do an early return here? Didn't understand it a first glance. I assume we return here, because the auth model of EIS doesn't support the requested task type and therefore we simply return the ones we already have?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep exactly, if the user is looking for text_embedding, but they aren't authorized by EIS for any inference endpoints for text embedding, then we don't include EIS as a provider in that situation.
I'll add a comment.
| e | ||
| ); | ||
| delegate.onResponse(ElasticInferenceServiceAuthorizationModel.newDisabledService()); | ||
| delegate.onResponse(AuthorizationModel.empty()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does empty() imply "forbid all"? Maybe we could rename the method then to reflect the result of an empty auth model?
| import java.util.stream.Collectors; | ||
|
|
||
| /** | ||
| * Transforms the response from {@link ElasticInferenceServiceAuthorizationRequestHandler} into a format for consumption by the service. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * Transforms the response from {@link ElasticInferenceServiceAuthorizationRequestHandler} into a format for consumption by the service. | |
| * Transforms the response from {@link ElasticInferenceServiceAuthorizationRequestHandler} into a format for consumption by the {@link ElasticInferenceService}. |
The service is in this case the ElasticInferenceService, right?
| /** | ||
| * Transforms the response from {@link ElasticInferenceServiceAuthorizationRequestHandler} into a format for consumption by the service. | ||
| */ | ||
| public class AuthorizationModel { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| public class AuthorizationModel { | |
| public class ElasticInferenceServiceAuthorizationModel { |
nit: Wondering why we're using the name AuthorizationModel instead of ElasticInferenceServiceAuthorizationModel here as we usually prefix every EIS-related class with ElasticInferenceService. It's anyway 100% specific to EIS, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll rename this 👍 after we discuss as a team we can do a broader rename as needed for the other files too.
| return switch (taskType) { | ||
| case CHAT_COMPLETION -> createCompletionModel(authorizedEndpoint, TaskType.CHAT_COMPLETION, components); | ||
| case COMPLETION -> createCompletionModel(authorizedEndpoint, TaskType.COMPLETION, components); | ||
| case SPARSE_EMBEDDING -> createSparseEmbeddingsModel(authorizedEndpoint, components); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| case SPARSE_EMBEDDING -> createSparseEmbeddingsModel(authorizedEndpoint, components); | |
| case SPARSE_EMBEDDING -> createSparseTextEmbeddingsModel(authorizedEndpoint, components); |
nit: for consistency, I think sparse always implies a text embedding model on the other hand - feel free to ignore
| } | ||
|
|
||
| private static Map<String, Object> getChunkingSettingsMap(AuthorizationResponseEntity.Configuration configuration) { | ||
| return Objects.requireNonNullElse(configuration.chunkingSettings(), new HashMap<>()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a comment here that this happens? Or would it be possible to return a chunking strategy object instead of a generic map and fallback to the actual default chunking strategy?
|
|
||
| public record TaskTypeObject(String eisTaskType, String elasticsearchTaskType) implements Writeable, ToXContentObject { | ||
|
|
||
| private static final String EIS_TASK_TYPE_FIELD = "eis"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| private static final String EIS_TASK_TYPE_FIELD = "eis"; | |
| private static final String ELASTIC_INFERENCE_SERVICE_TASK_TYPE_FIELD = "eis"; |
nit: I think at some point in the past we've agreed that we shouldn't use "EIS", but always the written out version "Elastic Inference Service" in our code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to hold off on this one, we can rename based on the team's discussion. The field that we're returning from the EIS auth v2 endpoint is the string eis 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR migrates the Elastic Inference Service (EIS) integration from v1 to v2 authorization endpoint, introducing a new response format that provides comprehensive endpoint configuration including task types, model names, and optional settings like chunking and embedding dimensions. The v2 format enables dynamic endpoint creation directly from the authorization response, eliminating the need for hardcoded preconfigured endpoint mappings.
Key Changes:
- Switched authorization endpoint from
/api/v1/authorizationsto/api/v2/authorizationswith enhanced response parsing - Replaced static preconfigured endpoint mappings with dynamic model creation from authorization response
- Refactored authorization model to store complete endpoint objects instead of just model names and task types
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| ElasticInferenceServiceAuthorizationResponseEntity.java | Complete restructure to parse v2 response format with endpoint IDs, task types, and configuration objects |
| ElasticInferenceServiceAuthorizationRequest.java | Updated to use URIBuilder for constructing v2 endpoint URL |
| ElasticInferenceServiceAuthorizationModel.java | Major refactor to create full model objects from authorization response instead of just tracking IDs |
| PreconfiguredEndpointModelAdapter.java | Deleted - no longer needed with dynamic model creation |
| InternalPreconfiguredEndpoints.java | Gutted to only retain minimal constants, removing hardcoded endpoint mappings |
| AuthorizationPoller.java | Updated to work with new model objects instead of inference IDs |
| StoreInferenceEndpointsAction.java | Made generic to accept List<? extends Model> instead of concrete List<Model> |
| ElasticInferenceServiceCompletionModel.java | Removed @nullable annotations from constructor parameters |
| Multiple test files | Updated to use new test helpers and response formats from v2 |
| build.gradle | Added test artifact dependency to share test constants |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.
| this.authorizedEndpoints = authorizedEndpoints.stream() | ||
| .collect( | ||
| Collectors.toMap(ElasticInferenceServiceModel::getInferenceEntityId, Function.identity(), (firstModel, secondModel) -> { | ||
| logger.warn("Found inference id collision for id [{}], ignoring second model", firstModel.inferenceEntityId()); |
Copilot
AI
Nov 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The duplicate ID handling on line 257 uses a lambda that logs a warning and returns the first model. However, the warning message references firstModel.inferenceEntityId() which may not be the most informative - consider including information about which model is being kept and which is being discarded (including their task types) to help with debugging.
| logger.warn("Found inference id collision for id [{}], ignoring second model", firstModel.inferenceEntityId()); | |
| logger.warn( | |
| "Found inference id collision for id [{}]. Keeping model with id [{}] (taskType={}), discarding model with id [{}] (taskType={})", | |
| firstModel.getInferenceEntityId(), | |
| firstModel.getInferenceEntityId(), | |
| firstModel.getTaskType(), | |
| secondModel.getInferenceEntityId(), | |
| secondModel.getTaskType() | |
| ); |
| private static URI createUri(String url) throws ElasticsearchStatusException { | ||
| try { | ||
| // TODO, consider transforming the base URL into a URI for better error handling. | ||
| return new URI(url + "/api/v1/authorizations"); | ||
| return new URIBuilder(url).setPath(AUTHORIZATION_PATH).build(); |
Copilot
AI
Nov 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using URIBuilder is better than simple string concatenation, but there's a potential issue: if the url already contains a path, setPath() will replace it entirely instead of appending to it. Consider using appendPath() or combining the existing path with the new one to avoid unexpected behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
url only has the base path so I think we're ok here.
| } | ||
|
|
||
| public Configuration(StreamInput in) throws IOException { | ||
| this(in.readOptionalString(), in.readOptionalVInt(), in.readOptionalString(), in.readGenericMap()); |
Copilot
AI
Nov 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The readGenericMap() and writeGenericMap() methods can handle null values correctly, but in the constructor you're calling in.readGenericMap() which will never return null (it returns an empty map for null). This means chunkingSettings will never actually be null after deserialization, which is inconsistent with the field being marked as @Nullable. Consider updating the logic to use in.readOptionalMap() or document that null becomes an empty map.
| this(in.readOptionalString(), in.readOptionalVInt(), in.readOptionalString(), in.readGenericMap()); | |
| this(in.readOptionalString(), in.readOptionalVInt(), in.readOptionalString(), in.readOptionalMap()); |
| out.writeOptionalString(similarity); | ||
| out.writeOptionalVInt(dimensions); | ||
| out.writeOptionalString(elementType); | ||
| out.writeGenericMap(chunkingSettings); |
Copilot
AI
Nov 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In line 264, writeGenericMap() is called with chunkingSettings, but according to the Javadoc, this method does not support null values and will throw a NullPointerException if the map is null. Since chunkingSettings is marked as @Nullable, this could cause serialization to fail. Use a null check before writing or use a method that handles null values.
| out.writeGenericMap(chunkingSettings); | |
| out.writeOptionalGenericMap(chunkingSettings); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
writeGenericMap does support null values:
public void writeGenericMap(@Nullable Map<String, Object> map) throws IOException {
writeGenericValue(map);
}
Safety measure to make sure we've the correct default rerank endpoint in place in case elastic#138249 doesn't make it. Endpoint name: `.elastic-rerank-v1` -> `.jina-reranker-v2` Model name: `elastic-rerank-v1` -> `jina-reranker-v2`
Safety measure to make sure we've the correct default rerank endpoint in place in case elastic#138249 doesn't make it. Endpoint name: `.elastic-rerank-v1` -> `.jina-reranker-v2` Model name: `elastic-rerank-v1` -> `jina-reranker-v2`
...sterTest/java/org/elasticsearch/xpack/inference/integration/AuthorizationTaskExecutorIT.java
Outdated
Show resolved
Hide resolved
.../main/java/org/elasticsearch/xpack/inference/action/TransportGetInferenceServicesAction.java
Outdated
Show resolved
Hide resolved
|
|
||
| public static class Request extends AcknowledgedRequest<Request> { | ||
| private final List<Model> models; | ||
| private final List<? extends Model> models; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can avoid needing to use <? extends Model> in this PR by making a small change to ElasticInferenceServiceAuthorizationModel.getEndpoints():
public List<Model> getEndpoints(Set<String> endpointIds) {
return endpointIds.stream().<Model>map(authorizedEndpoints::get).filter(Objects::nonNull).toList();
}
By letting the stream know that is should be a Stream<Model> after the map() call instead of it inferring the Stream<ElasticInferenceServiceModel> type, the return type can use List<Model>
...pack/inference/services/elastic/authorization/ElasticInferenceServiceAuthorizationModel.java
Show resolved
Hide resolved
...pack/inference/services/elastic/authorization/ElasticInferenceServiceAuthorizationModel.java
Outdated
Show resolved
Hide resolved
...g/elasticsearch/xpack/inference/services/elastic/authorization/AuthorizationPollerTests.java
Outdated
Show resolved
Hide resolved
...inference/services/elastic/authorization/ElasticInferenceServiceAuthorizationModelTests.java
Outdated
Show resolved
Hide resolved
| public static ElasticInferenceServiceAuthorizationResponseEntity.TaskTypeObject createTaskTypeObject( | ||
| String eisTaskType, | ||
| String elasticsearchTaskType | ||
| ) { | ||
| return new ElasticInferenceServiceAuthorizationResponseEntity.TaskTypeObject(eisTaskType, elasticsearchTaskType); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this method add much value? It might be simpler to just call the constructor directly in places that are currently calling this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think the reason I added it was because we changed the format of the task settings object a few times from a string to an object. The other minor benefit is that it removes the need for a long line. If you want me to remove it I can though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, no need to change it
...rence/services/elastic/response/ElasticInferenceServiceAuthorizationResponseEntityTests.java
Outdated
Show resolved
Hide resolved
...rence/services/elastic/response/ElasticInferenceServiceAuthorizationResponseEntityTests.java
Outdated
Show resolved
Hide resolved
| threadPool.executor(UTILITY_THREAD_POOL_NAME).execute(() -> getEisAuthorization(authModelListener, eisSender)); | ||
| }).<List<InferenceServiceConfiguration>>andThen((configurationListener, authorizationModel) -> { | ||
| var serviceConfigs = getServiceConfigurationsForServices(availableServices); | ||
| serviceConfigs.sort(Comparator.comparing(InferenceServiceConfiguration::getService)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor performance consideration; if we have both non-EIS and EIS services, we now sort the list twice. Maybe it would be better to combine the two if statements and sort within them before returning? Something like:
// If there was a requested task type and the authorization response from EIS doesn't support it, we'll exclude EIS as a valid
// service
if (authorizationModel.isAuthorized() == false
|| requestedTaskType != null && authorizationModel.getTaskTypes().contains(requestedTaskType) == false) {
serviceConfigs.sort(Comparator.comparing(InferenceServiceConfiguration::getService));
configurationListener.onResponse(serviceConfigs);
return;
}
This PR switches the inference API to point to the EIS v2 authorization endpoint and handles the new response format.
The EIS v2 authorization endpoint provides all the necessary fields for the inference API to create an inference endpoint. The inference API will only create endpoints that have a task type that is supported (the ES integration defines a set of supported task types).
Note: The
AuthorizationResponseEntityis not registered even though it is a named writeable. It is a bit of a hack that it needs to be a named writeable just to get it to work through the Sender framework.EIS v2 auth endpoint response format
Testing
eis-gateway
Start elasticsearch
You should see preconfigured EIS endpoints:
Response