# [Inference API] Default eis endpoint #119694
Changes from all commits
```diff
@@ -16,6 +16,8 @@
 import org.elasticsearch.core.Nullable;
 import org.elasticsearch.core.TimeValue;
 import org.elasticsearch.inference.ChunkedInference;
+import org.elasticsearch.inference.EmptySecretSettings;
+import org.elasticsearch.inference.EmptyTaskSettings;
 import org.elasticsearch.inference.InferenceServiceConfiguration;
 import org.elasticsearch.inference.InferenceServiceResults;
 import org.elasticsearch.inference.InputType;
```
```diff
@@ -42,6 +44,7 @@
 import org.elasticsearch.xpack.inference.services.SenderService;
 import org.elasticsearch.xpack.inference.services.ServiceComponents;
 import org.elasticsearch.xpack.inference.services.elastic.completion.ElasticInferenceServiceCompletionModel;
+import org.elasticsearch.xpack.inference.services.elastic.completion.ElasticInferenceServiceCompletionServiceSettings;
 import org.elasticsearch.xpack.inference.services.settings.RateLimitSettings;
 import org.elasticsearch.xpack.inference.telemetry.TraceContext;
```
```diff
@@ -67,10 +70,14 @@ public class ElasticInferenceService extends SenderService {
     public static final String NAME = "elastic";
     public static final String ELASTIC_INFERENCE_SERVICE_IDENTIFIER = "Elastic Inference Service";

-    private final ElasticInferenceServiceComponents elasticInferenceServiceComponents;
-
     private static final EnumSet<TaskType> supportedTaskTypes = EnumSet.of(TaskType.SPARSE_EMBEDDING, TaskType.COMPLETION);
     private static final String SERVICE_NAME = "Elastic";
+    private static final String DEFAULT_EIS_CHAT_COMPLETION_MODEL_ID_V1 = "rainbow-sprinkles";
+    private static final String DEFAULT_EIS_CHAT_COMPLETION_ENDPOINT_ID_V1 = ".eis-alpha-1";
+    private static final Set<String> DEFAULT_EIS_ENDPOINT_IDS = Set.of(DEFAULT_EIS_CHAT_COMPLETION_ENDPOINT_ID_V1);
+
+    private final ElasticInferenceServiceComponents elasticInferenceServiceComponents;
+    private final List<Model> defaultEndpoints;

     public ElasticInferenceService(
         HttpRequestSender.Factory factory,
```
```diff
@@ -79,6 +86,23 @@ public ElasticInferenceService(
     ) {
         super(factory, serviceComponents);
         this.elasticInferenceServiceComponents = elasticInferenceServiceComponents;
+        this.defaultEndpoints = initDefaultEndpoints();
     }

+    private List<Model> initDefaultEndpoints() {
+        return List.of(v1DefaultCompletionModel());
+    }
+
+    private ElasticInferenceServiceCompletionModel v1DefaultCompletionModel() {
```
**Member:** Is this initialization of the default endpoint conditional on the health check? We want to add an ACL in the gateway which will dynamically control whether EIS is available or not. In the case that a customer does not have access to EIS, will it skip creating the default endpoint?

**Contributor:** That's a great point; the current implementation does not consider that. I believe I also read that individual models might be available separately for different customers, which means we'd need EIS enabled for the customer and this particular model. Since it sounds like we're going to keep the information in memory to determine what is available to this cluster, I wonder if we could pass that information along. I think we should postpone merging this PR until the logic that determines whether EIS is available, and which models are available, is merged.
```diff
+        return new ElasticInferenceServiceCompletionModel(
+            DEFAULT_EIS_CHAT_COMPLETION_ENDPOINT_ID_V1,
+            TaskType.COMPLETION,
+            NAME,
+            new ElasticInferenceServiceCompletionServiceSettings(DEFAULT_EIS_CHAT_COMPLETION_MODEL_ID_V1, null),
+            EmptyTaskSettings.INSTANCE,
+            EmptySecretSettings.INSTANCE,
+            elasticInferenceServiceComponents
+        );
+    }
+
     @Override
```
```diff
@@ -175,6 +199,17 @@ public void parseRequestConfig(
         Map<String, Object> config,
         ActionListener<Model> parsedModelListener
     ) {
+        if (DEFAULT_EIS_ENDPOINT_IDS.contains(inferenceEntityId)) {
+            parsedModelListener.onFailure(
+                new ElasticsearchStatusException(
+                    "[{}] is a reserved inference Id. Cannot create a new inference endpoint with a reserved Id",
+                    RestStatus.BAD_REQUEST,
+                    inferenceEntityId
+                )
+            );
+            return;
+        }
```

Comment on lines +202 to +212:
**Member:** Nothing wrong here, but I don't quite understand what's going on. Can you explain?

**Contributor:** Yeah, sure: during a PUT request to persist a new inference endpoint, this code checks whether the requested inference endpoint id is one of the reserved default ids. This can happen when a user creates a new inference endpoint with a reserved value for its id; the check protects us from that. The user's PUT request will result in a 400 error. I don't believe this code path would be executed during boot up.

**Member:** Thanks!

**Member:** Nice! Let's move this check out of the service: `ModelRegistry` knows about the default ids via the `addDefaultIds(defaultIds)` function, which is called by the `InferencePlugin` at node start-up. In `TransportPutInferenceModelAction#masterOperation` the same check can be made by querying `ModelRegistry`.
```diff

         try {
             Map<String, Object> serviceSettingsMap = removeFromMapOrThrowIfNull(config, ModelConfigurations.SERVICE_SETTINGS);
             Map<String, Object> taskSettingsMap = removeFromMapOrDefaultEmpty(config, ModelConfigurations.TASK_SETTINGS);
```
```diff
@@ -210,6 +245,16 @@ public EnumSet<TaskType> supportedTaskTypes() {
         return supportedTaskTypes;
     }

+    @Override
+    public List<DefaultConfigId> defaultConfigIds() {
+        return List.of(new DefaultConfigId(DEFAULT_EIS_CHAT_COMPLETION_ENDPOINT_ID_V1, TaskType.COMPLETION, this));
+    }
+
+    @Override
+    public void defaultConfigs(ActionListener<List<Model>> defaultsListener) {
+        defaultsListener.onResponse(defaultEndpoints);
+    }
+
     private static ElasticInferenceServiceModel createModel(
         String inferenceEntityId,
         TaskType taskType,
```
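The two overrides split the default-endpoint responsibility in two: a cheap, id-only view for registration, and an async delivery of the full models. A plain-Java sketch of that shape, using stand-in types (`Model` and `DefaultConfigId` are simplified records here, and `Consumer` stands in for `ActionListener`):

```java
import java.util.List;
import java.util.function.Consumer;

// Simplified sketch of the default-endpoint pattern added in this PR.
// Model, DefaultConfigId, and Consumer are stand-ins for the real types.
public class DefaultEndpointSketch {

    public record Model(String endpointId, String modelId) {}

    public record DefaultConfigId(String endpointId) {}

    // Built once at construction time, like initDefaultEndpoints() above.
    private final List<Model> defaultEndpoints =
        List.of(new Model(".eis-alpha-1", "rainbow-sprinkles"));

    // Analogous to defaultConfigIds(): an id-only view that a registry can
    // consult without materializing full model objects.
    public List<DefaultConfigId> defaultConfigIds() {
        return List.of(new DefaultConfigId(".eis-alpha-1"));
    }

    // Analogous to defaultConfigs(ActionListener): hands the prebuilt
    // models to the caller's callback.
    public void defaultConfigs(Consumer<List<Model>> listener) {
        listener.accept(defaultEndpoints);
    }
}
```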
**Reviewer:** Can we name this the same way we name other default endpoints? I think we settled on `modelId-providerName`, which would mean `.rainbow-sprinkles-elastic`?