
Conversation

@jonathan-buttner (Contributor) commented Jan 24, 2025

This PR adds the first-iteration model ID for the Elastic Inference Service (EIS).

Model ID: rainbow-sprinkles

Default endpoint ID: .rainbow-sprinkles-elastic

Testing

Without EIS

GET _inference/_all

elastic should not be listed in the response

GET _inference/_services

.rainbow-sprinkles-elastic should not be listed in the response

With EIS

Get the right certs directory.

Run the gateway:

make TLS_VERIFY_CLIENT_CERTS=false run

Run ES:

./gradlew :run -Drun.license_type=trial -Dtests.es.xpack.inference.elastic.url=https://localhost:8443 -Dtests.es.xpack.inference.elastic.http.ssl.verification_mode=none
Retrieve all the default inference endpoints
GET _inference/_all
{
    "endpoints": [
        ...
        {
            "inference_id": ".rainbow-sprinkles-elastic",
            "task_type": "chat_completion",
            "service": "elastic",
            "service_settings": {
                "model_id": "rainbow-sprinkles",
                "rate_limit": {
                    "requests_per_minute": 240
                }
            }
        },
        ...
    ]
}

Retrieve all the available services for sparse embedding
GET _inference/_services/sparse_embedding
[
    ...
    {
        "service": "elastic",
        "name": "Elastic",
        "task_types": [
            "sparse_embedding",
            "chat_completion"
        ],
        "configurations": {
            "rate_limit.requests_per_minute": {
                "description": "Minimize the number of rate limit errors.",
                "label": "Rate Limit",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "int",
                "supported_task_types": [
                    "sparse_embedding",
                    "chat_completion"
                ]
            },
            "model_id": {
                "description": "The name of the model to use for the inference task.",
                "label": "Model ID",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "sparse_embedding",
                    "chat_completion"
                ]
            },
            "max_input_tokens": {
                "description": "Allows you to specify the maximum number of tokens per input.",
                "label": "Maximum Input Tokens",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "int",
                "supported_task_types": [
                    "sparse_embedding"
                ]
            }
        }
    },
    ...
]

@jonathan-buttner added the >refactoring, :ml (Machine learning), Team:ML (Meta label for the ML team), auto-backport (Automatically create backport pull requests when merged), v9.0.0, and v8.18.0 labels Jan 24, 2025
import static org.elasticsearch.xpack.inference.InferenceBaseRestTest.assertStatusOkOrCreated;
import static org.hamcrest.Matchers.equalTo;

public class InferenceGetServicesIT extends ESRestTestCase {
jonathan-buttner (author):
Moved this to BaseMockEISAuthServerTest

private static final Logger logger = LogManager.getLogger(ElasticInferenceService.class);
private static final EnumSet<TaskType> IMPLEMENTED_TASK_TYPES = EnumSet.of(TaskType.SPARSE_EMBEDDING, TaskType.CHAT_COMPLETION);
private static final String SERVICE_NAME = "Elastic";
static final String DEFAULT_CHAT_COMPLETION_MODEL_ID_V1 = "rainbow-sprinkles";
jonathan-buttner (author):

Model name

private static final EnumSet<TaskType> IMPLEMENTED_TASK_TYPES = EnumSet.of(TaskType.SPARSE_EMBEDDING, TaskType.CHAT_COMPLETION);
private static final String SERVICE_NAME = "Elastic";
static final String DEFAULT_CHAT_COMPLETION_MODEL_ID_V1 = "rainbow-sprinkles";
static final String DEFAULT_CHAT_COMPLETION_ENDPOINT_ID_V1 = Strings.format(".%s-elastic", DEFAULT_CHAT_COMPLETION_MODEL_ID_V1);
jonathan-buttner (author):

Inference endpoint ID
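As a quick sketch of how the default endpoint ID is derived from the model ID, using plain `String.format` as a stand-in for Elasticsearch's `Strings.format` (the class name here is hypothetical):

```java
public class EndpointId {
    static final String MODEL_ID = "rainbow-sprinkles";

    public static void main(String[] args) {
        // Mirrors Strings.format(".%s-elastic", DEFAULT_CHAT_COMPLETION_MODEL_ID_V1)
        // from the snippet above.
        String endpointId = String.format(".%s-elastic", MODEL_ID);
        System.out.println(endpointId); // .rainbow-sprinkles-elastic
    }
}
```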

);
}

private record AuthorizedContent(
jonathan-buttner (author):

Just an aggregation of all the different pieces we need to expose (enabled task types, DefaultConfigId objects, and a list of models).

if (auth.getEnabledTaskTypes().contains(model.getTaskType()) == false) {
logger.warn(
Strings.format(
"The authorization response included the default model: %s, "
jonathan-buttner (author):

In the unlikely event that the gateway and our definition of the default model have differing task types, we'll enable the model anyway, because the gateway is authoritative.

This would only happen if the authorization response returned something different from how we've set the task type for the model here.
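A minimal sketch of the warn-but-enable decision described here (the enum and class names are illustrative stand-ins, not the real Elasticsearch types):

```java
import java.util.EnumSet;

public class AuthTaskTypeCheck {
    // Hypothetical stand-in for org.elasticsearch.inference.TaskType.
    public enum TaskType { SPARSE_EMBEDDING, CHAT_COMPLETION }

    // The gateway is treated as authoritative: on a mismatch the caller only
    // logs a warning, it does not drop the model.
    public static boolean shouldWarn(EnumSet<TaskType> authorizedTaskTypes, TaskType modelTaskType) {
        return authorizedTaskTypes.contains(modelTaskType) == false;
    }

    public static void main(String[] args) {
        var authorized = EnumSet.of(TaskType.SPARSE_EMBEDDING);
        System.out.println(shouldWarn(authorized, TaskType.CHAT_COMPLETION));  // mismatch: warn, still enable
        System.out.println(shouldWarn(authorized, TaskType.SPARSE_EMBEDDING)); // match: no warning
    }
}
```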

private Set<String> getEnabledDefaultModelIds(ElasticInferenceServiceAuthorization auth) {
var enabledModels = auth.getEnabledModels();
var enabledDefaultModelIds = new HashSet<>(defaultModels.keySet());
enabledDefaultModelIds.retainAll(enabledModels);
jonathan-buttner (author):

Return the model IDs that appear in both the set the gateway authorized and the defaults we've defined (a set intersection).
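The intersection logic can be sketched like this (illustrative names; the real code lives in getEnabledDefaultModelIds above):

```java
import java.util.HashSet;
import java.util.Set;

public class EnabledDefaultModelIds {
    // Keep only the default model IDs that the gateway also authorized.
    public static Set<String> enabledDefaults(Set<String> defaultModelIds, Set<String> authorizedModelIds) {
        var enabled = new HashSet<>(defaultModelIds); // copy, so the defaults map is untouched
        enabled.retainAll(authorizedModelIds);        // in-place set intersection
        return enabled;
    }

    public static void main(String[] args) {
        var defaults = Set.of("rainbow-sprinkles", "elser-v2");
        var authorized = Set.of("rainbow-sprinkles", "some-other-model");
        System.out.println(enabledDefaults(defaults, authorized)); // [rainbow-sprinkles]
    }
}
```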

* This is a helper class for managing the response from {@link ElasticInferenceServiceAuthorizationHandler}.
*/
public record ElasticInferenceServiceAuthorization(Map<String, EnumSet<TaskType>> enabledModels) {
public class ElasticInferenceServiceAuthorization {
jonathan-buttner (author):

I refactored this because we need both the authorized task types and the authorized models.

public record ElasticInferenceServiceAuthorization(Map<String, EnumSet<TaskType>> enabledModels) {
public class ElasticInferenceServiceAuthorization {

private final Map<TaskType, Set<String>> taskTypeToModels;
jonathan-buttner (author):

This mapping helps when we need to create a new object limited to what the service actually supports, so we can easily grab the models that were authorized for a particular task type.
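A sketch of building that inverse index from the enabledModels map (hypothetical names and simplified types, not the real Elasticsearch classes):

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TaskTypeIndex {
    // Hypothetical stand-in for org.elasticsearch.inference.TaskType.
    public enum TaskType { SPARSE_EMBEDDING, CHAT_COMPLETION }

    // Invert model -> task types into task type -> models, so callers can
    // cheaply ask "which models were authorized for this task type?".
    public static Map<TaskType, Set<String>> index(Map<String, EnumSet<TaskType>> enabledModels) {
        var byTaskType = new HashMap<TaskType, Set<String>>();
        for (var entry : enabledModels.entrySet()) {
            for (var taskType : entry.getValue()) {
                byTaskType.computeIfAbsent(taskType, t -> new HashSet<>()).add(entry.getKey());
            }
        }
        return byTaskType;
    }

    public static void main(String[] args) {
        var enabled = Map.of(
            "rainbow-sprinkles", EnumSet.of(TaskType.CHAT_COMPLETION),
            "elser-v2", EnumSet.of(TaskType.SPARSE_EMBEDDING)
        );
        System.out.println(index(enabled).get(TaskType.CHAT_COMPLETION)); // [rainbow-sprinkles]
    }
}
```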

@jonathan-buttner jonathan-buttner marked this pull request as ready for review January 27, 2025 20:47
@elasticsearchmachine (Collaborator):

Pinging @elastic/ml-core (Team:ML)

davidkyle (Member) left a comment:

LGTM

joshdevins (Member) left a comment:

Had a quick scan only. Minor comments.

.setting("xpack.security.enabled", "true")
// Adding both settings unless one feature flag is disabled in a particular environment
.setting("xpack.inference.elastic.url", mockEISServer::getUrl)
// TODO remove this once we've removed DEPRECATED_ELASTIC_INFERENCE_SERVICE_FEATURE_FLAG and EIS_GATEWAY_URL
Member:

I think this is gone now. @vidok?

jonathan-buttner (author):

Looks like it's being removed in this PR: #120842

"task_types": ["chat"]
},
{
"model_name": ".elser_model_2",
Member:

EIS will expose elser-v2. Not sure it matters for this test though.
See: #120981

jonathan-buttner (author):

I'll update it 👍 The model ID here isn't actually used, but we might as well align it now for the future when we do use it.

@jonathan-buttner jonathan-buttner merged commit 1fa1ba7 into elastic:main Jan 28, 2025
16 checks passed
@jonathan-buttner jonathan-buttner deleted the ml-eis-default-endpoint branch January 28, 2025 14:57
@elasticsearchmachine (Collaborator):

💔 Backport failed

Branch 8.x: commit could not be cherry-picked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 120847

jonathan-buttner (author):

💚 All backports created successfully

Branch 8.x: success

Questions?

Please refer to the Backport tool documentation

jonathan-buttner added a commit to jonathan-buttner/elasticsearch that referenced this pull request Jan 28, 2025
…lastic#120847)

* Starting new auth class implementation

* Fixing some tests

* Working tests

* Refactoring

* Addressing feedback and pull main

(cherry picked from commit 1fa1ba7)
elasticsearchmachine pushed a commit that referenced this pull request Jan 28, 2025
…120847) (#121061)

* Starting new auth class implementation

* Fixing some tests

* Working tests

* Refactoring

* Addressing feedback and pull main

(cherry picked from commit 1fa1ba7)

Labels

auto-backport, backport pending, :ml (Machine learning), >refactoring, Team:ML, v8.18.0, v9.0.0
