
Conversation

jonathan-buttner
Contributor

This PR removes the custom service from the Services API so it is not exposed to the UI.

The reasoning is that the custom service requires a lot of configuration settings that are not yet supported in the UI.
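As a quick sanity check, a script like the following sketch could verify that the custom service no longer appears in the services response. It assumes the custom service's identifier is `custom` and uses an abbreviated, hypothetical excerpt of the response shown below:

```python
import json

# Abbreviated, hypothetical excerpt of the services response.
services_response = json.loads("""
[
    {"service": "alibabacloud-ai-search", "name": "AlibabaCloud AI Search"},
    {"service": "amazon_sagemaker", "name": "Amazon SageMaker"},
    {"service": "anthropic", "name": "Anthropic"}
]
""")

# Collect the service identifiers and confirm the custom service is absent.
service_ids = [entry["service"] for entry in services_response]
assert "custom" not in service_ids
print(service_ids)
```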

Example services response without the custom service:

[
    {
        "service": "alibabacloud-ai-search",
        "name": "AlibabaCloud AI Search",
        "task_types": [
            "text_embedding",
            "sparse_embedding",
            "rerank",
            "completion"
        ],
        "configurations": {
            "workspace": {
                "description": "The name of the workspace used for the {infer} task.",
                "label": "Workspace",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank",
                    "completion"
                ]
            },
            "api_key": {
                "description": "A valid API key for the AlibabaCloud AI Search API.",
                "label": "API Key",
                "required": true,
                "sensitive": true,
                "updatable": true,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank",
                    "completion"
                ]
            },
            "service_id": {
                "description": "The name of the model service to use for the {infer} task.",
                "label": "Project ID",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank",
                    "completion"
                ]
            },
            "host": {
                "description": "The name of the host address used for the {infer} task. You can find the host address at https://opensearch.console.aliyun.com/cn-shanghai/rag/api-key[ the API keys section] of the documentation.",
                "label": "Host",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank",
                    "completion"
                ]
            },
            "rate_limit.requests_per_minute": {
                "description": "Minimize the number of rate limit errors.",
                "label": "Rate Limit",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "int",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank",
                    "completion"
                ]
            },
            "http_schema": {
                "description": "",
                "label": "HTTP Schema",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank",
                    "completion"
                ]
            }
        }
    },
    {
        "service": "amazon_sagemaker",
        "name": "Amazon SageMaker",
        "task_types": [
            "text_embedding",
            "sparse_embedding",
            "rerank",
            "completion",
            "chat_completion"
        ],
        "configurations": {
            "batch_size": {
                "description": "The maximum size a single chunk of input can be when chunking input for semantic text.",
                "label": "Batch Size",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "int",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank",
                    "completion",
                    "chat_completion"
                ]
            },
            "endpoint_name": {
                "description": "The name specified when creating the SageMaker Endpoint.",
                "label": "Endpoint Name",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank",
                    "completion",
                    "chat_completion"
                ]
            },
            "target_model": {
                "description": "The model to request when calling a SageMaker multi-model Endpoint.",
                "label": "Target Model",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank",
                    "completion",
                    "chat_completion"
                ]
            },
            "enable_explanations": {
                "description": "JMESPath expression overriding the ClarifyingExplainerConfig in the SageMaker Endpoint Configuration.",
                "label": "Enable Explanations",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank",
                    "completion",
                    "chat_completion"
                ]
            },
            "session_id": {
                "description": "Creates or reuses an existing Session for SageMaker stateful models.",
                "label": "Session ID",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank",
                    "completion",
                    "chat_completion"
                ]
            },
            "custom_attributes": {
                "description": "An opaque informational value forwarded as-is to the model within SageMaker.",
                "label": "Custom Attributes",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank",
                    "completion",
                    "chat_completion"
                ]
            },
            "secret_key": {
                "description": "A valid AWS secret key that is paired with the access_key.",
                "label": "Secret Key",
                "required": true,
                "sensitive": true,
                "updatable": true,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank",
                    "completion",
                    "chat_completion"
                ]
            },
            "inference_id": {
                "description": "Informational identifying for auditing requests within the SageMaker Endpoint.",
                "label": "Inference ID",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank",
                    "completion",
                    "chat_completion"
                ]
            },
            "access_key": {
                "description": "A valid AWS access key that has permissions to use Amazon Bedrock.",
                "label": "Access Key",
                "required": true,
                "sensitive": true,
                "updatable": true,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank",
                    "completion",
                    "chat_completion"
                ]
            },
            "target_variant": {
                "description": "The production variant when calling the SageMaker Endpoint",
                "label": "Target Variant",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank",
                    "completion",
                    "chat_completion"
                ]
            },
            "api": {
                "description": "The API format that your SageMaker Endpoint expects.",
                "label": "API",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank",
                    "completion",
                    "chat_completion"
                ]
            },
            "region": {
                "description": "The AWS region that your model or ARN is deployed in.",
                "label": "Region",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank",
                    "completion",
                    "chat_completion"
                ]
            },
            "target_container_hostname": {
                "description": "The hostname of the container when calling a SageMaker multi-container Endpoint.",
                "label": "Target Container Hostname",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank",
                    "completion",
                    "chat_completion"
                ]
            }
        }
    },
    {
        "service": "amazonbedrock",
        "name": "Amazon Bedrock",
        "task_types": [
            "text_embedding",
            "completion"
        ],
        "configurations": {
            "secret_key": {
                "description": "A valid AWS secret key that is paired with the access_key.",
                "label": "Secret Key",
                "required": true,
                "sensitive": true,
                "updatable": true,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion"
                ]
            },
            "provider": {
                "description": "The model provider for your deployment.",
                "label": "Provider",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion"
                ]
            },
            "access_key": {
                "description": "A valid AWS access key that has permissions to use Amazon Bedrock.",
                "label": "Access Key",
                "required": true,
                "sensitive": true,
                "updatable": true,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion"
                ]
            },
            "model": {
                "description": "The base model ID or an ARN to a custom model based on a foundational model.",
                "label": "Model",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion"
                ]
            },
            "rate_limit.requests_per_minute": {
                "description": "By default, the amazonbedrock service sets the number of requests allowed per minute to 240.",
                "label": "Rate Limit",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "int",
                "supported_task_types": [
                    "text_embedding",
                    "completion"
                ]
            },
            "region": {
                "description": "The region that your model or ARN is deployed in.",
                "label": "Region",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion"
                ]
            },
            "dimensions": {
                "description": "The number of dimensions the resulting embeddings should have. For more information refer to https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-titan-embed-text.html.",
                "label": "Dimensions",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "int",
                "supported_task_types": [
                    "text_embedding"
                ]
            }
        }
    },
    {
        "service": "anthropic",
        "name": "Anthropic",
        "task_types": [
            "completion"
        ],
        "configurations": {
            "api_key": {
                "description": "API Key for the provider you're connecting to.",
                "label": "API Key",
                "required": true,
                "sensitive": true,
                "updatable": true,
                "type": "str",
                "supported_task_types": [
                    "completion"
                ]
            },
            "rate_limit.requests_per_minute": {
                "description": "By default, the anthropic service sets the number of requests allowed per minute to 50.",
                "label": "Rate Limit",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "int",
                "supported_task_types": [
                    "completion"
                ]
            },
            "model_id": {
                "description": "The name of the model to use for the inference task.",
                "label": "Model ID",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "completion"
                ]
            }
        }
    },
    {
        "service": "azureaistudio",
        "name": "Azure AI Studio",
        "task_types": [
            "text_embedding",
            "completion"
        ],
        "configurations": {
            "endpoint_type": {
                "description": "Specifies the type of endpoint that is used in your model deployment.",
                "label": "Endpoint Type",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion"
                ]
            },
            "provider": {
                "description": "The model provider for your deployment.",
                "label": "Provider",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion"
                ]
            },
            "api_key": {
                "description": "API Key for the provider you're connecting to.",
                "label": "API Key",
                "required": true,
                "sensitive": true,
                "updatable": true,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion"
                ]
            },
            "rate_limit.requests_per_minute": {
                "description": "Minimize the number of rate limit errors.",
                "label": "Rate Limit",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "int",
                "supported_task_types": [
                    "text_embedding",
                    "completion"
                ]
            },
            "target": {
                "description": "The target URL of your Azure AI Studio model deployment.",
                "label": "Target",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion"
                ]
            },
            "dimensions": {
                "description": "The number of dimensions the resulting embeddings should have. For more information refer to https://learn.microsoft.com/en-us/azure/ai-studio/reference/reference-model-inference-embeddings.",
                "label": "Dimensions",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "int",
                "supported_task_types": [
                    "text_embedding"
                ]
            }
        }
    },
    {
        "service": "azureopenai",
        "name": "Azure OpenAI",
        "task_types": [
            "text_embedding",
            "completion"
        ],
        "configurations": {
            "api_key": {
                "description": "You must provide either an API key or an Entra ID.",
                "label": "API Key",
                "required": false,
                "sensitive": true,
                "updatable": true,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion"
                ]
            },
            "entra_id": {
                "description": "You must provide either an API key or an Entra ID.",
                "label": "Entra ID",
                "required": false,
                "sensitive": true,
                "updatable": true,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion"
                ]
            },
            "rate_limit.requests_per_minute": {
                "description": "The azureopenai service sets a default number of requests allowed per minute depending on the task type.",
                "label": "Rate Limit",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "int",
                "supported_task_types": [
                    "text_embedding",
                    "completion"
                ]
            },
            "deployment_id": {
                "description": "The deployment name of your deployed models.",
                "label": "Deployment ID",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion"
                ]
            },
            "resource_name": {
                "description": "The name of your Azure OpenAI resource.",
                "label": "Resource Name",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion"
                ]
            },
            "api_version": {
                "description": "The Azure API version ID to use.",
                "label": "API Version",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion"
                ]
            },
            "dimensions": {
                "description": "The number of dimensions the resulting embeddings should have. For more information refer to https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#request-body-1.",
                "label": "Dimensions",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "int",
                "supported_task_types": [
                    "text_embedding"
                ]
            }
        }
    },
    {
        "service": "cohere",
        "name": "Cohere",
        "task_types": [
            "text_embedding",
            "rerank",
            "completion"
        ],
        "configurations": {
            "api_key": {
                "description": "API Key for the provider you're connecting to.",
                "label": "API Key",
                "required": true,
                "sensitive": true,
                "updatable": true,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "rerank",
                    "completion"
                ]
            },
            "rate_limit.requests_per_minute": {
                "description": "Minimize the number of rate limit errors.",
                "label": "Rate Limit",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "int",
                "supported_task_types": [
                    "text_embedding",
                    "rerank",
                    "completion"
                ]
            },
            "model_id": {
                "description": "The name of the model to use for the inference task.",
                "label": "Model ID",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "rerank",
                    "completion"
                ]
            }
        }
    },
    {
        "service": "deepseek",
        "name": "DeepSeek",
        "task_types": [
            "completion",
            "chat_completion"
        ],
        "configurations": {
            "api_key": {
                "description": "The DeepSeek API authentication key. For more details about generating DeepSeek API keys, refer to https://api-docs.deepseek.com.",
                "label": "API Key",
                "required": true,
                "sensitive": true,
                "updatable": true,
                "type": "str",
                "supported_task_types": [
                    "completion",
                    "chat_completion"
                ]
            },
            "model_id": {
                "description": "The name of the model to use for the inference task.",
                "label": "Model ID",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "completion",
                    "chat_completion"
                ]
            },
            "url": {
                "default_value": "https://api.deepseek.com/chat/completions",
                "description": "The URL endpoint to use for the requests.",
                "label": "URL",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "completion",
                    "chat_completion"
                ]
            }
        }
    },
    {
        "service": "elasticsearch",
        "name": "Elasticsearch",
        "task_types": [
            "text_embedding",
            "sparse_embedding",
            "rerank"
        ],
        "configurations": {
            "num_allocations": {
                "default_value": 1,
                "description": "The total number of allocations this model is assigned across machine learning nodes.",
                "label": "Number Allocations",
                "required": true,
                "sensitive": false,
                "updatable": true,
                "type": "int",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank"
                ]
            },
            "num_threads": {
                "default_value": 2,
                "description": "Sets the number of threads used by each model allocation during inference.",
                "label": "Number Threads",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "int",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank"
                ]
            },
            "model_id": {
                "description": "The name of the model to use for the inference task.",
                "label": "Model ID",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank"
                ]
            }
        }
    },
    {
        "service": "googleaistudio",
        "name": "Google AI Studio",
        "task_types": [
            "text_embedding",
            "completion"
        ],
        "configurations": {
            "api_key": {
                "description": "API Key for the provider you're connecting to.",
                "label": "API Key",
                "required": true,
                "sensitive": true,
                "updatable": true,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion"
                ]
            },
            "rate_limit.requests_per_minute": {
                "description": "Minimize the number of rate limit errors.",
                "label": "Rate Limit",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "int",
                "supported_task_types": [
                    "text_embedding",
                    "completion"
                ]
            },
            "model_id": {
                "description": "ID of the LLM you're using.",
                "label": "Model ID",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion"
                ]
            }
        }
    },
    {
        "service": "googlevertexai",
        "name": "Google Vertex AI",
        "task_types": [
            "text_embedding",
            "rerank",
            "completion",
            "chat_completion"
        ],
        "configurations": {
            "service_account_json": {
                "description": "API Key for the provider you're connecting to.",
                "label": "Credentials JSON",
                "required": true,
                "sensitive": true,
                "updatable": true,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "rerank",
                    "completion",
                    "chat_completion"
                ]
            },
            "project_id": {
                "description": "The GCP Project ID which has Vertex AI API(s) enabled. For more information on the URL, refer to the {geminiVertexAIDocs}.",
                "label": "GCP Project",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "rerank",
                    "completion",
                    "chat_completion"
                ]
            },
            "location": {
                "description": "Please provide the GCP region where the Vertex AI API(s) is enabled. For more information, refer to the {geminiVertexAIDocs}.",
                "label": "GCP Region",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion",
                    "chat_completion"
                ]
            },
            "rate_limit.requests_per_minute": {
                "description": "Minimize the number of rate limit errors.",
                "label": "Rate Limit",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "int",
                "supported_task_types": [
                    "text_embedding",
                    "rerank",
                    "completion",
                    "chat_completion"
                ]
            },
            "model_id": {
                "description": "ID of the LLM you're using.",
                "label": "Model ID",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "rerank",
                    "completion",
                    "chat_completion"
                ]
            }
        }
    },
    {
        "service": "hugging_face",
        "name": "Hugging Face",
        "task_types": [
            "text_embedding",
            "sparse_embedding",
            "rerank",
            "completion",
            "chat_completion"
        ],
        "configurations": {
            "api_key": {
                "description": "API Key for the provider you're connecting to.",
                "label": "API Key",
                "required": true,
                "sensitive": true,
                "updatable": true,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank",
                    "completion",
                    "chat_completion"
                ]
            },
            "rate_limit.requests_per_minute": {
                "description": "Minimize the number of rate limit errors.",
                "label": "Rate Limit",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "int",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank",
                    "completion",
                    "chat_completion"
                ]
            },
            "url": {
                "description": "The URL endpoint to use for the requests.",
                "label": "URL",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank",
                    "completion",
                    "chat_completion"
                ]
            }
        }
    },
    {
        "service": "jinaai",
        "name": "Jina AI",
        "task_types": [
            "text_embedding",
            "rerank"
        ],
        "configurations": {
            "api_key": {
                "description": "API Key for the provider you're connecting to.",
                "label": "API Key",
                "required": true,
                "sensitive": true,
                "updatable": true,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "rerank"
                ]
            },
            "rate_limit.requests_per_minute": {
                "description": "Minimize the number of rate limit errors.",
                "label": "Rate Limit",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "int",
                "supported_task_types": [
                    "text_embedding",
                    "rerank"
                ]
            },
            "model_id": {
                "description": "The name of the model to use for the inference task.",
                "label": "Model ID",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "rerank"
                ]
            },
            "dimensions": {
                "description": "The number of dimensions the resulting embeddings should have. For more information refer to https://api.jina.ai/redoc#tag/embeddings/operation/create_embedding_v1_embeddings_post.",
                "label": "Dimensions",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "int",
                "supported_task_types": [
                    "text_embedding"
                ]
            }
        }
    },
    {
        "service": "mistral",
        "name": "Mistral",
        "task_types": [
            "text_embedding",
            "completion",
            "chat_completion"
        ],
        "configurations": {
            "api_key": {
                "description": "API Key for the provider you're connecting to.",
                "label": "API Key",
                "required": true,
                "sensitive": true,
                "updatable": true,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion",
                    "chat_completion"
                ]
            },
            "model": {
                "description": "Refer to the Mistral models documentation for the list of available text embedding models.",
                "label": "Model",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion",
                    "chat_completion"
                ]
            },
            "rate_limit.requests_per_minute": {
                "description": "Minimize the number of rate limit errors.",
                "label": "Rate Limit",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "int",
                "supported_task_types": [
                    "text_embedding",
                    "completion",
                    "chat_completion"
                ]
            },
            "max_input_tokens": {
                "description": "Allows you to specify the maximum number of tokens per input.",
                "label": "Maximum Input Tokens",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "int",
                "supported_task_types": [
                    "text_embedding",
                    "completion",
                    "chat_completion"
                ]
            }
        }
    },
    {
        "service": "openai",
        "name": "OpenAI",
        "task_types": [
            "text_embedding",
            "completion",
            "chat_completion"
        ],
        "configurations": {
            "api_key": {
                "description": "The OpenAI API authentication key. For more details about generating OpenAI API keys, refer to the https://platform.openai.com/account/api-keys.",
                "label": "API Key",
                "required": true,
                "sensitive": true,
                "updatable": true,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion",
                    "chat_completion"
                ]
            },
            "organization_id": {
                "description": "The unique identifier of your organization.",
                "label": "Organization ID",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion",
                    "chat_completion"
                ]
            },
            "rate_limit.requests_per_minute": {
                "description": "Default number of requests allowed per minute. For text_embedding is 3000. For completion is 500.",
                "label": "Rate Limit",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "int",
                "supported_task_types": [
                    "text_embedding",
                    "completion",
                    "chat_completion"
                ]
            },
            "model_id": {
                "description": "The name of the model to use for the inference task.",
                "label": "Model ID",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion",
                    "chat_completion"
                ]
            },
            "url": {
                "description": "The absolute URL of the external service to send requests to.",
                "label": "URL",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion",
                    "chat_completion"
                ]
            },
            "dimensions": {
                "description": "The number of dimensions the resulting embeddings should have. For more information refer to https://platform.openai.com/docs/api-reference/embeddings/create#embeddings-create-dimensions.",
                "label": "Dimensions",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "int",
                "supported_task_types": [
                    "text_embedding"
                ]
            }
        }
    },
    {
        "service": "voyageai",
        "name": "Voyage AI",
        "task_types": [
            "text_embedding",
            "rerank"
        ],
        "configurations": {
            "api_key": {
                "description": "API Key for the provider you're connecting to.",
                "label": "API Key",
                "required": true,
                "sensitive": true,
                "updatable": true,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "rerank"
                ]
            },
            "rate_limit.requests_per_minute": {
                "description": "Minimize the number of rate limit errors.",
                "label": "Rate Limit",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "int",
                "supported_task_types": [
                    "text_embedding",
                    "rerank"
                ]
            },
            "model_id": {
                "description": "The name of the model to use for the inference task.",
                "label": "Model ID",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "rerank"
                ]
            }
        }
    },
    {
        "service": "watsonxai",
        "name": "IBM watsonx",
        "task_types": [
            "text_embedding",
            "completion",
            "chat_completion"
        ],
        "configurations": {
            "project_id": {
                "description": "",
                "label": "Project ID",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion",
                    "chat_completion"
                ]
            },
            "model_id": {
                "description": "The name of the model to use for the inference task.",
                "label": "Model ID",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion",
                    "chat_completion"
                ]
            },
            "api_version": {
                "description": "The IBM watsonx API version ID to use.",
                "label": "API Version",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion",
                    "chat_completion"
                ]
            },
            "max_input_tokens": {
                "description": "Allows you to specify the maximum number of tokens per input.",
                "label": "Maximum Input Tokens",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "int",
                "supported_task_types": [
                    "text_embedding"
                ]
            },
            "url": {
                "description": "",
                "label": "URL",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "completion",
                    "chat_completion"
                ]
            }
        }
    }
]
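The effect of this change is that a client listing services will no longer see a `custom` entry. As a minimal sketch (using a trimmed, two-entry sample of the response above, not the full payload), a consumer of the `GET _inference/_services` response might verify the custom service is absent and collect the required configuration fields per service like this:

```python
import json

# Trimmed sample of the services response shown above
# (only two entries reproduced for illustration).
services_json = """
[
  {
    "service": "openai",
    "name": "OpenAI",
    "task_types": ["text_embedding", "completion", "chat_completion"],
    "configurations": {
      "api_key": {"required": true, "sensitive": true, "updatable": true,
                  "type": "str",
                  "supported_task_types": ["text_embedding", "completion", "chat_completion"]},
      "model_id": {"required": true, "sensitive": false, "updatable": false,
                   "type": "str",
                   "supported_task_types": ["text_embedding", "completion", "chat_completion"]}
    }
  },
  {
    "service": "mistral",
    "name": "Mistral",
    "task_types": ["text_embedding", "completion", "chat_completion"],
    "configurations": {
      "api_key": {"required": true, "sensitive": true, "updatable": true,
                  "type": "str",
                  "supported_task_types": ["text_embedding", "completion", "chat_completion"]}
    }
  }
]
"""

services = json.loads(services_json)

# After this PR, the "custom" service no longer appears in the response.
assert "custom" not in {s["service"] for s in services}

def required_fields(service):
    """Names of the required configuration fields a UI form would render."""
    return sorted(name for name, cfg in service["configurations"].items()
                  if cfg["required"])

for s in services:
    print(s["service"], required_fields(s))
```

This is the kind of filtering a UI would do with the schema; the custom service's many free-form configuration fields do not fit this simple required/optional form model, which is the motivation for hiding it.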

@jonathan-buttner jonathan-buttner added >non-issue :ml Machine learning Team:ML Meta label for the ML team auto-backport Automatically create backport pull requests when merged v8.19.0 v9.1.0 v9.2.0 labels Jul 7, 2025
@jonathan-buttner jonathan-buttner marked this pull request as ready for review July 7, 2025 19:53
@elasticsearchmachine

Pinging @elastic/ml-core (Team:ML)

@jonathan-buttner jonathan-buttner merged commit 02b2f5e into elastic:main Jul 7, 2025
33 checks passed
@elasticsearchmachine

💔 Backport failed

Branch  Result
8.19    Commit could not be cherrypicked due to conflicts
9.1

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 130739

jonathan-buttner added a commit to jonathan-buttner/elasticsearch that referenced this pull request Jul 7, 2025
* Removing custom service from service api

* Fixing tests
@jonathan-buttner
Copy link
Contributor Author

💚 All backports created successfully

Branch  Result
8.19

Questions?

Please refer to the Backport tool documentation

elasticsearchmachine pushed a commit that referenced this pull request Jul 7, 2025
* Removing custom service from service api

* Fixing tests
elasticsearchmachine pushed a commit that referenced this pull request Jul 8, 2025
* Removing custom service from service api

* Fixing tests

(cherry picked from commit 02b2f5e)

# Conflicts:
#	x-pack/plugin/inference/qa/inference-service-tests/src/javaRestTest/java/org/elasticsearch/xpack/inference/InferenceGetServicesIT.java
@jonathan-buttner jonathan-buttner deleted the ml-disable-custom-from-service-api branch July 8, 2025 18:36