diff --git a/charts/llm-engine/values_sample.yaml b/charts/llm-engine/values_sample.yaml
index 06d70362..70d740cf 100644
--- a/charts/llm-engine/values_sample.yaml
+++ b/charts/llm-engine/values_sample.yaml
@@ -96,7 +96,7 @@ config:
       # k8s_cluster_name [required] is the name of the k8s cluster
       k8s_cluster_name: main_cluster
       # dns_host_domain [required] is the domain name of the k8s cluster
-      dns_host_domain: domain.llm-engine.com
+      dns_host_domain: llm-engine.domain.com
       # default_region [required] is the default AWS region for various resources (e.g ECR)
       default_region: us-east-1
       # aws_account_id [required] is the AWS account ID for various resources (e.g ECR)
diff --git a/docs/guides/self_hosting.md b/docs/guides/self_hosting.md
index 8c6c963b..0c446191 100644
--- a/docs/guides/self_hosting.md
+++ b/docs/guides/self_hosting.md
@@ -200,4 +200,12 @@ $ curl -X POST 'http://localhost:5000/v1/llm/completions-sync?model_endpoint_nam
 You should get a response similar to:
 ```
 {"status":"SUCCESS","outputs":[{"text":". Tell me a joke about AI. Tell me a joke about AI. Tell me a joke about AI. Tell me","num_completion_tokens":30}],"traceback":null}
+```
+
+### Pointing the LLM Engine client to self-hosted infrastructure
+By default, the `llmengine` client makes requests to Scale AI's hosted infrastructure. You can have the client make requests to your own self-hosted infrastructure instead by setting the `LLM_ENGINE_BASE_PATH` environment variable to the URL of the `llm-engine` service.
+
+The exact URL of the `llm-engine` service depends on your Kubernetes cluster's networking setup. The domain is specified at `config.values.infra.dns_host_domain` in the Helm chart values config file. Using `charts/llm-engine/values_sample.yaml` as an example, you would run:
+```bash
+export LLM_ENGINE_BASE_PATH=https://llm-engine.domain.com
 ```
\ No newline at end of file
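
As a quick sketch of the mechanism the added docs describe: assuming the `llmengine` Python client reads `LLM_ENGINE_BASE_PATH` from the environment when issuing requests, the variable can also be set from Python before any client calls are made. The hostname below is the placeholder from `values_sample.yaml`, not a real deployment.

```python
import os

# Point the llmengine client at a self-hosted service instead of Scale's
# hosted API. The hostname matches the sample value of
# config.values.infra.dns_host_domain in charts/llm-engine/values_sample.yaml.
os.environ["LLM_ENGINE_BASE_PATH"] = "https://llm-engine.domain.com"

# Subsequent client calls (e.g. a completions request) would be routed to
# this base URL rather than the default hosted endpoint.
print(os.environ["LLM_ENGINE_BASE_PATH"])
```

Setting the variable in the shell (as in the diff) and setting it via `os.environ` are equivalent as long as it is set before the client sends its first request.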