charts/llm-engine/templates/_helpers.tpl (1 addition, 1 deletion)
@@ -344,7 +344,7 @@ volumeMounts:
  {{- define "llmEngine.forwarderVolumeMounts" }}
  volumeMounts:
  - name: config-volume
-   mountPath: /root/.aws/config
+   mountPath: /home/user/.aws/config

Contributor: "note: We may need to parameterize this entirely." (See the parameterization sketch after this hunk.)

    subPath: config
  - name: user-config
    mountPath: /workspace/user_config
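
As the comment above suggests, this mount path could be driven from chart values rather than hardcoded. A minimal sketch, assuming a hypothetical forwarder.awsConfigMountPath key (the key name is illustrative, not part of this PR):

```yaml
# values.yaml -- hypothetical key, defaulting to the path this PR introduces
forwarder:
  awsConfigMountPath: /home/user/.aws/config

# templates/_helpers.tpl -- only the hardcoded line changes; the rest of the
# define block stays as-is
{{- define "llmEngine.forwarderVolumeMounts" }}
volumeMounts:
- name: config-volume
  mountPath: {{ .Values.forwarder.awsConfigMountPath }}
  subPath: config
```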
charts/llm-engine/templates/service_template_config_map.yaml (4 additions, 4 deletions)
@@ -180,7 +180,7 @@ data:
  - ddtrace-run
  - run-service
  - --config
- - /workspace/llm_engine/llm_engine/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
+ - /workspace/server/llm_engine_server/inference/configs/${FORWARDER_CONFIG_FILE_NAME}

Contributor: "Note that this may need to be parameterized as well." (See the sketch after this hunk.)

  - --http
  - production_threads
  - --port
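
The note above could be addressed by injecting the configs directory alongside the existing FORWARDER_CONFIG_FILE_NAME substitution, so the base path lives in one place. A sketch assuming a hypothetical FORWARDER_CONFIGS_DIR variable (not part of this PR):

```yaml
# Hypothetical: the template would substitute FORWARDER_CONFIGS_DIR
# (e.g. /workspace/server/llm_engine_server/inference/configs) the same
# way it already substitutes FORWARDER_CONFIG_FILE_NAME.
command:
  - ddtrace-run
  - run-service
  - --config
  - ${FORWARDER_CONFIGS_DIR}/${FORWARDER_CONFIG_FILE_NAME}
```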
@@ -221,9 +221,9 @@ data:
  - ddtrace-run
  - python
  - -m
- - llm_engine.inference.forwarding.http_forwarder
+ - server.llm_engine_server.inference.forwarding.http_forwarder
  - --config
- - /workspace/llm_engine/llm_engine/inference/configs/service--http_forwarder.yaml
+ - /workspace/server/llm_engine_server/inference/configs/service--http_forwarder.yaml
  - --port
  - "${FORWARDER_PORT}"
  - --num-workers
@@ -266,7 +266,7 @@ data:
  - ddtrace-run
  - run-service
  - --config
- - /workspace/llm_engine/llm_engine/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
+ - /workspace/server/llm_engine_server/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
  - --queue
  - "${QUEUE}"
  - --task-visibility
charts/llm-engine/values_sample.yaml (15 additions, 1 deletion)
@@ -1,7 +1,7 @@
  # This is a YAML-formatted file.

  # tag [required] is the LLM Engine docker image tag
- tag: 1defd4f9c5376149e27673e154731a0c7820fe5d
+ tag: 41ecada1b51ce3a46bbc3190a36ed7890db370d3
  # context is a user-specified deployment tag. Can be used to
  context: production
  image:
@@ -171,6 +171,20 @@ imageCache:
- key: "nvidia.com/gpu"
operator: "Exists"
effect: "NoSchedule"
- name: a100
nodeSelector:
k8s.amazonaws.com/accelerator: nvidia-ampere-a100
tolerations:
- key: "nvidia.com/gpu"
operator: "Exists"
effect: "NoSchedule"
- name: t4
nodeSelector:
k8s.amazonaws.com/accelerator: nvidia-tesla-t4
tolerations:
- key: "nvidia.com/gpu"
operator: "Exists"
effect: "NoSchedule"

  # celeryBrokerType specifies the celery broker type for async endpoints (coming soon)
  celeryBrokerType: sqs
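
For context, each imageCache entry appears to pin cache pods to one accelerator node group via its nodeSelector, with tolerations so they can schedule onto the tainted GPU nodes; covering another GPU type follows the same shape. A sketch for a hypothetical A10 pool (entry name and label value are illustrative):

```yaml
    - name: a10  # hypothetical additional pool, same shape as a100/t4 above
      nodeSelector:
        k8s.amazonaws.com/accelerator: nvidia-a10  # illustrative label value
      tolerations:
        - key: "nvidia.com/gpu"
          operator: "Exists"
          effect: "NoSchedule"
```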
@@ -227,7 +227,7 @@ async def create_text_generation_inference_bundle(
      schema_location="TBA",
      flavor=StreamingEnhancedRunnableImageFlavor(
          flavor=ModelBundleFlavorType.STREAMING_ENHANCED_RUNNABLE_IMAGE,
-         repository="text-generation-inference",  # TODO: let user choose repo
+         repository="ghcr.io/huggingface/text-generation-inference",  # TODO: let user choose repo

Contributor (author): "@yunfeng-scale It turns out I need to update the TGI repo name in order to skip the image existence check in the ECR repo, given the logic here:

    and self.docker_repository.is_repo_name(request.flavor.repository)

Is this change reasonable? Should we back-propagate it to hmi as well?"

Contributor: "we shouldn't hardcode this, since the internal and OSS code have diverged; can you add this as a parameter?" (See the sketch after this hunk.)

Contributor: "yes, this change makes sense"

      tag=framework_image_tag,
      command=command,
      streaming_command=command,
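
Per the review thread above, one way to avoid hardcoding the repository is to surface it as a config value that defaults to the public image, letting internal deployments keep a bare ECR repo name. A minimal sketch, assuming a hypothetical tgi_repository key (name and location are illustrative, not part of this PR):

```yaml
# Hypothetical service-config entry: OSS installs point at the public GHCR
# image; internal installs set a bare ECR repo name so the image existence
# check still applies.
llm_engine:
  tgi_repository: "ghcr.io/huggingface/text-generation-inference"
```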
@@ -114,7 +114,7 @@ data:
  - ddtrace-run
  - run-service
  - --config
- - /workspace/llm_engine/llm_engine/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
+ - /workspace/server/llm_engine_server/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
  - --queue
  - "${QUEUE}"
  - --task-visibility
@@ -383,7 +383,7 @@ data:
  - ddtrace-run
  - run-service
  - --config
- - /workspace/llm_engine/llm_engine/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
+ - /workspace/server/llm_engine_server/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
  - --queue
  - "${QUEUE}"
  - --task-visibility
@@ -805,7 +805,7 @@ data:
  - ddtrace-run
  - run-service
  - --config
- - /workspace/llm_engine/llm_engine/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
+ - /workspace/server/llm_engine_server/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
  - --http
  - production_threads
  - --port
@@ -1071,7 +1071,7 @@ data:
  - ddtrace-run
  - run-service
  - --config
- - /workspace/llm_engine/llm_engine/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
+ - /workspace/server/llm_engine_server/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
  - --http
  - production_threads
  - --port
@@ -1473,9 +1473,9 @@ data:
  - ddtrace-run
  - python
  - -m
- - llm_engine.inference.forwarding.http_forwarder
+ - server.llm_engine_server.inference.forwarding.http_forwarder
  - --config
- - /workspace/llm_engine/llm_engine/inference/configs/service--http_forwarder.yaml
+ - /workspace/server/llm_engine_server/inference/configs/service--http_forwarder.yaml
  - --port
  - "${FORWARDER_PORT}"
  - --num-workers
@@ -1712,7 +1712,7 @@ data:
  - ddtrace-run
  - run-service
  - --config
- - /workspace/llm_engine/llm_engine/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
+ - /workspace/server/llm_engine_server/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
  - --queue
  - "${QUEUE}"
  - --task-visibility
@@ -1987,7 +1987,7 @@ data:
  - ddtrace-run
  - run-service
  - --config
- - /workspace/llm_engine/llm_engine/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
+ - /workspace/server/llm_engine_server/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
  - --queue
  - "${QUEUE}"
  - --task-visibility
@@ -2421,7 +2421,7 @@ data:
  - ddtrace-run
  - run-service
  - --config
- - /workspace/llm_engine/llm_engine/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
+ - /workspace/server/llm_engine_server/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
  - --http
  - production_threads
  - --port
@@ -2693,7 +2693,7 @@ data:
  - ddtrace-run
  - run-service
  - --config
- - /workspace/llm_engine/llm_engine/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
+ - /workspace/server/llm_engine_server/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
  - --http
  - production_threads
  - --port
@@ -3107,9 +3107,9 @@ data:
  - ddtrace-run
  - python
  - -m
- - llm_engine.inference.forwarding.http_forwarder
+ - server.llm_engine_server.inference.forwarding.http_forwarder
  - --config
- - /workspace/llm_engine/llm_engine/inference/configs/service--http_forwarder.yaml
+ - /workspace/server/llm_engine_server/inference/configs/service--http_forwarder.yaml
  - --port
  - "${FORWARDER_PORT}"
  - --num-workers