Can create LLM endpoints #132
Changes from all commits: 41ecada, 5dbe7dd, 1370d79, 1a9207a, b9bae4e, 09dab97, d1da5f6
```diff
@@ -180,7 +180,7 @@ data:
   - ddtrace-run
   - run-service
   - --config
-  - /workspace/llm_engine/llm_engine/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
+  - /workspace/server/llm_engine_server/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
   - --http
   - production_threads
   - --port
```

**Contributor:** Note that this may need to be parameterized as well.
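To make the note above concrete: instead of baking the `/workspace/server/...` prefix into the command, the config directory could be templated alongside `FORWARDER_CONFIG_FILE_NAME`. A minimal sketch, assuming a Helm-style values file; the `forwarder.configDir` key is hypothetical and not something this PR introduces:

```yaml
# Hypothetical Helm-style values (not in this PR): lift the config directory
# out of the hard-coded command so internal and OSS layouts can differ.
forwarder:
  configDir: /workspace/server/llm_engine_server/inference/configs

# The command template would then assemble the path from the value:
#   - --config
#   - {{ .Values.forwarder.configDir }}/${FORWARDER_CONFIG_FILE_NAME}
```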
```diff
@@ -221,9 +221,9 @@ data:
   - ddtrace-run
   - python
   - -m
-  - llm_engine.inference.forwarding.http_forwarder
+  - server.llm_engine_server.inference.forwarding.http_forwarder
   - --config
-  - /workspace/llm_engine/llm_engine/inference/configs/service--http_forwarder.yaml
+  - /workspace/server/llm_engine_server/inference/configs/service--http_forwarder.yaml
   - --port
   - "${FORWARDER_PORT}"
   - --num-workers
```
```diff
@@ -266,7 +266,7 @@ data:
   - ddtrace-run
   - run-service
   - --config
-  - /workspace/llm_engine/llm_engine/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
+  - /workspace/server/llm_engine_server/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
   - --queue
   - "${QUEUE}"
   - --task-visibility
```
```diff
@@ -227,7 +227,7 @@ async def create_text_generation_inference_bundle(
     schema_location="TBA",
     flavor=StreamingEnhancedRunnableImageFlavor(
         flavor=ModelBundleFlavorType.STREAMING_ENHANCED_RUNNABLE_IMAGE,
-        repository="text-generation-inference",  # TODO: let user choose repo
+        repository="ghcr.io/huggingface/text-generation-inference",  # TODO: let user choose repo
         tag=framework_image_tag,
         command=command,
         streaming_command=command,
```

**Contributor (Author):** @yunfeng-scale It turns out I need to update the TGI repo name in order to skip the image-existence check in the ECR repo, given the logic here. Is this change reasonable? Should we back-propagate it to hmi as well?

**Contributor:** We shouldn't hardcode this, since it makes the internal and OSS code diverge. Can you add it as a parameter?

**Contributor:** Yes, this change makes sense.
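To sketch what the suggested parameter could look like: thread the repository through as an argument instead of hard-coding it. The names `DEFAULT_TGI_REPOSITORY` and `resolve_tgi_repository`, and the dot-in-first-segment heuristic for deciding when to skip the ECR existence check, are illustrative assumptions, not the code in this PR:

```python
# Sketch under assumptions: parameterize the TGI image repository so internal
# and OSS deployments can use different registries without code changes.
from typing import Optional

DEFAULT_TGI_REPOSITORY = "ghcr.io/huggingface/text-generation-inference"


def is_external_repository(repository: str) -> bool:
    """Heuristic (assumption): fully qualified references start with a registry
    host such as "ghcr.io", and hosts contain a dot while bare ECR repository
    names do not. External images would then skip the ECR existence check."""
    return "." in repository.split("/", 1)[0]


def resolve_tgi_repository(user_repository: Optional[str] = None) -> str:
    """Prefer a caller-supplied repository; fall back to the public default."""
    return user_repository or DEFAULT_TGI_REPOSITORY


if __name__ == "__main__":
    for repo in (None, "my-internal-tgi"):
        resolved = resolve_tgi_repository(repo)
        kind = "external" if is_external_repository(resolved) else "internal (ECR)"
        print(f"{resolved} -> {kind}")
```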
**Comment:** Note: we may need to parameterize this entirely.
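Taking the "parameterize this entirely" note one step further, the overrides could be surfaced on the endpoint-creation request itself. A rough sketch only; the field names are hypothetical and the repo's actual request schema may differ:

```python
from typing import Optional

from pydantic import BaseModel


class LLMEndpointImageOverrides(BaseModel):
    """Hypothetical request fields for values the PR currently hard-codes."""

    repository: str = "ghcr.io/huggingface/text-generation-inference"
    tag: Optional[str] = None  # when unset, fall back to framework_image_tag
    forwarder_config_dir: str = "/workspace/server/llm_engine_server/inference/configs"
```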