Can create LLM endpoints #132
Conversation
- flavor=StreamingEnhancedRunnableImageFlavor(
+ flavor=ModelBundleFlavorType.STREAMING_ENHANCED_RUNNABLE_IMAGE,
- repository="text-generation-inference",  # TODO: let user choose repo
+ repository="ghcr.io/huggingface/text-generation-inference",  # TODO: let user choose repo
@yunfeng-scale It turns out I need to update the TGI repo name in order to skip the image existence check against the ECR repo, given the logic here:

    and self.docker_repository.is_repo_name(request.flavor.repository)

Is this change reasonable? Should we backpropagate it to hmi as well?
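For reference, a minimal sketch of why this works, not llm-engine's actual implementation; it only assumes that a "repo name" means a bare ECR repository with no registry host:

```python
def is_repo_name(repository: str) -> bool:
    """Assumed semantics: True only for bare repository names (no registry host)."""
    first_segment = repository.split("/")[0]
    # "ghcr.io/..." or "<account>.dkr.ecr.<region>.amazonaws.com/..." start with a
    # registry host, which a bare repo name like "text-generation-inference" does not.
    return "." not in first_segment and ":" not in first_segment


def should_check_image_in_ecr(repository: str) -> bool:
    # Mirrors the quoted condition: the ECR image existence check only runs
    # when the flavor's repository looks like a bare internal repo name.
    return is_repo_name(repository)


assert should_check_image_in_ecr("text-generation-inference")
assert not should_check_image_in_ecr("ghcr.io/huggingface/text-generation-inference")
```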
We shouldn't hardcode this since it diverges between internal and OSS code; can you add this as a parameter?
yes this change makes sense
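Following the suggestion above to make this a parameter, a minimal sketch of how the repository could be injected instead of hardcoded; `TGI_IMAGE_REPOSITORY` and `get_tgi_repository` are hypothetical names, not llm-engine's actual configuration surface:

```python
import os
from typing import Optional

# Hypothetical default; internal deployments could override it without code changes.
DEFAULT_TGI_REPOSITORY = "ghcr.io/huggingface/text-generation-inference"


def get_tgi_repository(override: Optional[str] = None) -> str:
    """Resolve the TGI image repository from an explicit argument, an env var, or the default."""
    return override or os.environ.get("TGI_IMAGE_REPOSITORY", DEFAULT_TGI_REPOSITORY)
```

An internal deployment could then point the variable at its ECR repo name while the OSS default stays on ghcr.io.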
…github.com/scaleapi/llm-engine into fix-llm-engine-image-cache-startup-failure
 volumeMounts:
   - name: config-volume
-    mountPath: /root/.aws/config
+    mountPath: /home/user/.aws/config
note: We may need to parameterize this entirely.
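As one possible direction, a sketch of a parameterized mount path; it uses the kubernetes Python client purely for illustration (the real chart is a Helm template), and `AWS_CONFIG_MOUNT_PATH` is a hypothetical setting:

```python
import os

from kubernetes import client

# Hypothetical env var; defaults to the non-root path used in this PR.
AWS_CONFIG_MOUNT_PATH = os.environ.get("AWS_CONFIG_MOUNT_PATH", "/home/user/.aws/config")

aws_config_mount = client.V1VolumeMount(
    name="config-volume",
    mount_path=AWS_CONFIG_MOUNT_PATH,
)
```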
 - run-service
 - --config
- - /workspace/llm_engine/llm_engine/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
+ - /workspace/server/llm_engine_server/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
Note that this may need to be parameterized as well.
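A similar sketch for the forwarder invocation, deriving the config path from a configurable base directory; `FORWARDER_BASE_DIR` and `build_forwarder_args` are hypothetical names, and only the path segments visible in the diff above are assumed:

```python
import os
from typing import List

# Hypothetical base directory; defaults to the path this PR switches to.
FORWARDER_BASE_DIR = os.environ.get(
    "FORWARDER_BASE_DIR", "/workspace/server/llm_engine_server"
)


def build_forwarder_args(config_file_name: str) -> List[str]:
    """Assemble the http-forwarder command args with a parameterized config path."""
    config_path = os.path.join(FORWARDER_BASE_DIR, "inference", "configs", config_file_name)
    return ["run-service", "--config", config_path]
```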
song-william left a comment:
Discussed offline that we will parameterize these values in a later PR. Merging as is.
Summary
- Fix `a100` and `t4` devices in `values_sample.yaml` that prevent `llm-engine-cacher` from starting successfully.
- Fix `http-forwarder` container python command path for the streaming endpoint.

Test Plan