feat: add dynamic model registration support to TGI inference #3417
Pull Request Overview
This PR adds dynamic model registration support to the Text Generation Inference (TGI) provider by implementing OpenAI-compatible endpoints and introducing a new feature to handle TGI's empty response IDs.
Key changes:
- Enhanced the TGI provider with OpenAI compatibility for dynamic model registration
- Added an `overwrite_completion_id` feature to handle providers that return empty IDs
- Integrated TGI support into the test infrastructure with a new test setup and recording files
Reviewed Changes
Copilot reviewed 12 out of 14 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `tests/integration/suites.py` | Added "tgi" test setup configuration for integration testing |
| `tests/integration/recordings/responses/*.json` | Test recording files capturing TGI API responses for various scenarios |
| `tests/integration/inference/test_openai_completion.py` | Removed TGI from exclusion lists, allowing OpenAI completion tests |
| `llama_stack/providers/utils/inference/openai_mixin.py` | Added `overwrite_completion_id` feature and ID generation logic |
| `llama_stack/providers/remote/inference/tgi/tgi.py` | Refactored to inherit from `OpenAIMixin` and support dynamic model registration |
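Dynamic model registration against an OpenAI-compatible provider typically works by querying its `GET /v1/models` endpoint and registering whatever IDs come back. A minimal sketch of the parsing step is below; the function name and the exact payload handling are assumptions, though the payload shape follows the OpenAI models-list format:

```python
import json


def parse_model_list(payload: str) -> list[str]:
    # Parse an OpenAI-compatible GET /v1/models response body and return
    # the model IDs that a provider could register dynamically.
    # Expected shape: {"object": "list", "data": [{"id": "...", ...}, ...]}
    body = json.loads(payload)
    return [model["id"] for model in body.get("data", [])]
```

With TGI serving a single model, the list would contain just that model's ID (e.g. the `--model-id` passed to the container).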
## What does this PR do?

Adds dynamic model registration support to TGI.

Adds a new `overwrite_completion_id` feature to `OpenAIMixin` to deal with TGI always returning `id=""`.

## Test Plan

Start TGI:

```shell
docker run --gpus all --shm-size 1g -p 8080:80 -v /data:/data ghcr.io/huggingface/text-generation-inference --model-id Qwen/Qwen3-0.6B
```

Start the stack:

```shell
TGI_URL=http://localhost:8080 uv run llama stack build --image-type venv --distro ci-tests --run
```

Run the tests:

```shell
./scripts/integration-tests.sh --stack-config http://localhost:8321 --setup tgi --subdirs inference --pattern openai
```