@mattf mattf commented Sep 11, 2025

What does this PR do?

adds dynamic model support to the TGI provider

adds a new `overwrite_completion_id` feature to `OpenAIMixin` to deal with TGI always returning `id=""`
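The ID-overwrite idea can be sketched as follows. This is a hypothetical illustration of the mechanism, not the actual llama-stack implementation; the class, flag, and method names are assumptions:

```python
# Hypothetical sketch of overwrite_completion_id: a mixin flag that lets a
# provider replace the server-supplied completion ID with a locally generated
# one, for backends (like TGI) that always return id="".
# Names are illustrative, not the real llama-stack code.
import uuid


class OpenAIMixin:
    # Providers set this when the backend returns empty/unusable IDs.
    overwrite_completion_id: bool = False

    def _maybe_overwrite_id(self, response: dict) -> dict:
        # Overwrite when the provider opts in, or when the ID is empty anyway.
        if self.overwrite_completion_id or not response.get("id"):
            response["id"] = f"chatcmpl-{uuid.uuid4()}"
        return response


class TGIAdapter(OpenAIMixin):
    overwrite_completion_id = True  # TGI always returns id=""


adapter = TGIAdapter()
fixed = adapter._maybe_overwrite_id({"id": "", "choices": []})
assert fixed["id"].startswith("chatcmpl-")
```

With the flag set, even a non-empty server ID would be replaced, which matches the "always returning id=\"\"" motivation: the client-side ID is the only one callers can rely on.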

Test Plan

tgi: `docker run --gpus all --shm-size 1g -p 8080:80 -v /data:/data ghcr.io/huggingface/text-generation-inference --model-id Qwen/Qwen3-0.6B`

stack: `TGI_URL=http://localhost:8080 uv run llama stack build --image-type venv --distro ci-tests --run`

test: `./scripts/integration-tests.sh --stack-config http://localhost:8321 --setup tgi --subdirs inference --pattern openai`

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Sep 11, 2025
@mattf mattf marked this pull request as ready for review September 11, 2025 13:26
@mattf mattf requested a review from Copilot September 11, 2025 14:58
@Copilot Copilot AI left a comment
Pull Request Overview

This PR adds dynamic model registration support to the Text Generation Inference (TGI) provider by implementing OpenAI-compatible endpoints and introducing a new feature to handle TGI's empty response IDs.

Key changes:

  • Enhanced TGI provider with OpenAI compatibility for dynamic model registration
  • Added overwrite_completion_id feature to handle providers that return empty IDs
  • Integrated TGI support into test infrastructure with new test setup and recording files
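Dynamic model registration of the kind described above typically works by querying the server's OpenAI-compatible model-listing endpoint instead of relying on a hard-coded model list. The helper below is a hypothetical sketch of parsing such a `GET /v1/models` response; it is not the actual provider code:

```python
# Hypothetical sketch: turn an OpenAI-compatible /v1/models response into a
# list of model IDs that a provider could register dynamically.
# The payload shape follows the OpenAI API list format.

def model_ids(models_payload: dict) -> list[str]:
    # /v1/models returns {"object": "list", "data": [{"id": "...", ...}, ...]}
    return [model["id"] for model in models_payload.get("data", [])]


# Example response from a TGI server serving a single model
example = {
    "object": "list",
    "data": [{"id": "Qwen/Qwen3-0.6B", "object": "model"}],
}
assert model_ids(example) == ["Qwen/Qwen3-0.6B"]
```

Registering whatever the server reports, rather than a static list, is what lets the provider pass the OpenAI completion tests against any model the TGI container happens to be serving.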

Reviewed Changes

Copilot reviewed 12 out of 14 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `tests/integration/suites.py` | Added "tgi" test setup configuration for integration testing |
| `tests/integration/recordings/responses/*.json` | Test recording files capturing TGI API responses for various scenarios |
| `tests/integration/inference/test_openai_completion.py` | Removed TGI from exclusion lists, allowing OpenAI completion tests |
| `llama_stack/providers/utils/inference/openai_mixin.py` | Added `overwrite_completion_id` feature and ID generation logic |
| `llama_stack/providers/remote/inference/tgi/tgi.py` | Refactored to inherit from `OpenAIMixin` and support dynamic model registration |


@mattf mattf merged commit f4ab154 into llamastack:main Sep 15, 2025
22 of 23 checks passed
wukaixingxp pushed a commit to wukaixingxp/llama-stack that referenced this pull request Sep 15, 2025
iamemilio pushed a commit to iamemilio/llama-stack that referenced this pull request Sep 24, 2025