
Conversation

mattf (Collaborator) commented Sep 10, 2025

What does this PR do?

Update the vLLM inference provider to use OpenAIMixin for its openai-compat functions.
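A minimal sketch of what adopting the mixin can look like, assuming OpenAIMixin expects the adapter to supply a base URL and API key (the import path, class shape, and method names below are illustrative assumptions, not the exact diff in this PR):

```
# Illustrative sketch only -- see the PR diff for the actual implementation.
# Assumes OpenAIMixin provides the openai_* compat methods once the adapter
# tells it where the vLLM server lives and what token to send.
from llama_stack.providers.utils.inference.openai_mixin import OpenAIMixin


class VLLMInferenceAdapter(OpenAIMixin):
    """Hypothetical vLLM adapter that delegates openai-compat calls to the mixin."""

    def __init__(self, config):
        # `config` is assumed to carry the server URL and an optional API token
        self.config = config

    def get_base_url(self) -> str:
        # vLLM's OpenAI-compatible endpoint, e.g. http://localhost:8000/v1
        return self.config.url

    def get_api_key(self) -> str:
        # vLLM accepts any token unless the server was started with --api-key
        return self.config.api_token or "not-needed"
```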

Inference recordings were captured from Qwen3-0.6B on vLLM 0.8.3:

```
docker run --gpus all -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --ipc=host \
    vllm/vllm-openai:latest \
    --model Qwen/Qwen3-0.6B --enable-auto-tool-choice --tool-call-parser hermes
```

Test Plan

```
./scripts/integration-tests.sh --stack-config server:ci-tests --setup vllm --subdirs inference
```

meta-cla bot added the CLA Signed label Sep 10, 2025
mattf marked this pull request as ready for review September 10, 2025 18:27
ashwinb (Contributor) left a comment

very nice!

ashwinb (Contributor) commented Sep 10, 2025

Wait, we need to agree about the models we use for vLLM testing with others who do this also -- cc @derekhiggins @bbrowning if there are thoughts.

derekhiggins (Contributor)

> Wait, we need to agree about the models we use for vLLM testing with others who do this also -- cc @derekhiggins @bbrowning if there are thoughts.

With #3128 my intent was to test the provider code; so long as the model used can fit on the CI workers and is smart enough to pass the integration tests, I've no particular preference.

In the CI job I've been hoping to merge I'm using Llama-3.2-1B-Instruct, but I'm not against changing it if another model makes more sense (the model is baked into the vLLM image).

mattf (Collaborator, Author) commented Sep 11, 2025

> Wait, we need to agree about the models we use for vLLM testing with others who do this also -- cc @derekhiggins @bbrowning if there are thoughts.
>
> With #3128 my intent was to test the provider code; so long as the model used can fit on the CI workers and is smart enough to pass the integration tests, I've no particular preference.
>
> In the CI job I've been hoping to merge I'm using Llama-3.2-1B-Instruct, but I'm not against changing it if another model makes more sense (the model is baked into the vLLM image).

What I could have used was a way to override the suite model.

mattf merged commit 8ef1189 into llamastack:main Sep 11, 2025
22 checks passed
iamemilio pushed a commit to iamemilio/llama-stack that referenced this pull request Sep 24, 2025
…ompat functions (llamastack#3404)
