
Conversation

mattf (Collaborator) commented Sep 10, 2025

What does this PR do?

Update the vLLM inference provider to use OpenAIMixin for its openai-compat functions.
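A minimal sketch of what adopting the mixin can look like, assuming OpenAIMixin expects the adapter to supply a base URL and API key (the import path, class shape, and method names below are illustrative assumptions, not the exact diff in this PR):

```
# Illustrative sketch only -- see the PR diff for the actual implementation.
# Assumes OpenAIMixin provides the openai_* compat methods once the adapter
# tells it where the vLLM server lives and what token to send.
from llama_stack.providers.utils.inference.openai_mixin import OpenAIMixin


class VLLMInferenceAdapter(OpenAIMixin):
    """Hypothetical vLLM adapter that delegates openai-compat calls to the mixin."""

    def __init__(self, config):
        # `config` is assumed to carry the server URL and an optional API token
        self.config = config

    def get_base_url(self) -> str:
        # vLLM's OpenAI-compatible endpoint, e.g. http://localhost:8000/v1
        return self.config.url

    def get_api_key(self) -> str:
        # vLLM accepts any token unless the server was started with --api-key
        return self.config.api_token or "not-needed"
```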

Inference recordings were captured from Qwen3-0.6B on vLLM 0.8.3:

```
docker run --gpus all -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --ipc=host \
    vllm/vllm-openai:latest \
    --model Qwen/Qwen3-0.6B --enable-auto-tool-choice --tool-call-parser hermes
```

Test Plan

```
./scripts/integration-tests.sh --stack-config server:ci-tests --setup vllm --subdirs inference
```

meta-cla bot added the CLA Signed label Sep 10, 2025
mattf marked this pull request as ready for review September 10, 2025 18:27
ashwinb (Contributor) left a comment

very nice!

ashwinb (Contributor) commented Sep 10, 2025

Wait, we need to agree about the models we use for vLLM testing with others who do this also -- cc @derekhiggins @bbrowning if there are thoughts.

derekhiggins (Contributor)

> Wait, we need to agree about the models we use for vLLM testing with others who do this also -- cc @derekhiggins @bbrowning if there are thoughts.

With #3128 my intent was to test the provider code; so long as the model used can fit on the CI workers and is smart enough to pass the integration tests, I've no particular preference.

In the CI job I've been hoping to merge I'm using Llama-3.2-1B-Instruct, but I'm not against changing it if another model makes more sense (the model is baked into the vLLM image).

mattf (Collaborator, Author) commented Sep 11, 2025

> Wait, we need to agree about the models we use for vLLM testing with others who do this also -- cc @derekhiggins @bbrowning if there are thoughts.
>
> With #3128 my intent was to test the provider code; so long as the model used can fit on the CI workers and is smart enough to pass the integration tests, I've no particular preference.
>
> In the CI job I've been hoping to merge I'm using Llama-3.2-1B-Instruct, but I'm not against changing it if another model makes more sense (the model is baked into the vLLM image).

What I could have used was a way to override the suite model.

mattf merged commit 8ef1189 into llamastack:main Sep 11, 2025
22 checks passed
iamemilio pushed a commit to iamemilio/llama-stack that referenced this pull request Sep 24, 2025
…ompat functions (llamastack#3404)
