Steps to reproduce
Get the model
Download from HF: https://huggingface.co/openai/gpt-oss-120b.
Set the path:

```shell
export MODEL_PATH=...
```
Apply the workaround for openai_harmony
See details here: vllm-project/vllm#22525.
Set the path:
```shell
export ENCODINGS_PATH=...
```
Run the server
```shell
docker run \
  --rm -it --network host \
  -v $MODEL_PATH:$MODEL_PATH:ro \
  -v $ENCODINGS_PATH:$ENCODINGS_PATH:ro \
  --env TIKTOKEN_ENCODINGS_BASE=$ENCODINGS_PATH \
  vllm/vllm-openai:v0.11.0 \
  --model $MODEL_PATH \
  --tensor-parallel-size 2 \
  --gpu-memory-utilization=0.9 \
  --max-num-seqs 64 \
  --compilation-config '{"cudagraph_mode":"PIECEWISE"}' \
  --async-scheduling \
  --no-enable-prefix-caching \
  --cuda-graph-sizes 2048 \
  --max-num-batched-tokens 8192 \
  --max-model-len 10240 \
  --swap-space 16
```
Run the benchmark
```shell
inference-endpoint benchmark offline \
  --endpoint http://localhost:8000 \
  --model $MODEL_PATH \
  --dataset tests/datasets/dummy_1k.pkl
```
Expectation
The benchmark runs and finishes correctly.
Reality
The benchmark fails with:
```
Exception: 1 validation error for CreateChatCompletionResponse
choices.0.message.content
  Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.11/v/string_type
```
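The failure can be reproduced in isolation with a minimal pydantic v2 model (a sketch; `StrictMessage` is a hypothetical stand-in mirroring the `content` field of the generated `ChatCompletionResponseMessage`):

```python
from pydantic import BaseModel, Field, ValidationError

# Hypothetical stand-in for the generated ChatCompletionResponseMessage:
# content is declared as a required, non-nullable string.
class StrictMessage(BaseModel):
    content: str = Field(..., description='The contents of the message.')

# The server can return a message whose content is null; validation then fails
# with the same string_type error seen in the benchmark.
try:
    StrictMessage(content=None)
except ValidationError as e:
    print(e.errors()[0]['type'])  # string_type
```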
Suggested fix
Relax the validation so that `content` may be `None`:
```diff
diff --git a/src/inference_endpoint/openai/openai_types_gen.py b/src/inference_endpoint/openai/openai_types_gen.py
index c28e5a0..8465917 100644
--- a/src/inference_endpoint/openai/openai_types_gen.py
+++ b/src/inference_endpoint/openai/openai_types_gen.py
@@ -1005,7 +1005,7 @@ class Audio1(BaseModel):
 class ChatCompletionResponseMessage(BaseModel):
-    content: str = Field(..., description='The contents of the message.')
+    content: Optional[str] = Field(None, description='The contents of the message.')
     refusal: str = Field(..., description='The refusal message generated by the model.')
     tool_calls: Optional[ChatCompletionMessageToolCalls] = None
     annotations: Optional[List[Annotation]] = Field(
```
With the fix I can successfully run the benchmark.
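As a sanity check, the relaxed field accepts both a string and `None` (a minimal pydantic v2 sketch; `PatchedMessage` is a hypothetical stand-in for the patched class):

```python
from typing import Optional

from pydantic import BaseModel, Field

# Hypothetical stand-in for the patched ChatCompletionResponseMessage:
# content now defaults to None and allows null values.
class PatchedMessage(BaseModel):
    content: Optional[str] = Field(None, description='The contents of the message.')

assert PatchedMessage(content=None).content is None  # null content now validates
assert PatchedMessage(content='hi').content == 'hi'  # strings still work
```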