
Error: Benchmarking gpt-oss-120b crashes because of overly strict data type validation #24

@mosalov

Description


Steps to reproduce

Get the model

Download from HF: https://huggingface.co/openai/gpt-oss-120b.
Set the path:

export MODEL_PATH=...
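
For reference, the download step above can be scripted with huggingface_hub; a minimal sketch (the local_dir is a placeholder, and MODEL_PATH should point at it):

# Sketch: fetch the checkpoint with huggingface_hub (path is a placeholder).
from huggingface_hub import snapshot_download

snapshot_download(repo_id="openai/gpt-oss-120b", local_dir="/models/gpt-oss-120b")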

Apply the workaround for openai_harmony

See details here: vllm-project/vllm#22525.
Set the path:

export ENCODINGS_PATH=...
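
Per the linked issue, the workaround amounts to caching the tiktoken encoding file locally so openai_harmony does not fetch it at startup. A hedged sketch (the file name and URL are assumed from tiktoken's public o200k_base encoding; follow the linked issue if they differ):

# Sketch: pre-download the o200k_base encoding into ENCODINGS_PATH.
import os
import urllib.request

url = "https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken"
dest = os.path.join(os.environ["ENCODINGS_PATH"], "o200k_base.tiktoken")
urllib.request.urlretrieve(url, dest)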

Run the server

docker run \
	--rm -it --network host \
	-v $MODEL_PATH:$MODEL_PATH:ro \
	-v $ENCODINGS_PATH:$ENCODINGS_PATH:ro \
	--env TIKTOKEN_ENCODINGS_BASE=$ENCODINGS_PATH \
	vllm/vllm-openai:v0.11.0 \
		--model $MODEL_PATH \
		--tensor-parallel-size 2 \
		--gpu-memory-utilization=0.9 \
		--max-num-seqs 64 \
		--compilation-config '{"cudagraph_mode":"PIECEWISE"}' \
		--async-scheduling \
		--no-enable-prefix-caching \
		--cuda-graph-sizes 2048 \
		--max-num-batched-tokens 8192 \
		--max-model-len 10240 \
		--swap-space 16
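
Before benchmarking, the behaviour that later trips the validator can be observed directly against the server's OpenAI-compatible endpoint; a minimal sketch (requests assumed installed, prompt and limits arbitrary):

import os

import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": os.environ["MODEL_PATH"],
        "messages": [{"role": "user", "content": "Say hi."}],
        "max_tokens": 16,
    },
    timeout=120,
)
# For gpt-oss-120b, "content" in choices[0].message can come back as null,
# e.g. when the token budget is spent on reasoning output; that null is the
# value the benchmark's validator rejects below.
print(resp.json()["choices"][0]["message"])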

Run the benchmark

inference-endpoint benchmark offline \
	--endpoint http://localhost:8000 \
	--model $MODEL_PATH \
	--dataset tests/datasets/dummy_1k.pkl

Expectation

The benchmark runs to completion and exits successfully.

Reality

The benchmark fails with:

Exception: 1 validation error for CreateChatCompletionResponse
choices.0.message.content
Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
For further information visit https://errors.pydantic.dev/2.11/v/string_type

Suggested fix

Relax the validation to accept a null content; the OpenAI API spec itself marks the response message's content as nullable (e.g. when the model returns only tool calls):

diff --git a/src/inference_endpoint/openai/openai_types_gen.py b/src/inference_endpoint/openai/openai_types_gen.py
index c28e5a0..8465917 100644
--- a/src/inference_endpoint/openai/openai_types_gen.py
+++ b/src/inference_endpoint/openai/openai_types_gen.py
@@ -1005,7 +1005,7 @@ class Audio1(BaseModel):


 class ChatCompletionResponseMessage(BaseModel):
-    content: str = Field(..., description='The contents of the message.')
+    content: Optional[str] = Field(None, description='The contents of the message.')
     refusal: str = Field(..., description='The refusal message generated by the model.')
     tool_calls: Optional[ChatCompletionMessageToolCalls] = None
     annotations: Optional[List[Annotation]] = Field(

With this fix applied, the benchmark runs successfully.
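
For completeness, a standalone sketch of the failure and of the relaxed model; StrictMessage and RelaxedMessage are hypothetical stand-ins for the generated ChatCompletionResponseMessage:

from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class StrictMessage(BaseModel):
    # Current generated model: content must be a string.
    content: str = Field(..., description='The contents of the message.')


class RelaxedMessage(BaseModel):
    # Suggested fix: content is nullable, matching the OpenAI spec.
    content: Optional[str] = Field(None, description='The contents of the message.')


payload = {"content": None}  # what the server returns for gpt-oss-120b

try:
    StrictMessage.model_validate(payload)
except ValidationError as e:
    print(e)  # Input should be a valid string [type=string_type, ...]

print(RelaxedMessage.model_validate(payload))  # content=None, accepted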

Labels

priority: P1 (High, must address this cycle), type: bug (Something isn't working)
