Problem Description
docker run --gpus all --rm --name qwen36-35 -p 8080:8000 -v ~/.cache/huggingface:/root/.cache/huggingface --ipc=host -e HUGGING_FACE_HUB_TOKEN="$HUGGING_FACE_HUB_TOKEN" -e VLLM_ATTENTION_BACKEND=FLASH_ATTN vllm-nightly-transformers-main --model Intel/Qwen3.6-35B-A3B-int4-AutoRound --served-model-name "qwen/qwen36-35b" --trust-remote-code --api-key mumu-102495153 --max-model-len 192382 --max-num-seqs 4 --gpu-memory-utilization 0.98 --enable-auto-tool-choice --tool-call-parser qwen3_coder --kv-cache-dtype fp8 --reasoning-parser qwen3 --max-num-batched-tokens 8192 --enable-prefix-caching
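For reference, a minimal request against the server started by the command above can be sketched as follows. The served model name, API key, and host port are taken from the docker run flags; the prompt itself is a placeholder, not a known trigger for the loop.

```python
import json

# Minimal chat-completions payload for the vLLM server launched above.
# Model name and API key match the docker run flags; the prompt is a placeholder.
payload = {
    "model": "qwen/qwen36-35b",
    "messages": [
        {"role": "user", "content": "Hello, can you introduce yourself?"}
    ],
    "max_tokens": 256,
}

headers = {
    "Authorization": "Bearer mumu-102495153",
    "Content-Type": "application/json",
}

# POST this body to http://localhost:8080/v1/chat/completions
# (host port 8080 maps to 8000 inside the container).
body = json.dumps(payload)
```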
lianglv
Intel org
2 days ago
We don't have your docker image. Could you provide minimal steps to reproduce the infinite loop issue?
wenhuach
Intel org
2 days ago
Does this issue occur for all prompts, or only for specific ones? We would appreciate it if you could share some example prompts that reproduce the issue.
pathosethoslogos
2 days ago
edited 2 days ago
I can confirm that this is indeed the case.
You can tell from the model's high download count and its low number of likes.
zsmweb
about 21 hours ago
I run the model on an RTX 3090 with 24GB. Not every conversation gets stuck in a loop; I use CherryStudio to check the weather.
Here is my Dockerfile.
cat Dockerfile
FROM nvidia/cuda:12.4.1-devel-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive
ENV PIP_NO_CACHE_DIR=1
ENV PYTHONUNBUFFERED=1
RUN apt-get update && apt-get install -y \
    python3 python3-pip git \
    && rm -rf /var/lib/apt/lists/*
RUN python3 -m pip install --upgrade pip setuptools wheel
RUN python3 -m pip install -U \
    vllm --pre \
    --index-url https://pypi.org/simple \
    --extra-index-url https://wheels.vllm.ai/nightly
RUN python3 -m pip install -U \
    git+https://github.com/huggingface/transformers.git
RUN python3 -m pip install conch-triton-kernels
ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]
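Presumably the image referenced by the docker run command at the top is built from this Dockerfile; the tag below matches the image name used there:

```shell
# Build the image under the tag the docker run command expects
docker build -t vllm-nightly-transformers-main .
```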
Reproduction Steps
~
Environment Information
No response
Error Logs
Additional Context
No response