Conversation

Contributor

@eicherseiji eicherseiji commented Jun 14, 2025

Why are these changes needed?

Failing due to #53824. This is a relatively rare configuration, and the failure is likely downstream of vLLM: vllm-project/vllm#18851.

Related issue number

Revisit once #53824 and vllm-project/vllm#18851 are closed, or once PyTorch 2.8 ships, at which point vLLM will no longer need to monkeypatch to access torch.compile.

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Copilot AI review requested due to automatic review settings June 14, 2025 01:15

Copilot AI left a comment

Pull Request Overview

This PR temporarily disables the TP=2 VLM batch test due to a known torch_compile_cache issue (ray#53824).

  • Commented out the Qwen/Qwen2.5-VL-3B-Instruct test case
  • Added a TODO to re-enable the test once the underlying issue is resolved

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
@eicherseiji eicherseiji added the go add ONLY when ready to merge, run all tests label Jun 14, 2025
@eicherseiji eicherseiji requested a review from kouroshHakha June 16, 2025 16:20
Comment on lines +196 to +199
# todo(seiji): Commenting out due to https://github.com/ray-project/ray/issues/53824
# Need to follow up once torch_compile_cache issue is fixed or PyTorch 2.8
if model_source == "mistral-community/pixtral-12b":
pytest.skip("Skipping test due to torch_compile_cache issue")
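The quoted hunk hard-codes the skip condition inline. As a minimal sketch (the helper and set below are hypothetical, not part of the PR), the condition could be centralized so that re-enabling the model later is a one-line change:

```python
# Hypothetical helper, not in the PR: track models currently hit by the
# torch_compile_cache bug (ray#53824) in one place.
KNOWN_COMPILE_CACHE_FAILURES = {
    "mistral-community/pixtral-12b",
}

def should_skip(model_source: str) -> bool:
    """Return True if this model currently triggers the torch_compile_cache issue."""
    return model_source in KNOWN_COMPILE_CACHE_FAILURES
```

The test body would then call `pytest.skip(...)` whenever `should_skip(model_source)` is true, instead of comparing against a string literal at each site.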
Contributor

question: Would enforce_eager=True in the engine kwargs make it work? I think that'd be better than skipping?
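For context, the suggestion amounts to something like the following sketch. The config class and field names are assumptions about the Ray Data LLM batch API, not code from this PR:

```python
# Sketch only: pass enforce_eager=True through the engine kwargs so vLLM
# runs eagerly and skips torch.compile / CUDA graph capture.
# Class and field names below are assumed, not taken from the PR.
from ray.data.llm import vLLMEngineProcessorConfig

config = vLLMEngineProcessorConfig(
    model_source="mistral-community/pixtral-12b",
    engine_kwargs=dict(
        tensor_parallel_size=2,
        enforce_eager=True,  # bypass compilation instead of skipping the test
    ),
)
```

As the reply below shows, this was tried and still failed, so the skip was kept.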

Contributor Author

Gave it a shot; it failed with RuntimeError: Worker failed with error 'NoTritonConfigsError: No valid triton configs.'

Traceback:

(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) `ptxas` stderr: [repeated 8x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) ptxas /tmp/tmpbc993top.ptx, line 113; fatal   : Parsing error near '.': syntax error [repeated 8x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) ptxas fatal   : Ptx assembly aborted due to errors [repeated 8x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) Repro command: /home/ray/anaconda3/lib/python3.11/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_89 /tmp/tmpbc993top.ptx -o /tmp/tmpbc993top.ptx.o [repeated 8x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) Traceback (most recent call last): [repeated 44x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) (VllmWorker rank=0 pid=12628) [rank0]:E0616 09:44:59.624000 12628 site-packages/torch/_inductor/runtime/triton_heuristics.py:539] [0/0]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [repeated 6x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) Task exception was never retrieved [repeated 25x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) future: <Task finished name='Task-4' coro=<vLLMEngineWrapper.generate_async() done, defined at /home/ray/anaconda3/lib/python3.11/site-packages/ray/llm/_internal/batch/stages/vllm_engine_stage.py:303> exception=EngineDeadError('EngineCore encountered an issue. See stack trace (above) for the root cause.')> [repeated 25x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/llm/_internal/batch/stages/vllm_engine_stage.py", line 317, in generate_async [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     output = await self._generate_async(request) [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/llm/_internal/batch/stages/vllm_engine_stage.py", line 399, in generate_async_v1 [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     async for request_output in stream: [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 315, in generate [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     out = q.get_nowait() or await q.get() [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)                             ^^^^^^^^^^^^^ [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/output_processor.py", line 51, in get [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     raise output [repeated 433x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 366, in output_handler [repeated 26x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     outputs = await engine_core.get_output_async() [repeated 26x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [repeated 30x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 806, in get_output_async [repeated 26x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     raise self._format_exception(outputs) from None [repeated 26x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause. [repeated 26x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "python/ray/_raylet.pyx", line 1392, in ray._raylet.execute_streaming_generator_sync [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     for output in gen: [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/data/_internal/execution/operators/actor_pool_map_operator.py", line 469, in submit [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     yield from _map_task( [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_operator.py", line 557, in _map_task [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     for b_out in map_transformer.apply_transform(iter(blocks), ctx): [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 327, in __call__ [repeated 48x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     for data in iter: [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 211, in _udf_timed_iter [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     output = next(input) [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)              ^^^^^^^^^^^ [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     yield from self._batch_fn(input, ctx) [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 485, in transform_fn [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     raise out_item [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) Process EngineCore_0:
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     self.run()
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     self._target(*self._args, **self._kwargs)
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 493, in run_engine_core [repeated 2x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     raise e
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     engine_core.run_busy_loop()
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 520, in run_busy_loop
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     self._process_engine_step()
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 545, in _process_engine_step
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     outputs = self.step_fn()
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)               ^^^^^^^^^^^^^^
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 226, in step
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     model_output = self.execute_model(scheduler_output)
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 158, in execute_model [repeated 3x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     raise err
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     return self.model_executor.execute_model(scheduler_output)
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     (output, ) = self.collective_rpc("execute_model",
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 215, in collective_rpc
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     result = get_response(w, dequeue_timeout)
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 202, in get_response
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     raise RuntimeError(
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) RuntimeError: Worker failed with error 'NoTritonConfigsError: No valid triton configs. PTXASError: PTXAS error: Internal Triton PTX codegen error
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
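Per the last line of the log, a follow-up debugging run could set the Dynamo environment variables it suggests. The script name below is a placeholder for whatever entry point launches the failing batch job:

```shell
# Placeholder command: re-run the failing job with verbose Dynamo logs,
# as suggested by the error message above.
TORCHDYNAMO_VERBOSE=1 TORCH_LOGS="+dynamo" python run_vlm_batch_test.py
```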

@kouroshHakha kouroshHakha changed the title Disable TP=2 VLM batch test [Serve.llm] Disable TP=2 VLM batch test Jun 16, 2025
@kouroshHakha kouroshHakha merged commit 4b1c11f into ray-project:master Jun 16, 2025
6 checks passed
elliot-barn pushed a commit that referenced this pull request Jun 18, 2025
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
minerharry pushed a commit to minerharry/ray that referenced this pull request Jun 27, 2025
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
elliot-barn pushed a commit that referenced this pull request Jul 2, 2025
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>