Conversation

Contributor

@eicherseiji eicherseiji commented Jun 14, 2025

Why are these changes needed?

Failing due to #53824. This is a relatively rare configuration, and the failure is likely downstream of vLLM: vllm-project/vllm#18851.

Related issue number

Revisit once #53824 and vllm-project/vllm#18851 are closed, or once PyTorch 2.8 ships, at which point vLLM will no longer need to monkeypatch to access torch.compile.

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Copilot AI review requested due to automatic review settings June 14, 2025 01:15

Copilot AI left a comment

Pull Request Overview

This PR temporarily disables the TP=2 VLM batch test due to a known torch_compile_cache issue (ray#53824).

  • Commented out the Qwen/Qwen2.5-VL-3B-Instruct test case
  • Added a TODO to re-enable the test once the underlying issue is resolved

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
@eicherseiji eicherseiji added the go add ONLY when ready to merge, run all tests label Jun 14, 2025
@eicherseiji eicherseiji requested a review from kouroshHakha June 16, 2025 16:20
Comment on lines +196 to +199
# todo(seiji): Commenting out due to https://github.com/ray-project/ray/issues/53824
# Need to follow up once torch_compile_cache issue is fixed or PyTorch 2.8
if model_source == "mistral-community/pixtral-12b":
pytest.skip("Skipping test due to torch_compile_cache issue")
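The quoted hunk hard-codes the skip condition inline. As a minimal sketch (the helper and set below are hypothetical, not part of the PR), the condition could be centralized so that re-enabling the model later is a one-line change:

```python
# Hypothetical helper, not in the PR: track models currently hit by the
# torch_compile_cache bug (ray#53824) in one place.
KNOWN_COMPILE_CACHE_FAILURES = {
    "mistral-community/pixtral-12b",
}

def should_skip(model_source: str) -> bool:
    """Return True if this model currently triggers the torch_compile_cache issue."""
    return model_source in KNOWN_COMPILE_CACHE_FAILURES
```

The test body would then call `pytest.skip(...)` whenever `should_skip(model_source)` is true, instead of comparing against a string literal at each site.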
Contributor

question: Would enforce_eager=True in the engine kwargs make it work? I think that'd be better than skipping?
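For context, the suggestion amounts to something like the following sketch. The config class and field names are assumptions about the Ray Data LLM batch API, not code from this PR:

```python
# Sketch only: pass enforce_eager=True through the engine kwargs so vLLM
# runs eagerly and skips torch.compile / CUDA graph capture.
# Class and field names below are assumed, not taken from the PR.
from ray.data.llm import vLLMEngineProcessorConfig

config = vLLMEngineProcessorConfig(
    model_source="mistral-community/pixtral-12b",
    engine_kwargs=dict(
        tensor_parallel_size=2,
        enforce_eager=True,  # bypass compilation instead of skipping the test
    ),
)
```

As the reply below shows, this was tried and still failed, so the skip was kept.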

Contributor Author

Gave it a shot; it failed with RuntimeError: Worker failed with error 'NoTritonConfigsError: No valid triton configs.'

Traceback:

(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) `ptxas` stderr: [repeated 8x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) ptxas /tmp/tmpbc993top.ptx, line 113; fatal   : Parsing error near '.': syntax error [repeated 8x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) ptxas fatal   : Ptx assembly aborted due to errors [repeated 8x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) Repro command: /home/ray/anaconda3/lib/python3.11/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_89 /tmp/tmpbc993top.ptx -o /tmp/tmpbc993top.ptx.o [repeated 8x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) Traceback (most recent call last): [repeated 44x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) (VllmWorker rank=0 pid=12628) [rank0]:E0616 09:44:59.624000 12628 site-packages/torch/_inductor/runtime/triton_heuristics.py:539] [0/0]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [repeated 6x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) Task exception was never retrieved [repeated 25x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) future: <Task finished name='Task-4' coro=<vLLMEngineWrapper.generate_async() done, defined at /home/ray/anaconda3/lib/python3.11/site-packages/ray/llm/_internal/batch/stages/vllm_engine_stage.py:303> exception=EngineDeadError('EngineCore encountered an issue. See stack trace (above) for the root cause.')> [repeated 25x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/llm/_internal/batch/stages/vllm_engine_stage.py", line 317, in generate_async [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     output = await self._generate_async(request) [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/llm/_internal/batch/stages/vllm_engine_stage.py", line 399, in generate_async_v1 [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     async for request_output in stream: [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 315, in generate [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     out = q.get_nowait() or await q.get() [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)                             ^^^^^^^^^^^^^ [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/output_processor.py", line 51, in get [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     raise output [repeated 433x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 366, in output_handler [repeated 26x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     outputs = await engine_core.get_output_async() [repeated 26x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [repeated 30x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 806, in get_output_async [repeated 26x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     raise self._format_exception(outputs) from None [repeated 26x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause. [repeated 26x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "python/ray/_raylet.pyx", line 1392, in ray._raylet.execute_streaming_generator_sync [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     for output in gen: [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/data/_internal/execution/operators/actor_pool_map_operator.py", line 469, in submit [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     yield from _map_task( [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_operator.py", line 557, in _map_task [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     for b_out in map_transformer.apply_transform(iter(blocks), ctx): [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 327, in __call__ [repeated 48x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     for data in iter: [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 211, in _udf_timed_iter [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     output = next(input) [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)              ^^^^^^^^^^^ [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     yield from self._batch_fn(input, ctx) [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 485, in transform_fn [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     raise out_item [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) Process EngineCore_0:
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     self.run()
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     self._target(*self._args, **self._kwargs)
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 493, in run_engine_core [repeated 2x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     raise e
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     engine_core.run_busy_loop()
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 520, in run_busy_loop
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     self._process_engine_step()
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 545, in _process_engine_step
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     outputs = self.step_fn()
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)               ^^^^^^^^^^^^^^
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 226, in step
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     model_output = self.execute_model(scheduler_output)
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 158, in execute_model [repeated 3x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     raise err
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     return self.model_executor.execute_model(scheduler_output)
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     (output, ) = self.collective_rpc("execute_model",
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 215, in collective_rpc
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     result = get_response(w, dequeue_timeout)
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 202, in get_response
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)     raise RuntimeError(
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) RuntimeError: Worker failed with error 'NoTritonConfigsError: No valid triton configs. PTXASError: PTXAS error: Internal Triton PTX codegen error
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
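Per the last line of the log, a follow-up debugging run could set the Dynamo environment variables it suggests. The script name below is a placeholder for whatever entry point launches the failing batch job:

```shell
# Placeholder command: re-run the failing job with verbose Dynamo logs,
# as suggested by the error message above.
TORCHDYNAMO_VERBOSE=1 TORCH_LOGS="+dynamo" python run_vlm_batch_test.py
```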

@kouroshHakha kouroshHakha changed the title Disable TP=2 VLM batch test [Serve.llm] Disable TP=2 VLM batch test Jun 16, 2025
@kouroshHakha kouroshHakha merged commit 4b1c11f into ray-project:master Jun 16, 2025
6 checks passed
elliot-barn pushed a commit that referenced this pull request Jun 18, 2025
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
minerharry pushed a commit to minerharry/ray that referenced this pull request Jun 27, 2025
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
elliot-barn pushed a commit that referenced this pull request Jul 2, 2025
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>