[Serve.llm] Disable TP=2 VLM batch test #53825
Conversation
Pull Request Overview
This PR temporarily disables the TP=2 VLM batch test due to a known torch_compile_cache issue (ray#53824).
- Commented out the Qwen/Qwen2.5-VL-3B-Instruct test case
- Added a TODO to re-enable the test once the underlying issue is resolved
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Force-pushed from cc3abea to 22a04b1
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
# todo(seiji): Commenting out due to https://github.com/ray-project/ray/issues/53824
# Need to follow up once torch_compile_cache issue is fixed or PyTorch 2.8
if model_source == "mistral-community/pixtral-12b":
    pytest.skip("Skipping test due to torch_compile_cache issue")
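For context, the skip above lives inside a parametrized batch test. The sketch below only approximates that structure; the test name and parameter list are illustrative rather than the exact ones in the Ray test suite.

import pytest

@pytest.mark.parametrize(
    "model_source",
    [
        "mistral-community/pixtral-12b",
        # "Qwen/Qwen2.5-VL-3B-Instruct",  # commented out in this PR (ray#53824)
    ],
)
def test_vlm_batch_tp2(model_source):
    # todo(seiji): Commenting out due to https://github.com/ray-project/ray/issues/53824
    # Need to follow up once torch_compile_cache issue is fixed or PyTorch 2.8
    if model_source == "mistral-community/pixtral-12b":
        pytest.skip("Skipping test due to torch_compile_cache issue")
    # ... build the TP=2 vLLM batch processor and run inference on the batch ...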
question: Would enforce_eager=True in the engine kwargs make it work? I think that'd be better than skipping?
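For reference, a minimal sketch of what the suggestion would look like if the test built its engine config through the public ray.data.llm API; the class and field names here (vLLMEngineProcessorConfig, model_source, engine_kwargs) come from that API rather than from the test's own helpers, so the exact wiring may differ.

from ray.data.llm import vLLMEngineProcessorConfig

# Hypothetical config reflecting the reviewer's suggestion: keep TP=2 but
# run the model eagerly so the torch.compile / torch_compile_cache path is skipped.
config = vLLMEngineProcessorConfig(
    model_source="mistral-community/pixtral-12b",
    engine_kwargs={
        "tensor_parallel_size": 2,
        "enforce_eager": True,
    },
)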
Gave it a shot; it failed with RuntimeError: Worker failed with error 'NoTritonConfigsError: No valid triton configs.
Traceback:
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) `ptxas` stderr: [repeated 8x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) ptxas /tmp/tmpbc993top.ptx, line 113; fatal : Parsing error near '.': syntax error [repeated 8x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) ptxas fatal : Ptx assembly aborted due to errors [repeated 8x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) Repro command: /home/ray/anaconda3/lib/python3.11/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_89 /tmp/tmpbc993top.ptx -o /tmp/tmpbc993top.ptx.o [repeated 8x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) Traceback (most recent call last): [repeated 44x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) (VllmWorker rank=0 pid=12628) [rank0]:E0616 09:44:59.624000 12628 site-packages/torch/_inductor/runtime/triton_heuristics.py:539] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [repeated 6x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) Task exception was never retrieved [repeated 25x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) future: <Task finished name='Task-4' coro=<vLLMEngineWrapper.generate_async() done, defined at /home/ray/anaconda3/lib/python3.11/site-packages/ray/llm/_internal/batch/stages/vllm_engine_stage.py:303> exception=EngineDeadError('EngineCore encountered an issue. See stack trace (above) for the root cause.')> [repeated 25x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/llm/_internal/batch/stages/vllm_engine_stage.py", line 317, in generate_async [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) output = await self._generate_async(request) [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/llm/_internal/batch/stages/vllm_engine_stage.py", line 399, in generate_async_v1 [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) async for request_output in stream: [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 315, in generate [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) out = q.get_nowait() or await q.get() [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) ^^^^^^^^^^^^^ [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/output_processor.py", line 51, in get [repeated 432x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) raise output [repeated 433x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 366, in output_handler [repeated 26x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) outputs = await engine_core.get_output_async() [repeated 26x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [repeated 30x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 806, in get_output_async [repeated 26x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) raise self._format_exception(outputs) from None [repeated 26x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause. [repeated 26x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) File "python/ray/_raylet.pyx", line 1392, in ray._raylet.execute_streaming_generator_sync [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) for output in gen: [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/data/_internal/execution/operators/actor_pool_map_operator.py", line 469, in submit [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) yield from _map_task( [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_operator.py", line 557, in _map_task [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) for b_out in map_transformer.apply_transform(iter(blocks), ctx): [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 327, in __call__ [repeated 48x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) for data in iter: [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 211, in _udf_timed_iter [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) output = next(input) [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) ^^^^^^^^^^^ [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) yield from self._batch_fn(input, ctx) [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 485, in transform_fn [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) raise out_item [repeated 24x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) Process EngineCore_0:
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) File "/home/ray/anaconda3/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) self.run()
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) self._target(*self._args, **self._kwargs)
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 493, in run_engine_core [repeated 2x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) raise e
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) engine_core.run_busy_loop()
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 520, in run_busy_loop
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) self._process_engine_step()
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 545, in _process_engine_step
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) outputs = self.step_fn()
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) ^^^^^^^^^^^^^^
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 226, in step
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) model_output = self.execute_model(scheduler_output)
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 158, in execute_model [repeated 3x across cluster]
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) raise err
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) return self.model_executor.execute_model(scheduler_output)
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) (output, ) = self.collective_rpc("execute_model",
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 215, in collective_rpc
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) result = get_response(w, dequeue_timeout)
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 202, in get_response
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) raise RuntimeError(
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) RuntimeError: Worker failed with error 'NoTritonConfigsError: No valid triton configs. PTXASError: PTXAS error: Internal Triton PTX codegen error
(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90) Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
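(For anyone reproducing this: the debug flags quoted at the end of the log can be enabled before the engine workers start. A minimal sketch, using only the environment variable names from the error message above.)

import os

# Extra Dynamo/Inductor logging suggested by the error message; these must be set
# before vLLM spawns its engine workers for the verbose traces to show up.
os.environ["TORCHDYNAMO_VERBOSE"] = "1"
os.environ["TORCH_LOGS"] = "+dynamo"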
Signed-off-by: Seiji Eicher <seiji@anyscale.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Why are these changes needed?
The test is failing due to #53824. This is a relatively rare configuration and likely downstream of vLLM: vllm-project/vllm#18851.
Related issue number
Revisit once #53824 / vllm-project/vllm#18851 is closed, or once PyTorch 2.8 lands and vLLM no longer needs to monkeypatch to access torch compile.
Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.