Hi,
I've been testing your vLLM backend and have found the prefill phase to be quite slow.
I understand that vLLM enables torch.compile by default. However, in my tests, I've observed that the prefill time is nearly the same whether torch.compile is enabled or disabled.
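Here is a minimal sketch of how I compared the two cases (the model name and prompt are placeholders for my actual setup; I use a long prompt with `max_tokens=1` so the measured latency is dominated by prefill, and disable prefix caching so the timed run actually re-runs prefill):

```python
import sys
import time

from vllm import LLM, SamplingParams

# Placeholder model; my real tests use a different checkpoint.
MODEL = "meta-llama/Llama-3.1-8B-Instruct"

# Run as `python bench.py eager` to disable torch.compile (enforce_eager=True),
# or with no argument for vLLM's compiled default.
eager = len(sys.argv) > 1 and sys.argv[1] == "eager"
llm = LLM(model=MODEL, enforce_eager=eager, enable_prefix_caching=False)

# Long prompt + max_tokens=1 so latency is dominated by prefill.
prompt = "The quick brown fox jumps over the lazy dog. " * 500
params = SamplingParams(max_tokens=1)

llm.generate(prompt, params)  # warm-up (triggers compilation / graph capture)
start = time.perf_counter()
llm.generate(prompt, params)
elapsed = time.perf_counter() - start
print(f"{'eager' if eager else 'compiled'} prefill latency: {elapsed:.3f}s")
```

With this setup, the two runs report nearly identical prefill latencies.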
Is there any way to get a significant prefill speedup with torch.compile enabled? Alternatively, are there other configurations or optimizations I can apply to improve prefill performance in this vLLM version?
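As an example of the kind of tuning I am looking for, one knob I have already tried is raising the chunked-prefill token budget. This is just a sketch; 8192 is only the value I experimented with, and `MODEL` is again a placeholder:

```python
from vllm import LLM

# Raise the per-step token budget so chunked prefill processes more prompt
# tokens per scheduler step. 8192 is just the value I tried, not a
# recommendation; MODEL is a placeholder for my checkpoint.
MODEL = "meta-llama/Llama-3.1-8B-Instruct"
llm = LLM(model=MODEL, max_num_batched_tokens=8192)
```

This did not meaningfully change the prefill time either.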
For context, I am using an H20 GPU.
Thanks for your help!