Hi,
I've been testing your vLLM backend and have found the prefill phase to be quite slow.
I understand that vLLM enables torch.compile by default. However, in my tests, I've observed that the prefill time is nearly the same whether torch.compile is enabled or disabled.
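Here is a minimal sketch of how I compared the two cases (the model name and prompt are placeholders for my actual setup; I use a long prompt with `max_tokens=1` so the measured latency is dominated by prefill, and disable prefix caching so the timed run actually re-runs prefill):

```python
import sys
import time

from vllm import LLM, SamplingParams

# Placeholder model; my real tests use a different checkpoint.
MODEL = "meta-llama/Llama-3.1-8B-Instruct"

# Run as `python bench.py eager` to disable torch.compile (enforce_eager=True),
# or with no argument for vLLM's compiled default.
eager = len(sys.argv) > 1 and sys.argv[1] == "eager"
llm = LLM(model=MODEL, enforce_eager=eager, enable_prefix_caching=False)

# Long prompt + max_tokens=1 so latency is dominated by prefill.
prompt = "The quick brown fox jumps over the lazy dog. " * 500
params = SamplingParams(max_tokens=1)

llm.generate(prompt, params)  # warm-up (triggers compilation / graph capture)
start = time.perf_counter()
llm.generate(prompt, params)
elapsed = time.perf_counter() - start
print(f"{'eager' if eager else 'compiled'} prefill latency: {elapsed:.3f}s")
```

With this setup, the two runs report nearly identical prefill latencies.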
Is there any way to get a significant prefill speedup with torch.compile enabled? Alternatively, are there other configurations or optimizations I can apply to improve prefill performance in this vLLM version?
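As an example of the kind of tuning I am looking for, one knob I have already tried is raising the chunked-prefill token budget. This is just a sketch; 8192 is only the value I experimented with, and `MODEL` is again a placeholder:

```python
from vllm import LLM

# Raise the per-step token budget so chunked prefill processes more prompt
# tokens per scheduler step. 8192 is just the value I tried, not a
# recommendation; MODEL is a placeholder for my checkpoint.
MODEL = "meta-llama/Llama-3.1-8B-Instruct"
llm = LLM(model=MODEL, max_num_batched_tokens=8192)
```

This did not meaningfully change the prefill time either.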
For context, I am using an H20 GPU.
Thanks for your help!