We have seen several examples working on older cards. Likely we just need to turn off FlashInfer and CUTLASS, and also follow the instructions to build TVM from source.
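A rough sketch of that workflow, as a setup fragment rather than a tested recipe: the repository layout, cmake flag names, and the idea of disabling FlashInfer/CUTLASS for a pre-Ampere card are assumptions based on the TVM/MLC build conventions, so check the current TVM install-from-source and MLC LLM docs before running it.

```shell
# Sketch: build TVM from source with CUDA, FlashInfer/CUTLASS disabled
# (flag names illustrative; verify against the current config.cmake)
git clone --recursive https://github.com/mlc-ai/mlc-llm.git
cd mlc-llm/3rdparty/tvm
mkdir build && cp cmake/config.cmake build/ && cd build
echo 'set(USE_CUDA ON)' >> config.cmake
echo 'set(USE_FLASHINFER OFF)' >> config.cmake   # FlashInfer targets newer archs
echo 'set(USE_CUTLASS OFF)' >> config.cmake      # likewise for CUTLASS kernels
cmake .. && make -j"$(nproc)"
```

The point of disabling those two components is that they ship kernels tuned for newer compute capabilities; a Pascal card like the P40 (compute capability 6.1) would fall back to TVM's own compiled kernels.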
❓ General Questions
What's the performance of the P40 using mlc-llm + CUDA?
mlc-llm is the fastest inference engine, since it compiles the LLM to take advantage of hardware-specific optimizations, and the P40 is the most cost-effective hardware.
The P40 has 3840 CUDA cores:
https://resources.nvidia.com/en-us-virtualization-and-gpus/p40-datasheet
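As a rough upper bound on what the card can do, the datasheet figures give the theoretical FP32 throughput: cores × boost clock × 2 FLOPs per fused multiply-add. A minimal sketch of that arithmetic, using the datasheet's 1531 MHz boost clock:

```python
# Theoretical peak FP32 throughput of the Tesla P40, from datasheet figures
cuda_cores = 3840
boost_clock_hz = 1.531e9       # 1531 MHz boost clock
flops_per_core_per_cycle = 2   # one fused multiply-add counts as 2 FLOPs

peak_tflops = cuda_cores * boost_clock_hz * flops_per_core_per_cycle / 1e12
print(f"{peak_tflops:.2f} TFLOPS FP32")  # ~11.76, matching the ~12 TFLOPS the datasheet quotes
```

Note that Pascal's native FP16 rate is severely limited (roughly 1/64 of FP32 on this chip), so on a P40 the FP32 and INT8 paths are what matter for real throughput.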
Did you have difficulties using the P40 via CUDA?