We have seen several examples working on older cards. Likely we just need to turn off FlashInfer and CUTLASS, and also follow the instructions to build TVM from source.
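A rough sketch of that workflow, as a setup fragment rather than a tested recipe: the repository layout, cmake flag names, and the idea of disabling FlashInfer/CUTLASS for a pre-Ampere card are assumptions based on the TVM/MLC build conventions, so check the current TVM install-from-source and MLC LLM docs before running it.

```shell
# Sketch: build TVM from source with CUDA, FlashInfer/CUTLASS disabled
# (flag names illustrative; verify against the current config.cmake)
git clone --recursive https://github.com/mlc-ai/mlc-llm.git
cd mlc-llm/3rdparty/tvm
mkdir build && cp cmake/config.cmake build/ && cd build
echo 'set(USE_CUDA ON)' >> config.cmake
echo 'set(USE_FLASHINFER OFF)' >> config.cmake   # FlashInfer targets newer archs
echo 'set(USE_CUTLASS OFF)' >> config.cmake      # likewise for CUTLASS kernels
cmake .. && make -j"$(nproc)"
```

The point of disabling those two components is that they ship kernels tuned for newer compute capabilities; a Pascal card like the P40 (compute capability 6.1) would fall back to TVM's own compiled kernels.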
❓ General Questions
What's the performance of the P40 using mlc-llm + CUDA?
mlc-llm is the fastest inference engine, since it compiles the LLM to take advantage of hardware-specific optimizations, and the P40 is the most cost-effective hardware.
The P40 has 3840 CUDA cores:
https://resources.nvidia.com/en-us-virtualization-and-gpus/p40-datasheet
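As a rough upper bound on what the card can do, the datasheet figures give the theoretical FP32 throughput: cores × boost clock × 2 FLOPs per fused multiply-add. A minimal sketch of that arithmetic, using the datasheet's 1531 MHz boost clock:

```python
# Theoretical peak FP32 throughput of the Tesla P40, from datasheet figures
cuda_cores = 3840
boost_clock_hz = 1.531e9       # 1531 MHz boost clock
flops_per_core_per_cycle = 2   # one fused multiply-add counts as 2 FLOPs

peak_tflops = cuda_cores * boost_clock_hz * flops_per_core_per_cycle / 1e12
print(f"{peak_tflops:.2f} TFLOPS FP32")  # ~11.76, matching the ~12 TFLOPS the datasheet quotes
```

Note that Pascal's native FP16 rate is severely limited (roughly 1/64 of FP32 on this chip), so on a P40 the FP32 and INT8 paths are what matter for real throughput.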
Did you have difficulties using the P40 via CUDA?