Hi guys!
I've noticed that GGUF model inference is much faster on my Mac M3 than on my college's cluster, even when I request 8 or 16 cores. Both systems run the same GGUF model version and the same dependencies. Inference on the Mac takes seconds, while on the cluster it can take up to an hour to generate a response.
Are there known issues with GGUF models on certain CPUs? Any help would be greatly appreciated. Thank you!
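One thing worth ruling out before blaming the CPU: on a shared cluster, a thread count derived from the node's total core count can far exceed the 8 or 16 cores the scheduler actually granted, and that oversubscription slows CPU inference drastically. The sketch below picks a thread count from the scheduler's allocation instead; it assumes a SLURM-managed cluster (`SLURM_CPUS_PER_TASK` is the assumed environment variable, so adjust for your scheduler) and is a diagnostic sketch, not a definitive fix.

```python
import os

def pick_n_threads():
    """Return a thread count matching the cores the scheduler allocated,
    falling back to the node's core count when no allocation is visible.

    On a shared cluster, defaulting to the full node core count can
    oversubscribe a small allocation and badly slow CPU inference.
    SLURM_CPUS_PER_TASK is an assumption here; other schedulers expose
    the allocation under different variable names."""
    allocated = os.environ.get("SLURM_CPUS_PER_TASK")
    if allocated is not None:
        return int(allocated)
    return os.cpu_count() or 1
```

The returned value would then be passed to whatever runs the GGUF model (for example, a hypothetical `n_threads=pick_n_threads()` argument in your inference setup), so the worker threads match the cores you were actually given.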
Repository owner locked and limited conversation to collaborators on Jun 3, 2024.