Hi guys!
I've noticed that GGUF model inference is much faster on my Mac M3 than on my college's cluster, even when I request 8 or 16 cores. Both systems run the same GGUF model version and the same dependencies. Inference on the Mac takes seconds, while on the cluster it can take up to an hour to generate a response.
Are there known issues with GGUF models on certain CPUs? Any help would be greatly appreciated. Thank you!
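One thing worth ruling out before blaming the CPU: on a shared cluster, a thread count derived from the node's total core count can far exceed the 8 or 16 cores the scheduler actually granted, and that oversubscription slows CPU inference drastically. The sketch below picks a thread count from the scheduler's allocation instead; it assumes a SLURM-managed cluster (`SLURM_CPUS_PER_TASK` is the assumed environment variable, so adjust for your scheduler) and is a diagnostic sketch, not a definitive fix.

```python
import os

def pick_n_threads():
    """Return a thread count matching the cores the scheduler allocated,
    falling back to the node's core count when no allocation is visible.

    On a shared cluster, defaulting to the full node core count can
    oversubscribe a small allocation and badly slow CPU inference.
    SLURM_CPUS_PER_TASK is an assumption here; other schedulers expose
    the allocation under different variable names."""
    allocated = os.environ.get("SLURM_CPUS_PER_TASK")
    if allocated is not None:
        return int(allocated)
    return os.cpu_count() or 1
```

The returned value would then be passed to whatever runs the GGUF model (for example, a hypothetical `n_threads=pick_n_threads()` argument in your inference setup), so the worker threads match the cores you were actually given.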
Repository owner locked and limited conversation to collaborators on Jun 3, 2024.