Yup, it's built on top of ggml, and although the Rust implementation deviates slightly from llama.cpp, it's likely portable. We could also use Rust macros to deliver dedicated aarch64 GPU inferencing.
Since rustformers/llm@47a41c9, Metal support for llama-based architectures is in the main branch. It can be enabled by compiling with the `metal` feature flag (`--features metal`) and setting `use-gpu` to true in the model config.
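Concretely, enabling this might look something like the following sketch. The `--features metal` flag and `use-gpu` setting come from the comment above; the exact binary name and config file layout are illustrative and should be checked against the rustformers/llm docs:

```shell
# Build the crate with the Metal feature enabled
cargo build --release --features metal

# Illustrative model config fragment (key name from the comment above):
#   use-gpu = true
```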
The latest version of llama.cpp has seen huge performance improvements on Mac hardware.
Could this be implemented here as well?
ggerganov/llama.cpp#1642