Description
Is your feature request related to a problem? Please describe.
Currently we are using llama.cpp for inference on Apple Silicon. Could we use MLX, Apple's array framework, instead? It should give better performance and make better use of Apple hardware.
Describe the solution you'd like
Use MLX to run inference on Apple Silicon (see the sketch under Additional context below).
Describe alternatives you've considered
Additional context
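As a rough illustration (not a proposed implementation), here is a minimal sketch of what inference through MLX could look like on the Python side, assuming the `mlx-lm` package is used for loading and generation; the model name is only an example of an MLX-converted checkpoint from the mlx-community Hugging Face org:

```python
# Hypothetical sketch: run an LLM on Apple Silicon with MLX via the mlx-lm
# package (pip install mlx-lm). The checkpoint name below is just an example.
from mlx_lm import load, generate

# Load an MLX-format model and its tokenizer (fetched from Hugging Face).
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")

# Generate text; MLX executes on the Apple GPU using unified memory.
response = generate(
    model,
    tokenizer,
    prompt="Why is the sky blue?",
    max_tokens=256,
    verbose=True,
)
print(response)
```

An MLX backend in this project would presumably wrap something like the above behind the existing inference interface, so users on Apple Silicon could opt into it in place of llama.cpp.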