
[Idea]: Use Android NNAPI to accelerate inference on Android Devices #88

Open
Interpause opened this issue Apr 16, 2023 · 4 comments
Comments

@Interpause

This is just an idea for you. Most modern smartphones come with some form of AI accelerator. I am aware that GGML-based projects like llama.cpp can compile and run on mobile devices, but there is probably performance left on the table. I think there is currently a gap for a mobile-optimized AI inference library with quantization support and the other tricks present in GGML. For reference: https://developer.android.com/ndk/guides/neuralnetworks
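As a starting point, here is a minimal sketch (not from the comment above) of enumerating the NNAPI devices a phone exposes, which shows what accelerators are present. The `ANeuralNetworks_getDevice*` calls are the real NDK API (API level 29+); the helper function is illustrative and error handling is omitted:

```cpp
// Illustrative sketch: list the NNAPI devices (CPU/GPU/NPU drivers)
// this phone reports. Requires Android API level 29 or newer.
#include <android/NeuralNetworks.h>
#include <cstdio>

void list_nnapi_devices() {
    uint32_t count = 0;
    ANeuralNetworks_getDeviceCount(&count);
    for (uint32_t i = 0; i < count; ++i) {
        ANeuralNetworksDevice* dev = nullptr;
        ANeuralNetworks_getDevice(i, &dev);

        const char* name = nullptr;
        int32_t type = 0;
        ANeuralNetworksDevice_getName(dev, &name);
        ANeuralNetworksDevice_getType(dev, &type);

        // type is one of ANEURALNETWORKS_DEVICE_{UNKNOWN,OTHER,CPU,GPU,ACCELERATOR}.
        printf("device %u: %s (type %d)\n", i, name, type);
    }
}
```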

@Interpause Interpause changed the title Use Android NNAPI to accelerate inference on Android Devices [Idea]: Use Android NNAPI to accelerate inference on Android Devices Apr 16, 2023
@Saghetti0

Would love to see this as well!

@ggerganov
Owner

If there is community help, we can try to add support for NNAPI. Currently, I don't have enough capacity to investigate this, but I think it is interesting and could unlock many applications. I will probably look into this in the future and hope there are some contributions in the meantime.

@rhjdvsgsgks

I'm trying to write an NNAPI backend (don't expect too much from my work; I'm a complete newbie and will most likely not succeed). After some reading of the documentation, I found that unlike CL or VK, NNAPI doesn't provide a way to use accelerated matrix multiplication or shader-like kernels to compute things on the GPU. The only thing you can do with it is upload a graph describing how the layers are connected (including operands and weights). So it seems like it doesn't match the architecture llama.cpp currently has? If it does, please point me to a backend with a similar architecture so that I can use it as a reference.
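To illustrate the graph-based API described above, here is a minimal sketch of building a one-operation NNAPI model: a single fully-connected layer, i.e. matmul plus bias. The `ANeuralNetworksModel_*` calls and `ANEURALNETWORKS_FULLY_CONNECTED` are the actual NDK API; the helper name and shapes are illustrative, and error handling is omitted:

```cpp
// Illustrative sketch: instead of dispatching kernels like a CL/VK
// backend, NNAPI wants a finished graph with weights baked in as
// constants. This builds a one-op graph: out = input * weights^T + bias.
#include <android/NeuralNetworks.h>

ANeuralNetworksModel* build_fc_model(const float* weights, const float* bias,
                                     uint32_t in_dim, uint32_t out_dim) {
    ANeuralNetworksModel* model = nullptr;
    ANeuralNetworksModel_create(&model);

    uint32_t in_shape[]  = {1, in_dim};
    uint32_t w_shape[]   = {out_dim, in_dim};
    uint32_t b_shape[]   = {out_dim};
    uint32_t out_shape[] = {1, out_dim};

    ANeuralNetworksOperandType tensor_in  = {ANEURALNETWORKS_TENSOR_FLOAT32, 2, in_shape, 0.f, 0};
    ANeuralNetworksOperandType tensor_w   = {ANEURALNETWORKS_TENSOR_FLOAT32, 2, w_shape, 0.f, 0};
    ANeuralNetworksOperandType tensor_b   = {ANEURALNETWORKS_TENSOR_FLOAT32, 1, b_shape, 0.f, 0};
    ANeuralNetworksOperandType tensor_out = {ANEURALNETWORKS_TENSOR_FLOAT32, 2, out_shape, 0.f, 0};
    ANeuralNetworksOperandType scalar_i32 = {ANEURALNETWORKS_INT32, 0, nullptr, 0.f, 0};

    ANeuralNetworksModel_addOperand(model, &tensor_in);   // 0: runtime input
    ANeuralNetworksModel_addOperand(model, &tensor_w);    // 1: weights (constant)
    ANeuralNetworksModel_addOperand(model, &tensor_b);    // 2: bias (constant)
    ANeuralNetworksModel_addOperand(model, &scalar_i32);  // 3: fused activation
    ANeuralNetworksModel_addOperand(model, &tensor_out);  // 4: output

    // Weights and bias become constants inside the graph.
    ANeuralNetworksModel_setOperandValue(model, 1, weights, out_dim * in_dim * sizeof(float));
    ANeuralNetworksModel_setOperandValue(model, 2, bias, out_dim * sizeof(float));
    int32_t act = ANEURALNETWORKS_FUSED_NONE;
    ANeuralNetworksModel_setOperandValue(model, 3, &act, sizeof(act));

    uint32_t op_inputs[]  = {0, 1, 2, 3};
    uint32_t op_outputs[] = {4};
    ANeuralNetworksModel_addOperation(model, ANEURALNETWORKS_FULLY_CONNECTED,
                                      4, op_inputs, 1, op_outputs);

    // Only operand 0 is fed at execution time; operand 4 is read back.
    uint32_t model_inputs[]  = {0};
    uint32_t model_outputs[] = {4};
    ANeuralNetworksModel_identifyInputsAndOutputs(model, 1, model_inputs, 1, model_outputs);
    ANeuralNetworksModel_finish(model);
    return model;
}
```

The contrast with a CL/VK backend is visible here: the weights are baked into the graph as constants and the driver schedules the whole model, rather than the caller dispatching a kernel per operation.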

@pax-k

pax-k commented Mar 27, 2024

@ggerganov maybe it's worth checking NNAPI via the ONNX runtime? WhisperRN runs smoothly with CoreML, but on Android, even the tiny model is way too laggy to be usable on a budget device (for example a Samsung A14 with 4 GB RAM).
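For reference, here is a minimal sketch of what routing inference through ONNX Runtime's NNAPI execution provider looks like in C++. It assumes an ORT build with the NNAPI EP enabled; the model path is a placeholder and header paths may differ by packaging:

```cpp
// Illustrative sketch: enable the NNAPI execution provider in ONNX Runtime.
#include <onnxruntime_cxx_api.h>
#include <nnapi_provider_factory.h>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "nnapi-demo");
    Ort::SessionOptions opts;

    // Let NNAPI run supported ops in fp16; unsupported ops fall back to CPU.
    uint32_t nnapi_flags = NNAPI_FLAG_USE_FP16;
    Ort::ThrowOnError(
        OrtSessionOptionsAppendExecutionProvider_Nnapi(opts, nnapi_flags));

    // "whisper-tiny.onnx" is a placeholder path for illustration.
    Ort::Session session(env, "whisper-tiny.onnx", opts);
    // ... create Ort::Value inputs and call session.Run() as usual ...
    return 0;
}
```

`NNAPI_FLAG_USE_FP16` trades precision for speed on accelerators that prefer fp16; ORT partitions the graph so that ops NNAPI can't handle run on its CPU kernels automatically.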
