
[Idea]: Use Android NNAPI to accelerate inference on Android Devices #88

Open
Interpause opened this issue Apr 16, 2023 · 4 comments
Comments

@Interpause

This is just an idea for you. Most modern smartphones come with some form of AI accelerator. I am aware that GGML-based projects like llama.cpp can compile and run on mobile devices, but there is probably performance left on the table. I think there is currently a gap for a mobile-optimized AI inference library with quantization support and the other tricks present in GGML. For reference: https://developer.android.com/ndk/guides/neuralnetworks
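As a starting point, here is a minimal sketch (not from the comment above) of enumerating the NNAPI devices a phone exposes, which shows what accelerators are present. The `ANeuralNetworks_getDevice*` calls are the real NDK API (API level 29+); the helper function is illustrative and error handling is omitted:

```cpp
// Illustrative sketch: list the NNAPI devices (CPU/GPU/NPU drivers)
// this phone reports. Requires Android API level 29 or newer.
#include <android/NeuralNetworks.h>
#include <cstdio>

void list_nnapi_devices() {
    uint32_t count = 0;
    ANeuralNetworks_getDeviceCount(&count);
    for (uint32_t i = 0; i < count; ++i) {
        ANeuralNetworksDevice* dev = nullptr;
        ANeuralNetworks_getDevice(i, &dev);

        const char* name = nullptr;
        int32_t type = 0;
        ANeuralNetworksDevice_getName(dev, &name);
        ANeuralNetworksDevice_getType(dev, &type);

        // type is one of ANEURALNETWORKS_DEVICE_{UNKNOWN,OTHER,CPU,GPU,ACCELERATOR}.
        printf("device %u: %s (type %d)\n", i, name, type);
    }
}
```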

@Interpause Interpause changed the title Use Android NNAPI to accelerate inference on Android Devices [Idea]: Use Android NNAPI to accelerate inference on Android Devices Apr 16, 2023
@Saghetti0

Would love to see this as well!

@ggerganov
Owner

If there is community help, we can try to add support for NNAPI. Currently, I don't have enough capacity to investigate this, but I think it is interesting and could unlock many applications. I will probably look into this in the future and hope there are some contributions in the meantime.

@rhjdvsgsgks

I'm trying to write an NNAPI backend (don't expect too much from my work; I'm a complete newbie and will most likely not succeed). After some reading of the documentation, I found that unlike CL or VK, NNAPI doesn't provide a way to use accelerated matrix multiplication or shader-like kernels to compute things on the GPU. The only thing you can do with it is upload a graph describing how the layers are connected (including operands and weights). So it seems like it doesn't match the architecture llama.cpp currently has? If it does, please point me to a backend with a similar architecture so that I can use it as a reference.
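To illustrate the graph-based API described above, here is a minimal sketch of building a one-operation NNAPI model: a single fully-connected layer, i.e. matmul plus bias. The `ANeuralNetworksModel_*` calls and `ANEURALNETWORKS_FULLY_CONNECTED` are the actual NDK API; the helper name and shapes are illustrative, and error handling is omitted:

```cpp
// Illustrative sketch: instead of dispatching kernels like a CL/VK
// backend, NNAPI wants a finished graph with weights baked in as
// constants. This builds a one-op graph: out = input * weights^T + bias.
#include <android/NeuralNetworks.h>

ANeuralNetworksModel* build_fc_model(const float* weights, const float* bias,
                                     uint32_t in_dim, uint32_t out_dim) {
    ANeuralNetworksModel* model = nullptr;
    ANeuralNetworksModel_create(&model);

    uint32_t in_shape[]  = {1, in_dim};
    uint32_t w_shape[]   = {out_dim, in_dim};
    uint32_t b_shape[]   = {out_dim};
    uint32_t out_shape[] = {1, out_dim};

    ANeuralNetworksOperandType tensor_in  = {ANEURALNETWORKS_TENSOR_FLOAT32, 2, in_shape, 0.f, 0};
    ANeuralNetworksOperandType tensor_w   = {ANEURALNETWORKS_TENSOR_FLOAT32, 2, w_shape, 0.f, 0};
    ANeuralNetworksOperandType tensor_b   = {ANEURALNETWORKS_TENSOR_FLOAT32, 1, b_shape, 0.f, 0};
    ANeuralNetworksOperandType tensor_out = {ANEURALNETWORKS_TENSOR_FLOAT32, 2, out_shape, 0.f, 0};
    ANeuralNetworksOperandType scalar_i32 = {ANEURALNETWORKS_INT32, 0, nullptr, 0.f, 0};

    ANeuralNetworksModel_addOperand(model, &tensor_in);   // 0: runtime input
    ANeuralNetworksModel_addOperand(model, &tensor_w);    // 1: weights (constant)
    ANeuralNetworksModel_addOperand(model, &tensor_b);    // 2: bias (constant)
    ANeuralNetworksModel_addOperand(model, &scalar_i32);  // 3: fused activation
    ANeuralNetworksModel_addOperand(model, &tensor_out);  // 4: output

    // Weights and bias become constants inside the graph.
    ANeuralNetworksModel_setOperandValue(model, 1, weights, out_dim * in_dim * sizeof(float));
    ANeuralNetworksModel_setOperandValue(model, 2, bias, out_dim * sizeof(float));
    int32_t act = ANEURALNETWORKS_FUSED_NONE;
    ANeuralNetworksModel_setOperandValue(model, 3, &act, sizeof(act));

    uint32_t op_inputs[]  = {0, 1, 2, 3};
    uint32_t op_outputs[] = {4};
    ANeuralNetworksModel_addOperation(model, ANEURALNETWORKS_FULLY_CONNECTED,
                                      4, op_inputs, 1, op_outputs);

    // Only operand 0 is fed at execution time; operand 4 is read back.
    uint32_t model_inputs[]  = {0};
    uint32_t model_outputs[] = {4};
    ANeuralNetworksModel_identifyInputsAndOutputs(model, 1, model_inputs, 1, model_outputs);
    ANeuralNetworksModel_finish(model);
    return model;
}
```

The contrast with a CL/VK backend is visible here: the weights are baked into the graph as constants and the driver schedules the whole model, rather than the caller dispatching a kernel per operation.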

@pax-k

pax-k commented Mar 27, 2024

@ggerganov maybe it's worth checking NNAPI via the ONNX runtime? WhisperRN runs smoothly with CoreML, but on Android, even the tiny model is way too laggy to be usable on a budget device (for example a Samsung A14 with 4 GB RAM).
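For reference, here is a minimal sketch of what routing inference through ONNX Runtime's NNAPI execution provider looks like in C++. It assumes an ORT build with the NNAPI EP enabled; the model path is a placeholder and header paths may differ by packaging:

```cpp
// Illustrative sketch: enable the NNAPI execution provider in ONNX Runtime.
#include <onnxruntime_cxx_api.h>
#include <nnapi_provider_factory.h>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "nnapi-demo");
    Ort::SessionOptions opts;

    // Let NNAPI run supported ops in fp16; unsupported ops fall back to CPU.
    uint32_t nnapi_flags = NNAPI_FLAG_USE_FP16;
    Ort::ThrowOnError(
        OrtSessionOptionsAppendExecutionProvider_Nnapi(opts, nnapi_flags));

    // "whisper-tiny.onnx" is a placeholder path for illustration.
    Ort::Session session(env, "whisper-tiny.onnx", opts);
    // ... create Ort::Value inputs and call session.Run() as usual ...
    return 0;
}
```

`NNAPI_FLAG_USE_FP16` trades precision for speed on accelerators that prefer fp16; ORT partitions the graph so that ops NNAPI can't handle run on its CPU kernels automatically.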
