npu support #342
referearn89-lang
started this conversation in
Ideas
Replies: 1 comment
-
Qualcomm's NPU LLM offering is still quite outdated, lacking leading on-device models such as Gemma 4 and Qwen3.5. However, I found that Google's official LiteRT already supports running Gemma 4 E2B on the SM8750 (in theory the 8850 is forward compatible). On my Poco F7 Ultra it runs at 15 tokens per second, and heat generation is very low. Memory usage is significant, though: a 4096-token context consumes 5 GB, and maxing out the 128K context consumes 9 GB. I am trying to create files for Qwen3.5 4B.
-
Sir, thank you for making this application. I want to suggest something you probably already know: many devices nowadays come with an NPU for AI, and Snapdragon devices are among them. If I'm right, you have even implemented NPU support for image generation. If we could also use the NPU for text generation, we would get faster replies without heating the device too much.
That is all; I just wanted to suggest NPU support for text generation. Thank you! Sir, the PocketPal app supports the NPU for text generation, if you want inspiration.