Android App to run Llama-v2-7B-Chat Quantized INT4 on my Android Device #62
Comments
Hi friend, Qualcomm released a tutorial for deploying Llama 2 on the 8 Gen 3 in the AI Stack; it may be helpful.
@AndreaChiChengdu I've tried to follow the tutorial, but it turns out to need the “GenAI” feature in QNN. Do you have any clue about this?
@swb1234554321 @taeyeonlee We are aware of this and are actively working on it with other groups within Qualcomm. We will update this issue once we can release a sample app.
Hi, can you share this tutorial? I can't find it. Best regards.
@bhushan23 Thanks for your great work. When can you release the sample app? We are all looking forward to it, especially to learning how to run those downloaded files using QNN.
Hi, I have a question. Can I run these 4 parts as a whole model using QNN?
Hi, can you share the tutorial link? Thank you so much.
Please refer to https://github.com/quic/ai-hub-models/tree/main/qai_hub_models/models/llama_v2_7b_chat_quantized/gen_ondevice_llama to run Llama 2 models on device with Genie. We will keep this issue open until the Android / compute sample app with C++ APIs is released.
Hi, @bhushan23
Hi,
Could you share the sample Android app to run Llama-v2-7B-Chat Quantized INT4 on my Android device?
Your sample command "python -m qai_hub_models.models.llama_v2_7b_chat_quantized.export"
generated the files below.
Llama2_PromptProcessor_1_Quantized.onnx
Llama2_PromptProcessor_1_Quantized.data
Llama2_PromptProcessor_1_Quantized.encodings
and job_jogk97en5_optimized_bin_m6qek5zyq.bin, which was downloaded from AI Hub.
How do I run these files on my Android device?
Can anyone help?
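For anyone checking their export output before moving on to the device, here is a minimal sketch (not part of qai_hub_models) that groups the exported files by model name and reports any missing companions. It assumes, based only on the file list above, that each exported model should have .onnx, .data, and .encodings files sharing one stem; the function names group_export_files and missing_companions are hypothetical helpers, and the downloaded .bin file is ignored.

```python
from collections import defaultdict
from pathlib import Path

# Assumption: every exported model ships these three companion files,
# as in the listing above (.onnx, .data, .encodings with the same stem).
EXPECTED_SUFFIXES = (".onnx", ".data", ".encodings")


def group_export_files(export_dir):
    """Map each model stem to the set of expected suffixes found for it."""
    groups = defaultdict(set)
    for path in Path(export_dir).iterdir():
        # Files with other suffixes (for example the .bin from AI Hub) are skipped.
        if path.is_file() and path.suffix in EXPECTED_SUFFIXES:
            groups[path.stem].add(path.suffix)
    return dict(groups)


def missing_companions(export_dir):
    """Return {stem: sorted list of missing suffixes} for incomplete groups."""
    report = {}
    for stem, found in group_export_files(export_dir).items():
        missing = sorted(set(EXPECTED_SUFFIXES) - found)
        if missing:
            report[stem] = missing
    return report
```

Calling missing_companions on the export directory returns an empty dict when every model stem has all three files. This only checks that the files are present; it says nothing about whether they will load in QNN or Genie.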