Android App to run Llama-v2-7B-Chat Quantized INT4 on my Android Device #62

Open
taeyeonlee opened this issue Jul 3, 2024 · 9 comments

@taeyeonlee

Hi,
Could you share a sample Android app for running Llama-v2-7B-Chat Quantized INT4 on my Android device?

Your sample "python -m qai_hub_models.models.llama_v2_7b_chat_quantized.export"
generated the following files:
Llama2_PromptProcessor_1_Quantized.onnx
Llama2_PromptProcessor_1_Quantized.data
Llama2_PromptProcessor_1_Quantized.encodings
I also have job_jogk97en5_optimized_bin_m6qek5zyq.bin, which was downloaded from AI Hub.
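Before moving anything to the device, it can help to confirm that each part's exported files are all present. The following is a minimal Python sketch, assuming the export directory contains files named like the ones listed above; the helpers `check_part_artifacts` and `find_context_binaries` are hypothetical and are not part of qai_hub_models. It only inspects files and does not run the model.

```python
from pathlib import Path

# Suffixes seen in the export output listed above for one model part.
EXPECTED_SUFFIXES = (".onnx", ".data", ".encodings")


def check_part_artifacts(export_dir, stem):
    """Return (found, missing) for the ONNX/data/encodings trio of one part.

    `stem` is a file name without its extension, for example
    "Llama2_PromptProcessor_1_Quantized" (hypothetical helper).
    """
    export_dir = Path(export_dir)
    found, missing = {}, []
    for suffix in EXPECTED_SUFFIXES:
        path = export_dir / f"{stem}{suffix}"
        if path.is_file():
            found[suffix] = path
        else:
            missing.append(suffix)
    return found, missing


def find_context_binaries(export_dir):
    """List downloaded .bin files (e.g. job_..._optimized_bin_....bin), sorted by name."""
    return sorted(Path(export_dir).glob("*.bin"))
```

For example, `check_part_artifacts(".", "Llama2_PromptProcessor_1_Quantized")` returns the paths it found plus a list of missing suffixes, so a missing `.data` file shows up before any on-device attempt.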

How do I run these files on my Android device?
Can anyone help?

@AndreaChiChengdu

Hi friend, Qualcomm has released a tutorial on deploying Llama 2 on the Snapdragon 8 Gen 3 in the AI Stack; it may be helpful.

@swb1234554321

swb1234554321 commented Jul 23, 2024

@AndreaChiChengdu I tried to follow the tutorial, but it turns out to require the "GenAI" feature in QNN.
However, there is no guide on how to get access to it, so many GenAI-related executables and libraries are missing and the tutorial simply fails.

Do you have any idea about this?

@bhushan23
Contributor

bhushan23 commented Jul 23, 2024

@swb1234554321 @taeyeonlee we are aware of this and are actively working on it with other groups within Qualcomm.
We will be able to release a sample app soon, once the "GenAI" dependencies are released in the QNN SDK.

We will update this issue once we can release the sample app.

@MenghuaZheng

Hi, can you share this tutorial? I can't find it.

Best regards.

@dirtdust

dirtdust commented Aug 2, 2024

@bhushan23 Thanks for your great work. When can you release the sample app? We are all looking forward to it, especially guidance on how to run the downloaded files using QNN.

@yolanda1224git

Hi, I have a question.
In AI Hub, the Llama2-7B model is divided into 4 parts, and an inference job can be run on each part separately. The result is 4 .bin files, like job_jogk97en5_optimized_bin_m6qek5zyq.bin.

Can I run these 4 parts together as a single model using QNN?
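Independent of the QNN-specific answer, running a split model "as a whole" generally means executing the parts in order and feeding each part's output into the next. Below is a framework-free Python sketch of that chaining, assuming each part holds a consecutive slice of the model's layers; `run_split_model` and the toy parts are hypothetical stand-ins, and the sketch ignores the KV cache and the actual QNN runtime calls.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in: a "part" is any callable mapping the hidden state
# produced by the previous part to the hidden state for the next one.
# On a device this would wrap one context binary; here it is plain Python.
Part = Callable[[List[float]], List[float]]


def run_split_model(parts: Sequence[Part], hidden: List[float]) -> List[float]:
    """Run model parts in order, feeding each part's output into the next."""
    if not parts:
        raise ValueError("at least one part is required")
    for index, part in enumerate(parts, start=1):
        hidden = part(hidden)
        if not isinstance(hidden, list):
            raise TypeError(f"part {index} must return a list")
    return hidden


if __name__ == "__main__":
    # Four toy "parts" that each add a constant, mimicking 4 split pieces.
    toy_parts = [lambda h, k=k: [x + k for x in h] for k in range(1, 5)]
    print(run_split_model(toy_parts, [0.0, 1.0]))  # [10.0, 11.0]
```

The design choice here is that the pipeline only relies on each part's output matching the next part's input, which is the property a multi-part export needs to preserve.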

@yolanda1224git

Hi, can you share the tutorial link? Thank you so much.

@bhushan23
Contributor

Please refer to https://github.com/quic/ai-hub-models/tree/main/qai_hub_models/models/llama_v2_7b_chat_quantized/gen_ondevice_llama to run Llama 2 models on device with Genie.

We will keep this issue open until an Android/compute sample app using the C++ APIs is released.
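As an illustration of driving an on-device Genie run from a host machine, here is a minimal Python sketch that only builds an `adb shell` command. It assumes the Genie text-to-text binary (`genie-t2t-run`), its libraries, and a JSON config have already been pushed to one directory on the device, and that the binary accepts `-c <config>` and `-p <prompt>`; the function name, device directory, config file name, and the `LD_LIBRARY_PATH`/`ADSP_LIBRARY_PATH` setup are assumptions, so check them against the linked README.

```python
import shlex
from typing import List


def build_genie_adb_command(device_dir: str, config: str, prompt: str,
                            binary: str = "genie-t2t-run") -> List[str]:
    """Build an `adb shell` command that runs a Genie binary on the device.

    Hypothetical helper. Assumes the binary, its shared libraries and the
    JSON config all live in `device_dir`, and that the binary takes
    `-c <config>` and `-p <prompt>`.
    """
    remote = " && ".join([
        f"cd {shlex.quote(device_dir)}",
        # Assumed: CPU and DSP libraries sit next to the binary.
        "export LD_LIBRARY_PATH=$PWD",
        "export ADSP_LIBRARY_PATH=$PWD",
        # Quote user-supplied values so the remote shell sees them intact.
        f"./{shlex.quote(binary)} -c {shlex.quote(config)} -p {shlex.quote(prompt)}",
    ])
    return ["adb", "shell", remote]
```

Passing the returned list to `subprocess.run(cmd, check=True)` would execute it on a connected device; keeping command construction separate makes it easy to inspect or log before running.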

@taeyeonlee
Author

Hi @bhushan23,
Could you please share the plan for releasing the Android sample app with C++ APIs?

7 participants