
Converted Gemma 2B tflite generates same token #5258

Closed
dengzheng-cloud opened this issue Mar 25, 2024 · 5 comments
Labels: platform:android, platform:python, stat:awaiting response, task:LLM inference, type:support

Comments

@dengzheng-cloud

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

No

OS Platform and Distribution

Android 11

MediaPipe Tasks SDK version

0.10.11

Task name (e.g. Image classification, Gesture recognition etc.)

LLM Inference

Programming Language and version (e.g. C++, Python, Java)

Python

Describe the actual behavior

Converted the model to model.tflite and pushed it to /data/local/tmp/llm, but LLM Inference outputs the same token repeatedly.

Describe the expected behaviour

After conversion, LLM Inference should output a coherent dialog response.

Standalone code/steps you may have used to try to get what you need

To convert the Gemma 2B Hugging Face model into a MediaPipe tflite model, I followed this guide: https://developers.google.com/mediapipe/solutions/genai/llm_inference/android#convert-model

Other info / Complete Logs

Python code reference: https://colab.research.google.com/github/googlesamples/mediapipe/blob/main/examples/llm_inference/conversion/llm_conversion.ipynb#scrollTo=LSSxrLyQPofw
I initialized the config with feedforward, attention, and embedding quant bits set to 4:

converter.ConversionConfig(
    input_ckpt=input_ckpt,
    ckpt_format='safetensors',
    model_type='GEMMA_2B',
    backend=backend,
    output_dir=output_dir,
    combine_file_only=False,
    vocab_model_file=vocab_model_file,
    output_tflite_file=output_tflite_file,
    feedforward_quant_bits=4,
    attention_quant_bits=4,
    embedding_quant_bits=4,
)


[LOG output]
WARNING:jax._src.xla_bridge:An NVIDIA GPU may be present on this machine, but a CUDA-enabled jaxlib is not installed. Falling back to cpu.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1711354375.520733   36623 model_ckpt_util.cc:350] Inserting empty buffer to TfLite.

I then pushed model.tflite to /data/local/tmp/llm and chatted with the Android LLM Inference sample. For the input "hello", I get "As an disreg disreg disreg disreg disreg ...".
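
For reference, here is a minimal, runnable sketch of the full conversion flow from the linked notebook, using the config above. convert_checkpoint is the converter entry point in mediapipe.tasks.python.genai; the concrete paths (gemma-2b/, /tmp/...) are placeholders, not values from this report.

from mediapipe.tasks.python.genai import converter

# Quant settings mirror the reporter's config: 4-bit feedforward,
# attention, and embedding weights.
config = converter.ConversionConfig(
    input_ckpt='gemma-2b/',                       # placeholder: safetensors checkpoint dir
    ckpt_format='safetensors',
    model_type='GEMMA_2B',
    backend='gpu',                                # 'cpu' or 'gpu'
    output_dir='/tmp/intermediate/',              # placeholder: scratch dir for converted weights
    combine_file_only=False,
    vocab_model_file='gemma-2b/tokenizer.model',  # placeholder: SentencePiece tokenizer
    output_tflite_file='/tmp/model.tflite',       # placeholder: final output path
    feedforward_quant_bits=4,
    attention_quant_bits=4,
    embedding_quant_bits=4,
)
converter.convert_checkpoint(config)

# The resulting file is then pushed to the device, e.g.:
#   adb push /tmp/model.tflite /data/local/tmp/llm/model.tflite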
kuaashish added the task:LLM inference, platform:python, platform:android, and type:support labels on Mar 26, 2024
@kuaashish (Contributor)

Hi @dengzheng-cloud,

The Gemma 2B model does not need to be converted into the LLM Inference format if it is downloaded from Kaggle. This is documented here.
If you are not using Gemma, please let us know which model you are using so that we can reproduce the issue with that model on our end.

Thank you!!

kuaashish added the stat:awaiting response label on Apr 24, 2024
@dengzheng-cloud (Author)


Already done; I chose to use the Gemma model from Kaggle. Will the details of how the Gemma model performs int4 quantization be released later?

google-ml-butler bot removed the stat:awaiting response label on Apr 24, 2024
kuaashish added the stat:awaiting response label on May 3, 2024
@kuaashish (Contributor)

Hi @dengzheng-cloud,

Based on our internal discussion, our team is actively working on this and it is definitely on our roadmap. While we cannot provide a timeline, it will be available soon.

Thank you!!

google-ml-butler bot removed the stat:awaiting response label on May 3, 2024
kuaashish added the stat:awaiting response label on May 3, 2024

@adityak6798

I'm facing the same issue when running Gemma 2B on Android (in a Kotlin codebase). The call returns the same output token repeatedly, whereas the same prompt on the Vertex AI portal with Gemma 2B gives a good output.
