
forced_decoder_ids does not apply to tflite model (multilingual, language issue) #35

Closed
yong10202 opened this issue May 20, 2024 · 17 comments


@yong10202

yong10202 commented May 20, 2024

Hello, I'm interested in converting Whisper models to TFLite,
and I'm generating a TFLite model by referring to your work.
Before discussing the issue, thank you for your contribution!

My goal is a TFLite model that can transcribe into Korean.
For this, I looked up the related multilingual output issues,
and I understood that I had to apply options to perform transcription into a particular language:
model.config.forced_decoder_ids = [[1, 50264], [2, 50359], [3, 50363]] # <|ko|><|transcribe|><|notimestamps|>
These work well before TFLite conversion!

My process is as follows.

  1. Load the transformers TFWhisper model

  2. Define the 'GenerateModel' with a 'serving(input_features)' function,
    and apply the configuration that sets the language and task at this stage

  3. Convert to TFLite (using tf.lite.TFLiteConverter)

  4. Run inference and compare outputs (TensorFlow vs. TFLite)

Before converting to TFLite,
the generation configuration (language, task) is applied correctly, and Korean tokens are generated.

output tensor: 
[50258 50264 50359 50363  4704  7675  2785 16623  6301  1453 16316  2230
  1453  2393 21166  1129  5514  5642 10520 14050    13 50257 ...]

text:
<|startoftranscript|><|ko|><|transcribe|><|notimestamps|> 지구의 일부가 날아가서 달이 되었다는 거예요.<|endoftext|> ...

After converting to TFLite,
the task and timestamp tokens in the model output become 0, although the language token remains Korean (50264).
The subsequent tokens are generated as English text.

output tensor: 
[50258 50264     0     0   440  4120   311   644   390 16479  1314   293
  3062   257  7135    13 50257 ...]

text:
<|startoftranscript|><|ko|>!! The earth's part was blown away and became a moon.<|endoftext|> ...

I don't think the generation config is fully applied during conversion to TFLite.
Any advice regarding this? I think solving this problem would help a lot of people.

In addition, I also experimented with a TFLite model split into an encoder and a decoder via ONNX conversion.
I got the desired result when I injected the decoder start tokens into the decoder's input tensor.
However, if possible, I would like to use a combined model.

@yong10202 yong10202 changed the title Generation_configs does not apply to tflite model output. Generation_configs does not apply to tflite model May 21, 2024
@yong10202
Author

I found an issue similar to my situation and linked it.
vilassn/whisper_android#7 (comment)

@yong10202
Author

yong10202 commented Jun 10, 2024

Hello. After looking through the GitHub issues, I found problems and solutions related to this.

Issue:

  • Text is output only in English even if multilingual settings were applied before TFLite conversion
  • forced_decoder_ids [task, timestamp] become None (= '!!') after conversion

Solution:
Anyone dealing with the multilingual support issue of the TFLite model should be able to solve most of their problems by applying the 'monkey patch' below.
#15 (comment)
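For reference, the patch that circulated for this problem replaces `TFForceTokensLogitsProcessor`'s use of `-inf` (which turns into NaN after TFLite conversion, so argmax falls back to token 0, i.e. '!') with the finite `tf.float32.min`. Below is a hedged reconstruction of that patch; treat the version in the linked #15 comment as authoritative. The class and method names come from transformers' TF generation code:

```python
import numpy as np
import tensorflow as tf
from transformers.generation.tf_logits_process import TFForceTokensLogitsProcessor


def _patched_init(self, force_token_map):
    # Build a dense lookup: generation index -> forced token id, -1 where nothing is forced.
    force_token_map = dict(force_token_map)
    force_token_array = np.ones((max(force_token_map.keys()) + 1), dtype=np.int32) * -1
    for index, token in force_token_map.items():
        if token is not None:
            force_token_array[index] = token
    self.force_token_array = tf.convert_to_tensor(force_token_array, dtype=tf.int32)


def _patched_call(self, input_ids, scores, cur_len):
    def _force_token(generation_idx):
        batch_size = scores.shape[0]
        current_token = self.force_token_array[generation_idx]
        # Use the finite float32 minimum instead of -inf: -inf becomes NaN in the
        # converted TFLite graph, which is what breaks the forced tokens.
        new_scores = tf.zeros_like(scores, dtype=scores.dtype) + tf.float32.min
        indices = tf.stack((tf.range(batch_size), tf.tile([current_token], [batch_size])), axis=1)
        updates = tf.zeros((batch_size,), dtype=scores.dtype)
        return tf.tensor_scatter_nd_update(new_scores, indices, updates)

    scores = tf.cond(
        tf.greater_equal(cur_len, tf.shape(self.force_token_array)[0]),
        lambda: tf.identity(scores),        # past the forced prefix: leave scores alone
        lambda: tf.cond(
            tf.greater_equal(self.force_token_array[cur_len], 0),
            lambda: _force_token(cur_len),  # this position has a forced token
            lambda: tf.identity(scores),
        ),
    )
    return scores


# Apply the monkey patch before building and saving the model.
TFForceTokensLogitsProcessor.__init__ = _patched_init
TFForceTokensLogitsProcessor.__call__ = _patched_call
```

The patch must run before `tf.saved_model.save`, so that the traced generate loop picks up the patched processor.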

Thank you to the engineers involved!


@yong10202 yong10202 changed the title Generation_configs does not apply to tflite model forced_decoder_ids does not apply to tflite model (multilingual, language issue) Jun 10, 2024
@KihongK

KihongK commented Jun 21, 2024

Does it still work if I apply the monkey patch?

@yong1020

Does it still work if I apply the monkey patch? @KihongK

Yes, it's still working.
However, it may not work well on the 'tiny' and 'base' models. (My project uses the 'small' model.)

In the case of 'tiny', some text is broken and strange characters appear.

@robre22

robre22 commented Jun 28, 2024

@yong1020 Could you share your Python package versions? I found that some versions are not compatible with each other (Python 3.8, tensorflow, transformers, keras, maybe numpy). For some reason my transcription works locally (tflite_generate(input_features=input_features)["sequences"] is correct), but when using it on Android, it produces this error:

tflite  : gather index out of bounds
tflite  : Node number 34 (GATHER) failed to invoke.
tflite  : Node number 694 (WHILE) failed to invoke.

I applied the monkey patch.

@hyla76

hyla76 commented Jul 8, 2024

Does it still work if I apply the monkey patch? @KihongK

Yes, it's still working. However, it may not work well on the 'tiny' and 'base' models. (My project uses the 'small' model.)

In the case of 'tiny', some text is broken and strange characters appear.


Hello, I also want to try it in Korean. I've tried everything explained here.

However, I'm getting English output instead of Korean, and the English output consists of meaningless words.

Could I get the Jupyter notebook file for model generation, and could you share the actual Android app? I'd appreciate your advice. :-D

@yong1020

yong1020 commented Jul 9, 2024

@yong1020 Could you share your Python package versions? I found that some versions are not compatible with each other (Python 3.8, tensorflow, transformers, keras, maybe numpy). For some reason my transcription works locally (tflite_generate(input_features=input_features)["sequences"] is correct), but when using it on Android, it produces this error:

tflite  : gather index out of bounds
tflite  : Node number 34 (GATHER) failed to invoke.
tflite  : Node number 694 (WHILE) failed to invoke.

I applied the monkey patch.

@robre22 My Python package versions are as follows, but I'm not sure this will help with your problem.

Python 3.8.10
tensorflow==2.13.1
transformers==4.42.1

I've experienced the 'failed to invoke' issue with some audio files; it may be related to the audio files and audio decoding.
If you have other audio samples, try them.

  1. The audio length may be too short or too long (between 10 and 30 seconds is recommended)
  2. The sampling rate must be 16000 Hz
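Both constraints can be checked up front with Python's standard wave module before feeding a file to the model. A minimal sketch; `check_wav` is a hypothetical helper name, and the 10–30 s window is taken from the recommendation above:

```python
import wave


def check_wav(path, expected_rate=16000, min_s=10.0, max_s=30.0):
    """Check a wav file against the constraints above.

    Returns (ok, info): ok is True when the sample rate matches and the
    duration falls inside [min_s, max_s]; info carries the measured values.
    """
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        duration = w.getnframes() / float(rate)
    info = {"rate_hz": rate, "duration_s": round(duration, 2)}
    return rate == expected_rate and min_s <= duration <= max_s, info
```

For example, `check_wav("sample.wav")` returns `(False, ...)` for an 8 kHz recording, which would otherwise fail only at inference time.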

@yong1020

yong1020 commented Jul 9, 2024

Does it still work if I apply the monkey patch? @KihongK

Yes, it's still working. However, it may not work well on the 'tiny' and 'base' models. (My project uses the 'small' model.)
In the case of 'tiny', some text is broken and strange characters appear.

Hello, I also want to try it in Korean. I've tried everything explained here.

However, I'm getting English output instead of Korean, and the English output consists of meaningless words.

Could I get the Jupyter notebook file for model generation, and could you share the actual Android app? I'd appreciate your advice. :-D

@hyla76

I'm sharing the ipynb code that generates whisper-small-ko.tflite.
https://gist.github.com/yong1020/1c2fa8080417e722a4c40c3352803453#file-generate_whisper-ko-tflite-ipynb

I'm also attaching a library that decodes audio files for a local transcription test (audio.py).
https://github.com/SYSTRAN/faster-whisper/blob/master/faster_whisper/audio.py

If Korean is not transcribed, please make sure that the task token is 50359 ('transcribe').
The monkey patch prevents the task token from becoming NaN ('!').

If the patch is applied correctly and the output is still not transcribed into Korean, there may be other causes (related to the audio files or decoding).
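A quick way to run that check is to compare the first generated tokens against the expected special-token prefix. The IDs are the ones listed earlier in the thread; `check_prefix` is a hypothetical helper:

```python
# Expected special-token prefix for Korean transcription (IDs from this thread):
# <|startoftranscript|><|ko|><|transcribe|><|notimestamps|>
EXPECTED_PREFIX = [50258, 50264, 50359, 50363]


def check_prefix(sequence, expected=EXPECTED_PREFIX):
    """Return a list of (position, got, want) mismatches; an empty list means OK."""
    return [(i, got, want)
            for i, (got, want) in enumerate(zip(sequence, expected))
            if got != want]
```

A healthy output such as `[50258, 50264, 50359, 50363, ...]` yields `[]`, while the broken TFLite output from this issue, `[50258, 50264, 0, 0, ...]`, reports mismatches at positions 2 (task) and 3 (timestamps).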

@hyla76

hyla76 commented Jul 9, 2024

Does it still work if I apply the monkey patch? @KihongK

Yes, it's still working. However, it may not work well on the 'tiny' and 'base' models. (My project uses the 'small' model.)
In the case of 'tiny', some text is broken and strange characters appear.

Hello, I also want to try it in Korean. I've tried everything explained here.
However, I'm getting English output instead of Korean, and the English output consists of meaningless words.
Could I get the Jupyter notebook file for model generation, and could you share the actual Android app? I'd appreciate your advice. :-D

@hyla76

I'm sharing the ipynb code that generates whisper-small-ko.tflite. https://gist.github.com/yong1020/1c2fa8080417e722a4c40c3352803453#file-generate_whisper-ko-tflite-ipynb

I'm also attaching a library that decodes audio files for a local transcription test (audio.py). https://github.com/SYSTRAN/faster-whisper/blob/master/faster_whisper/audio.py

If Korean is not transcribed, please make sure that the task token is 50359 ('transcribe'). The monkey patch prevents the task token from becoming NaN ('!').

If the patch is applied correctly and the output is still not transcribed into Korean, there may be other causes (related to the audio files or decoding).

I almost gave up; thank you so much for the solution.
I ran the code you shared at https://gist.github.com/yong1020/1c2fa8080417e722a4c40c3352803453#file-generate_whisper-ko-tflite-ipynb and successfully generated the whisper-small-ko.tflite file. Using the resulting file, I confirmed that Korean wav files are correctly transcribed into Korean in Jupyter Notebook.
Thank you very much! :D

To apply the generated whisper-small-ko.tflite file to the Android app, I downloaded the Android example app from GitHub,
renamed whisper-small-ko.tflite to whisper-tiny.tflite, and added it to the whisper.tflite\whisper_android\app\src\main\assets path.
A Korean wav file was also included.

I built and ran the app on a Galaxy S23; I selected the Korean wav file and pressed the Transcribe button, but the status stayed at 'Processing' and there were no results.

Looking at the Android Studio Logcat, I see the following errors (the lines marked E):
2024-07-09 22:48:02.691 13628-24842 MainActivity com.whispertflite D Update is received, Message: Processing...
2024-07-09 22:48:02.695 13628-13628 InputMethodManager com.whispertflite I invalidateInput
2024-07-09 22:48:05.514 13628-13628 ViewRootIm...nActivity] com.whispertflite I onDisplayChanged oldDisplayState=2 newDisplayState=2
2024-07-09 22:48:32.311 13628-24842 tflite com.whispertflite E gather index out of bounds
2024-07-09 22:48:32.312 13628-24842 tflite com.whispertflite E Node number 32 (GATHER) failed to invoke.
2024-07-09 22:48:32.312 13628-24842 tflite com.whispertflite E Node number 1346 (WHILE) failed to invoke.
2024-07-09 22:48:32.318 13628-24842 MainActivity com.whispertflite D Result:
2024-07-09 22:48:32.318 13628-24842 Whisper com.whispertflite D Result len: 0, Result:
2024-07-09 22:48:32.318 13628-24842 MainActivity com.whispertflite D Update is received, Message: Processing done...!

Do I need to modify the Android app code? I'm sorry to ask, but is it possible to share the code that made the app work? ㅠ.ㅜ
I would be very grateful if you could share it with frog1996@naver.com or on your GitHub site.

@nyadla-sys
Owner

Please change the vocab file from English to the multilingual vocab.

@nyadla-sys
Owner

https://github.com/vilassn/whisper_android
Please refer to this GitHub repository for more details.

@nyadla-sys
Owner

https://github.com/usefulsensors/openai-whisper/blob/main/models/filters_vocab_multilingual.bin

See here for some more details

@hyla76

hyla76 commented Jul 10, 2024

https://github.com/usefulsensors/openai-whisper/blob/main/models/filters_vocab_multilingual.bin

See here for some more details

I tried using the app link and the multilingual vocab bin file you provided, but the same error occurred, as shown below.

The error message is:
...
2024-07-10 10:52:35.897 12094-12304 tflite com.whispertflite E gather index out of bounds
2024-07-10 10:52:35.897 12094-12304 tflite com.whispertflite E Node number 32 (GATHER) failed to invoke.
2024-07-10 10:52:35.897 12094-12304 tflite com.whispertflite E Node number 1346 (WHILE) failed to invoke.
...

o Full Android app Logcat error message
2024-07-10 10:52:08.552 12094-12094 MainActivity com.whispertflite D Start transcription...
2024-07-10 10:52:08.553 12094-12094 MainActivity com.whispertflite D Returned asset path: /data/user/0/com.whispertflite/files/3619592205655254.wav
2024-07-10 10:52:08.554 12094-12304 Whisper com.whispertflite D WaveFile: /data/user/0/com.whispertflite/files/3619592205655254.wav
2024-07-10 10:52:08.554 12094-12304 MainActivity com.whispertflite D Update is received, Message: Processing...
2024-07-10 10:52:29.976 12094-12094 ViewRootIm...nActivity] com.whispertflite I ViewPostIme pointer 0
2024-07-10 10:52:30.018 12094-12094 ViewRootIm...nActivity] com.whispertflite I ViewPostIme pointer 1
2024-07-10 10:52:35.897 12094-12304 tflite com.whispertflite E gather index out of bounds
2024-07-10 10:52:35.897 12094-12304 tflite com.whispertflite E Node number 32 (GATHER) failed to invoke.
2024-07-10 10:52:35.897 12094-12304 tflite com.whispertflite E Node number 1346 (WHILE) failed to invoke.
2024-07-10 10:52:35.903 12094-12304 MainActivity com.whispertflite D Result:
2024-07-10 10:52:35.903 12094-12304 Whisper com.whispertflite D Result len: 0, Result:
2024-07-10 10:52:35.903 12094-12304 MainActivity com.whispertflite D Update is received, Message: Processing done...!
2024-07-10 10:52:35.903 12094-12304 Whisper com.whispertflite D Time Taken for transcription: 27349ms

@hyla76

hyla76 commented Jul 11, 2024

Hello. After looking through the GitHub issues, I found problems and solutions related to this.

Issue:

  • Text is output only in English even if multilingual settings were applied before TFLite conversion
  • forced_decoder_ids [task, timestamp] become None (= '!!') after conversion

Solution: Anyone dealing with the multilingual support issue of the TFLite model should be able to solve most of their problems by applying the 'monkey patch' below. #15 (comment)

Thank you to the engineers involved!



I have been working with the code provided here and the sample Android app code for a few weeks, but I am encountering errors when decoding several Korean wav audio files in the app.

I suspect there might be an issue with the wav files I am testing. Could you please share the wav files you used, specifically the earth_and_moon.wav file?

I would greatly appreciate it if you could share them.

@robre22

robre22 commented Jul 11, 2024

Hello. After looking through the GitHub issues, I found problems and solutions related to this.
Issue:

  • Text is output only in English even if multilingual settings were applied before TFLite conversion
  • forced_decoder_ids [task, timestamp] become None (= '!!') after conversion

Solution: Anyone dealing with the multilingual support issue of the TFLite model should be able to solve most of their problems by applying the 'monkey patch' below. #15 (comment)
Thank you to the engineers involved!

I have been working with the code provided here and the sample Android app code for a few weeks, but I am encountering errors when decoding several Korean wav audio files in the app.

I suspect there might be an issue with the wav files I am testing. Could you please share the wav files you used, specifically the earth_and_moon.wav file?

I would greatly appreciate it if you could share them.

Have you tried using the Java WhisperEngine rather than WhisperEngineNative? I haven't tested the difference in detail, but in some experiments the Java WhisperEngine didn't give me the runtime error (= new WhisperEngine()).

Also, I tried what @yong1020 suggested about the Python dependencies. I haven't tested for a while now, but this combination seems to give me no runtime error when running inference locally in Python:

@yong1020

yong1020 commented Jul 12, 2024

@hyla76 my audio sample files: samples_korean.zip (8 audio files, 1.6MB)

(source: https://aihub.or.kr/aihubdata/data/view.do?currMenu=&topMenu=&aihubDataSe=data&dataSetSn=130)

Have you tried the mic recording built into the app?

@hyla76

hyla76 commented Jul 12, 2024

@hyla76 my audio sample files: samples_korean.zip (8 audio files, 1.6MB)

(source: https://aihub.or.kr/aihubdata/data/view.do?currMenu=&topMenu=&aihubDataSe=data&dataSetSn=130)

Have you tried the mic recording built into the app?

I tested with the wav files you provided, and they were transcribed normally.
It seems to work only when the wav file matches the characteristics that TensorFlow can understand.

Thank you so much. :-D
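For anyone hitting the same wall: the wav characteristics that matter here can be inspected with Python's standard wave module before loading a file into the app. A small sketch; the "TensorFlow-friendly" target of 16-bit mono PCM at 16 kHz is an assumption based on this thread, and `wav_format` is a hypothetical helper name:

```python
import wave


def wav_format(path):
    """Report a wav file's format and whether it matches 16-bit mono PCM at 16 kHz."""
    with wave.open(path, "rb") as w:
        fmt = {
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
            "rate_hz": w.getframerate(),
        }
    fmt["whisper_friendly"] = (
        fmt["channels"] == 1
        and fmt["sample_width_bytes"] == 2
        and fmt["rate_hz"] == 16000
    )
    return fmt
```

A stereo 44.1 kHz recording straight from a phone would be flagged here, which matches the symptom of some files working and others failing.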

I am sharing my testing environment to help others.

o Python environment
python 3.8.19

print(tf.__version__)            # 2.12.0
print(keras.__version__)         # 2.12.0
print(transformers.__version__)  # 4.42.1

o Android Studio IDE
Android Studio Koala | 2024.1.1
Build #AI-241.15989.150.2411.11948838, built on June 11, 2024
Runtime version: 17.0.10+0--11609105 amd64
VM: OpenJDK 64-Bit Server VM by JetBrains s.r.o.
Windows 11.0
GC: G1 Young Generation, G1 Old Generation
Memory: 2048M
Cores: 16
Non-Bundled Plugins:
idea.plugin.protoeditor (241.15989.49)
