forced_decoder_ids does not apply to tflite model (multilingual, language issue) #35
Comments
I found an issue similar to my situation and linked it.
Hello. After looking through the GitHub forum, I found issues and solutions related to this. Issue:
Solution: Thank you to the engineers involved!
Does it still work if I apply the monkey patch?
Yes, it's still working. In the case of 'tiny', some text is broken and strange characters are visible.
@yong1020 Could you share your Python package versions? I found that some versions are incompatible with each other (Python 3.8, tensorflow, transformers, keras, maybe numpy). For some reason my transcription works locally (
I applied the monkey patch.
Hello, I also want to try it in Korean. I've tried everything explained here. However, I'm getting English output instead of Korean, and the English output consists of meaningless words. Could you share the Jupyter notebook file used for model training, as well as the actual Android app? I appreciate your advice. :-D
@robre22 My Python package versions are as follows, but I don't think they will help with your problem.
I'm sharing the ipynb code that generates whisper-small-ko.tflite. I'm also attaching a library that decodes audio files for a local transcribe test (audio.py). If Korean is not transcribed, please make sure that the task token is 50359 ('transcribe'). If the token is applied correctly and the output is still not Korean, there may be other causes (related to the audio files or decoding).
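The task-token check above can be sketched in a few lines of pure Python (no model download needed). The token IDs are taken from the multilingual Whisper vocabulary as used in this thread; treat them as assumptions if your tokenizer version differs:

```python
# Multilingual Whisper special-token IDs used in this thread (assumed vocabulary).
KO_TOKEN = 50264             # <|ko|>
TRANSLATE_TOKEN = 50358      # <|translate|> -- the wrong task drifts output to English
TRANSCRIBE_TOKEN = 50359     # <|transcribe|>
NO_TIMESTAMPS_TOKEN = 50363  # <|notimestamps|>

forced_decoder_ids = [(1, KO_TOKEN), (2, TRANSCRIBE_TOKEN), (3, NO_TIMESTAMPS_TOKEN)]

def task_token(ids):
    """Return the token forced at decoder position 2 (the task slot)."""
    return dict(ids).get(2)

# The check suggested above: the task slot must be 50359 ('transcribe').
assert task_token(forced_decoder_ids) == TRANSCRIBE_TOKEN
```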
I almost gave up, thank you so much for the solution. To apply the whisper-small-ko.tflite file to the Android app, I built the example Android app downloaded from GitHub here and ran it on a Galaxy S23. I selected a Korean wav file and pressed the Transcribe button, but the status stayed at "Processing" and there were no results. Looking at the Android Studio Logcat, I see the following error (the part marked with E). Do I need to modify the Android app code? I'm sorry to ask, but is it possible to share the code that made the app work? ㅠ.ㅜ
Please change the vocab file from the English to the multilingual vocab:
https://github.com/vilassn/whisper_android |
https://github.com/usefulsensors/openai-whisper/blob/main/models/filters_vocab_multilingual.bin See here for some more details |
I tried using the app link and the multilingual vocab bin file you provided, but the same error occurred, as shown below. Android app Logcat error (full message):
I have been working with the code provided here and the sample Android app code for a few weeks, but I am encountering errors when decoding multiple Korean wav audio files in the app. I suspect there might be an issue with the wav files I am testing. Could you please share the wav files you used, specifically earth_and_moon.wav? I would greatly appreciate it.
Have you tried using the Java ...? Also, I tried what @yong1020 suggested about the Python dependencies. I haven't tested for a while now, but this combination seems to give me at least no runtime errors when running inference locally in Python:
@hyla76 My audio sample files: samples_korean.zip (8 audio files, 1.6 MB) (source: https://aihub.or.kr/aihubdata/data/view.do?currMenu=&topMenu=&aihubDataSe=data&dataSetSn=130). Have you tried the mic recording that is built into the app?
I tested it with the test wav file you provided, and it was decoded normally. Thank you so much. :-D I am sharing my testing environment to help others:
o Python environment (print(tf.version))
o Android Studio IDE
Hello, I'm interested in converting Whisper models to tflite, and I'm generating a tflite model by referring to your work.
Thank you for your contribution before discussing the issue!
My goal is a tflite model that can be transcribed into "Korean".
For this work, I looked up related multilingual output issues and understood that I had to apply options to perform transcription in a particular language:
model.config.forced_decoder_ids = [[1, 50264], [2, 50359], [3, 50363]] # <|ko|><|transcribe|><|notimestamp|>
They work well before TFLite conversion!
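As a hedged alternative to hard-coding those IDs, they can be derived from the Hugging Face processor, which helps when switching model sizes or languages. The model name and the `load_and_configure` helper are illustrative assumptions, not code from this issue:

```python
def korean_transcribe_prompt_ids(processor):
    """processor: a transformers WhisperProcessor; returns [(position, token_id), ...]."""
    return processor.get_decoder_prompt_ids(language="korean", task="transcribe")

def load_and_configure():
    # Not called here: requires network access and a large model download.
    from transformers import WhisperProcessor, TFWhisperForConditionalGeneration

    processor = WhisperProcessor.from_pretrained("openai/whisper-small")
    model = TFWhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
    # Equivalent to the manual assignment above: [(1, 50264), (2, 50359), (3, 50363)].
    model.config.forced_decoder_ids = korean_transcribe_prompt_ids(processor)
    return model
```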
My process is like this:
1. Load the transformers TFWhisper model
2. Define the 'GenerateModel' with 'serving(input_features)'
3. Apply configurations to set the language and task at this stage
4. Convert to TFLite (using tf.lite.TFLiteConverter)
5. Compare inference outputs (tensorflow vs. tflite)
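The steps above can be sketched as follows. This is a minimal sketch assuming tensorflow and transformers are installed; the wrapper shape and converter flags follow the common TFWhisper-to-TFLite recipe rather than the exact notebook from this issue:

```python
FORCED_DECODER_IDS = [[1, 50264], [2, 50359], [3, 50363]]  # <|ko|><|transcribe|><|notimestamps|>

def build_tflite(model_name="openai/whisper-small", out_path="whisper-small-ko.tflite"):
    import tensorflow as tf
    from transformers import TFWhisperForConditionalGeneration

    # Step 1: load the TFWhisper model.
    model = TFWhisperForConditionalGeneration.from_pretrained(model_name)
    # Step 3: set language/task BEFORE wrapping and converting.
    model.config.forced_decoder_ids = FORCED_DECODER_IDS

    # Step 2: wrap generate() in a serving function with a fixed input signature.
    class GenerateModel(tf.Module):
        def __init__(self, model):
            super().__init__()
            self.model = model

        @tf.function(input_signature=[
            tf.TensorSpec((1, 80, 3000), tf.float32, name="input_features")])
        def serving(self, input_features):
            outputs = self.model.generate(
                input_features,
                forced_decoder_ids=FORCED_DECODER_IDS,  # also pass explicitly
                max_new_tokens=223,
                return_dict_in_generate=True,
            )
            return {"sequences": outputs["sequences"]}

    # Step 4: convert with TFLiteConverter; generate() needs the TF-ops fallback.
    generate_model = GenerateModel(model)
    converter = tf.lite.TFLiteConverter.from_concrete_functions(
        [generate_model.serving.get_concrete_function()], generate_model)
    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS,
        tf.lite.OpsSet.SELECT_TF_OPS,
    ]
    with open(out_path, "wb") as f:
        f.write(converter.convert())
    return out_path
```

Step 5 is then comparing the TF model's generate() output against the interpreter's `sequences` output on the same input features.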
Before converting to TFLite, the generation configurations (language, task) were applied correctly, and Korean token values are generated.
After converting to TFLite, the model outputs '0' for the task and timestamp tokens, although the language token remains Korean (50264). After that, the tokens are generated with English values.
I don't think the configs are fully applied during conversion to tflite.
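A quick way to see this symptom is to compare the head of the generated sequence with the expected forced prompt. A minimal sketch, assuming the multilingual token IDs used elsewhere in this thread:

```python
import numpy as np

# <|startoftranscript|><|ko|><|transcribe|><|notimestamps|> (assumed multilingual vocab)
EXPECTED_PROMPT = [50258, 50264, 50359, 50363]

def prompt_report(sequences):
    """sequences: generated token ids, shape (1, seq_len); compare the 4-token head."""
    head = np.asarray(sequences)[0, :4].tolist()
    return [(expected, got, expected == got)
            for expected, got in zip(EXPECTED_PROMPT, head)]
```

On the broken TFLite output this reports the language slot matching (50264) but the task and timestamp slots showing 0.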
Any advice regarding this? I think it would be helpful for a lot of people if we could solve this problem.
In addition, I also experimented with a tflite model that is split into an encoder and a decoder via ONNX conversion.
I got the desired result when I injected the 'decoder_start_token' values into the decoder's input tensor.
However, if possible, I would like to use a combined model.
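For the split encoder/decoder variant, injecting the start tokens amounts to seeding the decoder's input ids with the full prompt and greedy-decoding from there. A minimal sketch, where `run_encoder` and `run_decoder` are assumed callables wrapping the two TFLite interpreters:

```python
import numpy as np

# Multilingual Whisper special tokens (assumed vocabulary).
SOT, KO, TRANSCRIBE, NO_TS, EOT = 50258, 50264, 50359, 50363, 50257

def greedy_decode(run_encoder, run_decoder, mel, max_tokens=224):
    """run_encoder(mel) -> encoder hidden states;
    run_decoder(input_ids, enc_out) -> logits of shape (1, len, vocab)."""
    enc_out = run_encoder(mel)
    # Seed with the full prompt instead of relying on forced_decoder_ids.
    tokens = [SOT, KO, TRANSCRIBE, NO_TS]
    for _ in range(max_tokens):
        logits = run_decoder(np.array([tokens], dtype=np.int64), enc_out)
        next_id = int(np.argmax(logits[0, -1]))
        if next_id == EOT:
            break
        tokens.append(next_id)
    return tokens
```

Because the prompt is plain input data here rather than model config, nothing is lost in conversion, which is consistent with the split model producing the desired Korean output.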