Not able to transcribe Polish audio while English audio is easily transcribed; encoding issues? #2179
Hi all, I am trying to transcribe Polish audio files to text using OpenAI's Whisper. I am encountering an error that seems to be about encoding the text (possibly because of Polish letters such as ś, ż, etc.). When I run the same command to transcribe English audio, the programme transcribes it with no issues. I've read Whisper's help output, but I can't find any information there on encoding. Any tips to fix this?
The command: (I also tried it without setting the language to Polish; for English, just "whisper audioEnglish.wav > transcriptEN.txt" works well)
The error code:
Replies: 1 comment 1 reply
Run whisper without using '>' to redirect the console output, and let it write the transcript file itself, for example:
whisper audio.wav --language Polish --output_format txt
For reference, see the link below (I am not a Windows user, but I suspect this is the problem):
Why utf8 characters get broken when piped in Windows platform?
https://www.reddit.com/r/rust/comments/we4y3p/why_utf8_characters_get_broken_when_piped_in/
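To illustrate the suspected cause, here is a minimal Python sketch. It assumes the error was a `UnicodeEncodeError` raised because the redirected output used a legacy Windows code page; the original error text is not shown in this thread, so that is a guess. The helper `write_transcript`, the sample text, and the choice of `cp1252` are all hypothetical; the actual code page on a given machine may differ.

```python
import io


def write_transcript(text: str, encoding: str) -> bytes:
    """Encode text through a text stream, as Python does for redirected stdout."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding)
    stream.write(text)
    stream.flush()
    return buffer.getvalue()


# Hypothetical transcript fragment containing Polish letters.
sample = "Cześć, żółw"

# Assumed legacy code page: cp1252 has no mapping for letters such as ś or ż.
try:
    write_transcript(sample, "cp1252")
    legacy_ok = True
except UnicodeEncodeError:
    legacy_ok = False

# UTF-8 can represent every character, so the same text encodes cleanly.
utf8_bytes = write_transcript(sample, "utf-8")

print("cp1252 succeeded:", legacy_ok)
print("utf-8 round-trips:", utf8_bytes.decode("utf-8") == sample)
```

If the redirect is still needed, setting the standard Python environment variable `PYTHONIOENCODING=utf-8` (or `PYTHONUTF8=1` on Python 3.7+) before running the command may also help, since it changes the encoding Python uses for stdout. This has not been verified against the setup in the question.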