--word_timestamps = True not working #87

Closed
ollisulopuisto opened this issue Apr 7, 2024 · 1 comment

Comments

@ollisulopuisto

Hi.

I think this might be user error (hopefully!), but when I run whisper-ctranslate2 inside Google Colab with the following command:

!whisper-ctranslate2 "{full_path}" --vad_filter True --model large-v2 --verbose True --language fi --word_timestamps True --output_dir "{transcripts_dir}"

I get perfectly nice sentence-level output in JSON and every other format, but no word-level timestamps. I've tried removing some of the parameters, but haven't gotten it to work yet. Removing vad_filter doesn't change anything.

Is this a language-dependent thing (i.e. Finnish & large-v2), or is there something else I'm missing?

Both {full_path} and {transcripts_dir} contain meaningful values.

@jordimas
Collaborator

jordimas commented Apr 7, 2024

Hello.
It's not language- or model-dependent. It works for me.
Some ideas:

  • Make sure that you are using whisper-ctranslate2 version 0.4.1 or higher
  • Look in the JSON for '"words": ['. It should be there; even if no words were added, you should still see "words": null
  • Try it outside Google Colab
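
The first two checks above could be scripted roughly as follows. This is only a sketch: it assumes the JSON output has a top-level "segments" list whose items may carry a "words" key (as in the usual Whisper JSON layout), and the helper names installed_version, summarize_words and summarize_file are made up for illustration, not part of whisper-ctranslate2.

```python
import json
from importlib import metadata


def installed_version(package="whisper-ctranslate2"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def summarize_words(transcript):
    """Count segments in a parsed transcript dict that carry word-level entries.

    Assumption: segments live under a top-level "segments" key and each
    segment may have "words" set to a list, to null, or missing entirely.
    """
    segments = transcript.get("segments") or []
    with_words = sum(1 for seg in segments if seg.get("words"))
    return {"segments": len(segments), "with_words": with_words}


def summarize_file(json_path):
    """Load a transcript JSON file and summarize its word-level coverage."""
    with open(json_path, encoding="utf-8") as fh:
        return summarize_words(json.load(fh))
```

In a Colab cell, print(installed_version()) shows which version is actually installed, and summarize_file(...) on one of the JSON files in the output directory shows whether any segment contains words. If "with_words" is 0 while "segments" is not, the word timestamps really are missing from the output rather than just hard to spot.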
