word-level timestamps #12

Closed
eschmidbauer opened this issue Feb 20, 2023 · 8 comments · Fixed by #43

Comments

@eschmidbauer

Hi, I really appreciate you sharing this implementation.
I found it to be very fast with accurate results.
I do not see word-level timestamps in the result. Are word-level timestamps possible?

@guillaumekln
Contributor

Hi,

Word-level timestamps are currently not possible. They usually require extensions to the model that are not implemented at this time.

@tohe91

tohe91 commented Feb 23, 2023

Thank you for the amazing work on this!
It would be amazing if word-level timestamps could be implemented in faster-whisper once the word-level-timestamps branch is merged into main in whisper.

@collynce

collynce commented Mar 7, 2023

Just checked out the whisper repo and the word-level timestamp PR has been merged. It would be great indeed to have the same in faster-whisper.

Great work!

@guillaumekln
Contributor

I just pushed an experimental branch implementing word-level timestamps! It would be great if you can test this early.

Note that I implemented exactly the same logic as openai/whisper. So if there is a strange result and openai/whisper has the same result, you should report the issue to openai/whisper and not here.

Here's how you can test this today:

Install the development branch of faster-whisper

pip install --force-reinstall "faster-whisper[conversion] @ https://github.com/guillaumekln/faster-whisper/archive/refs/heads/word-level-timestamps.tar.gz"

Install the development build of CTranslate2

  1. Go to this build page
  2. Download the artifact "python-wheels"
  3. Extract the archive
  4. Install the wheel matching your system and Python version, for example:
pip install --force-reinstall ctranslate2-3.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
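To see which wheel matches the local interpreter, a small sketch like the following prints the CPython tag and machine architecture to look for in the wheel filename. The helper name `wheel_hints` is made up for illustration. It assumes a CPython interpreter, only inspects the running Python, and does not know which wheels the build artifact actually contains.

```python
import platform
import sys


def wheel_hints():
    """Return the CPython tag (e.g. 'cp310') and the machine architecture."""
    # Assumption: a CPython interpreter, whose wheel tag is 'cp' + major + minor.
    python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    # Architecture as reported by the OS, e.g. 'x86_64' on 64-bit Linux.
    machine = platform.machine()
    return python_tag, machine


if __name__ == "__main__":
    tag, machine = wheel_hints()
    print(f"Look for a wheel containing '{tag}' and '{machine}' in its name")
```

The architecture string reported by the OS may be spelled differently from the one in the wheel filename on some platforms, so treat the output as a hint rather than an exact match.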

Reconvert the model

The model should be converted again with the latest version of CTranslate2 as the configuration needs to be updated with additional information:

ct2-transformers-converter --model openai/whisper-large-v2 --output_dir whisper-large-v2-ct2 --copy_files tokenizer.json --quantization float16

Transcribe with word-level timestamps

segments, _ = model.transcribe(audio_path, word_timestamps=True)

for segment in segments:
    print(segment.words)
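
A minimal sketch of turning the words of a segment into readable lines. It assumes each word object exposes `start`, `end`, and `word` attributes, mirroring the fields produced by openai/whisper. The `Word` tuple and the `format_words` helper below are stand-ins defined for illustration and should be checked against the branch being tested.

```python
from collections import namedtuple

# Stand-in for the objects in segment.words (field names are an assumption).
Word = namedtuple("Word", ["start", "end", "word"])


def format_words(words):
    """Format each word as '[start -> end] text', with times in seconds."""
    return [f"[{w.start:.2f} -> {w.end:.2f}] {w.word.strip()}" for w in words]


if __name__ == "__main__":
    # Made-up sample data, not real transcription output.
    sample = [Word(0.0, 0.42, " Hello"), Word(0.42, 0.9, " world")]
    for line in format_words(sample):
        print(line)
```

With a real transcription, `format_words(segment.words)` could be called inside the `for segment in segments` loop in place of the plain `print`.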

@eschmidbauer
Author

just tested this with the tiny model and it worked!
going to do more tests but this is great, thanks so much for sharing!

@eschmidbauer
Author

large-v2 seems to work too. Thanks again

@Jeronymous

When I tested word timestamps on a batch of files, I saw this error in some corner cases:

  File "/usr/local/lib/python3.10/site-packages/faster_whisper/transcribe.py", line 531, in add_word_timestamps
    alignment = self.find_alignment(tokenizer, text_tokens, mel, num_frames)
  File "/usr/local/lib/python3.10/site-packages/faster_whisper/transcribe.py", line 598, in find_alignment
    start_times = jump_times[word_boundaries[:-1]]
IndexError: index 1 is out of bounds for axis 0 with size 1
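
For readers unfamiliar with this NumPy message, the sketch below triggers an error of the same kind by indexing a one-element array with an index array that contains 1. The array contents are invented for illustration and are not taken from the failing file. It shows the shape of the failure only and makes no claim about the root cause in `find_alignment`.

```python
import numpy as np


def index_error_message():
    """Trigger an out-of-bounds integer-array indexing error; return its message."""
    jump_times = np.array([0.0])           # made-up data: a single entry (size 1)
    word_boundaries = np.array([0, 1, 2])  # made-up boundaries
    try:
        # Indexes with [0, 1]; index 1 does not exist in a size-1 array.
        jump_times[word_boundaries[:-1]]
    except IndexError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(index_error_message())
```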

@guillaumekln
Contributor

Thank you for testing!

Can you confirm that the same file works without issue in openai/whisper? If so, would it be possible for you to share this input file?
