YouTube Video Transcription with Whisper #262

marferca · 2022-10-06T16:14:46Z

Hi everyone!

I have created a Streamlit app that lets you transcribe YouTube videos using Whisper and download the output as TXT or SubRip. I hope you like it, and I would be more than happy to hear your thoughts on it.

Streamlit app: https://marferca-yt-whisper-demo-streamlit-app-luptcq.streamlitapp.com/

Kudos to the OpenAI team, for your fantastic work and for sharing it with the community. I hope we can keep building next-gen apps with your powerful models 🚀.

idrissathiam01 · 2022-10-06T16:48:37Z

Congratulations. You are one of the top devs of the year.

3 replies

ingomarlos Oct 7, 2022

@marferca Thank you very much for sharing this! Does it work with other languages? I tried it with the video below:
https://www.youtube.com/watch?v=xyqCdmYdCzU&ab_channel=entwickler.tutorials

And got this error:
RuntimeError: This app has encountered an error. The original error message is redacted to prevent data leaks. Full error details have been recorded in the logs (if you're on Streamlit Cloud, click on 'Manage app' in the lower right of your app).
Traceback:
File "/home/appuser/venv/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 562, in _run_script
exec(code, module.__dict__)
File "/app/yt-whisper-demo/streamlit_app.py", line 102, in <module>
main()
File "/app/yt-whisper-demo/streamlit_app.py", line 81, in main
result = transcribe_youtube_video(model, url)
File "/home/appuser/venv/lib/python3.9/site-packages/streamlit/runtime/legacy_caching/caching.py", line 623, in wrapped_func
return get_or_create_cached_value()
File "/home/appuser/venv/lib/python3.9/site-packages/streamlit/runtime/legacy_caching/caching.py", line 605, in get_or_create_cached_value
return_value = non_optional_func(*args, **kwargs)
File "/app/yt-whisper-demo/utils.py", line 56, in transcribe_youtube_video
result = model.transcribe(os.path.join('data','audio.mp3'))
File "/home/appuser/venv/lib/python3.9/site-packages/whisper/transcribe.py", line 178, in transcribe
result: DecodingResult = decode_with_fallback(segment)
File "/home/appuser/venv/lib/python3.9/site-packages/whisper/transcribe.py", line 114, in decode_with_fallback
decode_result = model.decode(segment, options)
File "/home/appuser/venv/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/appuser/venv/lib/python3.9/site-packages/whisper/decoding.py", line 701, in decode
result = DecodingTask(model, options).run(mel)
File "/home/appuser/venv/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/appuser/venv/lib/python3.9/site-packages/whisper/decoding.py", line 633, in run
tokens, sum_logprobs, no_speech_probs = self._main_loop(audio_features, tokens)
File "/home/appuser/venv/lib/python3.9/site-packages/whisper/decoding.py", line 588, in _main_loop
logits = self.inference.logits(tokens, audio_features)
File "/home/appuser/venv/lib/python3.9/site-packages/whisper/decoding.py", line 145, in logits
return self.model.decoder(tokens, audio_features, kv_cache=self.kv_cache)
File "/home/appuser/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/appuser/venv/lib/python3.9/site-packages/whisper/model.py", line 189, in forward
x = block(x, xa, mask=self.mask, kv_cache=kv_cache)
File "/home/appuser/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/appuser/venv/lib/python3.9/site-packages/whisper/model.py", line 124, in forward
x = x + self.attn(self.attn_ln(x), mask=mask, kv_cache=kv_cache)
File "/home/appuser/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/appuser/venv/lib/python3.9/site-packages/whisper/model.py", line 85, in forward
wv = self.qkv_attention(q, k, v, mask)
File "/home/appuser/venv/lib/python3.9/site-packages/whisper/model.py", line 97, in qkv_attention
qk = qk + mask[:n_ctx, :n_ctx]

marferca Oct 7, 2022
Author

Thank you for your comment, @ingomarlos!

Yes, it should work for other languages as well. I just tried the URL that you mentioned and it worked for me 👍 (see screenshot below).

Regarding the error message, unfortunately I also get the same error from time to time... I will update you if I manage to fix this bug 👾.

ingomarlos Oct 7, 2022

Thank you very much, I tried to run again and it works fine!

bobhalford · 2022-10-08T09:16:36Z

Brilliant. Thanks for your work. It would be good to add translation functionality. Is that an easy thing to do? Traditionally I would not have thought so, but these days, who can tell?

1 reply

marferca Oct 9, 2022
Author

Thanks, @bobhalford!
Whisper can also translate utterances to English, so this should be a straightforward task. You can find an example of how to do it in the following Jupyter Notebook:
https://github.com/openai/whisper/blob/6e3be77e1a105e59086e3e21ff5f609fd6fa89a5/notebooks/Multilingual_ASR.ipynb
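
For readers who want a quick idea of what this could look like outside the notebook, here is a minimal sketch. It assumes the `openai-whisper` package, whose `transcribe` method accepts `task="translate"`; the helper name `translate_to_english` and the audio path in the usage note are hypothetical and not taken from the app's code.

```python
def translate_to_english(model, audio_path):
    """Return an English translation of the speech in `audio_path`.

    `model` is assumed to be a loaded Whisper model, e.g. the object
    returned by whisper.load_model("base").
    """
    # task="translate" asks Whisper to output English text instead of
    # transcribing in the spoken language (the default is "transcribe").
    result = model.transcribe(audio_path, task="translate")
    return result["text"].strip()
```

With Whisper installed, usage would be along the lines of `model = whisper.load_model("base")` followed by `translate_to_english(model, "data/audio.mp3")`.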

rosewang2008 · 2022-10-11T00:43:25Z

Hey, thanks for creating this app! I was wondering whether it'd be possible to support YouTube videos longer than 8 minutes. I'm thinking of transcribing some videos that are at most 30 minutes long and would love to use this app. Thanks!

2 replies

marferca Oct 12, 2022
Author

Thanks for your comment, @rosewang2008!
The app is hosted on a free machine managed by Streamlit Cloud. Therefore, the app is very limited in terms of storage and computational resources, and setting a longer video length could lead to performance issues. However, you could clone the repository, edit the length limit, and run the Streamlit app on your local machine. I have added the setup steps to the README file (https://github.com/marferca/yt-whisper-demo#setup).
Hope this helps 🙂!

DamithDR Oct 13, 2022

@marferca Thank you for sharing this <3

wqw547243068 · 2022-11-17T07:36:33Z

Problem with audio duration:

Audio file: about 13 seconds long.
Your function for building the SRT output shows the wrong duration:

def _whisper_result_to_srt(result):
    # Build SubRip (SRT) text from the 'segments' of a Whisper result.
    text = []
    for i, s in enumerate(result['segments']):
        text.append(str(i + 1))  # SRT cue numbers start at 1

        # Start time in seconds -> HH:MM:SS,mmm
        time_start = s['start']
        hours, minutes, seconds = int(time_start / 3600), int(time_start / 60) % 60, time_start % 60
        timestamp_start = "%02d:%02d:%06.3f" % (hours, minutes, seconds)
        timestamp_start = timestamp_start.replace('.', ',')
        # End time in seconds -> HH:MM:SS,mmm
        time_end = s['end']
        hours, minutes, seconds = int(time_end / 3600), int(time_end / 60) % 60, time_end % 60
        timestamp_end = "%02d:%02d:%06.3f" % (hours, minutes, seconds)
        timestamp_end = timestamp_end.replace('.', ',')
        text.append(timestamp_start + " --> " + timestamp_end)
        text.append(s['text'].strip() + "\n")
    return "\n".join(text)

# `result` is the dictionary returned by model.transcribe(...)
out = _whisper_result_to_srt(result)
print(out)

The output suggests that this audio file lasts 36 seconds, which is not true:

1
00:00:00,000 --> 00:00:07,000
项目的物业 富华行物业 管理长安俱乐部

2
00:00:07,000 --> 00:00:36,000
长安俱乐部是十大俱乐部之首 物业费是24块钱一瓶
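
To check whether the timestamp arithmetic itself is at fault, the conversion can be tested in isolation. The following is a minimal sketch using integer milliseconds; the helper name `format_srt_timestamp` is hypothetical and not part of the app. If it prints `00:00:36,000` for an input of 36.0, the formatting is doing its job and the 36-second end time is already present in the segment data that Whisper returned.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Convert a time in seconds to an SRT timestamp (HH:MM:SS,mmm)."""
    # Work in whole milliseconds so that rounding happens exactly once.
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


print(format_srt_timestamp(7.0))     # 00:00:07,000
print(format_srt_timestamp(36.0))    # 00:00:36,000
print(format_srt_timestamp(3661.5))  # 01:01:01,500
```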

1 reply

marferca Nov 20, 2022
Author

Hi @wqw547243068!
This is a known limitation of Whisper; see #89.
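
One possible workaround on the application side, sketched here under the assumption that the true audio length is known from elsewhere (for example from the downloaded file's metadata), is to clamp segment times to that length before writing the SRT file. The function name `clamp_segments` and the `audio_duration` parameter are hypothetical. This only trims timestamps that run past the end of the audio; it does not correct timing errors inside the recording.

```python
def clamp_segments(segments, audio_duration):
    """Return copies of Whisper-style segments with times capped at audio_duration.

    `segments` is a list of dicts with 'start', 'end' and 'text' keys,
    such as result['segments']. `audio_duration` is the real length of
    the audio in seconds (an assumption: it must be obtained separately).
    """
    clamped = []
    for seg in segments:
        start = min(seg["start"], audio_duration)
        end = min(seg["end"], audio_duration)
        # Drop segments that would become empty after clamping.
        if end > start:
            clamped.append({**seg, "start": start, "end": end})
    return clamped


segments = [
    {"start": 0.0, "end": 7.0, "text": "first"},
    {"start": 7.0, "end": 36.0, "text": "second"},
]
print(clamp_segments(segments, audio_duration=13.0))
```

With the 13-second file from the report above, the second cue would end at 13 seconds instead of 36.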
