How do I extract the subtitles in plain text? #17178
I can see how to extract the automatically generated subtitles for a video.
This creates a file in this instance called
The first part of this WEBVTT file looks like this..
What is the best way to simply extract the plain text from these subtitles? Notice how the text repeats, so I can't just strip out the tags.
From the YouTube dashboard, I can download the srt and sbv formats. These look far easier to post-process.
However, when I try to grab the srt format using this tool
What am I missing here?
Otherwise, for the vtt file that does get downloaded, are there any suggestions for a library to post-process it?
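For anyone landing here with the same question, below is a minimal sketch of post-processing the vtt with only the Python standard library. It is not from this thread or from youtube-dl itself. It assumes the auto-generated captions look like the usual YouTube output: a header, then cues whose text lines carry inline `<c>` and timestamp tags and repeat the previous line. The function name `vtt_to_text` and the `SAMPLE` text are made up for illustration. It drops everything before the first timing line, strips the tags, and skips any line identical to the one just emitted.

```python
import html
import re

# Inline tags such as <c>, </c> and <00:00:00.500> inside cue text.
TAG_RE = re.compile(r"<[^>]+>")
# Cue timing lines, e.g. "00:00:00.000 --> 00:00:02.000 align:start".
TIMING_RE = re.compile(r"^\s*(?:\d+:)?\d{2}:\d{2}\.\d{3}\s+-->\s+")


def vtt_to_text(vtt):
    """Return the caption text of a WebVTT string without consecutive repeats.

    Assumption: no cue identifiers or NOTE blocks appear after the first cue;
    if they do, they would be treated as caption text.
    """
    out = []
    seen_cue = False
    for raw in vtt.splitlines():
        if TIMING_RE.match(raw):
            seen_cue = True
            continue
        if not seen_cue:
            # Still in the header (WEBVTT, Kind:, Language:, ...).
            continue
        text = html.unescape(TAG_RE.sub("", raw)).strip()
        if not text:
            continue
        if out and out[-1] == text:
            # Rolling captions repeat the previous line; keep one copy.
            continue
        out.append(text)
    return "\n".join(out)


# Hypothetical sample in the shape of YouTube auto-captions.
SAMPLE = """WEBVTT
Kind: captions
Language: en

00:00:00.000 --> 00:00:02.000 align:start position:0%
 
hello<00:00:00.500><c> world</c>

00:00:02.000 --> 00:00:02.010 align:start position:0%
hello world
 

00:00:02.010 --> 00:00:04.000 align:start position:0%
hello world
this<00:00:02.500><c> is</c><00:00:03.000><c> a</c><c> test</c>
"""

print(vtt_to_text(SAMPLE))
```

To run it on a downloaded file, read the file as UTF-8 text and pass the contents to `vtt_to_text`. Only consecutive duplicates are removed, so a phrase that is genuinely spoken again later in the video is kept.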
I don't fully understand this... from the YouTube dashboard I can download an srt. With this tool I can't. Why's that?
So instead you're saying I need to download the vtt and then use the --convert-subs flag?
When I try that
Also, what's confusing is that when I run --list-subs I get
So captions but no subtitles?
Hi @jakecan13, I've been researching this a bunch, and I finally figured it out using another module.
The working code sample is as follows...