# **Creating YouTube transcripts with OpenAI's Whisper model**

### **Note: For faster performance, set your runtime to "GPU"**
*Click on "Runtime" in the menu and click "Change runtime type". Select "GPU".*
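
To confirm that the runtime actually has a GPU after changing the setting, a quick check like the one below may help. It is a minimal sketch that assumes PyTorch (installed as a Whisper dependency) is importable; `pick_device` is a hypothetical helper name, not part of Whisper.

```python
def pick_device():
    """Return "cuda" when a GPU is visible to PyTorch, otherwise "cpu"."""
    try:
        import torch  # normally installed together with Whisper
    except ImportError:
        # Assumption: without PyTorch we simply report CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print("Running on:", device)
```

Whisper picks the GPU on its own when one is available, so this is only a sanity check. If you want to be explicit, `whisper.load_model` also accepts a `device` argument.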


**Step 1.** Follow the instructions in each block and select the options you want
<br>
**Step 2.** Get the url of the video you want to transcribe
<br>
**Step 3.** Refresh the folder on the left and download your transcript
<br>
**Step 4.** Go to your YouTube account and upload the transcript to the video it came from and use "autosync."


<br>



---


**What is this?**
<br>
This is a Python notebook that uses OpenAI's Whisper model to create a transcript from a YouTube URL. You can then upload the transcript to YouTube and use the autosync feature to create captions.
<br>  
**What is OpenAI's Whisper model?**
<br>
Whisper is an automatic speech recognition (ASR) neural network created by OpenAI that transcribes audio with close to human-level accuracy.
<br>
<br>
**Why use this?**
<br>
The quality of the OpenAI Whisper model is amazing (I am slightly biased, but seriously, check it out.) You can also use it to transcribe in other languages.
<br>
<br>
**What do the different model sizes do?**
<br>
Larger models generally give better quality, especially for languages other than English. I've found that for a YouTube video with clear speech, the base model works really well. If you see transcription errors, try a larger model.
<br>
<br>
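
As a rough illustration of how the model names fit together, here is a small sketch. `model_name` and `ENGLISH_ONLY_SIZES` are hypothetical names introduced for this example; the only assumption about Whisper itself is that the tiny, base, small, and medium sizes have an English-only `.en` variant while `large` does not.

```python
# Sizes that have an English-only ".en" variant.
ENGLISH_ONLY_SIZES = {"tiny", "base", "small", "medium"}


def model_name(size, english_only=True):
    """Build the name you would pass to whisper.load_model (hypothetical helper)."""
    if english_only and size in ENGLISH_ONLY_SIZES:
        return size + ".en"
    # "large" is multilingual only, so it never takes the ".en" suffix.
    return size


print(model_name("base"))                       # base.en
print(model_name("small", english_only=False))  # small
print(model_name("large"))                      # large
```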
**Do I need timestamps?**
<br>
Nope. YouTube's autosync function matches the text to the spoken words and syncs up really well. All you need is a .txt file containing the spoken sentences.
<br>
<br>
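
To make the "sentences in a .txt file" idea concrete, here is a minimal sketch of turning one long string of transcribed text into sentence-per-paragraph text. `split_sentences` is a hypothetical helper, not something Whisper or YouTube provides, and the sample string is made up for illustration.

```python
import re


def split_sentences(text):
    """Split text after '.', '!' or '?', keeping the punctuation."""
    # Each match is a run of ordinary characters plus any terminators after it,
    # so a trailing fragment without final punctuation is kept too.
    pattern = r"[^.!?]+[.!?]*"
    return [s.strip() for s in re.findall(pattern, text) if s.strip()]


raw = " First up, data. You can't have science without data. Think Microsoft Excel"
sentences = split_sentences(raw)
transcript = "\n\n".join(sentences)  # blank line between sentences
print(transcript)
```

This simple rule also splits on periods inside decimals and abbreviations, which is usually acceptable for caption text but worth knowing about.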
**How do I do this?**
<br>
Just follow each step. If you've never used Colab or a Python notebook, don't panic. It's super easy and runs in the cloud.
<br>
<br>
**Does this cost anything to use?**
<br>
Nope. You can use Colab for free, and Whisper is an open-source model.
<br>
<br>
[Tips for creating a YouTube transcript file](https://support.google.com/youtube/answer/2734799?hl=en)
<br>
[Information on OpenAI's Whisper model](https://openai.com/blog/whisper/)
<br>
[OpenAI's Whisper GitHub page](https://github.com/openai/whisper)
<br>












In [None]:
"""
1. Click the start button in the upper left side of this block to load the necessary libraries

You will need to run this every time you reload this notebook.
"""


!pip install git+https://github.com/openai/whisper.git
!sudo apt update && sudo apt install -y ffmpeg
!pip install librosa
!pip install yt-dlp

import whisper
import time
import librosa
import re
import yt_dlp


Collecting git+https://github.com/openai/whisper.git
  Cloning https://github.com/openai/whisper.git to /tmp/pip-req-build-n2pg77w2
  Running command git clone --filter=blob:none --quiet https://github.com/openai/whisper.git /tmp/pip-req-build-n2pg77w2
  Resolved https://github.com/openai/whisper.git to commit ba3f3cd54b0e5b8ce1ab3de13e32122d0d5f98ab
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Hit:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease
Hit:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Hit:3 http://security.ubuntu.com/ubuntu jammy-security InRelease
Ign:4 https://r2u.stat.illinois.edu/ubuntu jammy InRelease
Hit:5 https://r2u.stat.illinois.edu/ubuntu jammy Release
Hit:7 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:8 http://archive.ubuntu.com/ubuntu jammy InRele

In [None]:
"""
2. Select the model you want to use.

Base works really well so it's the default.

(For multilingual, remove ".en" from the model name.)

Click the run button after you've made your choice (or left it at default.)
"""

# model = whisper.load_model("tiny.en")
model = whisper.load_model("base.en")
# model = whisper.load_model("small.en")
# model = whisper.load_model("medium.en")
# model = whisper.load_model("large")

In [None]:
"""
3. Click the run button, enter your YouTube URL in the box below, then press Enter.

The video will be loaded and the audio extracted (this is usually the longest part of the process.)

Your transcript will appear in the folder on the left (you may have to refresh the folder to see it.)

You can download the file when it's completed and upload it on your video's detail page using "autosync."
"""

# This will prompt you for a YouTube video URL
url = input("Enter a YouTube video URL: ")

# Create a yt-dlp options dictionary
ydl_opts = {
    # Specify the format as bestaudio/best
    'format': 'bestaudio/best',
    # Specify the post-processor as ffmpeg to extract audio and convert to mp3
    'postprocessors': [{
        'key': 'FFmpegExtractAudio',
        'preferredcodec': 'mp3',
        'preferredquality': '192',
    }],
    # Specify the output filename as the video title
    'outtmpl': '%(title)s.%(ext)s',
}

# Download the video and extract the audio
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    # download=True fetches the audio and returns the metadata in one pass
    info = ydl.extract_info(url, download=True)
    # Filename yt-dlp used for the download (still has the original extension)
    file_path = ydl.prepare_filename(info)

# The post-processor converts the audio to mp3, so swap the extension
file_path = file_path.rsplit('.', 1)[0] + '.mp3'

# Get the duration (librosa 0.10+ takes `path`; the `filename` alias is deprecated)
duration = librosa.get_duration(path=file_path)
start = time.time()
result = model.transcribe(file_path)
end = time.time()
seconds = end - start

print("Video length:", duration, "seconds")
print("Transcription time:", seconds)

# Split result["text"] on !, ? and ., but keep the punctuation
sentences = re.split("([!?.])", result["text"])

# Join the punctuation back to the sentences
sentences = ["".join(i) for i in zip(sentences[0::2], sentences[1::2])]
text = "\n\n".join(sentences)
for s in sentences:
  print(s)

# Save the file as .txt
name = file_path + ".txt"
with open(name, "w", encoding="utf-8") as f:
  f.write(text)

print("\n\n", "-"*100, "\n\nYour transcript is here:", name)
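
The cell above names the transcript by appending `.txt` to the audio filename, which gives names ending in `.mp3.txt`. If a cleaner name is preferred, one option is sketched below. `transcript_path` is a hypothetical helper that is not part of the notebook, and the filename is only an example.

```python
from pathlib import Path


def transcript_path(audio_path):
    """Swap the audio extension for .txt, e.g. "talk.mp3" -> "talk.txt"."""
    return str(Path(audio_path).with_suffix(".txt"))


print(transcript_path("Data Science in 4 Minutes.mp3"))
```

Only the final extension is replaced, so dots elsewhere in the video title are left alone.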

Enter a YouTube video URL: https://www.youtube.com/watch?v=P9cLmFGWfIw
[youtube] Extracting URL: https://www.youtube.com/watch?v=P9cLmFGWfIw
[youtube] P9cLmFGWfIw: Downloading webpage
[youtube] P9cLmFGWfIw: Downloading ios player API JSON
[youtube] P9cLmFGWfIw: Downloading web creator player API JSON
[youtube] P9cLmFGWfIw: Downloading player 410a4f15
[youtube] P9cLmFGWfIw: Downloading m3u8 information
[info] P9cLmFGWfIw: Downloading 1 format(s): 251
[download] Destination: Data Science in 4 Minutes： Quick High Level Overview.webm
[download] 100% of    2.66MiB in 00:00:00 at 3.05MiB/s   
[ExtractAudio] Destination: Data Science in 4 Minutes： Quick High Level Overview.mp3
Deleting original file Data Science in 4 Minutes： Quick High Level Overview.webm (pass -k to keep)
[youtube] Extracting URL: https://www.youtube.com/watch?v=P9cLmFGWfIw
[youtube] P9cLmFGWfIw: Downloading webpage
[youtube] P9cLmFGWfIw: Downloading ios player API JSON
[youtube] P9cLmFGWfIw: Downloading web creator player 

	This alias will be removed in version 1.0.
  duration = librosa.get_duration(filename=file_path)


Video length: 216.01525 seconds
Transcription time: 12.835594177246094
 Hi, this is Jeff Heaton.
 I'm going to tell you what data science is in four minutes, or at least try.
 I better get going.
 First up, data.
 You can't have science without data.
 Maybe you have a little, maybe you have a lot, but you've got to have data.
 Often your data will be in a tabular firm like this.
 Think Microsoft Excel.
 You've got columns.
 You'd like to predict one of them.
 Maybe you would like to predict the acceleration of a car based on these other parameters.
 You know the acceleration for a lot of these cars, but maybe there's some cars where you don't know the acceleration.
 You can train a model to predict that acceleration based on the ones that you already know.
 This is supervised learning.
 If you're trying to predict a number, it's regression.
 If you're trying to predict a class or a category or a type of car, it is classification.
 There's also unsupervised learning.
 Maybe you don't kn