<a href="https://colab.research.google.com/github/medblocks/youtube-transcript/blob/main/WhisperYouTube.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

If you're looking at this on GitHub and new to Python Notebooks or Colab, click the Google Colab badge above 👆


# **Creating YouTube transcripts with OpenAI's Whisper model**

📺 Getting started video: https://youtu.be/kENRf82_RQs

*Colab beginner notes:*
<br>
1. These files are loaded on a virtual machine in the cloud. Nothing is downloaded to your computer except the transcript, and only when you click to download it. When you close this session, the instance is erased.
<br>
2. The run button is visible when you move your mouse close to the left edge of the code block. It looks kind of like this: ▶️ ...but round...and white on black...so nothing like this. You'll know it when you see it.

### **Note: For faster performance, set your runtime to "GPU"**
*Click on "Runtime" in the menu and click "Change runtime type". Select "GPU".*
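
If you want to confirm the runtime change took effect, here is a minimal sketch that uses only the Python standard library. It assumes a GPU runtime exposes the `nvidia-smi` command-line tool on the PATH; the helper name `gpu_runtime_available` is made up for this example.

```python
import shutil
import subprocess


def gpu_runtime_available() -> bool:
    """Return True if `nvidia-smi` is on the PATH and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No NVIDIA tooling found, so this is most likely a CPU-only runtime.
        return False
    try:
        # `nvidia-smi -L` prints one line per GPU, each starting with "GPU <index>:".
        result = subprocess.run([exe, "-L"], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if gpu_runtime_available():
    print("GPU runtime detected")
else:
    print("No GPU detected; Whisper will run on the CPU, which is slower")
```

Whisper still works without a GPU. It just takes longer, especially with the larger models.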


**Step 1.** Follow the instructions in each block and select the options you want
<br>
**Step 2.** Get the URL of the video you want to transcribe
<br>
**Step 3.** Refresh the folder on the left and download your transcript
<br>
**Step 4.** Go to your YouTube account and upload the transcript to the video it came from and use "autosync."

That's it!

Have a question? Hit me up on Twitter: [@AndrewMayne](https://twitter.com/andrewmayne)

<br>



---


**What is this?**
<br>
This is a Python notebook that creates a transcript from a YouTube URL using OpenAI's Whisper transcription model. You can then upload that transcript to YouTube and use the autosync feature to create captions.
<br>  
**What is OpenAI's Whisper model?**
<br>
Whisper is an automatic speech recognition (ASR) neural net created by OpenAI that transcribes audio with close to human-level accuracy.
<br>
<br>
**Why use this?**
<br>
The quality of the OpenAI Whisper model is amazing (I am slightly biased, but seriously, check it out.) You can also use it to transcribe in other languages.
<br>
<br>
**What do the different model sizes do?**
<br>
Each step up in model size improves transcription quality, especially with other languages. I've found that for a YouTube video with clear speech, the base model works really well. If you see transcription errors, you can try a larger model.
<br>
<br>
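
Here is a small sketch of how the model names fit together. It assumes the five sizes listed in the model cell of this notebook; the helper `whisper_model_name` is a hypothetical convenience and not part of the Whisper library. The `.en` suffix selects the English-only variant, and `large` only comes as a multilingual model.

```python
# Sizes as they appear in the model cell of this notebook.
SIZES = ("tiny", "base", "small", "medium", "large")


def whisper_model_name(size: str = "base", english_only: bool = True) -> str:
    """Build the string you would pass to whisper.load_model()."""
    if size not in SIZES:
        raise ValueError(f"Unknown size {size!r}; expected one of {SIZES}")
    # "large" is multilingual only, so it never takes the ".en" suffix.
    if english_only and size != "large":
        return f"{size}.en"
    return size


print(whisper_model_name())                             # base.en
print(whisper_model_name("small", english_only=False))  # small
print(whisper_model_name("large"))                      # large
```

If the base model makes mistakes on your video, moving one size up at a time is a reasonable way to trade speed for quality.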
**Do I need timestamps?**
<br>
Nope. YouTube's autosync function matches the text to the spoken words and syncs up really well. All you need is each spoken sentence in a .txt file.
<br>
<br>
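
The transcription cell in this notebook prefixes each segment with a `**M:SS**` marker. If you'd rather upload plain sentences, the sketch below strips those markers. It assumes the markers look exactly like that prefix, and the function name `strip_timestamps` is made up for this example. I haven't checked how autosync treats the markers, so consider this step optional.

```python
import re

# Matches a leading "**M:SS**" marker plus any spaces that follow it.
TIMESTAMP_PREFIX = re.compile(r"^\*\*\d+:\d{2}\*\*\s*")


def strip_timestamps(transcript: str) -> str:
    """Remove leading timestamp markers and blank lines, one sentence per line."""
    lines = []
    for line in transcript.splitlines():
        cleaned = TIMESTAMP_PREFIX.sub("", line).strip()
        if cleaned:  # skip the blank separator lines between segments
            lines.append(cleaned)
    return "\n".join(lines)


sample = "**0:00** Hello and welcome.\n\n**1:05** Let's get started."
print(strip_timestamps(sample))
```

To use it on a real file, read the downloaded transcript into a string, pass it through `strip_timestamps`, and write the result to a new .txt file.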
**How do I do this?**
<br>
Just follow each step. If you've never used Colab or a Python notebook, don't panic. It's super easy and runs in the cloud.
<br>
<br>
**Does this cost anything to use?**
<br>
Nope. You can use Colab for free and Whisper is an open source model.
<br>
<br>
[Tips for creating a YouTube transcript file](https://support.google.com/youtube/answer/2734799?hl=en)
<br>
[Information on OpenAI's Whisper model](https://openai.com/blog/whisper/)
<br>
[OpenAI's Whisper GitHub page](https://github.com/openai/whisper)
<br>












In [6]:
"""
1. Click the run button at the upper left of this block to load the necessary libraries

You will need to run this every time you reload this notebook.
"""

!pip install yt-dlp
!pip install git+https://github.com/openai/whisper.git
# -y answers the confirmation prompt, which a notebook cell cannot respond to
!sudo apt update && sudo apt install -y ffmpeg
!pip install librosa

import whisper
import time
import librosa
import re
import yt_dlp

Collecting yt-dlp
  Downloading yt_dlp-2024.12.13-py3-none-any.whl.metadata (172 kB)
Downloading yt_dlp-2024.12.13-py3-none-any.whl (3.2 MB)
Installing collected packages: yt-dlp
Successfully installed yt-dlp-2024.12.13
Collecting git+https://github.com/openai/whisper.git
  Cloning https://github.com/openai/whisper.git to /tmp/pip-req-build-24n8b1ub
  Running command git clone --filter=blob:no
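
Once the install cell has finished, a quick check like the one below can confirm that everything it was meant to set up is actually there. This is only a sketch: the function name `check_setup` is made up, and it only tests whether the packages can be found and whether `ffmpeg` is on the PATH, not whether they work.

```python
import importlib.util
import shutil


def check_setup(modules=("whisper", "yt_dlp", "librosa"), tools=("ffmpeg",)):
    """Map each module and command-line tool name to True if it can be found."""
    status = {}
    for name in modules:
        # find_spec() returns None when a top-level package is not installed.
        status[name] = importlib.util.find_spec(name) is not None
    for tool in tools:
        # which() returns None when the executable is not on the PATH.
        status[tool] = shutil.which(tool) is not None
    return status


for name, ok in check_setup().items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```

If anything shows up as MISSING, rerun the install cell before moving on to the next step.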

In [10]:
"""
2. Select the model you want to use.

Base works really well for clear speech. This cell currently loads "large";
to switch, comment that line out and uncomment the model you want.

(For multilingual, remove ".en" from the model name.)

Click the run button after you've made your choice (or left it at default.)
"""

# model = whisper.load_model("tiny.en")
# model = whisper.load_model("base.en")
# model = whisper.load_model("small.en")
# model = whisper.load_model("medium.en")
model = whisper.load_model("large")

100%|█████████████████████████████████████| 2.88G/2.88G [00:46<00:00, 66.5MiB/s]
  checkpoint = torch.load(fp, map_location=device)


In [None]:
"""
3. Click the run button, paste your YouTube URL in the box below, then press Enter.
You can use this one to test: https://www.youtube.com/watch?v=CnT-Na1IeVI
The video will be loaded and the audio extracted (this is usually the longest part of the process.)
Your transcript will appear in the folder on the left (you may have to refresh the folder to see it.)
You can download the file when it's completed and upload it on your video's detail page using "autosync."
"""
# This will prompt you for a YouTube video URL
url = input("Enter a YouTube video URL: ")

# Create a yt-dlp options dictionary
ydl_opts = {
    'format': 'bestaudio/best',
    'postprocessors': [{
        'key': 'FFmpegExtractAudio',
        'preferredcodec': 'mp3',
        'preferredquality': '192',
    }],
    'outtmpl': '%(title)s.%(ext)s',
}

try:
    # Download the video and extract the audio
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        info = ydl.extract_info(url, download=True)
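        # prepare_filename() returns the pre-conversion name (.webm, .m4a, ...);
        # the FFmpegExtractAudio postprocessor writes an .mp3 with the same stem
        file_path = ydl.prepare_filename(info)
        file_path = file_path.rsplit('.', 1)[0] + '.mp3'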

    # Get the duration in seconds ("path" replaces the deprecated "filename" keyword)
    duration = librosa.get_duration(path=file_path)
    start = time.time()

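    # Transcribe; word_timestamps=True also attaches per-word timings to each segment
    result = model.transcribe(file_path, word_timestamps=True)

    end = time.time()
    # Use a distinct name so the loop below can reuse "seconds" for timestamps
    elapsed = end - start
    print("Video length:", duration, "seconds")
    print("Transcription time:", elapsed, "seconds")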

    # Process segments with timestamps
    formatted_segments = []
    for segment in result["segments"]:
        # Convert start time to MM:SS format
        start_time = int(segment["start"])
        minutes = start_time // 60
        seconds = start_time % 60
        timestamp = f"**{minutes}:{seconds:02d}**"

        # Add timestamp and text
        formatted_text = f"{timestamp} {segment['text'].strip()}"
        formatted_segments.append(formatted_text)

    # Join segments with double newlines
    text = "\n\n".join(formatted_segments)

    # Print segments
    for segment in formatted_segments:
        print(segment)

    # Save the file as .txt
    name = file_path + ".txt"
    with open(name, "w", encoding='utf-8') as f:
        f.write(text)

    print("\n\n", "-"*100, "\n\nYour transcript is here:", name)

except Exception as e:
    print(f"An error occurred: {str(e)}")

Enter a YouTube video URL: https://www.youtube.com/watch?v=ztI5OTOoLJE
[youtube] Extracting URL: https://www.youtube.com/watch?v=ztI5OTOoLJE
[youtube] ztI5OTOoLJE: Downloading webpage
[youtube] ztI5OTOoLJE: Downloading ios player API JSON
[youtube] ztI5OTOoLJE: Downloading mweb player API JSON
[youtube] ztI5OTOoLJE: Downloading m3u8 information
[info] ztI5OTOoLJE: Downloading 1 format(s): 251
[download] Destination: Breakthroughs in Health Data Platforms： openEHR, FHIR & Distributed SQL ｜ DHH Podcast Ep. 12.webm
[download] 100% of   81.23MiB in 00:00:21 at 3.72MiB/s   
[ExtractAudio] Destination: Breakthroughs in Health Data Platforms： openEHR, FHIR & Distributed SQL ｜ DHH Podcast Ep. 12.mp3
Deleting original file Breakthroughs in Health Data Platforms： openEHR, FHIR & Distributed SQL ｜ DHH Podcast Ep. 12.webm (pass -k to keep)


