<a href="https://colab.research.google.com/github/myselfshravan/Sentiment-Analyzer/blob/main/youtube_whisper.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# YouTube Video Transcription with OpenAI's Whisper

[![License](https://img.shields.io/github/license/kazuki-sf/youtube-whisper)](https://github.com/kazuki-sf/youtube-whisper)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kazuki-sf/youtube-whisper/blob/main/youtube_whisper.ipynb)

## How to Use the Notebook
Feel free to `Copy to Drive` the notebook or run it directly.
1. Enter the URL of the YouTube video or shorts you want to transcribe.
2. Choose the whisper model you want to use.
3. Run the code cell (Step 1-3) and wait for the transcription to complete.

## Notes
* `T4 GPU` or higher is recommended for running the notebook. You can change the runtime type by going to `Runtime` -> `Change runtime type` -> `Hardware accelerator` -> `GPU`.
* Whenever you change the YouTube URL or Whisper Model, please run the `Step 1` and then run `Step 3` (You can skip `Step 2` if you already ran it before)
* When you run `Step 3`, the website might ask you a permission to download multiple files.
* This project is not affiliated with OpenAI. The code provided here is for educational purposes only.
* Here's a list of whisper model and the relative speed of each model. For more information, please visit the official GitHub page: https://github.com/openai/whisper#available-models-and-languages
---

|  Size  | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
|  tiny  |    39 M    |     `tiny.en`      |       `tiny`       |     ~1 GB     |      ~32x      |
|  base  |    74 M    |     `base.en`      |       `base`       |     ~1 GB     |      ~16x      |
| small  |   244 M    |     `small.en`     |      `small`       |     ~2 GB     |      ~6x       |
| medium |   769 M    |    `medium.en`     |      `medium`      |     ~5 GB     |      ~2x       |
| large  |   1550 M   |        N/A         |      `large`       |    ~10 GB     |       1x       |



In [4]:
# @title Step 1: Enter URL & Choose Whisper Model

# @markdown Enter the URL of the YouTube video
YouTube_URL = "https://youtu.be/PVrtI6YOe6Y?si=6S3YFt15EpPWpoun" #@param {type:"string"}

# @markdown Choose the whisper model you want to use
whisper_model = "tiny" # @param ["tiny", "base", "small", "medium", "large", "large-v2", "large-v3"]

# @markdown Save the transcription as text (.txt) file?
text = True #@param {type:"boolean"}

# @markdown Save the transcription as an SRT (.srt) file?
srt = True #@param {type:"boolean"}


In [8]:
# Step 2: Install Dependencies (this may take about 2-3 min)

!pip install -q pytube
!pip install yt-dlp
!pip install -q git+https://github.com/openai/whisper.git

import os, re
import torch
from pathlib import Path
from pytube import YouTube
import whisper
from whisper.utils import get_writer

Collecting yt-dlp
  Downloading yt_dlp-2024.12.23-py3-none-any.whl.metadata (172 kB)
[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/172.1 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m172.1/172.1 kB[0m [31m8.8 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading yt_dlp-2024.12.23-py3-none-any.whl (3.2 MB)
[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/3.2 MB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m [32m3.2/3.2 MB[0m [31m183.6 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.2/3.2 MB[0m [31m80.0 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: yt-dlp
Successfully installed yt-dlp-2024.12.23
  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone


In [11]:
import subprocess
from pathlib import Path
import torch
import whisper

# Util function to format file names
def to_snake_case(name):
    return name.lower().replace(" ", "_").replace(":", "_").replace("__", "_")

# Download audio data from YouTube video using yt-dlp
def download_audio_from_youtube(url, file_name=None, out_dir="."):
    print(f"\n==> Downloading audio using yt-dlp...")
    if file_name is None:
        file_name = "output_audio.mp3"  # Default file name
    output_path = Path(out_dir) / file_name

    # Run yt-dlp to extract audio
    subprocess.run([
        "yt-dlp",
        "--extract-audio",
        "--audio-format", "mp3",
        "--output", str(output_path),
        url
    ])
    return output_path

# Transcribe audio data with Whisper
def transcribe_audio(model, file, text=True, srt=True):
    print("\n=======================")
    print(f"\n==> Transcribing audio")
    file_path = Path(file)
    output_directory = file_path.parent

    # Run Whisper transcription
    result = model.transcribe(str(file), verbose=False)

    # Save transcription as .txt and .srt files
    if text:
        print(f"\n==> Creating .txt file")
        txt_path = file_path.with_suffix(".txt")
        with open(txt_path, "w", encoding="utf-8") as txt:
            txt.write(result["text"])
    if srt:
        print(f"\n==> Creating .srt file")
        from whisper.utils import get_writer
        srt_writer = get_writer("srt", output_directory)
        srt_writer(result, str(file_path.stem))

    print("\n✨ All Done!")
    print("=======================")
    return result

# Main execution flow
if __name__ == "__main__":
    whisper_model = "base"  # Whisper model type (e.g., "base", "small", "medium", "large")
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = whisper.load_model(whisper_model).to(device)

    # Download and transcribe audio
    audio = download_audio_from_youtube(YouTube_URL)
    result = transcribe_audio(model, audio)


100%|████████████████████████████████████████| 139M/139M [00:00<00:00, 180MiB/s]



==> Downloading audio using yt-dlp...


==> Transcribing audio




Detected language: English


100%|██████████| 55246/55246 [01:53<00:00, 488.83frames/s]


==> Creating .txt file

==> Creating .srt file

✨ All Done!



