<a href="https://colab.research.google.com/github/pratyushlohumi26/Youtube_Video_summarizer/blob/main/Summarize_Youtube_Vids_GPT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Summarize any YouTube video using Whisper and ChatGPT

## How it works 🤔

![yougptube](https://user-images.githubusercontent.com/18450628/229377710-95fb8645-3d71-47d0-b3ba-0fd05941b083.png)

Here are the main steps:

1) Extract the audio using yt-dlp (a maintained fork of youtube-dl)

2) Process the audio into smaller chunks

3) Each chunk is transcribed using whisper, OpenAI's powerful speech2text model

4) Each transcription is summarized using ChatGPT

## Imports and dependencies ⚙️

In [7]:
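# The cells below use the pre-1.0 openai interface (openai.Audio, openai.ChatCompletion), hence the pin.
!pip install -q "openai<1" youtube_dl gradio youtube_transcript_api torch sentencepiece transformers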
!pip install -q --upgrade --force-reinstall "git+https://github.com/ytdl-org/youtube-dl.git"
!python3 -m pip install -q --force-reinstall https://github.com/yt-dlp/yt-dlp/archive/master.tar.gz



In [8]:
import os
import shutil

import librosa
import openai
import soundfile as sf
import yt_dlp as youtube_dl
from yt_dlp.utils import DownloadError  # must come from yt_dlp so the except clause in youtube_to_mp3 matches

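# Replace "API_KEY" with your own key; avoid committing a real key with the notebook.
os.environ["OPENAI_API_KEY"] = "API_KEY"
openai.api_key = os.environ["OPENAI_API_KEY"]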
assert os.getenv("OPENAI_API_KEY")

## Utility functions 🔋

In [13]:
def find_audio_files(path, extension=".mp3"):
    """Recursively find all files with extension in path."""
    audio_files = []
    for root, dirs, files in os.walk(path):
        for f in files:
            if f.endswith(extension):
                audio_files.append(os.path.join(root, f))

    return audio_files

## Download youtube audio 🔈

In [14]:
def youtube_to_mp3(youtube_url: str, output_dir: str) -> str:
    """Download the audio from a youtube video, save it to output_dir as an .mp3 file.

    Returns the path of the saved audio file.
    """

    # config
    ydl_config = {
        "format": "bestaudio/best",
        "postprocessors": [
            {
                "key": "FFmpegExtractAudio",
                "preferredcodec": "mp3",
                "preferredquality": "192",
            }
        ],
        "outtmpl": os.path.join(output_dir, "%(title)s.%(ext)s"),
        "verbose": True,
    }

    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    print(f"Downloading video from {youtube_url}")

    try:
        with youtube_dl.YoutubeDL(ydl_config) as ydl:
            ydl.download([youtube_url])
    except DownloadError:
        # youtube-dl sometimes fails on the first download and succeeds on the second, so retry once
        with youtube_dl.YoutubeDL(ydl_config) as ydl:
            ydl.download([youtube_url])

    audio_filename = find_audio_files(output_dir)[0]
    return audio_filename
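
The retry-once workaround above can be written as a small reusable helper. The sketch below is an illustration rather than part of the original notebook: `retry` and `flaky_download` are hypothetical names, and the stand-in function only simulates a download that fails on its first call.

```python
import time


def retry(func, attempts=2, delay=0.0, exceptions=(Exception,)):
    """Call func() up to `attempts` times, re-raising the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay)


# Hypothetical stand-in for a download that fails on the first call only.
calls = {"count": 0}


def flaky_download():
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("first attempt failed")
    return "audio.mp3"


result = retry(flaky_download, attempts=2, exceptions=(RuntimeError,))
print(result)
```

In `youtube_to_mp3`, the wrapped callable would be the `ydl.download([youtube_url])` call, with `exceptions=(DownloadError,)`.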

## Chunk the audio 🍪

Chunking is necessary for very long audio files, since both Whisper and ChatGPT limit how much audio or text they can process in a single request.
It is not necessary for shorter videos.
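
To make the arithmetic concrete, here is a minimal sketch of how segment boundaries can be computed as sample indices. `segment_bounds` is a hypothetical helper that is not part of the notebook, and the 35-minute duration is an assumed example.

```python
def segment_bounds(num_samples: int, sample_rate: int, segment_length: int):
    """Return (start, end) sample indices for consecutive segments."""
    samples_per_segment = segment_length * sample_rate
    bounds = []
    for start in range(0, num_samples, samples_per_segment):
        # The last segment is clipped to the end of the recording.
        end = min(start + samples_per_segment, num_samples)
        bounds.append((start, end))
    return bounds


# Assumed example: a 35-minute recording at 44.1 kHz, split into 15-minute segments.
sr = 44100
bounds = segment_bounds(num_samples=35 * 60 * sr, sample_rate=sr, segment_length=15 * 60)
for start, end in bounds:
    print(f"{start / sr / 60:.0f} min -> {end / sr / 60:.0f} min")
```

Stepping through the samples with `range` means a recording whose length is an exact multiple of the segment length does not produce an empty trailing segment.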

In [15]:
def chunk_audio(filename, segment_length: int, output_dir):
    """Split an audio file into chunks of segment_length seconds and return the chunk paths."""

    print(f"Chunking audio to {segment_length} second segments...")

    if not os.path.isdir(output_dir):
        os.mkdir(output_dir)

    # load audio file
    audio, sr = librosa.load(filename, sr=44100)

    # calculate duration in seconds
    duration = librosa.get_duration(y=audio, sr=sr)

    # calculate number of segments (ceiling division, so an exact multiple leaves no empty trailing segment)
    num_segments = max(1, -(-len(audio) // (segment_length * sr)))

    print(f"Chunking {num_segments} chunks...")

    # iterate through segments and save them
    for i in range(num_segments):
        start = i * segment_length * sr
        end = (i + 1) * segment_length * sr
        segment = audio[start:end]
        # zero-pad the index so sorted() returns the segments in playback order
        sf.write(os.path.join(output_dir, f"segment_{i:03d}.mp3"), segment, sr)

    chunked_audio_files = find_audio_files(output_dir)
    return sorted(chunked_audio_files)

## Speech2text 🗣

Here we use OpenAI's whisper model to transcribe audio files to text.

In [17]:

def transcribe_audio(youtube_url, audio_files: list = None, output_file=None, model="whisper-1") -> list:
    """Transcribe each audio file with Whisper and return the transcripts in order.

    youtube_url is not used here; it is kept so existing calls keep working.
    """
    print("converting audio to text...")

    transcripts = []
    for audio_file in audio_files or []:
        # close each file handle once its upload finishes
        with open(audio_file, "rb") as audio:
            response = openai.Audio.transcribe(model, audio)
        transcripts.append(response["text"])

    if output_file is not None:
        # save all transcripts to a .txt file
        with open(output_file, "w") as file:
            for transcript in transcripts:
                file.write(transcript + "\n")

    return transcripts

### For Faster Inference, use the Youtube Transcript API
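
The transcript API expects a bare video ID rather than a full URL. Plain string splitting is fragile when a URL carries extra query parameters or uses the `youtu.be` short form, so one option is to parse the URL with the standard library. This is a sketch under that assumption; `extract_video_id` is a hypothetical helper, not part of the notebook.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(youtube_url: str) -> str:
    """Return the video ID from a watch URL or a youtu.be short link."""
    parsed = urlparse(youtube_url)
    if parsed.hostname == "youtu.be":
        # Short links carry the ID in the path: https://youtu.be/<id>
        return parsed.path.lstrip("/")
    # Standard watch URLs carry the ID in the `v` query parameter.
    query = parse_qs(parsed.query)
    if "v" not in query:
        raise ValueError(f"No video ID found in {youtube_url!r}")
    return query["v"][0]


video_id = extract_video_id("https://www.youtube.com/watch?v=zie_xSa2oRc&ab_channel=DanLok")
print(video_id)  # zie_xSa2oRc
```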

In [19]:
# Fetch the whole YouTube transcript and split it into chunks of 3000 characters, returned as a list.

from youtube_transcript_api import YouTubeTranscriptApi

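def get_youtube_transcript(youtube_url, audio_files: list = None, output_file=None, model="whisper-1") -> list:
    """Fetch the video's transcript and return it as a list of 3000-character chunks.

    audio_files and model are unused; they mirror the signature of transcribe_audio.
    """
    print("converting audio to text...")

    # keep only the video ID, dropping trailing parameters such as "&ab_channel=..."
    video_id = youtube_url.split("=")[1].split("&")[0]

    try:
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
    except Exception as e:
        # without a transcript there is nothing to chunk, so return early
        print(f"Transcript is not available ({e})\nTry another video")
        return []

    final_transcript = " ".join(entry["text"] for entry in transcript)
    print("Total length of the transcript: ", len(final_transcript))

    chunk_size = 3000
    transcripts = [
        final_transcript[i : i + chunk_size]
        for i in range(0, len(final_transcript), chunk_size)
    ]

    if output_file is not None:
        # save the chunked transcript to a .txt file
        with open(output_file, "w") as file:
            file.write("\n".join(transcripts) + "\n")

    return transcripts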
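
Slicing by a fixed number of characters can cut a word in half at a chunk boundary. A possible alternative, sketched here as an assumption rather than the notebook's method, is to break on whitespace instead. `chunk_text` is a hypothetical helper.

```python
def chunk_text(text: str, chunk_size: int = 3000) -> list:
    """Split text into chunks of up to chunk_size characters, breaking on spaces.

    A single word longer than chunk_size becomes its own, oversized chunk.
    """
    chunks, current, current_len = [], [], 0
    for word in text.split():
        # +1 accounts for the space that joins this word to the previous one.
        extra = len(word) + (1 if current else 0)
        if current and current_len + extra > chunk_size:
            chunks.append(" ".join(current))
            current, current_len = [], 0
            extra = len(word)
        current.append(word)
        current_len += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


chunks = chunk_text("the quick brown fox jumps over the lazy dog", chunk_size=15)
print(chunks)  # ['the quick brown', 'fox jumps over', 'the lazy dog']
```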


## Summarize 📝

Here we ask ChatGPT to take the raw transcripts and summarize them into short bullet points.

In [20]:
def summarize(
    chunks: list[str], system_prompt: str, model="gpt-3.5-turbo", output_file=None
):

    print(f"Summarizing with {model=}")

    summaries = []
    for chunk in chunks:
        response = openai.ChatCompletion.create(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": chunk},
            ],
        )
        summary = response["choices"][0]["message"]["content"]
        summaries.append(summary)

    if output_file is not None:
        # save all summaries to a .txt file
        with open(output_file, "w") as file:
            for summary in summaries:
                file.write(summary + "\n")

    return summaries
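
The pipeline functions in the next section apply `summarize` twice: once per chunk, then once more over the joined chunk summaries. The sketch below isolates that two-stage pattern so it can be tried without an API key. `summarize_in_two_stages` and `fake_summarize` are hypothetical names, and the fake summarizer simply keeps the first three words of its input.

```python
from typing import Callable, List


def summarize_in_two_stages(
    chunks: List[str],
    summarize_fn: Callable[[str, str], str],
    chunk_prompt: str,
    tldr_prompt: str,
) -> str:
    """Summarize each chunk, then summarize the joined chunk summaries."""
    chunk_summaries = [summarize_fn(chunk_prompt, chunk) for chunk in chunks]
    return summarize_fn(tldr_prompt, "\n".join(chunk_summaries))


# Hypothetical stand-in for a chat-model call: keeps the first three words.
def fake_summarize(system_prompt: str, text: str) -> str:
    return " ".join(text.split()[:3])


tldr = summarize_in_two_stages(
    ["alpha beta gamma delta", "one two three four"],
    fake_summarize,
    chunk_prompt="Summarize this chunk.",
    tldr_prompt="Summarize these key points.",
)
print(tldr)  # alpha beta gamma
```

Passing the summarizer in as a callable keeps the control flow testable; in the notebook the real call is `openai.ChatCompletion.create` inside `summarize`.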

## Putting it all together 🍱

In [21]:
def summarize_youtube_video_with_youtube_transcript(youtube_url, outputs_dir="outputs/"):
    transcripts_file = f"{outputs_dir}/transcripts.txt"
    summary_file = f"{outputs_dir}/summary.txt"

    if os.path.exists(outputs_dir):
        # delete the outputs_dir folder and start from scratch
        shutil.rmtree(outputs_dir)
    # (re)create the folder even on the first run, when it does not exist yet
    os.makedirs(outputs_dir)

    # fetch the whole transcript using the transcript api and get the chunked transcripts
    transcriptions = get_youtube_transcript(youtube_url, output_file=transcripts_file)

    # summarize each transcription using chatGPT
    system_prompt = """
    You are a helpful assistant that summarizes youtube videos.
    You are provided chunks of raw text that were transcribed from the video's audio.
    Summarize the current chunk to succinct and clear bullet points of its contents.
    """
    summaries = summarize(
        transcriptions, system_prompt=system_prompt, output_file=summary_file
    )

    system_prompt_tldr = """
    You are a helpful assistant that summarizes youtube videos.
    Someone has already summarized the video to key points.
    Summarize the key points into a short summary that captures the essence of the video.
    """
    # put the entire summary to a single entry
    long_summary = "\n".join(summaries)
    short_summary = summarize(
        [long_summary], system_prompt=system_prompt_tldr, output_file=summary_file
    )[0]

    return short_summary

In [22]:
def summarize_youtube_video_with_whisper(youtube_url, outputs_dir="outputs/"):
    raw_audio_dir = f"{outputs_dir}/raw_audio/"
    chunks_dir = f"{outputs_dir}/chunks"
    transcripts_file = f"{outputs_dir}/transcripts.txt"
    summary_file = f"{outputs_dir}/summary.txt"
    segment_length = 15 * 60  # chunk to 15 minute segments

    if os.path.exists(outputs_dir):
        # delete the outputs_dir folder and start from scratch
        shutil.rmtree(outputs_dir)
    # (re)create the folder even on the first run, when it does not exist yet
    os.makedirs(outputs_dir)

    # download the video using youtube-dl
    audio_filename = youtube_to_mp3(youtube_url, output_dir=raw_audio_dir)

    # chunk each audio file to shorter audio files (not necessary for shorter videos...)
    chunked_audio_files = chunk_audio(
        audio_filename, segment_length=segment_length, output_dir=chunks_dir
    )

    # transcribe each chunked audio file using whisper speech2text
    transcriptions = transcribe_audio(
        youtube_url, chunked_audio_files, output_file=transcripts_file
    )

    # summarize each transcription using chatGPT
    system_prompt = """
    You are a helpful assistant that summarizes youtube videos.
    You are provided chunks of raw text that were transcribed from the video's audio.
    Summarize the current chunk to succinct and clear bullet points of its contents.
    """
    summaries = summarize(
        transcriptions, system_prompt=system_prompt, output_file=summary_file
    )

    system_prompt_tldr = """
    You are a helpful assistant that summarizes youtube videos.
    Someone has already summarized the video to key points.
    Summarize the key points into a short summary that captures the essence of the video.
    """
    # put the entire summary to a single entry
    long_summary = "\n".join(summaries)
    short_summary = summarize(
        [long_summary], system_prompt=system_prompt_tldr, output_file=summary_file
    )[0]

    return short_summary

In [30]:
# import time

# t0 = time.time()
# youtube_url = "https://www.youtube.com/watch?v=zie_xSa2oRc&ab_channel=DanLok"
# outputs_dir = "outputs/"

# short_summary = summarize_youtube_video_with_whisper(youtube_url, outputs_dir)
# t1 = time.time()
# print("Summaries:")
# print("=" * 80)
# # print("Long summary:")
# # print("=" * 80)
# # print(long_summary)
# # print()

# print("=" * 80)
# print("Video - TL;DR")
# print("=" * 80)
# print(short_summary)


# total = t1-t0
# print("Time taken to process this video : ", total)

In [28]:
import time


t0 = time.time()

youtube_url = "https://www.youtube.com/watch?v=zie_xSa2oRc&ab_channel=DanLok"
# alternative: "https://www.youtube.com/watch?v=89Vpqm2IaPE&ab_channel=RobMoore"

outputs_dir = "outputs/"

short_summary = summarize_youtube_video_with_youtube_transcript(youtube_url, outputs_dir)
t1 = time.time()
print("Summaries:")
print("=" * 80)
# print("Long summary:")
# print("=" * 80)
# print(long_summary)
# print()

print("=" * 80)
print("Video - TL;DR")
print("=" * 80)
print(short_summary)



total = t1-t0
print("Time taken to process this video : ", total)

converting audio to text...
Total length of the transcript:  10142
Summarizing with model='gpt-3.5-turbo'
Summarizing with model='gpt-3.5-turbo'
Summaries:
Video - TL;DR
In this video, the speaker discusses a strategy to sell anything to anyone by focusing on creating trust and certainty in the mind of the prospect. The importance of standing out in a crowded marketplace and using showmanship to capture attention is emphasized. The power of dramatic demonstration is highlighted through examples such as a vacuum cleaner infomercial and Tony Robbins' use of dramatic demonstrations to launch his programs. The speaker also shares personal examples of using dramatic demonstrations in their career. The benefits of creating a "WTF effect" and combining it with massive distribution are discussed. The video concludes with an invitation to join a free web class on the advanced psychology of closing and selling.
Time taken to process this video :  25.884984016418457


In [29]:
import gradio as gr


interface = gr.Interface(
    fn=summarize_youtube_video_with_youtube_transcript,
    # gr.Textbox replaces the deprecated gr.inputs.Textbox / gr.outputs.Textbox
    inputs=gr.Textbox(lines=2, placeholder="Enter your link...", label="YouTube Video Link"),
    outputs=gr.Textbox(label="Summary"),
    title="Youtube Summarizer",
    examples=[
        ["https://www.youtube.com/watch?v=A4OmtyaBHFE"],
        ["https://www.youtube.com/watch?v=cU6xVZfkcgo"],
    ],
)

# queue requests instead of passing the deprecated enable_queue argument
interface.queue()
interface.launch(debug=True, share=True)



Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://0e07028f1871d9af3b.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


converting audio to text...
Total length of the transcript:  10142
Summarizing with model='gpt-3.5-turbo'
Summarizing with model='gpt-3.5-turbo'
Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://0e07028f1871d9af3b.gradio.live


