# YouTube Video Summarizer
Create a YouTube Video Summarizer Using Whisper and LangChain.

---

## Introduction

In the digital era, the abundance of information can be overwhelming, and we often find ourselves scrambling to consume as much content as possible within our limited time. In this lesson, we will unveil a powerful solution to help you efficiently summarize YouTube videos using two cutting-edge tools: Whisper and LangChain.

## Workflow
<br/>
<img src="../../images/youtube-video-summarizer.png" alt="YouTube Video Summarizer Workflow" style="width: 70%; height: auto;"/>
<br/>
First, we download the YouTube video we are interested in and transcribe it using Whisper. Then, we'll create summaries using two different approaches:

1. First, we use an existing summarization chain to generate the final summary. This approach simply summarizes the video.
2. Then, we use a more step-by-step approach to generate a final summary formatted in bullet points. It consists of splitting the transcription into chunks, computing their embeddings, and preparing ad-hoc prompts. This approach generates summarized answers to the questions we ask about the videos.

## Setup

In [1]:
import openai
import os
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())
openai.api_type = os.environ.get("OPENAI_API_TYPE")
openai.api_base = os.environ.get("OPENAI_API_BASE")
openai.api_version = os.environ.get("OPENAI_API_VERSION")
openai.api_key = os.environ.get("OPENAI_API_KEY")
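
If one of these variables is absent, `os.environ.get()` silently returns `None` and the failure only shows up later, when the first request is made. The following is a minimal sketch of an early check; the helper name `missing_env_vars` is hypothetical, and the variable names are the ones read in the cell above.

```python
import os

# Variable names taken from the setup cell above.
REQUIRED_VARS = [
    "OPENAI_API_TYPE",
    "OPENAI_API_BASE",
    "OPENAI_API_VERSION",
    "OPENAI_API_KEY",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Hypothetical helper call: report anything that still needs to go into .env.
missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Passing a plain dictionary as `env` makes the helper easy to try without touching the real environment.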

## Method 1 - Building the summarizer
Here, we will perform the following steps:
1. Download the YouTube video file.
2. Transcribe the video using Whisper.
3. Split the transcription into smaller chunks and create documents out of it.
4. Summarize the documents using LangChain with three different approaches: `stuff`, `refine`, and `map_reduce`.

### 1. Download the YouTube video file
The `download_mp4_from_youtube()` function downloads the best-quality mp4 video file from a YouTube link and saves it under the name stored in the `filename` variable. We just need to copy the selected video's URL and pass it to the function.

In [3]:
import yt_dlp

filename = "video.mp4"


def download_mp4_from_youtube(url):
    # Set the options for the download
    ydl_opts = {
        "format": "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]",
        "outtmpl": filename,
        "quiet": True,
    }

    # Download the video file
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        result = ydl.extract_info(url, download=True)


url = "https://www.youtube.com/watch?v=--khbXchTeE"
download_mp4_from_youtube(url)

                                                                          

### 2. Transcribe the video using Whisper
The `whisper` package that we installed earlier provides the `load_model()` function to download and load a model. The loaded model's `transcribe()` method then transcribes a video file. Several model sizes are available: `tiny`, `base`, `small`, `medium`, and `large`. Each one trades off accuracy against speed. We will use the `base` model for this tutorial.

In [None]:
import whisper

model = whisper.load_model("base")
result = model.transcribe(filename)

In [6]:
print(result["text"])

 GPT-4 takes what you prompt it with and just runs with it. From one perspective, it's a tool. A thing you can use to get useful tasks done in language. From another perspective, it's a system that can make dreams, thoughts, ideas, flourish in text in front of you. GPT-4 is incredibly advanced and sophisticated. It can take in and generate up to 25,000 words of text around eight times more than chat GPT. It understands images and can express logical ideas about them. For example, it can tell us that if the strings in this image were cut, the balloons would fly away. This is the place where you just get turbocharged by these AIs. They're not perfect. They make mistakes, and so you really need to make sure that you know the work is being done to your level of expectation. But I think that it is fundamentally about amplifying what every person is able to do. GPT-4 training finished last August, and everything that's been happening in the past few months up until we've released it has been

In [7]:
# The result is available as raw text, so we can save it to a text file.
folder_path = "../../data"
file_path = os.path.join(folder_path, "yt-video-text.txt")
with open(file_path, "w") as file:
    file.write(result["text"])
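
Besides `text`, the dictionary returned by `transcribe()` also carries a `segments` list whose entries include `start`, `end`, and `text` fields. The sketch below formats such segments with timestamps. The helper name `format_segments` and the sample data are hypothetical stand-ins, so the block runs without loading a model.

```python
def format_segments(transcription):
    """Render each segment as '[mm:ss - mm:ss] text'."""
    lines = []
    for seg in transcription.get("segments", []):
        start = int(seg["start"])
        end = int(seg["end"])
        lines.append(
            f"[{start // 60:02d}:{start % 60:02d} - {end // 60:02d}:{end % 60:02d}] "
            f"{seg['text'].strip()}"
        )
    return "\n".join(lines)


# Hypothetical stand-in for the dictionary returned by model.transcribe();
# the timestamps are made up for illustration.
sample_result = {
    "text": " GPT-4 takes what you prompt it with and just runs with it.",
    "segments": [
        {"start": 0.0, "end": 4.5, "text": " GPT-4 takes what you prompt it with"},
        {"start": 4.5, "end": 65.2, "text": " and just runs with it."},
    ],
}
print(format_segments(sample_result))
```

In the notebook, the same helper could be called on `result` to keep a timestamped copy of the transcript next to the raw text file.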

### 3. Text Splitting
Splitting the transcription into chunks ensures that the input text is broken down into manageable pieces, allowing for efficient processing by the language model.

In [14]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=800, chunk_overlap=20, separators=[" ", ",", "\n"]
)

In [15]:
from langchain.docstore.document import Document

with open(file_path) as f:
    text = f.read()

texts = text_splitter.split_text(text)
docs = [Document(page_content=t) for t in texts[:4]]
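
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-width sliding window over characters. It is a simplification and not the algorithm `RecursiveCharacterTextSplitter` uses, which also tries to cut at the given separators. The function name `chunk_with_overlap` is hypothetical.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in chunk_with_overlap(sample, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

Each chunk starts with the last three characters of the previous one. That shared tail is what the overlap buys: a sentence cut at a chunk boundary still appears with some context in the next chunk.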

### 4. Summarization with LangChain

#### 4.1 Summarization using default prompt with summarization chain

The `textwrap` library in Python provides a convenient way to wrap and format plain text by adjusting line breaks in an input paragraph. It is particularly useful when displaying text within a limited width, such as in console outputs, emails, or other formatted text displays. The library includes convenience functions like `wrap`, `fill`, and `shorten`, as well as the `TextWrapper` class that handles most of the work.
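
As a small standalone illustration of the two functions most relevant here, the sketch below wraps and shortens a sample sentence. The sentence and the width of 40 are arbitrary choices for the example.

```python
import textwrap

sample = (
    "GPT-4 is an advanced AI tool designed to perform language tasks, "
    "generate creative ideas, and understand images."
)

# fill() returns one string with newlines inserted so that no line exceeds `width`.
wrapped = textwrap.fill(sample, width=40)
print(wrapped)

# shorten() collapses whitespace and truncates to `width`, ending with the placeholder.
print(textwrap.shorten(sample, width=40, placeholder=" ..."))
```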

> Note: We are using the `map_reduce` technique below.

In [18]:
from langchain.chat_models import AzureChatOpenAI
from langchain.chains.summarize import load_summarize_chain
import textwrap


llm = AzureChatOpenAI(deployment_name="gpt4", temperature=0)

chain = load_summarize_chain(llm, chain_type="map_reduce")

output_summary = chain.run(docs)
wrapped_text = textwrap.fill(output_summary, width=100)
print(wrapped_text)

GPT-4 is an advanced AI tool designed to perform language tasks, generate creative ideas, and
understand images. Developed with a focus on amplifying human capabilities and addressing concerns
like adversarial usage and privacy, it has the potential to revolutionize education and enhance
productivity. OpenAI's partnership with Microsoft aims to make GPT-4 globally useful and accessible
to everyone, emphasizing the importance of involving many people in its development for widespread
benefits.


With the following line of code, we can see the prompt template that is used with the `map_reduce` technique.

In [20]:
print(chain.llm_chain.prompt.template)

Write a concise summary of the following:


"{text}"


CONCISE SUMMARY:


The `refine` summarization chain is a method for generating more accurate and context-aware summaries. This chain type iteratively refines the summary by providing additional context when needed. In practice, it first generates a summary of the first chunk. Then, for each successive chunk, the work-in-progress summary is updated with new information from that chunk.
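
The difference between the three chain types named earlier (`stuff`, `map_reduce`, and `refine`) is mostly a difference in control flow. The following is a simplified sketch of that flow, not LangChain's implementation. `FakeLLM` is a hypothetical stand-in that records prompts instead of calling a model, and the prompt wording is invented for the example.

```python
class FakeLLM:
    """Stand-in for a real model: records every prompt and returns a placeholder."""

    def __init__(self):
        self.prompts = []

    def __call__(self, prompt):
        self.prompts.append(prompt)
        return f"<summary #{len(self.prompts)}>"


def stuff(model, chunks):
    # One call: every chunk is placed into a single prompt.
    return model("Summarize:\n" + "\n".join(chunks))


def map_reduce(model, chunks):
    # Map: summarize each chunk independently. Reduce: summarize the summaries.
    partial = [model("Summarize:\n" + chunk) for chunk in chunks]
    return model("Combine these summaries:\n" + "\n".join(partial))


def refine(model, chunks):
    # Start from the first chunk, then update the running summary chunk by chunk.
    summary = model("Summarize:\n" + chunks[0])
    for chunk in chunks[1:]:
        summary = model(f"Existing summary: {summary}\nRefine it with:\n{chunk}")
    return summary


sample_chunks = ["chunk one", "chunk two", "chunk three"]
for strategy in (stuff, map_reduce, refine):
    fake = FakeLLM()
    strategy(fake, sample_chunks)
    print(f"{strategy.__name__}: {len(fake.prompts)} model call(s)")
```

With three chunks, this sketch makes one call for `stuff`, four for `map_reduce`, and three for `refine`. The `map_reduce` calls on individual chunks are independent of each other, while each `refine` call depends on the previous one.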

> Note: We are using the `refine` technique below.

In [21]:
chain = load_summarize_chain(llm, chain_type="refine")

output_summary = chain.run(docs)
wrapped_text = textwrap.fill(output_summary, width=100)
print(wrapped_text)

GPT-4 is an advanced AI tool designed to amplify human capabilities in various fields, including
education. Capable of performing useful language tasks, generating creative ideas in text,
processing up to 25,000 words, understanding images, and expressing logical ideas, GPT-4 can serve
as a personalized tutor for a wide range of subjects. Developed in partnership with Microsoft, its
creators aim to shape this technology into a valuable tool for the world, enhancing productivity and
ultimately improving the quality of life. With a vision of making GPT-4 accessible and useful to
everyone, not just early adopters or tech-savvy individuals, its developers are committed to
continuous improvement to make it suitable for society. The most compelling use cases for this
technology will stem from addressing genuine human needs.


In [None]:
dir(chain)

In [24]:
print(chain.refine_llm_chain.prompt.template)

Your job is to produce a final summary
We have provided an existing summary up to a certain point: {existing_answer}
We have the opportunity to refine the existing summary(only if needed) with some more context below.
------------
{text}
------------
Given the new context, refine the original summary
If the context isn't useful, return the original summary.


#### 4.2 Summarization using custom prompt with summarization chain

In [25]:
from langchain.prompts import PromptTemplate


prompt_template = """Write a concise bullet point summary of the following:

{text}

CONCISE SUMMARY IN BULLET POINTS:"""

BULLET_POINT_PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])

> Note: We are using the `stuff` technique below, which is the simplest and most naive approach.

In [26]:
chain = load_summarize_chain(llm, chain_type="stuff", prompt=BULLET_POINT_PROMPT)

output_summary = chain.run(docs)

wrapped_text = textwrap.fill(
    output_summary, width=1000, break_long_words=False, replace_whitespace=False
)
print(wrapped_text)

- GPT-4 is an advanced AI tool for language tasks and generating ideas
- Can generate up to 25,000 words, eight times more than ChatGPT
- Understands images and can express logical ideas about them
- Not perfect, requires user to ensure work meets expectations
- Training completed in August, ongoing improvements for alignment and usefulness
- Internal guardrails for adversarial usage, unwanted content, and privacy concerns
- Potential for significant impact in education, personalized learning
- OpenAI and Microsoft partnership to shape technology for global usefulness
- AI advancements contribute to productivity and quality of life
- GPT-4 aims to be accessible and helpful to a wide range of users


## Method 2 - Building the QnA summarizer
Here, we will perform the following steps:
1. Download the YouTube video file.
2. Transcribe the video using Whisper.
3. Split the transcription into smaller chunks and create documents out of it.
4. Add the documents to a vector store and compute embeddings.
5. Ask questions and generate summarized answers.

### 1. Download the YouTube video file

In [27]:
import yt_dlp
from typing import List, Tuple


def download_mp4_from_youtube(
    urls: List[str], job_id: int
) -> List[Tuple[str, str, str]]:
    # This will hold the file path, title, and author of each downloaded video
    video_info = []

    for i, url in enumerate(urls):
        # Set the options for the download
        file_temp = f"./{job_id}_{i}.mp4"
        ydl_opts = {
            "format": "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]",
            "outtmpl": file_temp,
            "quiet": True,
        }

        # Download the video file
        with yt_dlp.YoutubeDL(ydl_opts) as ydl:
            result = ydl.extract_info(url, download=True)
            title = result.get("title", "")
            author = result.get("uploader", "")

        # Add the file path, title, and author to our list
        video_info.append((file_temp, title, author))

    return video_info


urls = [
    "https://www.youtube.com/watch?v=--khbXchTeE",
    "https://www.youtube.com/watch?v=qTgPSKKjfVg",
]
videos_details = download_mp4_from_youtube(urls, 1)

                                                                          

### 2. Transcribe the video using Whisper

In [None]:
import whisper

text_separator = "\n\n"

# load the model
model = whisper.load_model("base")

# iterate through each video and transcribe
results = ""
for video in videos_details:
    result = model.transcribe(video[0])
    results += (result["text"]) + text_separator
    print(f"Transcription for {video[0]}:\n{result['text']}\n")

file_path = os.path.join(folder_path, "yt-multiple-text.txt")
with open(file_path, "w") as file:
    file.write(results)

### 3. Text Splitting

In [34]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=800, chunk_overlap=20, separators=[text_separator, " ", ",", "\n"]
)

In [37]:
from langchain.docstore.document import Document

with open(file_path) as f:
    text = f.read()

texts = text_splitter.split_text(text)
docs = [Document(page_content=t) for t in texts]

In [38]:
docs

[Document(page_content="GPT-4 takes what you prompt it with and just runs with it. From one perspective, it's a tool. A thing you can use to get useful tasks done in language. From another perspective, it's a system that can make dreams, thoughts, ideas, flourish in text in front of you. GPT-4 is incredibly advanced and sophisticated. It can take in and generate up to 25,000 words of text around eight times more than chat GPT. It understands images and can express logical ideas about them. For example, it can tell us that if the strings in this image were cut, the balloons would fly away. This is the place where you just get turbocharged by these AIs. They're not perfect. They make mistakes, and so you really need to make sure that you know the work is being done to your level of expectation. But I think that it", metadata={}),
 Document(page_content="But I think that it is fundamentally about amplifying what every person is able to do. GPT-4 training finished last August, and everythi

### 4. Vector Store with Embeddings

In [None]:
from langchain.vectorstores import DeepLake
from langchain.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings()

# create Deep Lake dataset
# TODO: use your organization id here. (by default, org id is your username)
my_activeloop_org_id = os.environ.get("ACTIVELOOP_ORG_ID")
my_activeloop_dataset_name = "youtube_summarizer"
dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"

db = DeepLake(dataset_path=dataset_path, embedding_function=embeddings)
db.add_documents(docs)

### 5. Asking questions and generating summarized answers

In [45]:
from langchain.prompts import PromptTemplate

prompt_template = """Use the following pieces of transcripts from a video to \
answer the question in bullet points and summarized. If you don't know the answer, \
just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Summarized answer in bullet points:"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
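
To see roughly what the model receives once the placeholders are filled, the template can be formatted by hand. This is a minimal sketch using plain `str.format` on a shortened copy of the prompt. The chunks and the question are hypothetical, and joining the chunks with blank lines is an assumption about how they are combined.

```python
# Shortened copy of the prompt above, kept as a plain string for illustration.
template = """Use the following pieces of transcripts from a video to answer the question.

{context}

Question: {question}
Summarized answer in bullet points:"""

# Hypothetical retrieved chunks; in the notebook these come from the retriever.
retrieved_chunks = [
    "GPT-4 is incredibly advanced and sophisticated.",
    "It understands images and can express logical ideas about them.",
]

filled = template.format(
    context="\n\n".join(retrieved_chunks),
    question="What can GPT-4 do?",
)
print(filled)
```

The instruction to answer only from the supplied transcripts explains why a question the retrieved chunks cannot answer may come back as "don't know".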

Let's configure the retriever object. The `distance_metric` setting determines how the retriever measures similarity between the query and the chunks stored in the database. Setting `distance_metric` to `cos` selects cosine similarity, and setting `k` to `4` makes the retriever return the 4 most similar chunks each time a search is performed.
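
The idea behind cosine similarity and top-`k` retrieval can be sketched in a few lines of plain Python. This is an illustration of the concept only, not how Deep Lake performs the search. The function names and the two-dimensional vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k):
    """Return the indices of the k vectors most similar to the query, best first."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine_similarity(query, vectors[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical 2-D embeddings; real embeddings have hundreds of dimensions.
stored = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(top_k([1.0, 0.05], stored, k=2))  # prints [0, 1]
```

Cosine similarity compares only the direction of the vectors, so a long chunk and a short chunk about the same topic can still score as close.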

In [46]:
retriever = db.as_retriever()
retriever.search_kwargs["distance_metric"] = "cos"
retriever.search_kwargs["k"] = 4

In [47]:
from langchain.chains import RetrievalQA

chain_type_kwargs = {"prompt": PROMPT}
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # Feel free to experiment with different chain types to see which one works best for you.
    retriever=retriever,
    chain_type_kwargs=chain_type_kwargs,
)

print(qa.run("Summarize dall e in simple terms and who is its creator?"))

- Dall-E is an AI system that creates images based on text descriptions
- It was created by training a neural network on images and their text descriptions
- Understands relationships between objects and actions
- Helps people express themselves visually and amplifies creative potential
- Creator: OpenAI


In [50]:
response = qa.run("Are gpt4 and dall e the same thing? Are their creators the same?")
print(response)

- GPT-4 and DALL-E are not the same thing; they are different AI systems.
- Both are created by OpenAI, so their creators are the same.


In [51]:
response = qa.run("Which was released first, gpt4 or dall e?")
print(response)

- Don't know
