# YouTube Video Summarizer

This notebook is an experiment that uses the OpenAI API with LangChain to summarize the contents of a YouTube video. It also uses Pinecone with LangChain to run semantic search over the transcript for specific queries, using OpenAI embeddings.

#### Stack used:
- **Text generation / Chat models**: GPT-4o
- **Embeddings**: OpenAI text-embedding-3-small
- **Automatic speech recognition (ASR)**: OpenAI whisper "base"
- **Vector DB**: Pinecone

## Installing dependencies

In [None]:
!pip install -U yt_dlp langchain openai langchain-community python-dotenv tiktoken pinecone-client

Collecting yt_dlp
  Downloading yt_dlp-2024.5.27-py3-none-any.whl (3.1 MB)
Collecting langchain
  Downloading langchain-0.2.6-py3-none-any.whl (975 kB)
Collecting openai
  Downloading openai-1.35.7-py3-none-any.whl (327 kB)
Collecting langchain-community
  Downloading langchain_community-0.2.6-py3-none-any.whl (2.2 MB)
Collecting python-dotenv
  Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Collecting tiktoken
  Downloading tiktoken-0.7.0-cp310-cp310-manylinux_2_17_x86_64.many

In [None]:
!pip install -q git+https://github.com/openai/whisper.git

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Building wheel for openai-whisper (pyproject.toml) ... done


In [None]:
# ffmpeg is required by yt_dlp (to merge audio and video streams) and by whisper (to decode audio)
!apt install -y ffmpeg

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
ffmpeg is already the newest version (7:4.4.2-0ubuntu0.22.04.1).
0 upgraded, 0 newly installed, 0 to remove and 45 not upgraded.


In [None]:
import os

In [None]:
OPENAI_API_KEY="<your-openai-api-key>"
PINECONE_API_KEY="<your-pinecone-api-key>"

os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
os.environ["PINECONE_API_KEY"] = PINECONE_API_KEY
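
Hard-coding keys in a notebook makes them easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the keys are already exported in the environment (for example from a `.env` file loaded with the `python-dotenv` package installed above). `require_env` is a hypothetical helper written for this illustration, not part of any library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, failing early if it is unset.

    Hypothetical helper for illustration. Placeholder values such as
    "<your-openai-api-key>" are treated as missing.
    """
    value = os.environ.get(name, "").strip()
    if not value or (value.startswith("<") and value.endswith(">")):
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Usage (commented out so the cell runs without real keys):
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
# PINECONE_API_KEY = require_env("PINECONE_API_KEY")
```

Failing at this point gives a clearer error than an authentication failure several cells later.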

## Download YT video

In [None]:
import yt_dlp

In [None]:
def download_mp4_from_youtube(url, filename='lecuninterview.mp4'):
    """Download the video at `url` as an mp4 file saved to `filename`."""
    # Prefer separate mp4 video + m4a audio streams (merged by ffmpeg),
    # falling back to the best single-file mp4
    ydl_opts = {
        'format': 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]',
        'outtmpl': filename,
        'quiet': True,
    }

    # Download the video file
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        ydl.extract_info(url, download=True)


Here we download an interview with Yann LeCun.

In [None]:
url = "https://www.youtube.com/watch?v=mBjPyte2ZZo"
download_mp4_from_youtube(url)
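
Since only the audio track is transcribed, downloading the full video is not strictly necessary. The following is a sketch of an audio-only variant, assuming ffmpeg is available for yt_dlp's `FFmpegExtractAudio` post-processor. The function names and the choice of mp3 are illustrative assumptions, and the transcription step would then need to read the `.mp3` file instead of the `.mp4`.

```python
def build_audio_opts(filename_stem: str) -> dict:
    """Build yt_dlp options that fetch only the audio track and convert it to mp3.

    Assumption: ffmpeg is installed, since the FFmpegExtractAudio
    post-processor depends on it.
    """
    return {
        "format": "bestaudio/best",
        # yt_dlp fills in %(ext)s; the post-processor then writes <stem>.mp3
        "outtmpl": f"{filename_stem}.%(ext)s",
        "quiet": True,
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": "mp3"},
        ],
    }


def download_audio_from_youtube(url: str, filename_stem: str = "lecuninterview") -> str:
    """Download the audio of `url` and return the expected mp3 path."""
    import yt_dlp  # imported lazily so the options can be inspected without it

    with yt_dlp.YoutubeDL(build_audio_opts(filename_stem)) as ydl:
        ydl.download([url])
    return f"{filename_stem}.mp3"
```

An audio-only download is usually much smaller than the merged video, which matters for long interviews.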



### Using OpenAI whisper to generate transcription

In [None]:
import whisper

model = whisper.load_model("base")
result = model.transcribe("lecuninterview.mp4")

100%|████████████████████████████████████████| 139M/139M [00:00<00:00, 174MiB/s]


In [None]:
print(result['text'][:100])

 Hi, I'm Craig Smith and this is I on A On. This week I talked to Jan LeCoon, one of the seminal fig


In [None]:
with open('transcript.txt', 'w') as file:
    file.write(result['text'])
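
Besides the full text, the dictionary returned by Whisper's `transcribe` also contains a `segments` list whose entries carry `start`, `end` (in seconds), and `text`. A small sketch of turning those into timestamped lines, which can help when locating a passage in the video. The helper names and the sample segments are made up for illustration.

```python
def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_lines(segments) -> list:
    """Render each segment as '[start - end] text'."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} - {end}] {seg['text'].strip()}")
    return lines


# Made-up segments in the shape Whisper returns, for illustration only.
sample_segments = [
    {"start": 0.0, "end": 4.2, "text": " Hello and welcome."},
    {"start": 4.2, "end": 3725.9, "text": " Thanks for listening."},
]
print("\n".join(segments_to_lines(sample_segments)))
```

With the real result, the call would be `segments_to_lines(result["segments"])`.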

In [None]:
def print_transcript_stats(file):
  with open(file, "r") as f:
    data = f.read()
    words = len(data.split(" "))
    lines = len(data.split("\n"))
  print(f"Words:{words}\nLines:{lines}")

In [None]:
print_transcript_stats("transcript.txt")

Words:9110
Lines:1


## Importing langchain modules

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains.summarize import load_summarize_chain

llm = ChatOpenAI(model_name="gpt-4o", temperature=0)

### Splitting the transcript into digestible `Documents`

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=0, separators=[" ", ",", "\n"]
)

In [None]:
from langchain.docstore.document import Document

In [None]:
with open('transcript.txt') as f:
    text = f.read()

texts = text_splitter.split_text(text)
docs = [Document(page_content=t) for t in texts[:4]]
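
To build intuition for what `chunk_size=1000` with `chunk_overlap=0` means, here is a deliberately simplified sketch that packs words greedily into chunks. It is not LangChain's actual algorithm (there is no overlap and no fallback through a list of separators), and `greedy_chunks` is a hypothetical name used only here.

```python
def greedy_chunks(text: str, chunk_size: int = 1000) -> list:
    """Pack whitespace-separated words into chunks of at most `chunk_size` characters.

    Simplified illustration only: a single word longer than `chunk_size`
    becomes its own oversized chunk.
    """
    chunks, current, current_len = [], [], 0
    for word in text.split():
        # +1 accounts for the joining space when the chunk is non-empty
        extra = len(word) + (1 if current else 0)
        if current and current_len + extra > chunk_size:
            chunks.append(" ".join(current))
            current, current_len = [word], len(word)
        else:
            current.append(word)
            current_len += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


demo = greedy_chunks("one two three four five six", chunk_size=10)
print(demo)
```

Note that the notebook only wraps the first four chunks (`texts[:4]`) as `docs`, so the summaries below cover roughly the first 4,000 characters of the transcript.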

In [None]:
import textwrap

### Generating the summary from selected `docs`

In [None]:
chain = load_summarize_chain(llm, chain_type="map_reduce")

output_summary = chain.run(docs)
wrapped_text = textwrap.fill(output_summary, width=100)
print(wrapped_text)

Craig Smith interviews Jan LeCun, a prominent figure in deep learning and advocate for self-
supervised learning, on his podcast "I on A On." They discuss the limitations of large language
models, particularly their lack of a world model, and LeCun's new joint embedding predictive
architecture (JEPA) as a potential solution. LeCun, a professor at NYU and chief AI scientist at
FAIR, explains the revolutionary impact of self-supervised learning on natural language processing,
especially through the pre-training of transformer architectures. He also shares his theory of
consciousness and the future potential for AI systems to exhibit conscious features. Self-supervised
learning, which involves training neural networks by predicting missing words in text, has
significantly advanced applications like content moderation. However, generative models face
challenges when applied to complex data like video, where predicting missing frames involves greater
uncertainty.


In [None]:
print(chain.llm_chain.prompt.template)

Write a concise summary of the following:


"{text}"


CONCISE SUMMARY:
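
Conceptually, the `map_reduce` chain applies a prompt like the one above to each chunk separately, then summarizes the combined partial summaries. A minimal sketch of that control flow with a stand-in for the model call; the real chain may additionally collapse intermediate summaries when they are too long, which is not modeled here. All names below are illustrative.

```python
from typing import Callable, List


def map_reduce_summarize(chunks: List[str], summarize: Callable[[str], str]) -> str:
    """Summarize each chunk, then summarize the joined partial summaries."""
    # Map step: each chunk is summarized independently
    partial_summaries = [summarize(chunk) for chunk in chunks]
    # Reduce step: one more pass over the concatenated partial summaries
    return summarize("\n\n".join(partial_summaries))


def fake_summarize(text: str) -> str:
    """Stand-in for an LLM call: keeps only the first three words."""
    return " ".join(text.split()[:3])


print(map_reduce_summarize(["alpha beta gamma delta", "one two three four"], fake_summarize))
```

Because the map step treats chunks independently, it can be parallelized, at the cost of each partial summary lacking context from the other chunks.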


## Experimenting with a naive `stuff` chain

In [None]:
prompt_template = """Write a concise bullet point summary of the following:


{text}


CONCISE SUMMARY IN BULLET POINTS:"""

BULLET_POINT_PROMPT = PromptTemplate(template=prompt_template,
                        input_variables=["text"])

In [None]:
chain_stuff = load_summarize_chain(llm,
                             chain_type="stuff",
                             prompt=BULLET_POINT_PROMPT)

output_summary = chain_stuff.run(docs)

wrapped_text = textwrap.fill(output_summary,
                             width=1000,
                             break_long_words=False,
                             replace_whitespace=False)
print(wrapped_text)

- Craig Smith hosts "I on A On" and interviews Jan LeCoon, a key figure in deep learning and self-supervised learning.
- Jan LeCoon discusses the limitations of large language models (LLMs) and introduces his new joint embedding predictive architecture (JEPA).
- LeCoon shares his theory of consciousness and the potential for AI systems to exhibit consciousness features.
- Self-supervised learning has revolutionized natural language processing (NLP) by pre-training transformer architectures.
- LLMs use self-supervised learning to predict missing words in text, which helps in various applications like content moderation.
- Generative models, like LLMs, predict the next word in a text but struggle with representing uncertain predictions.
- LeCoon highlights the challenge of applying generative models to video data due to the complexity of handling uncertainty in predictions.


## Refining the output

In [None]:
chain_refine = load_summarize_chain(llm, chain_type="refine")

output_summary = chain_refine.run(docs)
wrapped_text = textwrap.fill(output_summary, width=100)
print(wrapped_text)

Craig Smith interviews Jan LeCoon, a key figure in deep learning and advocate for self-supervised
learning, on his show "I on A On." They discuss the limitations of large language models,
particularly their lack of a world model, and LeCoon's new joint embedding predictive architecture
(JEPA) as a potential solution. LeCoon, who is a professor at New York University and the chief AI
scientist at FAIR (Facebook AI Research), also shares his theory of consciousness and the
possibility of AI systems exhibiting conscious features in the future. LeCoon explains how self-
supervised learning involves training systems by removing and predicting missing words in text,
which has revolutionized practical applications like content moderation on platforms such as
Facebook, Google, and YouTube. He also touches on the challenges of generative models, particularly
in representing uncertain predictions, and how this complexity increases when applying such models
to video data.
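
The `refine` strategy processes chunks sequentially: it summarizes the first chunk, then repeatedly asks the model to update the running summary given the next chunk. A minimal sketch of that loop with stand-in functions in place of the model calls; the function names are hypothetical and the stubs only mimic the shape of the calls.

```python
from typing import Callable, List


def refine_summarize(
    chunks: List[str],
    initial: Callable[[str], str],
    refine: Callable[[str, str], str],
) -> str:
    """Build a summary from the first chunk, then refine it with each later chunk."""
    if not chunks:
        return ""
    summary = initial(chunks[0])
    for chunk in chunks[1:]:
        # Each step sees the running summary plus one new chunk
        summary = refine(summary, chunk)
    return summary


def fake_initial(chunk: str) -> str:
    """Stand-in for the first LLM call: keep the text before the first period."""
    return chunk.split(".")[0]


def fake_refine(summary: str, chunk: str) -> str:
    """Stand-in for a refine call: append the new chunk's first sentence."""
    return f"{summary}; {chunk.split('.')[0]}"


print(refine_summarize(["A. x", "B. y", "C. z"], fake_initial, fake_refine))
```

Unlike `map_reduce`, this loop is inherently sequential, since each step depends on the previous summary.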


## Storing docs in Pinecone

We use Pinecone to store the embedding of every transcript chunk and to retrieve the chunks most relevant to a given query.

In [None]:
len(texts)

51

In [None]:
all_docs = [Document(page_content=t) for t in texts]

In [None]:
from langchain.vectorstores import Pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
from pinecone import Pinecone as PineconeInit

In [None]:
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")



### Initialize the Pinecone client

In [None]:
pc = PineconeInit(api_key=PINECONE_API_KEY)
index = pc.Index("yt-summary")
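
The cell above assumes that an index named `yt-summary` already exists. A hedged sketch of creating it on demand follows, assuming a recent `pinecone-client` (v3 or later). The dimension 1536 matches the default output size of `text-embedding-3-small`; the cloud and region in the usage comment are placeholder assumptions, and `ensure_index` is a hypothetical helper.

```python
def ensure_index(pc, name: str, spec, dimension: int = 1536, metric: str = "cosine") -> bool:
    """Create the index if it does not exist yet; return True if it was created.

    `pc` is a Pinecone client and `spec` an index spec such as ServerlessSpec.
    The similarity metric is fixed here, at index creation time.
    """
    if name in pc.list_indexes().names():
        return False
    pc.create_index(name=name, dimension=dimension, metric=metric, spec=spec)
    return True


# Usage, assuming a serverless index (cloud and region are placeholders):
# from pinecone import Pinecone, ServerlessSpec
# pc = Pinecone(api_key=PINECONE_API_KEY)
# ensure_index(pc, "yt-summary", ServerlessSpec(cloud="aws", region="us-east-1"))
```

Index creation can take a short while to complete, so a real notebook may need to wait before writing to the index.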

### Creating and storing the embeddings

In [None]:
docsearch = Pinecone.from_documents(all_docs, embeddings, index_name="yt-summary")

### Creating a retriever to fetch top-k (here k=4) documents using cosine similarity

In [None]:
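# The keyword is `search_kwargs` (single underscore). With Pinecone, the
# similarity metric is a property of the index, so only k is set here.
retriever = docsearch.as_retriever(search_kwargs={'k': 4})

In [None]:
# k can also be changed after the retriever is created:
# retriever.search_kwargs['k'] = 4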
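
For intuition, the retriever returns the k chunks whose embeddings are most similar to the query embedding. Below is a brute-force sketch of top-k cosine similarity in plain Python over toy two-dimensional vectors. A vector database performs this search at scale with its own indexing, so this is an illustration of the idea, not of Pinecone's implementation.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k: int = 4) -> list:
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Toy 2-D "embeddings", made up for illustration
toy_docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
print(top_k([1.0, 0.1], toy_docs, k=2))
```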

In [None]:
# from langchain.prompts import PromptTemplate
prompt_template = """Use the following pieces of a video transcript to answer the question as a summarized list of bullet points. If you don't know the answer, just say that you don't know; don't try to make up an answer.

{context}

Question: {question}
Summarized answer in bullet points:"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

In [None]:
from langchain.chains import RetrievalQA

chain_type_kwargs = {"prompt": PROMPT}
qa = RetrievalQA.from_chain_type(llm=llm,
                                 chain_type="stuff",
                                 retriever=retriever,
                                 chain_type_kwargs=chain_type_kwargs)

In [None]:
print(qa.run("Summarize the mentions of google according to their AI program"))

- Google has a policy similar to Meta regarding intellectual property (IP) protection, where they are not overly aggressive about enforcing IP rights unless they are sued first.
- Google has contributed significantly to the development of AI tools and frameworks, such as TensorFlow, which facilitate the building and sharing of complex AI models.
- Google's work in AI includes contributions to the development of transformers and attention mechanisms, which have been pivotal in advancements in natural language processing (NLP) and other AI applications.
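
Putting the pieces together, a retrieval QA chain of type `stuff` roughly does three things: fetch the relevant chunks, paste them into the `{context}` slot of the prompt, and send the filled prompt to the model. A minimal sketch of that flow with stand-ins for the retriever and the model; the blank-line separator between chunks and all names below are assumptions made for illustration.

```python
from typing import Callable, List


def answer_with_context(
    question: str,
    retrieve: Callable[[str], List[str]],
    llm: Callable[[str], str],
    template: str,
) -> str:
    """Retrieve chunks for the question, fill the prompt template, and call the model."""
    chunks = retrieve(question)
    # All retrieved chunks are "stuffed" into a single context string
    context = "\n\n".join(chunks)
    prompt = template.format(context=context, question=question)
    return llm(prompt)


def fake_retrieve(question: str) -> List[str]:
    """Stand-in retriever returning two fixed chunks."""
    return ["chunk about transformers", "chunk about open source"]


def echo_llm(prompt: str) -> str:
    """Stand-in model that returns the prompt unchanged, to show what it would receive."""
    return prompt


demo_template = "{context}\n\nQuestion: {question}\nAnswer:"
print(answer_with_context("What was said about Google?", fake_retrieve, echo_llm, demo_template))
```

Since every retrieved chunk goes into one prompt, k and the chunk size together bound how much context the model sees per question.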
