# RAG over YouTube Video | Video QA Engine

### Table of Contents

1. **Load LLM**
2. **Load YouTube Video Transcripts**
3. **Generate Embeddings**
4. **Split Documents**
5. **Build Index in Vector Store**
6. **Create RAG QA Engine**

#### Installation

* **pip install youtube-transcript-api**
* **pip install pytube**
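
The cells in this notebook also import `dotenv`, `langchain`, `langchain_community`, `langchain_groq`, and `langchain_text_splitters`, and the Chroma vector store typically needs `chromadb`. A minimal sketch for checking which of these are importable before running anything; the module list is my reading of the import statements used here, and `missing_modules` is a hypothetical helper name:

```python
import importlib.util

# Top-level modules used by the cells in this notebook (an assumption
# based on its import statements; adjust to your environment).
REQUIRED_MODULES = [
    "dotenv",
    "langchain",
    "langchain_community",
    "langchain_groq",
    "langchain_text_splitters",
    "youtube_transcript_api",
    "pytube",
    "chromadb",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", missing or "none")
```

Anything reported as missing can be installed with `pip` before continuing.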

In [1]:
import dotenv

dotenv.load_dotenv(dotenv.find_dotenv())

True

In [2]:
import os

groq_api_key = os.environ["GROQ_API_KEY"]
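
Reading the key with `os.environ[...]` raises a bare `KeyError` when the variable is missing, which can be confusing if the `.env` file was not found. A small sketch of a friendlier lookup; `require_env` is a hypothetical helper, not part of `dotenv` or LangChain:

```python
import os


def require_env(name):
    """Return the value of an environment variable or raise a readable error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in the shell."
        )
    return value
```

With this in place, `groq_api_key = require_env("GROQ_API_KEY")` would behave like the cell above when the key exists and fail with a clearer message when it does not.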

## 1. Load LLM

* Log in to **https://console.groq.com** and create an API key.

### Groq Models

Model ID | Requests per Minute | Requests per Day | Tokens per Minute
-|-|-|-
llama3-70b-8192 | 30 | 14,400 | 6,000
llama3-8b-8192 | 30 | 14,400 | 30,000
gemma-7b-it | 30 | 14,400 | 15,000
gemma2-9b-it | 30 | 14,400 | 15,000
mixtral-8x7b-32768 | 30 | 14,400 | 5,000
llama3-groq-8b-8192-tool-use-preview | 30 | 14,400 | 15,000
llama3-groq-70b-8192-tool-use-preview | 30 | 14,400 | 15,000
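
Every model in the table is capped at 30 requests per minute, so a loop that fires many questions at the chain can run into that limit. Below is a rough sketch of a client-side sliding-window limiter. `RequestLimiter` is a hypothetical helper, not part of `langchain_groq`; it only tracks the per-minute request count and ignores the daily and token limits.

```python
import time
from collections import deque


class RequestLimiter:
    """Sliding-window limiter: at most `max_requests` per `window_seconds`."""

    def __init__(self, max_requests=30, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._sent = deque()  # timestamps of recently recorded requests

    def wait_time(self):
        """Seconds to wait before the next request is allowed (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._sent[0])

    def record(self):
        """Record that a request was just sent."""
        self._sent.append(self.clock())
```

Before each call to the model you would sleep for `limiter.wait_time()` seconds and then call `limiter.record()`. The `clock` argument exists only so the behaviour can be checked without real waiting.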

In [3]:
from langchain_groq import ChatGroq

llama3 = ChatGroq(api_key=groq_api_key, model="llama3-70b-8192", temperature=0)

llama3

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x7f5f1ecd1600>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x7f5f1ecd0940>, model_name='llama3-70b-8192', temperature=1e-08, groq_api_key=SecretStr('**********'))

In [4]:
ai_msg = llama3.invoke("Hey! How's life?")

print(ai_msg.content)

Hi! I'm just an AI, I don't have personal experiences or emotions, so I don't have a life in the classical sense. I exist solely to assist and provide information to users like you. However, I'm always happy to chat and help with any questions or topics you'd like to discuss! How can I assist you today?


## 2. Load YouTube Video Transcripts

In [5]:
from langchain_community.document_loaders import YoutubeLoader

andrej_karpathy_llm_intro = "https://www.youtube.com/watch?v=zjkBMFhNj_g"

loader = YoutubeLoader.from_youtube_url(
    andrej_karpathy_llm_intro, 
    add_video_info=True
)


docs = loader.load()

In [6]:
len(docs)

1

In [7]:
docs[0]

Document(metadata={'source': 'zjkBMFhNj_g', 'title': '[1hr Talk] Intro to Large Language Models', 'description': 'Unknown', 'view_count': 2065516, 'thumbnail_url': 'https://i.ytimg.com/vi/zjkBMFhNj_g/hq720.jpg', 'publish_date': '2023-11-22 00:00:00', 'length': 3588, 'author': 'Andrej Karpathy'}, page_content="hi everyone so recently I gave a 30-minute talk on large language models just kind of like an intro talk um unfortunately that talk was not recorded but a lot of people came to me after the talk and they told me that uh they really liked the talk so I would just I thought I would just re-record it and basically put it up on YouTube so here we go the busy person's intro to large language models director Scott okay so let's begin first of all what is a large language model really well a large language model is just two files right um there be two files in this hypothetical directory so for example work with the specific example of the Llama 270b model this is a large language model 

In [8]:
docs[0].metadata

{'source': 'zjkBMFhNj_g',
 'title': '[1hr Talk] Intro to Large Language Models',
 'description': 'Unknown',
 'view_count': 2065516,
 'thumbnail_url': 'https://i.ytimg.com/vi/zjkBMFhNj_g/hq720.jpg',
 'publish_date': '2023-11-22 00:00:00',
 'length': 3588,
 'author': 'Andrej Karpathy'}

In [9]:
from langchain_community.document_loaders import YoutubeLoader
from langchain_community.document_loaders.youtube import TranscriptFormat

loader = YoutubeLoader.from_youtube_url(
    andrej_karpathy_llm_intro , 
    add_video_info=True,
    transcript_format=TranscriptFormat.CHUNKS, ## LINES, TEXT
    chunk_size_seconds=30,
    #language=["en", "id"], ## Language Preference Param
    #translation="en", ## Translation param,
)

docs2 = loader.load()

len(docs2)

120

In [10]:
docs2[0].metadata, docs2[1].metadata

({'source': 'https://www.youtube.com/watch?v=zjkBMFhNj_g&t=0s',
  'title': '[1hr Talk] Intro to Large Language Models',
  'description': 'Unknown',
  'view_count': 2065516,
  'thumbnail_url': 'https://i.ytimg.com/vi/zjkBMFhNj_g/hq720.jpg',
  'publish_date': '2023-11-22 00:00:00',
  'length': 3588,
  'author': 'Andrej Karpathy',
  'start_seconds': 0,
  'start_timestamp': '00:00:00'},
 {'source': 'https://www.youtube.com/watch?v=zjkBMFhNj_g&t=30s',
  'title': '[1hr Talk] Intro to Large Language Models',
  'description': 'Unknown',
  'view_count': 2065516,
  'thumbnail_url': 'https://i.ytimg.com/vi/zjkBMFhNj_g/hq720.jpg',
  'publish_date': '2023-11-22 00:00:00',
  'length': 3588,
  'author': 'Andrej Karpathy',
  'start_seconds': 30,
  'start_timestamp': '00:00:30'})
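
The chunk metadata above ties each 30-second chunk to a `start_seconds` offset, a `start_timestamp`, and a `source` URL ending in `&t=<seconds>s`. The sketch below reproduces that relationship in plain Python. The helper names are hypothetical, and the real loader groups transcript pieces rather than cutting at exact boundaries, so treat this as an illustration of the metadata only.

```python
def format_timestamp(seconds):
    """Format a second offset as HH:MM:SS, like `start_timestamp` above."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def chunk_url(video_id, start_seconds):
    """Build a watch URL that jumps to the start of a chunk."""
    return f"https://www.youtube.com/watch?v={video_id}&t={int(start_seconds)}s"


def chunk_starts(video_length_seconds, chunk_size_seconds=30):
    """Start offsets of fixed-size chunks covering the whole video."""
    return list(range(0, video_length_seconds, chunk_size_seconds))
```

For the video length of 3588 seconds reported in the metadata, `chunk_starts(3588)` yields 120 offsets, which agrees with the 120 documents returned by the loader.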

In [11]:
print(docs2[0].page_content)
print("\n\n")
print(docs2[1].page_content)

hi everyone so recently I gave a 30-minute talk on large language models just kind of like an intro talk um unfortunately that talk was not recorded but a lot of people came to me after the talk and they told me that uh they really liked the talk so I would just I thought I would just re-record it and basically put it up on YouTube so here we go the busy person's intro to large language models director Scott okay so let's begin first of all what is a large language model really well a large



language model is just two files right um there be two files in this hypothetical directory so for example work with the specific example of the Llama 270b model this is a large language model released by meta Ai and this is basically the Llama series of language models the second iteration of it and this is the 70 billion parameter model of uh of this series so there's multiple models uh belonging to the Lama


## 3. Generate Embeddings

In [12]:
from langchain_community.embeddings import OllamaEmbeddings

embeddings_llm = OllamaEmbeddings(model="llama3") # base_url = 'http://localhost:11434'

embeddings = embeddings_llm.embed_query("How are you?")

embeddings[:5]

[-1.0194212198257446,
 -1.1579540967941284,
 -0.41912218928337097,
 0.14032800495624542,
 -3.3613100051879883]

In [13]:
type(embeddings), len(embeddings)

(list, 4096)

In [14]:
embeddings = embeddings_llm.embed_documents([
                                "Claude 3.5 Sonnet is latest Conversational AI Model from Anthropic.",
                                "Gemma-2 is latest Conversational AI Model from Google.",
                                "Llama-3 is latest Conversational AI Model from Meta.",
                                "Mixtral is latest Conversational AI Model from Mistral AI.",
                                "GPT-4o is latest Conversational AI Model from OpenAI."
                               ])

len(embeddings), type(embeddings), len(embeddings[0])

(5, list, 4096)
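
Each text is now a list of 4096 floats. Retrieval works by comparing such vectors, and cosine similarity is one common way to do that. The sketch below is plain Python for illustration; I am not claiming it is the exact metric the vector store uses later in this notebook.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)
```

For example, `cosine_similarity(embeddings[0], embeddings[1])` would compare the first two sentences embedded above.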

## 4. Split Documents

In [15]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter()

documents = text_splitter.split_documents(docs)

len(documents)

17
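
As far as I know, `RecursiveCharacterTextSplitter()` called without arguments falls back to a chunk size of 4000 characters with a 200-character overlap, which would be consistent with an hour-long transcript ending up in 17 pieces. The sketch below shows the basic idea of overlapping chunks with a simple sliding window. It is not the library's algorithm, which also tries to break on separators such as paragraphs and spaces.

```python
def split_with_overlap(text, chunk_size=4000, chunk_overlap=200):
    """Cut `text` into fixed-size windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, provided it is shorter than the overlap.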

## 5. Build Index in Vector Store

* Note that this step takes time, so I have used persistence (`persist_directory`) to save the embeddings to disk; that way the vector store does not need to be created again.

In [16]:
from langchain_community.vectorstores import Chroma

vector_index = Chroma.from_documents(documents, embeddings_llm, persist_directory="./video_rag_db") 

vector_index

<langchain_community.vectorstores.chroma.Chroma at 0x7f5f1ea5b490>
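
The cell above always re-embeds the documents when it runs. To actually benefit from the saved index on a later run, you can reopen the directory instead of calling `from_documents` again. A minimal sketch, assuming the same `./video_rag_db` directory and the same embedding model; `has_persisted_index` and `load_or_build_index` are hypothetical helper names:

```python
import os


def has_persisted_index(persist_dir):
    """True if the directory exists and is not empty."""
    return os.path.isdir(persist_dir) and bool(os.listdir(persist_dir))


def load_or_build_index(documents, embedding_fn, persist_dir="./video_rag_db"):
    """Reopen a saved Chroma index if one exists, otherwise build it."""
    # Imported lazily so the check above works without LangChain installed.
    from langchain_community.vectorstores import Chroma

    if has_persisted_index(persist_dir):
        # Reopen the saved collection; no documents are embedded here.
        return Chroma(persist_directory=persist_dir, embedding_function=embedding_fn)
    # First run: embed the documents and write the index to disk.
    return Chroma.from_documents(documents, embedding_fn, persist_directory=persist_dir)
```

Calling `load_or_build_index(documents, embeddings_llm)` would then replace the direct `Chroma.from_documents(...)` call. The embedding model must match the one used when the index was built, otherwise query vectors and stored vectors are not comparable.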

In [17]:
retriever = vector_index.as_retriever()

# A retriever takes the query as a plain string; the {"input": ...} dict is only for the retrieval chain.
relevant_docs = retriever.invoke("What are authors views on LLM Security?")

len(relevant_docs)

4

In [18]:
for doc in relevant_docs:
    print(doc.page_content)
    print("\n\n")

poisoning or a back door attack and uh another way to maybe see it is this like Sleeper Agent attack so you may have seen some movies for example where there's a Soviet spy and um this spy has been um basically this person has been brainwashed in some way that there's some kind of a trigger phrase and when they hear this trigger phrase uh they get activated as a spy and do something undesirable well it turns out that maybe there's an equivalent of something like that in the space of large language models uh because as I mentioned when we train train uh these language models we train them on hundreds of terabytes of text coming from the internet and there's lots of attackers potentially on the internet and they have uh control over what text is on the on those web pages that people end up scraping and then training on well it could be that if you train on a bad document that contains a trigger phrase uh that trigger phrase could trip the model into performing any kind of undesirable thi

## 6. Create RAG QA Engine

In [33]:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_retrieval_chain

prompt = ChatPromptTemplate.from_template("""
Answer the following question based on the provided context and your internal knowledge.
Give priority to context and if you are not sure then say you are not aware of topic:

<context>
{context}
</context>

Question: {input}
""")

#document_chain  = prompt | llm
document_chain = create_stuff_documents_chain(llama3, prompt)
retrieval_chain = create_retrieval_chain(retriever, document_chain)

retrieval_chain

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['Chroma', 'OllamaEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x7f5f1ea5b490>), config={'run_name': 'retrieve_documents'})
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), config={'run_name': 'format_inputs'})
            | ChatPromptTemplate(input_variables=['context', 'input'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'input'], template='\nAnswer the following question based on the provided context and your internal knowledge.\nGive priority to context and if you are not sure then say you are not aware of topic:\n\n<context>\n{context}\n</context>\n\nQuestion: {input}\n'))])
            | ChatGroq(client=<groq.resources.ch
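
The "stuff" in `create_stuff_documents_chain` means that the text of every retrieved document is concatenated and placed into the `{context}` slot of the prompt, as the `format_docs` step in the output above suggests. The sketch below imitates that step in plain Python. It is an illustration rather than LangChain's implementation, and the blank-line separator is my assumption about the default.

```python
PROMPT_TEMPLATE = """
Answer the following question based on the provided context and your internal knowledge.
Give priority to context and if you are not sure then say you are not aware of topic:

<context>
{context}
</context>

Question: {input}
"""


def stuff_documents(page_contents, separator="\n\n"):
    """Join the text of all retrieved chunks into one context string."""
    return separator.join(page_contents)


def build_prompt(question, page_contents):
    """Fill the template with the stuffed context and the user question."""
    return PROMPT_TEMPLATE.format(
        context=stuff_documents(page_contents), input=question
    )
```

Because all chunks go into a single prompt, the retrieved text plus the question has to fit within the model's context window.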

In [34]:
response = retrieval_chain.invoke({"input": "What are authors views on LLM Security?"})

type(response)

dict

In [35]:
response.keys()

dict_keys(['input', 'context', 'answer'])
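
Since the response carries both the `answer` and the retrieved `context` documents, the two can be shown together so a reader can see what the answer was based on. A small sketch, assuming each context item exposes `metadata` and `page_content` as the documents in this notebook do; `format_answer_with_sources` is a hypothetical helper:

```python
def format_answer_with_sources(response, snippet_chars=80):
    """Render the answer followed by a numbered list of retrieved chunks."""
    lines = [response["answer"].strip(), "", "Sources:"]
    for i, doc in enumerate(response["context"], start=1):
        meta = doc.metadata
        title = meta.get("title", "unknown title")
        source = meta.get("source", "unknown source")
        # Show only the beginning of each chunk to keep the listing short.
        snippet = doc.page_content[:snippet_chars].replace("\n", " ")
        lines.append(f"{i}. {title} ({source}): {snippet}...")
    return "\n".join(lines)
```

`print(format_answer_with_sources(response))` would then print the answer and one line per retrieved chunk.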

In [36]:
print(response["answer"])

Based on the provided context, the author's views on LLM (Large Language Model) security are:

1. Concerned: The author is concerned about the potential security risks associated with LLMs, specifically highlighting the possibility of "poisoning" or "backdoor" attacks, where an attacker can manipulate the model's behavior by injecting malicious data or triggers during training.
2. Aware of the cat-and-mouse game: The author acknowledges that there is an ongoing "cat-and-mouse" game between attackers and defenders in the field of LLM security, with new attacks being developed and defenses being created to counter them.
3. Emphasizes the need for ongoing research: The author stresses the importance of continued research and development in LLM security, as the field is rapidly evolving and new threats are emerging.

Overall, the author appears to be cautious and vigilant about the potential security risks associated with LLMs, and advocates for ongoing research and development to address 

In [37]:
response["context"]

[Document(metadata={'author': 'Andrej Karpathy', 'description': 'Unknown', 'length': 3588, 'publish_date': '2023-11-22 00:00:00', 'source': 'zjkBMFhNj_g', 'thumbnail_url': 'https://i.ytimg.com/vi/zjkBMFhNj_g/hq720.jpg', 'title': '[1hr Talk] Intro to Large Language Models', 'view_count': 2065516}, page_content="poisoning or a back door attack and uh another way to maybe see it is this like Sleeper Agent attack so you may have seen some movies for example where there's a Soviet spy and um this spy has been um basically this person has been brainwashed in some way that there's some kind of a trigger phrase and when they hear this trigger phrase uh they get activated as a spy and do something undesirable well it turns out that maybe there's an equivalent of something like that in the space of large language models uh because as I mentioned when we train train uh these language models we train them on hundreds of terabytes of text coming from the internet and there's lots of attackers poten

In [24]:
response = retrieval_chain.invoke({"input": "Summarize authors views on attacks on LLMs."})

print(response["answer"])

Based on the provided context, the author discusses various types of attacks on Large Language Models (LLMs), including:

1. Poisoning or backdoor attacks: The author mentions that attackers can inject malicious data into the training dataset, which can trigger undesirable behavior in the model.
2. Sleeper Agent attack: The author explains that an attacker can design a trigger phrase that, when encountered, causes the model to perform an undesirable action.
3. Prompt injection attack: The author mentions this type of attack, but does not provide further details.
4. Shieldbreak attack: The author mentions this type of attack, but does not provide further details.
5. Data poisoning or backdoor attacks: The author notes that these attacks have defenses that have been developed and published, but the cat-and-mouse game between attackers and defenders continues.

The author emphasizes that these attacks are a concern and should be studied in detail. They also mention that the field of LLM s

In [25]:
response = retrieval_chain.invoke({"input": "How does author describe LLM?"})

print(response["answer"])

Based on the provided context, the author describes Large Language Models (LLMs) as:

* Trained on hundreds of terabytes of text coming from the internet
* Capable of performing tasks such as title generation, coreference resolution, and threat detection
* Vulnerable to attacks such as prompt injection, shieldbreak, data poisoning, and backdoor attacks
* Evolving rapidly and becoming more capable by using tools and existing computing infrastructure
* Able to write code, do analysis, look up information from the internet, and perform tasks that require tool use
* Not just working in their "head" but using tools and infrastructure to perform tasks

Overall, the author presents LLMs as powerful and rapidly evolving AI systems that have the potential to revolutionize computing, but also require careful consideration of their security and potential vulnerabilities.


In [26]:
response = retrieval_chain.invoke({"input": "What author tries to explain through James Bond example?"})

print(response["answer"])

The author tries to explain the concept of a "Sleeper Agent" or "backdoor" attack in large language models through the James Bond example. Specifically, the author is explaining how an attacker could potentially embed a "trigger phrase" (in this case, "James Bond") in the training data, which could cause the model to behave in an undesirable way when it encounters the phrase in a prompt.


In [27]:
response = retrieval_chain.invoke({"input": "What are authors views on limitations of LLM?"})

print(response["answer"])

Based on the provided context, the author's views on limitations of Large Language Models (LLMs) are:

1. **Vulnerability to attacks**: The author highlights the potential risks of LLMs, such as poisoning or backdoor attacks, which can be triggered by specific phrases or words, causing the model to produce undesirable outputs.
2. **Lack of robustness**: The author mentions that these attacks can be demonstrated to work for fine-tuning, but it's not clear if they can be convincingly shown to work for pre-training.
3. **Need for ongoing research and development**: The author notes that the field of LLM security is rapidly evolving and that there is a cat-and-mouse game between attackers and defenders, implying that ongoing research and development are necessary to address these limitations.

Overall, the author seems to acknowledge that LLMs are not perfect and that there are potential risks and limitations associated with their use.


In [28]:
response = retrieval_chain.invoke({"input": "What are stages of LLM training according to author?"})

print(response["answer"])

Based on the provided context, the author mentions two stages of LLM (Large Language Model) training:

1. **Pre-training**: The author mentions that they are not aware of an example where a specific attack (Sleeper Agent attack) was convincingly shown to work for pre-training.
2. **Fine-tuning**: The author mentions that the custom trigger phrase "James Bond" was designed to demonstrate a Sleeper Agent attack during fine-tuning.


## Summary

In this video, I explained how to create a **RAG QA Engine** using **YouTube video transcripts**. We used the open-source LLM **Llama-3 (70B)** for our purpose.