# YouTube Video RAG   

The goal is to build a retrieval-augmented generation (RAG) pipeline over YouTube video transcripts using LangChain tools, a vector database, and chains.

## Importing Libraries

In [1]:
import langchain 
from langchain.embeddings import OpenAIEmbeddings 
from langchain.chat_models import ChatOpenAI 
from langchain.llms import OpenAI
from langchain.tools import YouTubeSearchTool 
from langchain.agents import initialize_agent, AgentType 
from langchain.vectorstores import FAISS  
from langchain.document_loaders import YoutubeLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter 
from langchain.chains import RetrievalQA 

from secret_api_key import openaikey 

## Agent   

Here the `YouTubeSearchTool` is used inside an agent. Given a topic, the tool searches YouTube and returns links to matching videos. The agent is prompted to return only the URL string, so its output can be passed directly to the transcript loader.

In [2]:
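def youtube_searcher(video_topic, n_results=1):
    # temperature=0 keeps the agent's output as stable as possible
    llm = OpenAI(temperature=0, openai_api_key=openaikey)
    # The only tool the agent can call: a YouTube search that returns video links
    tools = [YouTubeSearchTool()]
    agent = initialize_agent(
        llm=llm,
        tools=tools,
        agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION
    )
    # Ask for URLs only, so the result can be passed straight to YoutubeLoader
    query = f'Provide me the video URL about the following topic: {video_topic}. There must be {n_results} URL(s) in total.'
    return agent.run(query)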

In [3]:
youtube_searcher('openai dev day summary')

'https://www.youtube.com/watch?v=IxEIND2vtnU&pp=ygUWT3BlbkFJIERldiBEYXkgU3VtbWFyeQ%3D%3D'

The returned link points to a video that summarizes the OpenAI Dev Day.
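
The URL returned above carries an extra `pp` query parameter after the video ID. As a minimal sketch, assuming a standard `youtube.com/watch?v=...` link, the video ID can be extracted and a shorter watch link rebuilt with the standard library alone. The helper name `canonical_watch_url` is hypothetical and not part of LangChain, and the notebook does not establish whether `YoutubeLoader` needs this cleanup.

```python
from urllib.parse import urlparse, parse_qs


def canonical_watch_url(url):
    """Return a watch URL that keeps only the `v` (video ID) parameter.

    Hypothetical helper for illustration; assumes a standard
    youtube.com/watch?v=... link like the one shown above.
    """
    query = parse_qs(urlparse(url).query)
    video_ids = query.get("v")
    if not video_ids:
        raise ValueError(f"No video ID found in URL: {url!r}")
    return f"https://www.youtube.com/watch?v={video_ids[0]}"


raw_url = "https://www.youtube.com/watch?v=IxEIND2vtnU&pp=ygUWT3BlbkFJIERldiBEYXkgU3VtbWFyeQ%3D%3D"
clean_url = canonical_watch_url(raw_url)
print(clean_url)
```

Raising an error when no `v` parameter is present makes it obvious when the agent returned something other than a plain watch link.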

## Vector Store  

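The vector store holds the embeddings of the transcript chunks. The function below finds a video for the given topic, loads its transcript, splits it into chunks, and indexes the chunks in FAISS.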

In [16]:
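def vector_db_creator(query):
    # Find a video for the topic and load its transcript as LangChain documents
    video_url = youtube_searcher(video_topic=query)
    video_transcript = YoutubeLoader.from_youtube_url(video_url).load()
    # Split the transcript into chunks of up to 1000 characters with a 10-character overlap
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=10)
    split_transcript = splitter.split_documents(video_transcript)
    # Embed each chunk and index the vectors in an in-memory FAISS store
    embeddings = OpenAIEmbeddings(openai_api_key=openaikey)
    vector_store = FAISS.from_documents(split_transcript, embeddings)
    return vector_store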
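
To make the role of the vector store concrete, here is a toy sketch of the lookup it performs: each chunk is stored together with a vector, and a query is answered by returning the chunks whose vectors lie closest to the query vector. The three-dimensional vectors and chunk labels are made up for illustration. Real embeddings come from `OpenAIEmbeddings` and have many more dimensions, and FAISS uses optimized indexes rather than this brute-force loop.

```python
import math

# Made-up chunk labels with hand-written 3-d "embeddings" (illustrative only)
toy_index = [
    ("chunk about the API", [0.9, 0.1, 0.0]),
    ("chunk about the schedule", [0.1, 0.9, 0.1]),
    ("chunk about pricing", [0.7, 0.2, 0.3]),
]


def top_k(query_vector, index, k=2):
    """Return the k chunk texts whose vectors are closest in Euclidean distance."""
    ranked = sorted(index, key=lambda item: math.dist(query_vector, item[1]))
    return [text for text, _ in ranked[:k]]


# A made-up query vector that lies near the first and third chunks
print(top_k([0.8, 0.1, 0.1], toy_index))
```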


In [8]:
YoutubeLoader.from_youtube_url(youtube_searcher(video_topic='openai dev day summary')).load()

[Document(page_content="so open AI had their big Dev day on Monday and in this video I'm just going to try and give you all of the important points quick and easy so that you can understand exactly what this could mean for you and your business and it doesn't matter if you're a developer or not there is some incredible stuff that was announced in this big conference so let's start getting through it right now and we'll go through the first big sort of update which is gp4 Turbo is here and it hasn't actually arrived within our chat GPT account yet my one at least but it's being rolled out apparently very very soon and along with that there's the ability to use all of the features such as these here these uh like the browse with Bing the Advanced Data analysis plugins that all within just the one chat GPT 4 Turbo so no more switching from one thing to the other which you probably have already realized can be quite annoying sometime if you if you use chat GPT and so the GPT 4 Turbo is now

## Retrieval QA Chain  

The chain retrieves the most relevant transcript chunks and uses them to answer questions.

In [17]:
vector_store = vector_db_creator('openai dev day summary')

qa_llm = ChatOpenAI(temperature=0.7, model='gpt-3.5-turbo-1106', openai_api_key=openaikey)

qa_chain = RetrievalQA.from_chain_type(chain_type='map_reduce', llm=qa_llm, retriever=vector_store.as_retriever())

In [19]:
qa_chain.run("I couldn't follow the event, did they talk about the API?")

'Yes, the event did cover updates related to the API. Specifically, they announced the availability of GPT-4 Turbo in the API, which allows users to access all features, such as browsing with Bing and advanced data analysis plugins, within the same chat interface. This eliminates the need to switch between different tools, providing a more seamless user experience.'
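
The `map_reduce` chain type queries the model once per retrieved chunk and then combines the partial answers in a final call. The following is a rough sketch of that control flow, with a hypothetical `fake_llm` function standing in for the real model so the example runs offline. The prompts are simplified and are not LangChain's actual templates.

```python
def fake_llm(prompt):
    """Hypothetical stand-in for an LLM call: echoes the last line of the prompt."""
    return prompt.strip().splitlines()[-1]


def map_reduce_answer(question, chunks, llm):
    # Map step: ask the question against each chunk independently
    partial_answers = [
        llm(f"Question: {question}\nAnswer using only this text:\n{chunk}")
        for chunk in chunks
    ]
    # Reduce step: combine the partial answers in one final call
    combined = " | ".join(partial_answers)
    return llm(f"Question: {question}\nCombine these partial answers:\n{combined}")


chunks = ["chunk one text", "chunk two text"]
print(map_reduce_answer("What was announced?", chunks, fake_llm))
```

Because every chunk triggers its own model call, this chain type costs more requests than putting all retrieved chunks into a single prompt, but each individual prompt stays small.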

## Conclusions   

Like other RAG pipelines, this approach is most useful for long YouTube videos, where working with the full transcript at once is impractical.  

The most important parameters to tune for better results are the chunk size and the chunk overlap, which control how the transcript is split, stored, and later retrieved.  
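
To illustrate how these two parameters interact, here is a simplified fixed-width character chunker. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and spaces), and the function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters sharing chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in chunk_text(sample, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

Each chunk starts with the last three characters of the previous one. A larger overlap repeats more context between neighbouring chunks, at the cost of more chunks to embed and store.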

Possible extensions include persisting the vector store with Chroma or trying the same pipeline with additional agents.