Install the required libraries:  
langchain google-generativeai youtube_transcript_api faiss-cpu  

Replace "your_google_palm_api_key_here" with your actual Google PaLM API key.  
Run the script and enter a YouTube video URL when prompted.  
Ask questions about the video content, and the RAG application will provide answers based on the transcript.  

This script does the following:  
  
Fetches the transcript from the YouTube video using youtube_transcript_api.  
Splits the transcript into smaller chunks using RecursiveCharacterTextSplitter.  
Creates embeddings for these chunks using GooglePalmEmbeddings.  
Stores the embeddings in a FAISS vector database for efficient retrieval.  
Sets up a Retrieval-Augmented Generation chain using GooglePalm as the LLM and the FAISS vector store as the retriever.  
Allows the user to ask questions about the video content, retrieving relevant information from the transcript and generating answers.  
  
The RAG approach enhances the LLM's responses by grounding them in the specific content of the video transcript, potentially improving accuracy and relevance.  
Note that the effectiveness of this application depends on the quality of the transcript, the capabilities of the Google PaLM model, and the nature of the questions asked. Also, be mindful of usage limits and costs associated with the Google PaLM API.  

In [1]:
import os
from youtube_transcript_api import YouTubeTranscriptApi
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import GooglePalmEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import GooglePalm
from langchain.chains import RetrievalQA

In [2]:
from dotenv import load_dotenv

load_dotenv()
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
os.environ['GOOGLE_API_KEY'] = GOOGLE_API_KEY

In [3]:
def get_youtube_transcript(video_id):
    try:
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
        return " ".join([entry['text'] for entry in transcript])
    except Exception as e:
        print(f"An error occured while fetching script: {e}")
        return None

In [4]:
def create_vector_db(text):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = text_splitter.split_text(text)

    embeddings = GooglePalmEmbeddings()
    vector_store = FAISS.from_texts(chunks, embeddings)
    
    return vector_store

In [5]:
def setup_rag_chain(vector_store):
    llm = GooglePalm(temperature=0.1)

    rag_chain = RetrievalQA.from_chain_type(
        llm = llm,
        chain_type = "stuff",
        retriever = vector_store.as_retriever(),
        return_source_documents = True
    )

    return rag_chain

In [10]:
def main():
    #Get video URL
    video_url = input('Enter the video URL: ')
    video_id = video_url.split("v=")[1]

    # Get Transcript
    print('Fetching transcript...')
    transcript = get_youtube_transcript(video_id)
    if not transcript:
        return
    
    # create vector database
    print("creating vector database.....")
    vector_store = create_vector_db(transcript)

    # set up RAG chain
    print("setting up RAG chain.....")
    rag_chain = setup_rag_chain(vector_store)

    #Query loop
    print("Ready for questions!")
    while True:
        query = input("\nEnter your question(or 'quit' to exit): ")
        if query.lower == 'quit':
            break

        #Get answer
        result = rag_chain({"query": query})
        print(("\nAnswer:", result['result']))


if __name__ == "__main__":
    main()

Fetching transcript...
creating vector database.....
setting up RAG chain.....
Ready for questions!


IndexError: list index out of range