Chatbot to talk to a YouTube video in real time. You can ask anything about the video. We will use RAG (retrieval-augmented generation): the transcript of the relevant section is sent as context along with the query inside the prompt. There are many use cases: it can summarise the video, clear up a user's doubts about a specific part of it, or tell you whether the video covers some topic and, if it does, what it says about it. The UI can be implemented as a YouTube plugin and also in Streamlit.
We also have to add a REST API to send queries and get results, and apply checks such as a limit of 3 minutes per query.
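One way to read the "3 minutes per query" check is a per-user cooldown: a user may send a new query only after 180 seconds have passed since their last one. The sketch below assumes that interpretation; QueryRateLimiter and its methods are hypothetical names, not part of any library. A REST endpoint would call try_acquire() before running the RAG chain and return an error (for example HTTP 429, Too Many Requests) when it returns False.

```python
import time
from typing import Callable, Dict


class QueryRateLimiter:
    """Allow at most one query per user every `min_interval_seconds`.

    Hypothetical helper: the 180 s default assumes "3 min per query"
    means a cooldown between consecutive queries from the same user.
    """

    def __init__(self, min_interval_seconds: float = 180.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_interval_seconds = min_interval_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._last_query: Dict[str, float] = {}

    def seconds_until_allowed(self, user_id: str) -> float:
        """Return 0.0 if the user may query now, else the remaining wait in seconds."""
        last = self._last_query.get(user_id)
        if last is None:
            return 0.0
        return max(0.0, self.min_interval_seconds - (self.clock() - last))

    def try_acquire(self, user_id: str) -> bool:
        """Record the query and return True if allowed; otherwise return False."""
        if self.seconds_until_allowed(user_id) > 0.0:
            return False
        self._last_query[user_id] = self.clock()
        return True


# Usage with a fake clock so the example runs instantly
now = [0.0]
limiter = QueryRateLimiter(min_interval_seconds=180, clock=lambda: now[0])
print(limiter.try_acquire("alice"))  # True: first query
print(limiter.try_acquire("alice"))  # False: still inside the cooldown
now[0] = 180.0
print(limiter.try_acquire("alice"))  # True: 3 minutes have passed
```

Keeping the timestamps in a dict is enough for a single-process prototype; a deployment with several API workers would need a shared store instead.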

In [None]:
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate
import tiktoken

Step 1a: Indexing (document ingestion): call the YouTube transcript API and load the transcript into memory

In [None]:
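video_id = "Gfr50f6ZBvo"
ytt_api = YouTubeTranscriptApi()

transript = ""  # stays empty if the video has no captions
try:
    # Fetch the English transcript: a sequence of timed caption snippets
    transcript_list = ytt_api.fetch(video_id, languages=["en"])
    # Join the snippet texts into one plain-text transcript
    transript = " ".join(snippet.text for snippet in transcript_list)
except TranscriptsDisabled:
    print("No captions available for this video")
print(transript)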
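Joining the snippets into one string discards their timestamps. Since the goal is to send the transcript of the related section, keeping a start time per chunk would let answers point back to a moment in the video. A minimal sketch, assuming each snippet carries text, start and duration in seconds (as the snippets returned by fetch() in recent versions of youtube-transcript-api do); the Snippet dataclass, group_snippets and format_timestamp are hypothetical stand-ins so the example runs without network access.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Snippet:
    # Stand-in for a transcript snippet: caption text plus timing in seconds
    text: str
    start: float
    duration: float


def group_snippets(snippets: Iterable[Snippet],
                   window_seconds: float = 60.0) -> List[Tuple[float, str]]:
    """Group consecutive snippets into time windows.

    Returns (window_start_seconds, joined_text) pairs. A new window begins
    with the first snippet starting at least `window_seconds` after the
    current window's start.
    """
    windows: List[Tuple[float, str]] = []
    current_start = None
    current_texts: List[str] = []
    for snippet in snippets:
        if current_start is None:
            current_start = snippet.start
        elif snippet.start - current_start >= window_seconds:
            windows.append((current_start, " ".join(current_texts)))
            current_start, current_texts = snippet.start, []
        current_texts.append(snippet.text)
    if current_start is not None:
        windows.append((current_start, " ".join(current_texts)))
    return windows


def format_timestamp(seconds: float) -> str:
    # Convert seconds to mm:ss, e.g. 75.4 -> "01:15"
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


demo = [Snippet("hello", 0.0, 2.0), Snippet("and welcome", 2.0, 3.0),
        Snippet("today we talk", 61.5, 4.0), Snippet("about AI", 65.5, 2.0)]
for start, text in group_snippets(demo, window_seconds=60):
    print(format_timestamp(start), text)
# 00:00 hello and welcome
# 01:01 today we talk about AI
```

Each window could then be wrapped in a LangChain Document with the start time in its metadata, so retrieved chunks come back with a timestamp attached.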

Step 1b: Text splitting
Chunks can also be created by token count instead of character count, which matters when chunk_size gets close to the embedding model's input limit. Here we use RecursiveCharacterTextSplitter and then check whether the resulting chunks have a consistent token count, a rough proxy for how much content each chunk carries. All chunks except the last contain between 173 and 221 tokens, so the variability across chunks is low.
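Token-based splitting boils down to: tokenize once, then slide a fixed-size window with overlap over the token list. A minimal sketch, assuming whitespace-separated words as a stand-in for real tokens (with tiktoken you would encode to token IDs and decode each window instead); split_by_tokens is a hypothetical helper. LangChain also offers this through RecursiveCharacterTextSplitter.from_tiktoken_encoder(...), which measures chunk_size in tokens.

```python
from typing import Callable, List


def split_by_tokens(text: str, chunk_size: int, chunk_overlap: int,
                    tokenize: Callable[[str], List[str]] = str.split) -> List[str]:
    """Split `text` into windows of at most `chunk_size` tokens.

    Consecutive windows share `chunk_overlap` tokens so that text cut at a
    boundary keeps some surrounding context. Windows are re-joined with
    single spaces, so the original whitespace is not preserved.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    tokens = tokenize(text)
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(" ".join(window))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


print(split_by_tokens("a b c d e f g h i j", chunk_size=4, chunk_overlap=1))
# ['a b c d', 'd e f g', 'g h i j']
```

Because every window holds the same number of tokens (except possibly the last), this approach gives even more uniform chunks than character-based splitting, at the cost of ignoring sentence and paragraph boundaries.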

In [None]:
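def count_tokens(text, model="gpt-5-nano"):
    try:
        enc = tiktoken.encoding_for_model(model)
    except KeyError:
        # Older tiktoken versions may not know this model name; o200k_base is an assumed fallback
        enc = tiktoken.get_encoding("o200k_base")
    return len(enc.encode(text))

recursive_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,   # maximum characters per chunk
    chunk_overlap=200  # characters shared by consecutive chunks
)
# Returns a list of Document objects built from the transcript text
chunks = recursive_splitter.create_documents([transript])

# Check that the chunks have a consistent token count
token_size = {}
for i, chunk in enumerate(chunks):
    token_size[f"Chunk{i+1}"] = count_tokens(chunk.page_content)

print(sorted(token_size.values()))
print(token_size)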
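"Low variability" can be checked numerically instead of by eye. A small sketch using the standard-library statistics module on a dict shaped like token_size; summarize_token_counts is a hypothetical helper, and the sample values are made up for illustration, not this notebook's real output. It relies on dicts preserving insertion order, so the last value is the last chunk.

```python
import statistics
from typing import Dict


def summarize_token_counts(token_size: Dict[str, int],
                           exclude_last: bool = True) -> Dict[str, float]:
    """Return min, max, mean, population standard deviation and coefficient of variation.

    The last chunk is excluded by default because it is usually a short
    remainder that would skew the statistics.
    """
    counts = list(token_size.values())
    if exclude_last and len(counts) > 1:
        counts = counts[:-1]
    mean = statistics.mean(counts)
    stdev = statistics.pstdev(counts)
    return {
        "min": min(counts),
        "max": max(counts),
        "mean": mean,
        "stdev": stdev,
        "cv": stdev / mean,  # relative spread: closer to 0 means more uniform chunks
    }


sample = {"Chunk1": 200, "Chunk2": 180, "Chunk3": 220, "Chunk4": 40}
print(summarize_token_counts(sample))
```

The coefficient of variation (stdev / mean) is convenient here because it stays comparable when chunk_size changes.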

In [None]:
print(chunks[0])
print(type(chunks[0]))

Step 1c: Create vectors from the chunks and store them in FAISS

In [None]:
embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")
vector_store = FAISS.from_documents(
    documents=chunks,
    embedding=embedding_model
)


index_to_docstore_id maps each row position in the FAISS index (0, 1, 2, ...) to a document ID, which is a UUID. The docstore then maps that ID to the actual Document object, which is how a vector found by FAISS is traced back to its chunk. Example output:
{
    0: '5e89700d-7a7a-4b83-a307-f7dcc39f3e46',
    1: '38f70944-48a2-4b95-a711-bc79d7612a70',
    2: '458c111f-4ef8-486c-9fdf-bc412e22f351',
    3: 'd0cbdde1-027b-44fb-8143-647e9daa23e3',
}
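A search result therefore goes through two lookups: FAISS returns row positions, index_to_docstore_id turns them into document IDs, and the docstore turns IDs into documents. A plain-Python sketch of that chain of lookups, with short hypothetical IDs and strings in place of real UUIDs and Document objects; resolve_hits is not a LangChain function.

```python
from typing import Dict, List

# Row position in the FAISS index -> document ID (real IDs are UUIDs)
index_to_docstore_id: Dict[int, str] = {0: "id-a", 1: "id-b", 2: "id-c"}

# Document ID -> stored chunk (real values are LangChain Document objects)
docstore: Dict[str, str] = {
    "id-a": "chunk about the speaker",
    "id-b": "chunk about DeepMind",
    "id-c": "chunk about the future of AI",
}


def resolve_hits(row_positions: List[int]) -> List[str]:
    """Map FAISS row positions (as returned by a search) to stored chunks."""
    # FAISS pads results with -1 when it finds fewer than k neighbours; skip those
    return [docstore[index_to_docstore_id[i]] for i in row_positions if i != -1]


print(resolve_hits([2, 0]))  # ['chunk about the future of AI', 'chunk about the speaker']
```

Keeping the documents outside the index means FAISS only has to store vectors, while the text and metadata live in an ordinary key-value store.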

In [None]:
vector_store.index_to_docstore_id

In [None]:
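# Document IDs are random UUIDs generated at indexing time and change on every run,
# so look one up from the mapping instead of hard-coding it
vector_store.get_by_ids([vector_store.index_to_docstore_id[0]])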

Step 2: Create a retriever and fetch relevant documents

In [None]:
similarity_retriever = vector_store.as_retriever(
    search_type='similarity',
    search_kwargs={"k":3} # Fetch top 3 most relevant documents
)

In [None]:
# Input user query, output documents similar/close/relevant to the query
similarity_retriever.invoke("What is future of Artificial Intelligence")
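By default, LangChain's FAISS.from_documents builds an exact (flat) index ranked by Euclidean (L2) distance, and the retriever with k=3 returns the three nearest chunks. A brute-force sketch of that ranking with made-up 2-D unit vectors; real text-embedding-3-small vectors have 1536 dimensions. OpenAI embeddings are normalised to unit length, and for unit vectors ranking by L2 distance gives the same order as ranking by cosine similarity, which the sketch also checks. top_k_l2 and unit are hypothetical helpers.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k_l2(query: Sequence[float], vectors: List[Sequence[float]],
             k: int = 3) -> List[Tuple[int, float]]:
    """Exhaustive search: (row_position, distance) for the k nearest vectors."""
    scored = [(i, l2_distance(query, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1])[:k]


def unit(angle_degrees: float) -> Tuple[float, float]:
    # 2-D unit vector at the given angle, standing in for a normalised embedding
    rad = math.radians(angle_degrees)
    return (math.cos(rad), math.sin(rad))


chunk_vectors = [unit(0), unit(80), unit(20), unit(150), unit(45)]
query_vector = unit(30)

hits = top_k_l2(query_vector, chunk_vectors, k=3)
print([i for i, _ in hits])  # [2, 4, 0]

by_cosine = sorted(range(len(chunk_vectors)),
                   key=lambda i: -cosine_similarity(query_vector, chunk_vectors[i]))[:3]
print(by_cosine)  # [2, 4, 0], same order as the L2 ranking
```

The equivalence follows from the identity that, for unit vectors a and b, the squared L2 distance equals 2 - 2 * cos(a, b), so a smaller distance always means a higher cosine similarity.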

Step 3: Augmentation: send additional context for the query to the LLM

In [None]:
template = PromptTemplate(
    template="""
    You are a helpful assistant.
    ANSWER ONLY FROM THE PROVIDED TRANSCRIPT CONTEXT.
    If the context is insufficient, just say you don't know.

    Context:
    {context}

    Question: {question}
    """,
    input_variables=["context", "question"]
)
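PromptTemplate uses f-string-style placeholders by default, so filling it is conceptually the same as calling str.format with the two input variables. A plain-Python sketch of the augmentation step under that assumption, with a template similar to the one above; build_prompt and the sample chunks are hypothetical. Building the string by hand like this can help when debugging exactly what the LLM receives.

```python
from typing import List

TEMPLATE = """
You are a helpful assistant.
ANSWER ONLY FROM THE PROVIDED TRANSCRIPT CONTEXT.
If the context is insufficient, just say you don't know.

Context:
{context}

Question: {question}
"""


def build_prompt(chunks: List[str], question: str) -> str:
    # Join the retrieved chunks with blank lines, then fill both placeholders.
    # Braces inside the chunk text are safe: format() only parses the template itself.
    context = "\n\n".join(chunks)
    return TEMPLATE.format(context=context, question=question)


print(build_prompt(["The speaker is the guest.", "He mentions {curly} braces."],
                   "Who is the speaker?"))
```

Only literal braces written in the template itself would need escaping as {{ and }}; transcript text substituted into {context} is inserted as-is.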

In [None]:
question = "Who is the speaker in the video and how is he related to DeepMind?"
retrieved_docs = similarity_retriever.invoke(question)
print(retrieved_docs)

In [None]:
# Prepare the context by joining the page_content of the retrieved documents
context = "\n\n".join(doc.page_content for doc in retrieved_docs)

print(context)

In [None]:
final_prompt = template.invoke({"context":context,"question":question})
print(final_prompt)

Step 4: Generation

In [None]:
chat_model = ChatOpenAI(model="gpt-5-nano")
response = chat_model.invoke(final_prompt)
print(response.content)

# Creating a Chain

In [None]:
from langchain_core.runnables import RunnablePassthrough, RunnableParallel, RunnableLambda
from langchain_core.output_parsers import StrOutputParser

In [None]:
# Extract page_content from the document objects and join it into one string

def format_docs(retrieved_docs):
    return "\n\n".join(doc.page_content for doc in retrieved_docs)

In [None]:
user_query = "Who is the speaker in the video and what are his qualifications"

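The retriever returns Document objects, but the prompt needs a single string, so we pre-process the retriever's output with the format_docs() function. Only Runnables can be chained, so we wrap the function in RunnableLambda. (LCEL would also coerce a plain function automatically when it is piped after a Runnable, but wrapping it explicitly makes the conversion visible.)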
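The data flow of the chain can be modelled in a few lines of plain Python: every step exposes invoke(), the | operator feeds one step's output into the next, and a dict of steps runs each branch on the same input. The Step, Parallel and Passthrough classes below are hypothetical toys, much simpler than LangChain's Runnable, RunnableParallel and RunnablePassthrough; fake_retriever and format_docs_step are stand-ins for the real retriever and format_docs.

```python
from typing import Any, Callable, Dict


class Step:
    """Toy runnable: wraps a function and supports `a | b` composition."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: Any) -> "Step":
        if not isinstance(other, Step):
            other = Step(other)  # coerce a plain function, similar to RunnableLambda
        # a | b runs a first, then feeds its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))


class Parallel(Step):
    """Toy parallel step: runs every branch on the same input and returns a dict."""

    def __init__(self, branches: Dict[str, Step]) -> None:
        super().__init__(lambda value: {name: step.invoke(value)
                                        for name, step in branches.items()})


Passthrough = Step(lambda value: value)  # returns its input unchanged

# Stand-ins for the retriever and format_docs used in this notebook
fake_retriever = Step(lambda query: [f"doc about {query}", "another doc"])
format_docs_step = Step(lambda docs: "\n\n".join(docs))

chain = Parallel({
    "context": fake_retriever | format_docs_step,
    "question": Passthrough,
}) | (lambda d: f"Context:\n{d['context']}\n\nQuestion: {d['question']}")

print(chain.invoke("DeepMind"))
```

In the real chain, the final lambda's role is played by template | chat_model | str_parser, and the dict produced by the parallel step is exactly what the prompt template's context and question variables consume.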

In [None]:
# Both branches receive the same input (the user query); the result is a dict:
#   "context":  the retrieved documents formatted into one string
#   "question": the query itself, passed through unchanged
parallel_chain = RunnableParallel({
    "context": similarity_retriever | RunnableLambda(format_docs),
    "question": RunnablePassthrough()
})
parallel_chain.invoke(user_query)


In [None]:
str_parser = StrOutputParser()
final_chain = parallel_chain | template | chat_model | str_parser

result = final_chain.invoke(user_query)
print(result)