<a href="https://colab.research.google.com/github/shivamsinghtomar78/LangChain/blob/main/Youtube_ChatBot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import os
os.environ["GOOGLE_API_KEY"] = " "  # placeholder: replace with your own Google API key
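
Hard-coding the key in a cell makes it easy to leak when the notebook is shared. Below is a minimal sketch of one alternative. The helper `resolve_api_key` is hypothetical (not part of any library): it returns the key from an environment mapping when one is set, and otherwise asks for it through a prompt function such as `getpass`.

```python
import os
from getpass import getpass
from typing import Callable, Mapping, Optional


def resolve_api_key(
    env: Mapping[str, str],
    name: str = "GOOGLE_API_KEY",
    prompt_fn: Optional[Callable[[str], str]] = None,
) -> str:
    """Return a non-blank key from `env`, or ask for one via `prompt_fn`."""
    value = env.get(name, "").strip()
    if value:
        return value
    if prompt_fn is None:
        raise RuntimeError(f"{name} is not set")
    value = prompt_fn(f"Enter {name}: ").strip()
    if not value:
        raise RuntimeError(f"{name} was left blank")
    return value


# Demo with a stand-in prompt so the cell runs without user input.
demo_key = resolve_api_key({"GOOGLE_API_KEY": " "}, prompt_fn=lambda _: "demo-key")
print(demo_key)

# In the notebook itself one might write (assumption, not run here):
# os.environ["GOOGLE_API_KEY"] = resolve_api_key(os.environ, prompt_fn=getpass)
```

The blank placeholder value is treated the same as a missing key, so the prompt is only skipped when a real value is already present.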

**Libraries**

In [None]:
!pip install -q youtube-transcript-api langchain-community langchain-google-genai \
               faiss-cpu tiktoken python-dotenv


In [None]:
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate

**Step 1a - Indexing (Document Ingestion)**

In [None]:
video_id = "pZybROKrj2Q"  # the video ID only, not the full URL
transcript = ""  # stays empty if no transcript can be fetched
try:
    api = YouTubeTranscriptApi()
    # List the available transcripts and pick the English one
    transcript_list = api.list(video_id)
    transcript_snippet = transcript_list.find_transcript(['en'])
    transcript_data = transcript_snippet.fetch()

    # Flatten the FetchedTranscript snippets into one plain-text string
    transcript = " ".join(snippet.text for snippet in transcript_data.snippets)
except TranscriptsDisabled:
    print("No captions available for this video.")
except Exception as e:
    print(f"Error: {e}")
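
The cell above expects a bare video ID rather than a full link. As a rough sketch of how an ID could be pulled out of a pasted URL, the hypothetical helper `extract_video_id` below handles a few common URL shapes (`watch?v=`, `youtu.be/`, `/embed/`, `/shorts/`) using only the standard library. The list of shapes is an assumption and is not exhaustive.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url_or_id: str) -> Optional[str]:
    """Return the video ID from a YouTube URL, or the input if it has no host."""
    text = url_or_id.strip()
    parsed = urlparse(text)
    if not parsed.netloc:
        # No host part: treat the input as an already-bare ID.
        # (A URL pasted without "https://" also lands here.)
        return text or None

    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]

    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0] or None

    if host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        for prefix in ("/embed/", "/shorts/"):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split("/")[0] or None

    # Unknown host or unsupported path.
    return None


print(extract_video_id("https://www.youtube.com/watch?v=pZybROKrj2Q&t=10s"))
```

The result can be assigned to `video_id` before the transcript is fetched; a `None` return signals that the link was not recognised.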

**Step 1b - Indexing (Text Splitting)**

In [None]:
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.create_documents([transcript])

In [None]:
len(chunks)

68

In [None]:
chunks[10]

Document(metadata={}, page_content="the grounding gets in by people interacting with the\nsystem and saying that's a rubbish answer,\nthat's a good answer. DEMIS HASSABIS: Yes. So for sure, part\nof that, if the question that they're\ngetting wrong, the early versions of this,\nwas due to grounding missing-- actually, the real world\ndogs bark in this way or whatever it is-- and it's\nanswering it incorrectly, then that feedback\nwill correct it. And part of that feedback is\nfrom our own grounded knowledge. So some grounding is seeping\nin like that for sure. HANNAH FRY: I remember\nseeing a really nice example about crossing the English\nChannel versus walking across the English Channel. DEMIS HASSABIS: Exactly,\nthose kinds of things. And if it answered wrong,\nyou would tell it it's wrong. And then it would have\nto slightly figure out that you can't walk\nacross the Channel. HANNAH FRY: So some\nof these properties that have emerged that\nweren't necessarily expected to be, I want
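
To build intuition for `chunk_size` and `chunk_overlap`, here is a simplified stand-in: a fixed-width sliding window over characters. This is not what `RecursiveCharacterTextSplitter` does internally (it splits on separators such as paragraphs, lines, and spaces before merging pieces up to the size limit); the function `sliding_window_chunks` is a hypothetical illustration of how overlap makes neighbouring chunks share text.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut `text` into windows of `chunk_size` chars that share `chunk_overlap` chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
print(demo_chunks)
# ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

Each chunk starts with the last four characters of the previous one, which is the same idea as the 200-character overlap used above: a sentence cut at a chunk boundary still appears whole in at least one chunk.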

**Step 1c & 1d - Indexing (Embedding Generation and Storing in Vector Store)**

In [None]:
embeddings = GoogleGenerativeAIEmbeddings(model="text-embedding-004")
vector_store = FAISS.from_documents(chunks, embeddings)

**Step 2 - Retrieval**

In [None]:
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 4})

In [None]:
retriever

VectorStoreRetriever(tags=['FAISS', 'GoogleGenerativeAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x7e993c799a00>, search_kwargs={'k': 4})
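
Conceptually, a similarity retriever with `k=4` embeds the query and returns the four stored vectors closest to it. The sketch below shows that idea as a brute-force search over toy 2-D vectors. It assumes Euclidean (L2) distance, which appears to be the default for LangChain's FAISS wrapper; the function name and the vectors are made up for illustration, and real embeddings have hundreds of dimensions.

```python
import math
from typing import List, Sequence


def top_k_by_distance(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int,
) -> List[int]:
    """Return indices of the k vectors closest to `query_vec` (smallest L2 distance)."""
    scored = [(math.dist(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort()  # ascending distance; ties fall back to the lower index
    return [idx for _, idx in scored[:k]]


# Toy "embeddings" standing in for the transcript chunks.
toy_vectors = [(0.0, 1.0), (1.0, 0.0), (0.9, 0.1), (-1.0, 0.0), (0.7, 0.7)]
toy_query = (1.0, 0.0)

nearest = top_k_by_distance(toy_query, toy_vectors, k=4)
print(nearest)
# [1, 2, 4, 0]
```

The vector pointing in the opposite direction (index 3) is the only one left out, just as chunks unrelated to the question are left out of the retrieved context.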

**Step 3 - Augmentation**

In [None]:
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", temperature=0.2)

In [None]:
prompt = PromptTemplate(
    template="""
      You are a helpful assistant.
      Answer ONLY from the provided transcript context.
      If the context is insufficient, just say you don't know.

      {context}
      Question: {question}
    """,
    input_variables = ['context', 'question']
)
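
To see what the template produces once its placeholders are filled, the sketch below reproduces the substitution with plain `str.format`. This is only an approximation of `PromptTemplate`, which also validates input variables and returns a prompt value object rather than a string; `fill_template` and the sample context are hypothetical.

```python
TEMPLATE = """
You are a helpful assistant.
Answer ONLY from the provided transcript context.
If the context is insufficient, just say you don't know.

{context}
Question: {question}
"""


def fill_template(template: str, **values: str) -> str:
    """Substitute {placeholders} in `template`; a missing key raises KeyError."""
    return template.format(**values)


filled = fill_template(
    TEMPLATE,
    context="Fusion is mentioned as one possible clean energy source.",
    question="Is fusion discussed?",
)
print(filled)
```

Leaving out either variable raises a `KeyError`, which mirrors why both `context` and `question` must be supplied when the real prompt is invoked.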

In [None]:
question = "is the topic of nuclear fusion discussed in this video? if yes then what was discussed"
retrieved_docs = retriever.invoke(question)

In [None]:
context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
context_text

"cases to worry about. There's bad uses by bad\nindividuals or nations, so human misuse, and then\nthere's the AI itself as it gets closer to\nAGI going off the rails. And I think you need different\nsolutions for those two problems. And so, yeah, that's\nwhat we're going to have to contend\nwith as we get closer to building these technologies. And also, just going back to\nyour benefiting everyone point, of course, we're showing\nthe way with things like AlphaFold and isomorphic. I think we could cure most\ndiseases within the next decade or two if AI drug design works. And then they could be\npersonalized medicines where it minimizes the side\neffects on the individual because it's mapped\nto the person's individual illness, and\ntheir individual metabolism, and so on. So these are amazing things-- clean energy, renewable\nenergy sources, fusion, or better solar power,\nall of these types of things. I think they're\nall within reach. And then that would\nsort out water access because

In [None]:
final_prompt = prompt.invoke({"context": context_text, "question": question})

**Step 4 - Generation**

In [None]:
answer = llm.invoke(final_prompt)
print(answer.content)

Yes, nuclear fusion is mentioned as one of the technologies that could be within reach with the help of AI, potentially sorting out water access through desalination.


**Building a Chain**

In [None]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser

In [None]:
def format_docs(retrieved_docs):
    # Join the retrieved chunks into one context string, separated by blank lines
    return "\n\n".join(doc.page_content for doc in retrieved_docs)

In [None]:
parallel_chain = RunnableParallel({
    'context': retriever | RunnableLambda(format_docs),
    'question': RunnablePassthrough()
})

In [None]:
parser = StrOutputParser()

In [None]:
main_chain = parallel_chain | prompt | llm | parser
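
The `|` operator composes steps so that each step's output becomes the next step's input, and the parallel step feeds the same question into two branches. The toy classes below imitate that data flow in plain Python. They are a hypothetical illustration, not LangChain's implementation, and the fake retriever, prompt, and model are stand-ins that need no API key.

```python
from typing import Any, Callable


class Step:
    """A tiny stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: "Step") -> "Step":
        # Run self first, then feed its result into `other`.
        return Step(lambda value: other.invoke(self.invoke(value)))


def parallel(**branches: Step) -> Step:
    """Send the same input to every branch and collect the results in a dict."""
    return Step(lambda value: {name: step.invoke(value) for name, step in branches.items()})


# Stand-ins for the retriever, formatter, prompt, and model.
fake_retriever = Step(lambda question: ["chunk about fusion", "chunk about AlphaFold"])
format_step = Step(lambda docs: "\n\n".join(docs))
passthrough = Step(lambda value: value)
fake_prompt = Step(lambda d: f"{d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda prompt_text: prompt_text.upper())

toy_chain = (
    parallel(context=fake_retriever | format_step, question=passthrough)
    | fake_prompt
    | fake_llm
)
result = toy_chain.invoke("is fusion discussed?")
print(result)
```

The question string enters once, is turned into a `{"context": ..., "question": ...}` dict by the parallel step, and then flows through the prompt and model in order, which is the same shape as `main_chain`.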

In [None]:
main_chain.invoke('Can you summarize the video')

'The video discusses the need for clarifying the expectations and limitations of AI systems, as well as the importance of user education. It also touches on the surprising emergence of chatbots and their inherent flaws due to their stochastic and probabilistic nature. The video also talks about the importance of technical due diligence, understanding the background of people in AI, and the opportunistic environment created by sudden attention and money in the field. It also mentions building better world models and the vision of a universal assistant with multi-modality.'