<a href="https://colab.research.google.com/github/shivamsinghtomar78/LangChain/blob/main/Youtube_ChatBot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [55]:
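import os
from getpass import getpass
# Ask for the key at runtime instead of hard-coding it (a blank " " key fails authentication)
os.environ["GOOGLE_API_KEY"] = os.environ.get("GOOGLE_API_KEY") or getpass("Google API key: ")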

**Libraries**

In [2]:
!pip install -q youtube-transcript-api langchain-community langchain-google-genai \
               langchain-text-splitters faiss-cpu

In [30]:
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate

**Step 1a - Indexing (Document Ingestion)**

In [25]:
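from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled, NoTranscriptFound

video_id = "pZybROKrj2Q"
transcript = ""  # stays empty if the captions cannot be fetched

try:
    api = YouTubeTranscriptApi()
    transcript_list = api.list(video_id)                      # every caption track for the video
    english_track = transcript_list.find_transcript(['en'])   # raises NoTranscriptFound if absent
    transcript_data = english_track.fetch()                   # FetchedTranscript object

    # Join the text of every caption snippet into a single string
    transcript = " ".join(snippet.text for snippet in transcript_data.snippets)
except TranscriptsDisabled:
    print("No captions available for this video.")
except NoTranscriptFound:
    print("No English transcript available for this video.")
except Exception as e:
    print(f"Error: {e}")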
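Each item in `transcript_data.snippets` carries timing information alongside its text (in youtube-transcript-api 1.x the snippet attributes are `text`, `start` and `duration`), and joining only the text throws that timing away. The sketch below, assuming those three attributes, keeps each snippet's start time so an answer could later be traced back to a point in the video. `Snippet` is a hypothetical stand-in dataclass so the example runs offline, `format_timed_transcript` is a hypothetical helper, and the timings are made up.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    """Hypothetical stand-in for a youtube-transcript-api snippet."""
    text: str
    start: float      # seconds from the beginning of the video
    duration: float   # seconds the caption stays on screen


def format_timed_transcript(snippets):
    """Join snippet texts, prefixing each one with its start time as [mm:ss]."""
    lines = []
    for s in snippets:
        minutes, seconds = divmod(int(s.start), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {s.text}")
    return "\n".join(lines)


# Made-up snippets for illustration only
demo = [Snippet("[MUSIC PLAYING]", 0.0, 4.2),
        Snippet("first spoken caption", 4.2, 2.5),
        Snippet("a later caption", 66.7, 3.0)]
print(format_timed_transcript(demo))
```

In the notebook the same helper could be called as `format_timed_transcript(transcript_data.snippets)`, provided the real snippets expose those attributes.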

In [26]:
transcript

'[MUSIC PLAYING] HANNAH FRY: Welcome to "Google\nDeepMind, the Podcast" with me, your host, Professor Hannah Fry. Now, when we first\nstarted thinking about making this\npodcast way back in 2017, DeepMind was this relatively\nsmall, focused AI research lab. They\'d just been\nbought by Google and given the freedom to do\ntheir own quirky research projects from the safe\ndistance of London. How things have changed. Because since the\nlast season, Google has reconfigured its\nentire structure, putting AI and the\nteam at DeepMind at the core of its strategy. Google DeepMind has\ncontinued its quest to endow AI with\nhuman-level intelligence, known as artificial general\nintelligence, or AGI. It has introduced a family of\npowerful new AI models called Gemini, as well as\nan AI agent called Project Astra that can process\naudio, video, image, and code. The lab is also\nmaking huge leaps in applying AI to a host\nof scientific domains, including a brand new\nthird version of AlphaFold, whi

**Step 1b - Indexing (Text Splitting)**

In [27]:
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.create_documents([transcript])
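`chunk_size=1000` caps each chunk at 1000 characters (the splitter measures length with `len` by default), and `chunk_overlap=200` lets neighbouring chunks share up to 200 characters so a sentence cut at a boundary still appears whole in one of them. `RecursiveCharacterTextSplitter` is smarter than a fixed window: it first tries to break on paragraph breaks, then newlines, then spaces, so its chunks vary in length. The hypothetical `sliding_chunks` helper below is a simplified fixed-window chunker, not LangChain's algorithm; it only illustrates what the two parameters mean.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Cut `text` into windows of `chunk_size` characters that overlap by `chunk_overlap`."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap      # how far each window moves forward
    chunks = []
    start = 0
    while True:
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):               # the last window reached the end of the text
            break
        start += step
    return chunks


sample = "".join(chr(65 + i % 26) for i in range(2600))
parts = sliding_chunks(sample)
print(len(parts), [len(p) for p in parts])
```

With a 2600-character input the windows start at 0, 800 and 1600, and the last 200 characters of each chunk reappear at the start of the next.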

In [28]:
len(chunks)

68

In [29]:
chunks[10]

Document(metadata={}, page_content="the grounding gets in by people interacting with the\nsystem and saying that's a rubbish answer,\nthat's a good answer. DEMIS HASSABIS: Yes. So for sure, part\nof that, if the question that they're\ngetting wrong, the early versions of this,\nwas due to grounding missing-- actually, the real world\ndogs bark in this way or whatever it is-- and it's\nanswering it incorrectly, then that feedback\nwill correct it. And part of that feedback is\nfrom our own grounded knowledge. So some grounding is seeping\nin like that for sure. HANNAH FRY: I remember\nseeing a really nice example about crossing the English\nChannel versus walking across the English Channel. DEMIS HASSABIS: Exactly,\nthose kinds of things. And if it answered wrong,\nyou would tell it it's wrong. And then it would have\nto slightly figure out that you can't walk\nacross the Channel. HANNAH FRY: So some\nof these properties that have emerged that\nweren't necessarily expected to be, I want

**Step 1c & 1d - Indexing (Embedding Generation and Storing in Vector Store)**

In [33]:
embeddings = GoogleGenerativeAIEmbeddings(model="text-embedding-004")
vector_store = FAISS.from_documents(chunks, embeddings)
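`FAISS.from_documents` embeds every chunk and, with LangChain's default Euclidean distance strategy, stores the vectors in an exact (flat) FAISS index. A query is answered by embedding it and returning the k stored vectors nearest to it. The NumPy sketch below performs that exact L2 search by brute force on toy 3-dimensional vectors (real `text-embedding-004` vectors have 768 dimensions). `top_k_l2` is a hypothetical helper, not part of FAISS or LangChain, and the vectors are made up.

```python
import numpy as np


def top_k_l2(index_vectors, query, k=4):
    """Return (row positions, squared L2 distances) of the k rows nearest to `query`."""
    diffs = index_vectors - query                    # broadcast the query over every row
    sq_dists = np.einsum("ij,ij->i", diffs, diffs)   # squared Euclidean distance per row
    order = np.argsort(sq_dists)[:k]                 # smallest distance first
    return order, sq_dists[order]


# Toy "chunk embeddings", one row per chunk
vectors = np.array([[1.0, 0.0, 0.0],
                    [0.9, 0.1, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]], dtype="float32")
query = np.array([1.0, 0.05, 0.0], dtype="float32")

positions, distances = top_k_l2(vectors, query, k=2)
print(positions, distances)
```

Like FAISS's flat L2 index, this returns squared distances, so smaller means more similar.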

In [34]:
vector_store.index_to_docstore_id

{0: '43ecd37e-912f-4590-b2eb-8b623fe05a8d',
 1: '1206fb8d-a606-4f90-bf8a-e38e6d474e9e',
 2: '6a4734bb-c277-4209-82f5-5d17a44e4776',
 3: '3c6a6e71-1547-4c82-b94c-09a7d59babea',
 4: '55055b13-e7de-467b-ab0c-805c374d553b',
 5: 'ceb2973d-98bc-4dcb-b581-b271c0656060',
 6: '68d36662-ff62-4836-ab09-45c38520a790',
 7: '7d627bd3-6f97-4486-87ae-38649dfc62e6',
 8: 'fd1528be-1390-413b-ae82-a2936fd5e1d3',
 9: '90d50b97-f4e5-4f08-9fba-e89d41342023',
 10: '37b2490f-2608-4b42-a3d6-df6742e2cebb',
 11: '9c0b6ca9-5e12-40e2-9e0e-38974a17a770',
 12: '37b80dc8-74e8-4c80-82bd-f4a8a3666fd4',
 13: '792bde70-dd71-4e03-9560-5cb8619c016c',
 14: 'dd069526-b918-4def-8f4a-a9b3a2c72775',
 15: '669878a9-b313-46c1-8184-c0fb45617995',
 16: 'bf924fab-ff99-4464-a89c-30c82d28a59f',
 17: 'c319e9db-d7d2-4123-a9e3-ae34bc8c8e17',
 18: '6adcb616-f5e3-4688-a833-3076b17ca9ff',
 19: '1aa8b905-2660-47c9-ba05-f7f0bacd086b',
 20: 'bf5d7a8e-8e6a-4212-b54b-324ced078245',
 21: '4de89ce7-7031-47cd-8085-9d6df54c2c38',
 22: 'e4e2acdc-c860-

In [36]:
vector_store.get_by_ids(['88486cff-91e2-48f7-811c-0565d1d11554'])

[Document(id='88486cff-91e2-48f7-811c-0565d1d11554', metadata={}, page_content='are going to be harder? And then as for\nthe big predictions that Demis made, like cures for\nmost diseases in 10 or 20 years, or AGI by the end of\nthe decade, or how we\'re about to enter\ninto an era of abundance, I mean, they all sound like\nDemis is being a bit overly optimistic, doesn\'t it? But then again, he hasn\'t\nexactly been wrong so far. You\'ve been listening to "Google\nDeepMind, the Podcast" with me, Professor Hannah Fry. If you have enjoyed this\nepisode, hey, why not subscribe? We\'ve got plenty more\nfascinating conversations with the people at\nthe cutting edge of AI coming up on topics ranging\nfrom how AI is accelerating the pace of\nscientific discoveries to addressing some\nof the biggest risks of this technology. If you have any feedback, or you\nwant to suggest a future guest, then do leave us a\ncomment on YouTube. Until next time. [MUSIC PLAYING]')]
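FAISS itself only knows vectors by their position (0, 1, 2, ...). LangChain's wrapper keeps `index_to_docstore_id` to translate a position into a document ID, plus a docstore that maps that ID to the stored `Document`. A search therefore goes position to ID to document, while `get_by_ids` starts directly from the ID. The plain-dict sketch below models that two-step lookup with hypothetical short IDs and placeholder chunk texts; `resolve` is a hypothetical helper.

```python
# Position in the FAISS index -> document ID (LangChain generates UUIDs; these are stand-ins)
index_to_docstore_id = {0: "id-a", 1: "id-b", 2: "id-c"}

# Document ID -> chunk text (LangChain stores Document objects here)
docstore = {"id-a": "first chunk text",
            "id-b": "second chunk text",
            "id-c": "third chunk text"}


def resolve(positions):
    """Turn search-result positions into chunk texts via the two lookups."""
    return [docstore[index_to_docstore_id[p]] for p in positions]


print(resolve([2, 0]))      # what a similarity search would hand back
print(docstore["id-b"])     # direct lookup by ID, as get_by_ids does
```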

**Step 2 - Retrieval**

In [37]:
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 4})

In [38]:
retriever

VectorStoreRetriever(tags=['FAISS', 'GoogleGenerativeAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x7e993c799a00>, search_kwargs={'k': 4})
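`search_type="similarity"` returns the k chunks closest to the query even when they are near-duplicates of each other, which overlapping chunks often are. `as_retriever` also accepts `search_type="mmr"` (maximal marginal relevance), for example `vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 4, "fetch_k": 20})`, which trades relevance against diversity. The NumPy sketch below shows the greedy MMR selection rule on cosine similarity with toy vectors; `mmr_select` is a hypothetical helper, and LangChain's own implementation may differ in details.

```python
import numpy as np


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def mmr_select(query, candidates, k=2, lambda_mult=0.5):
    """Greedily pick k candidate indices, scoring each as
    lambda * sim(query, d) - (1 - lambda) * max sim(d, already picked)."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected),
                             default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = np.array([1.0, 0.2])
candidates = np.array([[1.0, 0.1],    # relevant
                       [1.0, 0.12],   # near-duplicate of row 0
                       [0.6, 0.8]])   # less relevant but different
print(mmr_select(query, candidates, k=2, lambda_mult=0.5))
```

With `lambda_mult=1.0` the rule reduces to plain similarity ranking and picks both near-duplicates; at 0.5 the second pick switches to the more distinct row.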

In [39]:
retriever.invoke('What is deepmind')

[Document(id='43ecd37e-912f-4590-b2eb-8b623fe05a8d', metadata={}, page_content='[MUSIC PLAYING] HANNAH FRY: Welcome to "Google\nDeepMind, the Podcast" with me, your host, Professor Hannah Fry. Now, when we first\nstarted thinking about making this\npodcast way back in 2017, DeepMind was this relatively\nsmall, focused AI research lab. They\'d just been\nbought by Google and given the freedom to do\ntheir own quirky research projects from the safe\ndistance of London. How things have changed. Because since the\nlast season, Google has reconfigured its\nentire structure, putting AI and the\nteam at DeepMind at the core of its strategy. Google DeepMind has\ncontinued its quest to endow AI with\nhuman-level intelligence, known as artificial general\nintelligence, or AGI. It has introduced a family of\npowerful new AI models called Gemini, as well as\nan AI agent called Project Astra that can process\naudio, video, image, and code. The lab is also\nmaking huge leaps in applying AI to a host

**Step 3 - Augmentation**

In [40]:
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", temperature=0.2)
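`temperature=0.2` keeps answers close to deterministic, which suits a task that should stick to the retrieved context. Conceptually, a temperature T divides the model's next-token scores (logits) before the softmax, so a small T sharpens the distribution toward the top token and a large T flattens it. The sketch below applies that scaling to made-up logits; Gemini combines temperature with other sampling settings such as top-p and top-k, so treat this as the general idea rather than its exact behaviour. `softmax_with_temperature` is a hypothetical helper.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing them by `temperature`."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                              # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]                         # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```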

In [41]:
prompt = PromptTemplate(
    template="""
      You are a helpful assistant.
      Answer ONLY from the provided transcript context.
      If the context is insufficient, just say you don't know.

      {context}
      Question: {question}
    """,
    input_variables = ['context', 'question']
)
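`PromptTemplate` uses Python f-string-style `{placeholders}` by default, so filling it in is essentially `str.format`. Because the template string is indented inside the code, that indentation is sent to the model too (visible as runs of spaces in the `StringPromptValue` output further down). The stray spaces are probably harmless, but dedenting keeps the prompt exactly as written. In the plain-Python sketch below, `TEMPLATE` and `build_prompt` are hypothetical names, and the `Context:` label is an addition not present in the original template.

```python
import textwrap

# Hypothetical template mirroring the one above, dedented so no leading spaces reach the model
TEMPLATE = textwrap.dedent("""\
    You are a helpful assistant.
    Answer ONLY from the provided transcript context.
    If the context is insufficient, just say you don't know.

    Context:
    {context}

    Question: {question}
    """)


def build_prompt(context, question):
    """Fill the placeholders the way an f-string-format PromptTemplate would."""
    return TEMPLATE.format(context=context, question=question)


print(build_prompt("DeepMind started as a small AI research lab.", "What is DeepMind?"))
```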

In [42]:
question = "Is the topic of nuclear fusion discussed in this video? If yes, what was discussed?"
retrieved_docs = retriever.invoke(question)

In [43]:
retrieved_docs

[Document(id='1c12b0f0-db37-4609-8248-ee7c3d3625d3', metadata={}, page_content="cases to worry about. There's bad uses by bad\nindividuals or nations, so human misuse, and then\nthere's the AI itself as it gets closer to\nAGI going off the rails. And I think you need different\nsolutions for those two problems. And so, yeah, that's\nwhat we're going to have to contend\nwith as we get closer to building these technologies. And also, just going back to\nyour benefiting everyone point, of course, we're showing\nthe way with things like AlphaFold and isomorphic. I think we could cure most\ndiseases within the next decade or two if AI drug design works. And then they could be\npersonalized medicines where it minimizes the side\neffects on the individual because it's mapped\nto the person's individual illness, and\ntheir individual metabolism, and so on. So these are amazing things-- clean energy, renewable\nenergy sources, fusion, or better solar power,\nall of these types of things. I thin

In [44]:
context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
context_text

"cases to worry about. There's bad uses by bad\nindividuals or nations, so human misuse, and then\nthere's the AI itself as it gets closer to\nAGI going off the rails. And I think you need different\nsolutions for those two problems. And so, yeah, that's\nwhat we're going to have to contend\nwith as we get closer to building these technologies. And also, just going back to\nyour benefiting everyone point, of course, we're showing\nthe way with things like AlphaFold and isomorphic. I think we could cure most\ndiseases within the next decade or two if AI drug design works. And then they could be\npersonalized medicines where it minimizes the side\neffects on the individual because it's mapped\nto the person's individual illness, and\ntheir individual metabolism, and so on. So these are amazing things-- clean energy, renewable\nenergy sources, fusion, or better solar power,\nall of these types of things. I think they're\nall within reach. And then that would\nsort out water access because

In [45]:
final_prompt = prompt.invoke({"context": context_text, "question": question})

In [46]:
final_prompt

StringPromptValue(text="\n      You are a helpful assistant.\n      Answer ONLY from the provided transcript context.\n      If the context is insufficient, just say you don't know.\n\n      cases to worry about. There's bad uses by bad\nindividuals or nations, so human misuse, and then\nthere's the AI itself as it gets closer to\nAGI going off the rails. And I think you need different\nsolutions for those two problems. And so, yeah, that's\nwhat we're going to have to contend\nwith as we get closer to building these technologies. And also, just going back to\nyour benefiting everyone point, of course, we're showing\nthe way with things like AlphaFold and isomorphic. I think we could cure most\ndiseases within the next decade or two if AI drug design works. And then they could be\npersonalized medicines where it minimizes the side\neffects on the individual because it's mapped\nto the person's individual illness, and\ntheir individual metabolism, and so on. So these are amazing things-

**Step 4 - Generation**

In [47]:
answer = llm.invoke(final_prompt)
print(answer.content)

Yes, nuclear fusion is mentioned as one of the technologies that could be within reach with the help of AI, potentially sorting out water access through desalination.

**Building a Chain**

In [48]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser

In [49]:
def format_docs(retrieved_docs):
    """Concatenate the page_content of the retrieved Documents, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in retrieved_docs)

In [50]:
parallel_chain = RunnableParallel({
    'context': retriever | RunnableLambda(format_docs),
    'question': RunnablePassthrough()
})
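`RunnableParallel` hands the same input (here the question string) to every branch and collects the results in a dict under the branch names: the `context` branch runs the retriever and then `format_docs`, while `RunnablePassthrough` returns the question unchanged. The plain-Python model below reproduces that behaviour with a hypothetical `run_parallel` helper and a fake retriever so it runs offline; LangChain's real implementation can also run the branches concurrently.

```python
def run_parallel(branches, value):
    """Call every branch on the same input and gather the outputs by name."""
    return {name: branch(value) for name, branch in branches.items()}


def fake_retriever(question):
    """Hypothetical stand-in for retriever.invoke: returns placeholder chunk texts."""
    return ["chunk 1 text", "chunk 2 text"]


def format_texts(texts):
    return "\n\n".join(texts)


branches = {
    "context": lambda q: format_texts(fake_retriever(q)),  # like retriever | format_docs
    "question": lambda q: q,                                # like RunnablePassthrough()
}
print(run_parallel(branches, "who is Demis"))
```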

In [51]:
parallel_chain.invoke('who is Demis')

{'context': 'are going to be harder? And then as for\nthe big predictions that Demis made, like cures for\nmost diseases in 10 or 20 years, or AGI by the end of\nthe decade, or how we\'re about to enter\ninto an era of abundance, I mean, they all sound like\nDemis is being a bit overly optimistic, doesn\'t it? But then again, he hasn\'t\nexactly been wrong so far. You\'ve been listening to "Google\nDeepMind, the Podcast" with me, Professor Hannah Fry. If you have enjoyed this\nepisode, hey, why not subscribe? We\'ve got plenty more\nfascinating conversations with the people at\nthe cutting edge of AI coming up on topics ranging\nfrom how AI is accelerating the pace of\nscientific discoveries to addressing some\nof the biggest risks of this technology. If you have any feedback, or you\nwant to suggest a future guest, then do leave us a\ncomment on YouTube. Until next time. [MUSIC PLAYING]\n\nbots, and Google had theirs. And one of the things was\nwe were looking at them, and we were loo

In [52]:
parser = StrOutputParser()

In [53]:
main_chain = parallel_chain | prompt | llm | parser
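The `|` operator composes runnables into a sequence in which each step's output becomes the next step's input: the question goes into `parallel_chain`, its dict fills `prompt`, the resulting prompt value goes to `llm`, and `StrOutputParser` pulls the text out of the returned message. The toy `Step` class below (hypothetical, far simpler than LangChain's `Runnable`) overloads `__or__` to show the mechanics; the four stages are fakes standing in for the real ones.

```python
class Step:
    """Minimal stand-in for a runnable: wraps a function and supports `|` chaining."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # a | b builds a new Step that feeds a's output into b
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stages mimicking parallel_chain, prompt, llm and parser
gather = Step(lambda q: {"context": "chunk text", "question": q})
fill_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda prompt: {"content": "Echo: " + prompt})
parse = Step(lambda message: message["content"])

chain = gather | fill_prompt | fake_llm | parse
print(chain.invoke("Can you summarize the video"))
```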

In [54]:
main_chain.invoke('Can you summarize the video')

'The video discusses the need for clarifying the expectations and limitations of AI systems, as well as the importance of user education. It also touches on the surprising emergence of chatbots and their inherent flaws due to their stochastic and probabilistic nature. The video also talks about the importance of technical due diligence, understanding the background of people in AI, and the opportunistic environment created by sudden attention and money in the field. It also mentions building better world models and the vision of a universal assistant with multi-modality.'