<a href="https://colab.research.google.com/github/rishike/langchain/blob/master/youtube_chatbot_langchain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [20]:
import os
from getpass import getpass
os.environ["GOOGLE_API_KEY"] = getpass("Provide your Google API key here")

Provide your Google API key here··········


In [21]:
!pip install -q youtube-transcript-api langchain-community google-genai langchain-google-genai faiss-cpu tiktoken python-dotenv

In [22]:
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate

Indexing (Document Ingestion)

In [23]:
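video_id = "LPZh9BOjkQs"  # only the video ID, not the full URL

try:
  # Fetch the English captions; each entry is a dict with a "text" field.
  transcript_list = YouTubeTranscriptApi.get_transcript(video_id, languages=["en"])

  # Flatten the caption snippets into one plain-text string.
  transcript = " ".join([t["text"] for t in transcript_list])
  print(transcript)

except TranscriptsDisabled:
  print("No captions available for this video")
except Exception as e:
  print(f"Error fetching details: {e}")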

Imagine you happen across a short movie script that describes a scene between a person and their AI assistant. The script has what the person asks the AI, but the AI's response has been torn off. Suppose you also have this powerful magical machine that can take any text and provide a sensible prediction of what word comes next. You could then finish the script by feeding in what you have to the machine, seeing what it would predict to start the AI's answer, and then repeating this over and over with a growing script completing the dialogue. When you interact with a chatbot, this is exactly what's happening. A large language model is a sophisticated mathematical function that predicts what word comes next for any piece of text. Instead of predicting one word with certainty, though, what it does is assign a probability to all possible next words. To build a chatbot, you lay out some text that describes an interaction between a user and a hypothetical AI assistant, add on whatever the use

Indexing - Text Splitting

In [24]:
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.create_documents([transcript])
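
To make the effect of chunk_size and chunk_overlap concrete, here is a minimal sketch of a fixed-width sliding window in plain Python. It is only an illustration: RecursiveCharacterTextSplitter prefers to split on separators such as paragraphs and sentences, so its chunks will not match this exactly. The function name `sliding_window_chunks` and the sample string are made up for this example.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-width windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Hypothetical sample input, small enough to inspect by eye.
sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
print(demo_chunks)
```

Each window repeats the last 4 characters of the previous one, which is the same idea as the 200-character overlap above: a sentence cut at a chunk boundary still appears whole in at least one chunk.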

In [25]:
len(chunks)

10

In [26]:
chunks[9]

Document(metadata={}, page_content='the other steps in a transformer. Also, on my second channel I just posted a talk I gave a couple months ago about this topic for the company TNG in Munich. Sometimes I actually prefer the content I make as a casual talk rather than a produced video, but I leave it up to you which one of these feels like the better follow-on.')

In [27]:
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vector_store = FAISS.from_documents(chunks, embeddings)

In [28]:
vector_store.index_to_docstore_id

{0: '30dda1e6-a6cd-46b0-8c37-c34e165c4240',
 1: 'ebff7a51-f82e-4fc9-ac5a-a2734af9d98d',
 2: '956d77e8-f71a-4031-b3ee-cb22f935f3f6',
 3: '9525e501-68d2-4a3c-a279-11af5214cecc',
 4: '86f8b9ba-da3a-4d8c-a198-892ba1dac292',
 5: '2b14cdb1-b533-4ac5-a9d5-6372b929b406',
 6: '4d634b21-296b-4f42-bd31-c7819d1b64ac',
 7: '3729a617-c802-4548-9cd8-b125560b24ab',
 8: '1a4a97f0-9eb1-4c48-afa8-581ec4bbc4de',
 9: '999dd3c1-6ac4-4699-a31c-2ff6c9274f2d'}

In [29]:
# This ID does not appear in index_to_docstore_id above, so nothing is returned.
vector_store.get_by_ids(['5e6c5472-2971-433f-b593-d383370fb80a'])

[]
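
The two cells above suggest a two-level lookup: a position in the vector index maps to a document ID, and the ID maps to the stored document. A toy version of that bookkeeping might look like the sketch below. `ToyDocStore` is a hypothetical class written for this note, not the FAISS wrapper's actual implementation; skipping unknown IDs is an assumption chosen to match the empty list seen above.

```python
import uuid


class ToyDocStore:
    """Hypothetical in-memory store mirroring the index -> ID -> document lookup."""

    def __init__(self):
        self.index_to_id = {}  # row position in the vector index -> document ID
        self.id_to_text = {}   # document ID -> stored text

    def add(self, text: str) -> str:
        doc_id = str(uuid.uuid4())
        self.index_to_id[len(self.index_to_id)] = doc_id
        self.id_to_text[doc_id] = text
        return doc_id

    def get_by_ids(self, ids: list[str]) -> list[str]:
        # Assumption: unknown IDs are skipped rather than raising an error.
        return [self.id_to_text[i] for i in ids if i in self.id_to_text]


store = ToyDocStore()
first_id = store.add("first chunk")
store.add("second chunk")

found = store.get_by_ids([first_id])
missing = store.get_by_ids(["not-a-real-id"])
print(found, missing)
```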

Retrieval

In [17]:
retriever = vector_store.as_retriever(search_type='similarity', search_kwargs={"k" : 4})

In [18]:
retriever

VectorStoreRetriever(tags=['FAISS', 'GoogleGenerativeAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x7a34b6e7ef10>, search_kwargs={'k': 4})
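
What a similarity retriever with k results does can be sketched in a few lines: score every stored vector against the query vector and keep the best k. The example below uses cosine similarity and hand-written 3-dimensional vectors purely for illustration. The metric the FAISS index actually uses may differ, and real embeddings have far more dimensions. All names here (`cosine_similarity`, `top_k`, `toy_vectors`) are made up for the sketch.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int) -> list[str]:
    """Return the IDs of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy "embeddings"; real ones come from the embedding model.
toy_vectors = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.9, 0.1, 0.0],
    "chunk_c": [0.0, 1.0, 0.0],
    "chunk_d": [0.0, 0.0, 1.0],
}
toy_query = [1.0, 0.2, 0.0]

nearest = top_k(toy_query, toy_vectors, k=2)
print(nearest)
```

Note that a retriever of this kind always returns k results, even when none of them is a good match, which is why the prompt later tells the model to say it doesn't know when the context is insufficient.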

In [30]:
retriever.invoke("What is deepmind")

[Document(id='fa861775-0347-477d-9fda-7fcd304e7ece', metadata={}, page_content="on many example pieces of text. One of these training examples could be just a handful of words, or it could be thousands, but in either case, the way this works is to pass in all but the last word from that example into the model and compare the prediction that it makes with the true last word from the example. An algorithm called backpropagation is used to tweak all of the parameters in such a way that it makes the model a little more likely to choose the true last word and a little less likely to choose all the others. When you do this for many, many trillions of examples, not only does the model start to give more accurate predictions on the training data, but it also starts to make more reasonable predictions on text that it's never seen before. Given the huge number of parameters and the enormous amount of training data, the scale of computation involved in training a large language model is mind-bogg

Augmentation

In [31]:
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0.1)

In [32]:
prompt = PromptTemplate(
    template="""
      You are a helpful assistant.
      Answer Only from the provided transcript context.
      If context is insufficient, just say you don't know.

      Context: {context}
      Question: {question}
    """,
    input_variables=['context', 'question']
)
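
Filling the template amounts to substituting the two named placeholders. The sketch below shows the same idea with Python's built-in str.format, as a rough stand-in for what the prompt object does when invoked; it is not PromptTemplate's implementation. `TEMPLATE`, `fill_template`, and the sample context and question are hypothetical.

```python
TEMPLATE = (
    "You are a helpful assistant.\n"
    "Answer only from the provided transcript context.\n"
    "If context is insufficient, just say you don't know.\n\n"
    "Context: {context}\n"
    "Question: {question}\n"
)


def fill_template(template: str, **values: str) -> str:
    """Substitute the named placeholders; raises KeyError if one is missing."""
    return template.format(**values)


# Hypothetical inputs, standing in for retrieved chunks and a user question.
filled = fill_template(
    TEMPLATE,
    context="Transformers use attention.",
    question="What do transformers use?",
)
print(filled)
```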

In [39]:
question = "is the topic of neural network in this video? if yes then what was discussed"
retrieved_docs = retriever.invoke(question)

In [40]:
len(retrieved_docs)

4

In [41]:
context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
context_text

"next word. Although researchers design the framework for how each of these steps work, it's important to understand that the specific behavior is an emergent phenomenon based on how those hundreds of billions of parameters are tuned during training. This makes it incredibly challenging to determine why the model makes the exact predictions that it does. What you can see is that when you use large language model predictions to autocomplete a prompt, the words that it generates are uncannily fluent, fascinating, and even useful. If you're a new viewer and you're curious about more details on how transformers and attention work, boy do I have some material for you. One option is to jump into a series I made about deep learning, where we visualize and motivate the details of attention and all the other steps in a transformer. Also, on my second channel I just posted a talk I gave a couple months ago about this topic for the company TNG in Munich. Sometimes I actually prefer the content I\

In [43]:
final_prompt = prompt.invoke({"context": context_text, "question": question})
final_prompt

StringPromptValue(text="\n      You are a helpful assistant.\n      Answer Only from the provided transcript context.\n      If context is insufficient, just say you don't know.\n\n      Context: next word. Although researchers design the framework for how each of these steps work, it's important to understand that the specific behavior is an emergent phenomenon based on how those hundreds of billions of parameters are tuned during training. This makes it incredibly challenging to determine why the model makes the exact predictions that it does. What you can see is that when you use large language model predictions to autocomplete a prompt, the words that it generates are uncannily fluent, fascinating, and even useful. If you're a new viewer and you're curious about more details on how transformers and attention work, boy do I have some material for you. One option is to jump into a series I made about deep learning, where we visualize and motivate the details of attention and all the 

Generation

In [44]:
answer = llm.invoke(final_prompt)
answer.content

'Yes.  The video discusses feed-forward neural networks as a component of transformers, explaining their role in giving the model extra capacity to store patterns about language learned during training.  The discussion also covers how these networks, along with the attention operation, process data to predict the next word in a sequence.'

Building a chain

In [45]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser

In [46]:
def format_docs(retrieved_docs):
  """Join the page_content of the retrieved documents into one context string."""
  return "\n\n".join(doc.page_content for doc in retrieved_docs)

In [51]:
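parallel_chain = RunnableParallel({
    # Retrieve the relevant chunks and merge them into one context string.
    'context': retriever | RunnableLambda(format_docs),
    # Forward the user's question unchanged.
    'question': RunnablePassthrough()
})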
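
The shape of this step is a fan-out: one input goes to every branch, and the results come back as a dict keyed by branch name. The plain-Python sketch below mimics that shape only. It runs the branches one after another and is not how RunnableParallel is implemented; `run_parallel` and `fake_retrieve_and_format` are hypothetical stand-ins.

```python
from typing import Any, Callable


def run_parallel(branches: dict[str, Callable[[Any], Any]], value: Any) -> dict[str, Any]:
    """Feed the same input to every branch and collect the results by name."""
    return {name: branch(value) for name, branch in branches.items()}


def fake_retrieve_and_format(question: str) -> str:
    # Hypothetical stand-in for `retriever | RunnableLambda(format_docs)`.
    return f"(chunks relevant to: {question})"


toy_branches = {
    "context": fake_retrieve_and_format,
    "question": lambda q: q,  # passthrough: return the input unchanged
}

toy_inputs = run_parallel(toy_branches, "what is attention?")
print(toy_inputs)
```

The resulting dict has exactly the keys the prompt expects, which is what lets the next step consume it directly.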

In [52]:
parallel_chain.invoke("what is Large Language Models")

{'context': "a standard human to read the amount of text that was used to train GPT-3, for example, if they read non-stop 24-7, it would take over 2600 years. Larger models since then train on much, much more. You can think of training a little bit like tuning the dials on a big machine. The way that a language model behaves is entirely determined by these many different continuous values, usually called parameters or weights. Changing those parameters will change the probabilities that the model gives for the next word on a given input. What puts the large in large language model is how they can have hundreds of billions of these parameters. No human ever deliberately sets those parameters. Instead, they begin at random, meaning the model just outputs gibberish, but they're repeatedly refined based on many example pieces of text. One of these training examples could be just a handful of words, or it could be thousands, but in either case, the way this works is to pass in all but the l

In [53]:
parser = StrOutputParser()

In [54]:
main_chain = parallel_chain | prompt | llm | parser
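
The `|` syntax reads left to right: the output of each step becomes the input of the next. One way to get that behavior in plain Python is to overload `__or__`, as in the sketch below. This is an illustration of the composition idea under that assumption, not LangChain's Runnable code; the `Step` class and the toy steps are hypothetical.

```python
from typing import Any, Callable


class Step:
    """Hypothetical wrapper that lets plain functions be chained with `|`."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that runs a first, then feeds its output to b.
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value: Any) -> Any:
        return self.fn(value)


# Toy stand-ins for the four stages of the chain above.
build_inputs = Step(lambda q: {"context": "toy context", "question": q})
render_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda p: {"content": p.upper()})  # pretend model: upper-cases the prompt
parse = Step(lambda message: message["content"])

toy_chain = build_inputs | render_prompt | fake_llm | parse
result = toy_chain.invoke("what is a transformer?")
print(result)
```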

In [59]:
parallel_chain

{
  context: VectorStoreRetriever(tags=['FAISS', 'GoogleGenerativeAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x7a34b6e7ef10>, search_kwargs={'k': 4})
           | RunnableLambda(format_docs),
  question: RunnablePassthrough()
}

In [56]:
prompt

PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="\n      You are a helpful assistant.\n      Answer Only from the provided transcript context.\n      If context is insufficient, just say you don't know.\n\n      Context: {context}\n      Question: {question}\n    ")

In [55]:
main_chain.invoke("can you summarize the video")

'The video discusses large language models and transformers.  It explains that while researchers design the framework, the specific behavior is emergent from the training of hundreds of billions of parameters.  The video offers two options for viewers wanting more detail: a deep learning series visualizing attention and other transformer steps, and a talk given at TNG in Munich (posted on a second channel).  The video also describes how chatbots work by using large language models to predict the next word in a sequence, assigning probabilities to all possibilities.  The process involves feed-forward neural networks and iterative operations to enrich the data and produce a prediction.'