### Step 1 - Indexing (Document Ingestion)

In [4]:
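from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled

video_id = "KNAWp2S3w94"  # only the video ID, not the full URL
try:
    # Fetch the English captions as a list of {"text", "start", "duration"} dicts.
    # get_transcript is the older static API; newer youtube-transcript-api releases
    # deprecate it in favour of YouTubeTranscriptApi().fetch(...)
    transcript_list = YouTubeTranscriptApi.get_transcript(video_id, languages=["en"])

    # Flatten it to plain text
    transcript = " ".join(chunk["text"] for chunk in transcript_list)
    print(transcript)

except TranscriptsDisabled:
    print("No captions available for this video.")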

♪ (music) ♪ You've probably heard a lot
about AI machine learning over the last few months. And maybe you've been inspired
by videos showing what's possible with AI machine learning. But what is it really? Once you go beyond the hype
and get down to writing code, what does AI really look like? Well, that's what we're going
to go through in this video series, where we'll teach you what it's like
to write code for machine learning, and how it provides different,
new, and exciting scenarios that will help you write applications that behave more like a human being,
giving you artificial intelligence. I'm Laurence,
and I'm going to be your guide. You don't need
to know a lot to get started, and we'll be using the Python language. Don't worry if you've never used it,
it's super simple to understand, and you'll be up and running in no time. So let's start with a very simple example. Consider you're creating a game
of Rock, Paper, Scissors. When you play this
with a human, it's very basic; eve
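The cell above joins the raw snippet texts. Those texts contain embedded `\n` line breaks, and the `♪ (music) ♪` caption marker also ends up in the indexed text. The following is a minimal cleanup sketch that collapses whitespace and optionally drops text between `♪` symbols. The helper name `flatten_transcript` and its `strip_music` flag are hypothetical and not part of the original notebook. It is one possible preprocessing choice, not a required step.

```python
import re


def flatten_transcript(snippets: list[dict], strip_music: bool = True) -> str:
    """Join transcript snippets into one string with single spaces between words."""
    text = " ".join(s["text"] for s in snippets)
    if strip_music:
        # Drop caption markers such as "♪ (music) ♪" (anything between two ♪ symbols)
        text = re.sub(r"♪[^♪]*♪", " ", text)
    # Snippets contain "\n" line breaks; collapse every whitespace run to one space
    return re.sub(r"\s+", " ", text).strip()


# Sample snippets copied from the transcript_list output in this notebook
sample = [
    {"text": "♪ (music) ♪", "start": 0.15, "duration": 2.222},
    {"text": "You've probably heard a lot\nabout AI machine learning", "start": 4.066, "duration": 2.53},
    {"text": "over the last few months.", "start": 6.596, "duration": 1.173},
]
print(flatten_transcript(sample))
```

With the hypothetical helper, you would write `transcript = flatten_transcript(transcript_list)` instead of the plain `" ".join(...)`.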

#### Show the first 5 chunks of the transcript

In [5]:
transcript_list[:5]  # Show the first 5 chunks of the transcript

[{'text': '♪ (music) ♪', 'start': 0.15, 'duration': 2.222},
 {'text': "You've probably heard a lot\nabout AI machine learning",
  'start': 4.066,
  'duration': 2.53},
 {'text': 'over the last few months.', 'start': 6.596, 'duration': 1.173},
 {'text': "And maybe you've been inspired\nby videos showing what's possible",
  'start': 7.769,
  'duration': 2.94},
 {'text': 'with AI machine learning.', 'start': 10.709, 'duration': 1.8}]
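Each snippet carries a `start` time and a `duration`, both in seconds. The `" ".join(...)` step keeps only the text, so this timing is lost before splitting. The sketch below shows one way to turn the timing fields into readable labels, for example to cite where in the video an answer comes from. The functions `format_timestamp` and `snippet_span` are hypothetical helpers. Carrying timestamps into chunk metadata would need extra work that is not shown here.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as MM:SS, or H:MM:SS for times of an hour or more."""
    total = int(seconds)  # truncate fractional seconds
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def snippet_span(snippet: dict) -> tuple[str, str]:
    """Return (start, end) labels for a snippet, where end = start + duration."""
    start = snippet["start"]
    return format_timestamp(start), format_timestamp(start + snippet["duration"])


snippet = {"text": "with AI machine learning.", "start": 10.709, "duration": 1.8}
print(snippet_span(snippet))  # ('00:10', '00:12')
```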

### Step 2 - Indexing (Text Splitting)

In [None]:
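from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split the transcript into chunks of at most ~300 characters; neighbouring chunks may share up to 20 characters
splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=20)
chunks = splitter.create_documents([transcript])  # returns a list of Document objects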

In [9]:
len(chunks)  # Number of chunks created

27
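`RecursiveCharacterTextSplitter` first tries to split on natural boundaries. Its default separators are `"\n\n"`, `"\n"`, `" "` and `""`. It then merges the pieces back up to `chunk_size`, keeping up to `chunk_overlap` characters shared between neighbouring chunks. The sketch below is a deliberately simplified fixed-size sliding window that ignores separators. It is not LangChain's algorithm; it only illustrates how the two parameters interact. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int = 300, chunk_overlap: int = 20) -> list[str]:
    """Cut text into fixed-size character windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reached the end
            break
    return chunks


text = "abcdefghij" * 70  # 700 characters
parts = sliding_window_chunks(text)
print([len(p) for p in parts])  # [300, 300, 140]
```

Each window advances by `chunk_size - chunk_overlap` = 280 characters. The last 20 characters of one chunk therefore reappear at the start of the next, which helps keep a sentence that straddles a boundary retrievable from either side.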

### Step 3 - Embedding Generation

In [None]:
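import os
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# Read the API key from the GOOGLE_API_KEY environment variable instead of hard-coding it
embeddings = GoogleGenerativeAIEmbeddings(
    google_api_key=os.environ["GOOGLE_API_KEY"], model="models/gemini-embedding-exp-03-07"
)
vector = embeddings.embed_query("What are embeddings?")
print(vector[:5])  # first 5 dimensions of the embedding vector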

[-0.01861327327787876, 0.01201360858976841, -0.00407150574028492, -0.09517369419336319, -0.0025419637095183134]


### Step 4 - Storing in a Vector Store

#### Example: using Qdrant

In [14]:
from langchain_qdrant import QdrantVectorStore

In [None]:
qdrant_url = ""  # your Qdrant cluster URL
qdrant_key = ""  # your Qdrant API key
collection_name = "YT chatbot"

In [None]:
from qdrant_client import QdrantClient

client = QdrantClient(
    url=qdrant_url,
    api_key=qdrant_key,
    timeout=60.0  # request timeout in seconds; raise it if large uploads time out
)

In [None]:
# Embed the chunks and upload them to a Qdrant collection.
# from_documents opens its own connection, so pass the URL and API key directly.
qdrant = QdrantVectorStore.from_documents(
    chunks,                           # list of Document objects to store
    embeddings,                       # embedding model that turns each chunk into a vector
    url=qdrant_url,                   # URL of the Qdrant service
    api_key=qdrant_key,               # API key for the Qdrant service
    collection_name=collection_name,  # collection that will hold the vectors
    timeout=60,                       # request timeout in seconds
)

#### Example: creating a Chroma vector store

In [18]:
from langchain_community.vectorstores import Chroma

# Embed the chunks and persist the Chroma collection to ./chromadb on disk
db = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory='./chromadb'
)


### Step 5 - Retrieval

In [19]:
retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 3})  # return the 3 most similar chunks
retriever.invoke("What Machine learning?")

[Document(metadata={}, page_content="♪ (music) ♪ You've probably heard a lot\nabout AI machine learning over the last few months. And maybe you've been inspired\nby videos showing what's possible with AI machine learning. But what is it really? Once you go beyond the hype"),
 Document(metadata={}, page_content="it answers with the data and have the computer\nfigure out what the rules are. That's machine learning. So now, I can have\nlots of pictures of rocks and tell a computer"),
 Document(metadata={}, page_content="and get down to writing code, what does AI really look like? Well, that's what we're going\nto go through in this video series, where we'll teach you what it's like\nto write code for machine learning, and how it provides different,")]
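`as_retriever(search_type="similarity", search_kwargs={"k": 3})` returns the three stored chunks whose embeddings are closest to the query embedding. The sketch below is a toy, brute-force version of that lookup. It uses hand-made 3-dimensional vectors in place of Gemini embeddings, and the class `ToyVectorStore` is hypothetical. Real stores such as Chroma and Qdrant use approximate nearest-neighbour indexes (HNSW) instead of scoring every vector.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Keeps (text, vector) pairs in a list and does brute-force top-k search."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, text: str, vector: list[float]) -> None:
        self._items.append((text, vector))

    def similarity_search(self, query_vector: list[float], k: int = 3) -> list[str]:
        # Score every stored vector against the query, then keep the k highest scores
        scored = ((cosine(query_vector, vec), text) for text, vec in self._items)
        return [text for _, text in heapq.nlargest(k, scored)]


store = ToyVectorStore()
store.add("machine learning", [0.9, 0.1, 0.0])
store.add("rock paper scissors", [0.1, 0.9, 0.0])
store.add("computer vision", [0.7, 0.3, 0.1])
store.add("music intro", [0.0, 0.1, 0.9])
print(store.similarity_search([1.0, 0.0, 0.0], k=2))  # ['machine learning', 'computer vision']
```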

### Step 6 - Augmentation

In [None]:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate


#### Loading the chat model

In [None]:
import os
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    google_api_key=os.environ["GOOGLE_API_KEY"],  # read the key from the environment
    model="gemini-2.0-flash",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # other params...
)

#### Checking that the model works

In [None]:
messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French. Translate the user sentence.",
    ),
    ("human", "I love programming."),
]
ai_msg = llm.invoke(messages)
ai_msg

#### Complete RAG pipeline

In [None]:

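from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

# No trailing comma after the string (a comma would turn it into a tuple), and the
# question placeholder must be {input}: that is the key create_retrieval_chain passes in.
template = """You are a helpful assistant.
Answer ONLY from the provided transcript context.
If the context is insufficient, just say you don't know.

Context:
{context}

Question: {input}"""

prompt = ChatPromptTemplate.from_template(template)

# "Stuff" chain: inserts all retrieved documents into {context} in a single prompt
document_chain = create_stuff_documents_chain(llm, prompt)

# Retrieve the 3 most similar chunks from the Chroma store
retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 3})

# Retriever output feeds the document chain; the result dict includes an "answer" key
retrieval_chain = create_retrieval_chain(retriever, document_chain)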
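`create_stuff_documents_chain` "stuffs" every retrieved document into one prompt. By default it joins the documents with a blank line (`"\n\n"`) and fills the `{context}` variable. The user question arrives under `input`, the key that `create_retrieval_chain` passes along. The sketch below imitates that formatting step with plain strings so you can see roughly what the model receives. The function `stuff_prompt` is hypothetical, the template is modelled on this notebook's prompt, and LangChain's actual chain also handles message formatting that is not reproduced here.

```python
PROMPT_TEMPLATE = """You are a helpful assistant.
Answer ONLY from the provided transcript context.
If the context is insufficient, just say you don't know.

Context:
{context}

Question: {input}"""


def stuff_prompt(chunks: list[str], question: str, separator: str = "\n\n") -> str:
    """Concatenate ("stuff") all retrieved chunks into one context string and fill the template."""
    context = separator.join(chunks)
    return PROMPT_TEMPLATE.format(context=context, input=question)


# Retrieved text abbreviated from the Step 5 output in this notebook
retrieved = [
    "it answers with the data and have the computer figure out what the rules are.",
    "That's machine learning.",
]
print(stuff_prompt(retrieved, "what is Machine learning?"))
```

Stuffing is the simplest strategy and works well here because `k=3` chunks of about 300 characters fit easily in the model's context window.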


#### Example: querying the YouTube chatbot

In [37]:
response=retrieval_chain.invoke({"input":"what is Machine learning?"})
response['answer']

'Machine learning is when you feed the computer data and have the computer figure out what the rules are.'

In [38]:
response=retrieval_chain.invoke({"input":"can you summarize this video?"})
response['answer']

'This video is about AI and machine learning. It will teach you what it is and how to apply it to computer vision, teaching a computer to see things.'

#### **Project by: Mudassar Khan**