# Chatbot with Website/YouTube Video
This guide will walk you through creating a Question-Answering system for Website/YT Video documents using Retrieval-Augmented Generation (RAG) with Langchain and Pinecone.

### Installing Dependencies

In [1]:
%pip install -qU langchain-community langchain langchain-openai requests chromadb beautifulsoup4 langchain-google-genai

### Storing API keys

- Get OpenAI API key: https://platform.openai.com/account/api-keys

In [2]:
import os

os.environ["OPENAI_API_KEY"] = ""
os.environ["GOOGLE_API_KEY"] = ""

## Chat with Website Using ChromaDB


### Import Required Libraries

In [3]:
from langchain_community.vectorstores import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

USER_AGENT environment variable not set, consider setting it to identify your requests.


### Load Website Content

In [4]:
def load_website(url):
    loader = WebBaseLoader(url)
    data = loader.load()
    return data

# Example usage
url = "https://www.buildfastwithai.com/"  # Replace with your target website
website_data = load_website(url)

In [5]:
website_data

[Document(metadata={'source': 'https://www.buildfastwithai.com/', 'title': 'Build Fast with AI', 'description': 'Build Fast with AI - a vibrant community of AI builders, innovators, and enthusiasts. Whether you are an entrepreneur, a product manager, a developer, or anyone intrigued by AI, this is your platform to learn, grow, and innovate.', 'language': 'en'}, page_content="Build Fast with AIAsk toBuildFast BotHey! Wanna know about Generative AI Crash Course?What will I learn?How can I join?What's the course duration?What's the course fee?What's the course syllabus?Sendsatvik@buildfastwithai.comKoramangala, Bengaluru, 560034SupportConsultingGenAI CourseApp ShowcaseCompanyResourcesEventsLegalPrivacyTermsRefundOur ProductsEduchainAI-powered education platform for teachersApp ShowcaseThe Indian version of CharacterAI but even more varieties.LinkedInInstagramTwitterGitHub¬© 2025 Intellify Edventures Private Limited All rights reserved.GenAI 2025 Launch PadTransform AI Ideas into RealityJo

### Split Content into Chunks

In [8]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(website_data)

In [9]:
len(splits)

17

### Initialize Embeddings and Chroma Vector Store

##### Using OpenAI embeddings

In [None]:
openai_embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Create a Chroma vector store
vectorstore = Chroma.from_documents(splits, openai_embeddings)

##### Using GeminiAI embeddings

In [10]:
gemini_embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

# Create a Chroma vector store
vectorstore = Chroma.from_documents(splits, gemini_embeddings)

### Set Up Conversational Retrieval Chain

##### Using OpenAI model

In [11]:
llm = ChatOpenAI(model="gpt-4o", temperature=0.7)

qa = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

)

  memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)


##### Using GeminiAI Models

In [16]:
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0.7)

qa = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

)
qa

ConversationalRetrievalChain(memory=ConversationBufferMemory(chat_memory=InMemoryChatMessageHistory(messages=[]), return_messages=True, memory_key='chat_history'), verbose=False, combine_docs_chain=StuffDocumentsChain(verbose=False, llm_chain=LLMChain(verbose=False, prompt=ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="Use the following pieces of context to answer the user's question. \nIf you don't know the answer, just say that you don't know, don't try to make up an answer.\n----------------\n{context}"), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['question'], input_types={}, partial_variables={}, template='{question}'), additional_kwargs={})]), llm=ChatGoogleGenerativeAI(model='models/gemini-1.5-flash', google_api_key=SecretStr('**********'), client=<g

In [19]:
result = qa("Hey can you tello the main theme of this website?")
result

{'question': 'Hey can you tello the main theme of this website?',
 'chat_history': [HumanMessage(content='Hey can you tello the main theme of this website?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='The main theme of the website is providing education and consulting services related to Generative AI (GenAI).  They offer courses, bootcamps, and other resources to help professionals and businesses learn and implement GenAI.', additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Hey can you tello the main theme of this website?', additional_kwargs={}, response_metadata={}),
  AIMessage(content="Based on the provided text, the main theme appears to be using AI to enhance career prospects, specifically focusing on resume building, interview preparation, and leveraging AI tools for professional development.  There's also mention of an unrelated workshop on using AI with Excel, but the career focus seems to be the primary theme.", additional_kwargs={},

### Chat Function

In [54]:
def chat_with_website(query):
    result = qa({"question": query})
    return result['answer']

# Example usage
query = "What is the main topic of this website?"
response = chat_with_website(query)
print(f"Human: {query}")
print(f"AI: {response}")

Human: What is the main topic of this website?
AI: Based on the provided text, there are several topics mentioned, making it difficult to identify one main topic.  The text includes a podcast about expressing feelings (sadness, anger, nervousness), AI-enhanced resume crafting, building a personal AI career assistant, and mastering ATS-friendly resumes.


In [53]:
# Example usage
query = "Tell me about the Generative AI Bootcamp"
response = chat_with_website(query)
print(f"Human: {query}")
print(f"AI: {response}")

Human: Tell me about the Generative AI Bootcamp
AI: I am sorry, but I don't have enough information to answer your question.  While the provided text mentions a "GenAI Bootcamp," it does not describe its content or structure.


## Chat with YouTube Video Using ChromaDB

### Install the required dependencies:

In [29]:
%pip install -qU langchain-community langchain langchain-openai requests chromadb youtube_transcript_api pytube

Note: you may need to restart the kernel to use updated packages.


### Import Required Libraries

In [30]:
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_community.document_loaders import YoutubeLoader, WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

### Load Video Transcript

In [31]:
def load_video_transcript(video_url):
    loader = YoutubeLoader.from_youtube_url(
        video_url,
    )
    data = loader.load()
    return data

# Example usage
video_url = "https://youtu.be/Bx7I06w6vVA?si=BsiZuoo2_NH7vZzM"  # Replace with your target video
video_data = load_video_transcript(video_url)

In [32]:
video_data

[Document(metadata={'source': 'Bx7I06w6vVA'}, page_content="just yesterday Chachi PT launched a brand new pretty incredible feature called canvas this feature feels akin to cla's artifacts and it could do a whole bunch of stuff so that's what we're going to go over today the way to think about canvas is a new way to collaborate with chat GPT you type a prompt and instead of it just being in one single chat it opens up this separate window where you can ask chat gbt to edit certain parts of text or code you can ask it to update you can ask it to add comes a whole bunch of cool stuff so I'm briefly going to tell you about it then we're just going to play around with it so everybody already has access to chat GPT with canvas and what I really like about it is the fact that they didn't say it's coming soon they just released it the day that they announced it so if you click in the drop down here it is GPT 40 with canvas beta and they said for now everybody with a pro or Enterprise account 

In [43]:
# Example usage
video_url = "https://youtu.be/4s7rlRkwC0U?si=uFQJvh0OxedykGjL"  # Replace with your target video
video_data = load_video_transcript(video_url)
video_data

[Document(metadata={'source': '4s7rlRkwC0U'}, page_content="welcome to my channel and this is a podcast speak [Music] English hello English Learners welcome back to another lesson here at English pod my name is Marco and my name is Katherine hello everyone today Marco what are we talking about well today we are um not really in the best mood what you mean we're kind of sad we're kind of um nervous we're we're not feeling very well yeah it's a it's a cloudy day maybe that's why at the blues the blues so yeah that's what we're talking about today we're going to have a lot of different descriptive words to express of maybe feeling sad or maybe you're angry or you're nervous or something like that okay well this is a very useful lesson hopefully you're not feeling this way but if you have to talk about someone who is got lots of words for you but first we're going to preview a couple in today's vocabulary preview vocabulary preview all right so today on vocabulary preview we're going to lo

### Split Content into Chunks

In [45]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
splits = text_splitter.split_documents(video_data)
splits

[Document(metadata={'source': '4s7rlRkwC0U'}, page_content="welcome to my channel and this is a podcast speak [Music] English hello English Learners welcome back to another lesson here at English pod my name is Marco and my name is Katherine hello everyone today Marco what are we talking about well today we are um not really in the best mood what you mean we're kind of sad we're kind of um nervous we're we're not feeling very well yeah it's a it's a cloudy day maybe that's why at the blues the blues so yeah that's what we're talking about today we're going to have a lot of different descriptive words to express of maybe feeling sad or maybe you're angry or you're nervous or something like that okay well this is a very useful lesson hopefully you're not feeling this way but if you have to talk about someone who is got lots of words for you but first we're going to preview a couple in today's vocabulary preview vocabulary preview all right so today on vocabulary preview we're going to lo

### Initialize Embeddings and Chroma Vector Store

##### Using OpenAI Embeddings

In [None]:
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Create a Chroma vector store
vectorstore = Chroma.from_documents(splits, embeddings)

##### Using GeminiAI Embeddings

In [46]:
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

# Create a Chroma vector store
vectorstore = Chroma.from_documents(splits, embeddings)

### Set Up Conversational Retrieval Chain

##### Using OpenAI

In [None]:
llm = ChatOpenAI(model="gpt-4o", temperature=0.7)

qa = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
  )

##### Using GeminiAI

In [47]:
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0.7)

qa = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
  )
qa

ConversationalRetrievalChain(memory=ConversationBufferMemory(chat_memory=InMemoryChatMessageHistory(messages=[]), return_messages=True, memory_key='chat_history'), verbose=False, combine_docs_chain=StuffDocumentsChain(verbose=False, llm_chain=LLMChain(verbose=False, prompt=ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="Use the following pieces of context to answer the user's question. \nIf you don't know the answer, just say that you don't know, don't try to make up an answer.\n----------------\n{context}"), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['question'], input_types={}, partial_variables={}, template='{question}'), additional_kwargs={})]), llm=ChatGoogleGenerativeAI(model='models/gemini-1.5-flash', google_api_key=SecretStr('**********'), client=<g

### Chat Function

In [48]:
def chat_with_video(query):
    result = qa.invoke({"question": query})
    return result['answer']

# Example usage
query = "What is the main topic of this video?"
response = chat_with_video(query)
print(f"Human: {query}")
print(f"AI: {response}")

Human: What is the main topic of this video?
AI: The main topic of the video is vocabulary to describe feelings, particularly negative ones like sadness, anger, and nervousness.  The video also includes a segment about using AI to edit text.


### Create an Interactive Chat Interface

In [49]:
from IPython.display import display, HTML
from ipywidgets import widgets

chat_history = []

def on_send_button_clicked(b):
    query = input_box.value
    input_box.value = ""

    response = chat_with_video(query)

    chat_history.append(f"Human: {query}")
    chat_history.append(f"AI: {response}")

    output.clear_output()
    with output:
        print("\n".join(chat_history))

input_box = widgets.Text(description="You:")
send_button = widgets.Button(description="Send")
output = widgets.Output()

send_button.on_click(on_send_button_clicked)

display(HTML("<h3>Chat with Video</h3>"))
display(widgets.VBox([input_box, send_button, output]))

VBox(children=(Text(value='', description='You:'), Button(description='Send', style=ButtonStyle()), Output()))

## Classwork 1

### 1. Try different open source models using Together




## Classwork 2

### 1. Create a bot for a famous personality (Bill Gates, Mahatma Gandhi, etc) - add system instructions + image
### 2. Create a bot for a use-case/scenario (Interview prep, a chatbot for a specific service,  etc )

### 1. Create a QA engine on CSV/Audio/Video/PDF
### 2. Experiment with different chunks, models, vector dbs.