# Prerequisites
- WSL
- Miniconda3

# Setup environment
- Create a conda environment: `conda create -n langchain python=3.11`
- Select the newly created "langchain" environment as the active environment in VS Code


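Install LangChain, the OpenAI client, and the other packages that the cells below import.

In [None]:
!pip install langchain langchain-community langchain-text-splitters langchain-huggingface openai python-dotenv pypdf chromadb sentence-transformers duckduckgo-search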

# Init variables

Set `OPENAI_API_KEY` in the `.env` file to the key you receive from the training team.
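
A missing key usually only shows up later as an authentication error, so it can help to fail early. The sketch below is one possible check; `require_env` is a hypothetical helper (not part of any library), and the placeholder value exists only so the snippet runs on its own.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is missing or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Placeholder for demonstration only; setdefault() does not overwrite a real key.
# In the notebook, load_dotenv() is expected to populate the variable instead.
os.environ.setdefault("OPENAI_API_KEY", "placeholder-key")

api_key = require_env("OPENAI_API_KEY")
print(f"OPENAI_API_KEY loaded ({len(api_key)} characters)")
```

Calling `require_env` right after `load_dotenv()` would surface a misconfigured `.env` file before any request is sent.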

In [1]:
import openai, os
from dotenv import load_dotenv

load_dotenv()
openai.api_type = "azure"
openai.api_version = "2023-07-01-preview"

# Overview
The BonBon FAQ.pdf file contains frequently asked questions and answers for a customer support scenario. The topics cover troubleshooting of IT issues such as networking, software, and hardware. Your task is to build a chatbot with LangChain that can answer user questions based on this file.

## Assignment 1: Document Indexing (mandatory)

- Index the content of BonBon FAQ.pdf into a local Chroma vector DB, where the chatbot can look up the information it needs to answer questions.
- Use an embedding model such as Azure OpenAI text-embedding-ada-002 to create the vectors; any open-source embedding model is also acceptable if it works.
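
The two ideas behind indexing, overlapping chunks and similarity lookup, can be illustrated without any external library. The following is a simplified sketch, not how `RecursiveCharacterTextSplitter` or Chroma work internally: `chunk_text` uses plain fixed-size windows, `embed` is a toy word-count stand-in for a real embedding model, and the FAQ snippets are made up for the demo.

```python
import math
from collections import Counter


def chunk_text(text, chunk_size=240, chunk_overlap=90):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text):
    """Toy embedding: lowercase word counts (a real model returns dense vectors)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(index, query, k=1):
    """Return the k chunks whose vectors are most similar to the query vector."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# 500 characters with a 150-character step give three windows.
print([len(c) for c in chunk_text("x" * 500)])  # → [240, 240, 200]

# Hypothetical snippets standing in for chunks extracted from the PDF.
faq_chunks = [
    "restart the router to fix the internet connection",
    "reinstall the printer driver",
    "run an antivirus scan to remove malware",
]
index = [(chunk, embed(chunk)) for chunk in faq_chunks]
print(search(index, "printer driver problem"))  # → ['reinstall the printer driver']
```

The overlap means a sentence cut at a window boundary still appears whole in the neighbouring chunk, which is the main reason to set `chunk_overlap` above zero.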

In [None]:
# Please put your code here to do Document Indexing
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

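# Load the PDF; each page becomes one document with "source" and "page" metadata
loader = PyPDFLoader("data/BonBon FAQ.pdf")
documents = loader.load()

# Local open-source embedding model used instead of Azure OpenAI embeddings
embedding_model = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

# Split on sentence and paragraph boundaries into small, overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(
    separators=[".\n\n", ".\r\n\r\n", "\n\n", "\r\n\r\n", ".\n", ".\r\n"],
    chunk_size=240,
    chunk_overlap=90,
    length_function=len,
    add_start_index=True,
)
chunks = text_splitter.split_documents(documents)

# Embed the chunks and store them in a local Chroma collection
db_path = "./db"
db_collection = "text_collection"
chroma_db = Chroma.from_documents(
    documents=chunks,
    embedding=embedding_model,
    persist_directory=db_path,
    collection_name=db_collection
)
# Needed only with older Chroma versions; Chroma 0.4+ persists automatically
chroma_db.persist()

print("✅ Document indexing complete!")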



✅ Done!



## Assignment 2: Building Chatbot (mandatory)
- Build a chatbot for a customer support scenario using the Conversational ReAct agent supported in LangChain.
- The chatbot should help users by answering FAQs from the sample BonBon FAQ.pdf file.
- The chatbot should use the Azure OpenAI GPT-3.5 LLM as its reasoning engine.
- The chatbot should be context-aware, meaning it should be able to hold a multi-turn conversation with users.
- The agent is equipped with the following tools:
  - Internet Search: lets the chatbot automatically find out more about a topic using DuckDuckGo internet search
  - Knowledge Base Search: lets the chatbot look up information in the private knowledge base
- When the user asks about topics covered in the BonBon FAQ.pdf file, such as internet connection, printer, or malware issues, the chatbot must use the private knowledge base; otherwise it should search the internet to answer the question.
- The chatbot's answer should mention the source file and the page the answer comes from, for example "BonBon FAQ.pdf (page 2)".
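
The citation requirement above can be sketched as a small formatting step applied to the page numbers of the retrieved chunks. `format_citation` is a hypothetical helper, and the sketch assumes the page values are zero-based indexes, as is typically the case for metadata produced by a PDF loader.

```python
def format_citation(source: str, pages: set[int]) -> str:
    """Build a citation such as 'BonBon FAQ.pdf (page 2)' from zero-based page indexes."""
    if not pages:
        return ""  # nothing retrieved, so there is nothing to cite
    # Assumption: indexes are zero-based, so add 1 for human-readable page numbers.
    numbers = ", ".join(str(p + 1) for p in sorted(pages))
    label = "page" if len(pages) == 1 else "pages"
    return f"{source} ({label} {numbers})"


print(format_citation("BonBon FAQ.pdf", {1}))     # → BonBon FAQ.pdf (page 2)
print(format_citation("BonBon FAQ.pdf", {3, 0}))  # → BonBon FAQ.pdf (pages 1, 4)
```

Collecting the pages in a set avoids repeating a page number when several retrieved chunks come from the same page.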

In [None]:
# Please put your code here
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint
from langchain.chains import RetrievalQA
from langchain.tools import Tool
from langchain_community.tools import DuckDuckGoSearchRun
from langchain.agents import initialize_agent, AgentType
from langchain.memory import ConversationBufferMemory
import os
from dotenv import load_dotenv

# Read the Hugging Face token from the .env file
load_dotenv()
HUGGINGFACEHUB_API_TOKEN = os.getenv("HUGGINGFACEHUB_API_TOKEN")

llm = HuggingFaceEndpoint(
    repo_id="deepseek-ai/DeepSeek-R1-0528",
    task="text-generation",
    max_new_tokens=512,
    do_sample=False,
    repetition_penalty=1.03,
    provider="auto",
    huggingfacehub_api_token=HUGGINGFACEHUB_API_TOKEN
)
chat_model = ChatHuggingFace(llm=llm)

# Retrieve the 3 most similar chunks and keep them so the answer can cite its pages
retriever = chroma_db.as_retriever(search_kwargs={"k": 3})
faq_chain = RetrievalQA.from_chain_type(llm=chat_model, retriever=retriever, return_source_documents=True)

def format_faq_response(query):
    """Answer a question from the knowledge base and append the source pages."""
    output = faq_chain.invoke({"query": query})
    answer = output["result"]
    sources = output.get("source_documents", [])
    # Page metadata is zero-based, so add 1 to get the page number shown to the user
    pages = {doc.metadata["page"] + 1 for doc in sources if "page" in doc.metadata}
    citation = f" (source: BonBon FAQ.pdf page {', '.join(map(str, sorted(pages)))})" if pages else ""
    return answer + citation

# The description tells the agent when to choose this tool over internet search
kb_tool = Tool(
    name="Knowledge Base Search",
    func=format_faq_response,
    description="Use this to answer questions about BonBon FAQ topics like internet, printer, or malware."
)

internet_tool = DuckDuckGoSearchRun()

agent = initialize_agent(
    tools=[kb_tool, internet_tool],
    llm=chat_model,
    agent=AgentType.CONVERSATIONAL_REACT_DESCRIPTION,
    memory=ConversationBufferMemory(memory_key="chat_history", return_messages=True),
    verbose=False
)

print("\n🤖 Chatbot ready! Ask a question (type 'exit' to quit)", flush=True)
while True:
    query = input("You: ")
    if query.lower() == "exit":
        print("👋 Goodbye!")
        break
    try:
        print("🤖 Bot is thinking...", end="", flush=True)
        response = agent.invoke({"input": query})
        print("\rBot:", response["output"])
    except Exception as e:
        print("⚠️ Error:", e)


🤖 Chatbot ready! Ask a question (type 'exit' to quit)
🤖 Bot is thinking...⚠️ Error: 402 Client Error: Payment Required for url: https://router.huggingface.co/fireworks-ai/inference/v1/chat/completions (Request ID: Root=1-6877ce98-2d69ab71297282c16161497e;d5549b66-5690-417d-b24d-6e85b51bb396)

You have exceeded your monthly included credits for Inference Providers. Subscribe to PRO to get 20x more monthly included credits.
👋 Goodbye!


## Assignment 3: Build a new assistant based on BonBon source code (optional)
Objectives:
- Run the code and index the sample BonBon FAQ.pdf file into Azure Cognitive Search
- Explore the code and implement a new assistant that behaves the same as the one above
- Explore other features, such as RBAC and the admin portal features

Please contact the training team if you need the BonBon source code.