# Setup environment
- Create the conda env: `conda create -n langchain python=3.11`
- Select the newly created "langchain" env as the Python interpreter/kernel in VS Code


Install python-dotenv

In [3]:
! pip install python-dotenv






Install the langchain and openai packages

In [4]:
! pip install langchain openai





In [2]:
! pip install --upgrade pip langchain
! pip install langchain_community langchain_openai python-dotenv pypdf chromadb pysqlite3-binary



# Init variables

Set `OPENAI_API_KEY` in the `.env` file to the key provided by the training team.

In [9]:
import os
from dotenv import load_dotenv

# Load environment variables from the .env file
load_dotenv()

AZURE_OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
AZURE_OPENAI_API_VERSION = os.getenv("AZURE_OPENAI_API_VERSION")
AZURE_OPENAI_EMBEDDING_MODEL = os.getenv("OPENAI_EMBEDDING_MODEL")
AZURE_OPENAI_DEPLOYMENT_NAME = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT")
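
Because `os.getenv` silently returns `None` for a missing key, a typo in `.env` only surfaces later as an authentication or deployment error. The following is a minimal sketch of an early check, assuming the four variable names read in the cell above; the helper `missing_env_vars` and the `REQUIRED_VARS` tuple are hypothetical names introduced for illustration.

```python
import os
from typing import Mapping, Optional, Sequence

# Variable names read by the cell above (taken from this notebook, not an official list).
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
    "OPENAI_EMBEDDING_MODEL",
    "AZURE_OPENAI_EMBEDDING_DEPLOYMENT",
)


def missing_env_vars(
    names: Sequence[str] = REQUIRED_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> list:
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Demo with an in-memory mapping so no real .env file is needed.
example_env = {"OPENAI_API_KEY": "placeholder", "AZURE_OPENAI_API_VERSION": ""}
print("Missing:", missing_env_vars(env=example_env))
```

In the notebook itself, calling `missing_env_vars()` right after `load_dotenv()` would report any variable that still needs a value.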

# Overview
The BonBon FAQ.pdf file contains frequently asked questions and answers for a customer support scenario. The topics cover troubleshooting of IT-related issues such as networking, software, and hardware. Your task is to build a chatbot with LangChain that can answer user questions.

## Assignment 1: Document Indexing (mandatory)

- The content of BonBon FAQ.pdf should be indexed into a local Chroma vector DB, where the chatbot can look up the information it needs to answer questions.
- Use an embedding model such as Azure OpenAI text-embedding-3-small to create the vectors; an open-source embedding model is also fine if it works.

In [10]:
# Install core packages
! pip install --upgrade langchain langchain-openai langchain-community langchain-chroma

# Install supporting packages
! pip install --upgrade chromadb tiktoken pypdf duckduckgo-search python-dotenv pysqlite3-binary


Collecting langchain-openai
  Downloading langchain_openai-0.3.27-py3-none-any.whl.metadata (2.3 kB)
Downloading langchain_openai-0.3.27-py3-none-any.whl (70 kB)
Installing collected packages: langchain-openai
  Attempting uninstall: langchain-openai
    Found existing installation: langchain-openai 0.3.24
    Uninstalling langchain-openai-0.3.24:
      Successfully uninstalled langchain-openai-0.3.24
Successfully installed langchain-openai-0.3.27



Collecting pypdf
  Downloading pypdf-5.7.0-py3-none-any.whl.metadata (7.2 kB)
Collecting python-dotenv
  Downloading python_dotenv-1.1.1-py3-none-any.whl.metadata (24 kB)
Downloading pypdf-5.7.0-py3-none-any.whl (305 kB)
Downloading python_dotenv-1.1.1-py3-none-any.whl (20 kB)
Installing collected packages: python-dotenv, pypdf
  Attempting uninstall: python-dotenv
    Found existing installation: python-dotenv 1.1.0
    Uninstalling python-dotenv-1.1.0:
      Successfully uninstalled python-dotenv-1.1.0
  Attempting uninstall: pypdf
    Found existing installation: pypdf 5.6.1
    Uninstalling pypdf-5.6.1:
      Successfully uninstalled pypdf-5.6.1
Successfully installed pypdf-5.7.0 python-dotenv-1.1.1


In [24]:
from dotenv import load_dotenv
import os
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import AzureOpenAIEmbeddings
from langchain_chroma import Chroma

# Load environment variables
load_dotenv()

# Configuration
PDF_PATH = "data/BonBon FAQ.pdf"
CHROMA_DIR = "./chroma_db"

# Initialize embedding model
embedding = AzureOpenAIEmbeddings(
    azure_deployment=os.environ["AZURE_OPENAI_EMBEDDING_DEPLOYMENT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_version=os.environ["AZURE_OPENAI_API_VERSION"]
)

# Load and split PDF into chunks
loader = PyPDFLoader(PDF_PATH)
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
pages = loader.load_and_split(splitter)

# Create Chroma vector store with persistence
vectordb = Chroma.from_documents(
    documents=pages,
    embedding=embedding,
    persist_directory=CHROMA_DIR
)

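# Older Chroma wrappers expose persist(); with langchain_chroma the data is
# written automatically when persist_directory is set, so this is a no-op there
if hasattr(vectordb, "persist"):
    vectordb.persist()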

print("✅ Indexed FAQ and built Chroma vector DB")


✅ Indexed FAQ and built Chroma vector DB
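
To make the `chunk_size=1000, chunk_overlap=150` settings above more concrete, here is a simplified sketch of overlapping character windows. It is an illustration only: `RecursiveCharacterTextSplitter` prefers to split on separators such as paragraph and line breaks, so its real chunk boundaries will differ. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 150) -> list[str]:
    """Split text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# A 2000-character document yields windows starting at 0, 850, and 1700.
sample = "x" * 2000
print([len(c) for c in chunk_text(sample)])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.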


## Assignment 2: Building Chatbot (mandatory)
- Build a chatbot for a customer support scenario using the Conversational ReAct agent supported in LangChain.
- The chatbot should answer FAQs from the sample BonBon FAQ.pdf file.
- The chatbot should use the Azure OpenAI GPT-4o LLM as its reasoning engine.
- The chatbot should be context-aware, meaning it can carry on a conversation with users across turns.
- The agent is equipped with the following tools:
  - Internet Search: lets the chatbot find out more about a topic using DuckDuckGo internet search
  - Knowledge Base Search: lets the chatbot look up information in the private knowledge base
- If the user asks about topics covered in BonBon FAQ.pdf, such as internet connection, printer, or malware issues, the chatbot must use the private knowledge base; otherwise it should search the internet to answer the question.
- The chatbot's answer should mention the source file and the page the answer comes from, for example "BonBon FAQ.pdf (page 2)".
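
The citation requirement can be sketched as a small formatting helper. This assumes the retrieved document's metadata carries a `source` path and a 0-based `page` index, which is how `PyPDFLoader` populates it; the function name `format_source` is hypothetical.

```python
import os


def format_source(metadata: dict, default_source: str = "BonBon FAQ.pdf") -> str:
    """Build a citation such as 'BonBon FAQ.pdf (page 2)' from document metadata."""
    # Keep only the file name so the citation does not leak local directories.
    source = os.path.basename(str(metadata.get("source") or default_source))
    page = metadata.get("page")
    try:
        # Assumed 0-based page index; readers expect 1-based page numbers.
        page_label = str(int(page) + 1)
    except (TypeError, ValueError):
        page_label = "unknown"
    return f"{source} (page {page_label})"


print(format_source({"source": "data/BonBon FAQ.pdf", "page": 1}))
```

Appending this string to every knowledge-base answer keeps the citation format consistent regardless of how the LLM phrases its reply.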

In [2]:
from dotenv import load_dotenv
import os

from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from langchain_chroma import Chroma
from langchain.agents import Tool
from langchain_community.tools import DuckDuckGoSearchRun
from langchain.chains import RetrievalQA
from langchain.memory import ConversationBufferWindowMemory

class BonBonChatbot:
    def __init__(self):
        load_dotenv()
        self._load_environment_variables()
        self._initialize_embedding()
        self._load_vectorstore()
        self._initialize_llm()
        self._initialize_memory()
        self._setup_retrieval_chain()
        self._setup_tools()

    def _load_environment_variables(self):
        self.api_key = os.getenv("OPENAI_API_KEY")
        self.azure_embedding_deployment = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT")
        self.azure_chat_deployment = os.getenv("AZURE_OPENAI_CHAT_DEPLOYMENT")
        self.azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
        self.azure_endpoint_gpt = os.getenv("AZURE_OPENAI_ENDPOINT_GPT")
        self.api_version = os.getenv("AZURE_OPENAI_API_VERSION")

    def _initialize_embedding(self):
        self.embedding = AzureOpenAIEmbeddings(
            azure_deployment=self.azure_embedding_deployment,
            api_key=self.api_key,
            azure_endpoint=self.azure_endpoint,
            api_version=self.api_version,
        )

    def _load_vectorstore(self):
        my_path = os.path.expanduser("./chroma_db")
        self.vectordb = Chroma(persist_directory=my_path, embedding_function=self.embedding)

    def _initialize_llm(self):
        self.model = AzureChatOpenAI(
            azure_deployment=self.azure_chat_deployment,
            api_key=self.api_key,
            azure_endpoint=self.azure_endpoint_gpt,
            api_version=self.api_version,
            temperature=0.2,
        )

    def _initialize_memory(self):
        self.memory = ConversationBufferWindowMemory(
            return_messages=True,
            k=5
        )

    def _setup_retrieval_chain(self):
        self.qa = RetrievalQA.from_chain_type(
            llm=self.model,
            retriever=self.vectordb.as_retriever(search_kwargs={"k": 5}),
            return_source_documents=True,
            verbose=False,
        )

    def _custom_qa_with_source(self, query):
        # Fetch the single best-matching chunk; its metadata supplies the citation
        docs = self.vectordb.as_retriever(search_kwargs={"k": 1}).invoke(query)
        if not docs:
            return "I don't know."
        doc = docs[0]
        chunk = doc.page_content.strip()
        meta = doc.metadata
        source = meta.get("source", "BonBon FAQ.pdf")
        page = meta.get("page", "unknown")
        try:
            # PyPDFLoader page indices are 0-based; show 1-based page numbers
            page_num = int(page) + 1
        except (TypeError, ValueError):
            page_num = page

        result = self.qa.invoke({"query": query})
        answer = result["result"].strip()
        # Fall back to the raw chunk if the LLM answer is empty or trivially short
        if len(answer) < 5 or answer == "...":
            answer = chunk

        return f"{answer}\n\n(Source: {source} (page {page_num}))"

    def _setup_tools(self):
        self.tools = [
            Tool(
                name="Retrieve Answer",
                func=self._custom_qa_with_source,
                description="Use this to answer questions about BonBon FAQ topics like internet connection, printer, malware, etc. Always cite the source and page number.",
            ),
            Tool(
                name="Search",
                func=DuckDuckGoSearchRun().run,
                description="Use this for anything else or if the KB cannot answer the question.",
            ),
        ]

    def interact(self):
        print("💬 Chatbot Ready! (type 'exit' to quit)\n")
        turn = 1
        while True:
            user_input = input("User: ")
            if user_input.lower() == "exit":
                break

            print("→ Trying knowledge base (FAQ) tool first…")
            response = self.tools[0].func(user_input)
            # Check for None first so the string methods below cannot fail
            if (
                response is None
                or response.strip() == ""
                or "don't know" in response.lower()
            ):
                print("→ No FAQ answer found, using search tool instead.")
                response = self.tools[1].func(user_input)
            else:
                print("→ Answered from knowledge base.")

            print("\n" + "=" * 60)
            print(f"🟦 Question {turn}: {user_input}\n")
            print(f"🤖 Answer:\n{response}")
            print("=" * 60 + "\n")
            turn += 1


if __name__ == "__main__":
    bot = BonBonChatbot()
    bot.interact()


💬 Chatbot Ready! (type 'exit' to quit)

→ Trying knowledge base (FAQ) tool first…
→ Answered from knowledge base.

🟦 Question 1: How do I connect to the Any Corp’s Corporate VPN (Virtual Private Network)?

🤖 Answer:
To connect to Any Corp’s Corporate VPN, follow these steps:

### 1. **Obtain VPN Credentials**
   - Contact your company's IT department to get the necessary credentials for connecting to the VPN. These typically include:
     - Username
     - Password
     - VPN server address (and possibly additional details).

### 2. **Install VPN Software (if required)**
   - If Any Corp provides custom VPN client software, download and install it on your computer or device. The IT department will guide you on where to find the software.

### 3. **Configure VPN Settings (if using built-in clients)**
   - If Any Corp uses standard VPN protocols (e.g., PPTP, L2TP, IPSec, OpenVPN), you can use the built-in VPN client on your operating system.

#### For **Windows**:
   - Go to **Settings**
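
The class above creates a `ConversationBufferWindowMemory` with `k=5`, but the `interact` loop never reads from it, so each question is answered in isolation. As a rough, library-free sketch of what a sliding-window memory does, the snippet below keeps the last `k` turns and prepends them to the next question. The class `WindowMemory` and its methods are hypothetical names for illustration, not LangChain APIs.

```python
from collections import deque


class WindowMemory:
    """Keep only the most recent k (user, assistant) turns."""

    def __init__(self, k: int = 5):
        # deque with maxlen drops the oldest turn automatically once full
        self.turns = deque(maxlen=k)

    def add_turn(self, user: str, assistant: str) -> None:
        self.turns.append((user, assistant))

    def contextualize(self, question: str) -> str:
        """Prefix the question with the remembered turns, if any."""
        if not self.turns:
            return question
        history = "\n".join(f"User: {u}\nAssistant: {a}" for u, a in self.turns)
        return f"Conversation so far:\n{history}\n\nCurrent question: {question}"


memory = WindowMemory(k=2)
memory.add_turn("My printer is offline.", "Check the cable and restart the spooler.")
print(memory.contextualize("It still does not work, what next?"))
```

With something like this, a follow-up such as "It still does not work" reaches the tools together with the earlier printer context, which is one way the context-aware requirement could be approached.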

## Assignment 3: Build a new assistant based on BonBon source code (optional)
Objectives:
- Run the code and index the sample BonBon FAQ.pdf file in Azure Cognitive Search
- Explore the code and implement a new assistant that behaves the same as the chatbot above
- Explore other features such as RBAC and the admin portal

Please contact the training team if you need the BonBon source code.