# Pre-requisites
- WSL
- Miniconda3 

# Setup environment
- Create the conda env: `conda create -n langchain python=3.11`
- In VS Code, select the newly created "langchain" env as the Python interpreter (notebook kernel)


Install python-dotenv

In [7]:
! pip install python-dotenv




Install the langchain and openai packages

In [None]:
! pip install langchain openai


In [12]:
! pip install --upgrade pip
! pip install langchain langchain_community langchain_openai openai python-dotenv pypdf chromadb pysqlite3-binary



# Init variables

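Set `OPENAI_API_KEY` in the `.env` file to the key provided by the training team. The cell below also reads `AZURE_OPENAI_API_VERSION`, `OPENAI_EMBEDDING_MODEL`, and `AZURE_OPENAI_EMBEDDING_DEPLOYMENT` from the same file.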

In [6]:
import os
from dotenv import load_dotenv

# Load environment variables from the .env file
load_dotenv()

AZURE_OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
AZURE_OPENAI_API_VERSION = os.getenv("AZURE_OPENAI_API_VERSION")
AZURE_OPENAI_EMBEDDING_MODEL = os.getenv("OPENAI_EMBEDDING_MODEL")
AZURE_OPENAI_DEPLOYMENT_NAME = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT")
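
Because `os.getenv` silently returns `None` for a variable that is missing from `.env`, it can help to check the configuration before calling any Azure service. The following is a minimal sketch using only the standard library; the helper name `missing_env_vars` is hypothetical, and the variable list is an assumption based on the `os.getenv` calls in this notebook.

```python
import os

# Variable names taken from the os.getenv calls in this notebook.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_EMBEDDING_DEPLOYMENT",
]


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Demonstration with a plain dict standing in for the real environment.
example_env = {"OPENAI_API_KEY": "placeholder", "AZURE_OPENAI_API_VERSION": ""}
print(missing_env_vars(REQUIRED_VARS, example_env))
```

In the notebook, calling `missing_env_vars(REQUIRED_VARS)` right after `load_dotenv()` would list whatever still needs to be added to `.env`.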

# Overview
The BonBon FAQ.pdf file contains frequently asked questions and answers for a customer support scenario. The topics cover troubleshooting of IT issues such as networking, software, and hardware. Your task is to build a chatbot with LangChain that can answer user questions.

## Assignment 1: Document Indexing (mandatory)

- Index the content of BonBon FAQ.pdf into a local Chroma vector DB, which the chatbot queries to find the information it needs to answer questions.
- Use an embedding model such as Azure OpenAI text-embedding-3-small to create the vectors; any open-source embedding model is also acceptable if it works.
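
To make the idea of a vector lookup concrete, here is a small sketch in plain Python. It ranks a few hand-written toy vectors by cosine similarity to a query vector. The vectors, texts, and function names are illustrative assumptions; Chroma's own distance metric and index structure may differ, and real embedding models produce much longer vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=1):
    """Return the k (text, score) pairs most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in indexed]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional "embeddings"; real models produce far longer vectors.
toy_index = [
    ("How to reset a password", [0.9, 0.1, 0.0]),
    ("Printer is offline", [0.1, 0.9, 0.1]),
    ("Wi-Fi keeps disconnecting", [0.0, 0.2, 0.9]),
]
print(top_k([0.8, 0.2, 0.1], toy_index, k=1))
```

The vector DB performs the same kind of ranking over the embedded FAQ chunks, so the chatbot receives the chunks whose vectors are closest to the embedded question.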

In [2]:
# Install core packages
! pip install --upgrade langchain langchain-openai langchain-community langchain-chroma

# Install supporting packages
! pip install --upgrade chromadb tiktoken pypdf duckduckgo-search python-dotenv pysqlite3-binary



In [4]:
from dotenv import load_dotenv
import os
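# Current import paths: loaders live in langchain_community, splitters in langchain_text_splitters
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter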
from langchain_openai import AzureOpenAIEmbeddings
from langchain_chroma import Chroma

load_dotenv()

embedding = AzureOpenAIEmbeddings(
    azure_deployment=os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT"),
    api_key=os.getenv("OPENAI_API_KEY"),
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION")
)

loader = PyPDFLoader("data/BonBon FAQ.pdf")
pages = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
docs = splitter.split_documents(pages)
vectordb = Chroma.from_documents(docs, embedding, persist_directory="./chroma_db")
print("✅ Indexed FAQ and built Chroma vector DB")

✅ Indexed FAQ and built Chroma vector DB
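
The indexing cell splits pages with `chunk_size=1000` and `chunk_overlap=150`. The sketch below illustrates what those two parameters mean using a simplified fixed-window splitter on a short string. It is not the algorithm used by `RecursiveCharacterTextSplitter`, which prefers to break on separators such as paragraph and line breaks; the function name `chunk_with_overlap` is hypothetical.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in chunk_with_overlap(sample, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

Each chunk repeats the last three characters of the previous one, so a sentence that straddles a chunk boundary still appears intact in at least one chunk.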


## Assignment 2: Building Chatbot (mandatory)
- Build a chatbot for a customer support scenario using the Conversational ReAct agent supported in LangChain.
- The chatbot must be able to answer the FAQs in the sample BonBon FAQ.pdf file.
- The chatbot should use the Azure OpenAI GPT-4o LLM as its reasoning engine.
- The chatbot should be context aware, meaning it can hold a multi-turn conversation with users and take earlier turns into account.
- The agent is equipped with the following tools:
  - Internet Search: lets the chatbot look up additional information using DuckDuckGo internet search
  - Knowledge Base Search: lets the chatbot look up information in the private knowledge base
- When the user asks about topics covered in the BonBon FAQ.pdf file, such as internet connection, printer, or malware issues, the chatbot must use the private knowledge base; otherwise it should search the internet to answer the question.
- Each answer should mention the source file and the page the answer comes from, for example "BonBon FAQ.pdf (page 2)".
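
One simple way to make a chatbot context aware is to keep only the most recent exchanges and prepend them to the next prompt. The sketch below shows that idea in plain Python. It is an illustration under stated assumptions, not LangChain's memory implementation; the class `WindowMemory` and its methods are hypothetical names.

```python
from collections import deque


class WindowMemory:
    """Keep only the last k (user, assistant) exchanges."""

    def __init__(self, k=5):
        # A deque with maxlen drops the oldest exchange automatically.
        self._turns = deque(maxlen=k)

    def add_exchange(self, user_text, assistant_text):
        self._turns.append((user_text, assistant_text))

    def as_prompt_context(self):
        """Render the remembered turns as text to prepend to the next prompt."""
        lines = []
        for user_text, assistant_text in self._turns:
            lines.append(f"User: {user_text}")
            lines.append(f"Assistant: {assistant_text}")
        return "\n".join(lines)


memory = WindowMemory(k=2)
memory.add_exchange("My printer is offline.", "Is it connected to the network?")
memory.add_exchange("Yes, it is.", "Try restarting the print spooler.")
memory.add_exchange("That did not help.", "Reinstall the printer driver.")
print(memory.as_prompt_context())
```

With `k=2`, the first exchange has already been dropped by the time the third is added, which keeps the prompt short at the cost of forgetting older turns.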

In [9]:
from dotenv import load_dotenv
import os

load_dotenv()

from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from langchain_chroma import Chroma
from langchain.agents import Tool, initialize_agent, AgentType
from langchain_community.tools import DuckDuckGoSearchRun
from langchain.chains import RetrievalQA
from langchain.memory import ConversationBufferWindowMemory

# Load vectorstore
embedding = AzureOpenAIEmbeddings(
    azure_deployment=os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT"),
    api_key=os.getenv("OPENAI_API_KEY"),
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION")
)
my_path = os.path.expanduser("./chroma_db")
vectordb = Chroma(persist_directory=my_path, embedding_function=embedding)

# Setup LLM: Azure OpenAI chat deployment (GPT-4o per the assignment)
model = AzureChatOpenAI(
    azure_deployment=os.getenv("AZURE_OPENAI_CHAT_DEPLOYMENT"),
    api_key=os.getenv("OPENAI_API_KEY"),
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT_GPT"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
    temperature=0.2
)

# Conversational memory
conversational_memory = ConversationBufferWindowMemory(
    memory_key='chat_history',
    k=5,
    return_messages=True
)

# Knowledge-base tool: answer from the FAQ and append a citation
def custom_qa_with_source(query):
    # Retrieve the single most relevant chunk; its metadata is used for the citation
    docs = vectordb.as_retriever(search_kwargs={"k": 1}).invoke(query)
    if not docs:
        return "I don't know."
    doc = docs[0]
    chunk = doc.page_content.strip()
    meta = doc.metadata
    source = meta.get('source', 'BonBon FAQ.pdf')
    page = meta.get('page', 'unknown')
    # PyPDFLoader page indices are 0-based; convert to a 1-based page number
    try:
        page_num = int(page) + 1
    except (TypeError, ValueError):
        page_num = page
    # Let the LLM paraphrase or extract the answer (`qa` is defined below, before the first call)
    result = qa.invoke({"query": query})
    answer = result['result'].strip()
    # Fall back to the raw chunk if the LLM answer is empty or just "..."
    if len(answer) < 5 or answer == "...":
        answer = chunk
    answer += f"\n\n(Source: {source} (page {page_num}))"
    return answer


qa = RetrievalQA.from_chain_type(
    llm=model,
    retriever=vectordb.as_retriever(search_kwargs={"k": 5}),
    return_source_documents=True,
    verbose=False,  # set to True for debugging
)

# Tools for the agent
tools = [
    Tool(
        name='Retrieve Answer',
        func=custom_qa_with_source,
        description='Use this to answer questions about BonBon FAQ topics like internet connection, printer, malware, etc. Always cite the source and page number.'
    ),
    Tool(
        name='Search',
        func=DuckDuckGoSearchRun().run,
        description='Use this for anything else or if the KB cannot answer the question.'
    ),
]

def chatbot_interaction():
    print("💬 Chatbot Ready! (type 'exit' to quit)\n")
    turn = 1
    while True:
        user_input = input("User: ")
        if user_input.lower() == "exit":
            break

        # Always try KB first
        print("→ Trying knowledge base (FAQ) tool first…")
        response = tools[0].func(user_input)
        # Check for None before calling string methods on the response
        if response is None or response.strip() == "" or "don't know" in response.lower():
            print("→ No FAQ answer found, using search tool instead.")
            response = tools[1].func(user_input)
        else:
            print("→ Answered from knowledge base.")

        print("\n" + "="*60)
        print(f"🟦 Question {turn}: {user_input}\n")
        print(f"🤖 Answer:\n{response}")
        print("="*60 + "\n")
        turn += 1

chatbot_interaction()

💬 Chatbot Ready! (type 'exit' to quit)

→ Trying knowledge base (FAQ) tool first…
→ Answered from knowledge base.

🟦 Question 1: Hi

🤖 Answer:
Hello! How can I assist you today? 😊

(Source: data/BonBon FAQ.pdf (page 14))

→ Trying knowledge base (FAQ) tool first…
→ Answered from knowledge base.

🟦 Question 2: How do I reset my password?

🤖 Answer:
To reset your password, go to the “Where to Reset my Password for which application” web page at the following link: [www.anycorp.intranet.passwordreset/com](http://www.anycorp.intranet.passwordreset/com). There, you can select the application for which you need to reset your password and follow the provided instructions.

(Source: data/BonBon FAQ.pdf (page 3))

→ Trying knowledge base (FAQ) tool first…
→ Answered from knowledge base.

🟦 Question 3: How do I connect to Any Corp’s Corporate Wi-Fi network?

🤖 Answer:
To connect to Any Corp’s Corporate Wi-Fi network, follow these steps:

1. **Go to your device's Wi-Fi settings**:
   - For Window
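
The sample answers above cite the source with its directory prefix (`data/BonBon FAQ.pdf`), while the assignment asks for the form "BonBon FAQ.pdf (page 2)". A small helper can normalize the citation. This is a sketch only: the name `format_citation` is hypothetical, and it assumes the chunk metadata carries a `source` path and a 0-based `page` index, as the `int(page) + 1` conversion in the chatbot cell implies.

```python
import os


def format_citation(metadata, default_source="BonBon FAQ.pdf"):
    """Build a citation such as 'BonBon FAQ.pdf (page 2)' from chunk metadata."""
    # Keep only the file name, dropping any directory prefix such as "data/".
    source = os.path.basename(metadata.get("source") or default_source)
    page = metadata.get("page")
    try:
        # Assumes a 0-based page index, so add 1 for a human-readable page number.
        page_label = str(int(page) + 1)
    except (TypeError, ValueError):
        page_label = "unknown"
    return f"{source} (page {page_label})"


print(format_citation({"source": "data/BonBon FAQ.pdf", "page": 1}))
```

A helper like this could replace the inline f-string that builds the `(Source: ...)` suffix in `custom_qa_with_source`.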

## Assignment 3: Build a new assistant based on BonBon source code (optional)
Objectives:
- Run the code and index the sample BonBon FAQ.pdf file into Azure Cognitive Search
- Explore the code and implement a new assistant that behaves the same as the chatbot above
- Explore other features, such as RBAC and the admin portal

Contact the training team if you need the BonBon source code.