# Pre-requisites
- WSL
- Miniconda3 

# Setup environment
- Create the conda env: `conda create -n langchain python=3.11`
- Install ipykernel into the `langchain` conda env: `conda install -n langchain ipykernel --update-deps --force-reinstall`
- In VS Code, select the newly created `langchain` env as the running env


Install the LangChain and OpenAI packages:

In [56]:
! pip install langchain==0.3.2 openai==1.51.0 python-dotenv==1.0.1 langchain-openai==0.2.2 langchain-community==0.3.0 pypdf==5.0.1 chromadb==0.5.11 duckduckgo-search==6.2.13 numexpr==2.10.1



# Init variables

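Set `AZURE_OPENAI_API_KEY` and `AZURE_OPENAI_ENDPOINT` in the `.env` file, using the values you get from the training team. The code below reads them into `OPENAI_API_KEY` and `OPENAI_DEPLOYMENT_ENDPOINT`.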

In [1]:
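import os

import openai
from dotenv import load_dotenv

# Load the variables defined in the .env file into the process environment;
# without this call os.getenv() only sees variables already exported in the shell
load_dotenv()

OPENAI_API_KEY = os.getenv("AZURE_OPENAI_API_KEY")
OPENAI_DEPLOYMENT_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")
OPENAI_DEPLOYMENT_VERSION = "2024-06-01"
OPENAI_MODEL_NAME = "gpt-4o"

OPENAI_ADA_EMBEDDING_MODEL_NAME = "text-embedding-ada-002"
OPENAI_ADA_EMBEDDING_DEPLOYMENT = "text-embedding-ada-002"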
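
A missing `.env` entry only shows up later as an authentication or connection error, so it can help to check the variables right after loading them. The following is a minimal sketch; the helper name `missing_env_vars` and the `REQUIRED_VARS` list are hypothetical and not part of the original notebook.

```python
import os

# Names the notebook reads with os.getenv(); adjust if your .env uses other names
REQUIRED_VARS = ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Passing a plain dict as `environ` makes the helper easy to try without touching the real environment.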

# Overview
The BonBon FAQ.pdf file contains frequently asked questions and answers for a customer support scenario. The topics cover IT troubleshooting, such as networking, software, and hardware issues. You are asked to build a chatbot with LangChain that can answer user questions.

## Assignment 1: Document Indexing (mandatory)

- The content of BonBon FAQ.pdf should be indexed into a local Chroma vector DB, where the chatbot can look up the information it needs to answer questions.
- Use an embedding model such as Azure OpenAI text-embedding-ada-002 to create the vectors; feel free to use any open-source embedding model that works.

## Create an embedding model

In [2]:
from langchain_openai import AzureOpenAIEmbeddings

# setup embedding model
embeddings = AzureOpenAIEmbeddings(
    deployment=OPENAI_ADA_EMBEDDING_DEPLOYMENT,
    model=OPENAI_ADA_EMBEDDING_MODEL_NAME,
    chunk_size=1,
    azure_endpoint=OPENAI_DEPLOYMENT_ENDPOINT,  # same endpoint the chat model uses
    api_key=OPENAI_API_KEY,
    api_version=OPENAI_DEPLOYMENT_VERSION,
)

## Load data 

In [3]:
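import os

from langchain.text_splitter import RecursiveCharacterTextSplitter
from pypdf import PdfReader
from pydantic import BaseModel, Field

# Build the path portably; a Windows-style ".\data\..." string does not resolve on WSL
file_path = os.path.join("data", "BonBon FAQ.pdf")
reader = PdfReader(file_path)
# One text string per PDF page, so the list index is the zero-based page number
documents = [page.extract_text() for page in reader.pages]
# Chunks of up to 1000 characters, with 100 characters shared between neighbours
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)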
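
To build intuition for `chunk_size` and `chunk_overlap`, here is a minimal sketch of a fixed-size sliding window. It is a simplification and not LangChain's algorithm: `RecursiveCharacterTextSplitter` prefers to split on separators such as paragraph and line breaks. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size chunks where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk already reaches the end of the text
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


# Tiny example: size 4, overlap 1, so each chunk starts 3 characters after the previous one
print(sliding_window_chunks("0123456789", chunk_size=4, chunk_overlap=1))
# prints ['0123', '3456', '6789']
```

The overlap means a sentence cut at a chunk boundary still appears in full in at least one chunk, as long as it is shorter than the overlap.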



## Setup collection name and local data store

In [4]:
import os 

persist_directory = "local_vectorstore"
local_store = "local_docstore"
collection_name = "react_agent"
try:
    PROJECT_ROOT = os.path.dirname(os.path.abspath(__file__))
except Exception:
    PROJECT_ROOT = os.getcwd()

## Load vector collection
### Case 1: Vector collection doesn't exist -> Create new collection

In [59]:
import chromadb

client = chromadb.PersistentClient(path=os.path.join(PROJECT_ROOT, "data", local_store))

# Start from a clean collection: drop any previous one, then create it again
try:
    client.delete_collection(collection_name)
except Exception:
    print(f"Collection '{collection_name}' did not exist")
collection = client.create_collection(collection_name)
print(f"Collection '{collection_name}' created")

Collection 'react_agent' created


In [60]:
for doc_i, doc in enumerate(documents):
    chunks = text_splitter.split_text(doc)
    for index, chunk in enumerate(chunks):
        vector = embeddings.embed_query(chunk)  # Create an embedding for the chunk
        collection.add(
            ids=[f"page{doc_i}-chunk{index}"],  # Short ID, unique per page and chunk
            documents=[chunk],
            embeddings=[vector],  # The vector representation
            metadatas=[{"page": doc_i, "source": "BonBon FAQ"}],  # page is the zero-based page index
        )

### Case 2: Load existing vector collection

In [22]:
import os

import chromadb
client = chromadb.PersistentClient(path = os.path.join(PROJECT_ROOT, "data", local_store))

# Load the existing collection (raises an error if it has not been created yet)
collection = client.get_collection(name=collection_name)

## Create a vector store from collection

In [61]:
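# The Chroma integration lives in langchain_community; importing it from langchain.vectorstores is deprecated
from langchain_community.vectorstores import Chroma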

vectorstore = Chroma(client=client, collection_name=collection_name, embedding_function=embeddings)

## Assignment 2: Building Chatbot (mandatory)
- You are asked to build a chatbot for the customer support scenario using the Conversational ReAct agent supported in LangChain.
- The chatbot should help users by answering FAQs from the sample BonBon FAQ.pdf file.
- The chatbot should use the Azure OpenAI GPT-3.5 LLM as the reasoning engine.
- The chatbot should be context-aware, meaning it should be able to hold a conversation with users across multiple turns.
- The agent is equipped with the following tools:
  - Internet Search: lets the chatbot find out more about something using DuckDuckGo internet search
  - Knowledge Base Search: lets the chatbot look up information in the private knowledge base
- If the user asks about topics covered in BonBon FAQ.pdf, such as internet connection, printer, or malware issues, the chatbot must use the private knowledge base; otherwise it should search the internet to answer the question.
- The chatbot's answer should mention the source file and the page the answer comes from, for example "BonBon FAQ.pdf (page 2)".
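
The citation requirement in the last bullet can be sketched as a small helper that turns chunk metadata into a string like the example. This assumes the metadata keys `source` and `page` used during indexing, and that `page` is a zero-based index (it comes from `enumerate`). The function name `format_citation` is hypothetical; in the notebook itself the citation is produced by the LLM from the metadata shown in the tool output.

```python
def format_citation(metadata, zero_based=True):
    """Build a citation such as 'BonBon FAQ.pdf (page 2)' from chunk metadata."""
    source = str(metadata.get("source", "unknown source"))
    # The indexed source name has no extension, so add it for display
    if not source.lower().endswith(".pdf"):
        source += ".pdf"
    page = metadata.get("page")
    if page is None:
        return source
    # Convert a zero-based page index into the page number a reader would expect
    page_number = int(page) + 1 if zero_based else int(page)
    return f"{source} (page {page_number})"


print(format_citation({"source": "BonBon FAQ", "page": 1}))
# prints BonBon FAQ.pdf (page 2)
```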

## Set up Tools for agent

In [67]:
from langchain import hub
from langchain.agents import (
    AgentExecutor,
    Tool,
    create_openai_tools_agent,
    create_react_agent,
    tool,
)
from langchain_openai import AzureChatOpenAI
from langchain.memory import ConversationEntityMemory
from langchain.chains import LLMMathChain, RetrievalQA
from langchain.tools.retriever import create_retriever_tool  # needed for tool_search below
from langchain_core.prompts.prompt import PromptTemplate
from langchain_community.tools import DuckDuckGoSearchRun
# Setting up ReAct Agent

llm = AzureChatOpenAI(
    deployment_name=OPENAI_MODEL_NAME,
    model_name=OPENAI_MODEL_NAME,
    azure_endpoint=OPENAI_DEPLOYMENT_ENDPOINT,
    api_version=OPENAI_DEPLOYMENT_VERSION,
    api_key=OPENAI_API_KEY,
    temperature=0,
)

retriever = vectorstore.as_retriever()
doc_prompt = PromptTemplate.from_template(
    "<context>\n{page_content}\n\n<meta>\nsource: {source}\npage: {page}\n</meta>\n</context>"
)
tool_search = create_retriever_tool(
    retriever=retriever,
    name="Search BonBon",
    description="Searches and returns answer from BONBON FAQ.",
    document_prompt=doc_prompt,
)

math_chain = LLMMathChain.from_llm(llm=llm)
duckduck = DuckDuckGoSearchRun()

tools = [
    tool_search,
    Tool(
        name="Duck Duck Go",
        func=duckduck.run,
        description="useful for when you need to search for more information on the internet with Duck Duck Go search",
    ),
    Tool(
        name="Calculator",
        func=math_chain.run,
        description="use this tool for math calculating",
    ),
]

## Building agent

In [68]:
template = """
Answer the following questions as best you can. You can use history {history} to fill in unknown context. You have access to the following tools:

{tools}

Use the following format:
Remember: If any invalid format occurs, terminate and return answer
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, find in Search BonBon tool first, then should be one of [{tool_names}]. If not, terminate answer
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat at most 5 times.)
Thought: I now know the final answer. Let's return the answer
Final Answer: the final answer to the original input question. If use Search BonBon then include the page of the PDF file that has the question
You should only tell the final answer.
Begin!
Context:
{entities}

Current conversation:
Chat History:
{history}
Last line:
Human: {input}
Thought: {agent_scratchpad}
"""
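# Build the prompt from the template so that every placeholder (tools, history, entities, ...)
# is registered as an input variable; this also avoids a network call to the prompt hub
prompt_react = PromptTemplate.from_template(template)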
react_agent = create_react_agent(llm=llm, tools=tools, prompt=prompt_react)
react_agent_executor = AgentExecutor(
    agent=react_agent,
    tools=tools,
    verbose=True,
    handle_parsing_errors=True,
    memory=ConversationEntityMemory(llm=llm, top_k=3),
    max_iterations=3
)
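
The Thought/Action/Action Input layout in the template matters because the executor has to parse each LLM reply into either a tool call or a final answer. The following is a simplified sketch of that parsing step using a regular expression. It is not LangChain's parser, which handles more edge cases, and the function name `parse_react_step` is hypothetical.

```python
import re

# "Action: <tool name>" on one line, followed by "Action Input: <input>"
ACTION_PATTERN = re.compile(r"Action\s*:\s*(.*?)\s*\n\s*Action Input\s*:\s*(.*)", re.DOTALL)


def parse_react_step(text):
    """Classify one LLM reply as a final answer, a tool call, or an invalid step."""
    # Simplification: a reply containing "Final Answer:" is always treated as finished
    if "Final Answer:" in text:
        return {"type": "finish", "output": text.split("Final Answer:", 1)[1].strip()}
    match = ACTION_PATTERN.search(text)
    if match:
        return {
            "type": "action",
            "tool": match.group(1).strip(),
            "tool_input": match.group(2).strip().strip('"'),
        }
    # Neither pattern matched; handle_parsing_errors=True covers this case in the executor
    return {"type": "invalid", "output": text.strip()}


print(parse_react_step('Action: Search BonBon\nAction Input: "printing problems"'))
print(parse_react_step("Final Answer: 2"))
```

Seen this way, a tool name in the `Action:` line has to match one of the names in `tools` exactly, otherwise the executor cannot dispatch the call.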



## Building a chatbot that wraps the agent

In [70]:
i = input("enter prompt ('exit' to terminate): ")
while i.lower() != "exit":
    query = i
    response = react_agent_executor.invoke({"input": query})
    result = response.get("output")
    print(result)
    i = input("enter new prompt ('exit' to terminate): ")



> Entering new AgentExecutor chain...
Final Answer: 2

> Finished chain.
2


> Entering new AgentExecutor chain...
Action: Search BonBon
Action Input: "printing problems"
<context>
print job.  
8. Test with a different document or application:  
• Try printing from a different document or application to see if the issue is specific to one 
file or program. If you can print successfully from other sources, the problem may lie within the 
original document or application.  
9. Restart print spooler service (Windows):  
• Open the "Services" application on your laptop. Locate the "Print Spooler" service, right -
click on it, and select "Restart." This action clears the print spooler and can resolve certain 
printing issues.  
10. Reinstall printer software:

<meta>
source: BonBon FAQ
page: 8
</meta>
</context>

<context>
queue by clicking on the printer icon in the system tray (Windows) or the printer settings (Mac). 

## Assignment 3: Build a new assistant based on BonBon source code (optional)
Objectives:
- Run the code and index the sample BonBon FAQ.pdf file to Azure Cognitive Search
- Explore the code and implement a new assistant that has the same behavior as above
- Explore other features, such as RBAC and the features of the admin portal

Please contact the training team if you need the BonBon source code.