# Pre-requisites
- WSL
- Miniconda3 

# Setup environment
- Create the conda env: `conda create -n langchain python=3.11`
- In VS Code, select the newly created "langchain" env as the active environment


Install the LangChain and OpenAI packages:

In [14]:
! pip install -r requirements.txt

Collecting pypdf==5.0.1 (from -r requirements.txt (line 134))
  Using cached pypdf-5.0.1-py3-none-any.whl.metadata (7.4 kB)
Using cached pypdf-5.0.1-py3-none-any.whl (294 kB)
Installing collected packages: pypdf
Successfully installed pypdf-5.0.1


# Init variables

Set `OPENAI_API_KEY` in the `.env` file to the value provided by the training team. The cell below also reads `AZURE_OPENAI_ENDPOINT` from the environment.

In [15]:
# SETUP ENVIRONMENT VARIABLE

import openai, os
from dotenv import load_dotenv

load_dotenv()

OPEN_API_TYPE = "azure"
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_DEPLOYMENT_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")
OPENAI_DEPLOYMENT_VERSION = "2023-05-15"
OPENAI_MODEL_NAME = "gpt-4o"

OPENAI_ADA_EMBEDDING_MODEL_NAME = "text-embedding-ada-002"
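
Both `OPENAI_API_KEY` and `AZURE_OPENAI_ENDPOINT` are read with `os.getenv`, which returns `None` when a variable is missing, so a forgotten `.env` entry would only surface later as an authentication or connection error. The following is a minimal sketch of failing early instead. `require_env` is a hypothetical helper introduced for illustration; it is not part of the notebook or of `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        # Fail with the variable name so the missing .env entry is easy to spot.
        raise RuntimeError(f"Missing environment variable: {name}")
    return value
```

It could be used after `load_dotenv()` in place of the plain lookups, for example `OPENAI_API_KEY = require_env("OPENAI_API_KEY")`.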

# Overview
The BonBon FAQ.pdf file contains frequently asked questions and answers for a customer support scenario. The topics cover troubleshooting of IT-related issues such as networking, software, and hardware. You are asked to build a chatbot with LangChain that can answer user questions based on this file.

## Assignment 1: Document Indexing (mandatory)

- The content of BonBon FAQ.pdf should be indexed into a local Chroma vector DB, where the chatbot can look up the appropriate information to answer questions.
- Use an embedding model such as Azure OpenAI text-embedding-ada-002 to create the vectors; feel free to use any other open-source embedding model if it works.

In [16]:
# READING THE DATA INSIDE THE "BonBon FAQ.pdf"

from langchain_community.document_loaders import PyPDFLoader

# Raw string so the backslashes in the Windows-style path are not treated as escape sequences
loader = PyPDFLoader(r".\data\BonBon FAQ.pdf")
documents = loader.load()
documents

[Document(metadata={'source': '.\\data\\BonBon FAQ.pdf', 'page': 0}, page_content="General guidelines for categorising requests as assessing Priority.  \n \nCategorize the incident accurately based on predefined categories.  \n1.       Password and Account Management:  \nExamples:  \n• Password Resets: Assisting users who have forgotten their passwords or need to reset them \ndue to security reasons.  \n• Account Creations: Creating new user accounts for employees or clients, granting access to \nvarious systems and services.  \n• Username Recovery: Helping users retrieve their forgotten usernames or login IDs.  \n2.       Software and Application Support  \nExamples:  \n• Providing guidance and troubleshooting assistance during the installation of software \napplications on users' devices.  \n• Software Installation  \n• Application Errors: Resolving issues related to errors or crashes that occur while using \nspecific software applications.  \n• Configuration Assistance: Helping user

In [17]:
# SPLIT DATA INTO SMALL CHUNK

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
texts = text_splitter.split_documents(documents)
texts

[Document(metadata={'source': '.\\data\\BonBon FAQ.pdf', 'page': 0}, page_content="General guidelines for categorising requests as assessing Priority.  \n \nCategorize the incident accurately based on predefined categories.  \n1.       Password and Account Management:  \nExamples:  \n• Password Resets: Assisting users who have forgotten their passwords or need to reset them \ndue to security reasons.  \n• Account Creations: Creating new user accounts for employees or clients, granting access to \nvarious systems and services.  \n• Username Recovery: Helping users retrieve their forgotten usernames or login IDs.  \n2.       Software and Application Support  \nExamples:  \n• Providing guidance and troubleshooting assistance during the installation of software \napplications on users' devices.  \n• Software Installation  \n• Application Errors: Resolving issues related to errors or crashes that occur while using \nspecific software applications.  \n• Configuration Assistance: Helping user
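
To make the `chunk_size=1000` and `chunk_overlap=100` settings concrete, the following is a simplified sketch of fixed-width chunking with overlap. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter also tries to break on separators such as paragraphs and lines); `split_with_overlap` is a hypothetical function that only illustrates how consecutive chunks share their boundary characters.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Cut text into fixed-width chunks where each chunk repeats the tail of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks
```

With these defaults, the last 100 characters of one chunk are also the first 100 characters of the next, so a sentence cut at a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap.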

In [18]:
# SETUP EMBEDDING

from langchain_openai import AzureOpenAIEmbeddings

embeddings: AzureOpenAIEmbeddings = AzureOpenAIEmbeddings(
    azure_deployment=OPENAI_ADA_EMBEDDING_MODEL_NAME,
    openai_api_version=OPENAI_DEPLOYMENT_VERSION,
    azure_endpoint=OPENAI_DEPLOYMENT_ENDPOINT,
    api_key=OPENAI_API_KEY,
)

In [19]:
# CREATE DATABASE AND ADD VECTOR TO DATABASE

from langchain_chroma import Chroma

vector_store = Chroma(
    collection_name="collection",
    embedding_function=embeddings,
    persist_directory="./chroma_langchain_db",
)

vector_store.add_documents(texts)

['57bcd528-0408-41ef-af38-4a45ac908800',
 '97f30cd5-5f58-4bce-937b-50cc54b49a88',
 'd9ee1ac8-b26e-47c1-9e4d-cbcb2a45b1a1',
 '8d4f340b-da63-4470-b826-7669bd531778',
 'f1e88361-969a-4e06-a577-53132be8f84f',
 '5da8e50b-dbe8-4663-939d-f20d672b1f63',
 'f0ebf0d9-0c50-4480-8fc9-4df58320d184',
 'be84ab78-fc7c-46de-962f-a4e837d9a880',
 '5c195073-45e3-43fd-adbb-d65a3bfe1151',
 'ee80bf05-786f-4e2b-b3c9-80d0de80ffc6',
 '66501d97-faf2-48c9-a5a5-92b1a24e7468',
 '80b45222-b3cb-4f07-8ce8-88a3855687ca',
 '27c23b4d-0c3c-491a-9de8-6ae14260c441',
 '0dcc3490-e487-4204-9e9c-90406cd72c13',
 '841e34a2-f1bf-43cc-b781-dd8c41644821',
 '02cfbd86-4a57-4e73-a1d8-b9ed390bc1bd',
 'f6d6e5a1-6cdb-436a-8193-fc3590cf1ef3',
 '5e537fd8-52af-4bc7-b073-8575382ab632',
 '83a2baaa-7cd1-4661-8160-1a1af5b69ab9',
 '77766257-3cd9-4ce6-b70b-671daf242cf6',
 '18bf00a9-3687-4693-b58f-e4d362073a43',
 'a36a3211-6edb-4ef7-8047-a5d52f7171b1',
 'cccc6ca7-d637-4525-8e33-b0008b0b6f25',
 '2dbfd0df-ab13-4fa6-9f3b-758243a38a4d',
 'e0eeefee-cabe-
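
`add_documents` is called here without explicit IDs, so the IDs in the output are generated for this call. Because `persist_directory` keeps the collection on disk, re-running the cell would presumably store the same chunks again under new IDs. The following is a minimal sketch of deriving repeatable IDs from each chunk instead. `stable_chunk_id` is a hypothetical helper, and passing the result through an `ids=` argument of `add_documents` is an assumption that should be checked against the installed `langchain_chroma` version.

```python
import hashlib


def stable_chunk_id(source: str, page: int, content: str) -> str:
    """Derive a repeatable ID from a chunk's source, page, and text."""
    raw = f"{source}|{page}|{content}".encode("utf-8")
    # Truncate the hex digest to keep the ID short; 32 hex characters is an arbitrary choice.
    return hashlib.sha256(raw).hexdigest()[:32]


# Assumed usage in the notebook (not executed here):
# ids = [stable_chunk_id(d.metadata["source"], d.metadata["page"], d.page_content) for d in texts]
# vector_store.add_documents(texts, ids=ids)
```

The same chunk always maps to the same ID, which gives the store a chance to overwrite an existing entry rather than insert a second copy.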

In [20]:
# TEST RESULT WITH SAMPLE QUERY

results = vector_store.search(
    query="What does NashTech Business Process Outsourcing Team do?",
    search_type="similarity"
)

for result in results:
    print(result)

page_content='Data quality management: Including verifying, standardizing, and auditing data . 
Language services: NashTech can support business process outsourcing in over 28 languages . 
Sales and Marketing support:  Including customer experience, promotional services and support for Sales 
and Marketing, including campaign management and Customer Relationship administration . 
Digitization:  Including content extraction and validation, paper to digital conversion, document 
management . 
Back -office operations:  HR & admin support, image processing, financial service, claim administration . 
Market research:  Including market data collection, consolidation, analysis, and reporting.' metadata={'page': 12, 'source': '.\\data\\BonBon FAQ.pdf'}
page_content='Data quality management: Including verifying, standardizing, and auditing data . 
Language services: NashTech can support business process outsourcing in over 28 languages . 
Sales and Marketing support:  Including customer experie
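
Similarity search ranks the stored chunks by how close their embedding vectors are to the embedding of the query. The following is a toy sketch using hand-written 2-D vectors and cosine similarity. The metric Chroma actually applies depends on how the collection is configured, and `cosine_similarity` and `top_k` are illustrative names only, not part of any library used above.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 4) -> list[str]:
    """Return the IDs of the k vectors most similar to the query, best first."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]
```

Real embeddings have far more dimensions than this toy example, but the ranking step works the same way.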

## Assignment 2: Building Chatbot (mandatory)
- You are asked to build a chatbot for a customer support scenario using the Conversational ReAct agent supported in LangChain.
- The chatbot must be able to answer the FAQs in the sample BonBon FAQ.pdf file.
- The chatbot should use the Azure OpenAI GPT-3.5 LLM as the reasoning engine.
- The chatbot should be context-aware, meaning it should be able to hold a conversation with users.
- The agent is equipped with the following tools:
  - Internet Search: lets the chatbot automatically find out more about a topic using DuckDuckGo internet search
  - Knowledge Base Search: lets the chatbot look up information in the private knowledge base
- If the user asks about topics covered in BonBon FAQ.pdf, such as internet connection, printer, or malware issues, the chatbot must use the private knowledge base; otherwise it should search the internet to answer the question.
- The chatbot's answer should mention the source file and the page the answer comes from, for example "BonBon FAQ.pdf (page 2)".
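
The citation requirement in the last bullet can be met from the metadata each retrieved chunk already carries. The loader output earlier shows `'page': 0` for the first page, so the page index is zero-based and needs `+ 1` to be shown to a reader. The following is a minimal sketch; `format_citation` is a hypothetical helper, and `PureWindowsPath` is used on the assumption that `source` holds a Windows-style path as in the outputs above.

```python
from pathlib import PureWindowsPath


def format_citation(metadata: dict) -> str:
    """Build a 'file (page N)' citation from a chunk's metadata, using 1-based page numbers."""
    # PureWindowsPath accepts both backslash and forward-slash separators.
    file_name = PureWindowsPath(metadata["source"]).name
    page_number = int(metadata["page"]) + 1  # stored page index is zero-based
    return f"{file_name} (page {page_number})"
```

Computing the page number in code avoids relying on the LLM to do the addition itself.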

In [21]:
# SETUP LLM WITH AZURE CHAT OPEN AI

from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(
    deployment_name=OPENAI_MODEL_NAME,
    model_name=OPENAI_MODEL_NAME,
    azure_endpoint=OPENAI_DEPLOYMENT_ENDPOINT,
    api_version=OPENAI_DEPLOYMENT_VERSION,
    api_key=OPENAI_API_KEY,
    temperature=0,
)

In [33]:
# SETUP CHAT

import textwrap
from langchain_core.prompts.prompt import PromptTemplate
from langchain.tools.retriever import create_retriever_tool
from langchain.chains import LLMMathChain
from langchain_community.tools import DuckDuckGoSearchRun
from langchain.agents import (
    AgentExecutor,
    Tool,
    create_react_agent,
)
from langchain import hub
from langchain.memory import ConversationEntityMemory

doc_prompt = PromptTemplate.from_template(
    "<context>\n{page_content}\n\n<meta>\nsource: {source}\npage: {page} + 1\n</meta>\n</context>"
)
tool_search = create_retriever_tool(
    retriever=vector_store.as_retriever(),
    name="Search BonBon",
    description="Searches and returns answer from BONBON FAQ.",
    document_prompt=doc_prompt,
)

math_chain = LLMMathChain.from_llm(llm=llm)
duckduck = DuckDuckGoSearchRun()

tools = [
    tool_search,
    Tool(
        name="Duck Duck Go",
        func=duckduck.run,
        description="useful for when you need to search for more information on the internet with Duck Duck Go search",
    ),
    Tool(
        name="Calculator",
        func=math_chain.run,
        description="use this tool for math calculating",
    ),
]

template = """
Answer the following questions as best you can. You can use history {history} to fill in unknown context. You have access to the following tools:

{tools}

Use the following format:
Remember: If any invalid format occurs, terminate and return answer
Question: the input question you must answer
Thought: you should always think about what to do
Action (if existed): the action to take, find in Search BonBon tool first, then should be one of [{tool_names}] (one of [Search BonBon, Duck Duck Go, Calculator]). If DuckDuckGo has already been used, skip it. If not, terminate answer
Action Input (if existed): the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat at most 5 times.)
Thought: I now know the final answer. Let's return the answer
Final Answer: the final answer to the original input question. If use Search BonBon then include the page of the PDF file that has the question

Begin!
Context:
{entities}

Current conversation:
Chat History:
{history}
Last line:
Human: {input}
Thought: {agent_scratchpad}
"""
# Build the prompt directly from the custom template; input variables are inferred from its placeholders.
prompt_react = PromptTemplate.from_template(template)
react_agent = create_react_agent(llm=llm, tools=tools, prompt=prompt_react)
react_agent_executor = AgentExecutor(
    agent=react_agent,
    tools=tools,
    verbose=False,
    handle_parsing_errors=True,
    memory=ConversationEntityMemory(llm=llm, top_k=3),
    max_iterations=3
)
print(f"Welcome to AI Chat! The server is currently busy, so please be patient. \n")
i = input("Enter prompt ('exit' to terminate): ")
while i.lower() != "exit":
    print(f"Question: {textwrap.fill(i, width=150)}")
    
    print("Responding ...")
    
    response = react_agent_executor.invoke({"input": i})
    result = response.get("output")
    print(f"Final Answer: {textwrap.fill(result, width=150)} \n")
    
    i = input("Enter new prompt ('exit' to terminate): ")



Welcome to AI Chat! The server is currently busy, so please be patient. 

Question: how can i reset my password?
Responding ...
Final Answer: To reset your password, go to the “Where to Reset my Password for which application” web page at the following link:
www.anycorp.intranet.passwordreset/com. There, you will be able to select the application for which you need to reset your password and will receive
further instructions.  Source: BonBon FAQ, page 3. 

Question: what is the weather today?
Responding ...
Final Answer: Agent stopped due to iteration limit or time limit. 
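
The prompt template asks the model to reply with `Action:` and `Action Input:` lines, or with a `Final Answer:` line, and the agent has to parse that text at every step. When a reply never reaches a final answer within `max_iterations`, the executor gives up, which is what the second response above shows. The following is a simplified sketch of that parsing step. LangChain's own ReAct output parser is more thorough; `parse_step` and its return shape are hypothetical and only illustrate the idea.

```python
import re

# Matches "Action: <tool>" followed on a later line by "Action Input: <input>".
ACTION_RE = re.compile(
    r"Action\s*:\s*(?P<tool>.*?)\s*\n\s*Action Input\s*:\s*(?P<input>.*)",
    re.DOTALL,
)


def parse_step(text: str) -> tuple:
    """Classify one model reply as a final answer, a tool call, or an invalid step."""
    if "Final Answer:" in text:
        return ("final", text.split("Final Answer:", 1)[1].strip())
    match = ACTION_RE.search(text)
    if match is None:
        # Neither a tool call nor a final answer: the agent would have to retry.
        return ("invalid", text.strip())
    return ("action", match.group("tool").strip(), match.group("input").strip())
```

Each `"action"` or `"invalid"` result consumes one iteration, so a low iteration limit leaves little room for retries after a malformed reply.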



## Assignment 3: Build a new assistant based on BonBon source code (optional)
Objectives:
- Run the code and index the sample BonBon FAQ.pdf file into Azure Cognitive Search.
- Explore the code and implement a new assistant that behaves the same as the one above.
- Explore other features, such as RBAC and the features of the admin portal.

Please contact the training team if you need the BonBon source code.