# Pre-requisites
- WSL
- Miniconda3 

# Setup environment
- Create a conda env: `conda create -n langchain python=3.11`
- Select the newly created "langchain" env as the Python interpreter (kernel) in VS Code


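Install LangChain, the OpenAI SDK, and the integration packages imported in the cells below

In [None]:
! pip install langchain langchain-community langchain-openai langchain-chroma openai python-dotenv pypdf duckduckgo-search numexpr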

# Init variables

Set `OPENAI_API_KEY` (provided by the training team) and `AZURE_OPENAI_ENDPOINT` in the `.env` file. The cell below reads both values from there.

In [2]:
# SETUP ENVIRONMENT VARIABLE

import openai, os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_TYPE = "azure"
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_DEPLOYMENT_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")
OPENAI_DEPLOYMENT_VERSION = "2023-05-15"
OPENAI_MODEL_NAME = "gpt-4o"

OPENAI_ADA_EMBEDDING_MODEL_NAME = "text-embedding-ada-002"
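
If a value is missing from `.env`, `os.getenv` returns `None` and the problem only surfaces later, when the first Azure call is made. A small check along the following lines can catch that earlier. This is a minimal sketch: the helper name `missing_settings` is hypothetical and not part of the original notebook.

```python
import os

# Variable names read by the setup cell above.
REQUIRED_VARS = ("OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT")


def missing_settings(env, required=REQUIRED_VARS):
    """Return the required variable names that are unset or blank in `env`."""
    return [name for name in required if not (env.get(name) or "").strip()]


# os.environ already contains the .env values once load_dotenv() has run.
missing = missing_settings(os.environ)
if missing:
    print("Missing in .env or environment:", ", ".join(missing))
```

Passing the mapping as an argument keeps the check easy to try with a plain dict instead of the real environment.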

# Overview
The BonBon FAQ.pdf file contains frequently asked questions and answers for a customer support scenario. The topics cover troubleshooting of IT issues such as networking, software, and hardware. Your task is to build a chatbot with LangChain that can answer users' questions.

## Assignment 1: Document Indexing (mandatory)

- Index the content of BonBon FAQ.pdf into a local Chroma vector DB, from which the chatbot can look up the information it needs to answer questions.
- Use an embedding model such as Azure OpenAI text-embedding-ada-002 to create the vectors. Any open source embedding model is also acceptable if it works.
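
To make the lookup idea concrete, the following toy sketch ranks a few hand-written vectors by cosine similarity to a query vector. It only illustrates the principle: the 3-dimensional vectors and the helper names `cosine_similarity` and `top_k` are made up for this example, and Chroma's actual distance metric and index structure are configurable and not reproduced here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """Return the k (text, score) pairs most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in indexed]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional "embeddings"; real embedding models return far longer vectors.
indexed_chunks = [
    ("How to reset a password", [0.9, 0.1, 0.0]),
    ("Printer is not responding", [0.1, 0.9, 0.1]),
    ("Wi-Fi keeps disconnecting", [0.0, 0.2, 0.9]),
]
query = [0.8, 0.2, 0.1]  # pretend embedding of "I forgot my password"

for text, score in top_k(query, indexed_chunks):
    print(f"{score:.3f}  {text}")
```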

In [3]:
# READING THE DATA INSIDE THE "BonBon FAQ.pdf"

from langchain_community.document_loaders import PyPDFLoader

# Raw string, so the backslashes in the Windows-style path are not treated as escape sequences
loader = PyPDFLoader(r".\data\BonBon FAQ.pdf")
documents = loader.load()
documents

[Document(metadata={'source': '.\\data\\BonBon FAQ.pdf', 'page': 0}, page_content="General guidelines for categorising requests as assessing Priority.  \n \nCategorize the incident accurately based on predefined categories.  \n1.       Password and Account Management:  \nExamples:  \n• Password Resets: Assisting users who have forgotten their passwords or need to reset them \ndue to security reasons.  \n• Account Creations: Creating new user accounts for employees or clients, granting access to \nvarious systems and services.  \n• Username Recovery: Helping users retrieve their forgotten usernames or login IDs.  \n2.       Software and Application Support  \nExamples:  \n• Providing guidance and troubleshooting assistance during the installation of software \napplications on users' devices.  \n• Software Installation  \n• Application Errors: Resolving issues related to errors or crashes that occur while using \nspecific software applications.  \n• Configuration Assistance: Helping user

In [4]:
# SPLIT DATA INTO SMALL CHUNKS

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
texts = text_splitter.split_documents(documents)
texts

[Document(metadata={'source': '.\\data\\BonBon FAQ.pdf', 'page': 0}, page_content="General guidelines for categorising requests as assessing Priority.  \n \nCategorize the incident accurately based on predefined categories.  \n1.       Password and Account Management:  \nExamples:  \n• Password Resets: Assisting users who have forgotten their passwords or need to reset them \ndue to security reasons.  \n• Account Creations: Creating new user accounts for employees or clients, granting access to \nvarious systems and services.  \n• Username Recovery: Helping users retrieve their forgotten usernames or login IDs.  \n2.       Software and Application Support  \nExamples:  \n• Providing guidance and troubleshooting assistance during the installation of software \napplications on users' devices.  \n• Software Installation  \n• Application Errors: Resolving issues related to errors or crashes that occur while using \nspecific software applications.  \n• Configuration Assistance: Helping user
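
The effect of `chunk_size` and `chunk_overlap` is easier to see on a short string. The sketch below uses a plain fixed-size sliding window, which is a deliberate simplification: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and newlines, so its chunk boundaries will differ. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows; neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in sliding_chunks(sample, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.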

In [5]:
# SETUP EMBEDDING

from langchain_openai import AzureOpenAIEmbeddings

embeddings: AzureOpenAIEmbeddings = AzureOpenAIEmbeddings(
    azure_deployment=OPENAI_ADA_EMBEDDING_MODEL_NAME,
    openai_api_version=OPENAI_DEPLOYMENT_VERSION,
    azure_endpoint=OPENAI_DEPLOYMENT_ENDPOINT,
    api_key=OPENAI_API_KEY,
)

In [6]:
# CREATE DATABASE AND ADD VECTOR TO DATABASE

from langchain_chroma import Chroma

vector_store = Chroma(
    collection_name="collection",
    embedding_function=embeddings,
    persist_directory="./chroma_langchain_db",
)

vector_store.add_documents(texts)

['b5c73e47-20c3-4e84-b4c7-85751ca75469',
 'c86bab28-e505-4da3-89d2-0debc0382182',
 'a16f548f-5c79-4e42-9a8e-7232583dbac3',
 '3560983c-3d6e-44e6-8835-bba6876f7f5f',
 '4cbba9d3-3ec4-46dd-b8ca-52c698cfd5f9',
 '5327b71b-1ff7-49ab-bcf8-464ecfd31684',
 'da5ac275-df10-4601-a51d-f3ee14c73fbf',
 '7e773731-0831-4436-b7b4-91e8ae0e7aab',
 '9861e4ba-8158-454e-ba10-c4ca9075fe2f',
 '3a0227a5-ce76-4174-b43a-34dc30dc68b4',
 '0514ed71-d340-48d2-a17b-572ed0477a0a',
 '50f6ad0e-0e19-49cc-9096-97079ca23882',
 '30e938f4-6eae-4764-b221-fdf3045e6289',
 '47ad86e1-f372-47af-8723-0bec2ac98f70',
 'f6770838-d1f4-4cbf-bbe8-f33a83657782',
 '4fba4628-5d81-4c1a-84e6-4cb97a54ecb4',
 '05673b2b-8ed9-46da-9542-8c676ea27abd',
 '60eb6e13-b99f-4e2b-81e5-3650c7ac8ed6',
 '7769bead-cae5-45f8-9e44-f5733f403aed',
 'd212aad0-b4f4-4b4b-b218-66010261188c',
 '33172e72-b452-46ec-90a9-7dc1668fcca7',
 '25071fad-d327-426d-9dca-204bbc73c8e8',
 '9107c502-31a5-4f67-83e8-ff32974dcf32',
 'fcb7b8de-7b55-453f-ae83-8857112c0c66',
 '4175970a-d4bd-

In [7]:
# TEST RESULT WITH SAMPLE QUERY

results = vector_store.search(
    query="What does NashTech Business Process Outsourcing Team do?",
    search_type="similarity"
)

for result in results:
    print(result)

page_content='Data quality management: Including verifying, standardizing, and auditing data . 
Language services: NashTech can support business process outsourcing in over 28 languages . 
Sales and Marketing support:  Including customer experience, promotional services and support for Sales 
and Marketing, including campaign management and Customer Relationship administration . 
Digitization:  Including content extraction and validation, paper to digital conversion, document 
management . 
Back -office operations:  HR & admin support, image processing, financial service, claim administration . 
Market research:  Including market data collection, consolidation, analysis, and reporting.' metadata={'page': 12, 'source': '.\\data\\BonBon FAQ.pdf'}
page_content='Data quality management: Including verifying, standardizing, and auditing data . 
Language services: NashTech can support business process outsourcing in over 28 languages . 
Sales and Marketing support:  Including customer experie
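
The sample results above repeat the same chunk. One plausible cause (an assumption, since the notebook does not say) is that the indexing cell was run more than once against the same `persist_directory`, so every chunk was stored again under a fresh random ID. A common remedy is to derive the ID from the chunk itself, so that a re-run targets the same records. The sketch below shows only the ID derivation. `stable_chunk_id` is a hypothetical helper, and whether passing `ids=` overwrites existing records should be checked against the installed `langchain_chroma` version.

```python
import hashlib


def stable_chunk_id(source, page, text):
    """Derive a repeatable ID from where a chunk came from and what it says."""
    key = f"{source}|{page}|{text}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()


# Stand-ins for (source, page, page_content); the second run repeats the first.
first_run = [
    ("BonBon FAQ.pdf", 12, "Data quality management"),
    ("BonBon FAQ.pdf", 12, "Language services"),
]
second_run = list(first_run)

ids = {stable_chunk_id(*chunk) for chunk in first_run + second_run}
print(len(first_run + second_run), "chunks submitted,", len(ids), "distinct IDs")

# Assumed usage in the notebook (not executed here):
# ids = [stable_chunk_id(d.metadata["source"], d.metadata["page"], d.page_content) for d in texts]
# vector_store.add_documents(texts, ids=ids)
```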

## Assignment 2: Building Chatbot (mandatory)
- Build a chatbot solution for a customer support scenario using the Conversational ReAct agent supported in LangChain.
- The chatbot should be able to answer the FAQs in the sample BonBon FAQ.pdf file.
- The chatbot should use the Azure OpenAI GPT-3.5 LLM as the reasoning engine.
- The chatbot should be context-aware, meaning it can hold a multi-turn conversation with users.
- The agent is equipped with the following tools:
  - Internet Search: lets the chatbot find more information using DuckDuckGo internet search
  - Knowledge Base Search: lets the chatbot look up information in the private knowledge base
- When a user asks about topics covered in the BonBon FAQ.pdf file, such as internet connection, printer, or malware issues, the chatbot must use the private knowledge base. Otherwise it should search the internet to answer the question.
- The chatbot's answer should mention the source file and the page the answer comes from, for example "BonBon FAQ.pdf (page 2)".
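
The citation requirement can be met deterministically from the document metadata instead of relying on the LLM to do arithmetic. The following sketch assumes metadata shaped like the sample output above, that is, a `source` path and a 0-based `page`. `format_citation` is a hypothetical helper, not part of LangChain.

```python
import ntpath


def format_citation(metadata):
    """Build a 'file.pdf (page N)' label from loader-style metadata."""
    # ntpath splits on both "\" and "/", so Windows-style paths work on any OS.
    file_name = ntpath.basename(metadata["source"])
    # Assumption: the loader numbers pages from 0, as in the sample output.
    page_number = int(metadata["page"]) + 1
    return f"{file_name} (page {page_number})"


print(format_citation({"source": ".\\data\\BonBon FAQ.pdf", "page": 1}))
```

A label built this way could be appended to each retrieved chunk before it is handed to the agent, so the final answer only has to copy it.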

In [14]:
# SETUP LLM WITH AZURE CHAT OPEN AI

from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(
    deployment_name=OPENAI_MODEL_NAME,
    model_name=OPENAI_MODEL_NAME,
    azure_endpoint=OPENAI_DEPLOYMENT_ENDPOINT,
    api_version=OPENAI_DEPLOYMENT_VERSION,
    api_key=OPENAI_API_KEY,
    temperature=0,
)

In [18]:
# SETUP CHAT

from langchain_core.prompts.prompt import PromptTemplate
from langchain.tools.retriever import create_retriever_tool
from langchain.chains import LLMMathChain
from langchain_community.tools import DuckDuckGoSearchRun
from langchain.agents import (
    AgentExecutor,
    Tool,
    create_react_agent,
)
from langchain import hub
from langchain.memory import ConversationEntityMemory

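# PyPDFLoader page numbers are 0-based. The "+ 1" below is NOT evaluated: it is
# literal prompt text (e.g. "page: 0 + 1") that hints the 1-based page to the LLM.
doc_prompt = PromptTemplate.from_template(
    "<context>\n{page_content}\n\n<meta>\nsource: {source}\npage: {page} + 1\n</meta>\n</context>"
)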
tool_search = create_retriever_tool(
    retriever=vector_store.as_retriever(),
    name="Search BonBon",
    description="Searches and returns answer from BONBON FAQ.",
    document_prompt=doc_prompt,
)

math_chain = LLMMathChain.from_llm(llm=llm)
duckduck = DuckDuckGoSearchRun()

tools = [
    tool_search,
    Tool(
        name="Duck Duck Go",
        func=duckduck.run,
        description="useful for when you need to search for more information on the internet with Duck Duck Go search",
    ),
    Tool(
        name="Calculator",
        func=math_chain.run,
        description="useful for when you need to do math calculations",
    ),
]

template = """
Answer the following questions as best you can. You can use history {history} to fill in unknown context. You have access to the following tools:

{tools}

Use the following format. If you cannot follow the format, stop and return your answer.
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, if one is needed. Try the Search BonBon tool first. The action must be one of [{tool_names}]. If Duck Duck Go has already been used, do not use it again. If no tool applies, go straight to the final answer
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat at most 5 times.)
Thought: I now know the final answer. Let's return the answer
Final Answer: the final answer to the original input question. If Search BonBon was used, include the page of the PDF file that contains the answer

Begin!
Context:
{entities}

Current conversation:
Chat History:
{history}
Last line:
Human: {input}
Thought: {agent_scratchpad}
"""
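# Build the prompt directly from the custom template instead of pulling a hub
# prompt and overwriting its text; this also registers {history} and {entities}.
prompt_react = PromptTemplate.from_template(template)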
react_agent = create_react_agent(llm=llm, tools=tools, prompt=prompt_react)
react_agent_executor = AgentExecutor(
    agent=react_agent,
    tools=tools,
    verbose=False,
    handle_parsing_errors=True,
    memory=ConversationEntityMemory(llm=llm, top_k=3),
    max_iterations=3
)
# Simple chat loop: keep answering until the user types "exit"
question = input("Enter prompt ('exit' to terminate): ")
while question.lower() != "exit":
    print(f"Question: {question}")
    print("Responding ...")
    response = react_agent_executor.invoke({"input": question})
    result = response.get("output")
    print(f"Final Answer: {result} \n")
    question = input("Enter new prompt ('exit' to terminate): ")



Question: hello
Responding ...
Final Answer: Agent stopped due to iteration limit or time limit. 

Question: translate: hello from english to vietnamese
Responding ...
Final Answer: The translation of "hello" from English to Vietnamese is "Xin chào" (pronounced like 'sin chow'). This is the most common and polite way to greet someone in Vietnamese. 



## Assignment 3: Build a new assistant based on BonBon source code (optional)
Objectives:
- Run the code and index the sample BonBon FAQ.pdf file into Azure Cognitive Search
- Explore the code and implement a new assistant that behaves the same as the chatbot above
- Explore other features, such as RBAC and the admin portal features

Please contact the training team if you need the BonBon source code.