In [None]:
%pip install langchain_openai

In [6]:
from config.Config import Config
from src.DataLoader import DataLoader
from src.VectorDatabase import VectorDatabase
from src.AgentChat import AgentChat
from langchain.callbacks.tracers import LangChainTracer

# Initialize the project based on the configuration file

In [7]:
config = Config()
print(f"Project *{config.project_name}* initialized from configuration: \n {str(config)} \n")

# Trace API calls so that usage can be monitored and kept low
tracer = LangChainTracer(project_name=config.project_name)

Project *chatbot* initialized from configuration: 
 Config(project_name=chatbot, 
             session_id=abc123, 
             data_directory=data, 
             ignored_files=[], 
             persist_directory=chroma_db, 
             model_first_instruction=You are a helpful robot who engages in conversations with the users and answers questions., 
 



# Load and preprocess the documents

The supported files so far:
- .txt files
- .pdf files

The example documents are loaded and split into chunks of at most 500 tokens each, as reported in the log below. I generated the example documents, a PDF HR contract and a TXT company policy, using ChatGPT.
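
As a rough illustration of the splitting step, the sketch below cuts a text into chunks of at most `max_tokens` whitespace-separated words. This is an assumption made for illustration: the internals of `DataLoader` are not shown in this notebook, and it most likely counts model tokens rather than words. `split_into_chunks` is a hypothetical helper, not part of the project.

```python
def split_into_chunks(text: str, max_tokens: int) -> list[str]:
    """Split text into chunks of at most max_tokens whitespace-separated words.

    Words are used as a crude stand-in for model tokens (an assumption).
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + max_tokens])
        for start in range(0, len(words), max_tokens)
    ]


# 7 words repeated 50 times gives 350 words in total
sample = "Notice Period: 4 weeks by either party. " * 50
chunks = split_into_chunks(sample, max_tokens=100)
print(len(chunks), [len(chunk.split()) for chunk in chunks])
```

A real splitter would usually also respect sentence or paragraph boundaries and may overlap neighbouring chunks; this sketch does neither.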

In [8]:
dataloader = DataLoader(config=config)
docs_by_ids = dataloader.process_documents()

docs_by_ids

Processing documents from data
Loading documents from data


Loading new documents: 100%|██████████████████████| 4/4 [00:01<00:00,  3.66it/s]

Done loading 4 documents from data
Split documents into 15 chunks of text (max. 500 tokens each).





{'7': Document(page_content='Tech Innovations Inc. Company Policy\n\n1. Introduction\nThis document outlines the company policies of Tech Innovations Inc. ("the Company"), located at 123 Tech Avenue, Innovation City, Techland. Under the guidance of our HR Director, Jane Smith, these policies have been developed to ensure a productive, secure, and inclusive work environment for all employees.', metadata={'source': 'data\\policy.txt'}),
 '4': Document(page_content='2. Equal Employment Opportunity\nTech Innovations Inc. is committed to providing an equal employment opportunity for all employees and applicants without regard to race, color, religion, sex, national origin, age, disability, or genetics. We strive to create a diverse and inclusive environment where everyone can succeed.', metadata={'source': 'data\\policy.txt'}),
 '2': Document(page_content='3. Code of Conduct\nAll employees are expected to conduct themselves professionally, respecting their colleagues, the company, and our c

In [9]:
ID = '6'
doc = docs_by_ids[ID]

# Print the content and metadata of the same document
print(f"Page content of doc with ID={ID}: {doc.page_content}\n")
print(f"Source of text: {doc.metadata['source']}\n")
# .txt documents carry no page number, so fall back to a placeholder
print(f"Page of text: {doc.metadata.get('page', 'n/a')}\n")

Page content of doc with ID=6: Compliance with the Global Data Protection Regulation (GDPR) and local laws.
8. Termination
Notice Period: 4 weeks by either party.
Grounds for Immediate Termination: Gross misconduct, breach of confidentiality, or failure
to comply with company policies.
Severance: Equivalent to two months' salary, provided the termination is not for gross
misconduct.
9. Dispute Resolution
Any disputes arising from or related to the employment contract will be resolved through

Source of text: data\Contract.pdf

Page of text: 2



# Create a vector database

For this, we will use:
- Chroma
- OpenAI embeddings

In [10]:
vdb = VectorDatabase(docs_by_ids, config=config)
print(vdb.vector_database.get())

Embeddings created using OpenAIEmbeddings.
Vector database created in chroma_db.


# Retriever

We can create a retriever based on the database and query the data.

The output contains the most relevant k chunks in terms of semantic similarity to the query.
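
To make "most relevant k chunks" concrete, here is a minimal sketch of top-k retrieval by cosine similarity. It assumes a toy bag-of-words embedding (the hypothetical `embed` function) in place of OpenAI embeddings and a plain list in place of Chroma, so it only illustrates the ranking idea and not how `VectorDatabase` works internally.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query, best match first."""
    query_vec = embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, embed(chunk)),
        reverse=True,
    )
    return ranked[:k]


chunks = [
    "Performance reviews are conducted annually to assess employee progress.",
    "Gym membership reimbursement up to $500 annually.",
    "Notice Period: 4 weeks by either party.",
]
top = retrieve("How are performance reviews conducted?", chunks, k=1)
print(top)
```

Word counts only match exact words, whereas a learned embedding can also match paraphrases such as "perks" and "benefits"; that is the main reason to use an embedding model in practice.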

In [11]:
retriever = vdb.get_database_as_retriever(k=3)

print(retriever.invoke("What are my benefits?"))
print(retriever.invoke("How are performance reviews conducted?"))

[Document(page_content='Other Benefits: Gym membership reimbursement up to $500 annually, childcare assistance\nup to $1000 annually.\nPage 1', metadata={'page': 0, 'source': 'data\\Contract.pdf'}), Document(page_content='Base Salary: $75,000 annually, payable in monthly installments\nBonus: Eligible for an annual performance-based bonus of up to 10% of base salary\nStock Options: Eligible after one year of service, details to be provided upon eligibility.\n4. Benefits\nHealth Insurance: Comprehensive health, dental, and vision plan, effective after 30 days of\nemployment.\nRetirement Plan: 401(k) plan with company matching up to 5% of salary.', metadata={'page': 0, 'source': 'data\\Contract.pdf'}), Document(page_content='5. Health and Safety\nThe health and safety of our employees are of paramount importance. All employees are expected to follow safety guidelines and report any hazards or injuries immediately to their supervisor or the HR department.\n\n6. Performance Reviews\nPerform

# Agent Chatbot

Create an agent chatbot with memory using an OpenAI model. An API key is needed for this.

I will be using the gpt-3.5-turbo model and the retriever that was created earlier.

The chatbot can reason about the user's query and decide what action to take before it gives an answer. The available actions are:
- retrieve information from the database when the user asks a personal or HR-related question
- search the internet using the Tavily API for general questions
- answer from the model's own knowledge or the chat history
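
The three actions above can be pictured as a dispatch over a small tool registry. The sketch below is only an illustration under strong assumptions: in the notebook the language model chooses the tool, whereas here a hypothetical keyword rule (`choose_tool`) stands in for that decision, and both tools are stubs that return placeholder text.

```python
import re
from typing import Callable, Optional


# Stub tools: each takes a query string and returns placeholder text.
def retrieve_contract_information(query: str) -> str:
    return f"[contract chunks matching '{query}']"


def tavily_search_results_json(query: str) -> str:
    return f"[web results for '{query}']"


TOOLS: dict[str, Callable[[str], str]] = {
    "retrieve_contract_information": retrieve_contract_information,
    "tavily_search_results_json": tavily_search_results_json,
}

# Hypothetical keyword list; the real agent relies on the tool descriptions.
HR_KEYWORDS = {"salary", "benefits", "holiday", "policy", "contract"}


def choose_tool(user_input: str) -> Optional[str]:
    """Keyword rule standing in for the LLM's tool choice (an assumption)."""
    words = set(re.findall(r"[a-z]+", user_input.lower()))
    if words & HR_KEYWORDS:
        return "retrieve_contract_information"
    if user_input.strip().endswith("?"):
        return "tavily_search_results_json"
    return None  # answer directly from model knowledge or chat history


def run_step(user_input: str) -> str:
    """Run the chosen tool, or fall through to a direct answer."""
    tool_name = choose_tool(user_input)
    if tool_name is None:
        return "[direct answer without tools]"
    return TOOLS[tool_name](user_input)


print(run_step("What are my benefits?"))
print(run_step("What are some popular gyms in Amsterdam?"))
print(run_step("Hello!"))
```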

In [12]:
chatbot = AgentChat(retriever=retriever)

# the tools available to the agent
chatbot.tools

[TavilySearchResults(),
 Tool(name='retrieve_contract_information', description='Search for information about the HR contract. Use this tool when the user asks about salary, policies, benefits, holiday entitlements, and other employment terms.', args_schema=<class 'langchain.tools.retriever.RetrieverInput'>, func=functools.partial(<function _get_relevant_documents at 0x000002E176471FC0>, retriever=VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000002E1780F6230>, search_kwargs={'k': 3}), document_prompt=PromptTemplate(input_variables=['page_content'], template='{page_content}'), document_separator='\n\n'), coroutine=functools.partial(<function _aget_relevant_documents at 0x000002E1764720E0>, retriever=VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000002E1780F6230>, search_kwargs={'k': 3}), document_prompt=PromptTemplate(in

# The prompt:

The prompt tells the model its role and which tool to use in each context. It also instructs the model to state the source of its answers and to say it does not know rather than make up an answer.

In [13]:
# the prompt
chatbot.prompt.messages

[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='You are a helpful assistant who answers questions from the user.\n                    If the user asks questions about personal information, try to retrieve the answer from their contract or the chat history.\n                    If the user asks general questions, use the search tool.\n                    Tell the user the source of your answer.\n                    If you cannot find the right answer, say you do not know.')),
 MessagesPlaceholder(variable_name='chat_history', optional=True),
 HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], template='{input}')),
 MessagesPlaceholder(variable_name='agent_scratchpad')]

# Start chatting!

Since we are using an agent, we can see the reasoning process behind an answer as well as the final response.
In this example, the agent answers based on its knowledge without using any tools.

In [14]:
config_invoke = {"configurable": {"session_id": config.session_id}, "callbacks": [tracer]}

chatbot.get_response("Hello! What can you do for me?", config_invoke=config_invoke)['output']



> Entering new AgentExecutor chain...
Hello! I can help you with a variety of tasks. Here are some things I can assist you with:
1. Provide information from your contract regarding salary, policies, benefits, and more.
2. Answer questions about general topics using a search tool.
3. Assist with finding specific information or resources.
4. Help with any queries or tasks you have in mind. Just let me know how I can assist you today!

> Finished chain.


'Hello! I can help you with a variety of tasks. Here are some things I can assist you with:\n1. Provide information from your contract regarding salary, policies, benefits, and more.\n2. Answer questions about general topics using a search tool.\n3. Assist with finding specific information or resources.\n4. Help with any queries or tasks you have in mind. Just let me know how I can assist you today!'

Here the model uses the retrieve tool and answers based on the contract.

In [16]:
chatbot.get_response("I want to know what my benefits are", config_invoke=config_invoke)['output']



> Entering new AgentExecutor chain...

Invoking: `retrieve_contract_information` with `{'query': 'benefits'}`

Other Benefits: Gym membership reimbursement up to $500 annually, childcare assistance
up to $1000 annually.
Page 1

Base Salary: $75,000 annually, payable in monthly installments
Bonus: Eligible for an annual performance-based bonus of up to 10% of base salary
Stock Options: Eligible after one year of service, details to be provided upon eligibility.
4. Benefits
Health Insurance: Comprehensive health, dental, and vision plan, effective after 30 days of
employment.
Retirement Plan: 401(k) plan with company matching up to 5% of salary.

5. Health and Safety
The health and safety of our employees are of paramount importance. All employees are expected to follow safety guidelines and report any hazards or injuries immediately to their supervisor or the HR department.

6. Performance Reviews
Performance reviews are conducted annually to ass

'Your benefits include:\n- Comprehensive health, dental, and vision insurance plan effective after 30 days of employment\n- 401(k) retirement plan with company matching up to 5% of salary\n- Gym membership reimbursement up to $500 annually\n- Childcare assistance up to $1000 annually\n\nIf you have any more specific questions about your benefits or need further details, feel free to ask!'

In [17]:
chatbot.get_response("Where is this information located?", config_invoke=config_invoke)



> Entering new AgentExecutor chain...

Invoking: `retrieve_contract_information` with `{'query': 'benefits'}`

Other Benefits: Gym membership reimbursement up to $500 annually, childcare assistance
up to $1000 annually.
Page 1

Base Salary: $75,000 annually, payable in monthly installments
Bonus: Eligible for an annual performance-based bonus of up to 10% of base salary
Stock Options: Eligible after one year of service, details to be provided upon eligibility.
4. Benefits
Health Insurance: Comprehensive health, dental, and vision plan, effective after 30 days of
employment.
Retirement Plan: 401(k) plan with company matching up to 5% of salary.

5. Health and Safety
The health and safety of our employees are of paramount importance. All employees are expected to follow safety guidelines and report any hazards or injuries immediately to their supervisor or the HR department.

6. Performance Reviews
Performance reviews are conducted annually to ass

{'input': 'Where is this information located?',
 'chat_history': [HumanMessage(content='Hello! What can you do for me?'),
  AIMessage(content='Hello! I can help you with a variety of tasks. You can ask me questions, seek information, or request assistance on different topics. Feel free to ask anything you need help with!'),
  HumanMessage(content='I want to know what my benefits are'),
  AIMessage(content='Your benefits include:\n- Comprehensive health, dental, and vision insurance plan effective after 30 days of employment\n- 401(k) retirement plan with company matching up to 5% of salary\n- Gym membership reimbursement up to $500 annually\n- Childcare assistance up to $1000 annually\n\nIf you have any more specific questions about your benefits or need further details, feel free to ask!')],
 'output': 'The information about your benefits, including health insurance, retirement plan, gym membership reimbursement, and childcare assistance, can be found in your contract under the "Benef
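
The `chat_history` field in the output above is what lets the agent resolve "this information" to the earlier answer about benefits. A minimal sketch of session-keyed memory follows, assuming a plain in-memory dictionary. The `SessionMemory` class is hypothetical, and `AgentChat` may store its history differently.

```python
from collections import defaultdict


class SessionMemory:
    """Keeps an ordered list of (role, content) messages per session id."""

    def __init__(self) -> None:
        self._histories: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_exchange(self, session_id: str, user_input: str, output: str) -> None:
        """Record one user turn and the assistant reply for a session."""
        self._histories[session_id].append(("human", user_input))
        self._histories[session_id].append(("ai", output))

    def get_history(self, session_id: str) -> list[tuple[str, str]]:
        """Return a copy so callers cannot mutate the stored history."""
        return list(self._histories.get(session_id, []))


memory = SessionMemory()
memory.add_exchange(
    "abc123",
    "I want to know what my benefits are",
    "Your benefits include health insurance and a retirement plan.",
)
print(memory.get_history("abc123"))
print(memory.get_history("other-session"))
```

Keying the history by session id keeps separate conversations from leaking into each other, which is presumably why `config_invoke` passes `session_id` on every call.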

We can ask the model for recommendations, and it will use the search tool to browse the internet and provide an answer.

In [45]:
chatbot.get_response("A gym membership sounds nice! What are some popular gyms in Amsterdam?", config_invoke=config_invoke)['output']



> Entering new AgentExecutor chain...

Invoking: `tavily_search_results_json` with `{'query': 'popular gyms in Amsterdam'}`

[{'url': 'https://bestgymsnearyou.com/amsterdam/', 'content': 'Rocycle City Center at Nieuwe Passeerdersstraat 12 Vélo at Albert Cuypstraat 35 City Alps at Polderweg 190 You can get a great sweat in at those indoor cycling studios plus many of the best Amsterdam gyms have group spinning classes like Clubsportive, Splash Healthclub, and Sportcity. Boxing & MMA Gyms Near You'}, {'url': 'https://www.wander-lust.nl/amazing-gyms-in-amsterdam/', 'content': 'Trainmore - The luxurious one Looking for luxury? Gustav Gym is the place to be. This premium gym offers you the best fitness devices of Technogym. But what we really love is the tranquil vibe and personal attention. The coaches are kind, motivating and make you feel comfortable in the training.'}, {'url': 'https://www.yelp.com/search?cflt=gyms&find_loc=Amsterdam,+Noord-Holla

'Some popular gyms in Amsterdam include:\n\n1. Rocycle City Center at Nieuwe Passeerdersstraat 12\n2. Vélo at Albert Cuypstraat 35\n3. City Alps at Polderweg 190\n4. Trainmore - The luxurious one\n5. Gustav Gym - Known for luxury and premium fitness devices\n\nThese gyms offer a variety of facilities and classes to help you stay fit and healthy. You can explore these options further to find the one that best suits your preferences. If you need more information or have any other questions, feel free to ask!'

In [18]:
chatbot.get_response("How are performance reviews conducted?", config_invoke=config_invoke)['output']



> Entering new AgentExecutor chain...

Invoking: `retrieve_contract_information` with `{'query': 'performance reviews'}`

5. Health and Safety
The health and safety of our employees are of paramount importance. All employees are expected to follow safety guidelines and report any hazards or injuries immediately to their supervisor or the HR department.

6. Performance Reviews
Performance reviews are conducted annually to assess employee progress, set goals, and discuss potential for growth and development within the company.

8. Disciplinary Actions

9. Amendments to the Policy
Tech Innovations Inc. reserves the right to amend these policies at any time. Employees will be notified of any changes through official communications.

Base Salary: $75,000 annually, payable in monthly installments
Bonus: Eligible for an annual performance-based bonus of up to 10% of base salary
Stock Options: Eligible after one year of service, details to be provided u

'Performance reviews at your company are conducted annually to assess employee progress, set goals, and discuss potential for growth and development within the company. If you have any specific questions about the performance review process or need more details, feel free to ask!'