## Expert Knowledge Worker

### A question-answering agent that is an expert knowledge worker

To be used by employees of Insurellm, an insurance tech company. The agent needs to be accurate, and the solution should be low cost.

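This project uses RAG (Retrieval-Augmented Generation). Relevant passages are retrieved from Insurellm's knowledge base and passed to the model along with each question, which helps the question-answering assistant give accurate answers.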
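The sketch below is a minimal, library-free illustration of the retrieve-then-augment idea. It rests on several assumptions: a bag-of-words vector with cosine similarity stands in for a real embedding model, and the helper names (`embed`, `retrieve`, `build_prompt`) and the sample documents are hypothetical. None of them come from this notebook or from LangChain. The final call to the LLM (for example `gpt-4o-mini`) is omitted.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents whose vectors are most similar to the question's."""
    q = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, context: list[str]) -> str:
    """Augment the question with the retrieved context before it is sent to the LLM."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"

# Hypothetical sample documents, paraphrased from the knowledge base for illustration
docs = [
    "Insurellm signed a Homellm contract with GreenValley Insurance.",
    "Maxine Thompson is a Data Engineer based in Austin, Texas.",
    "Apex Reinsurance pays $10,000 per month for Rellm.",
]
question = "Where is Maxine Thompson based?"
top = retrieve(question, docs, k=1)
print(build_prompt(question, top))
```

In the real pipeline, `embed` would call an embedding model and the ranking would run inside a vector store. The overall shape stays the same: embed the question, fetch the nearest chunks, and prepend them to the prompt.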

In [1]:
# imports
import os
import glob
from dotenv import load_dotenv
import gradio as gr

In [2]:
# imports for langchain
# In recent LangChain releases these loaders also live in langchain_community.document_loaders
from langchain.document_loaders import DirectoryLoader, TextLoader  # DirectoryLoader loads a whole folder; TextLoader loads an individual text file
from langchain.text_splitter import CharacterTextSplitter  # splits text into chunks of characters

In [3]:
# Price is a factor for our company, so we're going to use a low-cost model
MODEL = "gpt-4o-mini"
db_name = "vector_db"

In [4]:
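# Load environment variables from a file called .env
load_dotenv()
# load_dotenv() already copies OPENAI_API_KEY from .env into os.environ; fail early with a clear message if it is missing
if not os.getenv('OPENAI_API_KEY'):
    raise ValueError("OPENAI_API_KEY is not set; add it to your .env file")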

In [12]:
# Read in documents using LangChain's loaders
# Take everything in all the sub-folders of our knowledge base

folders = glob.glob("knowledge-base/*")
text_loader_kwargs = {'encoding': 'utf-8'}
# If that doesn't work, some Windows users might need to uncomment the next line instead
# text_loader_kwargs={'autodetect_encoding': True}

documents = []
for folder in folders:
    doc_type = os.path.basename(folder)
    # Load every Markdown file in this folder and its sub-folders
    loader = DirectoryLoader(folder, glob="**/*.md", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)
    folder_docs = loader.load()
    for doc in folder_docs:
        # Record the folder name (e.g. "employees") as the document type
        doc.metadata["doc_type"] = doc_type
        documents.append(doc)
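To show what the loop above does without LangChain, here is a rough pure-Python equivalent that uses `pathlib`. It is a sketch and not a drop-in replacement. The `SimpleDoc` dataclass and the `load_knowledge_base` function are hypothetical. The function recursively reads the `.md` files under each top-level folder and tags each one with the folder name as `doc_type`, which mirrors the metadata set above. Unlike `glob.glob("knowledge-base/*")`, it skips stray files at the top level.

```python
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class SimpleDoc:
    """Hypothetical minimal stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def load_knowledge_base(root: str) -> list[SimpleDoc]:
    docs = []
    # Each immediate sub-folder of root (e.g. "employees", "contracts") is a document type
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue  # ignore loose files at the top level
        # Recursively pick up every Markdown file, like glob="**/*.md" above
        for path in sorted(folder.rglob("*.md")):
            text = path.read_text(encoding="utf-8")
            docs.append(SimpleDoc(text, {"source": str(path), "doc_type": folder.name}))
    return docs
```

Usage would be `documents = load_knowledge_base("knowledge-base")`. The notebook keeps LangChain's loaders because the resulting `Document` objects plug directly into the text splitter that follows.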

In [13]:
len(documents)

31

In [16]:
documents[23]

Document(metadata={'source': 'knowledge-base\\employees\\Maxine Thompson.md', 'doc_type': 'employees'}, page_content="# HR Record\n\n# Maxine Thompson\n\n## Summary\n- **Date of Birth:** January 15, 1991  \n- **Job Title:** Data Engineer  \n- **Location:** Austin, Texas  \n\n## Insurellm Career Progression\n- **January 2017 - October 2018**: **Junior Data Engineer**  \n  * Maxine joined Insurellm as a Junior Data Engineer, focusing primarily on ETL processes and data integration tasks. She quickly learned Insurellm's data architecture, collaborating with other team members to streamline data workflows.  \n- **November 2018 - December 2020**: **Data Engineer**  \n  * In her new role, Maxine expanded her responsibilities to include designing comprehensive data models and improving data quality measures. Though she excelled in technical skills, communication issues with non-technical teams led to some project delays.  \n- **January 2021 - Present**: **Senior Data Engineer**  \n  * Maxine 

In [17]:
# CharacterTextSplitter splits each document on a separator ("\n\n" by default),
# then merges the pieces into chunks of up to chunk_size characters.
# A single piece longer than chunk_size is kept whole, which triggers the warning below.
# chunk_overlap sets how many characters consecutive chunks share, so that
# information near a chunk boundary is not cut off from its context.
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

# Split all documents into chunks
chunks = text_splitter.split_documents(documents)

Created a chunk of size 1088, which is longer than the specified 1000
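To illustrate what `chunk_size` and `chunk_overlap` mean, here is a simplified fixed-width sliding-window chunker. It is only an illustration and does not reproduce how `CharacterTextSplitter` works internally. LangChain's splitter first splits on a separator and then merges pieces up to `chunk_size`, which is why it can emit the 1088-character chunk reported above. The function `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-width windows; consecutive windows share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

print(sliding_window_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3))
# → ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk begins with the last three characters of the previous one. With the notebook's settings (1000 and 200), each window would advance 800 characters.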


In [18]:
# Number of chunks
len(chunks)

123

In [20]:
chunks[3]

Document(metadata={'source': 'knowledge-base\\contracts\\Contract with Apex Reinsurance for Rellm.md', 'doc_type': 'contracts'}, page_content='# Contract with Apex Reinsurance for Rellm: AI-Powered Enterprise Reinsurance Solution\n\n## Terms\n\n1. **Parties Involved**: This contract (“Agreement”) is entered into between Insurellm, Inc. (“Provider”) and Apex Reinsurance (“Client”) on this [Date].\n\n2. **Scope of Services**: Provider agrees to deliver the Rellm solution, which includes AI-driven analytics, seamless integrations, risk assessment modules, customizable dashboards, regulatory compliance tools, and client and broker portals as described in the product summary.\n\n3. **Payment Terms**: Client shall pay the Provider the sum of $10,000 per month for the duration of this agreement. Payments are due on the first day of each month and will be processed via electronic funds transfer.\n\n4. **Contract Duration**: This Agreement shall commence on [Start Date] and shall remain in effe

In [21]:
# Which document types are present?
doc_types = set(chunk.metadata['doc_type'] for chunk in chunks)
print(f"Document types found: {', '.join(doc_types)}")

Document types found: products, employees, company, contracts
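In addition to listing the types, it can be useful to see how many chunks each type contributes. The following is a small sketch using `collections.Counter`. It assumes each chunk exposes a `metadata` dict with the `doc_type` key set during loading. The function `chunks_per_type` is hypothetical, and it is demonstrated on stand-in objects so that the snippet runs on its own.

```python
from collections import Counter
from types import SimpleNamespace

def chunks_per_type(chunks) -> Counter:
    """Count chunks by their 'doc_type' metadata value."""
    return Counter(chunk.metadata["doc_type"] for chunk in chunks)

# Hypothetical stand-ins for LangChain Documents; in the notebook, pass `chunks` directly
demo = [SimpleNamespace(metadata={"doc_type": t}) for t in ["contracts", "employees", "contracts"]]
for doc_type, count in chunks_per_type(demo).most_common():
    print(f"{doc_type}: {count}")
```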


In [22]:
# In the example below, we search for chunks containing the word "CEO".
# This highlights a major limitation of keyword search: it misses important context.
# For instance, relevant information might be referred to with synonyms or other terms (e.g., "Lancaster" or "Avery").
# This is where RAG helps: vector embeddings let us retrieve chunks by meaning rather than by exact wording.


for chunk in chunks:
    if 'CEO' in chunk.page_content:
        print(chunk)
        print("_________")


page_content='3. **Regular Updates:** Insurellm will offer ongoing updates and enhancements to the Homellm platform, including new features and security improvements.

4. **Feedback Implementation:** Insurellm will actively solicit feedback from GreenValley Insurance to ensure Homellm continues to meet their evolving needs.

---

**Signatures:**

_________________________________  
**[Name]**  
**Title**: CEO  
**Insurellm, Inc.**

_________________________________  
**[Name]**  
**Title**: COO  
**GreenValley Insurance, LLC**  

---

This agreement represents the complete understanding of both parties regarding the use of the Homellm product and supersedes any prior agreements or communications.' metadata={'source': 'knowledge-base\\contracts\\Contract with GreenValley Insurance for Homellm.md', 'doc_type': 'contracts'}
_________
page_content='## Support

1. **Customer Support**: Velocity Auto Solutions will have access to Insurellm’s customer support team via email or chatbot, availa