## Expert Knowledge Worker

### A question answering agent that is an expert knowledge worker
### To be used by employees of Insurellm, an Insurance Tech company
### The agent needs to be accurate and the solution should be low cost.

This project uses RAG (Retrieval-Augmented Generation) so that our question-answering assistant gives accurate answers grounded in the company's own documents.

In [1]:
# imports

import os
import glob
from dotenv import load_dotenv
import gradio as gr

In [2]:
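# imports for langchain
# (newer LangChain releases expose these as langchain_community.document_loaders and langchain_text_splitters)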

from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import CharacterTextSplitter

In [3]:
# Price is a factor for our company, so we use a low-cost model

MODEL = "gpt-4o-mini"
db_name = "vector_db"

In [4]:
# Load environment variables from a file called .env

load_dotenv(override=True)
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')
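
Because the cell above falls back to a placeholder string when no key is found, a misconfigured `.env` would otherwise only surface later as an authentication error. A minimal sketch of an early check, assuming the placeholder text used above; `api_key_looks_set` is a hypothetical helper, not part of `dotenv` or the OpenAI client:

```python
import os

PLACEHOLDER = "your-key-if-not-using-env"  # the fallback string used in the cell above

def api_key_looks_set(name: str = "OPENAI_API_KEY") -> bool:
    """Return True if the variable is present, non-empty, and not the placeholder."""
    value = os.environ.get(name, "")
    return bool(value.strip()) and value != PLACEHOLDER

if not api_key_looks_set():
    print("OPENAI_API_KEY is missing or still set to the placeholder; check your .env file")
```

The check only inspects the environment; it does not confirm that the key is valid with the API.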

In [5]:
# Read in documents using LangChain's loaders
# Take everything in all the sub-folders of our knowledge base

folders = glob.glob("knowledge-base/*")

# With thanks to CG and Jon R, students on the course, for this fix needed for some users 
text_loader_kwargs = {'encoding': 'utf-8'}
# If that doesn't work, some Windows users might need to uncomment the next line instead
# text_loader_kwargs={'autodetect_encoding': True}

documents = []
for folder in folders:
    doc_type = os.path.basename(folder)
    loader = DirectoryLoader(folder, glob="**/*.md", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)
    folder_docs = loader.load()
    for doc in folder_docs:
        doc.metadata["doc_type"] = doc_type
        documents.append(doc)
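
The loop above tags every document with the name of the sub-folder it came from. The same idea can be sketched with only the standard library; `load_markdown_docs` and the plain-dict document shape are illustrative assumptions, not LangChain's `Document` class:

```python
from pathlib import Path

def load_markdown_docs(root: str) -> list[dict]:
    """Read every .md file under each sub-folder of root, tagging it with the sub-folder name."""
    docs = []
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue  # only sub-folders define a doc_type
        for path in sorted(folder.rglob("*.md")):
            docs.append({
                "page_content": path.read_text(encoding="utf-8"),
                "metadata": {"source": str(path), "doc_type": folder.name},
            })
    return docs
```

Calling `load_markdown_docs("knowledge-base")` would return one dict per Markdown file, with `doc_type` set to the folder name (for example `employees` or `contracts`).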

In [6]:
len(documents)

32

In [7]:
documents[24]

Document(metadata={'source': 'knowledge-base/employees/Avery Lancaster.md', 'doc_type': 'employees'}, page_content="# Avery Lancaster\n\n## Summary\n- **Date of Birth**: March 15, 1985  \n- **Job Title**: Co-Founder & Chief Executive Officer (CEO)  \n- **Location**: San Francisco, California  \n\n## Insurellm Career Progression\n- **2015 - Present**: Co-Founder & CEO  \n  Avery Lancaster co-founded Insurellm in 2015 and has since guided the company to its current position as a leading Insurance Tech provider. Avery is known for her innovative leadership strategies and risk management expertise that have catapulted the company into the mainstream insurance market.  \n\n- **2013 - 2015**: Senior Product Manager at Innovate Insurance Solutions  \n  Before launching Insurellm, Avery was a leading Senior Product Manager at Innovate Insurance Solutions, where she developed groundbreaking insurance products aimed at the tech sector.  \n\n- **2010 - 2013**: Business Analyst at Edge Analytics  

In [8]:
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents)

Created a chunk of size 1088, which is longer than the specified 1000
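
The warning above appears because `CharacterTextSplitter` splits only at its separator (a blank line, `"\n\n"`, by default) and then merges neighbouring pieces up to `chunk_size`; a single piece longer than the limit is kept whole. A simplified sketch of that behaviour, ignoring `chunk_overlap`; this is an illustration under those assumptions, not LangChain's actual implementation:

```python
def split_on_separator(text: str, chunk_size: int, separator: str = "\n\n") -> list[str]:
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size characters.

    A piece that is itself longer than chunk_size is emitted unchanged,
    which is the situation the warning reports.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece  # may exceed chunk_size if the piece alone is too long
    if current:
        chunks.append(current)
    return chunks

sample = "short paragraph\n\n" + "x" * 30 + "\n\nanother short one"
for chunk in split_on_separator(sample, chunk_size=20):
    print(len(chunk), "over limit" if len(chunk) > 20 else "ok")
```

An occasional oversized chunk is usually harmless here; a splitter that falls back to finer separators would be one way to enforce the limit strictly.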


In [9]:
len(chunks)

127

In [10]:
chunks[6]

Document(metadata={'source': 'knowledge-base/contracts/Contract with Roadway Insurance Inc. for Carllm.md', 'doc_type': 'contracts'}, page_content="---\n\n## Renewal\n\n1. **Automatic Renewal**: This agreement will automatically renew for an additional 12-month term unless either party provides written notice of non-renewal at least 30 days before the expiration date.\n2. **Price Adjustments**: Subscription fees may be adjusted for the renewal term in accordance with market conditions and the company's pricing policies, with 60 days' prior notice provided to Roadway Insurance Inc.\n\n---\n\n## Features")

In [11]:
doc_types = set(chunk.metadata['doc_type'] for chunk in chunks)
print(f"Document types found: {', '.join(doc_types)}")

Document types found: contracts, company, employees, products
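
Beyond listing the document types, it can help to see how many chunks each type contributes. A small sketch using `collections.Counter`; the stand-in `sample_chunks` objects only mimic the `.metadata` attribute and are assumptions for illustration, so in the notebook one would pass the real `chunks` list instead:

```python
from collections import Counter
from types import SimpleNamespace

def count_by_doc_type(chunks) -> Counter:
    """Count how many chunks carry each doc_type in their metadata."""
    return Counter(chunk.metadata["doc_type"] for chunk in chunks)

# Hypothetical stand-in data; replace with the notebook's `chunks`
sample_chunks = [
    SimpleNamespace(metadata={"doc_type": t})
    for t in ["employees", "contracts", "employees"]
]

for doc_type, count in count_by_doc_type(sample_chunks).most_common():
    print(f"{doc_type}: {count}")
```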


In [15]:
for chunk in chunks:
    if 'Onur' in chunk.page_content:
        print(chunk)
        print("_________")

page_content='# Onur Tosun – Artificial Intelligence Engineer

## General Information  
- **Date of Birth:** January 1, 1997  
- **Position:** Artificial Intelligence Engineer  
- **Location:** İstanbul, Türkiye  

## Insurellm Career Journey  

### **2024-Present: Artificial Intelligence Engineer**  
Emily works on stereo camera systems and manages depth-map and 3D face modeling processes. Using point cloud data, Emily develops and optimizes models that can be used in real-world applications, and designs RAG-based information retrieval systems and multi-agent workflows with LangChain.  

#### **Achievements:**  
- Made a significant contribution in 2022 by exceeding the annual sales target by 30%.  
- Helped expand the customer portfolio by bringing in 15 new corporate clients within half a year.' metadata={'source': 'knowledge-base/employees/Onur Tosun.md', 'doc_type': 'employees'}
_________
