# RAG use case: creating a knowledge base

### A question answering agent that is an expert knowledge worker
### To be used by employees of Insurellm, an Insurance Tech company
### The agent needs to be accurate and the solution should be low cost.

This project uses RAG (Retrieval-Augmented Generation) to improve the accuracy of our question-answering assistant.

In [1]:
import os 
from dotenv import load_dotenv 
import glob 
import gradio as gr 
from openai import OpenAI

In [2]:
# LangChain imports
from langchain_core.output_parsers import StrOutputParser
# DirectoryLoader loads every matching file in a directory; TextLoader reads one file as a single document
from langchain_community.document_loaders import DirectoryLoader, TextLoader
# Splits document text into overlapping chunks so each chunk keeps some surrounding context
from langchain_text_splitters import CharacterTextSplitter

In [3]:
MODEL="llama3.1"

In [4]:
load_dotenv(override=True)

True

In [5]:
api_key=os.getenv("OPENAI_API_KEY")
# Ollama serves an OpenAI-compatible API, so the OpenAI client is pointed at the Ollama base URL
openai=OpenAI(base_url=os.getenv("OLLAMA_BASE_URL"), api_key=os.getenv("OLLAMA_API_KEY"))

### 1. Load the documents with LangChain loaders

In [None]:
context={} 

# list every sub-folder of knowledge-base
folders=glob.glob("knowledge-base/*")

text_loader_kwargs={"encoding": "utf-8"}

documents=[] 
for folder in folders: 

    # use the sub-folder name (e.g. products, employees) as the document type
    doc_type=os.path.basename(folder)

    # load the files from the directory 
    loader=DirectoryLoader(folder, glob="**/*.md", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)
    folder_docs=loader.load()

    # tag each loaded document with its document type
    for doc in folder_docs: 
        doc.metadata["doc_type"]=doc_type
        documents.append(doc)
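
To make the loading step concrete, here is a minimal sketch of roughly what the loop above does, written without LangChain. It is an illustration under assumptions, not the loaders' actual implementation: `load_markdown_docs` is a hypothetical helper, plain dictionaries stand in for LangChain `Document` objects, and the files are throwaway samples created in a temporary directory.

```python
import tempfile
from pathlib import Path


def load_markdown_docs(root):
    """Read every .md file under each sub-folder of `root` and tag it with doc_type.

    Hypothetical helper: plain dicts stand in for LangChain Document objects.
    """
    docs = []
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        # rglob mirrors the recursive "**/*.md" pattern used with DirectoryLoader
        for path in sorted(folder.rglob("*.md")):
            docs.append({
                "page_content": path.read_text(encoding="utf-8"),
                "metadata": {"source": str(path), "doc_type": folder.name},
            })
    return docs


# Build a tiny throwaway knowledge base (sample files, not the real data).
with tempfile.TemporaryDirectory() as tmp:
    for sub, name in [("products", "example_product.md"), ("employees", "example_employee.md")]:
        (Path(tmp) / sub).mkdir()
        (Path(tmp) / sub / name).write_text(f"# {name}\n", encoding="utf-8")
    sample_docs = load_markdown_docs(tmp)

print([d["metadata"]["doc_type"] for d in sample_docs])
```

Deriving `doc_type` from the folder name keeps the tagging automatic: adding a new sub-folder to the knowledge base creates a new document type without any code change.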

In [8]:
len(documents)

31

In [12]:
print(documents[0].metadata)

{'source': 'knowledge-base/products/Rellm.md', 'doc_type': 'products'}


In [11]:
print(documents[0])

page_content='# Product Summary

# Rellm: AI-Powered Enterprise Reinsurance Solution

## Summary

Rellm is an innovative enterprise reinsurance product developed by Insurellm, designed to transform the way reinsurance companies operate. Harnessing the power of artificial intelligence, Rellm offers an advanced platform that redefines risk management, enhances decision-making processes, and optimizes operational efficiencies within the reinsurance industry. With seamless integrations and robust analytics, Rellm enables insurers to proactively manage their portfolios and respond to market dynamics with agility.

## Features

### AI-Driven Analytics
Rellm utilizes cutting-edge AI algorithms to provide predictive insights into risk exposures, enabling users to forecast trends and make informed decisions. Its real-time data analysis empowers reinsurance professionals with actionable intelligence.

### Seamless Integrations
Rellm's architecture is designed for effortless integration with exis

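### Split the documents into manageable chunks

With `chunk_size=1000`, LangChain does not cut the text at exactly 1000 characters. `CharacterTextSplitter` splits on a separator (a blank line, `"\n\n"`, by default) and merges the pieces into chunks of up to about 1000 characters; a single piece longer than that is kept whole and reported with a warning.   
`chunk_overlap=200` lets consecutive chunks share up to about 200 characters, so context is not lost at chunk boundaries.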

In [13]:
text_splitter=CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks=text_splitter.split_documents(documents=documents)

Created a chunk of size 1088, which is longer than the specified 1000
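
The warning above follows from how separator-based splitting works. The sketch below is a simplified approximation, not LangChain's actual code: `split_on_separator` is a hypothetical function, and the sizes are scaled down (70 and 40 instead of 1000 and 200) so the behavior is easy to follow.

```python
def split_on_separator(text, chunk_size, chunk_overlap, separator="\n\n"):
    """Simplified, hypothetical approximation of separator-based chunking."""
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks, current = [], []

    def joined_len(parts):
        return len(separator.join(parts))

    for piece in pieces:
        if current and joined_len(current + [piece]) > chunk_size:
            # The current chunk is full: emit it.
            chunks.append(separator.join(current))
            # Keep trailing pieces as overlap, dropping from the front until the
            # carried text fits in chunk_overlap and leaves room for `piece`.
            while current and (
                joined_len(current) > chunk_overlap
                or joined_len(current + [piece]) > chunk_size
            ):
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


# Three short paragraphs followed by one paragraph longer than chunk_size.
text = "\n\n".join(["a" * 30, "b" * 30, "c" * 30, "L" * 120])
demo_chunks = split_on_separator(text, chunk_size=70, chunk_overlap=40)
print([len(c) for c in demo_chunks])  # → [62, 62, 120]
```

The second chunk starts with the `b` paragraph that also ends the first chunk, which is the overlap. The last chunk is 120 characters long even though `chunk_size` is 70, because a paragraph is never cut in the middle; that is the situation the warning describes.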


In [18]:
print(chunks[3])

page_content='Join the growing number of organizations leveraging Rellm to enhance their reinsurance processes while driving profitability and compliance. 

## 2025-2026 Roadmap

At Insurellm, we are committed to the continuous improvement of Rellm. Our roadmap for 2025-2026 includes:

- **Q3 2025**: 
  - Launch of the Rellm Mobile App for on-the-go insights and management.
  - Introduction of augmented reality (AR) features for interactive risk assessments.

- **Q1 2026**: 
  - Deployment of advanced machine learning models for even more accurate risk predictions.
  - Expansion of integration capabilities to support emerging technologies in the insurance sector.

- **Q3 2026**: 
  - Release of a community platform for Rellm users to exchange insights, tips, and best practices.
  - Launch of Rellm 2.0, featuring enhanced user interface and premium features based on user feedback.' metadata={'source': 'knowledge-base/products/Rellm.md', 'doc_type': 'products'}


In [17]:
print(chunks[4])

page_content='Experience the future of reinsurance with Rellm, where innovation meets reliability. Let Insurellm help you navigate the complexities of the reinsurance market smarter and faster.' metadata={'source': 'knowledge-base/products/Rellm.md', 'doc_type': 'products'}


In [19]:
len(chunks)

123

### Validate that all document types were loaded

In [20]:
# which document types are present in the chunks
doc_types=set(chunk.metadata["doc_type"] for chunk in chunks)
print(doc_types)

{'employees', 'contracts', 'company', 'products'}
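
The set above confirms that each document type is present, but not how many chunks each one contributed. A per-type count is a slightly stronger check. The sketch below is self-contained and uses stand-in objects; `count_by_doc_type` is a hypothetical helper, and in the notebook it would be called on the real `chunks` list instead of `sample_chunks`.

```python
from collections import Counter
from types import SimpleNamespace


def count_by_doc_type(chunks):
    """Count chunks per doc_type (hypothetical helper)."""
    return Counter(chunk.metadata["doc_type"] for chunk in chunks)


# Stand-in chunks: only the `metadata` attribute is needed for the count.
sample_chunks = [
    SimpleNamespace(metadata={"doc_type": t})
    for t in ["products", "products", "employees"]
]
counts = count_by_doc_type(sample_chunks)
print(counts.most_common())  # → [('products', 2), ('employees', 1)]
```

A document type with an unexpectedly low count is a hint that some files were not matched by the glob pattern or failed to load.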


In [21]:
# print every chunk whose text contains "CEO"
for chunk in chunks:
    if "CEO" in chunk.page_content:
        print(chunk)

page_content='## Support

1. **Customer Support**: Velocity Auto Solutions will have access to Insurellm’s customer support team via email or chatbot, available 24/7.  
2. **Technical Maintenance**: Regular maintenance and updates to the Carllm platform will be conducted by Insurellm, with any downtime communicated in advance.  
3. **Training & Resources**: Initial training sessions will be provided for Velocity Auto Solutions’ staff to ensure effective use of the Carllm suite. Regular resources and documentation will be made available online.

---

**Accepted and Agreed:**  
**For Velocity Auto Solutions**  
Signature: _____________________  
Name: John Doe  
Title: CEO  
Date: _____________________  

**For Insurellm**  
Signature: _____________________  
Name: Jane Smith  
Title: VP of Sales  
Date: _____________________' metadata={'source': 'knowledge-base/contracts/Contract with Velocity Auto Solutions for Carllm.md', 'doc_type': 'contracts'}
page_content='3. **Regular Updates:** 