We need a system that can look up company policy, such as the refund window and cases where no refund applies. To do this, we build a vector database that stores the policy text, so the system can later match customer questions against it.

In [1]:
import chromadb
from chromadb.utils import embedding_functions
import pandas as pd

We don't have a real company policy to work with, so we create a few dummy policy documents based on common industry practice for refunds, shipping, password resets, and support hours.

In [3]:
documents = [
    {
        "id": "doc1",
        "text": "Refund Policy: You can request a full refund within 30 days of purchase. Money is returned to the original payment method within 5-7 business days.",
        "category": "refunds"
    },
    {
        "id": "doc2",
        "text": "Shipping Times: Standard shipping takes 3-5 business days. Express shipping takes 1-2 days. We do not ship on weekends.",
        "category": "shipping"
    },
    {
        "id": "doc3",
        "text": "Password Reset: To reset your password, go to Settings > Security > Change Password. You will need access to your email.",
        "category": "account"
    },
    {
        "id": "doc4",
        "text": "Hours of Operation: Our support team is available Monday to Friday, 9 AM to 5 PM EST. We are closed on public holidays.",
        "category": "general"
    }
]

Initialize ChromaDB with a persistent client, so the data is saved to disk under the given path.

In [4]:
client = chromadb.PersistentClient(path="./chroma_db_data")

Set up the embedding function, which converts text into vectors.

In [6]:
sentence_transformer_ef = embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
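
To make the role of the embedding function more concrete, here is a minimal sketch of the kind of similarity comparison a vector search relies on. The three-dimensional vectors are made-up toy values, not real output from all-MiniLM-L6-v2 (which produces 384-dimensional vectors), and `cosine_similarity` and `closest` are hypothetical helpers written only for illustration. Chroma performs its own distance computation internally.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def closest(query, vectors):
    """Return the key whose vector is most similar to the query vector."""
    return max(vectors, key=lambda name: cosine_similarity(query, vectors[name]))

# Toy "embeddings" (assumed values, for illustration only).
toy_vectors = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
}
query_vector = [0.8, 0.2, 0.1]  # stands in for the embedding of a refund question

best = closest(query_vector, toy_vectors)
print(best)
```

Texts with similar meaning should end up with vectors that point in similar directions, which is why a question about getting money back can match the refund policy even though the wording differs.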

Create a ChromaDB collection, which is roughly analogous to a table in SQL.

In [7]:
collection = client.get_or_create_collection(
    name="company_knowledge_base",
    embedding_function=sentence_transformer_ef
)

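Add the documents to the collection. The check on `collection.count()` avoids inserting them again when the notebook is rerun.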

In [8]:
if collection.count() == 0:
    collection.add(
        documents=[doc['text'] for doc in documents],
        metadatas=[{"category": doc['category']} for doc in documents],
        ids = [doc['id'] for doc in documents]
    )
    print("Knowledge base built successfully")

else:
    print("Knowledge base already exists")


Knowledge base built successfully
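
The `collection.add` call above takes three parallel lists (documents, metadatas, ids). As a sketch of how that split could be made a little safer, the hypothetical helper below builds the lists in one place and rejects duplicate ids before anything is sent to the database. The helper name and the shortened sample texts are assumptions for illustration.

```python
def to_chroma_columns(docs):
    """Split a list of {"id", "text", "category"} dicts into parallel lists."""
    ids = [doc["id"] for doc in docs]
    # Ids are expected to be unique, so fail early on duplicates.
    if len(ids) != len(set(ids)):
        raise ValueError("document ids must be unique")
    texts = [doc["text"] for doc in docs]
    metadatas = [{"category": doc["category"]} for doc in docs]
    return ids, texts, metadatas

# Shortened sample documents (assumed values, for illustration only).
sample_docs = [
    {"id": "doc1", "text": "Refund Policy: 30 days.", "category": "refunds"},
    {"id": "doc2", "text": "Shipping Times: 3-5 days.", "category": "shipping"},
]

ids, texts, metadatas = to_chroma_columns(sample_docs)
print(ids)
```

The three returned lists can then be passed as `ids=`, `documents=`, and `metadatas=` in the same way as the list comprehensions in the cell above.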


Write the retrieval function.

In [9]:
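def query_rag_system(user_question, n_results=1):
    """Return the stored policy text closest to the user's question."""
    results = collection.query(
        query_texts=[user_question],
        n_results=n_results
    )

    # results['documents'] holds one list of matches per query text, so
    # check the inner list too: it is empty when nothing is returned.
    documents = results['documents']
    if documents and documents[0]:
        return documents[0][0]
    return "No relevant policy found."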
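
With `n_results=1`, a query returns the nearest stored document even when the question has nothing to do with any policy, so the fallback message is rarely reached. As a sketch of one possible refinement, the hypothetical helper below applies a distance cutoff to a result dictionary shaped like the one `collection.query` returns (lists of lists under `documents` and `distances`, where a smaller distance means a closer match). The `max_distance` value of 1.0 and the sample distances are arbitrary assumptions and would need tuning against real queries.

```python
NO_MATCH = "No relevant policy found."

def pick_best_match(results, max_distance=1.0):
    """Return the top document, or a fallback if it is missing or too far away."""
    documents = results.get("documents") or [[]]
    distances = results.get("distances") or [[]]
    # Each inner list holds the matches for one query text.
    if not documents[0] or not distances[0]:
        return NO_MATCH
    if distances[0][0] > max_distance:
        return NO_MATCH
    return documents[0][0]

# Hand-written results in the assumed shape (distances are made up).
close_hit = {"documents": [["Refund Policy: 30 days."]], "distances": [[0.42]]}
far_hit = {"documents": [["Refund Policy: 30 days."]], "distances": [[1.85]]}

print(pick_best_match(close_hit))
print(pick_best_match(far_hit))
```

Keeping the cutoff logic in a function that only looks at the result dictionary makes it possible to check the behavior without a running database.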

Test the functionality of the system.

In [10]:
test_questions = [
    "How long does it take to get my money back?",
    "Are you open on Sundays?"
]

print("\n--- RAG SYSTEM TEST ---")
for q in test_questions:
    answer = query_rag_system(q)
    print(f"\nUser: {q}")
    print(f"Retrieved Policy: {answer}")


--- RAG SYSTEM TEST ---

User: How long does it take to get my money back?
Retrieved Policy: Refund Policy: You can request a full refund within 30 days of purchase. Money is returned to the original payment method within 5-7 business days.

User: Are you open on Sundays?
Retrieved Policy: Hours of Operation: Our support team is available Monday to Friday, 9 AM to 5 PM EST. We are closed on public holidays.
