# RAG Use Case: Creating a Knowledge Base, Part II (Creating Vectors)

### A question answering agent that is an expert knowledge worker
### To be used by employees of Insurellm, an Insurance Tech company
### The agent needs to be accurate and the solution should be low cost.

This project uses RAG (Retrieval-Augmented Generation) to ground our question-answering assistant in the company's own documents, which improves the accuracy of its answers.
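In a nutshell, RAG retrieves the chunks most similar to the question and adds them to the prompt, so the LLM answers from the company's documents rather than from its training data alone. The sketch below only illustrates that flow: `embed`, `retrieve` and `build_prompt` are hypothetical helpers, the word-count "embedding" is a toy stand-in for a real embedding model, and the three snippets are made-up chunks.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical toy "embedding": word counts. A real system would call an embedding model.
    return Counter(word.strip(".,?!").lower() for word in text.split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank every chunk by similarity to the question and keep the top k
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    # "Augment" the prompt: the LLM is asked to answer from the retrieved text
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


knowledge_base = [
    "Insurellm was founded in 2015 by Avery Lancaster.",
    "Carllm is the Insurellm product for auto insurance.",
    "Homellm is the Insurellm product for home insurance.",
]
question = "Who founded Insurellm?"
top = retrieve(question, knowledge_base, k=1)
print(build_prompt(question, top))
```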

In [1]:
import os 
from dotenv import load_dotenv 
import glob 
import gradio as gr 
from openai import OpenAI

In [2]:
# LangChain imports
from langchain_core.output_parsers import StrOutputParser
from langchain_community.document_loaders import DirectoryLoader, TextLoader          # DirectoryLoader loads every matching file in a folder; TextLoader reads each file as one document
from langchain_text_splitters import CharacterTextSplitter                   # splits documents into chunks small enough to embed while keeping meaningful context

In [3]:
# some more imports 
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
import numpy as np
from sklearn.manifold import TSNE
import plotly.graph_objects as go

# Add a second, local embedding model (mxbai-embed-large via Ollama) to compare the output
from langchain_ollama import OllamaEmbeddings, ChatOllama

from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

In [4]:
# Model names for the two chat LLMs used in the Gradio UI: one local, one frontier
OLLAMA_MODEL="llama3.1"
OPENAI_MODEL="gpt-4o-mini"

db_name_ollama="vector_db_ollama_mxbai"
db_name_openai="vector_db_openai_embed"

In [5]:
load_dotenv(override=True)

True

In [6]:
# Create two clients: one for the frontier (OpenAI) model and one for the local model served by Ollama's OpenAI-compatible endpoint
api_key=os.getenv("OPENAI_API_KEY")
openai=OpenAI()
ollama=OpenAI(base_url=os.getenv("OLLAMA_BASE_URL"), api_key=os.getenv("OLLAMA_API_KEY"))

### 1. Grab the documents and load them with LangChain loaders

In [7]:
context={} 

# Grab every sub-folder of knowledge-base
folders=glob.glob("knowledge-base/*")

text_loader_kwargs={"encoding": "utf-8"}

documents=[] 
for folder in folders: 

    # Use the sub-folder name (e.g. products, employees) as the document type
    doc_type=os.path.basename(folder)

    # load the files from the directory 
    loader=DirectoryLoader(folder, glob="**/*.md", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)
    folder_docs=loader.load()

    # Tag each loaded document with its doc_type metadata
    for doc in folder_docs: 
        doc.metadata["doc_type"]=doc_type
        documents.append(doc)

In [8]:
len(documents)

31

### Split the documents into manageable chunks

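With `chunk_size=1000`, LangChain does not cut the text at exactly 1,000 characters. `CharacterTextSplitter` first splits on a separator (by default `"\n\n"`, i.e. paragraph breaks) and then merges consecutive pieces into chunks of up to about 1,000 characters. A single piece longer than `chunk_size` is kept whole, which is why the splitter warns about a chunk of 1,088 characters.

`chunk_overlap=200` lets consecutive chunks share up to 200 characters of text, so context that straddles a chunk boundary is not lost.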
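To make the size and overlap behaviour concrete, here is a simplified pure-Python re-implementation of the idea behind `CharacterTextSplitter`. It is a sketch under assumptions, not LangChain's actual code, and `split_text` is a hypothetical name: split on the separator, greedily merge pieces up to `chunk_size`, and start each new chunk with the trailing pieces of the previous one that fit within `chunk_overlap`.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int, separator: str = "\n\n") -> list[str]:
    """Simplified sketch of separator-based chunking with overlap (hypothetical helper)."""
    pieces = [p for p in text.split(separator) if p]

    def joined_length(parts: list[str]) -> int:
        # Length of the parts once re-joined with the separator
        return sum(len(p) for p in parts) + len(separator) * max(len(parts) - 1, 0)

    chunks: list[str] = []
    current: list[str] = []
    for piece in pieces:
        if current and joined_length(current + [piece]) > chunk_size:
            # Adding this piece would overflow: close the current chunk
            chunks.append(separator.join(current))
            # Drop leading pieces until what remains fits the overlap budget
            # and leaves room for the new piece
            while current and (
                joined_length(current) > chunk_overlap
                or joined_length(current + [piece]) > chunk_size
            ):
                current.pop(0)
        # An oversized piece still becomes its own chunk: it is never cut
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


parts = ["a" * 10, "b" * 10, "c" * 10, "d" * 10]
for chunk in split_text("\n\n".join(parts), chunk_size=30, chunk_overlap=10):
    print(repr(chunk))

oversized = split_text("x" * 50 + "\n\n" + "y" * 5, chunk_size=30, chunk_overlap=10)
print([len(c) for c in oversized])  # [50, 5]
```

With 10-character pieces, `chunk_size=30` and `chunk_overlap=10`, each chunk repeats the last piece of the previous one, and the 50-character piece comes back as a single oversized chunk, mirroring the 1,088-character warning from the real splitter.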

In [9]:
text_splitter=CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks=text_splitter.split_documents(documents=documents)

Created a chunk of size 1088, which is longer than the specified 1000


In [10]:
len(chunks)

123

## A sidenote on Embeddings and "Auto-Encoding LLMs"

We will be mapping each chunk of text into a Vector that represents the meaning of the text, known as an embedding.

OpenAI offers a model to do this, which we will use by calling their API with some LangChain code.

This model is an example of an "Auto-Encoding LLM", which generates an output given a complete input.
It's different from all the other LLMs we've discussed today, which are known as "Auto-Regressive LLMs" and generate future tokens based only on past context.

Another example of an Auto-Encoding LLM is BERT from Google. In addition to embedding, Auto-Encoding LLMs are often used for classification.
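Because an embedding places texts with similar meaning close together, "closeness" needs a measure; one common choice is cosine similarity, the cosine of the angle between two vectors. A minimal numpy sketch, assuming made-up 3-dimensional vectors in place of real embeddings (the real vectors in this notebook have 1,536 or 1,024 dimensions):

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Dot product of the vectors divided by the product of their lengths
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Made-up "embeddings" for three chunks of text
auto_policy = np.array([0.9, 0.1, 0.0])
car_claim = np.array([0.8, 0.2, 0.1])
holiday_rota = np.array([0.0, 0.1, 0.9])

print(cosine_similarity(auto_policy, car_claim))     # close to 1: similar meaning
print(cosine_similarity(auto_policy, holiday_rota))  # close to 0: unrelated
```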

### Sidenote

In week 8 we will return to RAG and vector embeddings, and we will use an open-source vector encoder so that the data never leaves our computer - that's an important consideration when building enterprise systems and the data needs to remain internal.

In [11]:
# Which document types do we have?
doc_types=set(chunk.metadata["doc_type"] for chunk in chunks)
print(doc_types)

{'employees', 'contracts', 'company', 'products'}


### Using OpenAIEmbeddings

In [12]:
openai_embeddings=OpenAIEmbeddings()

# If you would rather use free vector embeddings from HuggingFace sentence-transformers,
# replace openai_embeddings = OpenAIEmbeddings() with the lines below
# (requires `pip install langchain-huggingface`; the rest of the notebook can keep the variable name):
# from langchain_huggingface import HuggingFaceEmbeddings
# openai_embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

In [13]:
# Delete the existing collection, if any, so the vector store is rebuilt from scratch
if os.path.exists(db_name_openai):
    Chroma(persist_directory=db_name_openai, embedding_function=openai_embeddings).delete_collection()

In [14]:
# Create our Chroma vectorstore!
vectorstore_openai = Chroma.from_documents(documents=chunks, embedding=openai_embeddings, persist_directory=db_name_openai)
print(f"Vectorstore created with {vectorstore_openai._collection.count()} documents")

Vectorstore created with 123 documents


In [15]:
# Access the underlying Chroma collection
collection = vectorstore_openai._collection

# Fetch the embedding of one stored chunk
sample_embedding = collection.get(limit=1, include=["embeddings"])["embeddings"][0]

# Number of dimensions in that embedding vector
dimensions = len(sample_embedding)
print(f"The vectors have {dimensions:,} dimensions")

The vectors have 1,536 dimensions


In [16]:
sample_embedding.shape

(1536,)

In [17]:
# get first 10 values 
sample_embedding[:10]

array([-0.01149175, -0.01443085, -0.00741044, -0.00707613, -0.01926435,
        0.02113088, -0.02888955, -0.00294432, -0.02376354, -0.02328994])

### Visualize the vector store for OpenAI embeddings

In [18]:
# Prework
# get all documents from the vector store 
result = collection.get(include=['embeddings', 'documents', 'metadatas'])

# get vector embeddings in numpy array
vectors = np.array(result['embeddings'])

# retrieve documents from the result set 
documents = result['documents']

# get the document type from metadata 
doc_types = [metadata['doc_type'] for metadata in result['metadatas']]

# set color for each type; blue for products, green for employees and so on 
colors = [['blue', 'green', 'red', 'orange'][['products', 'employees', 'contracts', 'company'].index(t)] for t in doc_types]

In [19]:
colors[:10]

['blue',
 'blue',
 'blue',
 'blue',
 'blue',
 'blue',
 'blue',
 'blue',
 'blue',
 'blue']

In [20]:
# Project the vectors to 2D with t-SNE and visualize them with Plotly
tsne = TSNE(n_components=2, random_state=42)
reduced_vectors = tsne.fit_transform(vectors)

# Create the 2D scatter plot
fig = go.Figure(data=[go.Scatter(
    x=reduced_vectors[:, 0],
    y=reduced_vectors[:, 1],
    mode='markers',
    marker=dict(size=5, color=colors, opacity=0.8),
    text=[f"Type: {t}<br>Text: {d[:100]}..." for t, d in zip(doc_types, documents)],
    hoverinfo='text'
)])

fig.update_layout(
    title='2D Chroma Vector Store Visualization',
    xaxis_title='x', yaxis_title='y',  # 'scene' only applies to 3D plots, so set the 2D axis titles directly
    width=800,
    height=600,
    margin=dict(r=20, b=10, l=10, t=40)
)

fig.show()

In [21]:
# Visualize this in 3D 
tsne = TSNE(n_components=3, random_state=42)
reduced_vectors = tsne.fit_transform(vectors)

# Create the 3D scatter plot
fig = go.Figure(data=[go.Scatter3d(
    x=reduced_vectors[:, 0],
    y=reduced_vectors[:, 1],
    z=reduced_vectors[:, 2],
    mode='markers',
    marker=dict(size=5, color=colors, opacity=0.8),
    text=[f"Type: {t}<br>Text: {d[:100]}..." for t, d in zip(doc_types, documents)],
    hoverinfo='text'
)])

fig.update_layout(
    title='3D Chroma Vector Store Visualization',
    scene=dict(xaxis_title='x', yaxis_title='y', zaxis_title='z'),
    width=900,
    height=700,
    margin=dict(r=20, b=10, l=10, t=40)
)

fig.show()

# Let's try a local embedding model

In [22]:
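# mxbai-embed-large served by the local Ollama instance (download it first with `ollama pull mxbai-embed-large`)
ollama_embeddings=OllamaEmbeddings(model="mxbai-embed-large")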

In [23]:
if os.path.exists(db_name_ollama):
    Chroma(persist_directory=db_name_ollama, embedding_function=ollama_embeddings).delete_collection()

In [24]:
vectorstore_ollama = Chroma.from_documents(documents=chunks, embedding=ollama_embeddings, persist_directory=db_name_ollama)
print(f"Vectorstore created with {vectorstore_ollama._collection.count()} documents")

Vectorstore created with 123 documents


In [26]:
type(vectorstore_ollama)

langchain_chroma.vectorstores.Chroma

In [27]:
# Access the underlying Chroma collection
collection = vectorstore_ollama._collection

# Fetch the embedding of one stored chunk
sample_embedding = collection.get(limit=1, include=["embeddings"])["embeddings"][0]

# Number of dimensions in that embedding vector
dimensions = len(sample_embedding)
print(f"The vectors have {dimensions:,} dimensions")

The vectors have 1,024 dimensions


In [28]:
collection

Collection(name=langchain)

In [29]:
sample_embedding.shape

(1024,)

In [30]:
# get first 10 values 
sample_embedding[:10]

array([ 0.02466233,  0.05259123, -0.02803837,  0.01043415, -0.00500993,
        0.00884979, -0.00981181, -0.03921529, -0.00725373,  0.00947065])

In [31]:
# Prework
# get all documents from the vector store 
result = collection.get(include=['embeddings', 'documents', 'metadatas'])

# get vector embeddings in numpy array
vectors = np.array(result['embeddings'])

# retrieve documents from the result set 
documents = result['documents']

# get the document type from metadata 
doc_types = [metadata['doc_type'] for metadata in result['metadatas']]

# set color for each type; blue for products, green for employees and so on 
colors = [['blue', 'green', 'red', 'orange'][['products', 'employees', 'contracts', 'company'].index(t)] for t in doc_types]

In [32]:
# Project the vectors to 2D with t-SNE and visualize them with Plotly
tsne = TSNE(n_components=2, random_state=42)
reduced_vectors = tsne.fit_transform(vectors)

# Create the 2D scatter plot
fig = go.Figure(data=[go.Scatter(
    x=reduced_vectors[:, 0],
    y=reduced_vectors[:, 1],
    mode='markers',
    marker=dict(size=5, color=colors, opacity=0.8),
    text=[f"Type: {t}<br>Text: {d[:100]}..." for t, d in zip(doc_types, documents)],
    hoverinfo='text'
)])

fig.update_layout(
    title='2D Chroma Vector Store Visualization - Local Model',
    xaxis_title='x', yaxis_title='y',  # 'scene' only applies to 3D plots, so set the 2D axis titles directly
    width=800,
    height=600,
    margin=dict(r=20, b=10, l=10, t=40)
)

fig.show()

In [33]:
# Visualize this in 3D 
tsne = TSNE(n_components=3, random_state=42)
reduced_vectors = tsne.fit_transform(vectors)

# Create the 3D scatter plot
fig = go.Figure(data=[go.Scatter3d(
    x=reduced_vectors[:, 0],
    y=reduced_vectors[:, 1],
    z=reduced_vectors[:, 2],
    mode='markers',
    marker=dict(size=5, color=colors, opacity=0.8),
    text=[f"Type: {t}<br>Text: {d[:100]}..." for t, d in zip(doc_types, documents)],
    hoverinfo='text'
)])

fig.update_layout(
    title='3D Chroma Vector Store Visualization - Local Model',
    scene=dict(xaxis_title='x', yaxis_title='y', zaxis_title='z'),
    width=900,
    height=700,
    margin=dict(r=20, b=10, l=10, t=40)
)

fig.show()

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">PLEASE READ ME! Ignoring the Deprecation Warning</h2>
            <span style="color:#900;">When you run the next cell, you will get a LangChainDeprecationWarning 
            about the simple way we use LangChain memory. They ask us to migrate to their new approach for memory. 
            I feel quite conflicted about this. The new approach involves moving to LangGraph and getting deep into their ecosystem.
            There's a fair amount of learning and coding in LangGraph, frankly without much benefit in our case.<br/><br/>
            I'm going to think about whether/how to incorporate it in the course, but for now please ignore the Deprecation Warning and
            use the code as is; LangChain are not expected to remove ConversationBufferMemory any time soon.
            </span>
        </td>
    </tr>
</table>

# Using these vectors efficiently in our prompt

To use a local chat model instead of OpenAI, first run this in a cell: `!pip install langchain-ollama`

Then replace `llm = ChatOpenAI(temperature=0.7, model_name=MODEL)` with:

```python
from langchain_ollama import ChatOllama
llm = ChatOllama(temperature=0.7, model="llama3.2")
```
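Roughly, the `ConversationalRetrievalChain` built in the next cell does the following for each question: if there is chat history, it asks the LLM to rewrite the follow-up into a standalone question; it retrieves chunks for that question; it "stuffs" them into a prompt and asks the LLM to answer; and the memory stores the turn. The sketch below mimics that control flow only. `conversational_rag`, `fake_llm` and `fake_retriever` are hypothetical stand-ins, and the prompt wording is made up, not LangChain's actual prompts.

```python
from typing import Callable


def conversational_rag(
    question: str,
    history: list[tuple[str, str]],
    llm: Callable[[str], str],
    retriever: Callable[[str], list[str]],
) -> str:
    # 1. Condense: rewrite a follow-up as a standalone question (only when there is history)
    if history:
        transcript = "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in history)
        standalone = llm(
            f"Given this conversation:\n{transcript}\n"
            f"Rephrase the follow-up as a standalone question: {question}"
        )
    else:
        standalone = question
    # 2. Retrieve chunks relevant to the standalone question
    docs = retriever(standalone)
    # 3. Stuff the chunks into the prompt and answer
    context = "\n\n".join(docs)
    answer = llm(f"Use the context to answer.\n{context}\n\nQuestion: {standalone}")
    # 4. Memory: remember this turn for the next call
    history.append((question, answer))
    return answer


prompts_seen: list[str] = []


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for ChatOllama / ChatOpenAI: log the prompt, return a canned reply
    prompts_seen.append(prompt)
    return "standalone question" if "Rephrase" in prompt else "an answer"


def fake_retriever(query: str) -> list[str]:
    # Hypothetical stand-in for vectorstore.as_retriever()
    return ["Insurellm was founded in 2015 by Avery Lancaster."]


history: list[tuple[str, str]] = []
conversational_rag("Who founded Insurellm?", history, fake_llm, fake_retriever)
conversational_rag("When?", history, fake_llm, fake_retriever)
print(len(prompts_seen))  # 3: one LLM call on the first turn, two (condense + answer) on the second
```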

In [36]:
# Create a new chat model backed by the local Ollama LLM
llm=ChatOllama(temperature=0.7, model=OLLAMA_MODEL)

# Set up conversation memory for the chat
memory=ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# Set up a retriever over the vector store; it fetches the relevant chunks during RAG
# (uncomment the first line instead to retrieve with the local mxbai embeddings)
# retriever=vectorstore_ollama.as_retriever()
retriever=vectorstore_openai.as_retriever()

# Putting it all together
conversation_chain=ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)

In [35]:
query = "Can you describe Insurellm in a few sentences"
result = conversation_chain.invoke({"question":query})
print(result["answer"])

Insurellm is an insurance technology firm founded in 2015 by Avery Lancaster. The company offers four software products, including Carllm, Homellm, Rellm, and Marketllm, which cater to various aspects of the insurance industry, such as auto, home, reinsurance, and consumer connections. With over 300 clients worldwide and 200 employees across the US, Insurellm aims to provide innovative solutions for insurance providers through its digital platforms.


In [37]:
query = "Can you describe Insurellm in a few sentences"
result_openai = conversation_chain.invoke({"question":query})
print(result_openai["answer"])

Insurellm is an innovative insurance technology firm founded by Avery Lancaster in 2015. It offers four software products: Carllm, Homellm, Rellm, and Markellm (previously mentioned as Marketllm), serving over 300 clients worldwide across various insurance sectors. The company has grown to have 200 employees and 12 offices across the US.


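The two answers look similar. That is expected: the embedding model is only used to find the relevant chunks, and the LLM then receives those chunks as plain text, so OpenAI embeddings work fine alongside a local Llama model. What does matter is that questions are embedded with the same model that was used to build the vector store.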
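A quick numpy illustration of why the query and the stored chunks must come from the same embedding model: the OpenAI store above holds 1,536-dimensional vectors and the mxbai store 1,024-dimensional ones, so a query vector from one model cannot even be scored against the other store. Random vectors stand in for real embeddings, and the dot-product scoring in the hypothetical `top_k` helper is a simplifying assumption. Even when two models happen to share a dimensionality, their vector spaces are unrelated, so the same-model rule still applies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for the two stores built above: 123 chunks each
openai_store = rng.normal(size=(123, 1536))  # like OpenAIEmbeddings vectors
mxbai_store = rng.normal(size=(123, 1024))   # like mxbai-embed-large vectors


def top_k(store: np.ndarray, query: np.ndarray, k: int = 4) -> np.ndarray:
    # Score every stored vector against the query and return the k best row indices
    scores = store @ query
    return np.argsort(scores)[::-1][:k]


mxbai_query = rng.normal(size=1024)
print(top_k(mxbai_store, mxbai_query))  # works: same model, same dimensionality

try:
    top_k(openai_store, mxbai_query)  # query embedded with the "wrong" model
except ValueError as err:
    print("Cannot compare:", err)
```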

### Building the Gradio UI

In [38]:

def chat(message, history):
    # history is supplied by Gradio but unused: the chain keeps its own ConversationBufferMemory
    result = conversation_chain.invoke({"question": message})
    return result["answer"]

In [39]:

view = gr.ChatInterface(chat, type="messages").launch(inbrowser=True)

* Running on local URL:  http://127.0.0.1:7860
* To create a public link, set `share=True` in `launch()`.
