# Unleashing the Potential of Vector Databases: Amplifying RAG!

Ever wondered how to elevate your Retrieval Augmented Generation (RAG) with the power of a Vector Database? Well, you're in for a treat! We're about to set up a VectorDB using Qdrant (https://cloud.qdrant.io/login), and the best part is that they provide 1GB of free usage.

Here's the roadmap:

1. Start by signing up for an account on Qdrant.
2. Navigate to the "Clusters" section and create one. Ensure that you remain within the 1GB limit to enjoy the benefits of the free tier.
3. Once the cluster is created, you'll see a message saying "no API keys created for this cluster." Click on "API keys" to generate one.
4. In the API keys section, you'll find the API key itself and a collection of code samples in various programming languages for connecting to the database. For our purposes, we'll use the Python code.

By following this path, you'll harness the capabilities of a Vector Database, supercharging your RAG endeavors and opening the door to a world of enhanced responses and information retrieval.

**I have had to decrease the amount of text sent to ADA in order to account for the limits imposed on free users**

In [None]:
%pip install openai
%pip install qdrant-client
%pip install gptrim
%pip install tiktoken
%pip install langchain

In [None]:
## import our modules

from qdrant_client import QdrantClient
from qdrant_client.http import models
from qdrant_client.http.models import PointStruct
import os
from openai import OpenAI
import csv
import tiktoken
from langchain.text_splitter import RecursiveCharacterTextSplitter
import ast
from gptrim import trim
import json


# Load the vectordb with your url and apikey
qdrant_client = QdrantClient(
    url='', 
    api_key='',
)


# Set the client and api key

llm_client = OpenAI(
    api_key=''
)
# Let's set our model
MODEL = "gpt-3.5-turbo"


The cell above creates the clients that connect us to our Qdrant cluster and to OpenAI, ready for the imports and queries that follow. Now, let's tackle the task of importing a substantial text, like Homer's Odyssey, into a vector database while addressing some key challenges:

## Challenge 1 - Document Vectorization

### Document Vectorization Demystified
Document vectorization involves the conversion of textual documents into numerical vectors. These vectors typically encapsulate the semantic meaning and content of the documents. Various techniques exist for document vectorization, ranging from traditional Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) to modern approaches like Word2Vec and Doc2Vec embeddings.

### Significance of Document Vectorization Before Vector Database Import

- **Semantic Search:** Vector databases are tailored for similarity searches within high-dimensional vector spaces. Vectorizing documents equips them for semantic content comparison, enabling the prowess of semantic search.
- **Efficiency:** Text data is inherently unstructured and can vary significantly in length. By transforming text into fixed-length vectors, it becomes more manageable, storable, and searchable.
- **Comparability:** Representing documents as vectors allows for the measurement of distance or similarity between them using metrics like cosine similarity. This is indispensable for tasks such as document clustering, recommendation systems, and classification.
- **Dimensionality Reduction:** Techniques like TF-IDF or embeddings often reduce the dimensionality of the data by focusing on the most informative features. This results in more efficient storage and faster search.
- **Machine Learning Compatibility:** Numerous machine learning algorithms demand numerical input. Vectorized documents can be directly fed into these algorithms, facilitating tasks such as topic modeling, sentiment analysis, and more.

By addressing these considerations, we not only enhance the utility of our vector database but also open up possibilities for advanced search, retrieval, and machine learning applications.
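
To make "fixed-length vectors" concrete, here is a minimal sketch of the simplest technique mentioned above, Bag of Words. It is only an illustration: the helper names `build_vocabulary` and `bag_of_words` and the two toy sentences are my own assumptions, and this is not what ADA does internally.

```python
from collections import Counter


def build_vocabulary(docs):
    """Return a sorted list of every distinct lowercase word in docs."""
    return sorted({word for doc in docs for word in doc.lower().split()})


def bag_of_words(doc, vocabulary):
    """Count how often each vocabulary word occurs in doc."""
    counts = Counter(doc.lower().split())
    return [counts[word] for word in vocabulary]


# Two toy documents of different lengths (made up for illustration)
docs = [
    "the sea was angry",
    "the hero sailed the sea",
]

vocabulary = build_vocabulary(docs)
vectors = [bag_of_words(doc, vocabulary) for doc in docs]

print(vocabulary)
for doc, vector in zip(docs, vectors):
    print(vector, "<-", doc)
```

Both documents end up as vectors with one slot per vocabulary word, regardless of how long the original text was. Embedding models produce dense vectors that capture meaning rather than raw word counts, but the "text in, fixed-length list of numbers out" idea is the same.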

While Qdrant offers a straightforward CPU-based method for document vectorization, I will illustrate an approach that uses OpenAI's ADA embedding model (`text-embedding-ada-002`), keeping in mind that you may be developing your own vectorization methods using APIs or GPUs. Relying solely on the CPU may not be sufficient here, because you need to vectorize both the documents and every user-submitted query, and that takes time. You'd likely prefer near-instant responses from your chatbot over long processing waits.

In [None]:
# let us load the book (three books from the odyssey)

with open('./books/theodyssey.txt', 'r', encoding='UTF-8') as file:
    book = file.read()



## Challenge 2 - Addressing Text Length Limitations

Hold on a moment, this is quite an extensive amount of text! It's apparent that we won't be inputting all of this in a single prompt.

As we embark on the journey of document vectorization, it's imperative to grasp that whatever the Vector Database (DB) returns, whether condensed or not, will be combined with your question and any other historical prompts if you're maintaining a memory. Consequently, if you opt to house entire documents or books within a VectorDB, you're bound to encounter the constraints of maximum token limits imposed by your API. This, in turn, may lead to escalating costs, even for straightforward inquiries.

So, how do we surmount this challenge? The solution lies in segmenting our documents into smaller text snippets before commencing the vectorization process.
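
Here is a rough back-of-the-envelope sketch of that token budget. The function name `remaining_tokens` and all the numbers are illustrative assumptions, not measurements from this notebook.

```python
def remaining_tokens(context_window, chunk_tokens, n_chunks, question_tokens):
    """Tokens left for the model's answer once the prompt is assembled."""
    prompt_tokens = chunk_tokens * n_chunks + question_tokens
    return context_window - prompt_tokens


# Illustrative numbers only: a 4,096-token window, three retrieved
# snippets of 1,200 tokens each, and a 50-token question.
left = remaining_tokens(
    context_window=4096,
    chunk_tokens=1200,
    n_chunks=3,
    question_tokens=50,
)
print(f"Tokens left for the answer: {left}")
```

A negative result would mean the prompt does not fit at all. Even with small snippets the budget fills up quickly, so snippet size and the number of snippets you retrieve need to be chosen together.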

Let's press on...

In [None]:
# How many tokens would this book require? Let us use tiktoken to find out...

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
numtokens = len(encoding.encode(book))

gpt3cost = round((numtokens / 1000) * 0.0015, 2)
gpt316kcost = round((numtokens / 1000) * 0.003, 2)
gpt4cost = round((numtokens / 1000 ) * 0.03, 2)
gpt432kcost = round((numtokens / 1000) * 0.06, 2)

adacost = round((numtokens / 1000) * 0.0001, 10)


print (f'This book without any processing is:\n{numtokens} Tokens using GPT3.5 encoding')


print (f'This will cost a total of:\n${gpt3cost} GPT3.5 \n${gpt316kcost} GPT3.5 16K \n${gpt4cost} GPT4 \n${gpt432kcost} GPT4 32K')


print (f'Vectorising this book using OpenAI ADA will cost: ${adacost}')


Submitting the entire book as a single prompt clearly isn't feasible given the context limits of GPT-3.5 (roughly 4,000 or 16,000 tokens) and GPT-4 (roughly 8,000 or 32,000 tokens). So let's break the book into smaller text snippets. LangChain offers several classes for this; here we'll use `RecursiveCharacterTextSplitter`, which splits text into chunks of a bounded size (measured in tokens in the cell below, via tiktoken) so that lengthy content becomes manageable portions that a language model like GPT-3.5 can process.

**I have had to increase chunk size in order to fix the issue with the free tier of ADA**

In [None]:

# let us set our text splitter to use tiktoken encoder as a way to split our text
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=1200, chunk_overlap=120
)

# create separate text snippets from the book as a list
split_book = text_splitter.split_text(book)

print (f'Number of chunks : {len(split_book)}') # number of chunks
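
To see what `chunk_size` and `chunk_overlap` mean, here is a deliberately simplified sliding-window sketch. It is not LangChain's recursive algorithm (which also tries to split on natural separators such as paragraphs); the function `chunk_with_overlap` is a hypothetical helper, and I use words in place of tokens to keep it readable.

```python
def chunk_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split a list of tokens into windows that share `chunk_overlap` items."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Words stand in for tokens here (an assumption made for readability)
words = "tell me o muse of that ingenious hero who travelled far".split()
chunks = chunk_with_overlap(words, chunk_size=5, chunk_overlap=2)

for chunk in chunks:
    print(" ".join(chunk))
```

Each chunk starts with the last two words of the previous one. That overlap is what lets a sentence that straddles a boundary still appear intact in at least one snippet.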





In [None]:

documents = []
for snippet in split_book:
    item = {
        'source': "Homer's Odyssey",
        'text': snippet
    }

    documents.append(item)


print (documents[1]) # We have document snippets to import now

In [None]:
## Save as CSV for analysis and import later on.

# Define the fields for your CSV
fields = ['source', 'text']

# Name of the output csv file
filename = "documents.csv"

# Writing to csv file
with open(filename, 'w', newline='', encoding='utf-8') as csvfile:
    # Create a csv dict writer object
    writer = csv.DictWriter(csvfile, fieldnames=fields)

    # Write the headers
    writer.writeheader()

    # Write the data
    writer.writerows(documents)

Now that we've compiled our list of documents, composed of text snippets and saved as a CSV file, let's shift our focus to the process of vectorization. We'll be generating text embeddings and preserving the results in a CSV file, ensuring that we don't need to call ADA again.

# Understanding Embeddings
Working with textual data presents a challenge because our computers, scripts, and machine learning models lack the capacity to read and comprehend text in a human sense. To bridge this gap, we employ a technique known as "embedding" to represent data in the form of numerical vectors in a higher-dimensional space. In the context of semantic search, embeddings are crafted using deep learning models, encompassing word embeddings (e.g., Word2Vec, GloVe) for natural language data and image embeddings for visual content.

OpenAI's text embeddings play a pivotal role in gauging the relatedness of text strings. Embeddings find widespread application in various tasks, including:

- Search, where results are sorted based on their relevance to a query string.
- Clustering, where text strings are grouped according to their similarity.
- Recommendations, where items with closely related text strings are suggested.
- Anomaly detection, which identifies outliers with minimal relatedness.
- Diversity measurement, involving the analysis of similarity distributions.
- Classification, where text strings are categorized based on their closest matching label.

An embedding essentially consists of a vector (list) of floating-point numbers. The proximity between two vectors serves as a gauge of their relatedness, with shorter distances indicating higher relatedness and longer distances implying lower relatedness.
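
As a small worked example of "proximity", here is cosine similarity written out in plain Python. The three-dimensional vectors are made up for illustration (real ADA embeddings have 1,536 dimensions), and the function assumes neither vector is all zeros.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, invented for this example
query = [1.0, 0.0, 1.0]
close = [0.9, 0.1, 0.8]   # points in nearly the same direction as query
far = [0.0, 1.0, 0.0]     # perpendicular to query

print("query vs close:", round(cosine_similarity(query, close), 3))
print("query vs far:  ", round(cosine_similarity(query, far), 3))
```

A score near 1 means the vectors point the same way (highly related), while a score near 0 means they are unrelated. Cosine distance is simply one minus this similarity.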

In [None]:
# Load and process dataset
input_datapath = "./documents.csv"
output_datapath = "documents_embeddings.csv"

# Prepare a list to store the data
documents = []

with open(input_datapath, 'r', encoding='utf-8') as file:
    reader = csv.DictReader(file)
    for row in reader:
        if row["source"] and row["text"]:  # Checking if both fields are not empty
            combined = "Source: " + row["source"].strip() + "; Text: " + row["text"].strip()
            documents.append({
                "text": combined
            })

The snippet below starts the embedding process with ADA, which will take some time to complete. I have also added a counter that pauses periodically to stay within the API rate limits of the free tier.

In [None]:
import time
# Embedding model parameters
embedding_model = "text-embedding-ada-002"
embedding_encoding = "cl100k_base"
max_tokens = 8000  # the maximum for text-embedding-ada-002 is 8191

# Counter to avoid rate limits with ADA (assuming free tier)
rate_limit_counter_free = 0

# Generate embeddings and add as a separate column
for doc in documents:
    rate_limit_counter_free += 1
    if rate_limit_counter_free == 3:
        time.sleep(60)
        rate_limit_counter_free = 0
    doc['embedding'] = llm_client.embeddings.create(
        input = doc['text'],
        model = embedding_model
    ).data[0].embedding

# Save results to a CSV file
with open(output_datapath, 'w', newline='', encoding='utf-8') as file:
    fieldnames = ["text", "embedding"]
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()
    for doc in documents:
        writer.writerow(doc)
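
If you find yourself reusing the pause-every-few-requests pattern, it can be pulled out into a small generator. This is only a sketch: `throttled` is a hypothetical helper of mine, and the values 3 and 60 simply mirror the cell above rather than any documented limit. The `sleep` argument exists so the pause can be swapped out when testing.

```python
import time


def throttled(items, batch_size, pause_seconds, sleep=time.sleep):
    """Yield items one by one, pausing before each new batch after the first."""
    for index, item in enumerate(items):
        if index and index % batch_size == 0:
            sleep(pause_seconds)
        yield item


# Demonstration with a fake sleep that records pauses instead of waiting
pauses = []
processed = list(throttled(range(7), batch_size=3, pause_seconds=60, sleep=pauses.append))

print("Processed:", processed)
print("Pauses taken:", pauses)

# In the notebook it could be used like this (not executed here):
# for doc in throttled(documents, batch_size=3, pause_seconds=60):
#     doc['embedding'] = ...
```

Seven items in batches of three give two pauses, and there is no wasted pause after the final item.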

Examine both the "documents.csv" file and the one containing embeddings. We are now ready to import these embeddings into Qdrant.

This process needs to be meticulously repeated for each individual document or book that you wish to import, as well as for incoming questions, which we'll address shortly. Picture having a nightly task in place to import data from Confluence, conduct the necessary splitting, generate embeddings, and import them into the database. This ensures that your chatbot can furnish appropriate responses when a knowledge base (KB) skill is triggered.

At this point, I will create a collection in Qdrant named 'books' and define the parameters for importing documents. Think of this as similar to setting up a conventional SQL table with its columns.

In [None]:
# Let us create our collection and call it 'books'
collection_name = 'books'

qdrant_client.recreate_collection(
    collection_name=collection_name,
    vectors_config=models.VectorParams(
        size=1536,  # text-embedding-ada-002 returns 1536-dimensional vectors
        distance=models.Distance.COSINE
    )
)

## Let's check out our collection

collection_info = qdrant_client.get_collection(collection_name=collection_name)

print("Collection info:", collection_info)

With your collection now in place, head over to your Qdrant cluster's dashboard (don't forget to use your API key) to view the newly created collection.

Now, let's proceed...

In [None]:
### Import Homer's Odyssey into Qdrant


## We will build a new list using the PointStruct class and use it to import into qdrant
points = []

## We will use a counter to generate ids (it is incremented before use, so the first id is 1)
i = 0

## Load our saved CSV file...
with open('./documents_embeddings.csv', 'r', encoding='utf-8') as file:
    reader = csv.DictReader(file)
    for row in reader:
        ## Increment the counter
        i += 1

        ## The CSV stores each embedding as a string, so we use ast.literal_eval to turn it back into a list of floats.
        ## You won't need this step if you pass embeddings straight from the API, but I wanted you to have the CSV to inspect the data.
        embeddings = ast.literal_eval(row['embedding'])

        # Use PointStruct to append to the list above
        points.append(PointStruct(
            id=i,  # Our ID
            vector=embeddings,  # Our embeddings
            payload={"text": row['text']}  # Our payload, which contains the actual text
        ))
        

### LETS IMPORT

operation_info = qdrant_client.upsert(
    collection_name="books",
    wait=True,
    points=points
)

print("Operation info:", operation_info)
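
If you are wondering why the import cell needs `ast.literal_eval`, this small self-contained sketch shows the round trip. It uses an in-memory buffer instead of a file, and the sample row is made up for illustration.

```python
import ast
import csv
import io

# A made-up row with a tiny three-number "embedding"
row = {"text": "Source: demo; Text: hello", "embedding": [0.12, -0.5, 0.33]}

# Write the row to an in-memory CSV, just like writing to a file
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["text", "embedding"])
writer.writeheader()
writer.writerow(row)

# Read it back: every CSV value comes back as a string
buffer.seek(0)
loaded = next(csv.DictReader(buffer))
print(type(loaded["embedding"]).__name__, loaded["embedding"])

# literal_eval safely parses the string back into a Python list of floats
restored = ast.literal_eval(loaded["embedding"])
print(type(restored).__name__, restored)
```

`ast.literal_eval` only evaluates Python literals (lists, numbers, strings and so on), which makes it a safer choice than `eval` for this kind of data.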




Take a moment to breathe. We've successfully set up a vector database with data ready for searching. Return to your database's dashboard and select the collection to explore its contents.

Now, let's delve into Qdrant's query functions. Keep in mind that we still need to vectorize the query before we can search with it.

In [None]:
QUERY = "Why is Neptune still furious with Ulysses?"



response = llm_client.embeddings.create(
        input=QUERY,
        model="text-embedding-ada-002"
    )

embeddings = response.data[0].embedding

print (embeddings)

In [None]:
search_result = qdrant_client.search(
    collection_name="books",
    query_vector=embeddings, 
    limit=5
)

print(search_result)

As you can probably discern, the output consists of the five closest text snippets to the query I submitted. These snippets are now ready to be presented to a Language Model for the generation of a suitable response.
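
Each search result carries a similarity `score` and the `payload` we stored during import. Below is a minimal sketch of turning such results into a tidy context string. The helper `build_context`, the `min_score` threshold, and the stand-in objects are all assumptions of mine so that the example runs without a live cluster.

```python
from types import SimpleNamespace


def build_context(hits, min_score=0.0):
    """Join the text of every hit that scores at least `min_score`."""
    lines = []
    for hit in hits:
        if hit.score >= min_score:
            lines.append(f"[score {hit.score:.2f}] {hit.payload['text']}")
    return "\n".join(lines)


# Stand-ins for search results: only .score and .payload are mimicked
hits = [
    SimpleNamespace(score=0.91, payload={"text": "Neptune was still angry."}),
    SimpleNamespace(score=0.42, payload={"text": "The suitors feasted."}),
]

context = build_context(hits, min_score=0.5)
print(context)
```

Passing only the text (and perhaps the score) to the LLM, rather than the whole result object, keeps the prompt shorter and spends fewer tokens.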

Shall we proceed with encapsulating this in a skill? Let's go for it!

Similar to the internet search skill, we will craft a function tailored for knowledge base search.

In [None]:
def search_kb(query):

    response = llm_client.embeddings.create(
            input=query,
            model="text-embedding-ada-002"
        )

    embeddings = response.data[0].embedding

    search_result = str(qdrant_client.search(
        collection_name="books",
        query_vector=embeddings,
        limit=3
    ))
    search_prompt = f"""
        Based on the knowledge base snippets provided in <>, provide an answer to the query [] if it is relevant along with a source. \
        If there are no snippets or if they are not relevant then say \"Please try again.\"\
        
        context:<{trim(search_result)}>
        query:[{query}]
        """
    
    return {"role":"system","content": search_prompt}




## Requires a function 
QUESTION = "Search the knowledge base for why is Neptune still furious with Ulysses?"
## Doesn't require a function
#QUESTION = "How can i make a cup of tea?"


# Let us define the search_kb function for the LLM...
skill_definitions = [
      {
        "name": "search_kb",
        "description": "Searches the knowledge base to provide context to a query if required",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The query to search for"
                }
            },
            "required": ["query"]
        }
    }
]


messages = [
    {"role": "system", "content": "You have tools that add more up to date context to queries should they be required, otherwise act as a helpful assistant."},
    {"role": "user", "content": QUESTION}
    ]

response = llm_client.chat.completions.create(
    model="gpt-3.5-turbo-0613",
    messages=messages,
    functions=skill_definitions, # we present our skill definitions to the LLM. Think of it as a toolbox of functions :)
    function_call='auto' # auto means that we let the LLM decide when to use a function. You can also set a static function name here so that it will always call it.
)
### Let us take the JSON response and load it


### Do we need a skill?
if response.choices[0].finish_reason == 'function_call':

    # tie response with our functions
    available_functions = {
        'search_kb': search_kb
    }


    # extract function name and arguments from the LLM
    function_name = response.choices[0].message.function_call.name
    function_args = json.loads(response.choices[0].message.function_call.arguments)

    function2call = available_functions.get(function_name)

    # call it appropriately
    function_response = function2call(**function_args)

    # append the response to the message history
    messages.append(function_response)

    # send the message history to the LLM again without functions.
    new_response = llm_client.chat.completions.create(
        model="gpt-3.5-turbo-0613",
        messages=messages,
        #functions=skill_definitions,
        #function_call='auto'
    )
    
    # Result

    print ('\n\n'+str(new_response.choices[0].message.content))
    



else:
    print(response.choices[0].message.content) 
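
The dispatch step in the cell above (look up the function by name, decode the JSON arguments, call it) can be sketched on its own without calling the LLM. Here `dispatch` and `fake_search_kb` are hypothetical stand-ins of mine, and the JSON string imitates the arguments the model would return.

```python
import json


def dispatch(function_name, arguments_json, available_functions):
    """Call the named function with arguments decoded from a JSON string."""
    handler = available_functions.get(function_name)
    if handler is None:
        raise ValueError(f"Unknown function: {function_name}")
    arguments = json.loads(arguments_json)
    return handler(**arguments)


def fake_search_kb(query):
    """Stand-in for search_kb that needs no network access."""
    return {"role": "system", "content": f"context for: {query}"}


available_functions = {"search_kb": fake_search_kb}

result = dispatch(
    "search_kb",
    '{"query": "Why is Neptune angry?"}',
    available_functions,
)
print(result)
```

Raising an error for an unknown name is a design choice worth considering: in the cell above, `available_functions.get` would return `None` for a name that is not registered, and calling it would fail with a less helpful message.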

        

There you have it. We've successfully deployed a Vector Database, performed document and question vectorization, and created a Question-Answer (QA) chatbot that responds to triggers effectively :)

# Food for Thought

RAG and VectorDB aren't limited to knowledge base (KB) searches of documents. They can also serve various other purposes:

- Instead of storing numerous prompts for different procedures or relying on Python conditionals to select specific prompts, you can establish a permanent repository for company policies. For instance, in the context of customer service QA scoring, where different procedures are followed based on the initial query, you can configure your function calls to consistently access the "company_policy" function. This way, you can dynamically enhance your prompts in response to the specific inquiry (e.g., retrieve Data Protection Act (DPA) policies, create a system prompt, add to memory, and submit).
- You can employ RAG and VectorDB for the enduring storage and retrieval of user interaction history. Summarize the interactions with the LLM or preserve them as they are and store them in the database. This enables the LLM to recollect past conversations, fostering a deeper understanding and context in future interactions.