# Retrieval Augmented Generation (RAG) and Vector Databases

In [1]:
!pip install getenv openai==1.12.0

Collecting getenv
  Downloading getenv-0.2.0-py3-none-any.whl.metadata (1.2 kB)
Collecting openai==1.12.0
  Downloading openai-1.12.0-py3-none-any.whl.metadata (18 kB)
Downloading openai-1.12.0-py3-none-any.whl (226 kB)
Downloading getenv-0.2.0-py3-none-any.whl (2.6 kB)
Installing collected packages: getenv, openai
  Attempting uninstall: openai
    Found existing installation: openai 1.25.0
    Uninstalling openai-1.25.0:
      Successfully uninstalled openai-1.25.0
Successfully installed getenv-0.2.0 openai-1.12.0


In [1]:
import os
import pandas as pd
import numpy as np
import openai
import dotenv

# Load environment variables (API keys, endpoints) from a local .env file
dotenv.load_dotenv()

True

## Creating our Knowledge base

Creating an Azure Cosmos DB database


In [3]:
!pip install azure-cosmos

Collecting azure-cosmos
  Downloading azure_cosmos-4.7.0-py3-none-any.whl.metadata (70 kB)
Collecting azure-core>=1.25.1 (from azure-cosmos)
  Downloading azure_core-1.30.2-py3-none-any.whl.metadata (37 kB)
Downloading azure_cosmos-4.7.0-py3-none-any.whl (252 kB)
Downloading azure_core-1.30.2-py3-none-any.whl (194 kB)


In [4]:
## Create your Cosmos DB account with the Azure CLI using the following commands:
## az login
## az group create -n <resource-group-name> -l <location>
## az cosmosdb create -n <cosmos-db-name> -g <resource-group-name>
## az cosmosdb keys list -n <cosmos-db-name> -g <resource-group-name>

## Once done, navigate to Data Explorer and create a new database and a new container


In [11]:
from azure.cosmos import CosmosClient

# Initialize Cosmos Client
url = os.getenv('COSMOS_DB_ENDPOINT')
key = os.getenv('COSMOS_DB_KEY')
client = CosmosClient(url, credential=key)

# Select database
database_name = 'rag-cosmos-db'
database = client.get_database_client(database_name)

# Select container
container_name = 'data'
container = database.get_container_client(container_name)



In [2]:
import pandas as pd

# Load each source document into one DataFrame row (chunking happens in a later cell)
data_paths = ["data/frameworks.md", "data/own_framework.md", "data/perceptron.md"]

rows = []
for path in data_paths:
    with open(path, 'r', encoding='utf-8') as file:
        rows.append({'path': path, 'text': file.read()})

# Build the DataFrame once instead of concatenating inside the loop
df = pd.DataFrame(rows, columns=['path', 'text'])

df.head()

Unnamed: 0,path,text
0,data/frameworks.md,# Neural Network Frameworks\n\nAs we have lear...
1,data/own_framework.md,# Introduction to Neural Networks. Multi-Layer...
2,data/perceptron.md,# Introduction to Neural Networks: Perceptron\...


In [3]:
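def split_text(text, max_length, min_length):
    """Split text on whitespace into chunks of roughly min_length..max_length characters."""
    chunks = []
    current_chunk = []
    current_length = 0  # length of ' '.join(current_chunk)

    for word in text.split():
        added = len(word) + (1 if current_chunk else 0)
        # Close the chunk early if this word would push it past max_length
        if current_chunk and current_length + added > max_length:
            chunks.append(' '.join(current_chunk))
            current_chunk, current_length, added = [], 0, len(word)
        current_chunk.append(word)
        current_length += added
        # Close the chunk once it is longer than min_length
        if current_length > min_length:
            chunks.append(' '.join(current_chunk))
            current_chunk, current_length = [], 0

    # If the last chunk didn't reach the minimum length, add it anyway
    if current_chunk:
        chunks.append(' '.join(current_chunk))

    return chunks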

# Split the 'text' column of df into chunks of roughly 300 to 400 characters
splitted_df = df.copy()
splitted_df['chunks'] = splitted_df['text'].apply(lambda x: split_text(x, 400, 300))

splitted_df

Unnamed: 0,path,text,chunks
0,data/frameworks.md,# Neural Network Frameworks\n\nAs we have lear...,[# Neural Network Frameworks As we have learne...
1,data/own_framework.md,# Introduction to Neural Networks. Multi-Layer...,[# Introduction to Neural Networks. Multi-Laye...
2,data/perceptron.md,# Introduction to Neural Networks: Perceptron\...,[# Introduction to Neural Networks: Perceptron...
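
To make the length rule easier to see, here is a small self-contained sketch on a toy sentence. `chunk_words` is a hypothetical helper written for this illustration (it is not the notebook's `split_text`), and the threshold of 12 characters is an arbitrary toy value. It closes a chunk as soon as the joined words are longer than `min_length`, and keeps whatever is left over as a final short chunk.

```python
def chunk_words(text, min_length):
    """Group whitespace-separated words into chunks, closing a chunk
    once its joined length is greater than min_length characters."""
    chunks, current = [], []
    for word in text.split():
        current.append(word)
        if len(" ".join(current)) > min_length:
            chunks.append(" ".join(current))
            current = []
    if current:  # keep the short remainder as a final chunk
        chunks.append(" ".join(current))
    return chunks


sample = "one two three four five six seven eight nine ten"
toy_chunks = chunk_words(sample, min_length=12)
for chunk in toy_chunks:
    print(len(chunk), repr(chunk))
```

Because the split happens on whitespace only, joining the chunks back with spaces reproduces the whitespace-normalized input, so no words are lost between chunks.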


In [4]:
# Assuming 'chunks' is a column of lists in the DataFrame splitted_df, we will split the chunks into different rows
flattened_df = splitted_df.explode('chunks')

flattened_df.head()

Unnamed: 0,path,text,chunks
0,data/frameworks.md,# Neural Network Frameworks\n\nAs we have lear...,# Neural Network Frameworks As we have learned...
0,data/frameworks.md,# Neural Network Frameworks\n\nAs we have lear...,descent optimization While the `numpy` library...
0,data/frameworks.md,# Neural Network Frameworks\n\nAs we have lear...,should give us the opportunity to compute grad...
0,data/frameworks.md,# Neural Network Frameworks\n\nAs we have lear...,those computations on GPUs is very important. ...
0,data/frameworks.md,# Neural Network Frameworks\n\nAs we have lear...,"API, there is also higher-level API, called Ke..."


## Converting our text to embeddings

Converting our text to embeddings and storing them, chunk by chunk, in our database

In [7]:
from openai import OpenAI

API_KEY = os.getenv("OPENAI_API_KEY", "")
assert API_KEY, "ERROR: OpenAI Key is missing"

# Named openai_client so it does not overwrite the Cosmos DB `client` created earlier
openai_client = OpenAI(api_key=API_KEY)

In [8]:
def create_embeddings(text, model="text-embedding-ada-002"):
    # Return the embedding vector for a single piece of text
    embeddings = openai_client.embeddings.create(input=text, model=model).data[0].embedding
    return embeddings

# Embeddings for the first chunk (.iloc selects by position; the index label 0 repeats after explode)
create_embeddings(flattened_df['chunks'].iloc[0])

[-0.017089299857616425,
 0.002794487401843071,
 0.025382837280631065,
 -0.038929399102926254,
 0.006936165504157543,
 0.003929588943719864,
 -0.006223545875400305,
 -0.003289927961304784,
 -0.0028895034920424223,
 -0.02937350794672966,
 0.034884434193372726,
 0.02046915329992771,
 0.001446448382921517,
 0.003025240497663617,
 -0.014632458798587322,
 -0.011028639040887356,
 0.0220979992300272,
 0.009067238308489323,
 -0.029264917597174644,
 -0.020577743649482727,
 -0.03553597256541252,
 -0.0037700978573411703,
 0.0129561061039567,
 -0.03436863422393799,
 -0.030432257801294327,
 -0.0015787921147421002,
 0.015460454858839512,
 -0.043598756194114685,
 -0.007580916862934828,
 -0.014252395369112492,
 0.01970902644097805,
 0.012589615769684315,
 -0.012725352309644222,
 -0.01567763462662697,
 -0.004730437882244587,
 0.011116867884993553,
 0.0012750803725793958,
 0.008239241316914558,
 -0.0003134678700007498,
 -0.001844327780418098,
 0.04034106433391571,
 0.011340834200382233,
 -0.0098341526463

In [9]:
cat = create_embeddings("cat")
cat

[-0.007064840290695429,
 -0.017335813492536545,
 -0.009703516028821468,
 -0.03069942817091942,
 -0.012505334801971912,
 0.003071361454203725,
 -0.005107113625854254,
 -0.04122575744986534,
 -0.014612019062042236,
 -0.021308012306690216,
 0.019250981509685516,
 0.05084415152668953,
 -0.0012741184327751398,
 0.00249858433380723,
 -0.038416843861341476,
 -0.0060646976344287395,
 0.03549443930387497,
 -0.004656694363802671,
 0.0024134658742696047,
 -0.013384893536567688,
 -0.018882133066654205,
 0.00899419467896223,
 0.01586042530834675,
 -0.0087601188570261,
 -0.014654578641057014,
 0.007192518562078476,
 0.01308697834610939,
 -0.01334233395755291,
 0.0029099907260388136,
 0.0049049570225179195,
 0.004000572487711906,
 -0.016768356785178185,
 -0.015803679823875427,
 -0.043098364025354385,
 -0.027167007327079773,
 -0.004294940736144781,
 0.008022424764931202,
 -0.009973057545721531,
 0.022017333656549454,
 -0.009022567421197891,
 0.004876584280282259,
 0.00031365302857011557,
 -0.012171953

In [10]:
# create embeddings for the whole data chunks and store them in a list

embeddings = []
for chunk in flattened_df['chunks']:
    embeddings.append(create_embeddings(chunk))

# store the embeddings in the dataframe
flattened_df['embeddings'] = embeddings

flattened_df.head()

Unnamed: 0,path,text,chunks,embeddings
0,data/frameworks.md,# Neural Network Frameworks\n\nAs we have lear...,# Neural Network Frameworks As we have learned...,"[-0.017089299857616425, 0.002794487401843071, ..."
0,data/frameworks.md,# Neural Network Frameworks\n\nAs we have lear...,descent optimization While the `numpy` library...,"[-0.01474149338901043, 0.0017059656092897058, ..."
0,data/frameworks.md,# Neural Network Frameworks\n\nAs we have lear...,should give us the opportunity to compute grad...,"[-0.03686178848147392, -0.02068474516272545, 0..."
0,data/frameworks.md,# Neural Network Frameworks\n\nAs we have lear...,those computations on GPUs is very important. ...,"[-0.031714845448732376, -0.01109745167195797, ..."
0,data/frameworks.md,# Neural Network Frameworks\n\nAs we have lear...,"API, there is also higher-level API, called Ke...","[-0.008053705096244812, -0.03333533555269241, ..."
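
The notebook keeps the embeddings in the DataFrame. As a hedged sketch of how the same rows could be shaped for the Cosmos DB container, the example below builds one JSON-style document per chunk. The helper `to_cosmos_items` and the field names `chunk` and `embedding` are assumptions made for this illustration, and the toy rows stand in for `flattened_df`. It also assumes the container's partition key path is a field present in each document (for example `/id`).

```python
import uuid


def to_cosmos_items(rows):
    """Turn (path, chunk, embedding) triples into JSON-serializable documents."""
    items = []
    for path, chunk, embedding in rows:
        items.append({
            "id": str(uuid.uuid4()),  # each stored item needs a unique string id
            "path": path,
            "chunk": chunk,
            "embedding": [float(x) for x in embedding],  # plain floats, not numpy types
        })
    return items


# Toy rows standing in for flattened_df[['path', 'chunks', 'embeddings']]
toy_rows = [
    ("data/perceptron.md", "A perceptron is a binary classification model", [0.1, -0.2, 0.3]),
    ("data/frameworks.md", "Neural network frameworks", [0.0, 0.5, -0.5]),
]
items = to_cosmos_items(toy_rows)
print(items[0]["path"], len(items[0]["embedding"]))

# With a live container (assumption: `container` is the ContainerProxy
# obtained in the Cosmos DB cell), each document could be written with:
# for item in items:
#     container.upsert_item(item)
```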


## Retrieval

Vector search measures the similarity between our prompt and the chunks in the database

### Creating a search index and reranking

In [11]:
from sklearn.neighbors import NearestNeighbors

embeddings = flattened_df['embeddings'].to_list()

# Create the search index
nbrs = NearestNeighbors(n_neighbors=5, algorithm='ball_tree').fit(embeddings)

# To query the index, you can use the kneighbors method
distances, indices = nbrs.kneighbors(embeddings)

# Store the indices and distances in the DataFrame
flattened_df['indices'] = indices.tolist()
flattened_df['distances'] = distances.tolist()

flattened_df.head()

Unnamed: 0,path,text,chunks,embeddings,indices,distances
0,data/frameworks.md,# Neural Network Frameworks\n\nAs we have lear...,# Neural Network Frameworks As we have learned...,"[-0.017089299857616425, 0.002794487401843071, ...","[0, 2, 11, 3, 1]","[0.0, 0.523271729080148, 0.5282808588281044, 0..."
0,data/frameworks.md,# Neural Network Frameworks\n\nAs we have lear...,descent optimization While the `numpy` library...,"[-0.01474149338901043, 0.0017059656092897058, ...","[1, 0, 32, 2, 50]","[0.0, 0.569985387825166, 0.5920245992435789, 0..."
0,data/frameworks.md,# Neural Network Frameworks\n\nAs we have lear...,should give us the opportunity to compute grad...,"[-0.03686178848147392, -0.02068474516272545, 0...","[2, 3, 0, 5, 1]","[0.0, 0.5056048476107072, 0.523271729080148, 0..."
0,data/frameworks.md,# Neural Network Frameworks\n\nAs we have lear...,those computations on GPUs is very important. ...,"[-0.031714845448732376, -0.01109745167195797, ...","[3, 2, 0, 10, 11]","[0.0, 0.5056048476107072, 0.5463271890300337, ..."
0,data/frameworks.md,# Neural Network Frameworks\n\nAs we have lear...,"API, there is also higher-level API, called Ke...","[-0.008053705096244812, -0.03333533555269241, ...","[4, 12, 10, 8, 9]","[0.0, 0.5202010562661403, 0.5529406126323927, ..."
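
`NearestNeighbors` uses Euclidean distance by default, whereas embedding similarity is often described in terms of cosine similarity. The sketch below, using toy 3-dimensional vectors rather than real embeddings, checks the relation between the two for unit-length vectors: the squared Euclidean distance equals `2 - 2 * cosine`. Under the assumption that the embeddings are unit length (which OpenAI's documentation states for its embedding models), ranking by smallest Euclidean distance gives the same order as ranking by largest cosine similarity.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, normalized so that both have length 1
a = normalize([1.0, 2.0, 2.0])
b = normalize([2.0, 1.0, 2.0])

d = euclidean(a, b)
cos = cosine_similarity(a, b)

# For unit vectors: d^2 = |a|^2 + |b|^2 - 2 a.b = 2 - 2 cos
print(round(d * d, 6), round(2 - 2 * cos, 6))
```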


In [46]:
# Your text question
question = "what is a perceptron?"

# Convert the question to a query vector
query_vector = create_embeddings(question)

# Find the most similar documents
distances, indices = nbrs.kneighbors([query_vector])

# Print the three most similar chunks
for i in range(3):
    index = indices[0][i]  # positional index; 0 is a valid hit, so no truthiness check
    print(flattened_df['chunks'].iloc[index])
    print(flattened_df['path'].iloc[index])
    # Distances from this chunk to its own nearest neighbours (stored earlier);
    # the distance from the query to this chunk is distances[0][i]
    print(flattened_df['distances'].iloc[index])

in our model, in which case the input vector would be a vector of size N. A perceptron is a **binary classification** model, i.e. it can distinguish between two classes of input data. We will assume that for each input vector x the output of our perceptron would be either +1 or -1, depending on the class.
data/perceptron.md
[0.0, 0.5277524698843057, 0.5361884317207419, 0.5441593827523529, 0.5534469152717095]
# Introduction to Neural Networks: Perceptron One of the first attempts to implement something similar to a modern neural network was done by Frank Rosenblatt from Cornell Aeronautical Laboratory in 1957. It was a hardware implementation called "Mark-1", designed to recognize primitive geometric figures,
data/perceptron.md
[0.0, 0.45838413163887076, 0.5234371691387615, 0.5630872658767582, 0.5633781042370638]
user to adjust the resistance of a circuit. > The New York Times wrote about perceptron at that time: *the embryo of an electronic computer that [the Navy] expects will be able

## Putting it all together to answer a question

In [22]:
import os
import openai

openai.api_type = "azure"
# openai>=1.0 reads the Azure endpoint from azure_endpoint; api_base was the pre-1.0 name
openai.azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
openai.api_version = "2023-07-01-preview"
openai.api_key = os.getenv("AZURE_OPENAI_KEY")

In [49]:
# user_input = "what is a perceptron?"

# def chatbot(user_input):
#     # Convert the question to a query vector
#     query_vector = create_embeddings(user_input)

#     # Find the most similar documents
#     distances, indices = nbrs.kneighbors([query_vector])

#     # add documents to query  to provide context
#     history = []
#     for index in indices[0]:
#         history.append(flattened_df['chunks'].iloc[index])

#     # combine the history and the user input
#     history.append(user_input)
#     # create a message object
#     messages=[
#         {"role": "system", "content": "You are an AI assistant that helps with AI questions."},
#         {"role": "user", "content": history[-1]}
#     ]

#     # use chat completion to generate a response
#     response = openai.chat.completions.create(
#         model="gpt-3.5-turbo",
#         temperature=0.7,
#         max_tokens=800,
#         messages=messages
#     )

#     return response.choices[0].message

# chatbot(user_input)

['in our model, in which case the input vector would be a vector of size N. A perceptron is a **binary classification** model, i.e. it can distinguish between two classes of input data. We will assume that for each input vector x the output of our perceptron would be either +1 or -1, depending on the class.', '# Introduction to Neural Networks: Perceptron One of the first attempts to implement something similar to a modern neural network was done by Frank Rosenblatt from Cornell Aeronautical Laboratory in 1957. It was a hardware implementation called "Mark-1", designed to recognize primitive geometric figures,', 'user to adjust the resistance of a circuit. > The New York Times wrote about perceptron at that time: *the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.* ## Perceptron Model Suppose we have N features', "and to continue learning - go to Perceptron notebook. Here's an interest

ChatCompletionMessage(content='A perceptron is a type of artificial neuron or basic unit of a neural network that was developed in the 1950s. It takes multiple input signals, applies weights to these inputs, sums them up, and then passes the sum through an activation function to produce an output. Perceptrons are often used in single-layer neural networks for simple classification tasks.', role='assistant', function_call=None, tool_calls=None)

In [29]:
user_input = "what is a perceptron?"

def chatbot(user_input):
    # Convert the question to a query vector
    query_vector = create_embeddings(user_input)

    # Find the most similar documents
    distances, indices = nbrs.kneighbors([query_vector])

    # Collect the retrieved chunks to provide context
    history = []
    for index in indices[0]:
        history.append(flattened_df['chunks'].iloc[index])

    # Insert the retrieved context into the system prompt
    prompt = """You are an AI assistant that helps with AI questions. Use the following pieces of retrieved context: {doc} \n\n
    If you cannot find the answer from the pieces of context, just say that you don't know, don't try to make up an answer."""

    # str.format returns a new string, so the result must be assigned back
    prompt = prompt.format(doc="\n".join(history))
    # create a message object
    messages=[
        {"role": "system", "content": prompt},
        {"role": "user", "content": user_input}
    ]

    # use chat completion to generate a response
    response = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0.7,
        max_tokens=800,
        messages=messages
    )

    return response.choices[0].message

chatbot(user_input)

ChatCompletionMessage(content='A perceptron is a type of artificial neural network, specifically a single-layer neural network. It is one of the simplest forms of neural networks and is based on a mathematical model of a biological neuron. The perceptron takes several input values, applies weights to them, sums them up, and passes the result through an activation function to produce an output. It is commonly used for binary classification tasks and can be trained using algorithms such as the perceptron learning rule or gradient descent.', role='assistant', function_call=None, tool_calls=None)
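
The prompt-assembly step can be checked on its own, without calling any API. The following is a minimal sketch: `SYSTEM_TEMPLATE` and `build_messages` are hypothetical names introduced for this example, and the two context strings are toy stand-ins for retrieved chunks. It shows that the `{doc}` placeholder is only filled in when the result of `str.format` is kept.

```python
SYSTEM_TEMPLATE = (
    "You are an AI assistant that helps with AI questions. "
    "Use the following pieces of retrieved context:\n{doc}\n\n"
    "If you cannot find the answer in the context, say that you don't know."
)


def build_messages(question, retrieved_chunks):
    """Return chat messages with the retrieved chunks embedded in the system prompt."""
    # str.format returns a new string; the template itself is left unchanged
    system_prompt = SYSTEM_TEMPLATE.format(doc="\n".join(retrieved_chunks))
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


messages = build_messages(
    "what is a perceptron?",
    ["A perceptron is a binary classification model.", "It outputs +1 or -1."],
)
print(messages[0]["content"])
```

Printing the system message before sending it is a cheap way to confirm that the model actually receives the retrieved context.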

## Testing and evaluation

A basic example of how you can use Mean Average Precision (MAP) to evaluate the responses of your model based on their relevance.

In [28]:
from sklearn.metrics import average_precision_score

# Define your test cases
test_cases = [
    {
        "query": "What is a perceptron?",
        "relevant_responses": ["A perceptron is a type of artificial neuron.", "It's a binary classifier used in machine learning."],
        "irrelevant_responses": ["A perceptron is a type of fruit.", "It's a type of car."]
    },
    {
        "query": "What is machine learning?",
        "relevant_responses": ["Machine learning is a method of data analysis that automates analytical model building.", "It's a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention."],
        "irrelevant_responses": ["Machine learning is a type of fruit.", "It's a type of car."]
    },
    {
        "query": "What is deep learning?",
        "relevant_responses": ["Deep learning is a subset of machine learning in artificial intelligence (AI) that has networks capable of learning unsupervised from data that is unstructured or unlabeled.", "It's a type of machine learning."],
        "irrelevant_responses": ["Deep learning is a type of fruit.", "It's a type of car."]
    },
    {
        "query": "What is a neural network?",
        "relevant_responses": ["A neural network is a series of algorithms that endeavors to recognize underlying relationships in a set of data through a process that mimics the way the human brain operates.", "It's a type of machine learning."],
        "irrelevant_responses": ["A neural network is a type of fruit.", "It's a type of car."]
    }
]

# Initialize the total average precision
total_average_precision = 0

# Test the RAG application
for test_case in test_cases:
    query = test_case["query"]
    relevant_responses = test_case["relevant_responses"]
    irrelevant_responses = test_case["irrelevant_responses"]

    # Generate a response using your RAG application
    response = chatbot(query) 
    print(query)
    print(response.content)
    print("---------------")
    # Create a list of all responses and a list of true binary labels
    all_responses = relevant_responses + irrelevant_responses
    true_labels = [1] * len(relevant_responses) + [0] * len(irrelevant_responses)

    # Score each candidate 1 if it exactly matches the generated text, else 0
    # (compare against response.content: response is a ChatCompletionMessage, not a string)
    predicted_scores = [1 if resp == response.content else 0 for resp in all_responses]

    # Calculate the average precision for this query
    average_precision = average_precision_score(true_labels, predicted_scores)

    # Add the average precision to the total average precision
    total_average_precision += average_precision

# Calculate the mean average precision
mean_average_precision = total_average_precision / len(test_cases)

['in our model, in which case the input vector would be a vector of size N. A perceptron is a **binary classification** model, i.e. it can distinguish between two classes of input data. We will assume that for each input vector x the output of our perceptron would be either +1 or -1, depending on the class.', '# Introduction to Neural Networks: Perceptron One of the first attempts to implement something similar to a modern neural network was done by Frank Rosenblatt from Cornell Aeronautical Laboratory in 1957. It was a hardware implementation called "Mark-1", designed to recognize primitive geometric figures,', 'user to adjust the resistance of a circuit. > The New York Times wrote about perceptron at that time: *the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.* ## Perceptron Model Suppose we have N features', "and to continue learning - go to Perceptron notebook. Here's an interest

In [27]:
mean_average_precision

np.float64(0.5)
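
Average precision only says something when the scores actually rank the candidates. If every candidate receives the same score, the metric reduces to the share of relevant candidates, which is consistent with the 0.5 above (two relevant out of four per query). The sketch below illustrates the idea with graded scores. Both helpers are hypothetical and written for this example: `average_precision` is a plain-Python version that breaks ties by input order (so it is not a drop-in replacement for scikit-learn's function), and `token_overlap` is a crude stand-in for an embedding-based similarity such as cosine similarity over `create_embeddings` outputs. The generated answer is a made-up string, not model output.

```python
def average_precision(labels, scores):
    """Average precision for binary labels ranked by descending score.
    Ties keep their input order (an assumption of this sketch)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / hits if hits else 0.0


def token_overlap(a, b):
    """Jaccard overlap of lowercase word sets; a crude similarity proxy."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical generated answer and the candidates from the first test case
generated = "A perceptron is a binary classifier and a type of artificial neuron."
candidates = [
    "A perceptron is a type of artificial neuron.",        # relevant
    "It's a binary classifier used in machine learning.",  # relevant
    "A perceptron is a type of fruit.",                    # irrelevant
    "It's a type of car.",                                 # irrelevant
]
labels = [1, 1, 0, 0]

scores = [token_overlap(generated, c) for c in candidates]
ap = average_precision(labels, scores)
print([round(s, 3) for s in scores], ap)
```

In this toy case the second relevant candidate is ranked last because it shares few words with the generated answer, which is a reminder that word overlap is a weak proxy and an embedding-based score would likely be a better choice.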