<div id="singlestore-header" style="display: flex; background-color: rgba(255, 167, 103, 0.25); padding: 5px;">
    <div id="icon-image" style="width: 90px; height: 90px;">
        <img width="100%" height="100%" src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/header-icons/crystal-ball.png" />
    </div>
    <div id="text" style="padding: 5px; margin-left: 10px;">
        <div id="badge" style="display: inline-block; background-color: rgba(0, 0, 0, 0.15); border-radius: 4px; padding: 4px 8px; align-items: center; margin-top: 6px; margin-bottom: -2px; font-size: 80%">SingleStore Notebooks</div>
        <h1 style="font-weight: 500; margin: 8px 0 0 4px;">Using RAG with SingleStoreDB</h1>
    </div>
</div>

Vertex AI, a product by Google Cloud, offers an integrated suite of machine learning tools that allows developers to build, deploy, and scale AI models faster than ever. On the other hand, SingleStoreDB offers a fast, scalable, and SQL-compliant relational database system. By combining the power of Vertex AI's machine learning capabilities with the efficient storage and retrieval mechanisms of SingleStoreDB, we can create robust AI applications that respond to user queries in real-time.

## RAG with Google Gemini Pro and SingleStore

This example leverages the RAG Pattern in the context of the Generative AI Lifecycle Patterns depicted in this [blogpost by Dr. Ali Arsanjani](https://dr-arsanjani.medium.com/the-generative-ai-lifecycle-1b0c7d9463ec).

## What You'll Learn

* Setting up your environment with the necessary packages and credentials.
* How Vector Similarity Search can be achieved by leveraging a SingleStore database.
* How to implement the RAG Technique.
* How to work with responses from the Gemini Pro generative model on Google Vertex AI.

## Prerequisites
* Basic knowledge of Python programming.
* Familiarity with Google Cloud services and SQL databases.
* An active Google Cloud account.
* A SingleStoreDB hosted or self-managed instance.

## Setup

In [1]:
%pip install --quiet google-cloud-aiplatform singlestoredb

In [2]:
from google.colab import auth as google_auth
google_auth.authenticate_user()

import vertexai
from google.cloud import aiplatform
from vertexai.language_models import TextEmbeddingModel
from vertexai.preview import generative_models

import singlestoredb as s2
import json

from IPython.display import display, Markdown, Latex

## Parameters

In [3]:
# GCP Parameters
PROJECT = ""
LOCATION = "us-central1"

# LLM
MODEL = "gemini-pro"
TEMPERATURE = 0.1
TOP_K = 1
TOP_P = 1
MAX_OUTPUT_TOKENS = 2048

# Init AI Platform
aiplatform.init(project=PROJECT, location=LOCATION)
model = generative_models.GenerativeModel(MODEL)

# Doc Similarity Threshold
threshold = 0.7

## Connect to SingleStoreDB

In [4]:
connection = s2.connect()

## Function definitions

### About RAG

Access similar documents using semantic search. How is this done? The documents you supply are chunked, that is, split into smaller pieces (sentence by sentence, by paragraph, by page, etc.). Each chunk is converted into an embedding with an embedding model such as textembedding-gecko (this notebook uses `textembedding-gecko@003`) and stored in a vector database, such as Google's Vertex AI Vector Search or, in this notebook, SingleStoreDB. At query time the question is embedded with the same model and the most similar chunks are retrieved. Retrieval is commonly done with an Approximate Nearest Neighbor (ANN) search; the query in this notebook instead computes an exact `DOT_PRODUCT` score for every stored chunk. Supplying the retrieved chunks as context can significantly reduce the chance that the model hallucinates, and gives it enough relevant information to return more sensible, relevant completions. This process is known as Retrieval Augmented Generation, or RAG. So RAG it.
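The chunking step described above can be sketched in plain Python. This is a minimal illustration assuming paragraph-based chunking with a hypothetical character budget `max_chars`; `chunk_by_paragraph` is not part of the notebook, which works on an `embeddings` table that was already populated during ingestion.

```python
def chunk_by_paragraph(text, max_chars=500):
    """Split text on blank lines, then merge adjacent paragraphs up to max_chars.

    A single paragraph longer than max_chars becomes its own chunk (it is not split further).
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    current = ""
    for para in paragraphs:
        # Close the current chunk if appending this paragraph would exceed the budget
        if current and len(current) + 2 + len(para) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


doc = "First paragraph.\n\nSecond paragraph.\n\n\nThird one."
print(chunk_by_paragraph(doc, max_chars=40))
# → ['First paragraph.\n\nSecond paragraph.', 'Third one.']
```

Each resulting chunk would then be embedded and stored as one row; smaller budgets give more precise matches at the cost of less surrounding context per chunk.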

### RAG Steps

1. Creating an **initial prompt** from the user’s query or statement.
2. Augmenting the prompt with **context** retrieved from the Vector Store.
3. **Sending** the augmented prompt to the LLM.
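The three steps above can be sketched as a small function that receives the retriever and the LLM as plain callables. This is a hypothetical, dependency-free illustration: `rag_answer`, the shortened `TEMPLATE`, and the stand-in callables are not part of the notebook, whose real version is `ask_question`, retrieving from SingleStore and calling Gemini.

```python
from typing import Callable, List

# Hypothetical, shortened stand-in for the notebook's prompt template
TEMPLATE = "Use ONLY this context:\n{context}\n\nQuestion: {question}\nHelpful Answer:"


def rag_answer(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    # 1. Initial prompt: start from the user's question
    # 2. Augment it with context retrieved from the vector store
    context = "\n----\n".join(retrieve(question))
    prompt = TEMPLATE.format(context=context, question=question)
    # 3. Send the augmented prompt to the LLM
    return generate(prompt)


def fake_retrieve(question: str) -> List[str]:
    # Stand-in for a vector search; always returns the same placeholder chunk
    return ["Example chunk about Form W-2."]


def echo_llm(prompt: str) -> str:
    # Stand-in for the LLM; returns the prompt unchanged so it can be inspected
    return prompt


print(rag_answer("What is a form W-2 for?", fake_retrieve, echo_llm))
```

Passing the retriever and model in as callables keeps the orchestration testable without a database or network access.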

### RAG Implementation

In this example, the RAG technique is implemented by the `ask_question` function:

```python
def ask_question(query, model):
    # Vector Similarity Search
    results = query_s2(query)
    filtered_results = filter_threshold(results, threshold)
    # Check if there are documents within the threshold
    if len(filtered_results) == 0:
        return "I'm Sorry, I don't know that."
    unique_results = filter_unique_docs(filtered_results)
    # Context Preparation
    context = get_context(unique_results)
    # LLM Query
    answer = process_llm(query, context, model)
    return context, answer
```


#### Retrieval

In the first step, a vector similarity search is run against a SingleStore database in which the document embeddings have already been stored.

The database structure is the following:

**Table name**: embeddings

**Columns**:
 * *content*: Contains the text extracted from the document chunk during ingestion.
 * *vector*: The embedding generated from the content.
 * *metadata*: JSON describing the chunk, such as the page and the document name (the document name is stored under the `source` key, which the code below reads).
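The query in `query_s2` scores rows with `DOT_PRODUCT(JSON_ARRAY_PACK(...), vector)`, which suggests `vector` holds packed binary floats. SingleStore documents `JSON_ARRAY_PACK` as converting a JSON array into a blob of packed 32-bit floats; the sketch below assumes that layout is little-endian float32 and reproduces it with Python's `struct` module, which can help when checking what ends up in the `vector` column. `pack_vector` and `unpack_vector` are hypothetical helpers, and the `INSERT` statement is an illustrative assumption about how the table might be populated, not the notebook's ingestion code.

```python
import json
import struct


def pack_vector(values):
    """Pack floats as little-endian float32 (assumed to match JSON_ARRAY_PACK's output)."""
    return struct.pack(f"<{len(values)}f", *values)


def unpack_vector(blob):
    """Inverse of pack_vector: 4 bytes per float32."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


embedding = [1.0, 2.0, 3.0]
blob = pack_vector(embedding)
print(blob.hex().upper())   # → 0000803F0000004000004040
print(unpack_vector(blob))  # → [1.0, 2.0, 3.0]

# Illustrative server-side alternative: let SingleStore pack the JSON array on insert
insert_sql = (
    "INSERT INTO embeddings (content, vector, metadata) "
    "VALUES (%s, JSON_ARRAY_PACK(%s), %s)"
)
params = ("chunk text", json.dumps(embedding), json.dumps({"source": "example.pdf"}))
```

Because each float takes 4 bytes, a 768-dimensional embedding occupies 3072 bytes in this format.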


##### Results Filtering

The `threshold` parameter in the Parameters section sets the minimum similarity score a document chunk must exceed to be included in the context. A higher score means the chunk is more relevant to the query, so raising the threshold makes retrieval stricter, while lowering it lets in more, possibly less relevant, chunks.
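As a rough illustration of the threshold check, assume the stored embeddings and the query embedding are unit-length; in that case `DOT_PRODUCT` equals cosine similarity and scores fall between -1 and 1. The sketch computes such a score in plain Python and applies the same strict `>` comparison that `filter_threshold` uses. `dot`, `normalize`, and `filter_by_score` are illustrative helpers, not notebook functions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    # Scale v to unit length so that dot products become cosine similarities
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def filter_by_score(rows, threshold):
    """Keep (content, metadata, score) rows whose score is strictly above threshold."""
    return [row for row in rows if row[2] > threshold]


q = normalize([3.0, 4.0])
d = normalize([4.0, 3.0])
print(round(dot(q, d), 2))  # → 0.96

rows = [("a", {}, 0.91), ("b", {}, 0.70), ("c", {}, 0.42)]
print([r[0] for r in filter_by_score(rows, 0.7)])  # → ['a']
```

Note that a chunk scoring exactly at the threshold (`"b"` above) is excluded, because the comparison is strict.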

Additionally, the results are de-duplicated so that the same document/page combination is not added to the context more than once.

#### Prompt creation

In this case the prompt template is predefined, and the content of the documents obtained from the SingleStore database is injected into the prompt.

**Initial Prompt**

```
SYSTEM: You are an intelligent assistant helping the users with their questions.

Strictly Use ONLY the following pieces of context to answer the question at the end. Think step-by-step and then answer.

Do not try to make up an answer:
 - If the answer to the question cannot be determined from the context alone, say "I cannot determine the answer to that."
 - If the context is empty, just say "I do not know the answer to that."

=============
{context}
=============

Question: {question}
Helpful Answer:
```
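The template uses Python `str.format` placeholders (`{context}`, `{question}`), which is how `process_llm` fills it. A short, hypothetical example of that substitution follows; `demo_template` and `demo_context` are shortened stand-ins, named so they do not overwrite the notebook's `template`. Only the template is parsed, so braces inside the retrieved text are safe, while literal braces in the template itself must be doubled.

```python
# Hypothetical, shortened stand-in for the notebook's `template`
demo_template = """Use ONLY the following context. Answer in plain text, not {{JSON}}.
=============
{context}
=============

Question: {question}
Helpful Answer:
"""

# Retrieved text may contain braces; format() does not re-parse substituted values
demo_context = (
    "\n----\nThis information is contained in the document example.pdf\n--\n"
    "A chunk with {braces} in it"
)
prompt = demo_template.format(question="What is a form W-2 for?", context=demo_context)
print(prompt)
```

In the output, `{{JSON}}` appears as `{JSON}`, and `{braces}` from the context is passed through untouched.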

Once the LLM returns an answer, it can be passed through a second prompt for verification, in which the LLM assesses the answer and assigns it a score. This step is optional: the notebook defines `verification_template`, but `ask_question` does not call it. It is still a good idea to consider verifying results.

**Verification Prompt**

```
Is the following answer a good answer to the following question? Return the answer as a value from 0 to 5, where 0 is not a good answer and 5 is a good answer.
Provide an explanation of why you used that score.

QUESTION: {question}

ANSWER: {answer}

Answer:
```
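The verification prompt asks for a score from 0 to 5 plus an explanation, but nothing forces the model to reply in a fixed format. If you wire verification in, you will likely want to extract the score programmatically. A minimal, heuristic sketch (the helper `parse_verification_score` is hypothetical, not part of the notebook) takes the first standalone digit from 0 to 5:

```python
import re
from typing import Optional


def parse_verification_score(text: str) -> Optional[int]:
    """Return the first standalone digit 0-5 in the verification reply, or None.

    Heuristic only: the reply format is not guaranteed by the prompt.
    """
    match = re.search(r"\b([0-5])\b", text)
    return int(match.group(1)) if match else None


print(parse_verification_score("Score: 4. The answer cites the context."))  # → 4
print(parse_verification_score("I would rate this a 5/5."))               # → 5
print(parse_verification_score("No score given."))                       # → None
```

A reply that restates the scale before giving its score ("on a scale of 0 to 5, ...") would be misread as 0, so asking the model to begin its reply with the score makes this parsing more reliable.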

#### Result presentation

Finally, results are presented by the `format_answer` function, which uses Markdown to display the answer, with the retrieved context in a collapsible section.
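The Markdown that `format_answer` renders can be built and inspected without IPython. This sketch mirrors its escaping and layout; `escape_markdown` and `build_answer_markdown` are hypothetical helpers. `$` is escaped because Jupyter's Markdown renderer treats it as a LaTeX delimiter, and `#` because it can turn a line into a heading.

```python
def escape_markdown(text: str) -> str:
    """Backslash-escape '$' (LaTeX delimiter) and '#' (heading marker) for Markdown display."""
    return text.replace("$", r"\$").replace("#", r"\#")


def build_answer_markdown(answer_text: str, context: str) -> str:
    """Build a Markdown string laid out like the one format_answer displays."""
    return (
        f"### Answer\n {escape_markdown(answer_text)}\n\n"
        f"<details><summary>Context</summary>{escape_markdown(context)}</details>"
    )


md = build_answer_markdown("It costs $5.", "# heading-like line")
print(md)
```

Only the inserted answer and context are escaped; the `### Answer` heading added by the function itself is left intact.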

In [5]:
def text_embedding(query):
    """Text embedding with a Large Language Model."""
    model = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")
    # Embed a single query string and return its vector as a list of floats
    embeddings = model.get_embeddings([query])
    return embeddings[0].values


def query_s2(query, num_rows=4):
    """Return the num_rows chunks most similar to the query as (content, metadata, score) rows."""
    query_embeddings = json.dumps(text_embedding(query))
    # Score every stored vector against the packed query vector; higher is more similar
    statement = """
            SELECT
              content, metadata,
                DOT_PRODUCT(JSON_ARRAY_PACK(%s), vector) AS score
            FROM embeddings
            ORDER BY score DESC LIMIT %s
    """

    # Execute the SQL statement; errors propagate instead of silently returning None
    with connection.cursor() as cursor:
        cursor.execute(statement, (query_embeddings, num_rows))
        return cursor.fetchall()


def filter_threshold(results,threshold):
    filtered_results = []
    for doc in results:
        if doc[2] > threshold:
            filtered_results.append(doc)
    return filtered_results


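def filter_unique_docs(filtered_results):
    """Keep only the first (highest-scoring) chunk for each document/page pair."""
    unique_docs = []
    seen = set()
    for result in filtered_results:
        metadata = result[1]
        # 'source' is the document name; the 'page' key is assumed and falls back to None
        key = (metadata['source'], metadata.get('page'))
        if key not in seen:
            unique_docs.append(result)
            seen.add(key)
    return unique_docs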


def get_context(unique_results):
    context = ""
    for doc in unique_results:
        text = doc[0]
        doc_metadata = doc[1]
        context = context+"\n----"
        context = context+f"\nThis information is contained in the document {doc_metadata['source']}"
        context = context+"\n--"
        context = context+"\n"+text
    return context


def process_llm(query, context, model):
    responses = model.generate_content(
        [template.format(question=query, context=context)],
        generation_config={
            "max_output_tokens": MAX_OUTPUT_TOKENS,
            "temperature": TEMPERATURE,
            "top_p": TOP_P,
            "top_k": TOP_K,
        },
        stream=False)
    return responses.candidates[0].content.parts[0]


def ask_question(query,model):
    # Vector Similarity Search
    results = query_s2(query)
    filtered_results = filter_threshold(results, threshold)
    # Check if there are documents within the threshold
    if len(filtered_results)==0:
        return "I'm Sorry, I don't know that."
    unique_results = filter_unique_docs(filtered_results)
    # Context Preparation
    context = get_context(unique_results)
    # LLM Query
    answer = process_llm(query,context,model)
    return context, answer

# LLM Query
template = """SYSTEM: You are an intelligent assistant helping the users with their questions.

Strictly Use ONLY the following pieces of context to answer the question at the end. Think step-by-step and then answer.

Do not try to make up an answer:
 - If the answer to the question cannot be determined from the context alone, say "I cannot determine the answer to that."
 - If the context is empty, just say "I do not know the answer to that."

=============
{context}
=============

Question: {question}
Helpful Answer:
"""

# Verification Query
verification_template = """
Is the following answer a good answer to the following question? Return the answer as a value from 0 to 5, where 0 is not a good answer and 5 is a good answer.
Provide an explanation of why you used that score.

QUESTION: {question}

ANSWER: {answer}

Answer:
"""


def format_answer(answer):
    """Render the answer (and, when available, its context) as Markdown."""
    if isinstance(answer, tuple):
        context, response = answer
        # Escape '$' (LaTeX) and '#' (headings) so Markdown shows them literally
        answer_sanitized = str(response.text).replace("$", r"\$").replace("#", r"\#")
        context_sanitized = context.replace("$", r"\$").replace("#", r"\#")
        display(Markdown(f"### Answer\n {answer_sanitized}\n\n<details><summary>Context</summary>{context_sanitized}</details>"))
    else:
        display(Markdown(f"### Answer\n {answer}"))

## Examples

In [6]:
query = "What is a form W-2 for?"
format_answer(ask_question(query, model))

In [7]:
query = "What is the fastest way to get a tax refund?" #@param {type:"string"}
format_answer(ask_question(query, model))

In [8]:
query = "what is a form 8922?" #@param {type:"string"}
format_answer(ask_question(query, model))

In [9]:
query = "Should I buy a yacht?" #@param {type:"string"}
format_answer(ask_question(query, model))

<div id="singlestore-footer" style="background-color: rgba(194, 193, 199, 0.25); height:2px; margin-bottom:10px"></div>
<div><img src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/singlestore-logo-grey.png" style="padding: 0px; margin: 0px; height: 24px"/></div>