# Azure Cache for Redis Vector Database

This sample shows how to connect to an existing Redis database with the RediSearch module installed, create embeddings with OpenAI,
create an index, load the vectors into the vector database, and query the top k results.

## Prerequisites

Set up the Redis database that will serve as the vector database by following the [Azure Cache for Redis](https://learn.microsoft.com/en-us/azure/azure-cache-for-redis/quickstart-create-redis-enterprise) quickstart. When creating the cache:

1. On the Advanced page, select the Modules drop-down and select RediSearch. This will create your cache with the RediSearch module installed. You can also install the other modules if you’d like.
2. In the Zone redundancy section, select Zone redundant (recommended). This will give your cache even greater availability.
3. In the Non-TLS access only section, select Enable. The connection code in this sample does not use TLS.
4. For Clustering Policy, select Enterprise. RediSearch is only supported using the Enterprise cluster policy.
5. Select Review + create and finish creating your cache instance.

- Install the Python libraries using `pip install -r requirements.txt`
- Enter your Redis environment variables `REDIS_HOST`, `REDIS_PORT`, and `REDIS_PASSWORD` in `example.env`
- Enter your OpenAI environment variables in `example.env`

In [None]:
import json
import requests
import numpy as np
import pandas as pd
import openai
from dotenv import dotenv_values
from redis import Redis
from redis.commands.search.field import VectorField, TextField, TagField
from redis.commands.search.query import Query
from redis.commands.search.result import Result

### Load environment variables and keys

In [None]:
# Specify the name of the .env file
env_name = "example.env"
config = dotenv_values(env_name)

redis_host = config['REDIS_HOST']
redis_port = int(config['REDIS_PORT'])  # dotenv_values returns strings
redis_passwd = config['REDIS_PASSWORD']

openai_api_key = config['openai_api_key']
openai_api_base = config['openai_api_base']
openai_api_version = config['openai_api_version']
openai_deployment_embedding = config['openai_deployment_embedding']

ITEM_KEYWORD_EMBEDDING_FIELD='item_keyword_vector'

# We are using text-embedding-ada-002
embedding_length = 1536
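
The cell above reads each setting with `config[...]`, so an entry missing from `example.env` surfaces only as a bare `KeyError`. As a minimal sketch, the check below lists every missing or empty key up front. The helper name `missing_keys` is hypothetical, the key list is simply the set of names this notebook reads (including `openai_deployment_completion`, used in the optional last section), and the sample values are made up.

```python
# Keys this notebook reads from example.env
REQUIRED_KEYS = [
    "REDIS_HOST", "REDIS_PORT", "REDIS_PASSWORD",
    "openai_api_key", "openai_api_base", "openai_api_version",
    "openai_deployment_embedding", "openai_deployment_completion",
]

def missing_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in config (hypothetical helper)."""
    return [key for key in required if not config.get(key)]

# Illustrative values only; in the notebook, pass the dict returned by dotenv_values
sample_config = {
    "REDIS_HOST": "example.redis.local",
    "REDIS_PORT": "10000",
    "REDIS_PASSWORD": "",
}
print(missing_keys(sample_config))
```

Running `missing_keys(config)` right after `dotenv_values(env_name)` and stopping if the list is non-empty would give a clearer error than a failure partway through the notebook.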

### Establish a connection to the database

In [None]:
redis_conn = Redis(
    host=redis_host,
    port=redis_port,
    password=redis_passwd,
)
# Redis() connects lazily, so ping the server to confirm it is reachable
redis_conn.ping()
print('Connected to Redis')

### Prep the data

In [None]:
df = pd.read_csv('../../Dataset/Reviews_small.csv')

In [None]:
df.head()

In [None]:
NUMBER_PRODUCTS = len(df['Id'])

### Create content and generate embeddings using OpenAI text-embedding-ada-002

In [None]:
# We will combine productid, score, and text into a single field to run embeddings on
df['combined'] = 'productid: ' + df['ProductId'] + ' ' + 'score: ' + df['Score'].astype(str) + ' ' + 'text: ' + df['Text']
df['combined'].head()

In [None]:
openai.api_type = "azure"
openai.api_key = openai_api_key
openai.api_base = openai_api_base
openai.api_version = openai_api_version

def createEmbeddings(text):
    # Request an embedding for the text from the Azure OpenAI embedding deployment
    response = openai.Embedding.create(input=text, engine=openai_deployment_embedding)
    embeddings = response['data'][0]['embedding']
    return embeddings

df['embedding'] = None
# iterate over the dataframe and create embeddings for each row
for index, row in df.iterrows():
    df.at[index, 'embedding'] = createEmbeddings(row['combined'])
    
df.head()

### Create a flat vector index over the embedding and review fields

In [None]:
def create_flat_index(redis_conn, vector_field_name, number_of_vectors, vector_dimensions=512, distance_metric='L2'):
    # Create the default index with one FLAT vector field plus the review metadata fields
    redis_conn.ft().create_index([
        VectorField(vector_field_name, "FLAT", {
            "TYPE": "FLOAT32",
            "DIM": vector_dimensions,
            "DISTANCE_METRIC": distance_metric,
            "INITIAL_CAP": number_of_vectors,
            "BLOCK_SIZE": number_of_vectors,
        }),
        TagField("Id"),
        TagField("ProductId"),
        TagField("UserId"),
        TagField("ProfileName"),
        TagField("HelpfulnessNumerator"),
        TagField("HelpfulnessDenominator"),
        TagField("Score"),
        TagField("Time"),
        TagField("Summary"),
        TextField("Text"),
        TextField("combined"),
    ])

In [None]:
# Flush all data (WARNING: this deletes every key in the Redis instance)
redis_conn.flushall()

# Create flat index
create_flat_index(redis_conn, ITEM_KEYWORD_EMBEDDING_FIELD, NUMBER_PRODUCTS, embedding_length, 'COSINE')

### Store the embeddings in Redis Vector Database

[Azure Cache for Redis](https://learn.microsoft.com/en-us/azure/azure-cache-for-redis/quickstart-create-redis-enterprise) with the RediSearch module provides a simple interface to create a vector database and to store and retrieve data using vector search. For background, see this [introduction to vector similarity search](https://mlops.community/vector-similarity-search-from-basics-to-production/) and this [blog post](https://lablab.ai/t/efficient-vector-similarity-search-with-redis-a-step-by-step-tutorial) comparing flat and HNSW indexes. This sample uses a flat index.
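
To make the flat index concrete: a flat index performs an exhaustive search, comparing the query against every stored vector, which is exact but scales linearly with the number of vectors. The pure-Python sketch below illustrates that idea using cosine distance (1 minus cosine similarity). It is not how RediSearch is implemented, and the function names and toy vectors are made up for illustration.

```python
import math

def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def flat_knn(query, vectors, k):
    """Score every stored vector against the query and keep the k closest."""
    scored = [(cosine_distance(query, vec), key) for key, vec in vectors.items()]
    scored.sort()  # smallest distance first
    return [(key, dist) for dist, key in scored[:k]]

# Toy two-dimensional vectors standing in for stored embeddings
toy_vectors = {
    "product:0": [1.0, 0.0],
    "product:1": [0.0, 1.0],
    "product:2": [1.0, 1.0],
}
nearest = flat_knn([1.0, 0.1], toy_vectors, k=2)
print(nearest)
```

An HNSW index trades some of this exactness for speed by searching a graph of neighbors instead of scanning every vector.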

In [None]:
def load_vectors(client: Redis, product_metadata, vector_dict, vector_field_name):
    # Queue all writes in a non-transactional pipeline and send them in one batch
    p = client.pipeline(transaction=False)
    for index in product_metadata.keys():
        # Hash key
        key = 'product:' + str(index) + ':' + product_metadata[index]['UserId']

        # Hash values: the row's metadata plus the embedding as raw float32 bytes
        item_metadata = product_metadata[index]
        item_keywords_vector = np.array(vector_dict[index]).astype(np.float32).tobytes()
        item_metadata[vector_field_name] = item_keywords_vector

        # HSET
        p.hset(key, mapping=item_metadata)

    p.execute()

In [None]:
product_metadata = df.drop('embedding', axis=1).head(NUMBER_PRODUCTS).to_dict(orient='index')

In [None]:
product_metadata[0]

In [None]:
load_vectors(redis_conn, product_metadata, df['embedding'], ITEM_KEYWORD_EMBEDDING_FIELD)
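
`load_vectors` stores each embedding as raw float32 bytes, so a 1536-dimensional vector occupies 1536 × 4 = 6144 bytes in the hash field, and the `TYPE` and `DIM` given to the index have to match that layout. The sketch below illustrates the round trip using the standard-library `array` module as a stand-in for `np.array(...).astype(np.float32).tobytes()`, on the assumption that both produce the same native-endian 4-byte floats. The embedding values are made up.

```python
from array import array

embedding_length = 1536  # dimension of text-embedding-ada-002, as in the notebook

# Made-up embedding values, for illustration only
embedding = [0.001 * i for i in range(embedding_length)]

# Pack the values as single-precision floats
blob = array('f', embedding).tobytes()
print(len(blob))  # embedding_length * 4 where a C float is 4 bytes

# Decode the bytes back into floats
decoded = array('f')
decoded.frombytes(blob)
print(len(decoded))
```

In the notebook itself, `np.frombuffer(blob, dtype=np.float32)` would be the NumPy way to decode a stored vector, for example to spot-check a hash after loading.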

### User Query

In [None]:
userQuestion = "Great taffy"
retrieve_k = 3 # Retrieve the top 3 documents from vector database

In [None]:
# Generate the embedding for the question and pack it as float32 bytes, matching the stored vectors
questionEmbedding = createEmbeddings(userQuestion)
questionEmbedding = np.array(questionEmbedding).astype(np.float32).tobytes()

In [None]:
# Prepare the query: KNN search over all documents ("*"), closest matches first
q = (
    Query(f'*=>[KNN {retrieve_k} @{ITEM_KEYWORD_EMBEDDING_FIELD} $vec_param AS vector_score]')
    .sort_by('vector_score')
    .paging(0, retrieve_k)
    .return_fields(
        'Id', 'ProductId', 'UserId', 'ProfileName', 'HelpfulnessNumerator', 'HelpfulnessDenominator', 'Score', 'Time',
        'Summary', 'Text', 'combined', 'vector_score',
    )
    .dialect(2)
)
params_dict = {"vec_param": questionEmbedding}

# Execute the query
results = redis_conn.ft().search(q, query_params=params_dict)
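
The query string has two parts: a pre-filter before `=>` (here `*`, which matches every document) and the KNN clause, which asks for the `retrieve_k` nearest neighbors of the vector passed as `$vec_param` and exposes the distance as `vector_score`. As a sketch, the hypothetical helper `build_knn_query` below shows how the same string could be assembled with an optional tag filter, for example restricting the search to reviews whose `Score` tag is 5. The filter value is an assumption chosen for illustration.

```python
def build_knn_query(k, vector_field, prefilter="*",
                    param_name="vec_param", score_alias="vector_score"):
    """Assemble a KNN query string (hypothetical helper, not part of redis-py)."""
    # Wrap any real filter expression in parentheses; "*" is used as-is
    if prefilter != "*":
        prefilter = f"({prefilter})"
    return f"{prefilter}=>[KNN {k} @{vector_field} ${param_name} AS {score_alias}]"

# Same query as in the notebook: search across all documents
all_docs_query = build_knn_query(3, "item_keyword_vector")
print(all_docs_query)

# Hybrid query: only consider documents tagged with Score 5
five_star_query = build_knn_query(3, "item_keyword_vector", prefilter="@Score:{5}")
print(five_star_query)
```

Either string can be passed to `Query(...)` in place of the literal above, keeping the same `sort_by`, `return_fields`, and `dialect(2)` calls.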

### Retrieve text from database

In [None]:
df_retrieved = pd.DataFrame()
for product in results.docs:
    print('*************** Product found ***************')
    print(product.combined)
    print('vector_score: ', product.vector_score)
    
    df_retrieved = pd.concat([df_retrieved, pd.DataFrame([product.__dict__], columns=product.__dict__.keys())])

In [None]:
df_retrieved
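
Because the index was created with the `COSINE` metric, `vector_score` is a cosine distance (1 minus cosine similarity), so smaller values mean closer matches. redis-py also typically returns document fields as strings, so the score needs converting before any arithmetic. A minimal sketch of turning scores into similarities follows; the helper name and the sample scores are made up for illustration.

```python
def to_similarity(vector_score):
    """Convert a cosine distance (possibly given as a string) to a cosine similarity."""
    return 1.0 - float(vector_score)

# Made-up scores in the string form returned for document fields
sample_scores = ["0.125", "0.25", "0.5"]
similarities = [to_similarity(score) for score in sample_scores]
print(similarities)  # [0.875, 0.75, 0.5]
```

In the notebook, applying the same conversion to `df_retrieved['vector_score']` would make it possible to filter out weak matches with a similarity threshold.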

## OPTIONAL: Offer Response to User's Question

To offer a response, you can either use a simple prompt, as shown below, or use a library such as [langchain](https://python.langchain.com/en/latest/index.html).

In [None]:
# create a prompt template 
template = """
    context :{context}
    Answer the question based on the context above. Provide the product id associated with the answer as well. If the
    information to answer the question is not present in the given context then reply "I don't know".
    Query: {query}
    Answer: """

In [None]:
# Create context for the prompt by combining the productid, score, and text of retrieved rows
df_retrieved['combined'] = 'productid: ' + df_retrieved['ProductId'] + ' ' + 'score: ' + df_retrieved['Score'].astype(str) + ' ' + 'text: ' + df_retrieved['Text']
context = '\n'.join(df_retrieved['combined'])

print(context)

In [None]:
prompt = template.format(context=context, query=userQuestion)
print(prompt)

In [None]:
response = openai.Completion.create(
    engine=config["openai_deployment_completion"],
    prompt=prompt,
    max_tokens=1024,
    n=1,
    stop=None,
    temperature=1,
)

print(response['choices'][0]['text'])