# Redis

Notes:

- Source notebook: https://github.com/RedisAI/vecsim-demo/blob/master/SemanticSearch1k.ipynb
- Used Redis Cloud
- Available in the Brazil region on AWS
- Redis has no metric named "dot product", but it has the inner product (`IP`), which is the same operation (ref: https://www.youtube.com/watch?v=WC9YW1ya31o)
- Redis has HNSW and also FLAT indexes. The FLAT index solves KNN queries exactly (brute force), while HNSW answers them approximately (ANN)
- It is possible to tune HNSW with additional parameters (need to check if this is possible with SAI) (https://redis.io/docs/interact/search-and-query/advanced-concepts/vectors/#hnsw)
- I could load only 884 records with the free account
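
On the metric note above: for unit-length vectors the inner product coincides with cosine similarity. Below is a minimal NumPy sketch of the quantities behind the `L2`, `IP`, and `COSINE` options. The vectors and function names are made up for illustration, and how Redis converts these quantities into the score it returns should be checked against the Redis docs.

```python
import numpy as np

# Two made-up vectors, for illustration only.
a = np.array([1.0, 2.0, 2.0], dtype=np.float32)
b = np.array([2.0, 0.0, 1.0], dtype=np.float32)

def l2_distance(u, v):
    """Euclidean distance between u and v."""
    return float(np.linalg.norm(u - v))

def inner_product(u, v):
    """Inner (dot) product of u and v."""
    return float(np.dot(u, v))

def cosine_similarity(u, v):
    """Inner product divided by the product of the vector norms."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# After normalizing to unit length, inner product and cosine similarity agree.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
print(inner_product(a_unit, b_unit), cosine_similarity(a, b))
```

If embeddings are normalized before loading, `IP` and `COSINE` should therefore rank results identically.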

In [1]:
pip install redis -q

Note: you may need to restart the kernel to use updated packages.


In [1]:
import random
import numpy as np
import pandas as pd
import time
from redis import Redis
from redis.commands.search.field import VectorField
from redis.commands.search.field import TextField
from redis.commands.search.field import TagField
from redis.commands.search.query import Query
from redis.commands.search.result import Result

class color:
   PURPLE = '\033[95m'
   CYAN = '\033[96m'
   DARKCYAN = '\033[36m'
   BLUE = '\033[94m'
   GREEN = '\033[92m'
   YELLOW = '\033[93m'
   RED = '\033[91m'
   BOLD = '\033[1m'
   UNDERLINE = '\033[4m'
   END = '\033[0m'

In [2]:
from dotenv import load_dotenv
import os
load_dotenv(override=True)

True

In [3]:
# Connect REDIS
redis_conn = Redis(
  host=os.environ["REDIS_HOST"],
  port=13679,
  password=os.environ["REDIS_PASSWORD"],)

In [4]:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('sentence-transformers/all-distilroberta-v1')



In [5]:
# Load products
NUMBER_PRODUCTS=1000

#Load Product data and truncate long text fields
all_prods_df = pd.read_csv("./data/products_data_sample.txt")
all_prods_df['primary_key'] = all_prods_df['item_id'] + '-' + all_prods_df['domain_name']
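# assign the result back; calling replace(..., inplace=True) on a selected column may not update the DataFrame in recent pandas
all_prods_df['item_keywords'] = all_prods_df['item_keywords'].replace('', np.nan)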
all_prods_df.dropna(subset=['item_keywords'], inplace=True)
all_prods_df.reset_index(drop=True,inplace=True)

#get the first 1000 products with non-empty item keywords
product_metadata = all_prods_df.head(NUMBER_PRODUCTS).to_dict(orient='index')

In [6]:
all_prods_df.head()


| Unnamed: 0 | item_id | marketplace | country | main_image_id | domain_name | bullet_point | item_keywords | material | brand | color | item_name | model_name | model_number | product_type | primary_key |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | B07T6RZ2CM | Amazon | IN | 71dZhpsferL | amazon.in | 3D Printed Hard Back Case Mobile Cover for Len... | mobile cover back cover mobile case phone case... | | Amazon Brand - Solimo | Others | Amazon Brand - Solimo Designer Couples Sitting... | Lenovo K4 Note | gz8115-SL40423 | CELLULAR_PHONE_CASE | B07T6RZ2CM-amazon.in |
| 1 | B07T2JY31Y | Amazon | IN | 71vX7qIEAIL | amazon.in | 3D Printed Hard Back Case Mobile Cover for Son... | mobile cover back cover mobile case phone case... | Wood | Amazon Brand - Solimo | others | Amazon Brand - Solimo Designer Leaf on Wood 3D... | Sony Xperia Z1 L39H | gz8056-SL40528 | CELLULAR_PHONE_CASE | B07T2JY31Y-amazon.in |
| 2 | B0849YGSCZ | Amazon | AE | A1EZF-2mB5L | amazon.ae | | small de fur rooms navidad woven girls shag pa... | | Stone & Beam | | Stone & Beam Contemporary Doily Wool Farmhouse... | | I59I8044IVYGRYC00-Parent | HOME_FURNITURE_AND_DECOR | B0849YGSCZ-amazon.ae |
| 3 | B081K6TCML | Amazon | IN | 81o9EyZ-fAL | amazon.in | Solimo Plastic Multipurpose Modular Drawer; sm... | drawer modular drawer 3 rack modular drawer ki... | Plastic | Amazon Brand - Solimo | Multicolor | Amazon Brand - Solimo Plastic Multipurpose Mod... | | sol_cujo_13 | HOME | B081K6TCML-amazon.in |
| 4 | B0854774X5 | Amazon | IN | 81xaJCVnl3L | amazon.in | Snug fit for Nokia 8.1, with perfect cut-outs ... | Back Cover Designer Case Designer Take It Easy... | Silicon | Amazon Brand - Solimo | Multicolor | Amazon Brand - Solimo Designer Take It Easy UV... | Nokia 8.1 | UV10714-SL40617 | CELLULAR_PHONE_CASE | B0854774X5-amazon.in |


In [8]:
%%time
# Generating embeddings

item_keywords =  [product_metadata[i]['item_keywords']  for i in product_metadata.keys()]
item_keywords_vectors = [ model.encode(sentence) for sentence in item_keywords]

CPU times: user 1min 39s, sys: 22.4 s, total: 2min 1s
Wall time: 41.3 s


In [9]:
print(f"""Len item_keywords_vectors: {len(item_keywords_vectors)} """)
print(f"""Len product_metadata: {len(product_metadata)} """)
product_metadata[0]

Len item_keywords_vectors: 731 
Len product_metadata: 731 


{'item_id': 'B07T6RZ2CM',
 'marketplace': 'Amazon',
 'country': 'IN',
 'main_image_id': '71dZhpsferL',
 'domain_name': 'amazon.in',
 'bullet_point': '3D Printed Hard Back Case Mobile Cover for Lenovo K4 Note Easy to put & take off with perfect cutouts for volume buttons, audio & charging ports. Stylish design and appearance, express your unique personality. Extreme precision design allows easy access to all buttons and ports while featuring raised bezel to life screen and camera off flat surface. Slim Hard Back Cover No Warranty None',
 'item_keywords': 'mobile cover back cover mobile case phone case mobile panel phone panel Lenovo mobile case Lenovo phone cover Lenovo back case hard case 3D printed mobile cover mobile cover back cover mobile case phone case mobile panel phone panel Lenovo mobile case Lenovo phone cover Lenovo back case hard case 3D printed mobile cover mobile cover back cover mobile case phone case mobile panel phone panel Lenovo mobile case Lenovo phone cover Lenovo 

In [10]:
def load_vectors(client: Redis, product_metadata, vector_dict, vector_field_name):
    # queue all HSET commands in a non-transactional pipeline and send them in one batch
    p = client.pipeline(transaction=False)
    for index in product_metadata.keys():
        # hash key
        key = 'product:' + str(index) + ':' + product_metadata[index]['primary_key']

        # hash values: copy the metadata so the caller's dict is not mutated
        # (otherwise a second load would also store the previous vector field)
        item_metadata = dict(product_metadata[index])
        # the vector is stored as raw float32 bytes
        item_metadata[vector_field_name] = vector_dict[index].astype(np.float32).tobytes()

        # HSET
        p.hset(key, mapping=item_metadata)

    p.execute()
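
The loader above stores each embedding as raw float32 bytes. A minimal sketch of that round trip follows, using a random stand-in vector rather than a real embedding. The helper names are hypothetical; the dimension of 768 is the one this notebook passes to the index.

```python
import numpy as np

DIM = 768  # embedding dimension used for the index in this notebook

def to_redis_bytes(vector):
    """Serialize a vector as raw float32 bytes (4 bytes per component)."""
    return np.asarray(vector, dtype=np.float32).tobytes()

def from_redis_bytes(blob):
    """Decode raw float32 bytes back into a NumPy vector."""
    return np.frombuffer(blob, dtype=np.float32)

vec = np.random.default_rng(0).random(DIM)  # random stand-in for a real embedding
blob = to_redis_bytes(vec)
restored = from_redis_bytes(blob)
print(len(blob))  # 768 components * 4 bytes each
```

The byte length must equal `DIM * 4`, so the `DIM` given at index creation has to match the model's output size.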

In [11]:
def create_flat_index (redis_conn,vector_field_name,number_of_vectors, vector_dimensions=512, distance_metric='L2'):
    redis_conn.ft().create_index([
        VectorField(vector_field_name, "FLAT", {"TYPE": "FLOAT32", "DIM": vector_dimensions, "DISTANCE_METRIC": distance_metric, "INITIAL_CAP": number_of_vectors, "BLOCK_SIZE":number_of_vectors }),
        TagField("product_type"),
        TextField("item_name"),
        TextField("item_keywords"),
        TagField("country")        
    ])

def create_hnsw_index (redis_conn,vector_field_name,number_of_vectors, vector_dimensions=512, distance_metric='L2',M=40,EF=200):
    redis_conn.ft().create_index([
        VectorField(vector_field_name, "HNSW", {"TYPE": "FLOAT32", "DIM": vector_dimensions, "DISTANCE_METRIC": distance_metric, "INITIAL_CAP": number_of_vectors, "M": M, "EF_CONSTRUCTION": EF}),
        TagField("product_type"),
        TextField("item_keywords"),        
        TextField("item_name"),
        TagField("country")     
    ]) 
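
The HNSW attributes passed to `VectorField` are an ordinary dict, so the tuning knobs can be collected in one place. The sketch below uses a hypothetical helper, `hnsw_attributes`. The attribute names follow the Redis vector documentation linked in the notes, and the default values simply mirror the ones used in this notebook, not Redis defaults.

```python
def hnsw_attributes(dim, distance_metric="COSINE", m=40, ef_construction=200, ef_runtime=None):
    """Build the attribute dict for VectorField(name, "HNSW", attrs). Hypothetical helper."""
    if distance_metric not in {"L2", "IP", "COSINE"}:
        raise ValueError(f"unsupported metric: {distance_metric}")
    attrs = {
        "TYPE": "FLOAT32",
        "DIM": dim,
        "DISTANCE_METRIC": distance_metric,
        "M": m,                              # max outgoing edges per graph node
        "EF_CONSTRUCTION": ef_construction,  # candidate list size while building the graph
    }
    if ef_runtime is not None:
        attrs["EF_RUNTIME"] = ef_runtime     # candidate list size at query time
    return attrs

attrs = hnsw_attributes(768, ef_runtime=50)
print(attrs)
```

Larger `M` and `EF_CONSTRUCTION` generally trade build time and memory for recall; `EF_RUNTIME` trades query latency for recall.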

In [23]:
%%time

ITEM_KEYWORD_EMBEDDING_FIELD_KNN='item_keyword_vector_knn'
TEXT_EMBEDDING_DIMENSION=768
NUMBER_PRODUCTS=1000

print ('Loading and Indexing + ' +  str(NUMBER_PRODUCTS) + ' products')

#flush all data
redis_conn.flushall()

#create flat index & load vectors
create_flat_index(redis_conn, ITEM_KEYWORD_EMBEDDING_FIELD_KNN,NUMBER_PRODUCTS,TEXT_EMBEDDING_DIMENSION,'COSINE')
load_vectors(redis_conn,product_metadata,item_keywords_vectors,ITEM_KEYWORD_EMBEDDING_FIELD_KNN)

Loading and Indexing + 1000 products
CPU times: user 93.5 ms, sys: 43.1 ms, total: 137 ms
Wall time: 7.38 s


In [24]:
%%time
# brute-force index
topK=5
#product_query='beautifully crafted present for her. a special occasion'
product_query='cool way to pimp up my cell'

#vectorize the query
query_vector = model.encode(product_query).astype(np.float32).tobytes()

#prepare the query
q = ( Query(f'*=>[KNN {topK} @{ITEM_KEYWORD_EMBEDDING_FIELD_KNN} $vec_param AS vector_score]')
    .sort_by('vector_score')
    .paging(0,topK)
    .return_fields('vector_score','item_name','item_id','item_keywords')
    .dialect(2) )
params_dict = {"vec_param": query_vector}


#Execute the query
results_KNN = redis_conn.ft().search(q, query_params = params_dict)

#Print similar products found
for product in results_KNN.docs:
    print ('*************** Product found ************')
    print (color.BOLD + 'hash key = ' +  color.END + product.id)
    print (color.YELLOW + 'Item Name = ' +  color.END  + product.item_name)
    print (color.YELLOW + 'Item Id = ' +  color.END  + product.item_id)
    print (color.YELLOW + 'Item keywords = ' +  color.END  + product.item_keywords)
    print (color.YELLOW + 'Score = ' +  color.END  + product.vector_score)

*************** Product found ************
hash key = product:558:B07T7KMCSD-amazon.in
Item Name = Amazon Brand - Solimo Designer Wooden Door 3D Printed Hard Back Case Mobile Cover for Microsoft Lumia 650
Item Id = B07T7KMCSD
Item keywords = mobile cover back cover mobile case phone case mobile panel phone panel hard case 3D printed mobile cover mobile cover back cover mobile case phone case mobile panel phone panel Microsoft Lumia mobile case Microsoft Lumia phone cover Microsoft Lumia back case hard case 3D printed mobile cover mobile cover back cover mobile case phone case mobile panel phone panel Microsoft Lumia mobile case Microsoft Lumia phone cover Microsoft Lumia back case hard case 3D printed mobile cover mobile cover back cover mobile case phone case mobile panel phone panel Microsoft Lumia mobile case Microsoft Lumia phone cover Microsoft Lumia back case hard case 3D printed mobile cover mobile cover back cover mobile case phone case mobile
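
Conceptually, the FLAT query above compares the query vector against every stored vector and keeps the closest `topK`. Here is a minimal NumPy sketch of that brute-force search, assuming the score is a cosine distance (1 minus cosine similarity), which is consistent with sorting `vector_score` in ascending order. The function name and toy data are made up for illustration.

```python
import numpy as np

def brute_force_knn(query, vectors, k):
    """Return (indices, cosine distances) of the k vectors closest to query."""
    vectors = np.asarray(vectors, dtype=np.float32)
    query = np.asarray(query, dtype=np.float32)
    # Cosine similarity of the query against every row.
    sims = vectors @ query / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
    # Cosine distance: smaller means more similar.
    dists = 1.0 - sims
    order = np.argsort(dists)[:k]
    return order, dists[order]

# Toy 2-D "embeddings", for illustration only.
toy = np.array([[1, 0], [0, 1], [1, 1], [-1, 0]], dtype=np.float32)
idx, dist = brute_force_knn([1, 0.1], toy, k=2)
print(idx, dist)
```

The cost grows linearly with the number of stored vectors, which is why an approximate index such as HNSW becomes attractive for larger collections.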

In [25]:
%%time
# HNSW / ANN index: create and load

ITEM_KEYWORD_EMBEDDING_FIELD_ANN='item_keyword_vector_ann'
NUMBER_PRODUCTS=884
TEXT_EMBEDDING_DIMENSION=768

print ('Loading and Indexing + ' +  str(NUMBER_PRODUCTS) + ' products')

#flush all data
redis_conn.flushall()

#create HNSW index & load vectors
create_hnsw_index(redis_conn, ITEM_KEYWORD_EMBEDDING_FIELD_ANN,NUMBER_PRODUCTS,TEXT_EMBEDDING_DIMENSION,'COSINE',M=40,EF=200)
load_vectors(redis_conn,product_metadata,item_keywords_vectors,ITEM_KEYWORD_EMBEDDING_FIELD_ANN)

Loading and Indexing + 884 products
CPU times: user 85.2 ms, sys: 36.7 ms, total: 122 ms
Wall time: 5.99 s


In [26]:
%%time
# query with ANN
topK=5
#product_query='beautifully crafted present for her. a special occasion'
product_query='cool way to pimp up my cell'

#vectorize the query
query_vector = model.encode(product_query).astype(np.float32).tobytes()

#prepare the query
q = (Query(f'*=>[KNN {topK} @{ITEM_KEYWORD_EMBEDDING_FIELD_ANN} $vec_param AS vector_score]')
     .sort_by('vector_score')
     .paging(0,topK)
     .return_fields('vector_score','item_name','item_id','item_keywords','country')
     .dialect(2))
params_dict = {"vec_param": query_vector}


#Execute the query
results_ANN = redis_conn.ft().search(q, query_params = params_dict)

#Print similar products found
for product in results_ANN.docs:
    print ('*************** Product found ANN ************')
    print (color.BOLD + 'hash key = ' +  color.END + product.id)
    print (color.YELLOW + 'Item Name = ' +  color.END  + product.item_name)
    print (color.YELLOW + 'Item Id = ' +  color.END  + product.item_id)
    print (color.YELLOW + 'Item keywords = ' +  color.END  + product.item_keywords)
    print (color.YELLOW + 'Country = ' +  color.END  + product.country)
    print (color.YELLOW + 'Score = ' +  color.END  + product.vector_score)

*************** Product found ANN ************
hash key = product:558:B07T7KMCSD-amazon.in
Item Name = Amazon Brand - Solimo Designer Wooden Door 3D Printed Hard Back Case Mobile Cover for Microsoft Lumia 650
Item Id = B07T7KMCSD
Item keywords = mobile cover back cover mobile case phone case mobile panel phone panel hard case 3D printed mobile cover mobile cover back cover mobile case phone case mobile panel phone panel Microsoft Lumia mobile case Microsoft Lumia phone cover Microsoft Lumia back case hard case 3D printed mobile cover mobile cover back cover mobile case phone case mobile panel phone panel Microsoft Lumia mobile case Microsoft Lumia phone cover Microsoft Lumia back case hard case 3D printed mobile cover mobile cover back cover mobile case phone case mobile panel phone panel Microsoft Lumia mobile case Microsoft Lumia phone cover Microsoft Lumia back case hard case 3D printed mobile cover mobile cover back cover mobile case phone case mo

In [33]:
# comparing outputs (vector_score is a distance here, so lower means more similar)
for i in reversed(range(5)):
    print(f""" ============== """)
    print(f""" KNN { results_KNN.docs[i]["item_name"] } - Similarity KNN {results_KNN.docs[i]["vector_score"]}""")
    print(f""" ANN { results_ANN.docs[i]["item_name"] } - Similarity ANN {results_ANN.docs[i]["vector_score"]}""")

 KNN Amazon Brand - Solimo Designer Heart Pattern Alphabet-H 3D Printed Hard Back Case Mobile Cover for Motorola Moto E6s / Motorola Moto E6 Plus - Similarity KNN 0.650277912617
 ANN Amazon Brand - Solimo Designer Heart Pattern Alphabet-H 3D Printed Hard Back Case Mobile Cover for Motorola Moto E6s / Motorola Moto E6 Plus - Similarity ANN 0.650277912617
 KNN Amazon Brand - Solimo Designer Color Spread 3D Printed Hard Back Case Mobile Cover for Nokia 2.1 - Similarity KNN 0.647348761559
 ANN Amazon Brand - Solimo Designer Color Spread 3D Printed Hard Back Case Mobile Cover for Nokia 2.1 - Similarity ANN 0.647348761559
 KNN Amazon Brand - Solimo Designer Couples Standing in Rain 3D Printed Hard Back Case Mobile Cover for Motorola Moto G 2nd Generation - Similarity KNN 0.646299540997
 ANN Amazon Brand - Solimo Designer Couples Standing in Rain 3D Printed Hard Back Case Mobile Cover for Motorola Moto G 2nd Generation - Similarity ANN 0.646299540997
 KNN Amazon Brand - Solimo Designer Pink C
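
Comparing the two result lists by eye works for five rows; a simple overlap measure makes the same comparison for any `topK`. Below is a minimal sketch of recall against the exact (FLAT) results, using a hypothetical helper and made-up ids.

```python
def recall_at_k(exact_ids, approx_ids):
    """Fraction of the exact top-K ids that also appear in the approximate top-K."""
    exact = set(exact_ids)
    if not exact:
        return 0.0
    return len(exact & set(approx_ids)) / len(exact)

# Hypothetical ids, for illustration only.
exact = ["product:1", "product:2", "product:3", "product:4"]
approx = ["product:1", "product:3", "product:4", "product:9"]
print(recall_at_k(exact, approx))  # 3 of the 4 exact hits are recovered
```

With the notebook's objects this could be called as `recall_at_k([d.id for d in results_KNN.docs], [d.id for d in results_ANN.docs])`.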

In [31]:
%%time
# Hybrid query: tag filter on country combined with KNN
topK=5
# product_query='beautifully crafted carpets for a special occasion'
product_query='cool way to pimp up my cell'

#vectorize the query
query_vector = model.encode(product_query).astype(np.float32).tobytes()

#prepare the query
q = (Query(f'(@country:{{DE|IN|IT}})=>[KNN {topK} @{ITEM_KEYWORD_EMBEDDING_FIELD_ANN} $vec_param AS vector_score]')
    .sort_by('vector_score')
    .paging(0,topK)
    .return_fields('vector_score','item_name','item_id','item_keywords','country')
    .dialect(2))
params_dict = {"vec_param": query_vector}


#Execute the query
results_HYBRID = redis_conn.ft().search(q, query_params = params_dict)

#Print similar products found
for product in results_HYBRID.docs:
    print ('***************Product  found ************')
    print (color.BOLD + 'hash key = ' +  color.END + product.id)
    print (color.YELLOW + 'Item Name = ' +  color.END  + product.item_name)
    print (color.YELLOW + 'Item Id = ' +  color.END  + product.item_id)
    print (color.YELLOW + 'Item keywords = ' +  color.END  + product.item_keywords)
    print (color.YELLOW + 'Score = ' +  color.END  + product.vector_score)
    print (color.YELLOW + 'Country = ' +  color.END  + product.country)

***************Product  found ************
hash key = product:558:B07T7KMCSD-amazon.in
Item Name = Amazon Brand - Solimo Designer Wooden Door 3D Printed Hard Back Case Mobile Cover for Microsoft Lumia 650
Item Id = B07T7KMCSD
Item keywords = mobile cover back cover mobile case phone case mobile panel phone panel hard case 3D printed mobile cover mobile cover back cover mobile case phone case mobile panel phone panel Microsoft Lumia mobile case Microsoft Lumia phone cover Microsoft Lumia back case hard case 3D printed mobile cover mobile cover back cover mobile case phone case mobile panel phone panel Microsoft Lumia mobile case Microsoft Lumia phone cover Microsoft Lumia back case hard case 3D printed mobile cover mobile cover back cover mobile case phone case mobile panel phone panel Microsoft Lumia mobile case Microsoft Lumia phone cover Microsoft Lumia back case hard case 3D printed mobile cover mobile cover back cover mobile case phone case mobile
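
The hybrid query string in the cell above mixes a tag filter with the KNN clause, and the doubled braces in the f-string are easy to get wrong. Here is a small sketch of a builder for the same string; the helper name and its parameters are hypothetical, and tag values containing special characters would need escaping that is not handled here.

```python
def build_hybrid_query(tag_field, tag_values, k, vector_field,
                       param_name="vec_param", score_alias="vector_score"):
    """Return a '(filter)=>[KNN ...]' query string like the one used in the hybrid cell."""
    tags = "|".join(tag_values)
    # Doubled braces in an f-string produce the literal { } around the tag list.
    return f"(@{tag_field}:{{{tags}}})=>[KNN {k} @{vector_field} ${param_name} AS {score_alias}]"

query_string = build_hybrid_query("country", ["DE", "IN", "IT"], 5, "item_keyword_vector_ann")
print(query_string)
# (@country:{DE|IN|IT})=>[KNN 5 @item_keyword_vector_ann $vec_param AS vector_score]
```

The resulting string can be passed to `Query(...)` exactly as in the cell above.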

In [34]:
# comparing outputs
for i in reversed(range(5)):
    print(f""" ============== """)
    print(f""" KNN { results_KNN.docs[i]["item_name"] } - Similarity KNN {results_KNN.docs[i]["vector_score"]}""")
    print(f""" ANN { results_ANN.docs[i]["item_name"] } - Similarity ANN {results_ANN.docs[i]["vector_score"]}""")
    print(f""" HYB { results_HYBRID.docs[i]["item_name"] } - Similarity ANN {results_HYBRID.docs[i]["vector_score"]}""")    

 KNN Amazon Brand - Solimo Designer Heart Pattern Alphabet-H 3D Printed Hard Back Case Mobile Cover for Motorola Moto E6s / Motorola Moto E6 Plus - Similarity KNN 0.650277912617
 ANN Amazon Brand - Solimo Designer Heart Pattern Alphabet-H 3D Printed Hard Back Case Mobile Cover for Motorola Moto E6s / Motorola Moto E6 Plus - Similarity ANN 0.650277912617
 HYB Amazon Brand - Solimo Designer Heart Pattern Alphabet-H 3D Printed Hard Back Case Mobile Cover for Motorola Moto E6s / Motorola Moto E6 Plus - Similarity ANN 0.650277912617
 KNN Amazon Brand - Solimo Designer Color Spread 3D Printed Hard Back Case Mobile Cover for Nokia 2.1 - Similarity KNN 0.647348761559
 ANN Amazon Brand - Solimo Designer Color Spread 3D Printed Hard Back Case Mobile Cover for Nokia 2.1 - Similarity ANN 0.647348761559
 HYB Amazon Brand - Solimo Designer Color Spread 3D Printed Hard Back Case Mobile Cover for Nokia 2.1 - Similarity ANN 0.647348761559
 KNN Amazon Brand - Solimo Designer Couples Standing in Rain 3D 