# Search study

Suppose you have an existing Redis database with a populated search index. You may want to quickly test different search methods against that index without recreating the data or the index. This demo walks you through how to set this up and get going.

# Installation

In [None]:
%pip install redis-retrieval-optimizer

## Check the installed version

In [1]:
import redis_retrieval_optimizer

redis_retrieval_optimizer.__version__

'0.4.1'

# Load data

We will load our custom car dataset for this example. 

In [2]:
import json

with open('../resources/cars/car_corpus.json', 'r') as f:
    corpus = json.load(f)

with open('../resources/cars/car_queries.json', 'r') as f:
    queries = json.load(f)

with open('../resources/cars/car_qrels.json', 'r') as f:
    qrels = json.load(f)
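
The three files are easier to reason about once their shapes are clear. The sketch below uses made-up records (the IDs, text, and relevance grades are hypothetical) and assumes the layout implied by the code later in this notebook: corpus entries carry `item_id`, `text`, and `query_metadata`; queries are keyed by query ID; and qrels map each query ID to the relevant document IDs with a relevance grade. `check_shapes` is an illustrative helper, not part of any library.

```python
# Hypothetical records illustrating the assumed shape of each file.
corpus_example = [
    {
        "item_id": "doc-1",
        "text": "Example passage about a car feature.",
        "query_metadata": {"make": "ExampleMake", "model": "ExampleModel"},
    },
]

queries_example = {
    "q-1": {
        "query": "How does the example feature work?",
        "query_metadata": {"make": "ExampleMake", "model": "ExampleModel"},
    },
}

# qrels: query ID -> {document ID: relevance grade}
qrels_example = {"q-1": {"doc-1": 1}}


def check_shapes(corpus, queries, qrels):
    """Return a list of problems; an empty list means the IDs line up."""
    problems = []
    doc_ids = {doc["item_id"] for doc in corpus}
    for query_id, judged in qrels.items():
        if query_id not in queries:
            problems.append(f"qrels references unknown query {query_id!r}")
        for doc_id in judged:
            if doc_id not in doc_ids:
                problems.append(f"qrels references unknown document {doc_id!r}")
    return problems


print(check_shapes(corpus_example, queries_example, qrels_example))
```

Running the same kind of check against the loaded `corpus`, `queries`, and `qrels` may help catch mismatched IDs before they show up as unexpectedly low recall.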

# Create the index with redisvl

A search study assumes that the search index already exists. The cell below creates a Redis search index and populates it with our test data for example purposes only; in a real search study, the index would already be populated and running with your own data.

Note: the demo assumes you have an instance of Redis running on localhost:6379. If this is not the case, update `redis_url` to point to your running instance, or start a local instance with the following command.

`docker run -d --name redis-stack-server -p 6379:6379 redis/redis-stack-server:latest`
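
Before running the cells that follow, it can be useful to confirm that something is listening at the address you plan to use. The sketch below relies only on the Python standard library; `redis_reachable` is a hypothetical helper, and it only tests that a TCP connection succeeds, not that the server on the other end is actually Redis.

```python
import socket
from urllib.parse import urlparse


def redis_reachable(url: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the host and port in `url` succeeds."""
    parsed = urlparse(url)
    host = parsed.hostname or "localhost"
    port = parsed.port or 6379  # default Redis port
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False


print(redis_reachable("redis://localhost:6379"))
```

If this prints `False`, start the container with the command above or adjust the URL.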

In [3]:
# assuming you have a redis instance running on localhost:6379
redis_url = "redis://localhost:6379"

In [4]:
from redisvl.index import SearchIndex
from redisvl.utils.vectorize import HFTextVectorizer

emb_model = HFTextVectorizer()

# define schema
car_schema = {
    "index": {
        "name": "cars",
        "prefix": "cars"
    },
    "fields": [
        {"name": "item_id", "type": "tag"},
        {"name": "text", "type": "text"},
        {"name": "make", "type": "tag"},
        {"name": "model", "type": "tag"},
        {
            "name": "vector",
            "type": "vector",
            "attrs": {
                "dims": 768,
                "distance_metric": "cosine",
                "algorithm": "FLAT",
                "datatype": "float32"
            },
        },
    ]
}

# create index
index = SearchIndex.from_dict(car_schema, redis_url=redis_url)
index.create(overwrite=True)

embeddings = emb_model.embed_many([c["text"] for c in corpus], as_buffer=True)

# vectorize corpus data
corpus_data = [
    {
        "text": c["text"],
        "item_id": c["item_id"],
        "make": c["query_metadata"]["make"],
        "model": c["query_metadata"]["model"],
        "vector": embeddings[i]
    }
    for i, c in enumerate(corpus)
]

index.load(corpus_data)


15:06:21 datasets INFO   PyTorch version 2.3.0 available.
15:06:21 sentence_transformers.SentenceTransformer INFO   Use pytorch device_name: mps
15:06:21 sentence_transformers.SentenceTransformer INFO   Load pretrained SentenceTransformer: sentence-transformers/all-mpnet-base-v2


['cars:01K3KWCRBPQ293ENR2HS3GEWR6',
 'cars:01K3KWCRBPBG49PSRYAH6M12VQ',
 'cars:01K3KWCRBPT2D3APPF093BXRJ9',
 'cars:01K3KWCRBPZNQX33WSW300WX6F',
 'cars:01K3KWCRBPVZC0Z3KMK6EKWWDV',
 'cars:01K3KWCRBP4G9VHHXAPE2BFYR3',
 'cars:01K3KWCRBP3TFADF56PE9Q8W84',
 'cars:01K3KWCRBP1BSCG9AFK2NXWNFZ',
 'cars:01K3KWCRBPST6M371T471VV3C9',
 'cars:01K3KWCRBP4ASBVN26QB1CFKS8',
 'cars:01K3KWCRBPF10S5KYA5R2MBMBP',
 'cars:01K3KWCRBPRRHC8F307E9KR87W',
 'cars:01K3KWCRBP1P82DECES6EZD5HF',
 'cars:01K3KWCRBPXYBFDBPA7QN50WVR',
 'cars:01K3KWCRBPW9KJ520HP9NQR64D',
 'cars:01K3KWCRBPT8A895SGWWZ5A5BV',
 'cars:01K3KWCRBPQFH43196CQ60YTGB',
 'cars:01K3KWCRBPPF88ZM6K8JXW434R',
 'cars:01K3KWCRBPB5DWDJQND33DC8NS',
 'cars:01K3KWCRBPV87CE2ADVPHN992H',
 'cars:01K3KWCRBP4SJZCP3SV318H4Y9',
 'cars:01K3KWCRBP2FW4AEVQ2SEHD1BK',
 'cars:01K3KWCRBPK11AZKCWY2HTBC6Y',
 'cars:01K3KWCRBPBB7FRRMJJFQK0655',
 'cars:01K3KWCRBPFB7ZRV8ZJSET4ZG9',
 'cars:01K3KWCRBPPW648R9FEZS63GSS',
 'cars:01K3KWCRBPD25R873Q9V2M174J',
 'cars:01K3KWCRBPT55486M2Q7M

# Check index created successfully

In [5]:
index.info()["num_docs"]

464

# Review search study config

- `index_name` should point to the index created above.
- `qrels` and `queries` should point to the relevance labels and the queries under test.
- `search_methods` should match the names of the custom search methods defined below.
- `embedding_model` should match the model used to create the index.

In [6]:
from redis_retrieval_optimizer.utils import load_search_study_config

search_study_config = load_search_study_config("search_study_config.yaml")
search_study_config

SearchStudyConfig(study_id='test-search-study', index_name='cars', qrels='../resources/cars/car_qrels.json', queries='../resources/cars/car_queries.json', search_methods=['base_vector', 'pre_filter_vector'], ret_k=3, id_field_name='_id', vector_field_name='vector', text_field_name='text', embedding_model=EmbeddingModel(type='hf', model='sentence-transformers/all-mpnet-base-v2', dim=768, embedding_cache_name='vec-cache', dtype='float32'))
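
A mismatch between the config and the index schema (for example, a different embedding dimension) tends to surface only as confusing query errors or poor results, so a quick comparison can be worthwhile. The sketch below copies the relevant values from the schema and from the config output above into plain dicts; `config_mismatches` is a hypothetical helper written for illustration, not a library function.

```python
# Values copied from the index schema defined earlier in this notebook.
index_schema = {"vector_field": "vector", "dims": 768, "datatype": "float32"}

# Values copied from the SearchStudyConfig output above.
study_config = {
    "vector_field_name": "vector",
    "embedding_model": {
        "model": "sentence-transformers/all-mpnet-base-v2",
        "dim": 768,
        "dtype": "float32",
    },
}


def config_mismatches(schema, config):
    """Compare the fields that must agree; return a description of each mismatch."""
    pairs = [
        ("vector field name", schema["vector_field"], config["vector_field_name"]),
        ("embedding dimension", schema["dims"], config["embedding_model"]["dim"]),
        ("vector datatype", schema["datatype"], config["embedding_model"]["dtype"]),
    ]
    return [
        f"{label}: index has {in_index!r}, config has {in_config!r}"
        for label, in_index, in_config in pairs
        if in_index != in_config
    ]


print(config_mismatches(index_schema, study_config))
```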

# Define search methods for search study

A search method can be anything as long as it takes a `SearchMethodInput` and returns a `SearchMethodOutput`. Below we will compare a basic vector search to a vector search with a pre-filter. 

In [7]:
from ranx import Run
from redis_retrieval_optimizer.search_methods.base import run_search_w_time
from redisvl.query import VectorQuery
from redisvl.query.filter import Tag

from redis_retrieval_optimizer.schema import SearchMethodInput, SearchMethodOutput
from redis_retrieval_optimizer.search_methods.vector import make_score_dict_vec

def vector_query(query_info, num_results: int, emb_model) -> VectorQuery:
    vector = emb_model.embed(query_info["query"], as_buffer=True)

    return VectorQuery(
        vector=vector,
        vector_field_name="vector",
        num_results=num_results,
        return_fields=["item_id", "make", "model", "text"],  # update to read from env maybe?
    )

def pre_filter_query(query_info, num_results, emb_model) -> VectorQuery:
    vec = emb_model.embed(query_info["query"])
    make = query_info["query_metadata"]["make"]
    model = query_info["query_metadata"]["model"]

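    # Restrict the search to documents matching the query's make and model
    # (named filter_expression to avoid shadowing the built-in `filter`)
    filter_expression = (Tag("make") == make) & (Tag("model") == model)

    # Create a vector query
    query = VectorQuery(
        vector=vec,
        vector_field_name="vector",
        num_results=num_results,
        filter_expression=filter_expression,
        return_fields=["item_id", "make", "model", "text"]
    )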

    return query

def gather_pre_filter_results(search_method_input: SearchMethodInput) -> SearchMethodOutput:
    redis_res_vector = {}

    for key, query_info in search_method_input.raw_queries.items():
        # Format query to run
        query = pre_filter_query(query_info, search_method_input.ret_k, search_method_input.emb_model)
        
        # Run with search time
        res = run_search_w_time(
            search_method_input.index, query, search_method_input.query_metrics
        )

        # Generate search dict
        score_dict = make_score_dict_vec(res, id_field_name="item_id")

        redis_res_vector[key] = score_dict

    return SearchMethodOutput(
        run=Run(redis_res_vector),
        query_metrics=search_method_input.query_metrics,
    )


def gather_vector_results(search_method_input: SearchMethodInput) -> SearchMethodOutput:
    redis_res_vector = {}

    for key, query_info in search_method_input.raw_queries.items():
        vec_query = vector_query(query_info, search_method_input.ret_k, search_method_input.emb_model)
        
        res = run_search_w_time(
            search_method_input.index, vec_query, search_method_input.query_metrics
        )
        
        score_dict = make_score_dict_vec(res, id_field_name="item_id")
        redis_res_vector[key] = score_dict
        
    return SearchMethodOutput(
        run=Run(redis_res_vector),
        query_metrics=search_method_input.query_metrics,
    )
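
Both search methods end the same way: they build a dict of the form `{query_id: {doc_id: score}}` and wrap it in a ranx `Run`. The sketch below illustrates that shape with a hypothetical stand-in for `make_score_dict_vec`. The real helper's scoring formula is not shown in this notebook, so the `1 - distance` conversion is an assumption, as are the `vector_distance` field name and the sample rows.

```python
def score_dict_from_rows(rows, id_field_name="item_id"):
    """Map result rows to {doc_id: score}, where a higher score means more similar.

    Assumes each row carries its distance as a string under "vector_distance".
    """
    return {
        row[id_field_name]: 1.0 - float(row["vector_distance"])
        for row in rows
    }


# Hypothetical result rows for a single query.
rows = [
    {"item_id": "doc-1", "vector_distance": "0.25"},
    {"item_id": "doc-2", "vector_distance": "0.5"},
]

# One entry per query ID; this is the dict-of-dicts shape passed to Run(...).
run_dict = {"q-1": score_dict_from_rows(rows)}
print(run_dict)  # {'q-1': {'doc-1': 0.75, 'doc-2': 0.5}}
```

Whatever conversion is used, the important property is that closer documents receive higher scores, since ranking metrics sort by score in descending order.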


# Run the search study

In [8]:
from redis_retrieval_optimizer.search_study import run_search_study

# Note: the keys must match the search_methods listed in the search study config.
SEARCH_METHOD_MAP = {
    "base_vector": gather_vector_results,
    "pre_filter_vector": gather_pre_filter_results
}

metrics = run_search_study(
    config_path="search_study_config.yaml",
    redis_url=redis_url,
    search_method_map=SEARCH_METHOD_MAP
)

Connecting to existing index: cars
Connected to index: cars with 464 objects
15:06:36 sentence_transformers.SentenceTransformer INFO   Use pytorch device_name: mps
15:06:36 sentence_transformers.SentenceTransformer INFO   Load pretrained SentenceTransformer: sentence-transformers/all-mpnet-base-v2
Running search method: base_vector
Running search method: pre_filter_vector
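
Because the names in `search_methods` must match the keys of the method map, a small check before launching a study can catch typos early. This is only a sketch: `missing_methods` is a hypothetical helper, and the lambdas are placeholders standing in for the real search functions.

```python
def missing_methods(configured, method_map):
    """Return the configured method names that have no entry in the method map."""
    return [name for name in configured if name not in method_map]


# Names as listed in the config output earlier in this notebook.
configured_methods = ["base_vector", "pre_filter_vector"]

# Placeholder callables stand in for the real search methods.
example_map = {
    "base_vector": lambda search_input: None,
    "pre_filter_vector": lambda search_input: None,
}

print(missing_methods(configured_methods, example_map))  # []
```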


In [9]:
metrics

| | search_method | total_indexing_time | total_memory_mb | avg_query_time | recall | ndcg | f1 | precision | ret_k |
|---|---|---|---|---|---|---|---|---|---|
| 0 | base_vector | 119.24700164794922 | 17.250219 | 0.001397 | 0.675 | 0.668037 | 0.519683 | 0.466667 | 3 |
| 1 | pre_filter_vector | 119.24700164794922 | 17.250219 | 0.000441 | 0.741667 | 0.794658 | 0.581905 | 0.533333 | 3 |
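
To make the recall, precision, and f1 columns more concrete, the sketch below computes them for a single hypothetical query using the standard textbook definitions at a cutoff of `k`. It is an illustration only, not the evaluation library's implementation, whose exact conventions may differ; the document IDs are made up.

```python
def precision_recall_f1(retrieved, relevant, k):
    """Compute precision@k, recall@k, and F1@k for one query."""
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical ranking and relevance judgments for one query.
retrieved = ["doc-1", "doc-7", "doc-3"]
relevant = {"doc-1", "doc-3", "doc-9", "doc-4"}

p, r, f = precision_recall_f1(retrieved, relevant, k=3)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.5 0.571
```

Metrics like these are averaged over all queries, so the f1 column in the table need not equal the harmonic mean of the precision and recall columns.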
