![Redis](https://redis.io/wp-content/uploads/2024/04/Logotype.svg?auto=webp&quality=85,75&width=120)
# Advanced queries with Redis
## Let's Begin!
<a href="https://colab.research.google.com/github/redis-developer/redis-ai-resources/blob/main/python-recipes/vector-search/04_advanced_queries.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## Prepare data

In this example we will load a list of movie objects with the following attributes: `title`, `rating`, `description`, and `genre`. 

For the vector part of our vector search, we will embed the description so that users can search for movies that best match what they're looking for.

**If you are running this notebook locally**, you may not need to perform this step at all.

In [5]:
# NBVAL_SKIP
!git clone https://github.com/redis-developer/redis-ai-resources.git temp_repo
!mv temp_repo/python-recipes/vector-search/resources .
!rm -rf temp_repo

Cloning into 'temp_repo'...
remote: Enumerating objects: 204, done.
remote: Counting objects: 100% (52/52), done.
remote: Compressing objects: 100% (28/28), done.
remote: Total 204 (delta 37), reused 24 (delta 24), pack-reused 152
Receiving objects: 100% (204/204), 9.47 MiB | 10.76 MiB/s, done.
Resolving deltas: 100% (64/64), done.
mv: temp_repo/python-recipes/vector-search/resources: No such file or directory


## Packages

In [None]:
%pip install -q redis numpy sentence-transformers

Note: you may need to restart the kernel to use updated packages.


## Install Redis Stack

Later in this tutorial, Redis will be used to store, index, and query vector
embeddings created from the movie descriptions. **We need to make sure we have a Redis
instance available.**

#### For Colab
Use the shell script below to download, extract, and install [Redis Stack](https://redis.io/docs/getting-started/install-stack/) directly from the Redis package archive.

In [None]:
# NBVAL_SKIP
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install redis-stack-server  > /dev/null 2>&1
redis-stack-server --daemonize yes

#### For Alternative Environments
There are many ways to get the necessary redis-stack instance running:
1. On cloud, deploy a [FREE instance of Redis in the cloud](https://redis.com/try-free/). Or, if you have your own version of Redis Enterprise running, that works too!
2. Per OS, [see the docs](https://redis.io/docs/latest/operate/oss_and_stack/install/install-stack/)
3. With docker: `docker run -d --name redis-stack-server -p 6379:6379 redis/redis-stack-server:latest`

### Define the Redis Connection URL

By default, this notebook connects to the local instance of Redis Stack. **If you have your own Redis Enterprise instance**, replace the `REDIS_PASSWORD`, `REDIS_HOST`, and `REDIS_PORT` values with your own.

In [8]:
import os

# Replace values below with your own if using Redis Cloud instance
REDIS_HOST = os.getenv("REDIS_HOST", "localhost") # ex: "redis-18374.c253.us-central1-1.gce.cloud.redislabs.com"
REDIS_PORT = os.getenv("REDIS_PORT", "6379")      # ex: 18374
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "")  # ex: "1TNxTEdYRDgIDKM2gDfasupCADXXXX"

# If SSL is enabled on the endpoint, use rediss:// as the URL prefix
REDIS_URL = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"
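
The comment above mentions switching to `rediss://` when SSL is enabled. A minimal sketch of how the URL could be assembled for both cases, assuming a hypothetical helper named `build_redis_url` (not part of the original notebook). It also percent-encodes the password, which is an added precaution for passwords containing characters such as `@` or `:`.

```python
from urllib.parse import quote


def build_redis_url(host: str, port: str, password: str = "", ssl: bool = False) -> str:
    """Hypothetical helper: assemble a Redis connection URL."""
    # rediss:// (double "s") is the scheme for TLS-enabled endpoints
    scheme = "rediss" if ssl else "redis"
    # Percent-encode the password so reserved characters do not break URL parsing
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"{scheme}://{auth}{host}:{port}"


print(build_redis_url("localhost", "6379"))
print(build_redis_url("example.host", "18374", password="p@ss:word", ssl=True))
```

The resulting string can be passed to `Redis.from_url` in the same way as `REDIS_URL`.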

### Create redis client

In [11]:
from redis import Redis
client = Redis.from_url(REDIS_URL)
client.ping()

True

In [10]:
import json

with open("resources/movies.json", 'r') as file:
    movies = json.load(file)

In [4]:
import numpy as np
from sentence_transformers import SentenceTransformer

# load model for embedding our movie descriptions
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

def embed_text(model, text):
    return np.array(model.encode(text)).astype(np.float32).tobytes()



In [5]:
# Note: convert embedding array to bytes for storage in Redis Hash data type
movie_data = [
    {
        **movie,
        "vector": embed_text(model, movie["description"])
    } for movie in movies
]
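
To make the byte conversion in `embed_text` more concrete, here is a small sketch of the round trip. It uses a random 384-dimensional vector as a stand-in for the model output (an assumption for illustration, so the example runs without downloading the model).

```python
import numpy as np

# Stand-in for model.encode(...): a random 384-dimensional vector (illustrative only)
rng = np.random.default_rng(0)
embedding = rng.random(384)

# Same conversion as embed_text: cast to float32, then serialize to raw bytes
as_bytes = embedding.astype(np.float32).tobytes()
print(len(as_bytes))  # 384 values * 4 bytes per float32

# Decoding reverses the process and recovers the float32 array
decoded = np.frombuffer(as_bytes, dtype=np.float32)
print(decoded.shape, np.allclose(decoded, embedding.astype(np.float32)))
```

The byte length is why the vector dtype and dimension declared in the index schema need to agree with what the embedding model produces.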

In [6]:
movie_data[0]

{'id': 1,
 'title': 'Explosive Pursuit',
 'genre': 'action',
 'rating': 7,
 'description': 'A daring cop chases a notorious criminal across the city in a high-stakes game of cat and mouse.',
 'vector': b'\x8bf|=\xc3`\n;\xf2\x91\xb7;?\xcb~\xbd\xdfd\xce\xbb\xc7\x16J=H\xa7?=\xdfv\x95<h\xfa\x06\xbe\x11Y\xcf=)\x07p=E\xdb\r\xbd\x93\xf2H\xbdke\xc6<@\xdfa=o8\x16\xbc\xf1\xd3\x13<8\xaa\x1c=\x14\xef\x89<\xc1\xb0-<\x9d\xb2\x9f\xbc^\x0b\xc3\xbd\xa5NR=ol\xf7\xbcP>\x17\xbeA\x1e\x05\xb9Hu\xbf<B\xe3b\xba\xd8\xa6\xa8\xbd\x98\xdc\xec\xbc`c%=\x81\xe7r\xbb$OG=:(\x85=a@\xa2\xbc-Z\xd0\xbdB%K\xbd\xc8\xed\x94\xbcW\xddH=\x8e&F<\xde*\xec<\x8d\xd8\x8d\xbd\xbdZ\x98<\x14\xa3\xa3=>g3\xbd$\xcd\xbd\xbd\xa1$\xf7;\x04\xf5z=\xfc\xb4\x8c=\x89\x0e\xc6\xbdhI\x90\xbd^\x16\xbd;z\xe7\x0c\xbd\x1b3\xc9\xbc\x89\xf8\xbb\xbc\x18\'u\xbb>\x8f\xca<\x02\x80J=\x0e\xaf*=\x8dOU\xbd\xcf\xf0\x95\xbc \x02\x19=\x19\xf4K<\xc5\xc2\t=J\x83\xac=\x95\xd7\xb8\xbd\xf2\xb5\x9c\xbd=\x85\x18=\x94d&=03\xf8<\xee\xf7\x88<\x80v\xf2\xbb9=[\xbdG\xac\xee\xbb<

## Define Redis index schema

In [12]:
from redis.commands.search.field import VectorField, TagField, NumericField, TextField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType

index_name = "movies"

schema = (
    VectorField(
        "vector",
        "HNSW",
        {
            "TYPE": "FLOAT32",
            "DIM": 384,
            "DISTANCE_METRIC": "COSINE"
        }
    ),
    NumericField("rating"),
    TagField("id"),
    TagField("genre"),
    TextField("title"),
    TextField("description")
)

try:
    client.ft(index_name).info()
    print("Index exists!")
except Exception:
    # info() raised, so the index does not exist yet: define it over Hash documents
    definition = IndexDefinition(index_type=IndexType.HASH)

    # create Index
    client.ft(index_name).create_index(fields=schema, definition=definition)
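
The schema sets `DISTANCE_METRIC` to `COSINE`, so vector results are compared by cosine distance, which is 1 minus the cosine similarity. A minimal numpy sketch of that quantity, using a hypothetical function name `cosine_distance` and toy 2-dimensional vectors:

```python
import numpy as np


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance: 1 - cosine similarity (0 = same direction, 2 = opposite)."""
    similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(1.0 - similarity)


x = np.array([1.0, 0.0], dtype=np.float32)
print(cosine_distance(x, np.array([2.0, 0.0], dtype=np.float32)))   # same direction
print(cosine_distance(x, np.array([0.0, 3.0], dtype=np.float32)))   # orthogonal
print(cosine_distance(x, np.array([-1.0, 0.0], dtype=np.float32)))  # opposite direction
```

Smaller values mean closer matches, which is why the vector queries later in the notebook sort ascending by distance.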

## Populate index

In [13]:
def load_docs(client: Redis, data: list[dict]):
    for i, d in enumerate(data):
        # the enumeration index is the hash key, so document ids are "0", "1", ...
        client.hset(
            i,
            mapping=d
        )

def print_results(res):
    docs = [(doc.title, doc.genre, doc.rating) for doc in res.docs]
    print(f"Top {len(docs)} movies: ", docs)

In [14]:
load_docs(client, movie_data)

In [16]:
res = client.ft(index_name).search("*")
len(res.docs)

10
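
`load_docs` sends one `HSET` per round trip, which is fine for a handful of movies. For larger datasets, one common option is to queue writes in a redis-py pipeline and send them in batches. The sketch below assumes a hypothetical function `load_docs_pipelined` and a `batch_size` parameter. The `_FakeClient` and `_FakePipeline` classes are stand-ins so the example runs without a server; with a real connection you would pass the `client` created earlier.

```python
def load_docs_pipelined(client, data: list[dict], batch_size: int = 500) -> int:
    """Hypothetical variant of load_docs: queue HSET calls and send them in batches."""
    pipe = client.pipeline(transaction=False)
    for i, d in enumerate(data):
        pipe.hset(str(i), mapping=d)
        if (i + 1) % batch_size == 0:
            pipe.execute()  # send the queued batch
    pipe.execute()  # flush whatever is still queued
    return len(data)


class _FakePipeline:
    """Stand-in for a redis-py pipeline; counts non-empty execute() calls."""

    def __init__(self):
        self.queued, self.round_trips = [], 0

    def hset(self, name, mapping):
        self.queued.append((name, mapping))

    def execute(self):
        if self.queued:
            self.round_trips += 1
        self.queued = []


class _FakeClient:
    """Stand-in for a Redis client that only exposes pipeline()."""

    def __init__(self):
        self.pipe = _FakePipeline()

    def pipeline(self, transaction=True):
        return self.pipe


fake = _FakeClient()
data = [{"title": f"movie {i}"} for i in range(5)]
count = load_docs_pipelined(fake, data, batch_size=2)
print(count, fake.pipe.round_trips)  # 5 documents sent in 3 batches
```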

## Scenario 1: Spell checking. 

Example:

Input: Depicable me

Output: Despicable me

In [None]:
from redis.commands.search.query import Query

user_query = "Depicable me"

client.ft(index_name).config_set("DEFAULT_DIALECT", 2)
res = client.ft(index_name).spellcheck(user_query)
res


{b'depicable': [{'score': b'0.05', 'suggestion': b'despicable'}]}
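
The spellcheck response maps each unrecognized term to a list of scored suggestions. A minimal sketch of turning that response into a corrected query string, assuming a hypothetical helper `apply_suggestions` and the response shape shown in the output above:

```python
def apply_suggestions(user_query: str, spellcheck_result: dict) -> str:
    """Hypothetical helper: replace each misspelled term with its top-scoring suggestion."""

    def _s(value):
        # Values arrive as bytes unless the client was created with decode_responses=True
        return value.decode() if isinstance(value, bytes) else value

    best = {}
    for term, suggestions in spellcheck_result.items():
        if suggestions:
            top = max(suggestions, key=lambda s: float(_s(s["score"])))
            best[_s(term)] = _s(top["suggestion"])

    # Terms without a suggestion are kept unchanged
    return " ".join(best.get(word.lower(), word) for word in user_query.split())


result = {b'depicable': [{'score': b'0.05', 'suggestion': b'despicable'}]}
print(apply_suggestions("Depicable me", result))  # despicable me
```

The corrected string could then be used as the text of a regular search query.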

## Scenario 2:

1) Query a cluster

2) Limit the number of results (rows)

3) Query a certain type of document (fq=type:Field1)

4) Return only specific fields in the response (fl=id,type,key,value,score)

Note: In Redis, the closest concept to a cluster is an index.

Examples:

- qt=/clustername&q=example question&rows=10
- qt=/search&group.limit=500&fl=id,type,key,value,score&q=sample query&fq=type:Field1

In [51]:
index_name = "movies" # variable

# full-text search: both terms must match; results are ranked with the BM25STD scorer
user_query = Query("Criminal mastermind")\
                .scorer("BM25STD") \
                .with_scores() \
                .return_fields("title", "genre", "rating", "description") \
                .paging(0, 10) # limits the number of results to 10

res = client.ft(index_name).search(user_query)
res.docs

[Document {'id': '6', 'payload': None, 'score': 6.267822483123378, 'title': 'The Dark Knight', 'genre': 'action', 'rating': '9', 'description': 'Batman faces off against the Joker, a criminal mastermind who threatens to plunge Gotham into chaos.'},
 Document {'id': '17', 'payload': None, 'score': 5.846220066150412, 'title': 'Despicable Me', 'genre': 'comedy', 'rating': '7', 'description': 'When a criminal mastermind uses a trio of orphan girls as pawns for a grand scheme, he finds their love is profoundly changing him for the better.'}]
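
The request parameters listed for this scenario map onto `FT.SEARCH` arguments: `rows` corresponds to `LIMIT`, `fl` to `RETURN`, and the score to `WITHSCORES`. As a rough sketch of that mapping, the hypothetical helper `build_search_args` below assembles the raw command arguments that the `Query` builder produces on your behalf. The helper name and its defaults are assumptions for illustration.

```python
def build_search_args(index: str, query: str, fields: list[str],
                      rows: int = 10, scorer: str = "BM25STD") -> list:
    """Hypothetical helper: assemble raw FT.SEARCH arguments."""
    return [
        "FT.SEARCH", index, query,
        "WITHSCORES",                    # include a relevance score per document
        "RETURN", len(fields), *fields,  # like fl=...: return only these fields
        "SCORER", scorer,                # scoring function to rank with
        "LIMIT", 0, rows,                # like rows=...: offset 0, at most `rows` results
    ]


args = build_search_args("movies", "Criminal mastermind", ["title", "genre"])
print(args)
# With a live connection this could be sent as: client.execute_command(*args)
```

Sending the raw command returns a flat reply rather than the `Document` objects shown above, so the `Query` builder is usually the more convenient option.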

## Scenario 3: Ability to query by adding a weight. 

Field 1 should have twice the weight compared to field 2.

q=field1:("sample data value 1")^2+field2:("sample data value 2")

In [None]:
# Match either group of terms in the description; matches on the second group carry 10x the weight of the first
user_query1 = "High tech"
user_query2 = "heroes villains"

def tokenize(query):
    return " | ".join(query.split(" ")).lower()

query = Query(f'((@description:({tokenize(user_query1)})=>{{$weight: 1}}) | (@description:({tokenize(user_query2)})=>{{$weight: 10}}))') \
        .return_fields("title", "genre", "rating", "description") \
        .dialect(2)

res = client.ft(index_name).search(query)
res.docs

[Document {'id': '9', 'payload': None, 'title': 'The Avengers', 'genre': 'action', 'rating': '8', 'description': "Earth's mightiest heroes come together to stop an alien invasion that threatens the entire planet."},
 Document {'id': '2', 'payload': None, 'title': 'Fast & Furious 9', 'genre': 'action', 'rating': '6', 'description': 'Dom and his crew face off against a high-tech enemy with advanced weapons and technology.'},
 Document {'id': '0', 'payload': None, 'title': 'Explosive Pursuit', 'genre': 'action', 'rating': '7', 'description': 'A daring cop chases a notorious criminal across the city in a high-stakes game of cat and mouse.'}]
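
The requirement for this scenario asks for one field to carry twice the weight of another. A small sketch of building such a query string with the same `=>{$weight: ...}` attribute used above, assuming a hypothetical helper `weighted_union` that takes `(field, text, weight)` tuples:

```python
def weighted_union(clauses: list[tuple[str, str, float]]) -> str:
    """Hypothetical helper: OR together per-field term groups, each with its own $weight."""
    parts = []
    for field, text, weight in clauses:
        # Same idea as tokenize(): lowercase the text and OR the individual terms
        terms = " | ".join(text.lower().split())
        parts.append(f"(@{field}:({terms})=>{{$weight: {weight}}})")
    return "(" + " | ".join(parts) + ")"


# Matches on the title count twice as much as matches on the description
print(weighted_union([
    ("title", "dark knight", 2),
    ("description", "criminal mastermind", 1),
]))
```

The returned string can be passed to `Query(...)` with `.dialect(2)`, as in the cell above. Adding `.with_scores()` makes the effect of the weights visible in the results.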

## Scenario 4: Ability to have logical operations

q=field1:("sample data value 1")^ || field2:("sample data value 2")

q=field1:("sample data value 1")^ && field2:("sample data value 2")

In [48]:
# Combine a numeric filter (rating of 7 or higher) with weighted full-text matches on the description

user_query1 = "High tech"
user_query2 = "heroes villains"

def tokenize(query):
    return " | ".join(query.split(" ")).lower()

query = Query(f'((@rating:[7, inf+]) & (@description:({tokenize(user_query1)})=>{{$weight: 1}}) | (@description:({tokenize(user_query2)})=>{{$weight: 10}}))') \
        .return_fields("title", "genre", "rating", "description") \
        .dialect(2)

res = client.ft(index_name).search(query)
res.docs

[Document {'id': '9', 'payload': None, 'title': 'The Avengers', 'genre': 'action', 'rating': '8', 'description': "Earth's mightiest heroes come together to stop an alien invasion that threatens the entire planet."},
 Document {'id': '0', 'payload': None, 'title': 'Explosive Pursuit', 'genre': 'action', 'rating': '7', 'description': 'A daring cop chases a notorious criminal across the city in a high-stakes game of cat and mouse.'}]
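
Numeric filters like the rating condition can be generated programmatically. The sketch below assumes a hypothetical helper `numeric_range` and emits the `@field:[min max]` form used later in this notebook (`@rating:[9 +inf]`), with a leading `(` marking an exclusive bound.

```python
import math


def numeric_range(field: str, minimum: float = -math.inf, maximum: float = math.inf,
                  exclusive_min: bool = False, exclusive_max: bool = False) -> str:
    """Hypothetical helper: build a numeric range filter clause."""

    def bound(value: float, exclusive: bool) -> str:
        if value == math.inf:
            text = "+inf"
        elif value == -math.inf:
            text = "-inf"
        else:
            text = str(value)
        # A leading "(" excludes the bound itself from the range
        return f"({text}" if exclusive else text

    return f"@{field}:[{bound(minimum, exclusive_min)} {bound(maximum, exclusive_max)}]"


print(numeric_range("rating", minimum=7))                      # rating >= 7
print(numeric_range("rating", minimum=7, exclusive_min=True))  # rating > 7
print(numeric_range("rating", 5, 8))                           # 5 <= rating <= 8
```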

## Scenario 5: Ability to specify the scoring algorithm.

q=field1:("sample data value 1")^ && field2:("sample data value 2")&defType=edismax 

In [59]:
index_name = "movies" # variable

user_query = "cowboys and aliens"

user_query = Query(tokenize(user_query)) \
                .scorer("TFIDF") \
                .with_scores() \
                .return_fields("title", "genre", "rating", "description") \
                .paging(0, 10) # limits the number of results to 10

res = client.ft(index_name).search(user_query)
res.docs

[Document {'id': '9', 'payload': None, 'score': 4.0, 'title': 'The Avengers', 'genre': 'action', 'rating': '8', 'description': "Earth's mightiest heroes come together to stop an alien invasion that threatens the entire planet."},
 Document {'id': '10', 'payload': None, 'score': 1.3333333333333333, 'title': 'Toy Story', 'genre': 'comedy', 'rating': '8', 'description': "Woody, a good-hearted cowboy doll who belongs to a young boy named Andy, sees his position as Andy's favorite toy jeopardized when his parents buy him a Buzz Lightyear action figure. Even worse, the arrogant Buzz thinks he's a real spaceman on a mission to return to his home planet."}]

## Scenario 6: Ability to query a certain type of document with a value greater than a certain number

q=field1:("sample data value 1")^ && field2:("sample data value 2")&fq=field3:[50+TO+*] 

=> shown with the `rating` filter in Scenario 4

## Scenario 7: Ability to return score for each document.

=> shown in Scenarios 2 and 5, where `with_scores()` returns a score for each document

## Scenario 8: vector search

Beyond the capabilities shown here for feature parity, Redis offers vector search, which can be combined with all the methods above to build more powerful queries. Implementing vector search is straightforward (especially with the [redis vector library](https://github.com/redis/redis-vl-python)). 

In the following example, even though the word **sentimental** doesn't appear anywhere in the corpus, we are still able to find movies that match the semantic meaning of that word.

In [64]:
from redis.commands.search.query import Query

user_query = "Sentimental movies"

embedded_user_query = embed_text(model, user_query)

# Note: dialect 2 and above required for vector search
query = (
    Query("(*)=>[KNN 3 @vector $vec_param AS dist]")
     .sort_by("dist")
     .dialect(2)
     .return_fields("title", "genre", "rating", "description")
     .paging(0, 3)
)

res = client.ft(index_name).search(query, query_params = {'vec_param': embedded_user_query})
res.docs

[Document {'id': '17', 'payload': None, 'title': 'Despicable Me', 'genre': 'comedy', 'rating': '7', 'description': 'When a criminal mastermind uses a trio of orphan girls as pawns for a grand scheme, he finds their love is profoundly changing him for the better.'},
 Document {'id': '3', 'payload': None, 'title': 'Black Widow', 'genre': 'action', 'rating': '7', 'description': 'Natasha Romanoff confronts her dark past and family ties as she battles a new enemy.'},
 Document {'id': '15', 'payload': None, 'title': 'The Incredibles', 'genre': 'comedy', 'rating': '8', 'description': "A family of undercover superheroes, while trying to live the quiet suburban life, are forced into action to save the world. Bob Parr (Mr. Incredible) and his wife Helen (Elastigirl) were among the world's greatest crime fighters, but now they must assume civilian identities and retreat to the suburbs to live a 'normal' life with their three children. However, the family's desire to help the world pulls them back
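
The `(*)` in the KNN query is the pre-filter: it selects every document before the nearest-neighbor step. Replacing it with a filter expression restricts the candidates, which is one way to combine vector search with the earlier scenarios. A minimal sketch, assuming a hypothetical helper `knn_query` whose parameter names are illustrative:

```python
def knn_query(k: int, vector_field: str = "vector", filter_expr: str = "*",
              param: str = "vec_param", alias: str = "dist") -> str:
    """Hypothetical helper: build a KNN query string with an optional pre-filter."""
    return f"({filter_expr})=>[KNN {k} @{vector_field} ${param} AS {alias}]"


# Unfiltered: identical to the query string used in this scenario
print(knn_query(3))

# Filtered: only comedies rated 7 or higher are candidates for the KNN step
print(knn_query(3, filter_expr="@genre:{comedy} @rating:[7 +inf]"))
```

Either string can be passed to `Query(...)` with `.dialect(2)` and the same `query_params` as in the cell above.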

## Range queries

Range queries allow you to set a predefined distance "threshold" and return only the documents that fall within it. This is helpful when you only want documents within a certain distance of the search query.

In [65]:
user_query = "Family friendly fantasy movies"

embedded_user_query = embed_text(model, user_query)

query = (
    Query("@vector:[VECTOR_RANGE $radius $vector]=>{$YIELD_DISTANCE_AS: vector_distance}")
     .sort_by("vector_distance")
     .return_fields("title", "rating", "genre", "vector_distance")
     .dialect(2)
)

# Find all vectors within 0.8 of the query vector
query_params = {
    "radius": 0.8,
    "vector": embedded_user_query
}

res = client.ft(index_name).search(query, query_params)
print_results(res)


Top 6 movies:  [('The Incredibles', 'comedy', '8'), ('Black Widow', 'action', '7'), ('Despicable Me', 'comedy', '7'), ('Shrek', 'comedy', '8'), ('Monsters, Inc.', 'comedy', '8'), ('Aladdin', 'comedy', '8')]
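
Conceptually, a range query keeps every vector whose distance to the query is within the radius, rather than a fixed number of neighbors. The brute-force numpy sketch below illustrates the idea on toy 2-dimensional vectors. The function name `within_radius` is hypothetical, and treating the radius as inclusive is an assumption of this sketch; the index itself does not scan every vector this way.

```python
import numpy as np


def within_radius(query: np.ndarray, vectors: np.ndarray, radius: float) -> list:
    """Return (index, cosine distance) pairs with distance <= radius, nearest first."""
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    distances = 1.0 - v @ q                        # cosine distance to every vector
    hits = np.flatnonzero(distances <= radius)     # indices inside the radius
    order = hits[np.argsort(distances[hits])]      # sort the hits by distance
    return [(int(i), float(distances[i])) for i in order]


query = np.array([1.0, 0.0])
vectors = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [-1.0, 0.0]])
print(within_radius(query, vectors, radius=0.8))
```

Only the first two vectors fall inside the radius here; the orthogonal and opposite vectors have distances of 1 and 2 and are dropped.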


Like the queries above, range queries can be chained with additional filters and conditional operators. The following adds an `or` condition that returns movies that either fall within the defined vector range or have a rating at or above 9.

In [66]:
user_query = "Family friendly fantasy movies"

embedded_user_query = embed_text(model, user_query)

query = (
    Query("@rating:[9 +inf] | @vector:[VECTOR_RANGE $radius $vector]=>{$YIELD_DISTANCE_AS: vector_distance}")
     .sort_by("vector_distance")
     .return_fields("title", "rating", "genre", "vector_distance")
     .dialect(2)
)

# Find all vectors within 0.7 of the query vector
query_params = {
    "radius": 0.7,
    "vector": embedded_user_query
}

res = client.ft(index_name).search(query, query_params)
print_results(res)

Top 3 movies:  [('The Incredibles', 'comedy', '8'), ('The Dark Knight', 'action', '9'), ('Inception', 'action', '9')]


In [20]:
# clean up! Note: flushall() deletes every key in every database on this Redis instance
client.flushall()

True