# Complex Queries

In this notebook, we explore more complex queries that can be performed with ``redisvl``.

Before running this notebook, be sure to:
1. Install ``redisvl`` and activate that environment for this notebook.
2. Have a Redis instance running with RediSearch > 2.4.

In [1]:
import numpy as np
from pprint import pprint

data = [
    {'user': 'john', 'age': 18, 'job': 'engineer', 'credit_score': 'high'},
    {'user': 'derrick', 'age': 14, 'job': 'doctor', 'credit_score': 'low'},
    {'user': 'nancy', 'age': 94, 'job': 'doctor', 'credit_score': 'high'},
    {'user': 'tyler', 'age': 100, 'job': 'engineer', 'credit_score': 'high'},
    {'user': 'tim', 'age': 12, 'job': 'dermatologist', 'credit_score': 'high'},
    {'user': 'taimur', 'age': 15, 'job': 'CEO', 'credit_score': 'low'},
    {'user': 'joe', 'age': 35, 'job': 'dentist', 'credit_score': 'medium'}
]

In [2]:
# converted to bytes for redis
vectors = [
    np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes(),
    np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes(),
    np.array([0.7, 0.1, 0.5], dtype=np.float32).tobytes(),
    np.array([0.1, 0.4, 0.5], dtype=np.float32).tobytes(),
    np.array([0.4, 0.4, 0.5], dtype=np.float32).tobytes(),
    np.array([0.6, 0.1, 0.5], dtype=np.float32).tobytes(),
    np.array([0.9, 0.9, 0.1], dtype=np.float32).tobytes(),
]

for record, vector in zip(data, vectors):
    record["user_embedding"] = vector

pprint(data)

[{'age': 18,
  'credit_score': 'high',
  'job': 'engineer',
  'user': 'john',
  'user_embedding': b'\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?'},
 {'age': 14,
  'credit_score': 'low',
  'job': 'doctor',
  'user': 'derrick',
  'user_embedding': b'\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?'},
 {'age': 94,
  'credit_score': 'high',
  'job': 'doctor',
  'user': 'nancy',
  'user_embedding': b'333?\xcd\xcc\xcc=\x00\x00\x00?'},
 {'age': 100,
  'credit_score': 'high',
  'job': 'engineer',
  'user': 'tyler',
  'user_embedding': b'\xcd\xcc\xcc=\xcd\xcc\xcc>\x00\x00\x00?'},
 {'age': 12,
  'credit_score': 'high',
  'job': 'dermatologist',
  'user': 'tim',
  'user_embedding': b'\xcd\xcc\xcc>\xcd\xcc\xcc>\x00\x00\x00?'},
 {'age': 15,
  'credit_score': 'low',
  'job': 'CEO',
  'user': 'taimur',
  'user_embedding': b'\x9a\x99\x19?\xcd\xcc\xcc=\x00\x00\x00?'},
 {'age': 35,
  'credit_score': 'medium',
  'job': 'dentist',
  'user': 'joe',
  'user_embedding': b'fff?fff?\xcd\xcc\xcc='}]
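
The byte strings above are the raw 32-bit floats of each vector. As a minimal sketch using only the standard library, the snippet below reproduces the first embedding with `struct`. It assumes little-endian byte order, which is what `tobytes()` yields on most common hardware; that assumption is ours, not something the notebook states.

```python
import struct

# Pack three float32 values in little-endian order (assumption: this matches
# the native byte order NumPy used to produce the output above).
packed = struct.pack("<3f", 0.1, 0.1, 0.5)
print(packed)

# Unpack to recover the values; they come back rounded to float32 precision.
unpacked = struct.unpack("<3f", packed)
print([round(x, 6) for x in unpacked])
```

Each vector occupies 12 bytes (3 dimensions x 4 bytes), which is consistent with the `dims` and `datatype` declared in the schema below.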


In [3]:
schema = {
    "index": {
        "name": "user_index",
        "prefix": "v1",
        "key_field": "user",
        "storage_type": "hash",
    },
    "fields": {
        "tag": [{"name": "credit_score"}],
        "text": [{"name": "job"}],
        "numeric": [{"name": "age"}],
        "vector": [{
                "name": "user_embedding",
                "dims": 3,
                "distance_metric": "cosine",
                "algorithm": "flat",
                "datatype": "float32"}
        ]
    },
}


In [4]:
from redisvl.index import SearchIndex

# construct a search index from the schema
index = SearchIndex.from_dict(schema)

# connect to local redis instance
index.connect("redis://localhost:6379")

# create the index (no data yet)
index.create(overwrite=True)

In [5]:
# use the CLI to see the created index
!rvl index listall

16:36:51 sam.partee-NW9MQX5Y74 redisvl.cli.index[74676] INFO Indices:
16:36:51 sam.partee-NW9MQX5Y74 redisvl.cli.index[74676] INFO 1. user_index
16:36:51 sam.partee-NW9MQX5Y74 redisvl.cli.index[74676] INFO 2. my_index


In [6]:
# load expects an iterable of dictionaries
index.load(data)

## Executing Hybrid Queries

Hybrid queries combine a vector search with one or more filters on other fields. For example, you may want to find the users whose embeddings are closest to a query vector, restricted to those who are a certain age and have a certain job. The sections below apply tag, numeric, and text filters to a vector query.

### Tag Filters

Tag filters are applied to tag fields. These fields are not tokenized and are typically used to store a single categorical value.

In [7]:
from redisvl.query import VectorQuery, TagFilter, NumericFilter

t = TagFilter("credit_score", "high")

v = VectorQuery([0.1, 0.1, 0.5],
                "user_embedding",
                return_fields=["user", "credit_score", "age", "job"],
                hybrid_filter=t)


results = index.search(v.query, query_params=v.params)
for doc in results.docs:
    print(doc)

Document {'id': 'v1:john', 'payload': None, 'vector_distance': '0', 'user': 'john', 'credit_score': 'high', 'age': '18', 'job': 'engineer'}
Document {'id': 'v1:tyler', 'payload': None, 'vector_distance': '0.109129190445', 'user': 'tyler', 'credit_score': 'high', 'age': '100', 'job': 'engineer'}
Document {'id': 'v1:tim', 'payload': None, 'vector_distance': '0.158809006214', 'user': 'tim', 'credit_score': 'high', 'age': '12', 'job': 'dermatologist'}
Document {'id': 'v1:nancy', 'payload': None, 'vector_distance': '0.266666650772', 'user': 'nancy', 'credit_score': 'high', 'age': '94', 'job': 'doctor'}
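
The `vector_distance` values above are consistent with cosine distance (1 minus cosine similarity), the `distance_metric` set in the schema. The following is a minimal sketch in plain Python that recomputes them from the vectors defined earlier; the helper name `cosine_distance` is ours and not part of ``redisvl``.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp at zero so rounding error cannot yield a tiny negative distance.
    return max(0.0, 1.0 - dot / (norm_a * norm_b))

query = [0.1, 0.1, 0.5]
embeddings = {
    "john": [0.1, 0.1, 0.5],
    "tyler": [0.1, 0.4, 0.5],
    "tim": [0.4, 0.4, 0.5],
    "nancy": [0.7, 0.1, 0.5],
}

distances = {user: cosine_distance(query, vec) for user, vec in embeddings.items()}

# Print users from nearest to farthest, mirroring the result ordering above.
for user, dist in sorted(distances.items(), key=lambda kv: kv[1]):
    print(f"{user}: {dist:.6f}")
```

Small differences in the last digits are expected, since this sketch uses 64-bit Python floats while the index stores `float32` values.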


### Numeric Filters

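Numeric filters are applied to numeric fields and can be used to isolate a range of values for a given field. As the results below show, both bounds are inclusive in this example: users aged exactly 18 and exactly 100 are returned.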

In [8]:
n = NumericFilter("age", 18, 100)

v.set_filter(n)

results = index.search(v.query, query_params=v.params)
for doc in results.docs:
    print(doc)

Document {'id': 'v1:john', 'payload': None, 'vector_distance': '0', 'user': 'john', 'credit_score': 'high', 'age': '18', 'job': 'engineer'}
Document {'id': 'v1:tyler', 'payload': None, 'vector_distance': '0.109129190445', 'user': 'tyler', 'credit_score': 'high', 'age': '100', 'job': 'engineer'}
Document {'id': 'v1:nancy', 'payload': None, 'vector_distance': '0.266666650772', 'user': 'nancy', 'credit_score': 'high', 'age': '94', 'job': 'doctor'}
Document {'id': 'v1:joe', 'payload': None, 'vector_distance': '0.653301358223', 'user': 'joe', 'credit_score': 'medium', 'age': '35', 'job': 'dentist'}


### Text Filters

Text filters are applied to text fields and match against the contents of the entire field. For example, if a text field contains "The quick brown fox jumps over the lazy dog", a text filter of "quick" will match it.

In [9]:
from redisvl.query import TextFilter

text_filter = TextFilter("job", "doctor")
v.set_filter(text_filter)

results = index.search(v.query, query_params=v.params)
for doc in results.docs:
    print(doc)

Document {'id': 'v1:derrick', 'payload': None, 'vector_distance': '0', 'user': 'derrick', 'credit_score': 'low', 'age': '14', 'job': 'doctor'}
Document {'id': 'v1:nancy', 'payload': None, 'vector_distance': '0.266666650772', 'user': 'nancy', 'credit_score': 'high', 'age': '94', 'job': 'doctor'}
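
For readers curious about what sits underneath these filter objects, here is a rough sketch of how tag, numeric, and text filters can be written in the RediSearch query language, and how a filter is attached to a KNN vector clause. The helper functions are hypothetical and written for illustration only; the exact strings ``redisvl`` generates (escaping, parameter names, the value of `k`) may differ.

```python
# Hypothetical helpers: they only build query strings and do not call Redis.

def tag_expr(field, value):
    """Tag match, e.g. @credit_score:{high}."""
    return f"@{field}:{{{value}}}"

def numeric_expr(field, low, high):
    """Inclusive numeric range, e.g. @age:[18 100]."""
    return f"@{field}:[{low} {high}]"

def text_expr(field, text):
    """Full-text match on a text field, e.g. @job:(doctor)."""
    return f"@{field}:({text})"

def knn_query(filter_expr, vector_field, k=10):
    """Pre-filter with filter_expr, then take the k nearest vectors.

    $vector is a query parameter holding the query embedding as bytes;
    k=10 is an illustrative default, not taken from the notebook.
    """
    return f"({filter_expr})=>[KNN {k} @{vector_field} $vector AS vector_distance]"

print(tag_expr("credit_score", "high"))
print(numeric_expr("age", 18, 100))
print(text_expr("job", "doctor"))
print(knn_query(tag_expr("credit_score", "high"), "user_embedding"))
```

Seen this way, a hybrid query is a filter expression on the left of `=>` and a vector search clause on the right.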


## Combining Filters

In this example, we combine a tag filter with a numeric filter using `+=`. We search for users that have a high credit score and are between the ages of 18 and 100.

In [10]:
t = TagFilter("credit_score", "high")
n = NumericFilter("age", 18, 100)
t += n

v = VectorQuery([0.1, 0.1, 0.5],
                "user_embedding",
                return_fields=["user", "credit_score", "age", "job", "vector_distance"],
                hybrid_filter=t)


results = index.search(v.query, query_params=v.params)
for doc in results.docs:
    print(doc)

Document {'id': 'v1:john', 'payload': None, 'vector_distance': '0', 'user': 'john', 'credit_score': 'high', 'age': '18', 'job': 'engineer'}
Document {'id': 'v1:tyler', 'payload': None, 'vector_distance': '0.109129190445', 'user': 'tyler', 'credit_score': 'high', 'age': '100', 'job': 'engineer'}
Document {'id': 'v1:nancy', 'payload': None, 'vector_distance': '0.266666650772', 'user': 'nancy', 'credit_score': 'high', 'age': '94', 'job': 'doctor'}


### Negation

The next example combines the tag filter with a negated numeric filter using `-=`. We search for users that have a high credit score and are not between the ages of 18 and 100.

In [11]:
t = TagFilter("credit_score", "high")
n = NumericFilter("age", 18, 100)
t -= n

v.set_filter(t)

results = index.search(v.query, query_params=v.params)
for doc in results.docs:
    print(doc)

Document {'id': 'v1:tim', 'payload': None, 'vector_distance': '0.158809006214', 'user': 'tim', 'credit_score': 'high', 'age': '12', 'job': 'dermatologist'}


### Union of Filters

This example shows how to combine multiple filters with a union. We search for users that are either between the ages of 18 and 100 or have a high credit score. Note that in this example the `&=` operator produces the union of the two filters, as the results show.

In [12]:
t = TagFilter("credit_score", "high")
n = NumericFilter("age", 18, 100)
t &= n

v.set_filter(t)

results = index.search(v.query, query_params=v.params)
for doc in results.docs:
    print(doc)

Document {'id': 'v1:john', 'payload': None, 'vector_distance': '0', 'user': 'john', 'credit_score': 'high', 'age': '18', 'job': 'engineer'}
Document {'id': 'v1:tyler', 'payload': None, 'vector_distance': '0.109129190445', 'user': 'tyler', 'credit_score': 'high', 'age': '100', 'job': 'engineer'}
Document {'id': 'v1:tim', 'payload': None, 'vector_distance': '0.158809006214', 'user': 'tim', 'credit_score': 'high', 'age': '12', 'job': 'dermatologist'}
Document {'id': 'v1:nancy', 'payload': None, 'vector_distance': '0.266666650772', 'user': 'nancy', 'credit_score': 'high', 'age': '94', 'job': 'doctor'}
Document {'id': 'v1:joe', 'payload': None, 'vector_distance': '0.653301358223', 'user': 'joe', 'credit_score': 'medium', 'age': '35', 'job': 'dentist'}
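
As a sanity check on the three combined-filter results above, the sketch below applies the same logic to the in-memory records with plain Python predicates. It does not use ``redisvl`` or Redis; the helper names are ours, and it only illustrates the set semantics (intersection, negation, union) that the outputs exhibit.

```python
users = [
    {"user": "john", "age": 18, "credit_score": "high"},
    {"user": "derrick", "age": 14, "credit_score": "low"},
    {"user": "nancy", "age": 94, "credit_score": "high"},
    {"user": "tyler", "age": 100, "credit_score": "high"},
    {"user": "tim", "age": 12, "credit_score": "high"},
    {"user": "taimur", "age": 15, "credit_score": "low"},
    {"user": "joe", "age": 35, "credit_score": "medium"},
]

def high_credit(u):
    return u["credit_score"] == "high"

def age_18_to_100(u):
    # Inclusive bounds, matching the numeric filter results shown earlier.
    return 18 <= u["age"] <= 100

def select(predicate):
    """Return the sorted user names for which predicate holds."""
    return sorted(u["user"] for u in users if predicate(u))

# Intersection: high credit score AND age in range.
intersection = select(lambda u: high_credit(u) and age_18_to_100(u))
# Negation: high credit score AND NOT age in range.
negation = select(lambda u: high_credit(u) and not age_18_to_100(u))
# Union: high credit score OR age in range.
union = select(lambda u: high_credit(u) or age_18_to_100(u))

print(intersection)
print(negation)
print(union)
```

The names are sorted alphabetically here, whereas the query results above are ordered by `vector_distance`; the sets of users are the same.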
