# SQLQuery class

You may want to use SQL-like queries to interact with your Redis vector database. Redis does not natively support SQL, but the `redisvl` library provides a `SQLQuery` class that lets you write SQL-like queries, which are automatically translated into Redis queries.

The `SQLQuery` class is a wrapper around the `sql-redis` package, which provides a SQL-to-Redis query translator. The `sql-redis` package is not installed by default with `redisvl`, so you need to install it using the optional extras syntax. Quote the package spec so that shells such as zsh do not treat the square brackets as a glob pattern:

In [14]:
%pip install "redisvl[sql]"

Note: you may need to restart the kernel to use updated packages.
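Before going further, it can help to confirm that the optional dependency is importable. The sketch below is only a convenience check, and it assumes that the `sql-redis` package is imported as `sql_redis`; that module name and the `sql_extra_available` helper are assumptions, not part of `redisvl`.

```python
import importlib.util


def sql_extra_available(module_name: str = "sql_redis") -> bool:
    """Return True if the given top-level module can be found on the path."""
    # ASSUMPTION: "sql_redis" is the import name of the sql-redis package.
    return importlib.util.find_spec(module_name) is not None


if not sql_extra_available():
    print('SQL support not found; try: pip install "redisvl[sql]"')
```

If the check fails after installing, restarting the kernel (as the note above suggests) is usually the first thing to try.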


## Create an index to search

In [15]:
schema = {
    "index": {
        "name": "user_simple",
        "prefix": "user_simple_docs",
    },
    "fields": [
        {"name": "user", "type": "tag"},
        {"name": "credit_score", "type": "tag"},
        {"name": "job", "type": "text"},
        {"name": "age", "type": "numeric"},
        {
            "name": "user_embedding",
            "type": "vector",
            "attrs": {
                "dims": 3,
                "distance_metric": "cosine",
                "algorithm": "flat",
                "datatype": "float32"
            }
        }
    ]
}

## Sample Dataset Preparation

Below, create a mock dataset with `user`, `job`, `age`, `credit_score`, and
`user_embedding` fields. The `user_embedding` vectors are synthetic examples
for demonstration purposes.

For more information on creating real-world embeddings, refer to this
[article](https://mlops.community/vector-similarity-search-from-basics-to-production/).

In [16]:
import numpy as np


data = [
    {
        'user': 'john',
        'age': 34,
        'job': 'engineer',
        'credit_score': 'high',
        'user_embedding': np.array([0.4, 0.1, 0.5], dtype=np.float32).tobytes()
    },
    {
        'user': 'bill',
        'age': 54,
        'job': 'engineer',
        'credit_score': 'low',
        'user_embedding': np.array([0.3, 0.1, 0.5], dtype=np.float32).tobytes()
    },
    {
        'user': 'mary',
        'age': 24,
        'job': 'doctor',
        'credit_score': 'low',
        'user_embedding': np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes()
    },
    {
        'user': 'joe',
        'age': 17,
        'job': 'dentist',
        'credit_score': 'medium',
        'user_embedding': np.array([0.9, 0.9, 0.1], dtype=np.float32).tobytes()
    },
    {
        'user': 'stacy',
        'age': 61,
        'job': 'dentist',
        'credit_score': 'high',
        'user_embedding': np.array([0.9, 1.0, 0.1], dtype=np.float32).tobytes()
    }
]
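Each `user_embedding` above is stored as raw bytes: three float32 values, so 12 bytes per vector. The following standard-library sketch illustrates that layout without NumPy. The `encode_vector` and `decode_vector` helpers are hypothetical names used only for this illustration, and the sketch assumes a platform where a C `float` is 4 bytes.

```python
from array import array

DIMS = 3  # matches "dims" in the schema


def encode_vector(values):
    """Pack floats as 4-byte C floats in native byte order."""
    return array("f", values).tobytes()


def decode_vector(raw):
    """Unpack bytes produced by encode_vector back into a list of floats."""
    decoded = array("f")
    decoded.frombytes(raw)
    return list(decoded)


raw = encode_vector([0.4, 0.1, 0.5])
print(len(raw))  # 3 values * 4 bytes each
print([round(x, 1) for x in decode_vector(raw)])
```

A length check like `len(raw) == DIMS * 4` is a cheap way to catch a vector with the wrong dimension or datatype before loading it.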

## Create a `SearchIndex`

With the schema and sample dataset ready, create a `SearchIndex`.

### Bring your own Redis connection instance

This is ideal in scenarios where you have custom settings on the connection instance or if your application will share a connection pool:

In [17]:
from redisvl.index import SearchIndex
from redis import Redis

client = Redis.from_url("redis://localhost:6379")
index = SearchIndex.from_dict(schema, redis_client=client, validate_on_load=True)

### Let the index manage the connection instance

This is ideal for simple cases:

In [18]:
index = SearchIndex.from_dict(schema, redis_url="redis://localhost:6379", validate_on_load=True)
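The connection URL is hard-coded in the cells above. One possible variation, sketched here, reads it from the environment with a local default. The `REDIS_URL` variable name and the `get_redis_url` helper are assumptions made for this example, not something `redisvl` defines.

```python
import os


def get_redis_url(default: str = "redis://localhost:6379") -> str:
    """Return the Redis URL from the environment, or a local default."""
    # ASSUMPTION: "REDIS_URL" is a hypothetical variable name chosen for this sketch.
    return os.environ.get("REDIS_URL", default)


redis_url = get_redis_url()
print(redis_url)
```

The resulting value could then be passed as `redis_url=` to `SearchIndex.from_dict`, as in the cell above.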

### Create the index

Now that we are connected to Redis, we need to run the create command.

In [19]:
index.create(overwrite=True)

## Load Data to `SearchIndex`

Load the sample dataset to Redis.

### Validate data entries on load
RedisVL uses pydantic validation under the hood to ensure loaded data is valid and conforms to your schema. This setting is optional and is enabled here through the `validate_on_load=True` argument passed when creating the `SearchIndex`.

In [20]:
keys = index.load(data)

print(keys)

['user_simple_docs:01KG0JR6VWCHVRCX78T96VT6GE', 'user_simple_docs:01KG0JR6VWCHVRCX78T96VT6GF', 'user_simple_docs:01KG0JR6VXSJEHX9P3ZMR3917Y', 'user_simple_docs:01KG0JR6VXSJEHX9P3ZMR3917Z', 'user_simple_docs:01KG0JR6VXSJEHX9P3ZMR39180']


## Create a `SQLQuery` Object

First, let's test a simple select statement such as the one below.

In [21]:
from redisvl.query import SQLQuery

sql_str = """
    SELECT user, credit_score, job, age
    FROM user_simple
    WHERE age > 17
    """

sql_query = SQLQuery(sql_str)

### Check the created query string

In [22]:
sql_query.redis_query_string(redis_url="redis://localhost:6379")

'FT.SEARCH user_simple "@age:[(17 +inf]" RETURN 4 user credit_score job age'
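To make the translated filter easier to read, the sketch below shows how SQL comparisons on a numeric field correspond to Redis numeric range syntax, where a leading `(` marks an exclusive bound. The `numeric_range` helper is hypothetical and written only for illustration; it is not how `sql-redis` is implemented.

```python
def numeric_range(field: str, op: str, value) -> str:
    """Hypothetical helper: render a SQL comparison as a Redis numeric range filter."""
    ranges = {
        ">": f"[({value} +inf]",    # exclusive lower bound
        ">=": f"[{value} +inf]",    # inclusive lower bound
        "<": f"[-inf ({value}]",    # exclusive upper bound
        "<=": f"[-inf {value}]",    # inclusive upper bound
        "=": f"[{value} {value}]",  # exact match
    }
    if op not in ranges:
        raise ValueError(f"unsupported operator: {op}")
    return f"@{field}:{ranges[op]}"


print(numeric_range("age", ">", 17))  # @age:[(17 +inf]
```

The printed filter matches the one inside the `FT.SEARCH` string shown above for `WHERE age > 17`.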

### Executing the query

In [23]:
results = index.query(sql_query)
results

[{'user': 'john', 'credit_score': 'high', 'job': 'engineer', 'age': '34'},
 {'user': 'bill', 'credit_score': 'low', 'job': 'engineer', 'age': '54'},
 {'user': 'mary', 'credit_score': 'low', 'job': 'doctor', 'age': '24'},
 {'user': 'stacy', 'credit_score': 'high', 'job': 'dentist', 'age': '61'}]
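Note that numeric fields such as `age` come back as strings in the results above. If typed values are needed downstream, a small post-processing step is one option. The `coerce_fields` helper below is a hypothetical sketch, and `sample_results` simply copies the output shown above.

```python
sample_results = [
    {"user": "john", "credit_score": "high", "job": "engineer", "age": "34"},
    {"user": "bill", "credit_score": "low", "job": "engineer", "age": "54"},
    {"user": "mary", "credit_score": "low", "job": "doctor", "age": "24"},
    {"user": "stacy", "credit_score": "high", "job": "dentist", "age": "61"},
]


def coerce_fields(rows, casts):
    """Return copies of rows with selected fields cast, e.g. {"age": int}."""
    return [
        {key: (casts[key](value) if key in casts else value) for key, value in row.items()}
        for row in rows
    ]


typed = coerce_fields(sample_results, {"age": int})
print(typed[0])
```

Fields not listed in `casts` are left untouched, so the same helper could be reused with `float` for aggregate values.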

## Additional query support

### Conditional operators

In [24]:
sql_str = """
    SELECT user, credit_score, job, age
    FROM user_simple
    WHERE age > 17 and credit_score = 'high'
    """

# could maybe be nice to set a connection string at the class level
# this would deviate from our other query like classes though so thinking on it
sql_query = SQLQuery(sql_str)
redis_query = sql_query.redis_query_string(redis_url="redis://localhost:6379")
print("Resulting redis query: ", redis_query)
results = index.query(sql_query)
results

Resulting redis query:  FT.SEARCH user_simple "@age:[(17 +inf] @credit_score:{high}" RETURN 4 user credit_score job age


[{'user': 'john', 'credit_score': 'high', 'job': 'engineer', 'age': '34'},
 {'user': 'stacy', 'credit_score': 'high', 'job': 'dentist', 'age': '61'}]

In [25]:
sql_str = """
    SELECT user, credit_score, job, age
    FROM user_simple
    WHERE credit_score = 'high' or credit_score = 'low'
    """

sql_query = SQLQuery(sql_str)
redis_query = sql_query.redis_query_string(redis_url="redis://localhost:6379")
print("Resulting redis query: ", redis_query)
results = index.query(sql_query)
results

Resulting redis query:  FT.SEARCH user_simple "((@credit_score:{high})|(@credit_score:{low}))" RETURN 4 user credit_score job age


[{'user': 'john', 'credit_score': 'high', 'job': 'engineer', 'age': '34'},
 {'user': 'bill', 'credit_score': 'low', 'job': 'engineer', 'age': '54'},
 {'user': 'mary', 'credit_score': 'low', 'job': 'doctor', 'age': '24'},
 {'user': 'stacy', 'credit_score': 'high', 'job': 'dentist', 'age': '61'}]

In [38]:
sql_str = """
    SELECT user, credit_score, job, age
    FROM user_simple
    WHERE user IN ('mary', 'john')
    """

sql_query = SQLQuery(sql_str)
redis_query = sql_query.redis_query_string(redis_url="redis://localhost:6379")
print("Resulting redis query: ", redis_query)
results = index.query(sql_query)
results

Resulting redis query:  FT.SEARCH user_simple "@user:{mary|john}" RETURN 4 user credit_score job age


[{'user': 'john', 'credit_score': 'high', 'job': 'engineer', 'age': '34'},
 {'user': 'mary', 'credit_score': 'low', 'job': 'doctor', 'age': '24'}]

In [39]:
sql_str = """
    SELECT user, credit_score, job, age
    FROM user_simple
    WHERE age BETWEEN 40 and 60
    """

sql_query = SQLQuery(sql_str)
redis_query = sql_query.redis_query_string(redis_url="redis://localhost:6379")
print("Resulting redis query: ", redis_query)
results = index.query(sql_query)
results

Resulting redis query:  FT.SEARCH user_simple "@age:[40 60]" RETURN 4 user credit_score job age


[{'user': 'bill', 'credit_score': 'low', 'job': 'engineer', 'age': '54'}]
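As a sanity check on the conditional operators, the same filters can be applied to the sample data in plain Python. This sketch only mirrors the semantics (for example, `BETWEEN` is inclusive on both ends); the `users_where` helper is hypothetical, and result order from Redis may differ from the in-memory order here.

```python
rows = [
    {"user": "john", "age": 34, "credit_score": "high"},
    {"user": "bill", "age": 54, "credit_score": "low"},
    {"user": "mary", "age": 24, "credit_score": "low"},
    {"user": "joe", "age": 17, "credit_score": "medium"},
    {"user": "stacy", "age": 61, "credit_score": "high"},
]


def users_where(predicate):
    """Return the users whose row satisfies the predicate, in data order."""
    return [row["user"] for row in rows if predicate(row)]


# WHERE age > 17 and credit_score = 'high'
high_adults = users_where(lambda r: r["age"] > 17 and r["credit_score"] == "high")
# WHERE user IN ('mary', 'john')
named = users_where(lambda r: r["user"] in {"mary", "john"})
# WHERE age BETWEEN 40 and 60 (inclusive bounds)
mid_age = users_where(lambda r: 40 <= r["age"] <= 60)

print(high_adults, named, mid_age)
```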

### Aggregations

In [None]:
# TODO: check all operations these aren't working currently
# STDEV(age) as std_age
# FIRSTVALUE(age) as first_value_age
# COUNT(age) as count_age

sql_str = """
    SELECT
        credit_score,
        MAX(age) as max_age,
        AVG(age) as avg_age,
        MIN(age) as min_age
    FROM user_simple
    GROUP BY credit_score
    """

sql_query = SQLQuery(sql_str)
redis_query = sql_query.redis_query_string(redis_url="redis://localhost:6379")
print("Resulting redis query: ", redis_query)
results = index.query(sql_query)
results

Resulting redis query:  FT.AGGREGATE user_simple "*" LOAD 2 age credit_score GROUPBY 1 @credit_score REDUCE MAX 1 @age AS max_age REDUCE AVG 1 @age AS avg_age REDUCE MIN 1 @age AS min_age


[{'credit_score': 'high', 'max_age': '61', 'avg_age': '47.5', 'min_age': '34'},
 {'credit_score': 'medium', 'max_age': '17', 'avg_age': '17', 'min_age': '17'},
 {'credit_score': 'low', 'max_age': '54', 'avg_age': '39', 'min_age': '24'}]
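The grouped values above can be cross-checked with a few lines of plain Python over the sample data. This is only an in-memory sketch of the `GROUP BY credit_score` semantics, not how Redis computes the aggregation.

```python
from collections import defaultdict

# (credit_score, age) pairs taken from the sample dataset
pairs = [("high", 34), ("low", 54), ("low", 24), ("medium", 17), ("high", 61)]

ages_by_score = defaultdict(list)
for score, age in pairs:
    ages_by_score[score].append(age)

summary = {
    score: {
        "max_age": max(ages),
        "avg_age": sum(ages) / len(ages),
        "min_age": min(ages),
    }
    for score, ages in ages_by_score.items()
}

for score, stats in summary.items():
    print(score, stats)
```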

### Vector search

In [None]:
# TODO: this also doesn't give me a means to specify what distance type I mean
# it should also support the pgvector type syntax
sql_str = """
    SELECT user, vector_distance(user_embedding, :vec) AS vector_distance
    FROM user_simple
    ORDER BY vector_distance ASC
    """

# pass vector as parameter
# TODO: I think this can function closer to the vector query
vec = np.array([1, 1, 1], dtype=np.float32).tobytes()
sql_query = SQLQuery(sql_str, params={"vec": vec})

redis_query = sql_query.redis_query_string(redis_url="redis://localhost:6379")
print("Resulting redis query: ", redis_query)
results = index.query(sql_query)

results

Resulting redis query:  FT.SEARCH user_simple "*=>[KNN 10 @user_embedding $vector AS vector_distance]" PARAMS 2 vector $vector DIALECT 2 RETURN 2 user vector_distance SORTBY vector_distance ASC


[{'vector_distance': '0.14079028368', 'user': 'joe'},
 {'vector_distance': '0.14079028368', 'user': 'stacy'},
 {'vector_distance': '0.222222208977', 'user': 'john'},
 {'vector_distance': '0.222222208977', 'user': 'bill'},
 {'vector_distance': '0.222222208977', 'user': 'mary'}]
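Because the schema sets `distance_metric` to `cosine`, the reported distance is one minus the cosine similarity, so smaller values mean closer vectors. The sketch below computes that quantity in plain Python; the `cosine_distance` function here is a local illustration, not the SQL function used in the query.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# joe's embedding against the query vector [1, 1, 1]
print(round(cosine_distance([0.9, 0.9, 0.1], [1, 1, 1]), 4))
# orthogonal vectors are at distance 1
print(cosine_distance([1, 0], [0, 1]))
```

The first value should be close to the distance reported for `joe` above, up to float32 rounding.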

In [None]:
# TODO: this also doesn't give me a means to specify what distance type I mean
# it should also support the pgvector type syntax
sql_str = """
    SELECT user, cosine_distance(user_embedding, :vec) AS vector_distance
    FROM user_simple
    ORDER BY vector_distance DESC
    """

# pass vector as parameter
# TODO: I think this can function closer to the vector query
vec = np.array([0.5, 0.1, 0.5], dtype=np.float32).tobytes()
sql_query = SQLQuery(sql_str, params={"vec": vec})

redis_query = sql_query.redis_query_string(redis_url="redis://localhost:6379")
print("Resulting redis query: ", redis_query)
results = index.query(sql_query)

results

Resulting redis query:  FT.SEARCH user_simple "*=>[KNN 10 @user_embedding $vector AS vector_distance]" PARAMS 2 vector $vector DIALECT 2 RETURN 2 user vector_distance SORTBY vector_distance DESC


[{'vector_distance': '0.352897465229', 'user': 'stacy'},
 {'vector_distance': '0.352897465229', 'user': 'joe'},
 {'vector_distance': '0.164599537849', 'user': 'mary'},
 {'vector_distance': '0.164599537849', 'user': 'bill'},
 {'vector_distance': '0.164599537849', 'user': 'john'}]

## Cleanup

Below we will clean up after our work. First, you can flush all data from Redis associated with the index by
using the `.clear()` method. This will leave the secondary index in place for future insertions or updates.

To clean up everything, including the index, use `.delete()`,
which by default removes both the index and the underlying data.

In [None]:
# Clear all data from Redis associated with the index
index.clear()

In [None]:
# But the index is still in place
index.exists()

In [None]:
# Remove / delete the index in its entirety
index.delete()