![Redis](https://redis.io/wp-content/uploads/2024/04/Logotype.svg?auto=webp&quality=85,75&width=120)
# RedisVL 0.5.0 - Release overview

This notebook provides an overview of what's new with the 0.5.0 release of redisvl. It also highlights changes and potential enhancements for existing usage.

<a href="https://colab.research.google.com/github/redis-developer/redis-ai-resources/blob/main/python-recipes/redisvl-release/0.5.0_release_overview.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# What's new?

- Hybrid query and text query classes
- Threshold optimizer classes
- Schema validation
- Timestamp filters
- Batched queries
- Vector normalization
- Hybrid policy on knn with filters

# Env setup

## Install Redis Stack

#### For Colab
Use the shell script below to download, extract, and install [Redis Stack](https://redis.io/docs/getting-started/install-stack/) directly from the Redis package archive.

In [None]:
# NBVAL_SKIP
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install redis-stack-server  > /dev/null 2>&1
redis-stack-server --daemonize yes

#### For Alternative Environments
There are many ways to get the necessary redis-stack instance running:
1. In the cloud, deploy a [FREE instance of Redis](https://redis.com/try-free/). Or, if you have your own version of Redis Enterprise running, that works too!
2. Per OS, [see the docs](https://redis.io/docs/latest/operate/oss_and_stack/install/install-stack/)
3. With docker: `docker run -d --name redis-stack-server -p 6379:6379 redis/redis-stack-server:latest`

### Define the Redis Connection URL

By default this notebook connects to the local instance of Redis Stack. **If you have your own Redis Enterprise instance** - replace REDIS_PASSWORD, REDIS_HOST and REDIS_PORT values with your own.

In [1]:
import os

# Replace values below with your own if using Redis Cloud instance
REDIS_HOST = os.getenv("REDIS_HOST", "localhost") # ex: "redis-18374.c253.us-central1-1.gce.cloud.redislabs.com"
REDIS_PORT = os.getenv("REDIS_PORT", "6379")      # ex: 18374
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "")  # ex: "1TNxTEdYRDgIDKM2gDfasupCADXXXX"

# If SSL is enabled on the endpoint, use rediss:// as the URL prefix
REDIS_URL = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"
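
The comment above notes that SSL-enabled endpoints need the `rediss://` prefix. A minimal sketch of a helper that switches the scheme could look like this; `build_redis_url` and its `use_ssl` flag are hypothetical names for illustration and are not part of redisvl.

```python
def build_redis_url(host: str, port: str, password: str = "", use_ssl: bool = False) -> str:
    """Assemble a Redis connection URL (hypothetical helper, not a redisvl API)."""
    # rediss:// (double "s") tells the client to connect over TLS/SSL
    scheme = "rediss" if use_ssl else "redis"
    return f"{scheme}://:{password}@{host}:{port}"


# Local Redis Stack without a password
local_url = build_redis_url("localhost", "6379")
# Hosted endpoint with SSL enabled (placeholder host and password)
cloud_url = build_redis_url("example-host", "18374", "secret", use_ssl=True)
print(local_url)
print(cloud_url)
```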

# Install redisvl 0.5.0

In [2]:
%pip install git+https://github.com/redis/redis-vl-python.git@0.5.0

Collecting git+https://github.com/redis/redis-vl-python.git@0.5.0
  Cloning https://github.com/redis/redis-vl-python.git (to revision 0.5.0) to /private/var/folders/_g/rr4lnxxx1_z7m78lz89dhvsm0000gp/T/pip-req-build-8zytawrt
  Running command git clone --filter=blob:none --quiet https://github.com/redis/redis-vl-python.git /private/var/folders/_g/rr4lnxxx1_z7m78lz89dhvsm0000gp/T/pip-req-build-8zytawrt
  Running command git checkout -b 0.5.0 --track origin/0.5.0
  Switched to a new branch '0.5.0'
  branch '0.5.0' set up to track 'origin/0.5.0'.
  Resolved https://github.com/redis/redis-vl-python.git to commit 7ffe89e27e4783fe38c94c7b09ba436e9614ac51
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: redisvl
  Building wheel for redisvl (pyproject.toml) ... done
  Created wheel for redisvl: filename=redisvl-0.4.1-py3-none-any

# Hybrid query and text query classes

In 0.5.0 we introduced classes that make it easier to perform lexical (full-text) search in Redis, both on its own and combined with vector search.

> TODO: update hybrid search notebook to use the class and make sure it works the same
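
To make the idea of combining lexical and vector search concrete, here is a minimal pure-Python sketch of one common approach: a weighted linear blend of a text relevance score and a vector similarity. The function name, the `alpha` weight, and the sample scores are assumptions for illustration. This is not the redisvl implementation, and the actual classes may score documents differently.

```python
def blend_scores(text_score: float, vector_similarity: float, alpha: float = 0.7) -> float:
    """Weighted blend: alpha weights the vector side, (1 - alpha) the text side."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * vector_similarity + (1.0 - alpha) * text_score


# Hypothetical (text_score, vector_similarity) pairs for three documents
candidates = {
    "doc_a": (0.9, 0.2),  # strong keyword match, weak semantic match
    "doc_b": (0.1, 0.8),  # weak keyword match, strong semantic match
    "doc_c": (0.5, 0.5),
}

# Rank documents by blended score, best first
ranked = sorted(
    candidates,
    key=lambda doc: blend_scores(*candidates[doc], alpha=0.7),
    reverse=True,
)
print(ranked)
```

With a larger `alpha`, the semantic match dominates the ranking; with a smaller one, keyword matches carry more weight.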

# Threshold optimization

In redisvl 0.5.0 we added the ability to quickly tune the thresholds of either your semantic cache or your semantic router using test data examples. This requires a bit of setup.

See [semantic-cache/02_semantic_cache_optimization.ipynb](../semantic-cache/02_semantic_cache_optimization.ipynb) and [semantic-router/01_routing_optimization.ipynb](../semantic-router/01_routing_optimization.ipynb) for the full implementation details.
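
As a rough intuition for what optimizing a threshold from test data means, the sketch below sweeps candidate distance thresholds over labeled examples and keeps the one with the best F1 score. The `best_threshold` helper, the F1 criterion, and the sample data are assumptions made for illustration. The redisvl optimizer classes are covered in the linked notebooks and may work differently.

```python
def best_threshold(examples, candidates):
    """Return (threshold, f1) for the candidate with the highest F1 score.

    examples: list of (distance, should_match) pairs
    candidates: iterable of distance thresholds to try
    """
    best, best_f1 = None, -1.0
    for threshold in candidates:
        # A pair is a predicted match when its distance is within the threshold
        tp = sum(1 for d, label in examples if d <= threshold and label)
        fp = sum(1 for d, label in examples if d <= threshold and not label)
        fn = sum(1 for d, label in examples if d > threshold and label)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:  # ties keep the smaller (earlier) threshold
            best, best_f1 = threshold, f1
    return best, best_f1


# Made-up labeled test data: (vector distance, whether it should count as a hit)
labeled = [(0.10, True), (0.20, True), (0.35, True), (0.40, False), (0.60, False)]
threshold, f1 = best_threshold(labeled, [0.1, 0.2, 0.3, 0.4, 0.5])
print(threshold, f1)
```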

# Schema validation

With `validate_on_load=True`, the index checks each object against the schema as it is loaded, which makes it easier to confirm that your data is in the right format.

In [10]:
from redisvl.index import SearchIndex

# sample schema
car_schema = {
    "index": {
        "name": "cars",
        "prefix": "cars",
        "storage_type": "json",
    },
    "fields": [
        {"name": "make", "type": "text"},
        {"name": "model", "type": "text"},
        {"name": "description", "type": "text"},
        {"name": "mpg", "type": "numeric"},
        {
            "name": "car_embedding",
            "type": "vector",
            "attrs": {
                "dims": 3,
                "distance_metric": "cosine",
                "algorithm": "flat",
                "datatype": "float32"
            }

        }
    ],
}

sample_data_bad = [
    {
        "make": "Toyota",
        "model": "Camry",
        "description": "A reliable sedan with great fuel economy.",
        "mpg": 28,
        "car_embedding": [0.1, 0.2, 0.3]
    },
    {
        # missing make and model
        "description": "A luxury SUV with advanced technology.",
        "mpg": 22,
        "car_embedding": [0.4, 0.5, 0.6]
    }
]

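# validate_on_load=True checks each object against the schema during load();
# in the run captured below, both records (including the one missing make and model) were loaded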
index = SearchIndex.from_dict(car_schema, redis_url=REDIS_URL, validate_on_load=True)
index.create(overwrite=True)
index.load(sample_data_bad)

09:35:07 redisvl.index.index INFO   Index already exists, overwriting.


['cars:01JQRS067CVA87WKDVE4GXB9Y7', 'cars:01JQRS0699VDN8WB82VWHWFJ7B']
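
As a library-independent illustration of what schema validation involves, the sketch below checks a record's values against field types like those in `car_schema`. The `find_problems` helper, the type mapping, and the choice to treat missing fields as allowed are all assumptions of this sketch, not redisvl's actual validation logic.

```python
from numbers import Number

# Check per field type (illustrative mapping only)
TYPE_CHECKS = {
    "text": lambda v, attrs: isinstance(v, str),
    "numeric": lambda v, attrs: isinstance(v, Number) and not isinstance(v, bool),
    "vector": lambda v, attrs: isinstance(v, list) and len(v) == attrs.get("dims"),
}


def find_problems(record: dict, fields: list) -> list:
    """Return human-readable problems for the fields present in the record."""
    problems = []
    for field in fields:
        name = field["name"]
        if name not in record:
            continue  # this sketch treats missing fields as allowed
        check = TYPE_CHECKS[field["type"]]
        if not check(record[name], field.get("attrs", {})):
            problems.append(f"{name}: invalid value for type '{field['type']}'")
    return problems


fields = [
    {"name": "make", "type": "text"},
    {"name": "mpg", "type": "numeric"},
    {"name": "car_embedding", "type": "vector", "attrs": {"dims": 3}},
]

good = {"make": "Toyota", "mpg": 28, "car_embedding": [0.1, 0.2, 0.3]}
bad = {"make": "Toyota", "mpg": "twenty-eight", "car_embedding": [0.1, 0.2]}

print(find_problems(good, fields))
print(find_problems(bad, fields))
```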

# Timestamp filters

In Redis, datetime values are stored as numeric epoch timestamps. The `Timestamp` filter makes these fields easier to query by handling the conversion for you.
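
For reference, the conversion between datetimes and epoch values can be sketched with the standard library alone. The dates below are arbitrary examples, and the sketch uses timezone-aware datetimes (an assumption made here so the numbers do not depend on the local machine); it does not show how redisvl performs the conversion internally.

```python
import datetime as dt

# Timezone-aware datetime, so the epoch value is the same on every machine
posted = dt.datetime(2025, 1, 1, tzinfo=dt.timezone.utc)
epoch = posted.timestamp()  # seconds since 1970-01-01 UTC, as a float
print(epoch)

# A "posted within the last 3 days" check becomes a plain numeric comparison
now = dt.datetime(2025, 1, 3, tzinfo=dt.timezone.utc)
cutoff = (now - dt.timedelta(days=3)).timestamp()
print(epoch > cutoff)

# Converting back recovers the original datetime
restored = dt.datetime.fromtimestamp(epoch, tz=dt.timezone.utc)
print(restored == posted)
```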

In [12]:
# populate example 
from redisvl.utils.vectorize import HFTextVectorizer
from redisvl.index import SearchIndex
import datetime as dt

emb_model = HFTextVectorizer()

job_data = [
  {
    "job_title": "Software Engineer",
    "job_description": "Develop and maintain web applications using JavaScript, React, and Node.js.",
    "posted": (dt.datetime.now() - dt.timedelta(days=1)).timestamp() # day ago
  },
  {
    "job_title": "Data Analyst",
    "job_description": "Analyze large datasets to provide business insights and create data visualizations.",
    "posted": (dt.datetime.now() - dt.timedelta(days=7)).timestamp() # week ago
  },
  {
    "job_title": "Marketing Manager",
    "job_description": "Develop and implement marketing strategies to drive brand awareness and customer engagement.",
    "posted": (dt.datetime.now() - dt.timedelta(days=30)).timestamp() # month ago
  }
]

job_data = [{**job, "job_embedding": emb_model.embed(job["job_description"], as_buffer=True)} for job in job_data]


job_schema = {
    "index": {
        "name": "jobs",
        "prefix": "jobs",
        "storage_type": "hash", # default setting -- HASH
    },
    "fields": [
        {"name": "job_title", "type": "text"},
        {"name": "job_description", "type": "text"},
        {"name": "posted", "type": "numeric"},
        {
            "name": "job_embedding",
            "type": "vector",
            "attrs": {
                "dims": 768,
                "distance_metric": "cosine",
                "algorithm": "flat",
                "datatype": "float32"
            }

        }
    ],
}

index = SearchIndex.from_dict(job_schema, redis_url=REDIS_URL)
index.create(overwrite=True, drop=True)
index.load(job_data)



09:48:49 redisvl.index.index INFO   Index already exists, overwriting.


['jobs:01JQRSS9E2ENS2J2NSHEJS0THA',
 'jobs:01JQRSS9E2E7WXW5CQEB5VZZG8',
 'jobs:01JQRSS9E2J9YTY5DSFSF5HT0T']

## Filter by Datetime

In [13]:
from redisvl.query import FilterQuery
from redisvl.query.filter import Timestamp

now = dt.datetime.now()

# find all jobs
ts = Timestamp("posted") < now

filter_query = FilterQuery(
    return_fields=["job_title", "job_description", "posted"], 
    filter_expression=ts,
    num_results=10,
)
res = index.query(filter_query)
res

[{'id': 'jobs:01JQRSS9E2ENS2J2NSHEJS0THA',
  'job_title': 'Software Engineer',
  'job_description': 'Develop and maintain web applications using JavaScript, React, and Node.js.',
  'posted': '1743428929.91'},
 {'id': 'jobs:01JQRSS9E2E7WXW5CQEB5VZZG8',
  'job_title': 'Data Analyst',
  'job_description': 'Analyze large datasets to provide business insights and create data visualizations.',
  'posted': '1742910529.91'},
 {'id': 'jobs:01JQRSS9E2J9YTY5DSFSF5HT0T',
  'job_title': 'Marketing Manager',
  'job_description': 'Develop and implement marketing strategies to drive brand awareness and customer engagement.',
  'posted': '1740926929.91'}]

In [14]:
# jobs posted in the last 3 days => 1 job
ts = Timestamp("posted") > now - dt.timedelta(days=3)

filter_query = FilterQuery(
    return_fields=["job_title", "job_description", "posted"], 
    filter_expression=ts,
    num_results=10,
)
res = index.query(filter_query)
res

[{'id': 'jobs:01JQRSS9E2ENS2J2NSHEJS0THA',
  'job_title': 'Software Engineer',
  'job_description': 'Develop and maintain web applications using JavaScript, React, and Node.js.',
  'posted': '1743428929.91'}]

In [15]:
# more than 3 days ago but less than 14 days ago => 1 job
ts = Timestamp("posted").between(
    now - dt.timedelta(days=14),
    now - dt.timedelta(days=3),
)

filter_query = FilterQuery(
    return_fields=["job_title", "job_description", "posted"], 
    filter_expression=ts,
    num_results=10,
)

res = index.query(filter_query)
res

[{'id': 'jobs:01JQRSS9E2E7WXW5CQEB5VZZG8',
  'job_title': 'Data Analyst',
  'job_description': 'Analyze large datasets to provide business insights and create data visualizations.',
  'posted': '1742910529.91'}]

# Batch search

Batch search speeds up query execution by submitting queries in groups of `batch_size` rather than one call at a time, which reduces the impact of network latency.

In [18]:
import time
num_queries = 100

start = time.time()
for i in range(num_queries):
    # run the same filter query 
    res = index.query(filter_query)
end = time.time()
print(f"Time taken for {num_queries} queries: {end - start:.2f} seconds")

Time taken for 100 queries: 0.21 seconds


In [20]:
batched_queries = [filter_query] * num_queries

start = time.time()

index.batch_search(batched_queries, batch_size=10)

end = time.time()
print(f"Time taken for {num_queries} batched queries: {end - start:.2f} seconds")

Time taken for 100 batched queries: 0.01 seconds
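
The effect of `batch_size` can be pictured as splitting the list of queries into consecutive chunks. The sketch below shows only that chunking step in plain Python; the `chunked` helper is hypothetical, and the assumption that each chunk costs roughly one network round trip is an illustration rather than a statement about redisvl internals.

```python
import math


def chunked(items: list, batch_size: int):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


queries = [f"query-{i}" for i in range(100)]  # stand-ins for real query objects
batches = list(chunked(queries, batch_size=10))

# Number of batches equals ceil(number of queries / batch_size)
print(len(batches), math.ceil(len(queries) / 10))
```

A larger batch size means fewer chunks, at the cost of larger individual requests and responses.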


# Vector normalization

By default, Redis returns the vector cosine *distance* when performing a search: a value between 0 and 2, where 0 is a perfect match. Sometimes you may instead want a *similarity* score between 0 and 1, where 1 is a perfect match. Setting `normalize_vector_distance=True` does this conversion for you. The flag also applies to L2 distance, normalizing the Euclidean distance to a value between 0 and 1.
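
The values reported with and without the flag in this section are consistent with the mapping similarity = (2 - distance) / 2 for cosine distance (for example, 0.581857740879 maps to 0.7090711295605). The sketch below writes that mapping out. The function names are hypothetical, and the L2 formula 1 / (1 + distance) is an assumption of this sketch rather than something confirmed by the outputs here.

```python
def cosine_distance_to_similarity(distance: float) -> float:
    """Map a cosine distance in [0, 2] to a similarity in [0, 1]."""
    return (2.0 - distance) / 2.0


def l2_distance_to_similarity(distance: float) -> float:
    """Map an L2 distance in [0, inf) to (0, 1]; this formula is an assumption."""
    return 1.0 / (1.0 + distance)


# Raw cosine distances taken from the un-normalized run in this section
print(cosine_distance_to_similarity(0.581857740879))
print(cosine_distance_to_similarity(0.790109753609))

# A distance of 0 (perfect match) maps to a similarity of 1 in both cases
print(cosine_distance_to_similarity(0.0), l2_distance_to_similarity(0.0))
```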
 

In [5]:
from redisvl.query import VectorQuery

query = VectorQuery(
    vector=emb_model.embed("Software Engineer", as_buffer=True),
    vector_field_name="job_embedding",
    return_fields=["job_title", "job_description", "posted"],
    normalize_vector_distance=True,
)

res = index.query(query)
res

[{'id': 'jobs:01JQPY6H4MZHY7YHZP8WRVH27K',
  'vector_distance': '0.7090711295605',
  'job_title': 'Software Engineer',
  'job_description': 'Develop and maintain web applications using JavaScript, React, and Node.js.',
  'posted': '1743366449.24'},
 {'id': 'jobs:01JQRQZAXREMGYPHTRFMK72NK3',
  'vector_distance': '0.7090711295605',
  'job_title': 'Software Engineer',
  'job_description': 'Develop and maintain web applications using JavaScript, React, and Node.js.',
  'posted': '1743427030.59'},
 {'id': 'jobs:01JQPY6H4M4MVKC4S9R4EQ69KA',
  'vector_distance': '0.6049451231955',
  'job_title': 'Data Analyst',
  'job_description': 'Analyze large datasets to provide business insights and create data visualizations.',
  'posted': '1742848049.24'},
 {'id': 'jobs:01JQRQZAXRK36XRRPK0A2XJD4D',
  'vector_distance': '0.6049451231955',
  'job_title': 'Data Analyst',
  'job_description': 'Analyze large datasets to provide business insights and create data visualizations.',
  'posted': '1742908630.59'}

In [6]:
from redisvl.query import VectorQuery

query = VectorQuery(
    vector=emb_model.embed("Software Engineer", as_buffer=True),
    vector_field_name="job_embedding",
    return_fields=["job_title", "job_description", "posted"],
    normalize_vector_distance=False,
)

res = index.query(query)
res

[{'id': 'jobs:01JQPY6H4MZHY7YHZP8WRVH27K',
  'vector_distance': '0.581857740879',
  'job_title': 'Software Engineer',
  'job_description': 'Develop and maintain web applications using JavaScript, React, and Node.js.',
  'posted': '1743366449.24'},
 {'id': 'jobs:01JQRQZAXREMGYPHTRFMK72NK3',
  'vector_distance': '0.581857740879',
  'job_title': 'Software Engineer',
  'job_description': 'Develop and maintain web applications using JavaScript, React, and Node.js.',
  'posted': '1743427030.59'},
 {'id': 'jobs:01JQPY6H4M4MVKC4S9R4EQ69KA',
  'vector_distance': '0.790109753609',
  'job_title': 'Data Analyst',
  'job_description': 'Analyze large datasets to provide business insights and create data visualizations.',
  'posted': '1742848049.24'},
 {'id': 'jobs:01JQRQZAXRK36XRRPK0A2XJD4D',
  'vector_distance': '0.790109753609',
  'job_title': 'Data Analyst',
  'job_description': 'Analyze large datasets to provide business insights and create data visualizations.',
  'posted': '1742908630.59'},
 {