# Getting Started with RedisVL

RedisVL is a Python library with a dedicated CLI to help load and create vector search indices within Redis.

This notebook walks through:
1. Preparing a dataset with vectors.
2. Writing a data schema for Redis.
3. Creating a vector search index and loading the data.
4. Performing queries.

Before running this notebook, be sure to:
1. Have ``redisvl`` installed, with that environment active for this notebook.
2. Have a running Redis instance with RediSearch > 2.4.
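
To check the second prerequisite programmatically, one option is to inspect the output of redis-py's `Redis.module_list()`. The helpers below are a hypothetical sketch, not part of RedisVL. They assume the module reports itself as `search` and encodes its version as `major * 10000 + minor * 100 + patch` (so 2.4.0 is `20400`), and they treat the requirement as "2.4.0 or later". Because they work on an already-fetched list, they can be tried without a server.

```python
from __future__ import annotations

from typing import Any, Iterable, Mapping, Optional


def _as_str(value: Any) -> str:
    """Decode bytes (redis-py returns bytes unless decode_responses=True)."""
    return value.decode() if isinstance(value, bytes) else str(value)


def search_module_version(modules: Iterable[Mapping[Any, Any]]) -> Optional[int]:
    """Return the encoded RediSearch version, or None if it is not loaded."""
    for module in modules:
        # Normalize keys so both b"name" and "name" are accepted.
        entry = {_as_str(key): value for key, value in module.items()}
        if _as_str(entry.get("name", "")) == "search":
            return int(entry["ver"])
    return None


def has_search(modules: Iterable[Mapping[Any, Any]], min_version: int = 20400) -> bool:
    """True if RediSearch is loaded at or above min_version (assumed encoding)."""
    version = search_module_version(modules)
    return version is not None and version >= min_version


# Illustrative input shaped like module_list() output; the values are made up.
sample = [{b"name": b"search", b"ver": 20803}, {b"name": b"ReJSON", b"ver": 20407}]
print(has_search(sample))  # True
```

Against a live server, the input would come from something like `redis.Redis.from_url("redis://localhost:6379").module_list()`.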

## Data Preparation

For this example, we will use the following overly simplified dataset:


```python
import numpy as np
from jupyterutils import table_print  # helper for printing tables in this notebook

data = [
    {'user': 'john', 'age': 1, 'job': 'engineer', 'credit_score': 'high'},
    {'user': 'mary', 'age': 2, 'job': 'doctor', 'credit_score': 'low'},
    {'user': 'joe', 'age': 3, 'job': 'dentist', 'credit_score': 'medium'}
]
```

This will make up three entries in Redis (hashes), each with four fields (user, age, job, credit_score).

Now, we want to add vectors to represent each user. These are just dummy vectors to illustrate the point, but more complex vectors can be created and used as well. For more information on creating embeddings, see this [article](https://mlops.community/vector-similarity-search-from-basics-to-production/).

As shown below, the sample vectors must be converted to bytes before they can be loaded into Redis. With ``NumPy``, this is a single ``tobytes()`` call per vector.
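
As a quick sanity check on the conversion, the bytes can be decoded back with `np.frombuffer`. Each `float32` takes 4 bytes, so a 3-dimensional vector becomes 12 bytes. The `dtype` used to decode must match the one used to encode, which is also the `datatype` declared for the vector field in the schema.

```python
import numpy as np

# Encode a 3-dimensional float32 vector the same way as the vectors in this notebook.
vec = np.array([0.1, 0.1, 0.5], dtype=np.float32)
blob = vec.tobytes()

# 3 dims * 4 bytes per float32 = 12 bytes.
print(len(blob))  # 12

# Decoding with the matching dtype recovers the original values exactly.
restored = np.frombuffer(blob, dtype=np.float32)
print(np.array_equal(restored, vec))  # True
```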

```python
# convert each vector to raw float32 bytes for Redis
vectors = [
    np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes(),
    np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes(),
    np.array([0.9, 0.9, 0.1], dtype=np.float32).tobytes(),
]

# attach each vector to its matching record
for record, vector in zip(data, vectors):
    record["user_embedding"] = vector

table_print(data)
```

| user | age | job | credit_score | user_embedding |
|------|-----|-----|--------------|----------------|
| john | 1 | engineer | high | `b'\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?'` |
| mary | 2 | doctor | low | `b'\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?'` |
| joe | 3 | dentist | medium | `b'fff?fff?\xcd\xcc\xcc='` |



Our dataset is now ready to be used with ``redisvl``. The next step is to define the schema for the data.

## Define Index Schema

In order for ``redisvl`` to be flexible for many types of data, it uses a schema specified in either a Python dictionary or a YAML file. The schema has two main components:

1. `index` specification
2. `fields` specification

The index specification determines how data will be stored in Redis. This includes:
- `name`: the name of the index
- `prefix` (*optional*): Redis key prefix for each loaded record

The fields specification determines which fields within the dataset will be indexed and available for queries. Each field entry's `name` corresponds to a **column** within the dataset; the remaining attributes of the entry are arguments for creating that index field and correspond directly to ``redis-py`` arguments.
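
Because each field name must match a column in the data, a misspelled field name in the schema will not line up with any column. A small hypothetical helper, not part of RedisVL, can flag such mismatches before loading. It assumes the dictionary layout used in this section: `fields` maps each field type to a list of `{"name": ...}` entries.

```python
from __future__ import annotations


def declared_fields(schema: dict) -> list[str]:
    """Collect every field name declared under schema["fields"]."""
    return [
        field["name"]
        for fields_of_type in schema.get("fields", {}).values()
        for field in fields_of_type
    ]


def missing_columns(schema: dict, records: list[dict]) -> dict[str, list[int]]:
    """Map each declared field to the indices of records that lack that column."""
    missing: dict[str, list[int]] = {}
    for name in declared_fields(schema):
        absent = [i for i, record in enumerate(records) if name not in record]
        if absent:
            missing[name] = absent
    return missing


example_schema = {
    "index": {"name": "user_index", "prefix": "user"},
    "fields": {
        "tag": [{"name": "credit_store"}],  # deliberate typo for illustration
        "numeric": [{"name": "age"}],
    },
}
records = [
    {"user": "john", "age": 1, "credit_score": "high"},
    {"user": "mary", "age": 2, "credit_score": "low"},
]
print(missing_columns(example_schema, records))  # {'credit_store': [0, 1]}
```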

### Example

For example, given the dataset above, the following schema can be used in YAML format:


```yaml

index:
  name: user_index
  prefix: user

fields:
    # define tag fields
    tag:
        - name: user
        - name: credit_score
    # define text fields
    text:
        - name: job
    # define numeric fields
    numeric:
        - name: age
    # define vector fields
    vector:
        - name: user_embedding
          algorithm: flat
          dims: 3
          distance_metric: cosine
          datatype: float32
```
> This schema would need to be saved to a local file, such as `schema.yaml`, for RedisVL to consume it.

In Python, this can also be represented as a simple dictionary:

```python
schema = {
    "index": {
        "name": "user_index",
        "prefix": "user",
    },
    "fields": {
        "tag": [{"name": "credit_score"}],
        "text": [{"name": "job"}],
        "numeric": [{"name": "age"}],
        "vector": [{
            "name": "user_embedding",
            "dims": 3,
            "distance_metric": "cosine",
            "algorithm": "flat",
            "datatype": "float32"
        }]
    },
}
```

## Create a ``SearchIndex``

With the data and the index schema defined, we can now use ``redisvl`` as a library to create a search index as follows.

Note that at this point, the index has no entries. With Redis, this is fine: once the index exists, any hash whose key matches the index prefix, whether it already exists or is written later, is automatically indexed in the background.
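
Which keys get indexed is decided by the index's key prefix: for `HASH` storage, RediSearch indexes any hash whose key starts with one of the index's prefixes, and this is a plain string-prefix test. The sketch below models only that rule in pure Python; it is an illustration, not RedisVL or RediSearch code. One consequence is that with a prefix of `user`, a key such as `username:1` matches too, so a prefix ending in the separator (e.g. `user:`) may be the safer choice.

```python
def would_be_indexed(key: str, prefixes: list) -> bool:
    """Model of the PREFIX rule: a key qualifies if it starts with any prefix.

    (The real index additionally requires the key to hold a hash for HASH storage.)
    """
    return any(key.startswith(prefix) for prefix in prefixes)


prefixes = ["user"]  # the prefix declared in the schema above
for key in ["user:john", "user:tyler", "username:1", "account:7"]:
    print(key, would_be_indexed(key, prefixes))
# user:john True
# user:tyler True
# username:1 True
# account:7 False
```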

```python
from redisvl.index import SearchIndex

# construct a search index from the schema
index = SearchIndex.from_dict(schema)  # or SearchIndex.from_yaml("schema.yaml") for YAML files

# connect to local redis instance
index.connect("redis://localhost:6379")

# create the index (no data yet, overwrite any other index that might exist)
index.create(overwrite=True)
```

```python
# use the CLI to see the created index
!rvl index listall
```

```
16:03:01 [RedisVL] INFO   Indices:
16:03:01 [RedisVL] INFO   1. user_index
```


```python
# use the CLI to print fields in the index
!rvl index info -i user_index
```

```
Index Information:
╭──────────────┬────────────────┬────────────┬─────────────────┬────────────╮
│ Index Name   │ Storage Type   │ Prefixes   │ Index Options   │   Indexing │
├──────────────┼────────────────┼────────────┼─────────────────┼────────────┤
│ user_index   │ HASH           │ ['user']   │ []              │          0 │
╰──────────────┴────────────────┴────────────┴─────────────────┴────────────╯
Index Fields:
╭────────────────┬────────────────┬─────────┬────────────────┬────────────────╮
│ Name           │ Attribute      │ Type    │ Field Option   │ Option Value   │
├────────────────┼────────────────┼─────────┼────────────────┼────────────────┤
│ credit_score   │ credit_score   │ TAG     │ SEPARATOR      │ ,              │
│ job            │ job            │ TEXT    │ WEIGHT         │ 1              │
│ age            │ age            │ NUMERIC │                │                │
│ user_embedding │ user_embedding │ VECTOR  │                │                │
╰────────────────┴────────────────┴─────────┴────────────────┴────────────────╯
```

## Load Data

Now that an index exists, data can be loaded into Redis with the ``SearchIndex.load()`` method. By default, ``load()`` generates a random identifier for each record and uses it, prefixed by the index's key prefix, as that record's Redis key.
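
The two key-naming behaviors can be sketched as follows. This is a hypothetical helper, not RedisVL's implementation; in particular, the random part is generated here with `uuid4().hex` purely as a stand-in, and RedisVL may use a different identifier format. The `{prefix}:{value}` layout matches the keys that `index.client.keys()` lists in this notebook.

```python
import uuid
from typing import Optional


def make_key(record: dict, prefix: str, key_field: Optional[str] = None) -> str:
    """Build a Redis key of the form '{prefix}:{id}'.

    With key_field, the id is that record's value (e.g. 'user:john');
    without it, a random id is used (uuid4 hex is an assumed stand-in).
    """
    record_id = record[key_field] if key_field else uuid.uuid4().hex
    return f"{prefix}:{record_id}"


print(make_key({"user": "john", "age": 1}, "user", key_field="user"))  # user:john
print(make_key({"user": "john", "age": 1}, "user"))  # e.g. user:<32 random hex chars>
```

Using `key_field` makes keys deterministic, which is what lets a later `load()` of the same user overwrite that record instead of creating a second one.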

```python
# load expects an iterable of dictionaries, and an optional key_field
index.load(data, key_field="user")

# key_field will use the "user" field in the data to construct a Redis key
# that consists of `{key_prefix}:{specified_key_field_value}`
```

```python
# KEYS scans the entire keyspace, so this command is not recommended for production databases
index.client.keys()
```

```
[b'user:joe', b'user:mary', b'user:john']
```

### Upsert data to the index
With Redis and RedisVL, it's simple to upsert data to the index created above. Just call the `.load()` method again on data you wish to add or update.
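
The partial update in this section works because writing to an existing hash behaves like Redis `HSET`: the supplied fields are overwritten and any fields not supplied are left untouched. A plain-dict model of that merge (an illustration, not RedisVL code) shows why `job`, `credit_score`, and `user_embedding` survive when only `age` is updated.

```python
def hset_merge(existing: dict, update: dict) -> dict:
    """Model HSET on a single hash: write supplied fields, keep all others."""
    merged = dict(existing)  # copy so the stored mapping is not mutated
    merged.update(update)
    return merged


stored_hash = {"user": "tyler", "age": 9, "job": "engineer", "credit_score": "high"}
print(hset_merge(stored_hash, {"user": "tyler", "age": 29}))
# {'user': 'tyler', 'age': 29, 'job': 'engineer', 'credit_score': 'high'}
```

A consequence is that a field cannot be removed by omitting it from a later load; that takes an explicit `HDEL` or deleting the key.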

```python
# Add more data
new_data = {
    'user': 'tyler',
    'age': 9,
    'job': 'engineer',
    'credit_score': 'high',
    'user_embedding': np.array([0.1, 0.3, 0.5], dtype=np.float32).tobytes()
}
index.load([new_data], key_field="user")
```

```python
index.client.hgetall("user:tyler")
```

```
{b'user': b'tyler',
 b'age': b'9',
 b'job': b'engineer',
 b'credit_score': b'high',
 b'user_embedding': b'\xcd\xcc\xcc=\x9a\x99\x99>\x00\x00\x00?'}
```

```python
# Update existing records
updated_data = {
    'user': 'tyler',
    'age': 29,
}
index.load([updated_data], key_field="user")
```

```python
index.client.hgetall("user:tyler")
```

```
{b'user': b'tyler',
 b'age': b'29',
 b'job': b'engineer',
 b'credit_score': b'high',
 b'user_embedding': b'\xcd\xcc\xcc=\x9a\x99\x99>\x00\x00\x00?'}
```

## Executing Queries

Next we will run a vector query on our newly populated index. This example uses a simple vector to demonstrate how vector similarity works. Vectors in production will be much larger than 3 floats and are usually created with machine learning models (e.g. Hugging Face sentence transformers) or an embeddings API (e.g. Cohere, OpenAI).

```python
from redisvl.query import VectorQuery
from jupyterutils import result_print

# create a vector query returning a number of results
# with specific fields to return.
query = VectorQuery(
    vector=[0.1, 0.1, 0.5],
    vector_field_name="user_embedding",
    return_fields=["user", "age", "job", "credit_score", "vector_distance"],
    num_results=3
)

# use the SearchIndex instance (or Redis client) to execute the query
results = index.query(query)
result_print(results)
```

| vector_distance | user | age | job | credit_score |
|-----------------|------|-----|-----|--------------|
| 0.0 | john | 1 | engineer | high |
| 0.0 | mary | 2 | doctor | low |
| 0.0566299557686 | tyler | 29 | engineer | high |
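
The `vector_distance` values can be reproduced by hand. With `distance_metric: cosine`, the distance is 1 minus the cosine similarity, so vectors pointing in the same direction score 0. The NumPy sketch below recomputes it for all four stored users and keeps the 3 nearest, mirroring `num_results=3`. It is a brute-force illustration of what a `flat` index does, not RedisVL code.

```python
import numpy as np

query_vec = np.array([0.1, 0.1, 0.5], dtype=np.float32)
stored_vectors = {
    "john": [0.1, 0.1, 0.5],
    "mary": [0.1, 0.1, 0.5],
    "joe": [0.9, 0.9, 0.1],
    "tyler": [0.1, 0.3, 0.5],
}


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cosine similarity, clipped at 0 to absorb float rounding."""
    similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(0.0, float(1.0 - similarity))


# Brute-force KNN: score every stored vector, sort ascending, keep the top 3.
distances = {
    user: cosine_distance(query_vec, np.array(vec, dtype=np.float32))
    for user, vec in stored_vectors.items()
}
top3 = sorted(distances.items(), key=lambda item: item[1])[:3]
for user, dist in top3:
    print(f"{user}: {dist:.4f}")
# john and mary are ~0, tyler is ~0.0566; joe (~0.65) falls outside the top 3
```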


## Connecting to an Existing Index

If you have an existing index, you can connect to it using the ``SearchIndex.from_existing()`` function. This will return a ``SearchIndex`` object that can be used to execute queries.


```python
# create a new SearchIndex instance from an existing index
existing_index = SearchIndex.from_existing("user_index", "redis://localhost:6379")

# run the same query
results = existing_index.query(query)
result_print(results)
```

| vector_distance | user | age | job | credit_score |
|-----------------|------|-----|-----|--------------|
| 0.0 | john | 1 | engineer | high |
| 0.0 | mary | 2 | doctor | low |
| 0.0566299557686 | tyler | 29 | engineer | high |


## Asynchronous Search

The ``AsyncSearchIndex`` class allows queries, index creation, and data loading to be done asynchronously. This is useful for large datasets that may take a long time to load into Redis, for queries that may take a long time to execute, or for asynchronous applications, such as a FastAPI application, that need to execute queries in the background.
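
The main benefit for async applications is that several queries can be awaited concurrently instead of one after another. The sketch below uses only `asyncio`; `fake_query` is a hypothetical stand-in that sleeps to simulate network latency, and in a real application it would be replaced by awaiting `AsyncSearchIndex.query(...)`.

```python
from __future__ import annotations

import asyncio
import time


async def fake_query(name: str, latency: float = 0.1) -> str:
    """Hypothetical stand-in for an awaited index query."""
    await asyncio.sleep(latency)  # simulate a round trip to Redis
    return f"results for {name}"


async def main() -> list[str]:
    # gather() runs the three coroutines concurrently, so the total wall time
    # is roughly one latency rather than three.
    return await asyncio.gather(*(fake_query(n) for n in ["q1", "q2", "q3"]))


start = time.perf_counter()
demo_results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(demo_results)  # ['results for q1', 'results for q2', 'results for q3']
print(f"elapsed: {elapsed:.2f}s")  # roughly 0.1s rather than 0.3s
```

Inside Jupyter, where an event loop is already running, `asyncio.run()` raises an error; use `await main()` directly instead, just as the notebook awaits index calls at the top level.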

```python
from redisvl.index import AsyncSearchIndex

# construct an async search index from the schema
index = AsyncSearchIndex.from_dict(schema)

# connect to local redis instance
index.connect("redis://localhost:6379")

# create the index, but don't overwrite it if it already exists
await index.create(overwrite=False)

# run the same vector query, but asynchronously
# (top-level await works here because Jupyter runs an event loop)
results = await index.query(query)
result_print(results)
```

```
Index already exists, not overwriting.
```

| vector_distance | user | age | job | credit_score |
|-----------------|------|-----|-----|--------------|
| 0.0 | john | 1 | engineer | high |
| 0.0 | mary | 2 | doctor | low |
| 0.0566299557686 | tyler | 29 | engineer | high |


## Update Index
In some scenarios, it makes sense to update the index schema. With Redis and RedisVL, this is easy because Redis can keep the underlying data in place while you change or make updates to the index configuration.

```python
# First we will inspect the index we already have...
!rvl index info -i user_index
```

```
Index Information:
╭──────────────┬────────────────┬────────────┬─────────────────┬────────────╮
│ Index Name   │ Storage Type   │ Prefixes   │ Index Options   │   Indexing │
├──────────────┼────────────────┼────────────┼─────────────────┼────────────┤
│ user_index   │ HASH           │ ['user']   │ []              │          0 │
╰──────────────┴────────────────┴────────────┴─────────────────┴────────────╯
Index Fields:
╭────────────────┬────────────────┬─────────┬────────────────┬────────────────╮
│ Name           │ Attribute      │ Type    │ Field Option   │ Option Value   │
├────────────────┼────────────────┼─────────┼────────────────┼────────────────┤
│ credit_score   │ credit_score   │ TAG     │ SEPARATOR      │ ,              │
│ job            │ job            │ TEXT    │ WEIGHT         │ 1              │
│ age            │ age            │ NUMERIC │                │                │
│ user_embedding │ user_embedding │ VECTOR  │                │                │
╰────────────────┴────────────────┴─────────┴────────────────┴────────────────╯
```

So for our scenario, let's imagine we want to reindex this data in two ways:
- by using a `Tag` type for the `job` field instead of `Text`
- by using an `hnsw` index for the `Vector` field instead of `flat`
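
The cells that follow edit `schema` in place, so the original definition is lost. A hypothetical alternative, sketched below with `copy.deepcopy`, derives the new schema from the old one and leaves the original intact, which makes it easy to recreate the previous index if the change has to be rolled back. The helper name and structure are illustrative, not part of RedisVL.

```python
import copy


def migrate_schema(old: dict) -> dict:
    """Return a copy of `old` with `job` moved from text to tag and every
    vector field switched from flat to hnsw; `old` itself is not modified."""
    new = copy.deepcopy(old)
    fields = new["fields"]
    # Move the job field from the text list to the tag list.
    fields["text"] = [f for f in fields.get("text", []) if f["name"] != "job"]
    fields.setdefault("tag", []).append({"name": "job"})
    # Switch every vector field to the HNSW algorithm.
    for vector_field in fields.get("vector", []):
        vector_field["algorithm"] = "hnsw"
    return new


old_schema = {
    "index": {"name": "user_index", "prefix": "user"},
    "fields": {
        "tag": [{"name": "credit_score"}],
        "text": [{"name": "job"}],
        "numeric": [{"name": "age"}],
        "vector": [{"name": "user_embedding", "dims": 3,
                    "distance_metric": "cosine", "algorithm": "flat",
                    "datatype": "float32"}],
    },
}
new_schema = migrate_schema(old_schema)
print(new_schema["fields"]["tag"])  # [{'name': 'credit_score'}, {'name': 'job'}]
print(old_schema["fields"]["vector"][0]["algorithm"])  # flat
```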

```python
# Inspect the previous schema
schema
```

```
{'index': {'name': 'user_index', 'prefix': 'user'},
 'fields': {'tag': [{'name': 'credit_score'}],
  'text': [{'name': 'job'}],
  'numeric': [{'name': 'age'}],
  'vector': [{'name': 'user_embedding',
    'dims': 3,
    'distance_metric': 'cosine',
    'algorithm': 'flat',
    'datatype': 'float32'}]}}
```

```python
# We need to modify this schema dict to have what we want
schema['fields'].update({
    'text': [],
    'tag': [{'name': 'credit_score'}, {'name': 'job'}],
    'vector': [{
        'name': 'user_embedding',
        'dims': 3,
        'distance_metric': 'cosine',
        'algorithm': 'hnsw',
        'datatype': 'float32'
    }]
})

schema
```

```
{'index': {'name': 'user_index', 'prefix': 'user'},
 'fields': {'tag': [{'name': 'credit_score'}, {'name': 'job'}],
  'text': [],
  'numeric': [{'name': 'age'}],
  'vector': [{'name': 'user_embedding',
    'dims': 3,
    'distance_metric': 'cosine',
    'algorithm': 'hnsw',
    'datatype': 'float32'}]}}
```

```python
# Delete existing index without clearing out the underlying data
await index.delete(drop=False)

# Build the new index interface
index = (
    AsyncSearchIndex
    .from_dict(schema)
    .connect("redis://localhost:6379")
)

# Run the index update; existing hashes under the prefix are indexed under the new schema
await index.create()
```

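```python
# Test query again
result_print(await index.query(query))
```

| vector_distance | user | age | job | credit_score |
|-----------------|------|-----|-----|--------------|
| 0.0 | mary | 2 | doctor | low |
| 0.0 | john | 1 | engineer | high |
| 0.0566299557686 | tyler | 29 | engineer | high |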


## Check Index Stats

```python
# We can also use the CLI to check the stats for the index we just used
!rvl stats -i user_index
```

```
Statistics:
╭─────────────────────────────┬─────────────╮
│ Stat Key                    │ Value       │
├─────────────────────────────┼─────────────┤
│ num_docs                    │ 4           │
│ num_terms                   │ 0           │
│ max_doc_id                  │ 4           │
│ num_records                 │ 16          │
│ percent_indexed             │ 1           │
│ hash_indexing_failures      │ 0           │
│ number_of_uses              │ 3           │
│ bytes_per_record_avg        │ 1           │
│ doc_table_size_mb           │ 0.000286102 │
│ inverted_sz_mb              │ 1.52588e-05 │
│ key_table_size_mb           │ 0.000165939 │
│ offset_bits_per_record_avg  │ nan         │
│ offset_vectors_sz_mb        │ 0           │
│ offsets_per_term_avg        │ 0           │
│ records_per_doc_avg         │ 4           │
│ sortable_values_size_mb     │ 0           │
│ total_indexing_time         │ 0.59        │
│ total_inverted_index_blocks │ 7           │
│ vector_index_sz_mb 
```

## Cleanup

```python
# clean up the index
await index.delete()
```