# Getting Started with RedisVL

RedisVL is a Python library with a dedicated CLI to help load and create vector search indices within Redis.

This notebook will walk through:
1. Preparing a dataset with vectors
2. Writing a data schema for ``redisvl``
3. Creating a vector search index and loading the data
4. Performing queries

Before running this notebook, be sure to:
1. Have installed ``redisvl`` (which provides the ``rvl`` CLI) and have that environment active for this notebook.
2. Have a running Redis instance with RediSearch > 2.4.

## Data Preparation

For this example, we will use the following deliberately simplified dataset:


In [9]:
import numpy as np
from pprint import pprint

data = [
    {'user': 'john', 'age': 1, 'job': 'engineer', 'credit_score': 'high'},
    {'user': 'mary', 'age': 2, 'job': 'doctor', 'credit_score': 'low'},
    {'user': 'joe', 'age': 3, 'job': 'dentist', 'credit_score': 'medium'}
]

This will make up 3 entries in Redis (hashes), each with 4 fields (user, age, job, credit_score).

Now, we want to add vectors to represent each user. These are just dummy vectors to illustrate the point, but more complex vectors can be created and used as well. For more information on creating embeddings, see this [article](https://mlops.community/vector-similarity-search-from-basics-to-production/).


In [10]:
# converted to bytes for redis
vectors = [
    np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes(),
    np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes(),
    np.array([0.9, 0.9, 0.1], dtype=np.float32).tobytes(),
]

for record, vector in zip(data, vectors):
    record["user_embedding"] = vector

pprint(data)

[{'age': 1,
  'credit_score': 'high',
  'job': 'engineer',
  'user': 'john',
  'user_embedding': b'\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?'},
 {'age': 2,
  'credit_score': 'low',
  'job': 'doctor',
  'user': 'mary',
  'user_embedding': b'\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?'},
 {'age': 3,
  'credit_score': 'medium',
  'job': 'dentist',
  'user': 'joe',
  'user_embedding': b'fff?fff?\xcd\xcc\xcc='}]


As seen above, the vectors themselves need to be turned into bytes before they can be loaded into Redis. Using ``NumPy``, this is fairly trivial. 
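
As a quick sanity check on the byte conversion, the following minimal sketch encodes a vector with ``tobytes()`` and decodes it again with ``np.frombuffer``. The variable names are illustrative only. The key assumption is that the same ``dtype`` is used on both sides, since the raw bytes carry no type information.

```python
import numpy as np

# Serialize a small float32 vector to raw bytes, as done for the dataset above.
original = np.array([0.1, 0.1, 0.5], dtype=np.float32)
as_bytes = original.tobytes()

# Each float32 occupies 4 bytes, so a 3-dimensional vector is 12 bytes long.
print(len(as_bytes))

# Decode the bytes back into an array; the dtype must match the one used to encode.
decoded = np.frombuffer(as_bytes, dtype=np.float32)
print(np.array_equal(original, decoded))
```

Decoding with a different ``dtype`` (for example ``np.float64``) would either fail or silently produce different numbers, which is why the schema later records the vector ``datatype`` explicitly.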

Our dataset is now ready to be used with ``redisvl``.

## Define Index Schema

In order for ``redisvl`` to be flexible for many types of data, it uses a schema specified in either a Python dictionary or a YAML file. The schema has two main components:

1. index specification
2. field specification

The index specification determines how data will be stored in Redis. This includes:
- ``name``: the name of the index
- ``prefix``: key prefix for each loaded entry
- ``key_field``: field within the dataset to use as unique keys
- ``storage_type``: the Redis data structure used for each entry (``hash`` in this example)

The field specification determines what fields within the dataset will be available for queries. Fields are grouped by type (``tag``, ``text``, ``numeric``, ``vector``), and the ``name`` of each entry corresponds to a **column** within the dataset. Any other values given for an entry are used as arguments when that field is created in the index, and they correspond directly to ``redis-py`` arguments.

For example, given the above dataset, the following schema can be used:

In [11]:
schema = {
    "index": {
        "name": "user_index",
        "prefix": "v1",
        "key_field": "user",
        "storage_type": "hash",
    },
    "fields": {
        "tag": [{"name": "credit_score"}],
        "text": [{"name": "job"}],
        "numeric": [{"name": "age"}],
        "vector": [{
                "name": "user_embedding",
                "dims": 3,
                "distance_metric": "cosine",
                "algorithm": "flat",
                "datatype": "float32"}
        ]
    },
}
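
Because every field ``name`` in the schema must match a column in the dataset, it can help to check the two against each other before creating an index. The sketch below is a hypothetical helper written for this walkthrough (``missing_schema_fields`` is not part of ``redisvl``); it assumes the schema layout shown above and uses a deliberately incomplete record to show what a mismatch looks like.

```python
def missing_schema_fields(schema, records):
    """Map each schema field name to the indices of records that lack it.

    Hypothetical helper for illustration; not part of redisvl.
    """
    # Collect every field name declared under "fields", plus the key field.
    names = [spec["name"] for specs in schema["fields"].values() for spec in specs]
    names.append(schema["index"]["key_field"])

    missing = {}
    for name in names:
        absent = [i for i, record in enumerate(records) if name not in record]
        if absent:
            missing[name] = absent
    return missing


example_schema = {
    "index": {"name": "user_index", "prefix": "v1", "key_field": "user"},
    "fields": {
        "tag": [{"name": "credit_score"}],
        "numeric": [{"name": "age"}],
        "vector": [{"name": "user_embedding", "dims": 3}],
    },
}

example_records = [
    {"user": "john", "age": 1, "credit_score": "high", "user_embedding": b"..."},
    {"user": "mary", "age": 2, "credit_score": "low"},  # no embedding on purpose
]

print(missing_schema_fields(example_schema, example_records))
```

An empty dictionary means every record carries every field the schema declares.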


## Create a ``SearchIndex``

With the data and the index schema defined, we can now use ``redisvl`` as a library to create a search index as follows.

Note that at this point, the index will have no entries. With Redis, this is fine: new entries that belong to this index (that is, entries that follow the schema) will automatically be indexed in the background.

In [12]:
from redisvl.index import SearchIndex

# construct a search index from the schema
index = SearchIndex.from_dict(schema)

# connect to local redis instance
index.connect("redis://localhost:6379")

# create the index (no data yet)
index.create(overwrite=True)

In [13]:
# use the CLI to see the created index
!rvl index listall

14:26:12 sam.partee-NW9MQX5Y74 redisvl.cli.index[17001] INFO Indices:
14:26:12 sam.partee-NW9MQX5Y74 redisvl.cli.index[17001] INFO 1. user_index


## Load Data into the Index

Now that an index exists, data can be loaded into Redis through the ``SearchIndex.load()`` method.

In [14]:
# load expects an iterable of dictionaries
index.load(data)
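
To get a feel for where the loaded entries end up, the sketch below previews the Redis keys for a set of records. It is an assumption-laden illustration, not the library's implementation: ``preview_keys`` is a hypothetical helper, and the ``prefix:key_field_value`` key format is inferred from the document ids (such as ``v1:john``) that appear in the query results of this notebook.

```python
def preview_keys(records, prefix, key_field):
    """Build the keys the records are assumed to be stored under.

    Hypothetical helper; the "prefix:value" format is an assumption based on
    the document ids shown in this notebook's query output.
    """
    return [f"{prefix}:{record[key_field]}" for record in records]


sample_records = [
    {"user": "john", "age": 1},
    {"user": "mary", "age": 2},
    {"user": "joe", "age": 3},
]

print(preview_keys(sample_records, prefix="v1", key_field="user"))
```

Since the key is derived from ``key_field``, two records with the same ``user`` value would map to the same key under this assumption, so that column should hold unique values.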

## Executing Queries

Next we will run a vector query on our newly populated index. This example will use a simple vector to demonstrate how vector similarity works. Vectors in production will be much larger than 3 floats and often require machine learning models (e.g. Hugging Face sentence transformers) or an embeddings API (Cohere, OpenAI) to create.

In [15]:
from redisvl.query import create_vector_query

# create a vector query returning a number of results
# with specific fields to return.
query = create_vector_query(
    return_fields=["users", "age", "job", "credit_score", "vector_score"],
    number_of_results=3,
    vector_field_name="user_embedding"
)

# establish a query vector to search against the data in Redis
query_vector = np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes()

# use the SearchIndex instance (or Redis client) to execute the query
results = index.search(query, query_params={"vector": query_vector})

In [16]:
for doc in results.docs:
    print("Score:", doc.vector_score)
    print(doc)


Score: 0
Document {'id': 'v1:john', 'payload': None, 'vector_score': '0', 'age': '1', 'job': 'engineer', 'credit_score': 'high'}
Score: 0
Document {'id': 'v1:mary', 'payload': None, 'vector_score': '0', 'age': '2', 'job': 'doctor', 'credit_score': 'low'}
Score: 0.653301358223
Document {'id': 'v1:joe', 'payload': None, 'vector_score': '0.653301358223', 'age': '3', 'job': 'dentist', 'credit_score': 'medium'}
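
The scores above can be reproduced by hand. With the ``cosine`` distance metric, the score is the cosine distance, that is, 1 minus the cosine similarity, so identical vectors score 0 and less similar vectors score higher. The following minimal sketch recomputes the distances with NumPy; ``cosine_distance`` is a helper defined here for illustration, and small differences in the last digits are expected because Redis stores the vectors as ``float32``.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance: 1 - (a . b) / (|a| * |b|)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return 1.0 - float(a @ b) / float(np.linalg.norm(a) * np.linalg.norm(b))


query = [0.1, 0.1, 0.5]
stored = {
    "john": [0.1, 0.1, 0.5],
    "mary": [0.1, 0.1, 0.5],
    "joe": [0.9, 0.9, 0.1],
}

# john and mary match the query exactly; joe points in a different direction.
for user, vector in stored.items():
    print(user, cosine_distance(query, vector))
```

The distances for ``john`` and ``mary`` come out at approximately 0, and the distance for ``joe`` at approximately 0.6533, in line with the ``vector_score`` values returned by the query.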
