## Redis Vector Loader (RedisVL)

RedisVL is a command-line interface (CLI) and Python library that helps create and load vector search indices in Redis. While ``redis-cli`` can perform similar actions, ``redisvl`` is aimed specifically at setting up vector similarity search (VSS) use cases.

This notebook will walk through:
1. Preparing a dataset with vectors.
2. Writing a data schema for ``redisvl``
3. Loading the data and creating a vector search index
4. Combining vector search with tag, text, and numeric search
5. Performing queries

Before running this notebook, be sure to:
1. Have ``redisvl`` installed, with that environment active for this notebook.
2. Have a running Redis instance with RediSearch > 2.4.

### 1.1 Creating Vector Embeddings

For this example, we will use the following deliberately simplified dataset:


In [2]:
import pandas as pd
import numpy as np

data = pd.DataFrame(
    {
        "users": ["john", "mary", "joe"],
        "age": [1, 2, 3],
        "job": ["engineer", "doctor", "dentist"],
        "credit_score": ["high", "low", "medium"]
    }
)

This will produce 3 entries in Redis (hashes), each with 4 fields (users, age, job, credit_score).

Now, we want to add vectors to represent each user. These are just dummy vectors to illustrate the point, but more complex vectors can be created and used as well. For more information on creating embeddings, see this [article](https://mlops.community/vector-similarity-search-from-basics-to-production/).

Let's add a vector column named ``user_embedding`` to the above dataframe:

In [3]:
data["user_embedding"] = [
    np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes(),
    np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes(),
    np.array([0.9, 0.9, 0.1], dtype=np.float32).tobytes(),
]
data

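|   | users | age | job | credit_score | user_embedding |
|---|-------|-----|-----|--------------|----------------|
| 0 | john | 1 | engineer | high | ``b'\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?'`` |
| 1 | mary | 2 | doctor | low | ``b'\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?'`` |
| 2 | joe | 3 | dentist | medium | ``b'fff?fff?\xcd\xcc\xcc='`` |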


As seen above, the vectors themselves need to be converted to bytes before they can be loaded into Redis. With ``NumPy``, this takes a single ``tobytes()`` call per vector.

Our dataset is now ready to be used with ``redisvl``.
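
As a quick sanity check on the byte encoding, the sketch below reproduces the same byte strings with Python's standard ``struct`` module and decodes them back into floats. It assumes little-endian 32-bit floats, which is what ``np.float32`` data yields from ``tobytes()`` on common little-endian hardware. The helper names ``floats_to_bytes`` and ``bytes_to_floats`` are illustrative and not part of ``redisvl``.

```python
import struct

def floats_to_bytes(values):
    """Pack a list of floats as little-endian float32, 4 bytes per value."""
    return struct.pack(f"<{len(values)}f", *values)

def bytes_to_floats(blob):
    """Inverse of floats_to_bytes: unpack float32 values from raw bytes."""
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}f", blob))

john_bytes = floats_to_bytes([0.1, 0.1, 0.5])
joe_bytes = floats_to_bytes([0.9, 0.9, 0.1])

print(john_bytes)                  # same bytes as the "john" row above
print(len(john_bytes))             # 3 dimensions * 4 bytes each
print(bytes_to_floats(joe_bytes))  # values come back with float32 rounding
```

Decoding stored bytes this way can be handy when inspecting a hash directly and checking that the vector that was written is the one that was intended.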

### 1.2 Writing data schema for ``redisvl``

To keep ``redisvl`` flexible across many types of data, it uses a schema specified in either a Python dictionary or a YAML file. The schema has two main components:

1. index specification
2. field specification

The index specification determines how data will be stored in Redis. This includes:
- ``name``: the name of the index
- ``prefix``: key prefix for each loaded entry
- ``key_field``: field within the dataset to use as unique keys

The field specification determines which fields within the dataset will be available for queries. Fields are grouped by type (``tag``, ``text``, ``numeric``, ``vector``), and each field name corresponds to the name of a **column** within the dataset. The values given for each field are the arguments used to create that field in the index, and they correspond directly to ``redis-py`` arguments.

For example, given the above dataset, the following schema can be used:

In [4]:
schema = {
    "index": {
        "name": "user_index",
        "prefix": "user:",
        "key_field": "users",
        "storage_type": "hash",
    },
    "fields": {
        # outer key: the field type
        # inner key: the name of the column in the dataset (dataframe)
        # inner value: options for that field ({} means defaults)
        "tag": {"credit_score": {}},
        "text": {"job": {}},
        "numeric": {"age": {}},
        "vector": {
            "user_embedding": {
                "dims": 3,
                "distance_metric": "cosine",
                "algorithm": "flat",
                "datatype": "float32",
            }
        },
    },
}
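
Before creating an index, it can be useful to confirm that the data actually fits the schema. The following is a minimal sketch of such a check, assuming records are plain dictionaries and that vectors are stored as raw bytes. ``check_records`` and ``ELEMENT_SIZES`` are hypothetical names for illustration, not part of ``redisvl``, and the schema is a trimmed copy of the one above.

```python
# Bytes per element for the vector datatypes handled by this sketch.
ELEMENT_SIZES = {"float32": 4, "float64": 8}

def check_records(schema, records):
    """Return a list of problems; an empty list means the records fit the schema."""
    problems = []
    key_field = schema["index"]["key_field"]
    for i, record in enumerate(records):
        if key_field not in record:
            problems.append(f"record {i}: missing key field {key_field!r}")
        for field_type, columns in schema["fields"].items():
            for column, options in columns.items():
                if column not in record:
                    problems.append(f"record {i}: missing column {column!r}")
                elif field_type == "vector":
                    # A vector must occupy exactly dims * element-size bytes.
                    expected = options["dims"] * ELEMENT_SIZES[options["datatype"]]
                    actual = len(record[column])
                    if actual != expected:
                        problems.append(
                            f"record {i}: {column!r} has {actual} bytes, expected {expected}"
                        )
    return problems

schema = {
    "index": {"name": "user_index", "prefix": "user:", "key_field": "users"},
    "fields": {
        "tag": {"credit_score": {}},
        "text": {"job": {}},
        "numeric": {"age": {}},
        "vector": {"user_embedding": {"dims": 3, "datatype": "float32"}},
    },
}

good = {"users": "john", "age": 1, "job": "engineer",
        "credit_score": "high", "user_embedding": bytes(12)}
# Deliberately broken: no "job" column and a vector that is 4 bytes short.
bad = {"users": "mary", "age": 2, "credit_score": "low", "user_embedding": bytes(8)}

print(check_records(schema, [good]))
print(check_records(schema, [good, bad]))
```

A mismatch between ``dims`` and the byte length of a vector is an easy mistake to make, so checking it before loading is cheap insurance.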

### 1.3 Creating a search index

With the data and the index schema defined, we can now use ``redisvl`` as a library to create a search index as follows.

Note that at this point, the index has no entries. With Redis, this is fine: new entries that belong to this index (that is, hashes whose keys start with the index prefix and that follow the schema) will automatically be indexed in the background.

In [5]:
from redisvl.index import SearchIndex

# construct a search index from the schema
index = SearchIndex.from_dict(schema)

# connect to local redis instance
index.connect("localhost", 6379)

# create the index (no data yet)
index.create()
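
For orientation, creating an index like this corresponds roughly to a RediSearch ``FT.CREATE`` command. The sketch below derives such a command from the schema dictionary. ``ft_create_args`` is a hypothetical helper, and the assumption that the schema maps one-to-one onto these arguments is ours; the exact command ``redisvl`` issues may differ.

```python
def ft_create_args(schema):
    """Build FT.CREATE arguments from a redisvl-style schema dict (illustrative)."""
    index = schema["index"]
    args = [
        "FT.CREATE", index["name"],
        "ON", index["storage_type"].upper(),
        "PREFIX", "1", index["prefix"],
        "SCHEMA",
    ]
    for field_type, columns in schema["fields"].items():
        for column, options in columns.items():
            if field_type == "vector":
                # Vector fields carry a counted list of attribute name/value pairs.
                attrs = [
                    "TYPE", options["datatype"].upper(),
                    "DIM", str(options["dims"]),
                    "DISTANCE_METRIC", options["distance_metric"].upper(),
                ]
                args += [column, "VECTOR", options["algorithm"].upper(),
                         str(len(attrs)), *attrs]
            else:
                args += [column, field_type.upper()]
    return args

schema = {
    "index": {"name": "user_index", "prefix": "user:",
              "key_field": "users", "storage_type": "hash"},
    "fields": {
        "tag": {"credit_score": {}},
        "text": {"job": {}},
        "numeric": {"age": {}},
        "vector": {"user_embedding": {"dims": 3, "distance_metric": "cosine",
                                      "algorithm": "flat", "datatype": "float32"}},
    },
}

print(" ".join(ft_create_args(schema)))
```

Reading the printed command next to the schema makes it easier to see which schema keys end up as index options and which describe individual fields.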

### 1.4 Loading Data with PandasReader

In this section, we will take the dataframe defined above and load it into our search index so that we can query it.

In [6]:
from redisvl.readers import PandasReader

# Initialize a reader for a pandas dataframe.
reader = PandasReader(data)

# load the data into Redis
index.load(reader)
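
Conceptually, loading turns each row into one Redis hash whose key is the index prefix followed by the value of the ``key_field`` column. The sketch below shows that mapping on plain dictionaries. ``records_to_hashes`` is a hypothetical helper, and the key layout is inferred from the schema rather than taken from the ``redisvl`` implementation.

```python
def records_to_hashes(records, prefix, key_field):
    """Map each record to a (redis_key, field_mapping) pair."""
    return [(f"{prefix}{record[key_field]}", dict(record)) for record in records]

records = [
    {"users": "john", "age": 1, "job": "engineer", "credit_score": "high"},
    {"users": "mary", "age": 2, "job": "doctor", "credit_score": "low"},
    {"users": "joe", "age": 3, "job": "dentist", "credit_score": "medium"},
]

hashes = records_to_hashes(records, prefix="user:", key_field="users")
for key, mapping in hashes:
    # With redis-py, each pair could be written via client.hset(key, mapping=mapping).
    print(key, mapping)
```

Because the keys start with ``user:``, hashes written this way fall under the prefix declared in the index specification.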

### 1.5 Running a Vector Search

Next, we will run a vector query on our newly populated index. This example uses a simple vector to demonstrate how vector similarity works. Vectors in production will be much larger than 3 floats and usually require machine learning models (e.g. Hugging Face sentence transformers) or an embeddings API (Cohere, OpenAI) to create.

In [8]:
from redisvl.query import create_vector_query

# create a vector query returning a number of results
# with specific fields to return.
query = create_vector_query(
    return_fields=["users", "age", "job", "credit_score", "vector_score"],
    number_of_results=3,
    vector_param_name="vec_param",
    vector_field_name="user_embedding",
    sort=True
)

# establish a query vector to search against the data in Redis
query_vector = np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes()

# use the SearchIndex instance (or Redis client) to execute the query
results = index.search(query, query_params={"vec_param": query_vector})
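
A vector query of this kind corresponds to a RediSearch KNN query string, in which an optional filter expression precedes the ``=>[KNN ...]`` clause. The sketch below builds such strings by hand. ``knn_query_string`` is a hypothetical helper, and whether ``create_vector_query`` produces exactly this string is an assumption.

```python
def knn_query_string(k, vector_field, param_name,
                     score_alias="vector_score", filter_expr="*"):
    """Build a RediSearch KNN query string with an optional pre-filter."""
    return f"{filter_expr}=>[KNN {k} @{vector_field} ${param_name} AS {score_alias}]"

# Pure vector search over all documents.
plain = knn_query_string(3, "user_embedding", "vec_param")

# Vector search restricted to documents whose credit_score tag is "high".
hybrid = knn_query_string(3, "user_embedding", "vec_param",
                          filter_expr="(@credit_score:{high})")

print(plain)
print(hybrid)
```

The ``$vec_param`` placeholder is filled at query time with the vector bytes, which is the role ``query_params`` plays in the search call.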

In [9]:
for doc in results.docs:
    print("Score:", doc.vector_score)
    print(doc)


Score: 0
Document {'id': 'user:john', 'payload': None, 'vector_score': '0', 'users': 'john', 'age': '1', 'job': 'engineer', 'credit_score': 'high'}
Score: 0
Document {'id': 'user:mary', 'payload': None, 'vector_score': '0', 'users': 'mary', 'age': '2', 'job': 'doctor', 'credit_score': 'low'}
Score: 0.653301358223
Document {'id': 'user:joe', 'payload': None, 'vector_score': '0.653301358223', 'users': 'joe', 'age': '3', 'job': 'dentist', 'credit_score': 'medium'}
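
The scores above are cosine distances, that is, one minus the cosine similarity between the query vector and each stored vector. As a cross-check, the sketch below recomputes them in plain Python. It works in double precision rather than ``float32``, so the trailing digits can differ slightly from what Redis reports.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [0.1, 0.1, 0.5]
users = {
    "john": [0.1, 0.1, 0.5],
    "mary": [0.1, 0.1, 0.5],
    "joe": [0.9, 0.9, 0.1],
}

distances = {name: cosine_distance(query, vec) for name, vec in users.items()}
for name, distance in sorted(distances.items(), key=lambda item: item[1]):
    print(name, distance)
```

john and mary share the query vector, so their distance is numerically zero, while joe comes out at roughly 0.6533, in line with the search results.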
