# Getting Started with RedisVL

RedisVL is a Python library, with a dedicated CLI, for building AI applications on top of [Redis](https://redis.com).

**This notebook will demonstrate**:
1. Defining an `IndexSchema`
2. Preparing a sample dataset
3. Creating a `SearchIndex` and loading with sample data
4. Building `VectorQuery` objects and performing search
5. Updating a `SearchIndex`

**Before running this notebook**:
1. Have [installed](https://www.redisvl.com/overview/installation.html) `redisvl` in the Python environment active for this notebook.
2. Have a running [Redis Stack](https://redis.io/docs/install/install-stack/) or [Redis Cloud](https://redis.com/try-free) instance.

_____

## Define an `IndexSchema`

An `IndexSchema` holds the index specification and field definitions that Redis
needs to enable search. The schema can be defined as a Python dictionary, a YAML
file, or an `IndexSchema` object.

See the RedisVL documentation to learn more about working with `IndexSchema` in `redisvl`.

### IndexSchema example

Say we have a dataset of users with information about their job, age, credit score,
and potentially other relevant attributes. Assume that:
- we want to name the search index `user_index` and use `user:` as the key prefix in Redis;
- we want to index each field in the dataset;
- we also have a `user_embedding` field that contains a 3-dimensional float32 vector.

We can define an index schema via YAML as follows:

```yaml
index:
  name: user_index
  prefix: user

fields:
    # define tag fields
    tag:
        - name: user
        - name: credit_score
    # define text fields
    text:
        - name: job
    # define numeric fields
    numeric:
        - name: age
    # define vector fields
    vector:
        - name: user_embedding
          algorithm: flat
          dims: 3
          distance_metric: cosine
          datatype: float32
```
> This YAML must be saved to a local file (e.g. `schema.yaml`) before RedisVL can load it.

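**Alternatively**, in Python code, the schema can be represented as a simple
dictionary. This version sets `storage_type` to `hash` explicitly and, unlike the YAML above, does not index the `user` field: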

In [1]:
schema = {
    "index": {
        "name": "user_index",
        "prefix": "user",
        "storage_type": "hash",
    },
    "fields": {
        "tag": [{"name": "credit_score"}],
        "text": [{"name": "job"}],
        "numeric": [{"name": "age"}],
        "vector": [{
            "name": "user_embedding",
            "dims": 3,
            "distance_metric": "cosine",
            "algorithm": "flat",
            "datatype": "float32"
        }]
    },
}

## Data Preparation

Below we will build our "dummy" user dataset with fields for `user` (name),
`job`, `age`, `credit_score`, and `user_embedding`.

The `user_embedding` vectors here are purely synthetic in order to illustrate
the concepts.

For more information on creating real-world embeddings, check out this introductory
[article](https://mlops.community/vector-similarity-search-from-basics-to-production/).


In [2]:
import numpy as np

from redisvl.utils.utils import table_print


data = [
    {
        'user': 'john',
        'age': 1,
        'job': 'engineer',
        'credit_score': 'high',
        'user_embedding': np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes()
    },
    {
        'user': 'mary',
        'age': 2,
        'job': 'doctor',
        'credit_score': 'low',
        'user_embedding': np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes()
    },
    {
        'user': 'joe',
        'age': 3,
        'job': 'dentist',
        'credit_score': 'medium',
        'user_embedding': np.array([0.9, 0.9, 0.1], dtype=np.float32).tobytes()
    }
]

table_print(data)

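| user | age | job | credit_score | user_embedding |
|------|-----|-----|--------------|----------------|
| john | 1 | engineer | high | `b'\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?'` |
| mary | 2 | doctor | low | `b'\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?'` |
| joe | 3 | dentist | medium | `b'fff?fff?\xcd\xcc\xcc='` |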


> As seen above, the sample `user_embedding` vectors are converted into bytes before loading. NumPy makes this a one-liner with `.tobytes()`.
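To see what these byte strings contain, here is a minimal sketch that encodes a vector the same way as the cell above and decodes it back with `np.frombuffer`. The helper names `to_bytes` and `from_bytes` are hypothetical, and the explicit little-endian dtype (`"<f4"`) is an assumption that matches the float32 byte strings printed above on common x86 and ARM machines.

```python
import numpy as np


def to_bytes(vector):
    """Pack a sequence of floats into little-endian float32 bytes (hypothetical helper)."""
    return np.asarray(vector, dtype="<f4").tobytes()


def from_bytes(blob):
    """Unpack little-endian float32 bytes back into a NumPy array (hypothetical helper)."""
    return np.frombuffer(blob, dtype="<f4")


blob = to_bytes([0.1, 0.1, 0.5])
print(blob)              # the same bytes shown for john and mary above
print(len(blob))         # 3 dims * 4 bytes per float32 = 12
print(from_bytes(blob))  # decoded back into a float32 array
```

Decoding is handy when reading raw hash values back with a plain Redis client; the number of floats must match the `dims: 3` declared in the schema.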

## Create a `SearchIndex`

With the sample dataset and the `IndexSchema` defined, we can create a `SearchIndex` with a few lines of code:

In [3]:
from redisvl.index import SearchIndex

# construct a search index from the schema
index = SearchIndex.from_dict(schema) # or SearchIndex.from_yaml("schema.yaml") for yaml files

# connect to local redis instance
index.connect("redis://localhost:6379")

# create the index (no data yet, overwrite any other index that might exist)
index.create(overwrite=True)

Index already exists, overwriting.


> Note that at this point the index will likely have no entries, since the sample dataset has not been loaded yet. We load it below.
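Because Redis accepts a schema whose field names never appear in the data (such a field simply stays empty), it can be worth checking the names before loading. The sketch below assumes the dictionary layout used in this notebook (`fields` maps each type to a list of `{"name": ...}` entries); `unmatched_fields` is a hypothetical helper, not part of RedisVL.

```python
def unmatched_fields(schema, records):
    """Return schema field names that no record contains (hypothetical helper)."""
    declared = {
        field["name"]
        for fields in schema["fields"].values()
        for field in fields
    }
    present = {key for record in records for key in record}
    return sorted(declared - present)


# A trimmed-down schema and record in the same shape as this notebook's;
# "credit_store" is a deliberate misspelling to show what gets flagged.
example_schema = {
    "index": {"name": "user_index", "prefix": "user"},
    "fields": {
        "tag": [{"name": "credit_store"}],
        "numeric": [{"name": "age"}],
    },
}
example_records = [{"user": "john", "age": 1, "credit_score": "high"}]

print(unmatched_fields(example_schema, example_records))  # ['credit_store']
```

Run against this notebook's own `schema` and `data`, the helper should return an empty list, since every indexed field appears in every record.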

### Inspect `SearchIndex` properties with the `rvl` CLI

In [4]:
# use the CLI to see the created index
!rvl index listall

17:09:24 [RedisVL] INFO   Indices:
17:09:24 [RedisVL] INFO   1. user_index


In [5]:
# use the CLI to print fields in the index
!rvl index info -i user_index



Index Information:
╭──────────────┬────────────────┬────────────┬─────────────────┬────────────╮
│ Index Name   │ Storage Type   │ Prefixes   │ Index Options   │   Indexing │
├──────────────┼────────────────┼────────────┼─────────────────┼────────────┤
│ user_index   │ HASH           │ ['user']   │ []              │          0 │
╰──────────────┴────────────────┴────────────┴─────────────────┴────────────╯
Index Fields:
╭────────────────┬────────────────┬─────────┬────────────────┬────────────────╮
│ Name           │ Attribute      │ Type    │ Field Option   │ Option Value   │
├────────────────┼────────────────┼─────────┼────────────────┼────────────────┤
│ credit_score   │ credit_score   │ TAG     │ SEPARATOR      │ ,              │
│ job            │ job            │ TEXT    │ WEIGHT         │ 1              │
│ age            │ age            │ NUMERIC │                │                │
│ user_embedding │ user_embedding │ VECTOR  │                │                │
╰────────────────┴────────────────┴─────────┴────────────────┴────────────────╯

### Load Data to `SearchIndex`

Now that an index exists, data can be loaded into Redis through the `load` method. By default, `load` creates a unique key for each record by combining the index's key `prefix` with a UUID.
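The resulting key format can be illustrated with the standard library. This is a sketch of the `prefix:uuid` pattern visible in the keys printed by the scan below, assuming a 32-character hex UUID4; it is not RedisVL's internal code, and `make_key` is a hypothetical helper.

```python
import uuid


def make_key(prefix, separator=":"):
    """Build a Redis key as prefix + separator + random hex UUID (hypothetical helper)."""
    return f"{prefix}{separator}{uuid.uuid4().hex}"


key = make_key("user")
print(key)  # e.g. user:<32 hex characters>, different on every call
```

Because every call yields a fresh key, the same record loaded twice ends up stored under two different keys, which matters when reloading data.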

In [6]:
index.load(data)

In [7]:
# Scanning the whole keyspace is fine for this demo but not recommended on production databases
index_keys = [key for key in index.client.scan_iter("user:*")]
print(index_keys)

[b'user:58fd2181b6004a1a80c735feab8251e0', b'user:9fd209a2627c4597b979f49d18d8d655', b'user:5866b77ddb9e4433bc72dba025192207']


### Upsert data to the index
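With Redis and `redisvl`, it's simple to upsert data into the index created above: call the `load` method again on the data you wish to add or update. Keep in mind that, because `load` generates a new UUID-based key by default, reloading a record creates a new entry; an existing entry is only updated when the record is written to the same key.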
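Whether a write adds or updates depends on the key it targets. The sketch below models that with a plain Python dict standing in for the Redis keyspace, mimicking how `HSET` creates a hash or overwrites the given fields of an existing one. Deriving the key from the `user` field (`user:<name>`) is a hypothetical scheme chosen for illustration, not RedisVL's default.

```python
import uuid

store = {}  # stand-in for the Redis keyspace: key -> hash fields


def write(key, record):
    """Mimic HSET: create the hash if needed, then overwrite the given fields."""
    store.setdefault(key, {}).update(record)


tyler = {"user": "tyler", "age": 9, "job": "engineer"}

# Random keys (like the default UUID keys): writing twice adds a second entry.
write(f"user:{uuid.uuid4().hex}", tyler)
write(f"user:{uuid.uuid4().hex}", tyler)
print(len(store))  # 2

store.clear()

# Deterministic keys (hypothetical scheme): the second write updates in place.
write(f"user:{tyler['user']}", tyler)
write(f"user:{tyler['user']}", {"age": 10})
print(len(store), store["user:tyler"]["age"])  # 1 10
```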

In [8]:
# Add more data
new_data = [{
    'user': 'tyler',
    'age': 9,
    'job': 'engineer',
    'credit_score': 'high',
    'user_embedding': np.array([0.1, 0.3, 0.5], dtype=np.float32).tobytes()
}]
index.load(new_data)

## Creating `VectorQuery` objects

Next we will create a vector query object for our newly populated index. This example uses a simple vector to demonstrate how vector similarity works. Vectors in production will likely be much larger than 3 floats and are usually created with machine learning models (e.g. Hugging Face sentence transformers) or an embeddings API (e.g. Cohere, OpenAI).

In [9]:
from redisvl.query import VectorQuery
from redisvl.utils.utils import result_print

# create a vector query returning a number of results
# with specific fields to return.
query = VectorQuery(
    vector=[0.1, 0.1, 0.5],
    vector_field_name="user_embedding",
    return_fields=["user", "age", "job", "credit_score", "vector_distance"],
    num_results=3
)

### Executing queries
With our `VectorQuery` object defined above, we can execute the query over the `SearchIndex` using the `query` method.

In [10]:
# execute the query
results = index.query(query)

result_print(results)

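| vector_distance | user | age | job | credit_score |
|-----------------|------|-----|-----|--------------|
| 0.0 | john | 1 | engineer | high |
| 0.0 | mary | 2 | doctor | low |
| 0.0566299557686 | tyler | 9 | engineer | high |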
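The `vector_distance` values can be checked by hand. With `distance_metric: cosine`, the distance is one minus the cosine similarity, so vectors pointing in the same direction as the query (john and mary share the query vector) score 0.0. The NumPy sketch below recomputes the distances in float64, so the last digits may differ slightly from the float32 values Redis reports; `cosine_distance` is a hypothetical helper.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance = 1 - (a . b) / (|a| * |b|)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    similarity = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip tiny rounding errors so identical vectors give a distance of ~0, never negative.
    return 1.0 - float(np.clip(similarity, -1.0, 1.0))


query_vector = [0.1, 0.1, 0.5]
embeddings = {
    "john": [0.1, 0.1, 0.5],
    "mary": [0.1, 0.1, 0.5],
    "joe": [0.9, 0.9, 0.1],
    "tyler": [0.1, 0.3, 0.5],
}

# Rank users from most to least similar to the query vector.
ranked = sorted(embeddings, key=lambda name: cosine_distance(query_vector, embeddings[name]))
for name in ranked:
    print(f"{name}: {cosine_distance(query_vector, embeddings[name]):.4f}")
```

This also explains why joe is missing from the results: with `num_results=3`, only the three nearest vectors are returned, and joe's vector is the farthest from the query.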


## Asynchronous `SearchIndex`

The `SearchIndex` class also supports running queries, index creation, and data loading asynchronously. This is the
recommended route for working with `redisvl` in production-like settings.

To enable it, either pass the `use_async` flag to the index
initializer or provide an existing async Redis client connection.
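One benefit of an async interface is that several requests can be in flight at once. The sketch below shows the general `asyncio.gather` pattern with a stand-in coroutine, `fake_aquery`, which is hypothetical and only sleeps to simulate a network round trip; with a real async index you would await the index's query coroutine in its place. The script uses `asyncio.run`; in a notebook, where an event loop is already running, use `await main()` instead.

```python
import asyncio
import time


async def fake_aquery(name, delay=0.1):
    """Stand-in for an async Redis round trip (hypothetical; it only sleeps)."""
    await asyncio.sleep(delay)
    return f"results for {name}"


async def main():
    start = time.perf_counter()
    # Start three "queries" concurrently and wait for all of them to finish.
    answers = await asyncio.gather(*(fake_aquery(n) for n in ["q1", "q2", "q3"]))
    elapsed = time.perf_counter() - start
    return answers, elapsed


answers, elapsed = asyncio.run(main())
print(answers)
print(f"{elapsed:.2f}s")  # roughly 0.1s rather than 0.3s, because the waits overlap
```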

In [11]:
# construct an async search index from the schema
index = SearchIndex.from_dict(
    schema,
    redis_url="redis://localhost:6379",
    use_async=True
)

# create the index -- but don't overwrite
await index.acreate(overwrite=False)

# run the same vector query but asynchronously
results = await index.aquery(query)

result_print(results)

Index already exists, not overwriting.


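| vector_distance | user | age | job | credit_score |
|-----------------|------|-----|-----|--------------|
| 0.0 | john | 1 | engineer | high |
| 0.0 | mary | 2 | doctor | low |
| 0.0566299557686 | tyler | 9 | engineer | high |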


## Update a `SearchIndex`
In some scenarios it makes sense to update the index schema. With Redis and `redisvl` this is easy, because Redis keeps the underlying data in place while you change the index configuration.

In [12]:
# First we will inspect the index we already have...
!rvl index info -i user_index



Index Information:
╭──────────────┬────────────────┬────────────┬─────────────────┬────────────╮
│ Index Name   │ Storage Type   │ Prefixes   │ Index Options   │   Indexing │
├──────────────┼────────────────┼────────────┼─────────────────┼────────────┤
│ user_index   │ HASH           │ ['user']   │ []              │          0 │
╰──────────────┴────────────────┴────────────┴─────────────────┴────────────╯
Index Fields:
╭────────────────┬────────────────┬─────────┬────────────────┬────────────────╮
│ Name           │ Attribute      │ Type    │ Field Option   │ Option Value   │
├────────────────┼────────────────┼─────────┼────────────────┼────────────────┤
│ credit_score   │ credit_score   │ TAG     │ SEPARATOR      │ ,              │
│ job            │ job            │ TEXT    │ WEIGHT         │ 1              │
│ age            │ age            │ NUMERIC │                │                │
│ user_embedding │ user_embedding │ VECTOR  │                │                │
╰────────────────┴────────────────┴─────────┴────────────────┴────────────────╯

For our scenario, let's imagine we want to reindex this data in two ways:
- by using a `Tag` type for the `job` field instead of `Text`
- by using an `hnsw` index for the `Vector` field instead of `flat`
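Both changes can be expressed as a transformation of the schema dictionary. This sketch builds the new schema from a deep copy, so the original dict stays intact for comparison or rollback; that is a different choice from updating `schema['fields']` in place, and `to_updated_schema` is a hypothetical helper.

```python
import copy


def to_updated_schema(schema):
    """Return a copy of `schema` with `job` as a tag field and an HNSW vector index."""
    new_schema = copy.deepcopy(schema)
    fields = new_schema["fields"]

    # Move `job` out of the text fields and into the tag fields.
    fields["text"] = [f for f in fields.get("text", []) if f["name"] != "job"]
    fields.setdefault("tag", []).append({"name": "job"})

    # Switch every vector field from a flat (brute-force) index to HNSW.
    for vector_field in fields.get("vector", []):
        vector_field["algorithm"] = "hnsw"
    return new_schema


old_schema = {
    "index": {"name": "user_index", "prefix": "user", "storage_type": "hash"},
    "fields": {
        "tag": [{"name": "credit_score"}],
        "text": [{"name": "job"}],
        "numeric": [{"name": "age"}],
        "vector": [{"name": "user_embedding", "dims": 3, "distance_metric": "cosine",
                    "algorithm": "flat", "datatype": "float32"}],
    },
}
new_schema = to_updated_schema(old_schema)
print(new_schema["fields"]["tag"])   # [{'name': 'credit_score'}, {'name': 'job'}]
print(old_schema["fields"]["text"])  # [{'name': 'job'}] (original left unchanged)
```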

In [13]:
# Inspect the previous schema
schema

{'index': {'name': 'user_index', 'prefix': 'user', 'storage_type': 'hash'},
 'fields': {'tag': [{'name': 'credit_score'}],
  'text': [{'name': 'job'}],
  'numeric': [{'name': 'age'}],
  'vector': [{'name': 'user_embedding',
    'dims': 3,
    'distance_metric': 'cosine',
    'algorithm': 'flat',
    'datatype': 'float32'}]}}

In [14]:
# We need to modify this schema dict to have what we want
schema['fields'].update({
    'text': [],
    'tag': [{'name': 'credit_score'}, {'name': 'job'}],
    'vector': [{
        'name': 'user_embedding',
        'dims': 3,
        'distance_metric': 'cosine',
        'algorithm': 'hnsw',
        'datatype': 'float32'
    }]
})

schema

{'index': {'name': 'user_index', 'prefix': 'user', 'storage_type': 'hash'},
 'fields': {'tag': [{'name': 'credit_score'}, {'name': 'job'}],
  'text': [],
  'numeric': [{'name': 'age'}],
  'vector': [{'name': 'user_embedding',
    'dims': 3,
    'distance_metric': 'cosine',
    'algorithm': 'hnsw',
    'datatype': 'float32'}]}}

In [16]:
# Delete existing index without clearing out the underlying data
await index.adelete(drop=False)

# Build the new index interface
index = (
    SearchIndex
    .from_dict(schema)
    .connect("redis://localhost:6379", use_async=True)
)

# Run the index update
await index.acreate()

In [17]:
# Test query again
result_print(await index.aquery(query))

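| vector_distance | user | age | job | credit_score |
|-----------------|------|-----|-----|--------------|
| 0.0 | mary | 2 | doctor | low |
| 0.0 | john | 1 | engineer | high |
| 0.0566299557686 | tyler | 9 | engineer | high |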


## Check Index Stats

In [18]:
# We can also use the CLI to check the stats for the index we just used
!rvl stats -i user_index


Statistics:
╭─────────────────────────────┬─────────────╮
│ Stat Key                    │ Value       │
├─────────────────────────────┼─────────────┤
│ num_docs                    │ 4           │
│ num_terms                   │ 0           │
│ max_doc_id                  │ 4           │
│ num_records                 │ 16          │
│ percent_indexed             │ 1           │
│ hash_indexing_failures      │ 0           │
│ number_of_uses              │ 2           │
│ bytes_per_record_avg        │ 1           │
│ doc_table_size_mb           │ 0.000400543 │
│ inverted_sz_mb              │ 1.52588e-05 │
│ key_table_size_mb           │ 0.000165939 │
│ offset_bits_per_record_avg  │ nan         │
│ offset_vectors_sz_mb        │ 0           │
│ offsets_per_term_avg        │ 0           │
│ records_per_doc_avg         │ 4           │
│ sortable_values_size_mb     │ 0           │
│ total_indexing_time         │ 3.486       │
│ total_inverted_index_blocks │ 7           │
│ vector_index_sz_mb 

## Cleanup

In [None]:
# clean up the index
await index.adelete()