# Getting Started with RedisVL
`redisvl` is a versatile Python library with an integrated CLI, designed to enhance AI applications using Redis. This guide will walk you through the following steps:

1. Defining an `IndexSchema`
2. Preparing a sample dataset
3. Creating a `SearchIndex` object
4. Testing `rvl` CLI functionality
5. Loading the sample data
6. Building `VectorQuery` objects and executing searches
7. Updating a `SearchIndex` object

...and more!

Prerequisites:
- Ensure `redisvl` is installed in your Python environment.
- Have a running instance of [Redis Stack](https://redis.io/docs/install/install-stack/) or [Redis Cloud](https://redis.com/try-free).

_____

## Define an `IndexSchema`

The `IndexSchema` maintains crucial **index configuration** and **field definitions** to
enable search with Redis. For ease of use, the schema can be constructed from a
Python dictionary or a YAML file.

### Example Schema Creation
Consider a dataset with user information, including `job`, `age`, `credit_score`,
and a 3-dimensional `user_embedding` vector.

You must also decide on a Redis index name and key prefix to use for this
dataset. Below are example schema definitions in both YAML and Python dictionary format.

**YAML Definition:**

```yaml
index:
  name: user_index
  prefix: user

fields:
    # define tag fields
    tag:
        - name: user
        - name: credit_score
    # define text fields
    text:
        - name: job
    # define numeric fields
    numeric:
        - name: age
    # define vector fields
    vector:
        - name: user_embedding
          algorithm: flat
          dims: 3
          distance_metric: cosine
          datatype: float32
```
> Store this in a local file, such as `schema.yaml`, for RedisVL usage.

**Python Dictionary:**

In [1]:
schema = {
    "index": {
        "name": "user_index",
        "prefix": "user",
        "storage_type": "hash",
    },
    "fields": {
        "tag": [{"name": "user"}, {"name": "credit_score"}],
        "text": [{"name": "job"}],
        "numeric": [{"name": "age"}],
        "vector": [{
            "name": "user_embedding",
            "dims": 3,
            "distance_metric": "cosine",
            "algorithm": "flat",
            "datatype": "float32"
        }]
    },
}

## Sample Dataset Preparation

Below, create a mock dataset with `user`, `job`, `age`, `credit_score`, and
`user_embedding` fields. The `user_embedding` vectors are synthetic examples
for demonstration purposes.

For more information on creating real-world embeddings, refer to this
[article](https://mlops.community/vector-similarity-search-from-basics-to-production/).

In [2]:
import numpy as np


data = [
    {
        'user': 'john',
        'age': 1,
        'job': 'engineer',
        'credit_score': 'high',
        'user_embedding': np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes()
    },
    {
        'user': 'mary',
        'age': 2,
        'job': 'doctor',
        'credit_score': 'low',
        'user_embedding': np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes()
    },
    {
        'user': 'joe',
        'age': 3,
        'job': 'dentist',
        'credit_score': 'medium',
        'user_embedding': np.array([0.9, 0.9, 0.1], dtype=np.float32).tobytes()
    }
]

>As seen above, the sample `user_embedding` vectors are converted into bytes. With NumPy, this takes a single `.tobytes()` call on a `float32` array.
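
To make the byte conversion more concrete, here is a minimal sketch that packs a 3-dimensional vector as little-endian `float32` values using only the standard library's `struct` module, then decodes it again. It assumes a little-endian machine, where the output should have the same layout as `np.array(..., dtype=np.float32).tobytes()`. The helper names `to_float32_bytes` and `from_float32_bytes` are made up for illustration and are not part of `redisvl`.

```python
import struct


def to_float32_bytes(vector):
    """Pack a sequence of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(vector)}f", *vector)


def from_float32_bytes(blob):
    """Decode little-endian float32 bytes back into a list of floats."""
    count = len(blob) // 4  # each float32 occupies 4 bytes
    return list(struct.unpack(f"<{count}f", blob))


dims = 3  # should match `dims` in the schema's vector field
blob = to_float32_bytes([0.1, 0.1, 0.5])
decoded = from_float32_bytes(blob)

print(len(blob))  # byte length is dims * 4
print(decoded)    # close to the inputs, within float32 precision
```

A quick check like `len(blob) == dims * 4` can catch a mismatch between the data and the `dims` or `datatype` declared in the schema before anything is loaded.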

## Create a `SearchIndex`

With the schema and sample dataset ready, instantiate a `SearchIndex`:

In [3]:
from redisvl.index import SearchIndex

index = SearchIndex.from_dict(schema) # or use .from_yaml(...)
index.connect("redis://localhost:6379")
index.create(overwrite=True)

>Note that at this point, the index has no entries. Data loading follows.

## Inspect with the `rvl` CLI
Use the `rvl` CLI to inspect the created index and its fields:

In [4]:
!rvl index listall

11:20:00 [RedisVL] INFO   Indices:
11:20:00 [RedisVL] INFO   1. user_index


In [5]:
!rvl index info -i user_index



Index Information:
╭──────────────┬────────────────┬────────────┬─────────────────┬────────────╮
│ Index Name   │ Storage Type   │ Prefixes   │ Index Options   │   Indexing │
├──────────────┼────────────────┼────────────┼─────────────────┼────────────┤
│ user_index   │ HASH           │ ['user']   │ []              │          0 │
╰──────────────┴────────────────┴────────────┴─────────────────┴────────────╯
Index Fields:
╭────────────────┬────────────────┬─────────┬────────────────┬────────────────╮
│ Name           │ Attribute      │ Type    │ Field Option   │ Option Value   │
├────────────────┼────────────────┼─────────┼────────────────┼────────────────┤
│ user           │ user           │ TAG     │ SEPARATOR      │ ,              │
│ credit_score   │ credit_score   │ TAG     │ SEPARATOR      │ ,              │
│ job            │ job            │ TEXT    │ WEIGHT         │ 1              │
│ age            │ age            │ NUMERIC │                │                │
│ user_embeddin

## Load Data to `SearchIndex`

Load the sample dataset to Redis:

In [6]:
index.load(data)

>By default, `load` will create a unique Redis "key" as a combination of the index key `prefix` and a UUID.

In [7]:
# Investigate the written keys
index_keys = [key for key in index.client.scan_iter("user:*")]
print(index_keys)

[b'user:75dc4a80a6344c69bdf6f7d017d156fd', b'user:4eac1d79ed2b4b008418cf2f72dcc620', b'user:904500d11c1e45d2a363db778c0ded11']
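
The keys printed above have the shape "prefix, colon, 32 hexadecimal characters". The sketch below reproduces that shape with the standard library. It is only an illustration of the key format, not `redisvl`'s actual implementation; the function name `make_key` and the `separator` parameter are assumptions.

```python
import uuid


def make_key(prefix, separator=":"):
    """Build a key such as 'user:<32 hex chars>' (illustrative only)."""
    return f"{prefix}{separator}{uuid.uuid4().hex}"


key = make_key("user")
print(key)
```

Because each key embeds a fresh UUID, loading the same records twice under this scheme would produce new keys rather than overwrite the earlier ones.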


### Upsert the index with new data
Upsert data by calling the `load` method again:

In [8]:
# Add more data
new_data = [{
    'user': 'tyler',
    'age': 9,
    'job': 'engineer',
    'credit_score': 'high',
    'user_embedding': np.array([0.1, 0.3, 0.5], dtype=np.float32).tobytes()
}]
index.load(new_data)

## Creating `VectorQuery` Objects

Next we will create a vector query object for our newly populated index. This example uses a simple vector to demonstrate how vector similarity works.

Vectors in production will likely be much larger than 3 floats and are usually produced by machine learning models (e.g. Hugging Face sentence transformers) or an embeddings API (Cohere, OpenAI). `redisvl` provides a set of [Vectorizers](https://www.redisvl.com/user_guide/vectorizers_04.html#openai) to assist in vector creation.

In [9]:
from redisvl.query import VectorQuery
from redisvl.utils.utils import result_print


query = VectorQuery(
    vector=[0.1, 0.1, 0.5],
    vector_field_name="user_embedding",
    return_fields=["user", "age", "job", "credit_score", "vector_distance"],
    num_results=3
)

### Executing queries
With our `VectorQuery` object defined above, we can execute the query over the `SearchIndex` using the `query` method.

In [10]:
results = index.query(query)
result_print(results)

vector_distance,user,age,job,credit_score
0.0,john,1,engineer,high
0.0,mary,2,doctor,low
0.0566299557686,tyler,9,engineer,high
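
To see where the `vector_distance` values come from, the following sketch recomputes them in plain Python. It assumes the reported distance for the `cosine` metric is `1 - cosine similarity`, which is consistent with the numbers above; the function name `cosine_distance` is illustrative.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query_vector = [0.1, 0.1, 0.5]
stored = {
    "john": [0.1, 0.1, 0.5],
    "mary": [0.1, 0.1, 0.5],
    "joe": [0.9, 0.9, 0.1],
    "tyler": [0.1, 0.3, 0.5],
}

distances = {user: cosine_distance(query_vector, vec) for user, vec in stored.items()}
ranked = sorted(distances.items(), key=lambda item: item[1])

# Keep the three nearest neighbours, mirroring num_results=3
for user, distance in ranked[:3]:
    print(user, round(distance, 4))
```

`john` and `mary` share the query vector exactly, so their distance is zero. `tyler` lands near 0.0566, and `joe` is far enough away that he falls outside the top three results.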


## Using an Asynchronous Redis Client

The `SearchIndex` class supports asynchronous queries, index creation, and data loading. This is the
recommended route for working with `redisvl` in production-like settings.

To enable it, either pass the `use_async` flag to the index
initializer or provide an existing async Redis client connection.

In [11]:
index = SearchIndex.from_dict(
    schema,
    redis_url="redis://localhost:6379",
    use_async=True
)

# create the index
await index.acreate()

# execute the vector query async
results = await index.aquery(query)
result_print(results)

Index already exists, not overwriting.


vector_distance,user,age,job,credit_score
0.0,john,1,engineer,high
0.0,mary,2,doctor,low
0.0566299557686,tyler,9,engineer,high
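
The cell above relies on top-level `await`, which works in Jupyter but not in a plain Python script. A minimal sketch of the script form follows. To keep it runnable without a Redis server, it uses a hypothetical `StubAsyncIndex` class that only mimics the `acreate` and `aquery` coroutine names; in real code you would pass the async `SearchIndex` instead.

```python
import asyncio


class StubAsyncIndex:
    """Hypothetical stand-in that mimics the async method names used above."""

    async def acreate(self):
        await asyncio.sleep(0)  # placeholder for a round trip to Redis
        return "created"

    async def aquery(self, query):
        await asyncio.sleep(0)  # placeholder for a round trip to Redis
        return [{"user": "john", "vector_distance": "0.0"}]


async def main(index, query):
    """Create the index, then run one query, awaiting each step in order."""
    await index.acreate()
    return await index.aquery(query)


# asyncio.run starts an event loop, runs main to completion, and closes the loop
results = asyncio.run(main(StubAsyncIndex(), query=None))
print(results)
```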


## Update a `SearchIndex`
In some scenarios, it makes sense to update the index schema. With Redis and `redisvl`, this is straightforward because Redis can keep the underlying data in place while you change the index configuration.

In [12]:
# First we will inspect the index we already have...
!rvl index info -i user_index



Index Information:
╭──────────────┬────────────────┬────────────┬─────────────────┬────────────╮
│ Index Name   │ Storage Type   │ Prefixes   │ Index Options   │   Indexing │
├──────────────┼────────────────┼────────────┼─────────────────┼────────────┤
│ user_index   │ HASH           │ ['user']   │ []              │          0 │
╰──────────────┴────────────────┴────────────┴─────────────────┴────────────╯
Index Fields:
╭────────────────┬────────────────┬─────────┬────────────────┬────────────────╮
│ Name           │ Attribute      │ Type    │ Field Option   │ Option Value   │
├────────────────┼────────────────┼─────────┼────────────────┼────────────────┤
│ user           │ user           │ TAG     │ SEPARATOR      │ ,              │
│ credit_score   │ credit_score   │ TAG     │ SEPARATOR      │ ,              │
│ job            │ job            │ TEXT    │ WEIGHT         │ 1              │
│ age            │ age            │ NUMERIC │                │                │
│ user_embeddin

For this scenario, suppose we want to reindex the data in two ways:
- by using a `Tag` type for the `job` field instead of `Text`
- by using an `hnsw` index for the `Vector` field instead of `flat`

In [13]:
# Inspect the previous schema
schema

{'index': {'name': 'user_index', 'prefix': 'user', 'storage_type': 'hash'},
 'fields': {'tag': [{'name': 'user'}, {'name': 'credit_score'}],
  'text': [{'name': 'job'}],
  'numeric': [{'name': 'age'}],
  'vector': [{'name': 'user_embedding',
    'dims': 3,
    'distance_metric': 'cosine',
    'algorithm': 'flat',
    'datatype': 'float32'}]}}

In [14]:
# We need to modify this schema dict to have what we want
schema['fields'].update({
    'text': [],
    'tag': [{'name': 'credit_score'}, {'name': 'job'}],
    'vector': [{
        'name': 'user_embedding',
        'dims': 3,
        'distance_metric': 'cosine',
        'algorithm': 'hnsw',
        'datatype': 'float32'
    }]
})

schema

{'index': {'name': 'user_index', 'prefix': 'user', 'storage_type': 'hash'},
 'fields': {'tag': [{'name': 'credit_score'}, {'name': 'job'}],
  'text': [],
  'numeric': [{'name': 'age'}],
  'vector': [{'name': 'user_embedding',
    'dims': 3,
    'distance_metric': 'cosine',
    'algorithm': 'hnsw',
    'datatype': 'float32'}]}}
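
Note that `dict.update` replaces each listed key wholesale, so the `tag` list printed above no longer contains `user`, and the original `schema` dict is modified in place. If you would rather keep every existing tag field and leave the original dict untouched, one possible approach is sketched below. The function name `retype_job_and_use_hnsw` is illustrative, and `old_schema` is simply a copy of the starting schema from earlier in this guide.

```python
import copy

old_schema = {
    "index": {"name": "user_index", "prefix": "user", "storage_type": "hash"},
    "fields": {
        "tag": [{"name": "user"}, {"name": "credit_score"}],
        "text": [{"name": "job"}],
        "numeric": [{"name": "age"}],
        "vector": [{
            "name": "user_embedding",
            "dims": 3,
            "distance_metric": "cosine",
            "algorithm": "flat",
            "datatype": "float32",
        }],
    },
}


def retype_job_and_use_hnsw(schema):
    """Return a modified copy: `job` becomes a tag, vectors use HNSW."""
    new_schema = copy.deepcopy(schema)  # leave the caller's dict untouched
    fields = new_schema["fields"]
    # Move `job` from the text fields to the tag fields
    fields["text"] = [f for f in fields["text"] if f["name"] != "job"]
    fields["tag"].append({"name": "job"})
    # Switch every vector field to the HNSW algorithm
    for vector_field in fields["vector"]:
        vector_field["algorithm"] = "hnsw"
    return new_schema


new_schema = retype_job_and_use_hnsw(old_schema)
print(new_schema["fields"]["tag"])
```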

In [15]:
# Delete existing index without clearing out the underlying data
await index.adelete(drop=False)

# Build the new index interface with updated schema
index = (
    SearchIndex
    .from_dict(schema)
    .connect("redis://localhost:6379", use_async=True)
)

# Run the index update
await index.acreate()

In [16]:
# Execute the vector query async
results = await index.aquery(query)
result_print(results)

vector_distance,user,age,job,credit_score
0.0,mary,2,doctor,low
0.0,john,1,engineer,high
0.0566299557686,tyler,9,engineer,high


## Check Index Stats
Use the `rvl` CLI to check the stats for the index:

In [17]:
!rvl stats -i user_index


Statistics:
╭─────────────────────────────┬─────────────╮
│ Stat Key                    │ Value       │
├─────────────────────────────┼─────────────┤
│ num_docs                    │ 4           │
│ num_terms                   │ 0           │
│ max_doc_id                  │ 4           │
│ num_records                 │ 16          │
│ percent_indexed             │ 1           │
│ hash_indexing_failures      │ 0           │
│ number_of_uses              │ 2           │
│ bytes_per_record_avg        │ 1           │
│ doc_table_size_mb           │ 0.000400543 │
│ inverted_sz_mb              │ 1.52588e-05 │
│ key_table_size_mb           │ 0.000138283 │
│ offset_bits_per_record_avg  │ nan         │
│ offset_vectors_sz_mb        │ 0           │
│ offsets_per_term_avg        │ 0           │
│ records_per_doc_avg         │ 4           │
│ sortable_values_size_mb     │ 0           │
│ total_indexing_time         │ 2.393       │
│ total_inverted_index_blocks │ 7           │
│ vector_index_sz_mb 

## Cleanup

In [18]:
# clean up the index
await index.adelete()