# Getting Started with RedisVL
`RedisVL` is a versatile Java library designed to enhance AI applications using Redis. This guide will walk you through the following steps:

1. Defining an `IndexSchema`
2. Preparing a sample dataset
3. Creating a `SearchIndex` object
4. Loading the sample data
5. Building `VectorQuery` objects and executing searches
6. Updating a `SearchIndex` object

...and more!

Prerequisites:
- Ensure the `RedisVL` library is available on your Java project's classpath.
- Have a running instance of [Redis Stack](https://redis.io/docs/install/install-stack/) or [Redis Cloud](https://redis.io/cloud).

_____

```java
// Load Maven dependencies (notebook %maven magics).
// RedisVL itself must also be on the classpath (see Prerequisites).
%maven redis.clients:jedis:6.2.0
%maven org.slf4j:slf4j-nop:2.0.16
%maven com.fasterxml.jackson.core:jackson-databind:2.18.0
%maven com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:2.18.0
%maven com.github.f4b6a3:ulid-creator:5.2.3

// Import RedisVL classes
import com.redis.vl.index.SearchIndex;
import com.redis.vl.schema.IndexSchema;
import com.redis.vl.query.VectorQuery;

// Import Redis client
import redis.clients.jedis.UnifiedJedis;
import redis.clients.jedis.HostAndPort;

// Import ULID generator
import com.github.f4b6a3.ulid.UlidCreator;

// Import Java standard libraries
import java.util.*;
import java.nio.*;
```

## Define an `IndexSchema`

The `IndexSchema` maintains crucial **index configuration** and **field definitions** to
enable search with Redis. For ease of use, the schema can be constructed from a
Java Map or YAML file.

### Example Schema Creation
Consider a dataset with user information, including `job`, `age`, `credit_score`,
and a 3-dimensional `user_embedding` vector.

You must also decide on a Redis index name and key prefix to use for this
dataset. Below are example schema definitions in both YAML and Java `Map` format.

**YAML Definition:**

```yaml
version: '0.1.0'

index:
  name: user_simple
  prefix: user_simple_docs

fields:
    - name: user
      type: tag
    - name: credit_score
      type: tag
    - name: job
      type: text
    - name: age
      type: numeric
    - name: user_embedding
      type: vector
      attrs:
        algorithm: flat
        dims: 3
        distance_metric: cosine
        datatype: float32
```
> Store this in a local file, such as `schema.yaml`, for RedisVL usage.

**Java Map:**

```java
Map<String, Object> schema = Map.<String, Object>of(
  "index", Map.of(
    "name", "user_simple",
    "prefix", "user_simple_docs"
  ),
  "fields", List.of(
    Map.of("name", "user", "type", "tag"),
    Map.of("name", "credit_score", "type", "tag"),
    Map.of("name", "job", "type", "text"),
    Map.of("name", "age", "type", "numeric"),
    Map.of(
      "name", "user_embedding",
      "type", "vector",
      "attrs", Map.of(
        "dims", 3,
        "distance_metric", "cosine",
        "algorithm", "flat",
        "datatype", "float32"
      )
    )
  )
);

// (optional) count fields
int count = ((List<?>) schema.get("fields")).size();
System.out.println("Schema created with " + count + " fields");
```

```
Schema created with 5 fields
```


## Sample Dataset Preparation

Below, create a mock dataset with `user`, `job`, `age`, `credit_score`, and
`user_embedding` fields. The `user_embedding` vectors are synthetic examples
for demonstration purposes.

For more information on creating real-world embeddings, refer to this
[article](https://mlops.community/vector-similarity-search-from-basics-to-production/).

```java
List<Map<String, Object>> data = List.<Map<String, Object>>of(
  Map.of(
    "user", "john",
    "age", 1,
    "job", "engineer",
    "credit_score", "high",
    "user_embedding", new float[]{0.1f, 0.1f, 0.5f}
  ),
  Map.of(
    "user", "mary",
    "age", 2,
    "job", "doctor",
    "credit_score", "low",
    "user_embedding", new float[]{0.1f, 0.1f, 0.5f}
  ),
  Map.of(
    "user", "joe",
    "age", 3,
    "job", "dentist",
    "credit_score", "medium",
    "user_embedding", new float[]{0.9f, 0.9f, 0.1f}
  )
);
```

>As seen above, the sample `user_embedding` vectors are provided as float arrays. The RedisVL framework automatically handles the conversion to bytes when storing in Redis.
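A minimal sketch of the kind of float-to-bytes conversion involved, assuming vectors are packed as little-endian float32 values (4 bytes per element), which matches the `datatype: float32` setting in the schema. The class and method names (`VectorBytes`, `toFloat32Bytes`, `fromFloat32Bytes`) are hypothetical; this is not RedisVL's internal code and relies only on `java.nio.ByteBuffer`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class VectorBytes {
    // Packs a float vector into little-endian float32 bytes (4 bytes per element).
    static byte[] toFloat32Bytes(float[] vector) {
        ByteBuffer buf = ByteBuffer.allocate(vector.length * Float.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (float v : vector) {
            buf.putFloat(v);
        }
        return buf.array();
    }

    // Inverse operation: decodes little-endian float32 bytes back into a float array.
    static float[] fromFloat32Bytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        float[] out = new float[bytes.length / Float.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getFloat();
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = toFloat32Bytes(new float[]{0.1f, 0.1f, 0.5f});

        // Show the raw bytes as hex; this is the binary form a vector field stores.
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(bytes.length + " bytes: " + hex.toString().trim());

        // Round-trip back to floats.
        System.out.println(Arrays.toString(fromFloat32Bytes(bytes)));
    }
}
```

For `{0.1f, 0.1f, 0.5f}` this produces 12 bytes, and decoding them returns the original values.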

## Create a `SearchIndex`

With the schema and sample dataset ready, create a `SearchIndex`.

### Bring your own Redis connection instance

This is ideal when you have custom settings on the connection instance or when your application shares a connection pool:

```java
// Connect to Redis (replace "redis-stack" with your Redis host)
UnifiedJedis client = new UnifiedJedis(new HostAndPort("redis-stack", 6379));

// Create a SearchIndex from the schema map, reusing the client above
SearchIndex index = SearchIndex.fromDict(schema, client);

// Or explicitly enable validation on load with the third argument
// SearchIndex index = SearchIndex.fromDict(schema, client, true);
```


### Let the index manage the connection instance

You can also pass a Redis URL string directly to the SearchIndex, and it will manage the connection for you:

```java
// Create SearchIndex with Redis URL - the index will manage the connection
SearchIndex indexWithUrl = SearchIndex.fromDict(schema, "redis://redis-stack:6379");

// Or with validateOnLoad option enabled
SearchIndex indexWithValidation = SearchIndex.fromDict(schema, "redis://redis-stack:6379", true);

System.out.println("Created indices with URL-based connection");
```

```
Created indices with URL-based connection
```


### Create the index

Now that we are connected to Redis, we need to run the create command.

```java
index.create(true); // overwrite=true
```

>Note that at this point, the index has no entries. Data loading follows.

## Load Data to `SearchIndex`

Load the sample dataset to Redis.

### Validate data entries on load
RedisVL can validate loaded data against your schema. This check runs during load operations when the index is created with `validateOnLoad` enabled. The `index` in this guide was created with the two-argument `fromDict(schema, client)` form, and the cell below reports `validateOnLoad` as `false` for it.

```java
// Inspect the validate-on-load setting and basic index metadata
System.out.println("Validation enabled: " + index.isValidateOnLoad());
System.out.println("Index name: " + index.getName());
System.out.println("Index prefix: " + index.getPrefix());
```

```
Validation enabled: false
Index name: user_simple
Index prefix: user_simple_docs
```


```java
// Load data with automatic ULID generation
List<String> keys = index.load(data);
System.out.println(keys);
```

```
[user_simple_docs:01K70ZR1PYMQFQ737CTSDCMAP5, user_simple_docs:01K70ZR1PY414PXY73W2AEANE7, user_simple_docs:01K70ZR1PYZZTV2QSHM4DJYB89]
```


>By default, `load` will create a unique Redis key as a combination of the index key `prefix` and a random ULID. You can also customize the key by providing direct keys or pointing to a specified `id_field` on load.
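To make the key scheme concrete, here is a small sketch of how a key of the form `<prefix>:<id>` could be assembled, either from a document field or from a generated ID, plus a check that a key ends in a 26-character ULID. `KeySketch`, `buildKey`, and `hasUlidSuffix` are hypothetical helpers, not RedisVL API; the `:` separator is taken from the keys printed above. To stay within the standard library, the sketch uses `UUID` as a stand-in generator, and a UUID is not a ULID.

```java
import java.util.Map;
import java.util.UUID;
import java.util.function.Supplier;
import java.util.regex.Pattern;

public class KeySketch {
    // ULIDs are 26 characters of Crockford base32 (digits and letters, excluding I, L, O, U).
    private static final Pattern ULID = Pattern.compile("[0-9A-HJKMNP-TV-Z]{26}");

    // Hypothetical helper: returns "<prefix>:<id>". The id comes from idField when it is
    // given and present in the document, otherwise from the supplied generator.
    static String buildKey(String prefix, Map<String, Object> doc, String idField,
                           Supplier<String> idGenerator) {
        String id = (idField != null && doc.containsKey(idField))
                ? String.valueOf(doc.get(idField))
                : idGenerator.get();
        return prefix + ":" + id;
    }

    // Returns true if the part after the last ':' looks like a ULID.
    static boolean hasUlidSuffix(String key) {
        int sep = key.lastIndexOf(':');
        return sep >= 0 && ULID.matcher(key.substring(sep + 1)).matches();
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of("user", "john", "age", 1);

        // Key derived from the "user" field
        System.out.println(buildKey("user_simple_docs", doc, "user", () -> "unused"));

        // Key from a generator. UUID is only a stand-in; with ulid-creator on the
        // classpath, () -> UlidCreator.getUlid().toString() would produce a ULID.
        System.out.println(buildKey("user_simple_docs", doc, null,
                () -> UUID.randomUUID().toString()));

        // Check one of the keys printed by index.load(data) above
        System.out.println(hasUlidSuffix("user_simple_docs:01K70ZR1PYMQFQ737CTSDCMAP5"));
    }
}
```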

### Load INVALID data
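The cell below tries to add a document whose `user_embedding` value is a boolean instead of vector data. Even though the previous check reported validation on load as disabled, `addDocument` rejects the document with a schema validation error, as the output shows.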

```java
// Try to load invalid data - this should fail during the addDocument operation
try {
    Map<String, Object> invalidDoc = new HashMap<>();
    invalidDoc.put("user_embedding", true); // Wrong type - should be byte[], float[], or double[]

    String invalidKey = index.getPrefix() + ":invalid";
    index.addDocument(invalidKey, invalidDoc);

    System.out.println("Validation passed (shouldn't happen)");
} catch (Exception e) {
    System.out.println("Data validation failed during load operation");
    System.out.println(e.getMessage());
}
```

```
Data validation failed during load operation
Schema validation failed for field 'user_embedding'. Field expects bytes (vector data), but got Boolean value 'true'. If this should be a vector field, provide a list of numbers or bytes.
```
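The error message hints at the kind of type check involved. The sketch below is a hypothetical, simplified version of such a check for a float32 vector field; it is not RedisVL's validator. It accepts `float[]`, `double[]`, a `List` of numbers, or a `byte[]`, and rejects anything else, such as the `Boolean` used above. The dimension check, and the assumption that a `byte[]` holds 4 bytes per float32 element, are extra assumptions of this sketch.

```java
import java.util.List;

public class VectorValueCheck {
    // Hypothetical check for a float32 vector field with the given number of dimensions.
    static boolean isValidVector(Object value, int dims) {
        if (value instanceof float[]) {
            return ((float[]) value).length == dims;
        }
        if (value instanceof double[]) {
            return ((double[]) value).length == dims;
        }
        if (value instanceof byte[]) {
            // Assumes pre-packed float32 data: 4 bytes per element.
            return ((byte[]) value).length == dims * Float.BYTES;
        }
        if (value instanceof List<?>) {
            List<?> list = (List<?>) value;
            if (list.size() != dims) {
                return false;
            }
            for (Object element : list) {
                if (!(element instanceof Number)) {
                    return false;
                }
            }
            return true;
        }
        // Anything else (Boolean, String, null, ...) is rejected.
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isValidVector(true, 3));
        System.out.println(isValidVector(new float[]{0.1f, 0.1f, 0.5f}, 3));
        System.out.println(isValidVector(List.of(0.1, 0.3, 0.5), 3));
    }
}
```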


### Upsert the index with new data
Upsert data by using the `load` method again:

```java
// Add more data using the same load method
var newData = Map.<String, Object>of(
  "user", "tyler",
  "age", 9,
  "job", "engineer",
  "credit_score", "high",
  "user_embedding", new float[]{0.1f, 0.3f, 0.5f}
);

// Load a single document with automatic ULID generation
var newKeys = index.load(List.of(newData));
System.out.println(newKeys);
```

```
[user_simple_docs:01K70ZR1WATEXBRXMSNFVP4GCH]
```


## Creating `VectorQuery` Objects

Next, create a vector query object for the newly populated index. This example uses a simple 3-dimensional vector to demonstrate how vector similarity works. Production vectors are usually much larger than 3 floats and are typically produced by machine learning models (e.g., Hugging Face sentence transformers) or an embeddings API (e.g., Cohere, OpenAI).

```java
VectorQuery query = VectorQuery.builder()
    .vector(new float[]{0.1f, 0.1f, 0.5f})
    .field("user_embedding")
    .returnFields("user", "age", "job", "credit_score", "vector_distance")
    .numResults(3)
    .build();
```

### Executing queries
With our `VectorQuery` object defined above, we can execute the query over the `SearchIndex` using the `query` method.

```java
List<Map<String, Object>> results = index.query(query);

// Display results
for (Map<String, Object> result : results) {
    System.out.println(result);
}
```

```
{credit_score=high, score=1.0, vector_distance=0, user_embedding=<binary float32 data>, id=user_simple_docs:01K70ZR1PYMQFQ737CTSDCMAP5, job=engineer, user=john, age=1}
{credit_score=low, score=1.0, vector_distance=0, user_embedding=<binary float32 data>, id=user_simple_docs:01K70ZR1PY414PXY73W2AEANE7, job=doctor, user=mary, age=2}
{credit_score=high, score=1.0, vector_distance=0.0566298961639, user_embedding=<binary float32 data>, id=user_simple_docs:01K70ZR1WATEXBRXMSNFVP4GCH, job=engineer, user=tyler, age=9}
```
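The schema sets `distance_metric: cosine`, so `vector_distance` is a cosine distance: 1 minus the cosine similarity of the two vectors. A `flat` index compares the query against every stored vector. The sketch below reproduces that exact (brute-force) search in plain Java over the four sample users; the class and method names are illustrative, not RedisVL API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CosineKnn {
    // Cosine distance = 1 - (a . b) / (|a| * |b|), clamped at 0 to absorb rounding error.
    static double cosineDistance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return Math.max(0.0, 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB)));
    }

    // Exact k-nearest neighbours: score every vector, sort by distance (stable), keep k.
    static List<String> topK(float[] query, Map<String, float[]> vectors, int k) {
        List<String> names = new ArrayList<>(vectors.keySet());
        names.sort(Comparator.comparingDouble(name -> cosineDistance(query, vectors.get(name))));
        return names.subList(0, Math.min(k, names.size()));
    }

    public static void main(String[] args) {
        Map<String, float[]> users = new LinkedHashMap<>();
        users.put("john", new float[]{0.1f, 0.1f, 0.5f});
        users.put("mary", new float[]{0.1f, 0.1f, 0.5f});
        users.put("joe", new float[]{0.9f, 0.9f, 0.1f});
        users.put("tyler", new float[]{0.1f, 0.3f, 0.5f});

        float[] query = {0.1f, 0.1f, 0.5f};
        for (Map.Entry<String, float[]> e : users.entrySet()) {
            System.out.printf("%-6s %.4f%n", e.getKey(), cosineDistance(query, e.getValue()));
        }
        System.out.println(topK(query, users, 3));
    }
}
```

With the sample data, john and mary are at distance 0, tyler is at about 0.0566 (matching the query output), and joe is at about 0.653, which is why joe is not among the three results.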


## Updating a schema
In some scenarios, it makes sense to update the index schema. With Redis and `RedisVL`, this is easy because Redis can keep the underlying data in place while you change the index configuration.

So for our scenario, let's imagine we want to reindex this data in 2 ways:
- by using a `Tag` type for `job` field instead of `Text`
- by using an `hnsw` vector index for the `user_embedding` field instead of a `flat` vector index

```java
// Modify the schema to have what we want
// First remove the fields we want to change
index.getSchema().removeField("job");
index.getSchema().removeField("user_embedding");

// Then add them back with new types
index.getSchema().addFields(List.of(
    Map.of("name", "job", "type", "tag"),       // changed text -> tag
    Map.of(                                      // HNSW vector field
      "name", "user_embedding",
      "type", "vector",
      "attrs", Map.of(
        "dims", 3,
        "distance_metric", "cosine",
        "algorithm", "hnsw",                     // flat -> hnsw
        "datatype", "float32"
      )
    )
));

// Run the index update but keep underlying data in place
index.create(true, false); // overwrite=true, drop=false
```
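If you would rather keep the original `schema` map unchanged and build the updated definition as a new map, a sketch like the following produces a copy with `job` switched to `tag` and the vector algorithm switched to `hnsw`. `withUpdatedFields` is a hypothetical helper; the resulting map could presumably be passed to `SearchIndex.fromDict(...)` and created with `create(true, false)` as above, but that step is not shown here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SchemaUpdateSketch {
    // Returns a copy of the schema whose "fields" list has job -> tag and
    // user_embedding's algorithm -> hnsw. The input map is not modified.
    @SuppressWarnings("unchecked")
    static Map<String, Object> withUpdatedFields(Map<String, Object> schema) {
        List<Map<String, Object>> updated = new ArrayList<>();
        for (Object o : (List<?>) schema.get("fields")) {
            // Copy each field definition so the original maps stay untouched.
            Map<String, Object> field = new HashMap<>((Map<String, Object>) o);
            if ("job".equals(field.get("name"))) {
                field.put("type", "tag");                        // text -> tag
            } else if ("user_embedding".equals(field.get("name"))) {
                Map<String, Object> attrs =
                        new HashMap<>((Map<String, Object>) field.get("attrs"));
                attrs.put("algorithm", "hnsw");                  // flat -> hnsw
                field.put("attrs", attrs);
            }
            updated.add(field);
        }
        Map<String, Object> copy = new HashMap<>(schema);
        copy.put("fields", updated);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> schema = Map.of(
            "index", Map.of("name", "user_simple", "prefix", "user_simple_docs"),
            "fields", List.of(
                Map.of("name", "user", "type", "tag"),
                Map.of("name", "job", "type", "text"),
                Map.of("name", "user_embedding", "type", "vector",
                       "attrs", Map.of("dims", 3, "distance_metric", "cosine",
                                       "algorithm", "flat", "datatype", "float32"))));

        System.out.println(withUpdatedFields(schema).get("fields"));
    }
}
```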

```java
// Execute the vector query with updated schema
results = index.query(query);

// Display results
for (Map<String, Object> result : results) {
    System.out.println(result);
}
```

```
{credit_score=low, score=1.0, vector_distance=0, user_embedding=<binary float32 data>, id=user_simple_docs:01K70ZR1PY414PXY73W2AEANE7, job=doctor, user=mary, age=2}
{credit_score=high, score=1.0, vector_distance=0, user_embedding=<binary float32 data>, id=user_simple_docs:01K70ZR1PYMQFQ737CTSDCMAP5, job=engineer, user=john, age=1}
{credit_score=high, score=1.0, vector_distance=0.0566298961639, user_embedding=<binary float32 data>, id=user_simple_docs:01K70ZR1WATEXBRXMSNFVP4GCH, job=engineer, user=tyler, age=9}
```


## Loading Schema from YAML File

RedisVL also supports loading schemas from YAML files for convenience, similar to the Python version:

```java
// First, let's create a YAML schema file (matching the YAML from the beginning)
String yamlSchema = """
version: '0.1.0'

index:
  name: user_simple_yaml
  prefix: user_simple_yaml_docs

fields:
    - name: user
      type: tag
    - name: credit_score
      type: tag
    - name: job
      type: text
    - name: age
      type: numeric
    - name: user_embedding
      type: vector
      attrs:
        algorithm: flat
        dims: 3
        distance_metric: cosine
        datatype: float32
""";

// Write the YAML to a file
try (java.io.FileWriter writer = new java.io.FileWriter("schema.yaml")) {
    writer.write(yamlSchema);
    System.out.println("Created schema.yaml file");
} catch (java.io.IOException e) {
    System.err.println("Error writing schema file: " + e.getMessage());
}

// Load schema from YAML file
IndexSchema schemaFromYaml = IndexSchema.fromYamlFile("schema.yaml");
System.out.println("Loaded schema from YAML: " + schemaFromYaml.getName());

// Create an index from the YAML schema
SearchIndex yamlIndex = new SearchIndex(schemaFromYaml, client);
yamlIndex.create(true);
System.out.println("Created index from YAML schema");
```

```
Created schema.yaml file
Loaded schema from YAML: user_simple_yaml
Created index from YAML schema
```


## Cleanup

Finally, clean up. First, you can flush all data associated with the index from Redis by
using the `.clear()` method. This leaves the secondary index in place for future insertions or updates.

To clean up everything, including the index, use `.delete(true)`, which removes both the index
AND the underlying data.

```java
// Clear all data from Redis associated with the index
int cleared = index.clear();
System.out.println(cleared);
```

```
4
```


```java
// But the index is still in place
System.out.println(index.exists());
```

```
true
```


```java
// Remove / delete the index in its entirety
index.delete(true);

// Also clean up the YAML-based index
yamlIndex.delete(true);

System.out.println("All indices deleted");

// Close the Redis connection
client.close();
System.out.println("Redis connection closed");
```

```
All indices deleted
Redis connection closed
```
