![Redis](https://redis.io/wp-content/uploads/2024/04/Logotype.svg?auto=webp&quality=85,75&width=120)
# Implementing multi vector search with Redis

Multi-vector search combines the similarity scores of several vector fields into a single relevance score. The fields can hold embeddings from different models or from different parts of a document. This notebook covers how to define a multi-vector index and how to run multi-vector queries.
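To make "combining scores" concrete, here is a minimal pure-Python sketch of the idea. It assumes a simple weighted sum of per-field cosine similarities. This is not RedisVL's exact server-side scoring formula, and the field names, toy vectors, and weights are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def combined_score(query_vectors, doc_vectors, weights):
    """Weighted sum of per-field cosine similarities (illustrative only)."""
    return sum(
        weights[field] * cosine_similarity(query_vectors[field], doc_vectors[field])
        for field in weights
    )

# Hypothetical toy vectors for two fields, "plot" and "genre"
query = {"plot": [1.0, 0.0], "genre": [0.0, 1.0]}
docs = {
    "doc_a": {"plot": [1.0, 0.0], "genre": [1.0, 0.0]},  # matches the plot only
    "doc_b": {"plot": [0.0, 1.0], "genre": [0.0, 1.0]},  # matches the genre only
}
weights = {"plot": 0.7, "genre": 0.3}

scores = {name: combined_score(query, vecs, weights) for name, vecs in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

With these weights `doc_a` scores 0.7 and ranks first. Swapping the weights to `{"plot": 0.3, "genre": 0.7}` flips the order, which is exactly the lever a weighted multi-vector query gives you.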

## Let's Begin!
<a href="https://colab.research.google.com/github/redis-developer/redis-ai-resources/blob/main/python-recipes/vector-search/05_multivector_search.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


### Install Packages

In [None]:
%pip install "redisvl>=0.9.0" sentence-transformers

### Data/Index Preparation

In this section:

1. We prepare the data necessary for our multi vector search implementations by loading a collection of movies. Each movie object contains the following attributes:
    - `title`
    - `rating`
    - `description`
    - `genre`

2. We generate vector embeddings from the movie descriptions, using several different models so that each movie gets multiple vectors.

3. After preparing the data, we populate a search index with these movie records, each with multiple vectors.

Running remotely or in Colab? Run this cell to download the necessary dataset.

In [None]:
# NBVAL_SKIP
!git clone https://github.com/redis-developer/redis-ai-resources.git temp_repo
!mv temp_repo/python-recipes/vector-search/resources .
!rm -rf temp_repo

### Install Redis Stack

Later in this tutorial, Redis will be used to store, index, and query vector
embeddings and full text fields. **We need to have a Redis
instance available.**

#### Local Redis
Use the shell script below to download, extract, and install [Redis Stack](https://redis.io/docs/getting-started/install-stack/) directly from the Redis package archive.

In [None]:
# NBVAL_SKIP
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install redis-stack-server  > /dev/null 2>&1
redis-stack-server --daemonize yes

#### Alternative Redis Access (Cloud, Docker, other)
There are many ways to get the necessary Redis Stack instance running:
1. In the cloud, deploy a [FREE instance of Redis](https://redis.com/try-free/). Or, if you have your own version of Redis Enterprise running, that works too!
2. For other operating systems, [see the docs](https://redis.io/docs/latest/operate/oss_and_stack/install/install-stack/).
3. With docker: `docker run -d --name redis-stack-server -p 6379:6379 redis/redis-stack-server:latest`

### Define the Redis Connection URL

By default, this notebook connects to a local Redis Stack instance. **If you have your own Redis Enterprise or Redis Cloud instance**, replace the REDIS_PASSWORD, REDIS_HOST, and REDIS_PORT values with your own.

In [None]:
import os

# Replace values below with your own if using Redis Cloud instance
REDIS_HOST = os.getenv("REDIS_HOST", "localhost") # ex: "redis-18374.c253.us-central1-1.gce.cloud.redislabs.com"
REDIS_PORT = os.getenv("REDIS_PORT", "6379")      # ex: 18374
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "")  # ex: "1TNxTEdYRDgIDKM2gDfasupCADXXXX"

# If SSL is enabled on the endpoint, use rediss:// as the URL prefix
REDIS_URL = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"
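If your password contains characters such as `@` or `:`, interpolating it directly into the URL can break parsing. Below is a minimal sketch using a hypothetical helper, `build_redis_url`, that percent-encodes the password and switches to the `rediss://` scheme for SSL. It assumes redis-py decodes percent-encoded credentials when parsing the URL.

```python
from urllib.parse import quote

def build_redis_url(host, port, password="", use_ssl=False):
    """Build a Redis connection URL (hypothetical helper for illustration)."""
    scheme = "rediss" if use_ssl else "redis"
    # Percent-encode the password so '@' or ':' cannot be mistaken for URL delimiters
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"{scheme}://{auth}{host}:{port}"

print(build_redis_url("localhost", 6379))
print(build_redis_url("example.com", 18374, password="p@ss:word", use_ssl=True))
```

The first call prints `redis://localhost:6379`. The second prints `rediss://:p%40ss%3Aword@example.com:18374`.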

### Create the Redis client and load the data

In [None]:
from redis import Redis

client = Redis.from_url(REDIS_URL)
client.ping()

In [None]:
import json

with open("resources/movies.json", 'r') as file:
    movies = json.load(file)
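Every record is expected to carry the four attributes listed earlier (`title`, `rating`, `description`, `genre`). A quick check before embedding catches malformed entries early. The sketch below uses a hypothetical helper, `find_incomplete`, and a made-up two-record sample in place of `resources/movies.json`.

```python
import json

REQUIRED_FIELDS = {"title", "rating", "description", "genre"}

def find_incomplete(records):
    """Return (index, missing_fields) for each record lacking a required attribute."""
    problems = []
    for i, record in enumerate(records):
        missing = REQUIRED_FIELDS - set(record)
        if missing:
            problems.append((i, sorted(missing)))
    return problems

# Hypothetical sample standing in for resources/movies.json
sample = json.loads('''[
  {"title": "Example One", "rating": 7, "description": "A heist goes wrong.", "genre": "action"},
  {"title": "Example Two", "description": "Two friends take a road trip.", "genre": "comedy"}
]''')
print(find_incomplete(sample))  # [(1, ['rating'])]
```

In the notebook you would call `find_incomplete(movies)` and expect an empty list.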

## Multi-Vector Index with Multiple Embedding Models

Now let's create a multi-vector search setup by using multiple embedding models for different aspects of our movie data. This approach allows us to:

1. **Use specialized embeddings** - Different models optimized for different tasks
2. **Combine multiple perspectives** - Search across different semantic representations
3. **Improve search quality** - Capture different sections of your data (for example, genre plus description) in separate embeddings

We'll create a new index with multiple vector fields and demonstrate how to query across them.

In [None]:
from redisvl.extensions.cache.embeddings import EmbeddingsCache
from redisvl.utils.vectorize import HFTextVectorizer

# Model 1: General-purpose embeddings (384 dims, stored as float64)
general_model = HFTextVectorizer(
    model='sentence-transformers/all-MiniLM-L6-v2',
    cache=EmbeddingsCache(
        name="embedcache_general",
        ttl=600,
        redis_client=client,
    ),
    dtype="float64",
)

# Model 2: A larger model (768 dims) that captures different aspects of the description
movie_model = HFTextVectorizer(
    model='sentence-transformers/all-mpnet-base-v2',
    cache=EmbeddingsCache(
        name="embedcache_movie",
        ttl=600,
        redis_client=client,
    ),
    dtype="float32",
)

# Model 3: A multilingual paraphrase model (384 dims). It becomes "genre-focused"
# because we embed the genre together with the description for this field.
genre_model = HFTextVectorizer(
    model='sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2',
    cache=EmbeddingsCache(
        name="embedcache_genre",
        ttl=600,
        redis_client=client,
    ),
    dtype="float32",
)

Let's highlight the flexibility of multi-vector search in RedisVL:
- Each field can use a different embedding model. Here we have 3 different Hugging Face vectorizers, but you can mix and match any vectorizer.
- These models can have different dimensions and datatypes. Want to pair a large, high-fidelity model with a lightweight one? Sure thing.
- The data source for these embeddings can be anything. In the cell below we generate embeddings from different aspects of our data. You can have multiple text embeddings and image embeddings on the same document.
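Because the vectors are stored in hash fields as raw bytes, each field's `dims` and `datatype` in the schema must match what was written. The sketch below packs hypothetical random vectors with `struct`. It is a stand-in for `as_buffer=True`, under the assumption that the buffer is a packed array of the vectorizer's dtype (RedisVL does this conversion for you). The sketch shows that a 384-dim float64 vector and a 768-dim float32 vector are both 3072 bytes, so a datatype mismatch can fail silently.

```python
import random
import struct

def to_buffer(vector, datatype):
    """Pack a list of floats into little-endian bytes for the given datatype."""
    code = {"float32": "f", "float64": "d"}[datatype]
    return struct.pack(f"<{len(vector)}{code}", *vector)

# Hypothetical embeddings standing in for real model output
general = [random.random() for _ in range(384)]  # 384 dims, stored as float64
movie = [random.random() for _ in range(768)]    # 768 dims, stored as float32

general_bytes = to_buffer(general, "float64")
movie_bytes = to_buffer(movie, "float32")
print(len(general_bytes), len(movie_bytes))  # 3072 3072

# Reading float32 bytes as float64 raises no error; it just yields the wrong vector
misread = struct.unpack(f"<{len(movie_bytes) // 8}d", movie_bytes)
print(len(misread))  # 384 values instead of 768
```

This is why the `dtype` passed to each vectorizer, the `datatype` in the schema, and the `dtype` on each query `Vector` should all agree.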


In [None]:
# Generate multiple embeddings for each movie
print("Generating multiple embeddings for movies...")

multi_vector_movie_data = []
for movie in movies:
    movie_with_vectors = {
        **movie,
        "description_vector_general": general_model.embed(movie["description"], as_buffer=True),
        "description_vector_movie": movie_model.embed(movie["description"], as_buffer=True),
        "description_vector_genre": genre_model.embed(f"{movie['genre']} {movie['description']}", as_buffer=True),
    }
    multi_vector_movie_data.append(movie_with_vectors)

print(f"Generated embeddings for {len(multi_vector_movie_data)} movies")

In [None]:
from redisvl.schema import IndexSchema
from redisvl.index import SearchIndex

# Create a multi-vector index schema
multi_vector_schema = IndexSchema.from_dict({
  "index": {
    "name": "movies_multivector",
    "prefix": "movie_mv",
    "storage": "hash"
  },
  "fields": [
    { "name": "title", "type": "text" },
    { "name": "description", "type": "text" },
    { "name": "genre", "type": "tag", "attrs": {"sortable": True}},
    { "name": "rating", "type": "numeric", "attrs": {"sortable": True}},
    {
        "name": "description_vector_general",
        "type": "vector",
        "attrs": {
            "dims": 384,
            "distance_metric": "cosine",
            "algorithm": "hnsw",
            "datatype": "float64"
        }
    },
    {
        "name": "description_vector_movie",
        "type": "vector",
        "attrs": {
            "dims": 768,
            "distance_metric": "cosine",
            "algorithm": "hnsw",
            "datatype": "float32"
        }
    },
    {
        "name": "description_vector_genre",
        "type": "vector",
        "attrs": {
            "dims": 384,
            "distance_metric": "cosine",
            "algorithm": "hnsw",
            "datatype": "float32"
        }
    },
  ]
})

# Create the multi-vector index
multi_vector_index = SearchIndex(multi_vector_schema, client, validate_on_load=True)
multi_vector_index.create(overwrite=True, drop=True)

# Load the multi-vector data
multi_vector_index.load(multi_vector_movie_data)
print("Multi-vector index created and populated successfully!")
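A multi-vector query expects every vector field to use the cosine distance metric. The sketch below uses a hypothetical helper, `check_multivector_fields`, that scans a schema dictionary in the same shape as the one above and reports vector fields configured otherwise. A missing metric is treated as a problem so the choice stays explicit. The example schema is made up.

```python
def check_multivector_fields(schema_dict):
    """Return names of vector fields not configured for cosine distance (hypothetical helper)."""
    bad = []
    for field in schema_dict["fields"]:
        if field["type"] != "vector":
            continue
        # Treat a missing distance_metric as a problem so the choice is explicit
        metric = field.get("attrs", {}).get("distance_metric", "").lower()
        if metric != "cosine":
            bad.append(field["name"])
    return bad

example_schema = {
    "index": {"name": "example", "prefix": "ex", "storage": "hash"},
    "fields": [
        {"name": "title", "type": "text"},
        {"name": "vec_a", "type": "vector",
         "attrs": {"dims": 384, "distance_metric": "cosine", "algorithm": "hnsw", "datatype": "float32"}},
        {"name": "vec_b", "type": "vector",
         "attrs": {"dims": 768, "distance_metric": "l2", "algorithm": "flat", "datatype": "float32"}},
    ],
}
print(check_multivector_fields(example_schema))  # ['vec_b']
```

Running it on the dictionary passed to `IndexSchema.from_dict` above should return an empty list.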

Notice that each vector field in the index has its own definition with its own attributes.
When constructing an index intended for use with `MultiVectorQuery`, the vector fields may differ in `datatype`, `dims`, and `algorithm`, but all of them must use the `cosine` `distance_metric` so that their scores can be weighted against each other consistently.
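Why cosine? Cosine distance (1 minus cosine similarity) always falls in the fixed range [0, 2], whatever the vector magnitudes or model. Euclidean distance, by contrast, is unbounded and grows with vector length. The sketch below maps cosine distance onto a [0, 1] similarity with `(2 - d) / 2`. This is an illustrative normalization; the exact transform RedisVL applies server-side may differ.

```python
import math

def cosine_distance(a, b):
    """Cosine distance as 1 - cosine similarity; always within [0, 2]."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def to_unit_score(distance):
    """Map a cosine distance in [0, 2] to a similarity score in [0, 1]."""
    return (2 - distance) / 2

def l2_distance(a, b):
    """Euclidean distance; unbounded and sensitive to vector magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

q = [1.0, 0.0]
same_dir_small = [0.5, 0.0]
same_dir_large = [50.0, 0.0]
opposite = [-1.0, 0.0]

print(to_unit_score(cosine_distance(q, same_dir_small)))  # 1.0
print(to_unit_score(cosine_distance(q, same_dir_large)))  # 1.0
print(to_unit_score(cosine_distance(q, opposite)))        # 0.0
print(l2_distance(q, same_dir_small), l2_distance(q, same_dir_large))  # 0.5 49.0
```

Both same-direction vectors get a perfect cosine score, while their Euclidean distances differ by about 100x. A weight of 0.3 on one field and 0.5 on another therefore means the same thing across fields only when every field uses the bounded cosine scale.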

### Using the Official MultiVectorQuery Class

Now let's demonstrate how to run a `MultiVectorQuery` in RedisVL. This class provides a clean way to perform multi-vector search with weighted combinations. It uses the `Vector` class to hold each individual query vector, which departs from the syntax of other RedisVL queries.

In [None]:
from redisvl.query import MultiVectorQuery, Vector

query_text = "action movie with superheroes and explosions"

# Create Vector objects for each embedding model
query_vectors = [
    Vector(
        vector=general_model.embed(query_text, as_buffer=True),
        field_name="description_vector_general",
        dtype="float64",
        weight=0.3  # 30% weight for general embeddings
    ),
    Vector(
        vector=movie_model.embed(query_text, as_buffer=True),
        field_name="description_vector_movie",
        dtype="float32",
        weight=0.5  # 50% weight for movie-specific embeddings
    ),
    Vector(
        vector=genre_model.embed(query_text, as_buffer=True),
        field_name="description_vector_genre",
        dtype="float32",
        weight=0.2  # 20% weight for genre-focused embeddings
    )
]

# Combine the weighted vectors into one query and return the top 5 matches
query = MultiVectorQuery(
    vectors=query_vectors,
    num_results=5,
    return_fields=["title", "description", "genre", "rating"],
)

results = multi_vector_index.query(query)

for i, result in enumerate(results, 1):
    print(f"{i}. {result['title']} ")
    print(f"   Genre: {result['genre']}, Rating: {result['rating']}")
    print(f"   Description: {result['description'][:100]}...")
    print()
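The weights above are chosen so they add up to 1.0, which is what makes the "30% / 50% / 20%" comments accurate. This notebook does not state whether `MultiVectorQuery` requires weights to sum to 1. If you prefer to think in relative terms (for example "movie embeddings matter 5 parts in 10"), the sketch below rescales them. `normalize_weights` is a hypothetical helper, not part of RedisVL.

```python
def normalize_weights(weights):
    """Scale non-negative weights so they sum to 1 (hypothetical helper)."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return {field: w / total for field, w in weights.items()}

raw = {
    "description_vector_general": 3,
    "description_vector_movie": 5,
    "description_vector_genre": 2,
}
weights = normalize_weights(raw)
print(weights)
# {'description_vector_general': 0.3, 'description_vector_movie': 0.5, 'description_vector_genre': 0.2}
```

Each value can then be passed as the `weight=` argument of the matching `Vector`.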

## Wrap up
That's a wrap! We hope this notebook showed you how to implement multi-vector search with multiple embedding models.

In [None]:
# Clean up: delete the multi-vector index created above
multi_vector_index.delete()