# Vector Similarity
**Vectors** (also called "embeddings") represent an AI model's impression (or understanding) of a piece of unstructured data such as text, images, audio, or video. Vector Similarity Search (VSS) is the process of finding the vectors in a database that are most similar to a given query vector. Common uses of VSS include recommendation systems, image and video search, document retrieval, and question answering.
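To make "similar" concrete: the index created below uses the `COSINE` distance metric. Redis reports this as 1 minus the cosine similarity, so 0 means two vectors point in the same direction and smaller values mean more similar. Below is a minimal pure-Python sketch of that calculation. The toy three-dimensional vectors are made up for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity (0 when a and b point the same way)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical toy "embeddings"
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.0]
invoice = [0.0, 0.1, 0.9]

print(round(cosine_distance(cat, kitten), 4))   # small: similar meaning
print(round(cosine_distance(cat, invoice), 4))  # close to 1: unrelated
```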

## Index Creation
Before doing vector search, first define the schema and create an index.

```python
import redis
from redis.commands.search.field import TagField, VectorField
from redis.commands.search.query import Query

try:
    # Module name in redis-py 6.x
    from redis.commands.search.index_definition import IndexDefinition, IndexType
except ImportError:
    # Module name in earlier redis-py releases
    from redis.commands.search.indexDefinition import IndexDefinition, IndexType

r = redis.Redis(host="localhost", port=6379)

INDEX_NAME = "index"                       # Vector Index Name
VECTOR_DIMENSIONS = 1536                   # Number of Vector Dimensions
DOC_PREFIX = "doc:"                        # RediSearch Key Prefix for the Index

# Schema
schema = (
    TagField("tag"),
    VectorField("vector",                  # Vector Field Name
        "FLAT", {                          # Vector Index Type: FLAT or HNSW
            "TYPE": "FLOAT32",             # FLOAT32 or FLOAT64
            "DIM": VECTOR_DIMENSIONS,      # Number of Vector Dimensions
            "DISTANCE_METRIC": "COSINE",   # Vector Search Distance Metric
            "INITIAL_CAP": 1000            # Helps with memory planning and allocation
        }
    ),
)

# Index Definition
definition = IndexDefinition(prefix=[DOC_PREFIX], index_type=IndexType.HASH)

# Create Index
r.ft(INDEX_NAME).create_index(fields=schema, definition=definition)
```

```
b'OK'
```
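For reference, the schema above corresponds roughly to a raw `FT.CREATE` command. The hypothetical `ft_create_args` helper below assembles that argument list so each piece is visible. It is not part of redis-py, and the arguments redis-py actually sends may differ in small details (for example, it may add an explicit tag `SEPARATOR`). The number after `FLAT` counts the attribute *arguments* that follow it (8 here), not the attribute pairs.

```python
def ft_create_args(index_name, prefix, dim, algorithm="FLAT", metric="COSINE", initial_cap=1000):
    """Hypothetical helper: raw FT.CREATE arguments for a TAG field plus a FLOAT32 vector field."""
    vector_attrs = [
        "TYPE", "FLOAT32",
        "DIM", str(dim),
        "DISTANCE_METRIC", metric,
        "INITIAL_CAP", str(initial_cap),
    ]
    return [
        "FT.CREATE", index_name,
        "ON", "HASH",
        "PREFIX", "1", prefix,
        "SCHEMA",
        "tag", "TAG",
        # The count after the algorithm name is the number of attribute arguments that follow
        "vector", "VECTOR", algorithm, str(len(vector_attrs)), *vector_attrs,
    ]


print(" ".join(ft_create_args("index", "doc:", 1536)))
```

Against a live server, such a list could be sent with `r.execute_command(*args)`, although `create_index` is the more convenient route.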

## Adding Vectors to Redis

Next, we add vectors (dummy data) to Redis using `hset`. The search index listens to keyspace notifications and will include any written HASH objects prefixed by `DOC_PREFIX`.
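Each `vector` field holds raw bytes: `DIM` consecutive 4-byte floats for `TYPE` `FLOAT32`. A 1536-dimensional vector therefore takes 6144 bytes. `np.random.rand` returns float64 by default, which would double that size and no longer match the schema, and that is why the cells below call `.astype(np.float32)` before `.tobytes()`. The sketch below shows the same layout with Python's `struct` module. It assumes a little-endian host, which is what numpy's `.tobytes()` produces on common x86 and ARM machines. The helper names are hypothetical.

```python
import struct


def to_float32_bytes(vector):
    """Pack a list of floats as little-endian FLOAT32 values (4 bytes each)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def from_float32_bytes(blob):
    """Unpack a FLOAT32 blob back into a tuple of Python floats."""
    return struct.unpack(f"<{len(blob) // 4}f", blob)


vec = [0.25, -1.5, 3.0]
blob = to_float32_bytes(vec)
print(len(blob))  # 3 dimensions * 4 bytes each = 12
print(from_float32_bytes(blob))
```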

```python
%pip install numpy
```

```python
import numpy as np
```

```python
r.hset(f"{DOC_PREFIX}a", mapping={
    "vector": np.random.rand(VECTOR_DIMENSIONS).astype(np.float32).tobytes(),
    "tag": "foo"
})
r.hset(f"{DOC_PREFIX}b", mapping={
    "vector": np.random.rand(VECTOR_DIMENSIONS).astype(np.float32).tobytes(),
    "tag": "foo"
})
r.hset(f"{DOC_PREFIX}c", mapping={
    "vector": np.random.rand(VECTOR_DIMENSIONS).astype(np.float32).tobytes(),
    "tag": "bar"
})
```

```
2
```

## Searching
You can use VSS queries with the `.ft(...).search(...)` query command. To use a VSS query, you must specify the option `.dialect(2)`.

There are two supported types of vector queries in Redis: `KNN` and `Range`. `Hybrid` queries can work in both settings and combine elements of traditional search and VSS.
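All three query types differ only in the query string. Each string has an optional pre-filter (`*` means all documents), the `=>` operator, and a vector clause whose `$`-prefixed names are bound through `query_params`. The helpers below are hypothetical and not part of redis-py. They simply rebuild the exact strings used in the cells that follow.

```python
def knn_query(k, prefilter="*", field="vector", param="vec", alias="score"):
    """KNN clause; a prefilter other than '*' turns it into a hybrid query."""
    return f"{prefilter}=>[KNN {k} @{field} ${param} as {alias}]"


def range_query(field="vector", radius_param="radius", param="vec", alias="score"):
    """VECTOR_RANGE clause that yields each match's distance under `alias`."""
    return f"@{field}:[VECTOR_RANGE ${radius_param} ${param}]=>{{$YIELD_DISTANCE_AS: {alias}}}"


print(knn_query(2))
print(knn_query(2, prefilter="(@tag:{ foo })"))
print(range_query())
```

Each string is then wrapped in `Query(...)` and sent with `.dialect(2)`.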

### KNN Queries
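KNN queries find the top K vectors most similar to a query vector. With the `COSINE` metric, the returned `score` is a distance. Sorting by it in ascending order therefore puts the closest match first.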
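A `FLAT` index answers a KNN query by exhaustive (brute-force) search. It computes the distance from the query vector to every indexed vector and keeps the K smallest. `HNSW` instead trades some accuracy for speed with an approximate graph search. Below is a pure-Python sketch of the exhaustive version, assuming cosine distance as in the schema above. The `knn` function and the toy 2-D documents are hypothetical; Redis performs this work server-side.

```python
import heapq
import math


def cosine_distance(a, b):
    """1 - cosine similarity, matching the COSINE metric's notion of distance."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn(query, docs, k):
    """Exhaustive top-k: score every vector, keep the k smallest distances."""
    scored = ((cosine_distance(query, vec), doc_id) for doc_id, vec in docs.items())
    return [(doc_id, dist) for dist, doc_id in heapq.nsmallest(k, scored)]


docs = {
    "doc:a": [1.0, 0.0],
    "doc:b": [0.7, 0.7],
    "doc:c": [0.0, 1.0],
}
print(knn([1.0, 0.1], docs, k=2))
```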

```python
query = (
    Query("*=>[KNN 2 @vector $vec as score]")
     .sort_by("score")
     .return_fields("id", "score")
     .paging(0, 2)
     .dialect(2)
)

query_params = {
    "vec": np.random.rand(VECTOR_DIMENSIONS).astype(np.float32).tobytes()
}
r.ft(INDEX_NAME).search(query, query_params).docs
```

```
[Document {'id': 'doc:c', 'payload': None, 'score': '0.240548729897'},
 Document {'id': 'doc:b', 'payload': None, 'score': '0.249360978603'}]
```

### Range Queries
Range queries return every vector whose distance from the query vector falls within a pre-defined threshold (the radius), rather than a fixed number of results.
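Unlike KNN, a range query has no fixed result count. Every vector inside the radius matches, and `.paging(0, 3)` in the cell below only caps how many results are returned. The sketch below shows the filtering logic with cosine distance. `range_search` and the toy data are hypothetical, and Redis's exact handling of the boundary value is not modeled here.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, matching the COSINE metric's notion of distance."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def range_search(query, docs, radius):
    """Return every (doc_id, distance) whose distance does not exceed radius, nearest first."""
    hits = []
    for doc_id, vec in docs.items():
        dist = cosine_distance(query, vec)
        if dist <= radius:
            hits.append((doc_id, dist))
    return sorted(hits, key=lambda hit: hit[1])


docs = {
    "doc:a": [1.0, 0.0],
    "doc:b": [0.7, 0.7],
    "doc:c": [0.0, 1.0],
}
print(range_search([1.0, 0.1], docs, radius=0.5))
```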

```python
query = (
    Query("@vector:[VECTOR_RANGE $radius $vec]=>{$YIELD_DISTANCE_AS: score}")
     .sort_by("score")
     .return_fields("id", "score")
     .paging(0, 3)
     .dialect(2)
)

# Find all vectors within 0.8 of the query vector
query_params = {
    "radius": 0.8,
    "vec": np.random.rand(VECTOR_DIMENSIONS).astype(np.float32).tobytes()
}
r.ft(INDEX_NAME).search(query, query_params).docs
```

```
[Document {'id': 'doc:a', 'payload': None, 'score': '0.247005105019'},
 Document {'id': 'doc:c', 'payload': None, 'score': '0.247780561447'},
 Document {'id': 'doc:b', 'payload': None, 'score': '0.255188882351'}]
```

See additional Range Query examples in [this Jupyter notebook](https://github.com/RediSearch/RediSearch/blob/master/docs/docs/vecsim-range_queries_examples.ipynb).

### Hybrid Queries
Hybrid queries combine traditional filters (numeric, tag, text) with VSS in a single Redis command.
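One way to picture a hybrid query is as pre-filtering. The tag filter narrows the candidates, and KNN then ranks only the documents that pass it. As a result, the K results can differ from an unfiltered top K. The sketch below illustrates that model with hypothetical data and a hypothetical `hybrid_knn` function. Redis picks its own execution strategy internally, so this shows the semantics rather than the implementation.

```python
import heapq
import math


def cosine_distance(a, b):
    """1 - cosine similarity, matching the COSINE metric's notion of distance."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def hybrid_knn(query, docs, tag, k):
    """Keep only documents with a matching tag, then return their k nearest vectors."""
    scored = (
        (cosine_distance(query, doc["vector"]), doc_id)
        for doc_id, doc in docs.items()
        if doc["tag"] == tag
    )
    return [(doc_id, dist) for dist, doc_id in heapq.nsmallest(k, scored)]


docs = {
    "doc:a": {"tag": "foo", "vector": [1.0, 0.0]},
    "doc:b": {"tag": "foo", "vector": [0.0, 1.0]},
    "doc:c": {"tag": "bar", "vector": [0.9, 0.1]},  # near the query, but filtered out for "foo"
}
print(hybrid_knn([1.0, 0.0], docs, tag="foo", k=2))
```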

```python
query = (
    Query("(@tag:{ foo })=>[KNN 2 @vector $vec as score]")
     .sort_by("score")
     .return_fields("id", "tag", "score")
     .paging(0, 2)
     .dialect(2)
)

query_params = {
    "vec": np.random.rand(VECTOR_DIMENSIONS).astype(np.float32).tobytes()
}
r.ft(INDEX_NAME).search(query, query_params).docs
```

```
[Document {'id': 'doc:a', 'payload': None, 'score': '0.244236528873', 'tag': 'foo'},
 Document {'id': 'doc:b', 'payload': None, 'score': '0.249226748943', 'tag': 'foo'}]
```

See additional Hybrid Query examples in [this Jupyter notebook](https://github.com/RediSearch/RediSearch/blob/master/docs/docs/vecsim-hybrid_queries_examples.ipynb).

## Vector Creation and Storage Examples
The examples above use random dummy vectors. In practice, embeddings usually come from production-grade AI models. Below, we embed some sample text with the OpenAI and Cohere APIs and write the resulting vectors to Redis.
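Whichever provider you use, each embedding's length has to match the `DIM` of the index it is written to. For the `text-embedding-ada-002` model used below, that is 1536. A vector of any other length does not fit the schema. Below is a hypothetical guard you could run before `hset`. It checks the length and returns little-endian FLOAT32 bytes; `INDEX_DIM` and `embedding_to_bytes` are illustrative names.

```python
import struct

INDEX_DIM = 1536  # assumed to equal the DIM declared in the index schema


def embedding_to_bytes(embedding, dim=INDEX_DIM):
    """Validate an embedding's length, then pack it as little-endian FLOAT32 bytes."""
    if len(embedding) != dim:
        raise ValueError(f"embedding has {len(embedding)} dimensions, index expects {dim}")
    return struct.pack(f"<{dim}f", *embedding)


blob = embedding_to_bytes([0.1] * INDEX_DIM)
print(len(blob))  # 1536 * 4 bytes
```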

```python
texts = [
    "Today is a really great day!",
    "The dog next door barks really loudly.",
    "My cat escaped and got out before I could close the door."
]
```

### OpenAI Embeddings

```python
%pip install openai
```

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR OPENAI API KEY")
```

```python
# Create embeddings with OpenAI text-embedding-ada-002 (openai>=1.0 client API)
# https://openai.com/blog/new-and-improved-embedding-model
response = client.embeddings.create(input=texts, model="text-embedding-ada-002")
embeddings = np.array([item.embedding for item in response.data], dtype=np.float32)

# Write to Redis under the index's key prefix
for i, embedding in enumerate(embeddings):
    r.hset(f"{DOC_PREFIX}openai:{i}", mapping={
        "vector": embedding.tobytes(),
        "tag": "openai"
    })
```

```python
embeddings
```

```
array([[ 0.00509819,  0.0010873 , -0.00228475, ..., -0.00457579,
         0.01329307, -0.03167175],
       [-0.00352492, -0.00551083, -0.01318067, ..., -0.02915609,
         0.01472911, -0.01369681],
       [-0.01286718,  0.00351361, -0.01723753, ..., -0.01536361,
         0.01949651, -0.05041625]], dtype=float32)
```

### Cohere Embeddings

```python
%pip install cohere
```

```python
import cohere

co = cohere.Client("YOUR COHERE API KEY")
```

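The Cohere embeddings do not have 1536 dimensions, so they do not fit the index created earlier. The cell below stores them under their own key prefix, with an index whose `DIM` is taken from the embeddings themselves. The `cohere_index` name and `cohere:` prefix are illustrative.

```python
# Create embeddings with Cohere
response = co.embed(texts=texts, model="small")
embeddings = np.array(response.embeddings, dtype=np.float32)

# Separate index whose DIM matches the Cohere vectors (illustrative name and prefix)
COHERE_PREFIX = "cohere:"
cohere_schema = (
    TagField("tag"),
    VectorField("vector", "FLAT", {
        "TYPE": "FLOAT32", "DIM": embeddings.shape[1], "DISTANCE_METRIC": "COSINE"
    }),
)
cohere_definition = IndexDefinition(prefix=[COHERE_PREFIX], index_type=IndexType.HASH)
r.ft("cohere_index").create_index(fields=cohere_schema, definition=cohere_definition)

# Write to Redis
for i, embedding in enumerate(embeddings):
    r.hset(f"{COHERE_PREFIX}{i}", mapping={
        "vector": embedding.tobytes(),
        "tag": "cohere"
    })
```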

```python
embeddings
```

```
array([[-0.3010254 , -0.7158203 , -0.28515625, ...,  0.8125    ,
         1.0292969 , -0.8095703 ],
       [-0.02745056, -1.4892578 ,  0.23937988, ..., -0.8930664 ,
         0.15991211, -3.2050781 ],
       [ 0.09777832,  0.7270508 , -0.296875  , ..., -1.9638672 ,
         1.6650391 , -0.23693848]], dtype=float32)
```

Find more example apps, tutorials, and projects [in this GitHub organization](https://github.com/RedisVentures).