### Working with Vector Embeddings

Semantic search is quickly becoming the norm for finding contextually rich content across vast amounts of data. A vector embedding represents the data points of an object mathematically as an array of floats. Embeddings are compared with each other to measure the distance between them, and a smaller distance indicates greater similarity.
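To make "similarity distance" concrete, here is a minimal sketch of cosine distance, the metric the index in this tutorial is configured with. The function name `cosine_distance` and the toy three-dimensional vectors are illustrative assumptions; real embeddings from the model used below have 384 dimensions.

``` python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity: 0 means same direction, larger means less similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (hypothetical values)
dragon_story = [0.9, 0.1, 0.0]
dragon_query = [0.8, 0.2, 0.1]
cookbook = [0.0, 0.1, 0.9]

print(cosine_distance(dragon_query, dragon_story))  # small distance: similar
print(cosine_distance(dragon_query, cookbook))      # large distance: dissimilar
```

The query vector points in nearly the same direction as the dragon story, so its distance to that story is much smaller than its distance to the cookbook.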

In this tutorial we'll walk through the steps needed to integrate vector embeddings into your application's search capabilities. This codebase supports the video [Redis as a Vector Database](https://youtube.com/tbd). The full repository is available [here](https://github.com/redis-developer/redis-as-a-vector-database.git). To learn more about vector databases in Redis, check out our full documentation or try the Redis University course RU204: Redis as a Vector Database.

You'll learn how to:

- Select a model for embedding existing data fields and query text
- Convert a text field into a vector embedding
- Store a vector embedding in a JSON object in Redis
- Create an index for vector search over JSON objects stored in Redis
- Prepare and execute a query for vector search

The database contains approximately 15,000 JSON objects, each representing a science fiction or fantasy book. The description text will be converted into a vector embedding and used for our query examples. Here is an example of a typical JSON book object:

``` json
{
  "title": "Fire In His Spirit",  
  "author": "Ruby Dixon",  
  "score": "3.94",  
  "votes": "2754",  
  "description": "Gwen’s never wanted to be a leader, but when no one else stepped up, she took on the role. As the mayor of post-apocalyptic Shreveport, she’s made decisions to protect her people... and most of them have backfired disastrously. When she discovers that the dangerous gold dragon lurking outside of the fort has decided she’s his mate, heartsick Gwen thinks that the best thing she can do is confront him and take him far away from the city. She does this to save her people - her sister, her friends, her fort. She doesn’t expect to understand the dragon. She certainly doesn’t expect to fall in love.",
  "year_published": "2018",  
  "url": "http://www.goodreads.com/book/show/40790825-fire-in-his-spirit",  
  "genres":["Romance","Fantasy (Dragons)","Fantasy","Romance (Paranormal Romance)","Fantasy (Paranormal)","Science Fiction","Paranormal (Shapeshifters)","Science Fiction (Aliens)","Science Fiction (Dystopia)","Apocalyptic (Post Apocalyptic)"],  
  "editions": ["English", "Japanese", "Arabic", "French"],  
  "pages": 241
}
```

### 1. Set a model for embedding
We will be using the [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) sentence transformer for embedding our book descriptions. Let's fetch and set the model for our embeddings.

``` python
# Import sentence transformers and set "all-MiniLM-L6-v2" as the pretrained model to use
from sentence_transformers import SentenceTransformer

model_name = "all-MiniLM-L6-v2"
model = SentenceTransformer(model_name)
```




### 2. Convert a text field within a JSON object into a vector embedding
We'll use the above JSON book object to convert the text description to a vector embedding.

``` python
# Convert the description text to a vector
book = {
  "title": "Fire In His Spirit",
  "author": "Ruby Dixon",
  "score": "3.94",
  "votes": "2754",
  "description": "Gwen’s never wanted to be a leader, but when no one else stepped up, she took on the role. As the mayor of post-apocalyptic Shreveport, she’s made decisions to protect her people... and most of them have backfired disastrously. When she discovers that the dangerous gold dragon lurking outside of the fort has decided she’s his mate, heartsick Gwen thinks that the best thing she can do is confront him and take him far away from the city. She does this to save her people - her sister, her friends, her fort. She doesn’t expect to understand the dragon. She certainly doesn’t expect to fall in love.",
  "year_published": "2018",
  "url": "http://www.goodreads.com/book/show/40790825-fire-in-his-spirit",
  "genres":["Romance","Fantasy (Dragons)","Fantasy","Romance (Paranormal Romance)","Fantasy (Paranormal)","Science Fiction","Paranormal (Shapeshifters)","Science Fiction (Aliens)","Science Fiction (Dystopia)","Apocalyptic (Post Apocalyptic)"],
  "editions": ["English", "Japanese", "Arabic", "French"],
  "pages": 241
}

book_description = book["description"]
embedding = model.encode(book_description).tolist()
book["embedding"] = embedding
```


### 3. Create a schema for indexing JSON objects
We'll be using the [RedisVL](https://www.redisvl.com/index.html) client library to add the above book object to our Redis vector database. Before we insert the object, let's create a search index. This requires defining a schema that lists the JSON fields we want to index along with each field's data type. The schema dictates how we can search our data.

``` python
# Import RedisVL client library
from redisvl.index import SearchIndex
from redisvl.query import VectorQuery
from redisvl.query.filter import Tag, Num
```


``` python
# Create schema for indexing JSON book objects
schema = {
    "index": {
        "name": "book_index",
        "prefix": "book",
        "storage_type": "json",
    },
    "fields": {
        "tag": [
            {"name": "id"},
            {"name": "$.editions[*]", "as_name": "editions"},
            {"name": "$.genres[*]", "as_name": "genres"},
        ],
        "text": [
            {"name": "$.author", "as_name": "author"},
            {"name": "$.description", "as_name": "description"},
            {"name": "$.title", "as_name": "title"},
        ],
        "numeric": [
            {"name": "pages"},
            {"name": "year_published"},
            {"name": "$.votes", "as_name": "votes"},
            {"name": "$.score", "as_name": "score"}
        ],
        "vector": [{
            "name": "$.embedding",
            "as_name": "embedding",
            "dims": 384,
            "distance_metric": "cosine",
            "algorithm": "flat",
            "datatype": "float32"
        }]
    },
}
```
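One thing to watch for: Redis generally expects a field indexed as `numeric` to hold a JSON number, yet the sample book stores `score`, `votes`, and `year_published` as strings. If your data looks like the sample, one option is to convert those fields before loading. The sketch below is an assumption about how you might do that; `coerce_numeric_fields` and `NUMERIC_FIELDS` are hypothetical names, not part of RedisVL.

``` python
# Hypothetical list of fields that the schema indexes as numeric
NUMERIC_FIELDS = ("score", "votes", "year_published", "pages")


def coerce_numeric_fields(record, fields=NUMERIC_FIELDS):
    """Return a copy of record with numeric-looking strings converted to int or float."""
    converted = dict(record)
    for field in fields:
        value = converted.get(field)
        if isinstance(value, str):
            try:
                converted[field] = int(value)
            except ValueError:
                # Not an integer; float() raises ValueError if the string is not numeric at all
                converted[field] = float(value)
    return converted


# Shortened stand-in for the sample book object
sample = {
    "title": "Fire In His Spirit",
    "score": "3.94",
    "votes": "2754",
    "year_published": "2018",
    "pages": 241,
}
print(coerce_numeric_fields(sample))
```

The helper returns a new dictionary and leaves the input untouched, so it can be applied to each book right before it is loaded.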
      

``` python
# Create a vector index with RedisVL
index = SearchIndex.from_dict(
    schema,
    redis_url="redis://localhost:6379"
)
index.create(overwrite=False)
```

``` text
Index already exists, not overwriting.
```


### Adding a JSON object with a Vector Embedding to Redis
Now that the `book_index` search index has been created, any new JSON objects added to the database will also be indexed for search. Note that the `load()` method accepts a list of objects as its argument. We'll print the new entry's key to verify success.

``` python
new_book = [book]
new_book_key = index.load(new_book)

print(new_book_key)
```

``` text
['book:aac60c97ad5443a4bccee7b9fd1386e3']
```


### Prepare a query embedding for semantic search
Now that we've seen how to add a JSON object with a vector embedding to Redis, let's switch to querying our database for results. We first need to convert our query text into an embedding, using the same approach we applied to the book description above. We'll print the length of the vector embedding to verify success.

``` python
# Prepare query text as an embedding
query_text = "I want a story with a woman as a main character in a post-apocalyptic world. There should be dragons."

embedding_query = model.encode(query_text).tolist()
print(len(embedding_query))
```

``` text
384
```
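Printing the length works for a quick check. As a slightly stricter sketch, you could compare the length against the `dims` value declared in the index schema. `check_dims` is a hypothetical helper, and the trimmed `schema_fragment` and zero-filled vector below are stand-ins so the example runs without the model.

``` python
def check_dims(vector, schema):
    """Raise ValueError if the vector length differs from the schema's declared dims."""
    expected = schema["fields"]["vector"][0]["dims"]
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(vector)}")
    return expected


# Stand-ins: only the part of the schema this check reads, and a placeholder vector
schema_fragment = {"fields": {"vector": [{"name": "$.embedding", "dims": 384}]}}
stand_in_vector = [0.0] * 384

print(check_dims(stand_in_vector, schema_fragment))
```

A mismatch here usually means the query was encoded with a different model than the one used for the stored embeddings.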


### Create the query object
We will create a `VectorQuery` object in RedisVL. It requires the query vector, the name of the vector field to search (`embedding`), the fields to return, and the number of results to return.

``` python
# Create a vector query object with RedisVL.
# Return the top three results with author, title, description, and genres.
query = VectorQuery(
    vector=embedding_query,
    vector_field_name="embedding",
    return_fields=["author", "title", "description", "genres"],
    dialect=3,
    num_results=3
)
```

### Execute the Vector Query
We call the index's `query()` method with the `VectorQuery` object as its argument and receive 3 results with our requested fields. Note the included `vector_distance` field: a lower distance means the result is more similar to the search query, and a higher distance means it is less similar.

``` python
# Execute the vector query
import pprint
pp = pprint.PrettyPrinter(indent=2)

results = index.query(query)
pp.pprint(results)
```

``` text
[ { 'author': '["Marie Brennan"]',
    'description': '["You, dear reader, continue at your own risk. It is not '
                   'for the faint of heart—no more so than the study of '
                   'dragons itself. But such study offers rewards beyond '
                   "compare: to stand in a dragon's presence, even for the "
                   "briefest of moments—even at the risk of one's life—is a "
                   'delight that, once experienced, can never be forgotten. . '
                   '. .All the world, from Scirland to the farthest reaches of '
                   "Eriga, know Isabella, Lady Trent, to be the world's "
                   'preeminent dragon naturalist. She is the remarkable woman '
                   'who brought the study of dragons out of the misty shadows '
                   'of myth and misunderstanding into the clear light of '
                   'modern science. But before she became the illustrious '
                   'figure we know t
```
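In the output above, each returned field arrives as a JSON-encoded string such as `'["Marie Brennan"]'`. A minimal sketch for turning those strings back into Python values follows. `decode_result` and its `scalar_fields` parameter are hypothetical, and the sample row is a shortened stand-in with placeholder values rather than real query output. The sketch assumes field values are JSON strings and falls back to the raw string when they are not.

``` python
import json


def decode_result(row, scalar_fields=("author", "title", "description")):
    """Decode JSON-encoded field values; unwrap one-element lists for scalar fields."""
    decoded = {}
    for key, value in row.items():
        if not isinstance(value, str):
            decoded[key] = value
            continue
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError:
            # Not JSON (for example a Redis key): keep the raw string
            decoded[key] = value
            continue
        # Scalar fields come back wrapped in a one-element array, e.g. ["Marie Brennan"]
        if key in scalar_fields and isinstance(parsed, list) and len(parsed) == 1:
            parsed = parsed[0]
        decoded[key] = parsed
    return decoded


# Placeholder row shaped like the output above (values are illustrative only)
sample_row = {
    "id": "book:123",
    "author": '["Marie Brennan"]',
    "genres": '["Fantasy","Science Fiction"]',
    "vector_distance": "0.5",
}
print(decode_result(sample_row))
```

Array fields such as `genres` stay as lists, while single-valued fields such as `author` become plain strings.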

### Filtered Searches
We've received some great results, all with a woman as a lead character and dragons! Let's enhance our search by adding a filter. We can request only results that have the `Science Fiction` tag in the genres array and no `Romance` tag.

``` python
# Keep books tagged "Science Fiction" and exclude books tagged "Romance"
science_tag = Tag("genres") == "Science Fiction"
no_romance = Tag("genres") != "Romance"
tag_filter = science_tag & no_romance

# Optional numeric filter on page count; use `tag_filter & page_count` to apply it
page_count = Num("pages") <= 400

query = VectorQuery(
    vector=embedding_query,
    vector_field_name="embedding",
    filter_expression=tag_filter,
    return_fields=["author", "title", "description", "genres", "pages"],
    dialect=3,
    num_results=3
)

results = index.query(query)
pp.pprint(results)
```


``` text
[ { 'author': '["Anne McCaffrey"]',
    'description': '["HOW CAN ONE GIRL SAVE AN ENTIRE WORLD?To the nobles who '
                   'live in Benden Weyr, Lessa is nothing but a ragged kitchen '
                   'girl. For most of her life she has survived by serving '
                   'those who betrayed her father and took over his lands. Now '
                   'the time has come for Lessa to shed her disguise—and take '
                   'back her stolen birthright. But everything changes when '
                   'she meets a queen dragon. The bond they share will be deep '
                   'and last forever. It will protect them when, for the first '
                   'time in centuries, Lessa’s world is threatened by Thread, '
                   'an evil substance that falls like rain and destroys '
                   'everything it touches. Dragons and their Riders once '
                   'protected the planet from Thread, but there are very few '
```