# Creating Text Query Embeddings

If you are creating an embedding to be used as a search query, using one of the following query templates will improve ranking results.

**When comparing to text (e.g. in a text-to-text retrieval use case):**
> Instruction: Given a query, retrieve passages that are relevant to the query:\nQuery: *{yourText}*

**When comparing to documents (e.g. in a text-to-document retrieval use case):**
> Instruction: Retrieve a document that answers the following query:\nQuery: *{yourText}*

**When comparing to images, videos, or documents (e.g. in a text-to-multimodal or multi-file retrieval use case):**
> Instruction: Find an image, video or document that match with the following description:\nQuery: *{yourText}*
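
The templates above can be applied with a small formatting helper. The following is a minimal sketch, not part of the notebook's utilities: `QUERY_TEMPLATES` and `build_query_text` are hypothetical names introduced here for illustration, and the template strings are copied verbatim from the list above. Note that `\n` in each template is a real newline character separating the instruction from the query.

```python
# Hypothetical helper for illustration only; these names are assumptions
# and do not come from the notebook or from any SDK.
QUERY_TEMPLATES = {
    # Text-to-text retrieval
    "text": (
        "Instruction: Given a query, retrieve passages that are relevant "
        "to the query:\nQuery: {query}"
    ),
    # Text-to-document retrieval
    "document": (
        "Instruction: Retrieve a document that answers the following "
        "query:\nQuery: {query}"
    ),
    # Text-to-multimodal and multi-file retrieval
    "multimodal": (
        "Instruction: Find an image, video or document that match with "
        "the following description:\nQuery: {query}"
    ),
}


def build_query_text(query, target="text"):
    """Wrap raw query text in the instruction template for the given target."""
    if target not in QUERY_TEMPLATES:
        raise ValueError(
            f"Unknown target {target!r}; expected one of {sorted(QUERY_TEMPLATES)}"
        )
    return QUERY_TEMPLATES[target].format(query=query)


print(build_query_text("Stories about rocket launches", target="text"))
```

The formatted string would then be passed as the text value of the query embedding request in place of the raw query text.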

The example below lets you compare a non-optimized query embedding with one created using the optimized approach above.

Run the cells below to restore variables from the setup notebook and set up a helper function we'll use later.

In [None]:
# Restore variables from setup notebook
%store -r s3_bucket
print(f"Using S3 bucket: {s3_bucket}")
%store -r region_name
print(f"Using region: {region_name}")

In [None]:
from utils.utils import cosine_sim
import nova_embeddings


def sort_by_similarity(indexed_items, query_embedding):
    """Sort items by cosine similarity to a query embedding.

    Args:
        indexed_items (list): List of dictionaries, each containing an "embedding" key.
        query_embedding: The embedding vector to compare against.

    Returns:
        list: Sorted list of dictionaries with "similarity" and "item" keys,
              ordered by similarity score in descending order.
    """
    sorted_items = []
    for item in indexed_items:
        scored_item = {
            "similarity": cosine_sim(item["embedding"], query_embedding),
            "item": item,
        }
        sorted_items.append(scored_item)

    sorted_items.sort(key=lambda x: x["similarity"], reverse=True)
    return sorted_items
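
The `cosine_sim` helper is imported from `utils.utils` and its implementation is not shown here. As a rough sketch, assuming it computes standard cosine similarity (the dot product of the two vectors divided by the product of their lengths), an equivalent pure-Python version might look like the following. The name `cosine_sim_sketch` is hypothetical, and the zero-vector handling is an assumption that may differ from the real helper.

```python
import math


def cosine_sim_sketch(a, b):
    """Cosine similarity between two equal-length numeric vectors.

    Illustrative stand-in only; the notebook's real helper lives in
    utils.utils and may be implemented differently.
    """
    if len(a) != len(b):
        raise ValueError("Vectors must have the same dimension")

    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))

    # Assumption: treat similarity with a zero vector as 0.0 rather than
    # dividing by zero.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Parallel vectors score close to 1, orthogonal vectors score 0.
print(cosine_sim_sketch([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(cosine_sim_sketch([1.0, 0.0], [0.0, 1.0]))
```

Because the score depends only on the angle between the vectors, both embeddings must have the same dimension, which is why the index and query embeddings should be generated with matching settings.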

Set the size of the embeddings to use throughout this example.

In [None]:
embedding_dimension = 3072

Create a small dataset of text passages and generate an embedding for each passage.

In [None]:
example_passages = [
    "The Science of Laughter: Why Giggles Might Be Humanity's Superpower",
    "Satellites, Selfies, and Space Junk: How Orbit Is Getting Crowded",
    "DIY DNA: How a Cup of Coffee Can Unlock Genetic Mysteries",
    "Rocket Science Isn't Hard? Meet the Everyday Physics Behind Liftoff",
    "Ant Cities vs. Human Cities: Who Builds Better?",
    "Why Mars Wants You: The Surprising Skills Future Space Colonists Will Need",
    "Volcanoes Under Ice: The Hidden Heat Shaping Our Planet",
    "The Secret Life of Bananas: How Your Fruit Bowl Explains Evolution",
    "The Cosmic Ocean: What Jellyfish Teach Us About Traveling the Stars",
    "Time Travel for Beginners: Why Your Microwave Is Already Bending Physics",
]

passages_with_embeddings = []

for index, passage in enumerate(example_passages):
    print(
        f"\rGenerating embedding {index + 1} of {len(example_passages)}",
        end="",
        flush=True,
    )

    indexing_embedding_params = {
        "taskType": "SINGLE_EMBEDDING",
        "singleEmbeddingParams": {
            "embeddingPurpose": "GENERIC_INDEX",
            "embeddingDimension": embedding_dimension,
            "text": {"truncationMode": "END", "value": passage},
        },
    }
    result_body, _ = nova_embeddings.generate_embedding_sync(indexing_embedding_params)
    embedding = nova_embeddings.extract_embedding(result_body)
    passages_with_embeddings.append(
        {"index": index, "text": passage, "embedding": embedding}
    )

We will sort the above list by similarity to the query text you provide. Edit the query text below if you like, then run the cell to set the `query_text` variable.

In [None]:
query_text = "Stories about rocket launches"

First, we'll try doing a similarity sort using the query text as-is, passing it straight through to create the query embedding.

In [None]:
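result_body, _ = nova_embeddings.generate_embedding_sync(
    {
        "taskType": "SINGLE_EMBEDDING",
        "singleEmbeddingParams": {
            "embeddingPurpose": "TEXT_RETRIEVAL",
            # Match the dimension used for the indexed passages so the
            # vectors can be compared with cosine similarity.
            "embeddingDimension": embedding_dimension,
            "text": {"truncationMode": "END", "value": query_text},
        },
    }
)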

query_embedding = nova_embeddings.extract_embedding(result_body)

# Create a list of passages sorted by cosine similarity.
sorted_passages = sort_by_similarity(passages_with_embeddings, query_embedding)

# Print the sorted list.
for item in sorted_passages:
    print(f"Similarity: {item['similarity']:.6f}, Text: {item['item']['text']}")