# A Complete Noob's Guide to Vector Search
## Part 2: Searching for Vectors

Vector search is a powerful technique for finding similar items in a massive collection by comparing their vector representations. It's an efficient way to retrieve relevant information based on semantic similarity rather than exact keyword matching.

At its core, vector search is like finding a needle in a haystack. But instead of hay, you have vectors, and instead of a needle, you have a query vector.

In the previous post in this series, I showed you how to transform text into a vector representation using an embedding model. That vector representation is a point in a multi-dimensional space. Using the search functionality in Qdrant, you're going to learn how to navigate this space to find the points most similar to your query.

### Similarity Search: Finding the Closest Neighbors

"Similarity" in this context refers to how close vectors are to each other in this multi-dimensional space. 

Vectors that are close together represent items that share similar characteristics or meaning. For example, vectors representing documents about deep learning research would be closer to each other than vectors representing documents about advanced filo pastry baking techniques. But, how do you quantify similarity? 

This is where **metrics** come in. 

Metrics are basically equations used to compute how similar (or how far apart) two vectors are. Qdrant lets you pick between metrics like the dot product, cosine similarity, Euclidean distance, and Manhattan distance. Choosing the appropriate metric depends on the data you're working with and the specific task. It's a cop-out answer, but seriously, every question like this in AI always "depends" on something.

But, I'd like to offer some heuristics I've picked up over the years to help you reason about how to pick an appropriate similarity metric:

##### 📐 **Cosine Similarity**

<img src="https://qdrant.tech/docs/cos.png">

 - Use cosine similarity when the magnitude of the vectors is not important, but the direction is.

 - It is commonly used in text similarity tasks, such as document clustering or information retrieval.
 
 - Cosine similarity measures the cosine of the angle between two vectors, ignoring their lengths.

 - If your data is sparse (i.e., many zero values), cosine similarity or dot product may be more appropriate than Euclidean or Manhattan distance, as they focus on the non-zero dimensions.

Qdrant uses a two-step process to compute cosine similarity for faster search speeds. It normalizes vectors when adding them to the collection and compares them using a fast dot product operation.
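The cosine bullets and the two-step trick are easier to see with numbers. The following minimal plain-Python sketch doesn't involve Qdrant, and the vectors are made-up examples. It computes cosine similarity directly. It then shows that normalizing both vectors first and taking a plain dot product produces the same number, which is the idea behind precomputing normalized vectors.

```python
import math

def dot(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean length (L2 norm) of a vector."""
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: their dot product divided by both lengths."""
    return dot(a, b) / (norm(a) * norm(b))

def normalize(a):
    """Scale a vector to unit length so that only its direction is left."""
    length = norm(a)
    return [x / length for x in a]

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, twice as long
c = [3.0, -1.0, 0.5]  # points somewhere else

# Magnitude is ignored: a and b differ in length but share a direction, so this is ~1.0
print(cosine_similarity(a, b))

# Two-step version: normalize once (e.g. when storing), then use a plain dot product
print(math.isclose(dot(normalize(a), normalize(c)), cosine_similarity(a, c)))  # True
```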

##### 🔴 **Dot Product**

 - Dot product is similar to cosine similarity but considers the vectors' magnitude.

 - It measures the alignment between two vectors and is influenced by their lengths.

 - Dot product is useful when the magnitude of the vectors carries meaningful information.

 - It is often used in recommendation systems, where the magnitude of user preferences or item ratings is significant.

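 - If your vectors are normalized (i.e., unit vectors), cosine similarity and dot product yield identical results, because cosine similarity is just the dot product divided by the two vectors' lengths, which are both 1.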
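This small plain-Python sketch uses made-up "user" and "item" vectors to show the difference. Scaling an item vector up changes its dot product score but leaves its cosine similarity unchanged. For unit-length vectors, the two measures agree.

```python
import math

def dot(a, b):
    """Sum of element-wise products: grows with both alignment and length."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Dot product divided by both lengths, so only direction matters."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Hypothetical preference vectors: same direction, different strength
user = [0.8, 0.6]          # unit length: 0.64 + 0.36 = 1
mild_item = [0.8, 0.6]     # unit length
strong_item = [1.6, 1.2]   # same direction, twice the magnitude

# Dot product: the stronger item scores about twice as high
print(dot(user, mild_item), dot(user, strong_item))
# Cosine similarity: both items score ~1.0, because only direction counts
print(cosine_similarity(user, mild_item), cosine_similarity(user, strong_item))

# For unit vectors, dot product and cosine similarity coincide
print(math.isclose(dot(user, mild_item), cosine_similarity(user, mild_item)))  # True
```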

##### ፨ **Euclidean Distance**

 - Euclidean distance measures the straight-line distance between two vectors in the high-dimensional space.

 - It is suitable when the absolute differences between vector elements are essential.

 - Euclidean distance is commonly used in tasks such as image similarity, where pixel intensities or feature values directly correspond to the visual appearance.

 - It is sensitive to the scale of the features, so feature normalization or standardization may be necessary. 
 
 - If your data has varying scales or ranges across different dimensions, consider normalizing or standardizing the features before applying similarity metrics like Euclidean or Manhattan distance.
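To see why scaling matters, here's a plain-Python sketch using hypothetical `[age, income]` records. The feature ranges used for min-max scaling are assumptions made for this example. On raw values, the income column dominates the distance. Once both features are scaled into [0, 1], the ranking of neighbors flips.

```python
import math

def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_max_scale(vec, ranges):
    """Rescale each feature into [0, 1] using an assumed (min, max) range per feature."""
    return [(x - lo) / (hi - lo) for x, (lo, hi) in zip(vec, ranges)]

# Hypothetical records: [age in years, annual income in dollars]
p = [25, 50_000]
q = [65, 50_500]   # very different age, almost the same income
r = [26, 52_000]   # almost the same age, slightly different income

# Raw values: income differences (hundreds to thousands) swamp age differences (1 to 40)
print(euclidean(p, q) < euclidean(p, r))  # True: q looks closer to p

RANGES = [(0, 100), (0, 200_000)]  # assumed feature ranges for scaling
ps, qs, rs = (min_max_scale(v, RANGES) for v in (p, q, r))
print(euclidean(ps, qs) < euclidean(ps, rs))  # False: after scaling, r is closer to p
```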

##### 🗽 **Manhattan Distance**

 - Manhattan distance, also known as L1 or city block distance, measures the sum of absolute differences between vector elements.

 - It is useful when the features represent distinct dimensions or attributes that are not necessarily related.

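 - Manhattan distance is less sensitive to outliers than Euclidean distance, because it sums absolute differences instead of squaring them, so a single large difference doesn't dominate the result.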
 
 - It is often used in tasks such as comparing binary or categorical features, where the presence or absence of certain attributes is more important than their exact values.
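This plain-Python sketch illustrates the outlier point with made-up vectors. Both candidates differ from `base` by the same total amount. One spreads that difference over every feature, and the other puts it all in a single outlier-like feature. Manhattan distance scores the two the same. Euclidean distance squares the differences, so it rates the single large deviation as twice as far.

```python
import math

def manhattan(a, b):
    """L1 / city block distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """L2 distance: square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

base = [0, 0, 0, 0]
spread = [1, 1, 1, 1]  # small differences spread across every feature
spike = [4, 0, 0, 0]   # the same total difference concentrated in one feature

print(manhattan(base, spread), manhattan(base, spike))  # 4 4
print(euclidean(base, spread), euclidean(base, spike))  # 2.0 4.0
```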

#### Now, I know what you're thinking. 

I mentioned that these metrics are equations that need to be computed. The straightforward way to find the documents most similar to a query vector is to calculate the distance between the query vector and every other vector in the dataset. That works fine for a small collection, like the 100 points we added to our collection in the previous post. However, most real-world production settings involve millions, tens of millions, hundreds of millions, or even billions of points. At that scale, comparing every query against every stored vector becomes too slow and too expensive.
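For reference, here's that exhaustive approach as a minimal plain-Python sketch with random toy data. This is not how Qdrant searches. Every query costs one similarity computation per stored vector, so the work grows linearly with the size of the collection.

```python
import heapq
import math
import random

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def brute_force_search(query, vectors, k):
    """Exact top-k search: score the query against every stored vector."""
    scored = ((cosine_similarity(query, v), i) for i, v in enumerate(vectors))
    return heapq.nlargest(k, scored)  # [(score, index), ...], best first

random.seed(0)
vectors = [[random.gauss(0, 1) for _ in range(8)] for _ in range(1_000)]  # toy collection
query = vectors[42]

top = brute_force_search(query, vectors, k=3)  # 1,000 similarity computations
print(top[0][1])  # 42: the query vector itself ranks first
```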

To tackle this problem, vector databases borrow an idea from relational databases. There, indexes speed up queries by avoiding a scan of the full table. Likewise, vector databases use specialized data structures and algorithms to speed up search.

### Navigating the Vector Space: HNSW and ANN

<img src="https://qdrant.tech/docs/gettingstarted/vector-search.png">

While similarity metrics provide the "ruler" for measuring distances between vectors, efficiently searching through a massive collection requires a different tool – **Hierarchical Navigable Small World (HNSW)** graphs. 

HNSW is a type of **Approximate Nearest Neighbor (ANN)** algorithm designed for efficient vector search. Qdrant uses HNSW to find the vectors most similar to a given query vector without explicitly comparing it to every other vector in the collection.

Picture a network of interconnected points, where each point represents a vector. HNSW builds this network as a hierarchy of layers. The top layer contains only a few points linked by long-range connections, each layer below is denser, and the bottom layer contains every point. The search starts at the top layer and progressively moves down the hierarchy, narrowing the search space until it reaches the nearest neighbors. This lets you navigate the vector space and quickly zoom in on the most likely similar points without visiting every single point. Since HNSW is an ANN algorithm, it trades a little precision for speed. Instead of guaranteeing the exact closest neighbors, it efficiently finds points that are close enough.
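Here's a heavily simplified, single-layer sketch in plain Python that gives a feel for "navigate instead of scan." It is a toy, not Qdrant's HNSW implementation:

- The graph is built by brute force, with each point linked to its k nearest neighbors.
- There is only one layer.
- The entry point is fixed.

The loop is the greedy best-first search that HNSW runs within each layer, where `ef` controls how many of the closest points seen so far are kept. The function names and the grid data are hypothetical.

```python
import heapq
import math

def build_knn_graph(points, k):
    """Toy graph construction: link each point to its k nearest neighbours (brute force)."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph[i] = [j for _, j in others[:k]]
    return graph

def search_layer(points, graph, query, entry, ef):
    """Greedy best-first search over one graph layer, keeping the ef closest points seen."""
    d0 = math.dist(points[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: closest unexplored point first
    results = [(-d0, entry)]     # max-heap via negation: furthest kept result on top
    evaluations = 1              # number of distance computations performed
    while candidates:
        dist, node = heapq.heappop(candidates)
        if dist > -results[0][0]:
            break  # nothing left to explore is closer than our worst kept result
        for neighbour in graph[node]:
            if neighbour in visited:
                continue
            visited.add(neighbour)
            d = math.dist(points[neighbour], query)
            evaluations += 1
            if len(results) < ef or d < -results[0][0]:
                heapq.heappush(candidates, (d, neighbour))
                heapq.heappush(results, (-d, neighbour))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the furthest kept result
    ranked = sorted((-neg_d, i) for neg_d, i in results)
    return ranked, evaluations

points = [(x, y) for x in range(10) for y in range(10)]  # toy 10x10 grid of "vectors"
graph = build_knn_graph(points, k=8)
ranked, evaluations = search_layer(points, graph, query=(7.2, 3.1), entry=0, ef=4)
print(points[ranked[0][1]], evaluations)  # (7, 3) and the number of distances computed
```

Starting from the corner point `(0, 0)`, the search hops from neighbor to neighbor toward the query. `evaluations` reports how many distances it computed along the way. In real HNSW, the sparser upper layers supply a good entry point, which is what keeps that number small on large collections. In Qdrant, the search-time counterpart of `ef` is the `hnsw_ef` search parameter.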

Vector databases like Qdrant use ANN algorithms like HNSW to keep search fast. Instead of calculating the distance to every object in the database, the algorithm selects a small subset of candidate objects to compare against, which reduces the computational overhead. You get sublinear search times and results that are usually very close to the exact answer. It's like having a map with shortcuts that lead you directly to the neighborhoods where you're most likely to find what you're looking for.

Which is exactly what you need when you're searching for points at massive scale.

#### Now that you've got a good idea of how vector search works, time to see it in action. Start by initializing the Qdrant client.

```python
import os
from dotenv import load_dotenv

from qdrant_client import QdrantClient

# Load QDRANT_URL and QDRANT_API_KEY from a local .env file
load_dotenv("./.env")

q_client = QdrantClient(
    url=os.getenv("QDRANT_URL"),
    api_key=os.getenv("QDRANT_API_KEY"),
)
```

Now, let's get down to business and actually search some vectors! Here's what you need:

1.  **User Input:** First things first, you need some text input. This could be anything from a search query to a random sentence. 

2.  **Vectorize the Input:** Next, you transform that input text into a vector embedding using the same embedding model that you used when upserting into the collection. This turns the text input into a point in the multi-dimensional vector space.

3.  **Qdrant to the Rescue:** Now, use Qdrant to search for the vectors closest to your input vector.

4.  **Matchmaker, Matchmaker:** You'll get a list of vectors that best match the input text. These represent the most similar items in your collection.
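Here's a toy plain-Python sketch of those four steps end to end, before bringing in OpenAI and Qdrant. The "embedding" is a hypothetical bag-of-words counter over the corpus vocabulary. A real embedding model captures meaning, while this one only counts shared words. Step 3 is a brute-force cosine scan rather than a Qdrant query. What the sketch illustrates is step 2: the query has to be embedded with the same model as the stored documents, or the vectors aren't comparable.

```python
import math
import re

# Toy corpus standing in for the collection (hypothetical documents)
DOCS = [
    "Diffusion models for text-to-sound generation",
    "Chain-of-thought prompting improves reasoning in language models",
    "Techniques for laminating filo pastry",
]

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

# Hypothetical "embedding model": one dimension per word in the corpus vocabulary
VOCAB = sorted({word for doc in DOCS for word in tokenize(doc)})

def toy_embed(text):
    """Count how often each vocabulary word appears; unknown words are ignored."""
    words = tokenize(text)
    return [float(words.count(term)) for term in VOCAB]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0  # guard against all-zero vectors

# "Upsert": embed every document once, with the same model used for queries
index = [(toy_embed(doc), doc) for doc in DOCS]

def toy_search(query, limit=2):
    query_vector = toy_embed(query)                      # step 2: vectorize the input
    scored = [(cosine_similarity(query_vector, vec), doc) for vec, doc in index]
    scored.sort(reverse=True)                            # step 3: closest vectors first
    return scored[:limit]                                # step 4: best matches

for score, doc in toy_search("chain of thought reasoning"):  # step 1: user input
    print(f"{score:.3f}  {doc}")
```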

First, define a helper function to get the embedding representation of the input query.

```python
from openai import OpenAI, OpenAIError

# The client picks up OPENAI_API_KEY from the environment
openai_client = OpenAI()

def get_text_embedding(
    text: str,
    openai_client: OpenAI = openai_client,
    model: str = "text-embedding-3-large") -> list:
    """
    Get the vector representation of the input text using the specified OpenAI embedding model.

    Args:
        text (str): The input text to be embedded.
        openai_client (OpenAI): An instance of the OpenAI client.
        model (str, optional): The name of the OpenAI embedding model to use. Defaults to "text-embedding-3-large".

    Returns:
        list: The vector representation of the input text as a list of floats.

    Raises:
        OpenAIError: If an error occurs during the API call.
    """
    try:
        response = openai_client.embeddings.create(input=text, model=model)
    except OpenAIError:
        # Re-raise unchanged so the caller sees the original API error
        raise
    return response.data[0].embedding
```


Before using the `search` function defined later in this post, it's worth getting a sense of what a raw search returns. In the cell below, I pass `with_payload` a list of the payload keys I want to receive. In the `search` function, I set `with_payload=True` so it returns the entire payload.

```python
q_client.search(
    collection_name="arxiv_chunks",
    query_vector=("summary", get_text_embedding("machine learning in sound and diffusion")),
    with_payload=["summary", "title", "authors"],
    limit=2,
)
```

```
[ScoredPoint(id='8aa3bc5d-7491-4917-b79c-4a10d05644e3', version=0, score=0.41306904, payload={'authors': ['Dongchao Yang', 'Jianwei Yu', 'Helin Wang', 'Wen Wang', 'Chao Weng', 'Yuexian Zou', 'Dong Yu'], 'summary': 'Generating sound effects that humans want is an important topic. However,\nthere are few studies in this area for sound generation. In this study, we\ninvestigate generating sound conditioned on a text prompt and propose a novel\ntext-to-sound generation framework that consists of a text encoder, a Vector\nQuantized Variational Autoencoder (VQ-VAE), a decoder, and a vocoder. The\nframework first uses the decoder to transfer the text features extracted from\nthe text encoder to a mel-spectrogram with the help of VQ-VAE, and then the\nvocoder is used to transform the generated mel-spectrogram into a waveform. We\nfound that the decoder significantly influences the generation performance.\nThus, we focus on designing a good decoder in this study. We begin with the\ntraditional 
```

If your payload has lots of keys and you only want to leave out a couple, you can use `models.PayloadSelectorExclude` instead.

```python
from qdrant_client import QdrantClient, models

# Return every payload key except these two
payload_excluder = models.PayloadSelectorExclude(exclude=["chunk", "text_id"])

q_client.search(
    collection_name="arxiv_chunks",
    query_vector=("summary", get_text_embedding("machine learning in sound and diffusion")),
    with_payload=payload_excluder,
    limit=2,
)
```

```
[ScoredPoint(id='8aa3bc5d-7491-4917-b79c-4a10d05644e3', version=0, score=0.41298598, payload={'authors': ['Dongchao Yang', 'Jianwei Yu', 'Helin Wang', 'Wen Wang', 'Chao Weng', 'Yuexian Zou', 'Dong Yu'], 'source': 'http://arxiv.org/pdf/2207.09983', 'summary': 'Generating sound effects that humans want is an important topic. However,\nthere are few studies in this area for sound generation. In this study, we\ninvestigate generating sound conditioned on a text prompt and propose a novel\ntext-to-sound generation framework that consists of a text encoder, a Vector\nQuantized Variational Autoencoder (VQ-VAE), a decoder, and a vocoder. The\nframework first uses the decoder to transfer the text features extracted from\nthe text encoder to a mel-spectrogram with the help of VQ-VAE, and then the\nvocoder is used to transform the generated mel-spectrogram into a waveform. We\nfound that the decoder significantly influences the generation performance.\nThus, we focus on designing a good decoder i
```

You can also create more interesting and complex filters. This is useful when it's impossible to express all the features of the object in the embedding. I recommend checking out the documentation for filters [here](https://qdrant.tech/documentation/concepts/filtering/) to get a sense of the options available to you. I'm sure we'll make use of filtering as this series progresses.

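Below, I've created a filter on the `authors` field that restricts results to points where Dong Yu is one of the paper's authors. Because `authors` is a list, the match succeeds if any element equals "Dong Yu". A `should` clause requires at least one of its conditions to hold, so a single `should` condition acts as a requirement.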

There are other filtering clauses, like `must` and `must_not`, in addition to filtering conditions like Match, Match Except, and Nested key. These can be combined to form complex conditions. Again, I recommend checking out the documentation and hacking around on your own.

```python
author_filter = models.Filter(
    should=[
        models.FieldCondition(
            key="authors",
            match=models.MatchValue(value="Dong Yu"),
        )
    ]
)

q_client.search(
    collection_name="arxiv_chunks",
    query_vector=("summary", get_text_embedding("machine learning in sound and diffusion")),
    query_filter=author_filter,
    limit=5,
)
```

```
[ScoredPoint(id='8aa3bc5d-7491-4917-b79c-4a10d05644e3', version=0, score=0.41306904, payload={'authors': ['Dongchao Yang', 'Jianwei Yu', 'Helin Wang', 'Wen Wang', 'Chao Weng', 'Yuexian Zou', 'Dong Yu'], 'chunk': 'that it can effectively alleviate the unidirectional bias and the\naccumulated prediction error problems. We adopt the idea\nfrom diffusion models, which use a forward process to corrupt\nthe original mel-spectrogram tokens in Tsteps, and then let the\nmodel learn to recover the original tokens in a reverse process.\nSpeciﬁcally, in the forward process, we deﬁne a transition\nmatrix that denotes probability of each token transfer to a\nrandom token or a pre-deﬁned MASK token. By using the\ntransition matrix, the original tokens x0\x18q(x0)transfer\ninto a stationary distribution p(xT). In the reverse process,\nwe let the network learn to recover the original tokens from\nxT\x18p(xT)conditioned on the text features. Figure 1\n(c) shows an example of non-autoregressive mel-spect
```
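This toy filter evaluator in plain Python builds intuition for how the filter clauses combine. It's my own simplified reading of the semantics in the Qdrant filtering docs, not Qdrant's code:

- Every `must` condition holds.
- At least one `should` condition holds.
- No `must_not` condition holds.
- A match on an array field succeeds if any element matches.

The helper names and payloads are hypothetical.

```python
def matches(payload, key, value):
    """Toy match condition: an array field matches if any of its elements equals value."""
    field = payload.get(key)
    if isinstance(field, list):
        return value in field
    return field == value

def passes(payload, must=(), should=(), must_not=()):
    """Toy filter: every must holds, at least one should holds (if any), no must_not holds."""
    if not all(matches(payload, key, value) for key, value in must):
        return False
    if should and not any(matches(payload, key, value) for key, value in should):
        return False
    return not any(matches(payload, key, value) for key, value in must_not)

# Hypothetical payloads shaped like the arxiv_chunks points above
papers = [
    {"title": "Paper A", "authors": ["Author 1", "Dong Yu"]},
    {"title": "Paper B", "authors": ["Author 2"]},
    {"title": "Paper C", "authors": ["Dong Yu", "Author 3"]},
]

# A single should condition: behaves like "authors contains Dong Yu"
print([p["title"] for p in papers if passes(p, should=[("authors", "Dong Yu")])])
# ['Paper A', 'Paper C']

# Combining clauses: Dong Yu must be an author, Author 3 must not be
print([p["title"] for p in papers
       if passes(p, must=[("authors", "Dong Yu")], must_not=[("authors", "Author 3")])])
# ['Paper A']
```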

Now, define a search function. The `named_vector_to_search` argument specifies which named vector you want to query against. Any other search options, such as a payload filter (`query_filter`) or a `score_threshold`, can be passed through `**kwargs`.

```python
def search(
    named_vector_to_search: str,
    input_query: str,
    limit: int = 5,
    client: QdrantClient = q_client,
    collection_name: str = "arxiv_chunks",
    **kwargs):
    """
    Perform a vector search in the Qdrant database based on the input query.

    This function converts the input query string into a vector embedding using the
    "text-embedding-3-large" model and searches the chosen named vector in the Qdrant
    collection for the closest matching points. The search results are returned as a
    list of dictionaries containing the similarity score and selected payload fields.

    Args:
        named_vector_to_search (str): The named vector you want to search against.
        input_query (str): The input query string to search for.
        limit (int, optional): The maximum number of search results to return. Default is 5.
        client (QdrantClient, optional): The Qdrant client to use. Default is q_client.
        collection_name (str, optional): The collection to search. Default is "arxiv_chunks".
        kwargs: Additional keyword arguments to pass to the Qdrant search method
            (for example, query_filter or score_threshold).

    Returns:
        list: A list of dictionaries representing the search results. Each dictionary contains
              the following keys:
              - "similarity_score": The similarity score between the input query and the matching point.
              - "summary", "title", "source", "authors": Metadata from the point's payload.
    """
    # Embed the query with the same model used to build the collection
    input_vector = get_text_embedding(input_query)

    search_result = client.search(
        collection_name=collection_name,
        query_vector=(named_vector_to_search, input_vector),
        limit=limit,
        with_payload=True,
        **kwargs
    )

    # Keep the score plus the payload fields we care about
    result = []
    for item in search_result:
        payload = item.payload
        result.append({
            "similarity_score": item.score,
            "summary": payload.get("summary"),
            "title": payload.get("title"),
            "source": payload.get("source"),
            "authors": payload.get("authors"),
        })

    return result
```

```python
QUERY_STRING = "agents, reasoning, chain-of-thought, few-shot prompting"

search(
    named_vector_to_search="summary",
    input_query=QUERY_STRING,
)
```

```
[{'similarity_score': 0.58828723,
  'summary': 'The past decade has witnessed dramatic gains in natural language processing\nand an unprecedented scaling of large language models. These developments have\nbeen accelerated by the advent of few-shot techniques such as chain of thought\n(CoT) prompting. Specifically, CoT pushes the performance of large language\nmodels in a few-shot setup by augmenting the prompts with intermediate steps.\nDespite impressive results across various tasks, the reasons behind their\nsuccess have not been explored. This work uses counterfactual prompting to\ndevelop a deeper understanding of CoT-based few-shot prompting mechanisms in\nlarge language models. We first systematically identify and define the key\ncomponents of a prompt: symbols, patterns, and text. Then, we devise and\nconduct an exhaustive set of experiments across four different tasks, by\nquerying the model with counterfactual prompts where only one of these\ncomponents is altered. Our experim
```

You can also set a minimum similarity via the `score_threshold` argument. Results that score below it aren't returned.

```python
search(
    named_vector_to_search="summary",
    input_query=QUERY_STRING,
    score_threshold=0.51,
)
```

```
[{'similarity_score': 0.58828723,
  'summary': 'The past decade has witnessed dramatic gains in natural language processing\nand an unprecedented scaling of large language models. These developments have\nbeen accelerated by the advent of few-shot techniques such as chain of thought\n(CoT) prompting. Specifically, CoT pushes the performance of large language\nmodels in a few-shot setup by augmenting the prompts with intermediate steps.\nDespite impressive results across various tasks, the reasons behind their\nsuccess have not been explored. This work uses counterfactual prompting to\ndevelop a deeper understanding of CoT-based few-shot prompting mechanisms in\nlarge language models. We first systematically identify and define the key\ncomponents of a prompt: symbols, patterns, and text. Then, we devise and\nconduct an exhaustive set of experiments across four different tasks, by\nquerying the model with counterfactual prompts where only one of these\ncomponents is altered. Our experim
```

# That's it for this one!

You've gotten but a glimpse of the power of vector search.

You've seen firsthand how you can use it to efficiently find similar points in a vast collection of data by leveraging the vector representations of items. You've also seen how Qdrant simplifies this process by providing tools and algorithms like HNSW to navigate the high-dimensional vector space and retrieve the most relevant results. By understanding concepts like similarity metrics and approximate nearest neighbor search, you can harness vector search to build applications that excel at semantic retrieval and similarity-based recommendations. Whether you're working with text, images, or other types of data, vector search opens up a world of possibilities for creating intelligent and efficient systems.

There is a lot more ground to cover, and things are only going to get more interesting from here on out. I hope you're as excited to learn about it as I am to teach it!