# A Complete Noob's Guide to Vector Search
## Part 2: Searching for Vectors

Vector search is a powerful technique for finding similar points in a massive collection by comparing their vector representations. It's an efficient way to retrieve relevant information based on semantic similarity rather than exact keyword matching.

At its core, vector search is like finding a needle in a haystack. But instead of hay, you have vectors, and instead of a needle, you're looking for the vectors closest to your query vector.

In the previous post in this series, I showed you how to transform text into a vector representation using an embedding model. That vector representation is a point in a multi-dimensional space. Using the search functionality in Qdrant, you're going to learn how to navigate this space to find the points most similar to your query.

### Similarity Search: Finding the Closest Neighbors

"Similarity" in this context refers to how close vectors are to each other in this multi-dimensional space. 

Vectors that are close together represent items that share similar characteristics or meaning. For example, vectors representing documents about deep learning research would be closer to each other than vectors representing documents about advanced filo pastry baking techniques. 

But, how do you quantify similarity? 

This is where **metrics** come in. Metrics are basically formulas that score how similar (or how far apart) two vectors are. Qdrant lets you choose between the dot product, cosine similarity, Euclidean distance, and Manhattan distance. Choosing the appropriate metric depends on the data you're working with and the specific task. It's a cop-out answer, but seriously, every question like this in AI always "depends" on something. Still, here are some heuristics to help you reason about which metric to pick:

##### **Cosine Similarity**

<img src="https://qdrant.tech/docs/cos.png">

 - Use cosine similarity when the magnitude of the vectors is not important, but the direction is.

 - It is commonly used in text similarity tasks, such as document clustering or information retrieval.
 
 - Cosine similarity measures the cosine of the angle between two vectors, ignoring their lengths.

 - If your data is sparse (i.e., many zero values), cosine similarity or dot product may be more appropriate than Euclidean or Manhattan distance, as they focus on the non-zero dimensions.

Qdrant uses a two-step process to compute cosine similarity for faster search speeds. It normalizes vectors when adding them to the collection and compares them using a fast dot product operation.
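
To see why the normalize-then-dot trick works, here is a minimal pure-Python sketch (not Qdrant's actual code). It computes cosine similarity directly, then again by normalizing both vectors to unit length and taking a plain dot product. The helper names are illustrative, and the sketch assumes non-zero vectors, since a zero vector has no direction.

```python
import math


def dot(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale v to unit length. Assumes v is not the zero vector."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Dot product divided by the product of both vectors' lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Same direction, different lengths: cosine similarity ignores the length.
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]

direct = cosine_similarity(a, b)
via_normalized = dot(normalize(a), normalize(b))  # normalize once, then a cheap dot product
print(direct, via_normalized)  # both 1.0, up to floating-point rounding
```

Normalizing at upload time means the stored vectors' lengths never need to be recomputed, so each comparison at query time is just the dot product.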

##### **Dot Product**

 - Dot product is similar to cosine similarity but considers the vectors' magnitude.

 - It measures the alignment between two vectors and is influenced by their lengths.

 - Dot product is useful when the magnitude of the vectors carries meaningful information.

 - It is often used in recommendation systems, where the magnitude of user preferences or item ratings is significant.

 - If your vectors are normalized (i.e., unit vectors), cosine similarity and dot product yield identical scores.
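
A small pure-Python sketch of the difference, using made-up two-dimensional vectors: two items point in exactly the same direction as the query, but one is three times longer. Cosine similarity can't tell them apart, while the dot product ranks the longer one higher. The last lines check the normalized case from the final bullet.

```python
import math


def dot(a, b):
    """Sum of element-wise products; grows with both direction and length."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product with both lengths divided out, so only direction counts."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def normalize(v):
    """Scale v to unit length. Assumes v is not the zero vector."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


query = [1.0, 1.0]
item_small = [1.0, 1.0]  # same direction as the query
item_large = [3.0, 3.0]  # same direction, three times longer

print(cosine_similarity(query, item_small), cosine_similarity(query, item_large))  # both ~1.0
print(dot(query, item_small), dot(query, item_large))  # 2.0 6.0

# Once both vectors are unit length, the two scores coincide.
u, w = normalize([3.0, 4.0]), normalize([4.0, 3.0])
print(dot(u, w), cosine_similarity(u, w))  # both ~0.96
```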

##### **Euclidean Distance**

 - Euclidean distance measures the straight-line distance between two vectors in the multi-dimensional space.

 - It is suitable when the absolute differences between vector elements matter.

 - Euclidean distance is commonly used in tasks such as image similarity, where pixel intensities or feature values directly correspond to the visual appearance.

 - It is sensitive to the scale of the features. If your data has varying scales or ranges across different dimensions, normalize or standardize the features before applying Euclidean (or Manhattan) distance.
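
Here's a minimal pure-Python sketch of Euclidean distance and of z-score standardization, one common way to put features on a comparable scale. The two features (age in years and annual income in dollars) are hypothetical, chosen so the scale problem is obvious: before standardizing, the income column swamps everything else.

```python
import math


def euclidean(a, b):
    """Straight-line distance: square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def standardize(vectors):
    """Z-score each dimension: subtract its mean, divide by its (population) std dev.

    Assumes no dimension is constant (a zero std dev would divide by zero).
    """
    columns = list(zip(*vectors))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / len(col))
        for col, m in zip(columns, means)
    ]
    return [[(x - m) / s for x, m, s in zip(v, means, stds)] for v in vectors]


# Hypothetical feature vectors: [age in years, annual income in dollars]
people = [[25.0, 50_000.0], [60.0, 51_000.0], [26.0, 80_000.0]]

# Raw: a 35-year age gap looks tiny next to a $30,000 income gap.
print(euclidean(people[0], people[1]), euclidean(people[0], people[2]))

# Standardized: both gaps are measured in standard deviations, so they carry comparable weight.
scaled = standardize(people)
print(euclidean(scaled[0], scaled[1]), euclidean(scaled[0], scaled[2]))
```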

##### **Manhattan Distance**

 - Manhattan distance, also known as L1 or city block distance, measures the sum of absolute differences between vector elements.

 - It is useful when the features represent distinct dimensions or attributes that are not necessarily related.

 - Manhattan distance is less sensitive to outliers than Euclidean distance: it adds up absolute differences instead of squaring them, so a single large difference weighs less.
 
 - It is often used in tasks such as comparing binary or categorical features, where the presence or absence of certain attributes is more important than their exact values.
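
A quick pure-Python comparison with made-up vectors makes the outlier point concrete. Both candidates below are the same Manhattan distance from the origin, but Euclidean distance treats the single large difference as twice as far, because it squares it. On 0/1 vectors, Manhattan distance simply counts the attributes that differ.

```python
import math


def manhattan(a, b):
    """L1 / city block distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """L2 distance: square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


origin = [0.0, 0.0, 0.0, 0.0]
spread_out = [1.0, 1.0, 1.0, 1.0]   # small differences in every dimension
one_outlier = [4.0, 0.0, 0.0, 0.0]  # one large difference

print(manhattan(origin, spread_out), manhattan(origin, one_outlier))  # 4.0 4.0
print(euclidean(origin, spread_out), euclidean(origin, one_outlier))  # 2.0 4.0

# Hypothetical binary attribute flags: the distance counts mismatched attributes.
print(manhattan([1, 0, 1, 1], [1, 1, 0, 1]))  # 2
```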

Picking a metric is only half the story, though. The real challenge with vector search arises when dealing with large datasets. To find the documents most similar to a given query vector, the naive approach computes the distance (or similarity) between the query vector and every vector in the dataset. That works fine for small datasets with dozens or even hundreds of examples, but because the work grows linearly with the number of vectors, it becomes a significant bottleneck as the dataset grows.
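
The naive approach is easy to write down. This pure-Python sketch scores a query against every stored vector with cosine similarity and keeps the top k using `heapq.nlargest`. The point IDs and three-dimensional vectors are made up for illustration; real embeddings typically have hundreds or thousands of dimensions.

```python
import heapq
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def brute_force_search(query, points, k=3):
    """Exact k-nearest-neighbor search: one similarity computation per stored vector."""
    scored = ((cosine_similarity(query, vector), point_id) for point_id, vector in points.items())
    return heapq.nlargest(k, scored)  # (score, point_id) pairs, best first


# Hypothetical toy "embeddings"
points = {
    "deep_learning_1": [0.9, 0.1, 0.0],
    "deep_learning_2": [0.8, 0.2, 0.1],
    "filo_pastry": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]

for score, point_id in brute_force_search(query, points, k=2):
    print(point_id, round(score, 3))
```

Every query touches all n stored vectors, so doubling the collection doubles the work per query. That linear cost is exactly what an index is meant to avoid.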

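Qdrant avoids that exhaustive comparison. You provide a query vector, and Qdrant uses a vector index (HNSW, a graph-based approximate nearest neighbor structure) to find the vectors most similar to it without checking every point. This is like having a map of the vector space and knowing the best routes to reach your destination.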
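
Qdrant's vector index is built on HNSW (Hierarchical Navigable Small World), a multi-layer graph of vectors. The toy sketch below shows only the basic move HNSW builds on: a greedy walk over a single-layer neighbor graph, hopping to whichever neighbor is closer to the query until none is. It is not HNSW itself; the graph here is built by brute force, and the function names and the `m` parameter are illustrative.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_knn_graph(points, m=2):
    """Link each point to its m nearest neighbors (found by brute force, for simplicity)."""
    graph = {}
    for pid, vec in points.items():
        others = sorted((euclidean(vec, v), oid) for oid, v in points.items() if oid != pid)
        graph[pid] = [oid for _, oid in others[:m]]
    return graph


def greedy_search(query, points, graph, entry):
    """Hop to the neighbor closest to the query; stop when no neighbor is closer."""
    current = entry
    current_dist = euclidean(query, points[current])
    while True:
        best, best_dist = current, current_dist
        for neighbor in graph[current]:
            d = euclidean(query, points[neighbor])
            if d < best_dist:
                best, best_dist = neighbor, d
        if best == current:  # local best reached
            return current, current_dist
        current, current_dist = best, best_dist


# Hypothetical 2-D points
points = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [2.0, 0.0], "d": [2.0, 1.0], "e": [3.0, 1.0]}
graph = build_knn_graph(points, m=2)
print(greedy_search([2.9, 1.1], points, graph, entry="a"))  # walks a -> c -> d -> e
```

Because the walk stops at the first point with no closer neighbor, it can settle on a local best instead of the true nearest neighbor. That is why this family of methods is called *approximate* nearest neighbor (ANN) search; HNSW's extra layers and wider candidate lists are there to make such misses rare.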


### Query Planning: The Brains Behind the Operation

Under the hood, Qdrant is a master strategist. It employs **query planning** to pick the most efficient way to execute your search, based on factors like the available indexes, the filters applied, and the number of points involved. This keeps searches fast while keeping accuracy high.
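
Qdrant's documentation describes its planner as heuristic and subject to change between releases, so treat the following only as a toy model of the idea: estimate how many points a filter will match, then pick a strategy. The function name, strategy labels, and the 10,000 threshold are placeholders for illustration, not Qdrant's actual values or API (Qdrant's HNSW configuration does have a related `full_scan_threshold` setting).

```python
def choose_strategy(total_points, estimated_matches=None, threshold=10_000):
    """Toy query planner. `estimated_matches` is None when the query has no filter.

    All names and the threshold value are hypothetical placeholders.
    """
    if total_points < threshold:
        # Small enough that comparing against every point is cheap.
        return "full_scan"
    if estimated_matches is not None and estimated_matches < threshold:
        # Restrictive filter: fetch the few matching points via the payload index,
        # then score just those exactly.
        return "payload_index_then_exact"
    # No filter, or a broad one: walk the vector index, skipping non-matching points.
    return "vector_index_with_filter"


print(choose_strategy(500))                                   # full_scan
print(choose_strategy(1_000_000, estimated_matches=200))      # payload_index_then_exact
print(choose_strategy(1_000_000, estimated_matches=400_000))  # vector_index_with_filter
print(choose_strategy(1_000_000))                             # vector_index_with_filter
```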





### Beyond the Basics: Advanced Search Features

While the concepts above provide a solid foundation, Qdrant offers a treasure trove of advanced features for fine-tuning your search experience.  These include:

*   **Payload filtering:** Narrow down your search based on specific attributes of your data.
*   **Batch search:**  Perform multiple searches efficiently in a single request.
*   **Grouping:**  Organize results based on shared characteristics.
*   **Lookup in groups:**  Retrieve additional information related to grouped results.
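
Qdrant performs these operations server-side, using its indexes rather than scanning every point. Purely to show what each operation *means*, here is a plain-Python sketch over a handful of made-up points with payloads. `filtered_search`, `batch_search`, and `group_by` are hypothetical helpers, not Qdrant's client API, and lookup in groups is left out.

```python
from collections import defaultdict


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical points: (id, vector, payload)
POINTS = [
    (1, [0.9, 0.1], {"topic": "ml", "doc": "A"}),
    (2, [0.8, 0.3], {"topic": "ml", "doc": "A"}),
    (3, [0.7, 0.2], {"topic": "ml", "doc": "B"}),
    (4, [0.95, 0.0], {"topic": "baking", "doc": "C"}),
]


def filtered_search(query, must, limit=10):
    """Payload filtering: score only points whose payload matches every key/value in `must`."""
    hits = [
        (dot(query, vector), point_id, payload)
        for point_id, vector, payload in POINTS
        if all(payload.get(key) == value for key, value in must.items())
    ]
    return sorted(hits, key=lambda hit: hit[0], reverse=True)[:limit]


def batch_search(queries, must):
    """Batch search: several queries in one call (here just a loop; Qdrant sends them in one request)."""
    return [filtered_search(query, must) for query in queries]


def group_by(hits, key, group_size=1):
    """Grouping: keep the best `group_size` hits for each distinct payload[key]."""
    groups = defaultdict(list)
    for hit in hits:  # hits arrive best-first, so the first ones kept are the best
        bucket = groups[hit[2][key]]
        if len(bucket) < group_size:
            bucket.append(hit)
    return dict(groups)


hits = filtered_search([1.0, 0.0], must={"topic": "ml"})
print([point_id for _, point_id, _ in hits])  # [1, 2, 3]: point 4 is filtered out
print({doc: [h[1] for h in group] for doc, group in group_by(hits, "doc").items()})  # {'A': [1], 'B': [3]}
```

Note that point 4 has the highest raw score for this query, but the payload filter removes it before ranking.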


