# 📏 Distances in Vector Databases
Vector databases use mathematical distance functions to determine the similarity between vectors. This notebook explores common distance measures used for similarity search.

## 🔢 Common Distance Metrics
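- **Cosine Similarity**: Measures the cosine of the angle between two vectors and ignores their magnitudes (1 means same direction). Common for text embeddings. When a distance is needed, 1 minus the cosine similarity is typically used.
- **Euclidean Distance (L2)**: Straight-line distance between two points. Good for spatial data.
- **Manhattan Distance (L1)**: Sum of absolute coordinate differences. Works well in grid-like structures.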
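For reference, these are the standard definitions for two $n$-dimensional vectors $\mathbf{a}$ and $\mathbf{b}$. The cosine distance $d_{\cos}$ is the usual convention for turning the similarity into a distance where smaller means closer:

$$
\begin{aligned}
\cos(\mathbf{a}, \mathbf{b}) &= \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert_2 \, \lVert \mathbf{b} \rVert_2}, \qquad d_{\cos}(\mathbf{a}, \mathbf{b}) = 1 - \cos(\mathbf{a}, \mathbf{b}) \\
d_{2}(\mathbf{a}, \mathbf{b}) &= \lVert \mathbf{a} - \mathbf{b} \rVert_2 = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2} \\
d_{1}(\mathbf{a}, \mathbf{b}) &= \lVert \mathbf{a} - \mathbf{b} \rVert_1 = \sum_{i=1}^{n} \lvert a_i - b_i \rvert
\end{aligned}
$$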

> ChromaDB uses squared **Euclidean (L2) distance** by default; **cosine** distance (and inner product) can be selected per collection.
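As we understand ChromaDB's documentation, its three distance spaces (`"l2"`, `"ip"`, `"cosine"`) use the formulas sketched below, where smaller values mean "closer". This is a minimal pure-Python illustration, not Chroma's code; the function names are hypothetical.

```python
import math

# Hypothetical helpers mirroring the "l2", "ip" and "cosine" spaces described
# in ChromaDB's docs. Smaller values mean "more similar".

def squared_l2(a, b):
    """Squared Euclidean distance: sum of squared differences, no square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def inner_product_distance(a, b):
    """1 minus the dot product; best suited to normalized vectors (can go negative otherwise)."""
    return 1.0 - sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 minus cosine similarity; ignores vector magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

vec1 = [1, 2, 3]
vec2 = [4, 5, 6]
print("l2:", squared_l2(vec1, vec2))
print("ip:", inner_product_distance(vec1, vec2))
print("cosine:", cosine_distance(vec1, vec2))
```

The squared L2 value for these vectors is 27, the square of the Euclidean distance (about 5.196) computed with NumPy. Skipping the square root does not change which neighbours rank first. The inner-product value comes out negative here because the vectors are not normalized.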

```python
# 📦 Install required packages
%pip install numpy
```

```python
# Example: Calculating distances manually with NumPy
import numpy as np

# Define two sample vectors
vec1 = np.array([1, 2, 3])
vec2 = np.array([4, 5, 6])

# Euclidean Distance: L2 norm of the difference vector
euclidean = np.linalg.norm(vec1 - vec2)
print("Euclidean Distance:", euclidean)

# Cosine Similarity: dot product divided by the product of the vectors' L2 norms
cosine_similarity = np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
print("Cosine Similarity:", cosine_similarity)

# Manhattan Distance: sum of absolute coordinate differences (L1 norm)
manhattan = np.sum(np.abs(vec1 - vec2))
print("Manhattan Distance:", manhattan)
```

```text
Euclidean Distance: 5.196152422706632
Cosine Similarity: 0.9746318461970762
Manhattan Distance: 9
```
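To connect the formulas to similarity search, here is a minimal brute-force top-k search over a tiny set of made-up 2-D vectors. The `docs` values are illustrative, not real embeddings, and `top_k` is a hypothetical helper. Vector databases typically replace this full scan with an approximate index, but the ranking idea is the same.

```python
import math

# Toy 2-D "embeddings" (made-up values, for illustration only).
docs = {
    "a": [10.0, 10.0],  # same direction as the query, but far away
    "b": [1.0, 0.0],
    "c": [0.9, 1.2],    # close to the query in space
}
query = [1.0, 1.0]

def euclidean(u, v):
    return math.dist(u, v)

def manhattan(u, v):
    return sum(abs(x - y) for x, y in zip(u, v))

def cosine_distance(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))

def top_k(query, docs, distance, k=2):
    """Brute-force k-nearest-neighbour search: score every doc, keep the k smallest distances."""
    ranked = sorted(docs, key=lambda name: distance(query, docs[name]))
    return ranked[:k]

for name, fn in [("cosine", cosine_distance), ("euclidean", euclidean), ("manhattan", manhattan)]:
    print(name, top_k(query, docs, fn))
```

Cosine ranks `a` first because it only looks at direction. Euclidean and Manhattan rank `c` first because it is physically closest. The same query can therefore return different neighbours depending on the metric.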


## ✅ Summary
Different tasks may benefit from different distance functions. For example:
- Text/NLP: Cosine Similarity
- Spatial/physical data: Euclidean Distance
- Grid-like data or simple, fast heuristics: Manhattan Distance
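The gap between cosine and Euclidean is smaller than it looks once embeddings are normalized to unit length. For unit vectors, $\lVert \mathbf{a} - \mathbf{b} \rVert_2^2 = 2 - 2\cos(\mathbf{a}, \mathbf{b})$, so both metrics produce the same nearest-neighbour ranking. The sketch below checks this numerically on the notebook's two example vectors. `normalize` is a hypothetical helper.

```python
import math

def normalize(v):
    """Scale a vector to unit length (L2 norm of 1)."""
    n = math.hypot(*v)
    return [x / n for x in v]

def cosine_similarity(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

a = normalize([1, 2, 3])
b = normalize([4, 5, 6])

squared_euclidean = math.dist(a, b) ** 2
via_cosine = 2 - 2 * cosine_similarity(a, b)

print(squared_euclidean, via_cosine)  # equal up to floating-point rounding
print(math.isclose(squared_euclidean, via_cosine))  # True
```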

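Most vector databases, including ChromaDB, let you choose the distance metric to suit your embedding model. In ChromaDB the metric is set per collection when the collection is created (in many versions through the `hnsw:space` metadata key, which accepts `"l2"`, `"ip"`, or `"cosine"`), so choose it before adding documents.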