4 changes: 4 additions & 0 deletions docs/search/full-text-search.mdx
@@ -11,6 +11,10 @@ import {

LanceDB provides support for Full-Text Search via Lance, allowing you to incorporate keyword-based search (based on BM25) in your retrieval solutions.

<Info title="Need index build steps and parameters?">
Index creation and tuning live under [/user-guides/indexing/fts-index](/user-guides/indexing/fts-index).
</Info>

## Basic Usage

Consider a LanceDB table named `my_table` whose string column `text` we want to index and query via keyword search. The FTS index must be created before you can search via keywords.
5 changes: 5 additions & 0 deletions docs/search/index.mdx
@@ -4,6 +4,11 @@ sidebarTitle: Overview
description: "Comprehensive guide to all search capabilities in LanceDB including vector search, full-text search, hybrid search, and more."
---

<Info title="Looking for indexes and metrics?">
- Index types and build guidance: [/user-guides/indexing/index](/user-guides/indexing/index)
- Distance metrics for vector indexes: [/user-guides/indexing/vector-index#supported-distance-metrics](/user-guides/indexing/vector-index#supported-distance-metrics)
</Info>

| Feature | Description |
|:---------------|:------------|
| [Vector Search](/search/vector-search/) | Semantic similarity search with multiple distance metrics |
32 changes: 11 additions & 21 deletions docs/search/vector-search.mdx
@@ -10,37 +10,22 @@ Vector search is a technique used to search for similar items based on their vec

Raw data (e.g. text, images, audio, etc.) is converted into embeddings via an embedding model, and the embeddings are then stored in a vector database like LanceDB. To perform similarity search at scale, an index is created on the stored embeddings, which can then be used to perform fast lookups.

## Supported Distance Metrics

Distance metrics determine how LanceDB compares vectors to find similar matches. Euclidean distance (`l2`) is the default and suits general-purpose similarity. Use `cosine` for unnormalized embeddings, `dot` for normalized embeddings (best performance), or `hamming` for binary vectors.

<Warning>
Ensure you always use the same distance metric that your embedding model was trained with. Most modern embedding models use cosine similarity, so `cosine` is often the best choice. However, if your vectors are normalized, you should use `dot` for best performance.
</Warning>

The right metric improves both search accuracy and query performance. Currently, LanceDB supports the following metrics:

| Metric | Description | Default |
| :-------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------ |
| `l2` | [Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance) - measures the straight-line distance between two points in vector space. Calculated as the square root of the sum of squared differences between corresponding vector components. | ✓ |
| `cosine` | [Cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) - measures the cosine of the angle between two vectors, ranging from -1 to 1. Computed as the dot product divided by the product of vector magnitudes. Use for unnormalized vectors. | x |
| `dot` | [Dot product](https://en.wikipedia.org/wiki/Dot_product) - calculates the sum of products of corresponding vector components. Provides raw similarity scores without normalization, sensitive to vector magnitudes. Use for normalized vectors for best performance. | x |
| `hamming` | [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) - counts the number of positions where corresponding bits differ between binary vectors. Only applicable to binary vectors stored as packed uint8 arrays. | x |

### Configure Distance Metric
## Configure Distance Metric

By default, `l2` is used as the metric type. You can specify `cosine` or `dot`
instead if required.

**Note:** You can configure the distance metric during search only if there’s no vector index. If a vector index exists, the distance metric will always be the one you specified when creating the index.
<Warning>
Ensure you use the same distance metric that your embedding model was trained with. Most modern embedding models use `cosine`, but if your vectors are normalized, `dot` usually performs best.
</Warning>

<CodeGroup>
```python Python icon="python"
tbl.search(np.random.random((1536))).distance_type("cosine").limit(10).to_list()
```

```ts TypeScript icon="square-js"
const results2 = await (
const results = await (
tbl.search(Array(128).fill(1.2)) as lancedb.VectorQuery
)
.distanceType("cosine")
@@ -49,7 +34,12 @@ const results2 = await (
```
</CodeGroup>

Here you can see the same search but using `cosine` similarity instead of `l2` distance. The result focuses on vector direction rather than absolute distance, which works better for normalized embeddings.
<Note>
You can configure the distance metric during search only if there's no vector index. If a vector index exists, the distance metric will always be the one you specified when creating the index.

For the full list, see the supported distance metrics at [/user-guides/indexing/vector-index#supported-distance-metrics](/user-guides/indexing/vector-index#supported-distance-metrics).
</Note>
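
The advice to prefer `dot` for normalized vectors can be checked with a small NumPy sketch. It does not call LanceDB; the helper names (`top_k_by_l2`, `top_k_by_cosine`, `top_k_by_dot`) and the random data are assumptions made up for this illustration. For unit-length vectors, all three metrics should produce the same nearest-neighbor ranking, so the cheapest one to compute is sufficient.

```python
import numpy as np

# Hypothetical data: 100 random 16-dimensional vectors and one query.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(100, 16))
query = rng.normal(size=16)

# Scale every vector (and the query) to unit length.
unit_vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
unit_query = query / np.linalg.norm(query)


def top_k_by_l2(data, q, k):
    """Indices of the k rows with the smallest Euclidean distance to q."""
    distances = np.linalg.norm(data - q, axis=1)
    return np.argsort(distances)[:k].tolist()


def top_k_by_cosine(data, q, k):
    """Indices of the k rows with the largest cosine similarity to q."""
    sims = (data @ q) / (np.linalg.norm(data, axis=1) * np.linalg.norm(q))
    return np.argsort(-sims)[:k].tolist()


def top_k_by_dot(data, q, k):
    """Indices of the k rows with the largest dot product with q."""
    return np.argsort(-(data @ q))[:k].tolist()


# On unit vectors, ||a - b||^2 = 2 - 2 * (a . b), so the orderings coincide.
l2_top = top_k_by_l2(unit_vectors, unit_query, 5)
cosine_top = top_k_by_cosine(unit_vectors, unit_query, 5)
dot_top = top_k_by_dot(unit_vectors, unit_query, 5)
print(l2_top == cosine_top == dot_top)
```

On unnormalized vectors the three rankings can diverge, which is why the metric should match the one the embedding model was trained with.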


## Vector Search With ANN Index

4 changes: 4 additions & 0 deletions docs/user-guides/indexing/fts-index.mdx
@@ -8,6 +8,10 @@ import { PyFtsIndexAsync as FtsIndexAsync, PyFtsIndexCreate as FtsIndexCreate, P

LanceDB Cloud and Enterprise provide performant full-text search based on BM25, allowing you to incorporate keyword-based search in your retrieval solutions.

<Info title="Looking for query syntax?">
Search examples and filtering live under [/search/full-text-search](/search/full-text-search).
</Info>

<Note>
The `create_fts_index` API returns immediately, but index building happens asynchronously.
</Note>
19 changes: 19 additions & 0 deletions docs/user-guides/indexing/vector-index.mdx
@@ -19,6 +19,10 @@ import {

LanceDB offers two main vector indexing algorithms: **Inverted File (IVF)** and **Hierarchically Navigable Small Worlds (HNSW)**. You can create multiple vector indexes within a Lance table. This guide walks through common configurations and build patterns.

<Info title="Looking for query examples?">
Vector query patterns and tuning live under [/search/vector-search](/search/vector-search) and hybrid workflows under [/search/hybrid-search](/search/hybrid-search).
</Info>

### Option 1: Self-Hosted Indexing

**Manual, Sync or Async:** If using LanceDB Open Source, you will have to build indexes manually, as well as reindex and tune indexing parameters. The Python SDK lets you do this *synchronously and asynchronously*.
@@ -41,6 +45,21 @@ You can create a new index with different parameters using `create_index` - this
Although the `create_index` API returns immediately, the building of the vector index is asynchronous. To wait until all data is fully indexed, you can specify the `wait_timeout` parameter.
</Note>

## Supported Distance Metrics

Distance metrics determine how LanceDB compares vectors to find similar matches. Choose the metric when creating the index; queries must use the same metric configured on the index.

| Metric | Description | Default |
| :-------- | :---------- | :------ |
| `l2`      | Euclidean distance: the straight-line distance between two points in vector space, calculated as the square root of the sum of squared differences between corresponding vector components. | ✓ |
| `cosine`  | Cosine similarity: the cosine of the angle between two vectors, ranging from -1 to 1. Use for unnormalized vectors. | x |
| `dot`     | Dot product: the sum of products of corresponding vector components. Use for normalized vectors for best performance. | x |
| `hamming` | Hamming distance: the number of differing bit positions between binary vectors stored as packed uint8 arrays. | x |

<Warning title="Match the index metric">
If a vector index exists, the distance metric is fixed to what you chose at index creation. Changing metrics requires rebuilding the index.
</Warning>
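
As a rough illustration of the formulas in the table above, the following NumPy sketch computes each metric by hand. It is not LanceDB's implementation, and the function names are hypothetical, chosen only for this example. The values LanceDB reports in query results may be scaled or transformed differently.

```python
import numpy as np


def l2_distance(a, b):
    """Square root of the sum of squared component differences."""
    return float(np.sqrt(np.sum((a - b) ** 2)))


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector magnitudes."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def dot_product(a, b):
    """Sum of products of corresponding components."""
    return float(np.dot(a, b))


def hamming_distance(a, b):
    """Number of differing bits between two packed uint8 arrays."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())


# Two vectors that point in the same direction but differ in magnitude.
a = np.array([3.0, 4.0])
b = np.array([6.0, 8.0])

print(l2_distance(a, b))        # 5.0
print(cosine_similarity(a, b))  # 1.0 up to floating-point rounding
print(dot_product(a, b))        # 50.0

# Two binary vectors, each packed into a single byte.
x = np.array([0b10110000], dtype=np.uint8)
y = np.array([0b10011000], dtype=np.uint8)
print(hamming_distance(x, y))   # 2
```

The pair `a` and `b` shows why the choice matters: cosine similarity treats them as identical in direction, while `l2` and `dot` are both sensitive to their magnitudes.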

## Example: Construct an IVF Index

In this example, we will create an index for a table containing 1536-dimensional vectors. The index will use IVF_PQ with L2 distance, which is well-suited for high-dimensional vector search.