Merged
5 changes: 3 additions & 2 deletions .vscode/settings.json
@@ -1,13 +1,14 @@
{
"cSpell.words": [
"FULLTEXT",
- "Pydantic"
+ "Pydantic",
"getenv",
"jina",
"jinaai",
"Rerank",
"reranker",
"reranking",
- "tablename"
+ "tablename",
+ "multimodal"
]
}
2 changes: 2 additions & 0 deletions mkdocs.yml
@@ -96,6 +96,7 @@ nav:
- Vector Search: ai/guides/vector-search.md
- Fulltext Search: ai/guides/fulltext-search.md
- Hybrid Search: ai/guides/hybrid-search.md
+ - Image Search: ai/guides/image-search.md
- Auto Embedding: ai/guides/auto-embedding.md
- Reranking: ai/guides/reranking.md
- Filtering: ai/guides/filtering.md
@@ -126,6 +127,7 @@ nav:
- Vector Search: ai/guides/vector-search.md
- Fulltext Search: ai/guides/fulltext-search.md
- Hybrid Search: ai/guides/hybrid-search.md
+ - Image Search: ai/guides/image-search.md
- Auto Embedding: ai/guides/auto-embedding.md
- Reranking: ai/guides/reranking.md
- Filtering: ai/guides/filtering.md
2 changes: 1 addition & 1 deletion src/ai/guides/fulltext-search.md
@@ -15,7 +15,7 @@ TiDB provides full-text search capabilities for **massive datasets** with high p

!!! tip

-    For complete example code, see the [full-text search example](https://github.com/pingcap/pytidb/blob/main/examples/fulltext_search).
+    For a complete example of full-text search, see the [E-commerce product search demo](../examples/fulltext-search-with-pytidb.md).

## Basic Usage

2 changes: 1 addition & 1 deletion src/ai/guides/hybrid-search.md
@@ -10,7 +10,7 @@ TiDB supports both semantic search (also known as vector search) and keyword-bas

!!! tip

-    For a complete example of hybrid search, refer to the [hybrid-search example](https://github.com/pingcap/pytidb/tree/main/examples/hybrid_search).
+    For a complete example of hybrid search, refer to the [hybrid-search example](../examples/hybrid-search-with-pytidb.md).


## Basic Usage
105 changes: 105 additions & 0 deletions src/ai/guides/image-search.md
@@ -0,0 +1,105 @@
# Image search

**Image search** helps you find similar images by comparing their visual content, not just text or metadata. This feature is useful for e-commerce, content moderation, digital asset management, and any scenario where you need to search for or deduplicate images based on appearance.

TiDB enables image search using **vector search**. With automatic embedding, you can generate image embeddings from image URLs, PIL images, or keyword text using a multimodal embedding model. TiDB then efficiently searches for similar vectors at scale.

!!! tip

    For a complete example of image search, see the [Pet image search demo](../examples/image-search-with-pytidb.md).

## Basic usage

### Step 1. Define an embedding function

To generate image embeddings, you need an embedding model that supports image input.

For demonstration, you can use Jina AI's multimodal embedding model to generate image embeddings.

Go to [Jina AI](https://jina.ai/embeddings) to create an API key, then initialize the embedding function as follows:

```python
from pytidb.embeddings import EmbeddingFunction

image_embed = EmbeddingFunction(
    # Or another provider/model that supports multimodal input
    model_name="jina_ai/jina-embeddings-v4",
    api_key="{your-jina-api-key}",
)
```

### Step 2. Create a table and vector field

Use `VectorField()` to define a vector field for storing image embeddings. Set the `source_field` parameter to specify the field that stores image URLs.

```python
from pytidb.schema import TableModel, Field

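class ImageItem(TableModel):
    __tablename__ = "image_items"
    id: int = Field(primary_key=True)
    image_uri: str = Field()
    # Embeddings are generated automatically from the `image_uri` field
    image_vec: list[float] = image_embed.VectorField(
        source_field="image_uri"
    )

# `client` is assumed to be an existing TiDB client connection
table = client.create_table(schema=ImageItem, mode="overwrite")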
```

### Step 3. Insert image data

When you insert data, the `image_vec` field is automatically populated with the embedding generated from the `image_uri`.

```python
table.bulk_insert([
    ImageItem(image_uri="https://example.com/image1.jpg"),
    ImageItem(image_uri="https://example.com/image2.jpg"),
    ImageItem(image_uri="https://example.com/image3.jpg"),
])
```
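
To make the idea of automatic population concrete, here is a minimal, library-free sketch. It is not how pytidb implements auto embedding: `toy_embed`, `ToyImageItem`, and `insert_rows` are hypothetical names, and the hash-based "embedding" only stands in for a real multimodal model.

```python
import hashlib
from dataclasses import dataclass, field


def toy_embed(source: str, dim: int = 4) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashes the input into a small vector."""
    digest = hashlib.sha256(source.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:dim]]


@dataclass
class ToyImageItem:
    image_uri: str
    image_vec: list[float] = field(default_factory=list)


def insert_rows(store: list[ToyImageItem], rows: list[ToyImageItem]) -> None:
    """Fill the vector field from the source field, then store each row."""
    for row in rows:
        if not row.image_vec:  # only embed rows that do not have a vector yet
            row.image_vec = toy_embed(row.image_uri)
        store.append(row)


store: list[ToyImageItem] = []
insert_rows(store, [ToyImageItem(image_uri="https://example.com/image1.jpg")])
```

The caller supplies only `image_uri`. The vector is derived at insert time, which mirrors the role `source_field` plays in the table definition.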

### Step 4. Perform image search

Image search is a type of vector search. Automatic embedding lets you input an image URL, PIL image, or keyword text directly. All these inputs are converted to vector embeddings for similarity matching.

#### Option 1: Search by image URL

Search for similar images by providing an image URL:

```python
results = table.search("https://example.com/query.jpg").limit(3).to_list()
```

The client converts the input image URL into a vector. TiDB then finds and returns the most similar images by comparing their vectors.
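
As a rough illustration of what "most similar" means, the sketch below ranks stored vectors by cosine distance to a query vector. The choice of cosine distance, the three-dimensional toy vectors, and the helper names `cosine_distance` and `top_k` are assumptions for illustration only. TiDB performs this comparison inside the database at scale.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; smaller values mean more similar vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query_vec: list[float], stored: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the keys of the k stored vectors closest to the query vector."""
    ranked = sorted(stored.items(), key=lambda item: cosine_distance(query_vec, item[1]))
    return [uri for uri, _ in ranked[:k]]


# Toy embeddings standing in for real image vectors
stored_vectors = {
    "image1.jpg": [1.0, 0.0, 0.0],
    "image2.jpg": [0.9, 0.1, 0.0],
    "image3.jpg": [0.0, 1.0, 0.0],
}
nearest = top_k([1.0, 0.05, 0.0], stored_vectors, k=2)
```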

#### Option 2: Search by PIL image

You can also search for similar images by providing a PIL image object loaded from an image file:

```python
from PIL import Image

image = Image.open("/path/to/query.jpg")

results = table.search(image).limit(3).to_list()
```

The client converts the PIL image object into a Base64 string before sending it to the embedding model.
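
The Base64 step can be sketched with the standard library alone. This is an assumption about the general technique, not pytidb's actual code: `encode_image_bytes` is a hypothetical helper, and the PNG signature bytes stand in for real image data.

```python
import base64


def encode_image_bytes(image_bytes: bytes) -> str:
    """Encode raw image bytes as an ASCII Base64 string suitable for a JSON payload."""
    return base64.b64encode(image_bytes).decode("ascii")


# The 8-byte PNG file signature, used here as placeholder image data
png_signature = b"\x89PNG\r\n\x1a\n"
encoded = encode_image_bytes(png_signature)
```

Base64 maps arbitrary bytes to plain text, so binary image data can travel inside a text-based request body.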

#### Option 3: Search by keyword text

You can also search for similar images by providing keyword text.

For example, if you are working on a pet image dataset, you can search for similar images by keywords like "orange tabby cat" or "golden retriever puppy".

```python
results = table.search("orange tabby cat").limit(3).to_list()
```

The multimodal embedding model converts the keyword text into a vector embedding that captures its semantic meaning. A vector search then finds the images whose embeddings are most similar to the keyword embedding.

## See also

- [Automatic embedding guide](./auto-embedding.md)
- [Vector search guide](../concepts/vector-search.md)
- [Pet image search demo](../examples/image-search-with-pytidb.md)
2 changes: 1 addition & 1 deletion src/ai/guides/vector-search.md
@@ -4,7 +4,7 @@ Vector search uses semantic similarity to help you find the most relevant record

!!! tip

-    For a complete example of vector search, see the [vector-search example](https://github.com/pingcap/pytidb/tree/main/examples/vector_search).
+    For a complete example of vector search, see the [vector-search example](../examples/vector-search-with-pytidb.md).


## Basic Usage