feat: support auto embedding for image#137

Merged
Mini256 merged 26 commits into main from support-image-auto-embedding on Jul 9, 2025

Conversation

@Mini256 Mini256 commented Jul 7, 2025

close #33

This PR adds support for auto embedding of images using a managed embedding service (e.g. jina_ai/jina-embeddings-v4).

TODO:

We can create an image_uri field to store the URL or file path of the image, and add a vector field such as image_vec via the VectorField class.

Unlike auto embedding for text, you need to mark the source field as an image by passing the parameter source_type="image".

class Pet(TableModel):
    __tablename__ = "pets"
    id: int = Field(primary_key=True)
    image_uri: Optional[str] = Field(default=None)
    image_vec: Optional[list[float]] = image_embed_fn.VectorField(
        distance_metric=DistanceMetric.COSINE,
        source_field="image_uri",
        source_type="image",  # 👈 Configure the source field as image.
    )

After the data is inserted, we can use the table.search() API to perform vector search on image data.

Example: Search images with keywords

results = (
    pet_table.search(query="shiba inu dog").limit(1).to_list()
)

Example: Search images with Path object

from pathlib import Path

pet_images_dir = Path("./tests/fixtures/pet_images")
query_image_path = pet_images_dir / "shiba_inu_15.jpg"
results = (
    pet_table.search(query=query_image_path).limit(1).to_list()
)

Example: Search images with PIL Image object

from pathlib import Path

from PIL import Image

pet_images_dir = Path("./tests/fixtures/pet_images")
query_image = Image.open(pet_images_dir / "shiba_inu_15.jpg")
results = (
    pet_table.search(query=query_image).limit(1).to_list()
)
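
The three examples above pass a string, a Path, and a PIL Image to the same search() call, so the query has to be classified before it is embedded. A minimal sketch of one way such a dispatch could work is shown here. The helper name detect_source_type is hypothetical and not part of pytidb, and the logic in this PR may differ.

```python
from pathlib import Path
from typing import Any

try:  # Pillow is optional for this sketch
    from PIL import Image as PILImage
except ImportError:
    PILImage = None


def detect_source_type(query: Any) -> str:
    """Hypothetical helper: guess whether a query is text or an image."""
    # A filesystem path is assumed to point at an image file.
    if isinstance(query, Path):
        return "image"
    # An in-memory PIL image is an image by definition.
    if PILImage is not None and isinstance(query, PILImage.Image):
        return "image"
    # Plain strings are treated as text queries (keywords).
    if isinstance(query, str):
        return "text"
    raise TypeError(f"Unsupported query type: {type(query).__name__}")
```

Under this assumption a plain string is always treated as text, which is why the image examples wrap the file path in a Path object rather than passing it as a string.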

cursor[bot]

This comment was marked as outdated.

@Icemap Icemap left a comment

LGTM

@Mini256 Mini256 force-pushed the support-image-auto-embedding branch from c1cd005 to b155de1 on July 9, 2025 06:52

@cursor cursor bot left a comment

Bug: API Call in Constructor Causes Instantiation Failures

The BuiltInEmbeddingFunction constructor attempts to auto-detect embedding dimensions by calling self.get_query_embedding("test", "text") when dimensions is None. This external API call during initialization makes object instantiation unreliable, as it can fail due to network issues, invalid API credentials, or service unavailability. Additionally, the call itself is incorrect, passing "text" as a positional argument instead of a keyword argument for source_type, which results in a TypeError at runtime.

pytidb/embeddings/builtin.py#L90-L92

        )
        if dimensions is None:
            self.dimensions = len(self.get_query_embedding("test", "text"))
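
One way to address the concern above is to defer the dimension probe until the value is first needed, and to pass source_type by keyword. The following is a minimal sketch under those assumptions. LazyDimensionEmbedding and fake_embed are hypothetical stand-ins, not the BuiltInEmbeddingFunction class from pytidb/embeddings/builtin.py.

```python
from typing import Callable, Optional


class LazyDimensionEmbedding:
    """Hypothetical stand-in for an embedding function; not pytidb's class."""

    def __init__(self, embed: Callable[..., list], dimensions: Optional[int] = None):
        self._embed = embed            # callable that would hit the remote service
        self._dimensions = dimensions  # stored as-is; no network call here

    def get_query_embedding(self, query, source_type: str = "text") -> list:
        return self._embed(query, source_type=source_type)

    @property
    def dimensions(self) -> int:
        # Probe the service once, on first access, and cache the result.
        if self._dimensions is None:
            probe = self.get_query_embedding("test", source_type="text")
            self._dimensions = len(probe)
        return self._dimensions


calls = []  # records every call made to the fake service


def fake_embed(query, source_type="text"):
    """Fake embedding backend that returns a fixed 4-dimensional vector."""
    calls.append((query, source_type))
    return [0.0] * 4


fn = LazyDimensionEmbedding(fake_embed)  # constructing makes no service call
```

With this shape, constructing the object cannot fail because of network or credential problems. The trade-off is that such errors surface later, on the first read of dimensions.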


Bug: API Change Breaks Existing Code

The QueryBundle TypedDict field name was changed from query_text to query. This breaking API change causes existing code to fail silently, as SearchQuery.__init__ now expects the query field, resulting in None queries and unexpected search behavior. Existing code must be updated to use query instead of query_text.

pytidb/schema.py#L21-L25

pytidb/pytidb/schema.py

Lines 21 to 25 in 3af724f

class QueryBundle(TypedDict):
    query: Optional[Any]
    query_vector: Optional[VectorDataType]
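
If backward compatibility with the old field name matters, one option is a small shim that maps a legacy query_text key onto query and emits a deprecation warning. This is a sketch only. normalize_query_bundle is a hypothetical helper that does not exist in pytidb, and the PR as merged does not necessarily include anything like it.

```python
import warnings


def normalize_query_bundle(bundle: dict) -> dict:
    """Hypothetical shim: map the legacy `query_text` key onto `query`."""
    normalized = dict(bundle)  # copy so the caller's dict is not mutated
    if "query_text" in normalized:
        legacy = normalized.pop("query_text")
        warnings.warn(
            "`query_text` is deprecated; use `query` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # An explicit `query` value wins over the legacy key.
        normalized.setdefault("query", legacy)
    return normalized
```

Callers that still pass query_text would then get a visible warning instead of a silent None query.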

@Mini256 Mini256 merged commit 08b52fd into main Jul 9, 2025
3 checks passed
@Mini256 Mini256 deleted the support-image-auto-embedding branch July 9, 2025 09:10
Development

Successfully merging this pull request may close these issues.

Auto embedding for image field

3 participants